crux.models package

Submodules

crux.models.dataset module

Module contains Dataset model.

class crux.models.dataset.Dataset(id=None, owner_identity_id=None, contact_identity_id=None, name=None, description=None, website=None, created_at=None, modified_at=None, connection=None, raw_response=None, tags=None)

Bases: crux.models.model.CruxModel

Dataset Model.

add_label(label_key, label_value)

Adds label to Dataset.

Parameters:
  • label_key (str) – Label Key for Dataset.
  • label_value (str) – Label Value for Dataset.
Returns:

True if the label is added.

Return type:

bool

add_permission(identity_id='_subscribed_', permission='Read', resource_paths=None, resource_objects=None, resource_ids=None)

Adds permission to all or specific Dataset resources.

Parameters:
  • identity_id (str) – Identity Id to be set. Defaults to _subscribed_.
  • permission (str) – Permission to be set. Defaults to Read.
  • resource_paths (list of str) – List of resource paths to which the permission should be applied. If none of resource_paths, resource_objects or resource_ids is set, the permission is applied to the whole dataset.
  • resource_objects (list of crux.models.Resource) – List of resource objects to which the permission should be applied. Overrides resource_paths. If none of resource_paths, resource_objects or resource_ids is set, the permission is applied to the whole dataset.
  • resource_ids (list of str) – List of resource ids to which the permission should be applied. Overrides resource_paths and resource_objects. If none of resource_paths, resource_objects or resource_ids is set, the permission is applied to the whole dataset.
Returns:

True if permission is applied.

Return type:

bool
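The override order described for resource_paths, resource_objects and resource_ids can be sketched as a small selection function. This is only an illustration of the documented precedence, not the library's implementation; select_permission_targets is a hypothetical helper, and treating an empty list the same as an unset parameter is an assumption.

```python
def select_permission_targets(resource_paths=None, resource_objects=None,
                              resource_ids=None):
    """Return (kind, values) for the resources a permission call would target.

    Hypothetical helper that mirrors the documented precedence only:
    resource_ids overrides resource_objects, which overrides resource_paths.
    Assumption: an empty list counts as "not set".
    """
    if resource_ids:
        return ("ids", list(resource_ids))
    if resource_objects:
        return ("objects", list(resource_objects))
    if resource_paths:
        return ("paths", list(resource_paths))
    # Nothing was set, so the permission applies to the whole dataset.
    return ("dataset", [])
```

Passing both resource_paths and resource_ids to this sketch selects the ids, which matches the "Overrides" notes in the parameter list.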

contact_identity_id

Gets the Contact Identity ID.

Type:str
create_file(path, tags=None, description=None)

Creates File resource in Dataset.

Parameters:
  • path (str) – Path of the file resource.
  • tags (list of str) – Tags of the file resource. Defaults to None.
  • description (str) – Description of the file resource. Defaults to None.
Returns:

File Object.

Return type:

crux.models.File

create_folder(path, folder='/', tags=None, description=None)

Creates Folder resource in Dataset.

Parameters:
  • path (str) – Path of the Folder resource.
  • folder (str) – Parent folder of the Folder resource. Defaults to /.
  • tags (list of str) – Tags of the Folder resource. Defaults to None.
  • description (str) – Description of the Folder resource. Defaults to None.
Returns:

Folder Object.

Return type:

crux.models.Folder

create_query(path, config, tags=None, description=None)

Creates Query resource in Dataset.

Parameters:
  • path (str) – Query resource Path.
  • config (dict) – Query configuration.
  • tags (list of str) – Tags of the Query resource. Defaults to None.
  • description (str) – Description of the Query resource. Defaults to None.
Returns:

Query Object.

Return type:

crux.models.Query

create_table(path, config, tags=None, description=None)

Creates Table resource in Dataset.

Parameters:
  • path (str) – Table resource Path.
  • config (dict) – Table Schema Configuration.
  • tags (list of str) – Tags of the Table resource. Defaults to None.
  • description (str) – Description of the Table resource. Defaults to None.
Returns:

Table Object

Return type:

crux.models.Table

created_at

Gets the Dataset created_at.

Type:str
delete()

Deletes the dataset.

Returns:True if dataset is deleted.
Return type:bool
delete_label(label_key)

Deletes label from Dataset.

Parameters:label_key (str) – Label Key for Dataset.
Returns:True if the label is deleted.
Return type:bool
delete_permission(identity_id='_subscribed_', permission='Read', resource_paths=None, resource_objects=None, resource_ids=None)

Deletes permission from all or specific Dataset resources.

Parameters:
  • identity_id (str) – Identity Id for the deletion. Defaults to _subscribed_.
  • permission (str) – Permission for the deletion. Defaults to Read.
  • resource_paths (list of str) – List of resource paths from which the permission should be deleted. If none of resource_paths, resource_objects or resource_ids is set, the permission is deleted from the whole dataset.
  • resource_objects (list of crux.models.Resource) – List of resource objects from which the permission should be deleted. Overrides resource_paths. If none of resource_paths, resource_objects or resource_ids is set, the permission is deleted from the whole dataset.
  • resource_ids (list of str) – List of resource ids from which the permission should be deleted. Overrides resource_paths and resource_objects. If none of resource_paths, resource_objects or resource_ids is set, the permission is deleted from the whole dataset.
Returns:

True if it is able to delete the permission.

Return type:

bool

description

Gets the Dataset Description.

Type:str
download_files(folder, local_path)

Downloads the resources recursively.

Parameters:
  • folder (str) – Crux Dataset Folder from where the file resources should be recursively downloaded.
  • local_path (str) – Local OS Path where the file resources should be downloaded.
Returns:

List of locations of the downloaded files.

Return type:

list (str)

Raises:
  • ValueError – If folder or local_path is None.
  • OSError – If local_path is an invalid directory location.
find_resources_by_label(predicates, max_per_page=1000)

Searches the Dataset for resources whose labels match the given predicates.

Each predicate can be one of:

  • Lexicographical equal
  • Lexicographical not equal
  • Lexicographical less than
  • Lexicographical less than or equal to
  • Lexicographical greater than
  • Lexicographical greater than or equal to
  • A list of OR predicates
  • A list of AND predicates
predicates = [
    {"op": "eq", "key": "key1", "val": "abcd"},
    {"op": "ne", "key": "key1", "val": "zzzz"},
    {"op": "lt", "key": "key1", "val": "abd"},
    {"op": "gt", "key": "key1", "val": "abc"},
    {"op": "lte", "key": "key1", "val": "abd"},
    {"op": "gte", "key": "key1", "val": "abc"},
    {"op": "or", "in":
        [
            {"op": "eq", "key": "key1", "val": "abcd"},
            # more OR predicates...
        ]
    },
    {"op": "and", "in":
        [
            {"op": "eq", "key": "key1", "val": "abcd"},
            # more AND predicates...
        ]
    }
]
Parameters:
  • predicates (list of dict) – List of dictionary predicates for finding resources.
  • max_per_page (int) – Pagination limit. Defaults to 1000.
Returns:

List of resources matching the predicates.

Return type:

list (crux.models.Resource)

Example

from crux import Crux

conn = Crux()
dataset_object = conn.get_dataset(id="dataset_id")
predicates=[
    {"op":"eq","key":"test_label1","val":"test_value1"}
]
resource_objects = dataset_object.find_resources_by_label(
    predicates=predicates
)
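To make the predicate semantics concrete, the following is a minimal local sketch of how such predicates could be evaluated against a resource's labels. It is not how the Crux backend evaluates them; matches and matches_all are hypothetical helpers, and two behaviours are assumptions: a missing label never satisfies a comparison, and the top-level list is combined with AND.

```python
import operator

# Comparison operators from the predicate examples, mapped to Python string
# comparisons, which are lexicographic by code point.
_COMPARISONS = {
    "eq": operator.eq,
    "ne": operator.ne,
    "lt": operator.lt,
    "lte": operator.le,
    "gt": operator.gt,
    "gte": operator.ge,
}


def matches(labels, predicate):
    """Return True if the labels dict satisfies one predicate (hypothetical)."""
    op = predicate["op"]
    if op == "or":
        return any(matches(labels, p) for p in predicate["in"])
    if op == "and":
        return all(matches(labels, p) for p in predicate["in"])
    value = labels.get(predicate["key"])
    if value is None:
        # Assumption: a resource without the label never matches a comparison.
        return False
    return _COMPARISONS[op](value, predicate["val"])


def matches_all(labels, predicates):
    """Assumption: the top-level predicate list is combined with AND."""
    return all(matches(labels, p) for p in predicates)
```

With labels {"key1": "abcd"}, the "lt" predicate against "abd" and the "gt" predicate against "abc" both hold, because the comparison is on strings rather than numbers.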
classmethod from_dict(a_dict)

Transforms Dataset Dictionary to Dataset object.

Parameters:a_dict (dict) – Dataset Dictionary.
Returns:Dataset Object.
Return type:crux.models.Dataset
get_file(path)

Gets the File resource object.

Parameters:path (str) – File resource path.
Returns:File Object.
Return type:crux.models.File
get_folder(path)

Gets the Folder resource object.

Parameters:path (str) – Folder resource path.
Returns:Folder Object.
Return type:crux.models.Folder
get_label(label_key)

Gets label value of Dataset.

Parameters:label_key (str) – Label Key for Dataset.
Returns:Label Object.
Return type:crux.models.Label
get_query(path)

Gets the Query resource object.

Parameters:path (str) – Query resource path.
Returns:Query Object.
Return type:crux.models.Query
get_stitch_job(job_id)

Gets the Stitch Job details.

Parameters:job_id (str) – Job ID of the Stitch Job.
Returns:StitchJob object.
Return type:crux.models.StitchJob
get_table(path)

Gets the Table resource object.

Parameters:path (str) – Table resource path.
Returns:Table Object.
Return type:crux.models.Table
id

Gets the Dataset ID.

Type:str
list_files(sort=None, folder='/', offset=0, limit=100)

Lists the files.

Parameters:
  • sort (str) – Sets whether to sort or not. Defaults to None.
  • folder (str) – Folder for which resource should be listed. Defaults to /.
  • offset (int) – Sets the offset. Defaults to 0.
  • limit (int) – Sets the limit. Defaults to 100.
Returns:

List of File objects.

Return type:

list (crux.models.File)

list_resources(folder='/', offset=0, limit=1, include_folders=False, sort=None)

Lists the resources in Dataset.

Parameters:
  • folder (str) – Folder for which resource should be listed. Defaults to /.
  • offset (int) – Sets the offset. Defaults to 0.
  • limit (int) – Sets the limit. Defaults to 1.
  • include_folders (bool) – Sets whether to include folders or not. Defaults to False.
  • sort (str) – Sets whether to sort or not. Defaults to None.
Returns:

List of resource objects.

Return type:

list (crux.models.Resource)

load_table_from_file(source_file, dest_table, append=False)

Loads table from file resource.

Parameters:
  • source_file (str or file) – Source file path as a string, or a File object.
  • dest_table (str or crux.models.Table) – Destination table path as a string, or a Table object.
  • append (bool) – Sets whether to append to existing table. Defaults to False.
Returns:

LoadJob Object.

Return type:

crux.models.LoadJob

Raises:

TypeError – If source_file or dest_table is not file or string object.

modified_at

Gets the Dataset modified_at.

Type:str
name

Gets the Dataset Name.

Type:str
owner_identity_id

Gets the Owner Identity ID.

Type:str
provenance

Compute or Get the provenance.

Type:str
stitch(source_resources, destination_resource, labels=None, tags=None, description=None)

Stitches multiple Avro resources into a single Avro resource.

Parameters:
  • source_resources (list of str or file) – List of resource paths which are to be stitched.
  • destination_resource (str) – Resource path to load the stitched output.
  • labels (dict) – Key/Value labels that should be applied to the stitched resource.
  • tags (list of str) – List of tags to be applied to the destination resource. Taken into consideration only if the resource needs to be created.
  • description (str) – Description to be applied to the destination resource. Taken into consideration only if the resource needs to be created.
Returns:

File object of destination resource.

Job ID for background running job.

Return type:

tuple (crux.models.File, str)

tags

Gets the tags.

Type:list of str
to_dict()

Transforms Dataset object to Dataset Dictionary.

Returns:Dataset Dictionary.
Return type:dict
update(name=None, description=None, tags=None)

Updates the metadata of dataset.

Parameters:
  • name (str) – Name of the dataset. Defaults to None.
  • description (str) – Description of the dataset. Defaults to None.
  • tags (list of str) – List of tags. Defaults to None.
Returns:

True, if dataset is updated.

Return type:

bool

Raises:
  • ValueError – It is raised if name, description or tags are unset.
  • TypeError – It is raised if tags is not of type list.
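The two error conditions can be illustrated with a small validation sketch. It is not the library's code: check_update_args is a hypothetical helper, the returned dictionary keys are placeholders, and reading "unset" as "all three arguments are None" is an assumption.

```python
def check_update_args(name=None, description=None, tags=None):
    """Validate update arguments and return only the fields that were given.

    Hypothetical helper. Assumption: ValueError is raised only when all three
    arguments are None.
    """
    if name is None and description is None and tags is None:
        raise ValueError("at least one of name, description or tags must be set")
    if tags is not None and not isinstance(tags, list):
        raise TypeError("tags must be a list of str")
    fields = {}
    if name is not None:
        fields["name"] = name
    if description is not None:
        fields["description"] = description
    if tags is not None:
        fields["tags"] = tags
    return fields
```

Under these assumptions, passing tags="a,b" raises TypeError, while passing only name succeeds.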
upload_file(src, dest, media_type=None, description=None, tags=None)

Uploads the File.

Parameters:
  • src (str or file) – Local OS path whose content is to be uploaded to file resource.
  • dest (str) – File resource path.
  • media_type (str) – Content type of the file. Defaults to None.
  • description (str) – Description of the file. Defaults to None.
  • tags (list of str) – Tags to be attached to the file resource.
Returns:

File Object.

Return type:

crux.models.File

upload_files(local_path, folder, media_type=None, description=None, tags=None)

Uploads the resources recursively.

Parameters:
  • local_path (str) – Local OS Path from where the file resources should be uploaded.
  • media_type (str) – Content Types of File resources to be uploaded. Defaults to None.
  • folder (str) – Crux Dataset Folder where file resources should be recursively uploaded.
  • description (str) – Description to be set on uploaded resources. Defaults to None.
  • tags (list of str) – Tags to be set on uploaded resources. Defaults to None.
Returns:

List of uploaded file objects.

Return type:

list (crux.models.File)

Raises:
  • ValueError – If folder or local_path is None.
  • OSError – If local_path is an invalid directory location.
upload_query(sql_file, path, description=None, tags=None)

Uploads the Query File.

Parameters:
  • path (str) – Query resource path.
  • sql_file (str) – Local OS SQL file to be uploaded as query resource.
  • description (str) – Description for the Query resource. Defaults to None.
  • tags (list of str) – Tags for the Query resource. Defaults to None.
Returns:

Query Object.

Return type:

crux.models.Query

website

Gets the Dataset Website.

Type:str

crux.models.file module

Module contains File model.

class crux.models.file.File(id=None, dataset_id=None, folder_id=None, folder=None, name=None, size=None, type=None, config=None, provenance=None, as_of=None, created_at=None, modified_at=None, storage_id=None, description=None, media_type=None, tags=None, labels=None, connection=None, raw_response=None)

Bases: crux.models.resource.Resource

File Model.

download(dest, chunk_size=10485760)

Downloads the file resource.

Parameters:
  • dest (str or file) – Local OS path at which file resource will be downloaded.
  • chunk_size (int) – Number of bytes to be read in memory.
Returns:

True if it is downloaded.

Return type:

bool

Raises:

TypeError – If dest is not a file-like object or a string.

iter_content(chunk_size=10485760)

Streams the file resource.

Parameters:chunk_size (int) – Chunk Size for the stream.
Yields:bytes – Bytes of file resource.
Raises:ValueError – If chunk_size is not multiple of 256 KiB.
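The 256 KiB constraint on chunk_size can be checked before calling iter_content. The sketch below shows one way to validate a size and to round a requested size up to the next valid multiple. check_chunk_size and round_up_chunk_size are hypothetical helpers, not part of the library; the default of 10485760 bytes is 40 multiples of 256 KiB.

```python
KIB = 1024
CHUNK_MULTIPLE = 256 * KIB  # 262144 bytes


def check_chunk_size(chunk_size):
    """Return chunk_size if it is a positive multiple of 256 KiB, else raise."""
    if chunk_size <= 0 or chunk_size % CHUNK_MULTIPLE != 0:
        raise ValueError(
            "chunk_size must be a positive multiple of {} bytes".format(CHUNK_MULTIPLE)
        )
    return chunk_size


def round_up_chunk_size(requested):
    """Round requested up to the next multiple of 256 KiB (at least one)."""
    # -(-a // b) is ceiling division for integers.
    multiples = max(1, -(-requested // CHUNK_MULTIPLE))
    return multiples * CHUNK_MULTIPLE
```

For example, a requested size of 1000000 bytes rounds up to 1048576 bytes, which is 4 multiples of 256 KiB.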
to_dict()

Transforms File object to File Dictionary.

Returns:File Dictionary.
Return type:dict
upload(src, media_type=None)

Uploads the content to empty file resource.

Parameters:
  • src (str or file) – Local OS path whose content is to be uploaded.
  • media_type (str) – Content type of the file. Defaults to None.
Returns:File model object.
Return type:crux.models.File
Raises:TypeError – If src type is invalid.

crux.models.folder module

Module contains Folder model.

class crux.models.folder.Folder(id=None, dataset_id=None, folder_id=None, folder=None, name=None, size=None, type=None, config=None, provenance=None, as_of=None, created_at=None, modified_at=None, storage_id=None, description=None, media_type=None, tags=None, labels=None, connection=None, raw_response=None)

Bases: crux.models.resource.Resource

Folder Model.

add_permission(identity_id='_subscribed_', permission='Read', recursive=False)

Adds permission to the Folder resource.

Parameters:
  • identity_id (str) – Identity Id to be set. Defaults to _subscribed_.
  • permission (str) – Permission to be set. Defaults to Read.
  • recursive (bool) – If recursive is set to True, it will recursively apply permission to all resources under the folder resource. Defaults to False.
Returns:

If recursive is set then it returns True.

If recursive is unset then it returns Permission object.

Return type:

bool or crux.models.Permission

delete_permission(identity_id='_subscribed_', permission='Read', recursive=False)

Deletes permission from Folder resource.

Parameters:
  • identity_id (str) – Identity Id for the deletion. Defaults to _subscribed_.
  • permission (str) – Permission for deletion. Defaults to Read.
  • recursive (bool) – If recursive is set to True, it will recursively delete permission from all resources under the folder resource. Defaults to False.
Returns:

True if it is able to delete it.

Return type:

bool

to_dict()

Transforms Folder object to Folder Dictionary.

Returns:Folder Dictionary.
Return type:dict

crux.models.identity module

Module contains Identity model.

class crux.models.identity.Identity(identity_id=None, parent_identity_id=None, description=None, company_name=None, first_name=None, last_name=None, role=None, phone=None, email=None, type=None, website=None, landing_page=None, connection=None, raw_response=None)

Bases: crux.models.model.CruxModel

Identity Model.

company_name

Gets the Company name.

Type:str
description

Gets the Description.

Type:str
email

Gets the Email.

Type:str
first_name

Gets the First name.

Type:str
classmethod from_dict(a_dict)

Transforms Identity Dictionary to Identity object.

Parameters:a_dict (dict) – Identity Dictionary.
Returns:Identity Object.
Return type:crux.models.Identity
identity_id

Gets the Identity Id.

Type:str
landing_page

Gets the Landing Page.

Type:str
last_name

Gets the Last name.

Type:str
parent_identity_id

Gets the Parent Identity Id.

Type:str
phone

Gets the phone.

Type:str
role

Gets the Role.

Type:str
to_dict()

Transforms Identity object to Identity Dictionary.

Returns:Identity Dictionary.
Return type:dict
type

Gets the Type.

Type:str
website

Gets the Website.

Type:str

crux.models.job module

Module contains AbstractJob, Job, Load, LoadJob, Statistics, Status and StitchJob models.

class crux.models.job.AbstractJob

Bases: crux.models.model.CruxModel

AbstractJob Model.

class crux.models.job.Job(job_id=None, status=None, statistics=None, connection=None)

Bases: crux.models.job.AbstractJob

Job Model.

classmethod from_dict(a_dict)

Transforms Job Dictionary to Job object.

Parameters:a_dict (dict) – Job Dictionary.
Returns:Job Object.
Return type:crux.models.Job
class crux.models.job.Load(input_files=None, input_file_bytes=None, output_rows=None, output_bytes=None, bad_records=None)

Bases: object

Job Load Model

classmethod from_dict(a_dict)

Transforms Job Load Dictionary to Job Load object.

Parameters:a_dict (dict) – Job Load Dictionary.
Returns:Job Load Object.
Return type:crux.models.job.Load
class crux.models.job.LoadJob(job_id=None, job_url=None)

Bases: crux.models.job.AbstractJob

LoadJob Model.

classmethod from_dict(a_dict)

Transforms LoadJob Dictionary to LoadJob object.

Parameters:a_dict (dict) – LoadJob Dictionary.
Returns:LoadJob Object.
Return type:crux.models.LoadJob
job_id

Gets the Job Id.

Type:str
job_url

Gets the Job URL.

Type:str
class crux.models.job.Statistics(creation_time=None, start_time=None, end_time=None, load=None)

Bases: object

Job Statistic Model.

classmethod from_dict(a_dict)

Transforms Job Statistics Dictionary to Job Statistics object.

Parameters:a_dict (dict) – Job Statistics Dictionary.
Returns:Job Statistics Object.
Return type:crux.models.job.Statistics
class crux.models.job.Status(state=None)

Bases: object

Job Status Model.

classmethod from_dict(a_dict)

Transforms Job Status Dictionary to Job Status object.

Parameters:a_dict (dict) – Job Status Dictionary.
Returns:Job Status Object.
Return type:crux.models.job.Status
class crux.models.job.StitchJob(job_id=None, status=None)

Bases: crux.models.job.AbstractJob

Stitch Job Model.

classmethod from_dict(a_dict)

Transforms Stitch Job Dictionary to Stitch Job object.

Parameters:a_dict (dict) – Stitch Job Dictionary.
Returns:Stitch Job Object.
Return type:crux.models.job.StitchJob

crux.models.label module

Module contains Label model.

class crux.models.label.Label(label_key=None, label_value=None)

Bases: crux.models.model.CruxModel

Label Model.

classmethod from_dict(a_dict)

Transforms Label Dictionary to Label object.

Parameters:a_dict (dict) – Label Dictionary.
Returns:Label Object.
Return type:crux.models.Label
to_dict()

Transforms Label object to Label Dictionary.

Returns:Label Dictionary.
Return type:dict
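The from_dict and to_dict pair recurs on most models in this package. A minimal sketch of the round trip follows, using a stand-in class rather than crux.models.Label itself. LabelSketch is hypothetical, and the dictionary keys are an assumption: they simply mirror the constructor argument names and may differ from what the API returns.

```python
class LabelSketch:
    """Stand-in for a label model; not the crux.models.Label implementation."""

    def __init__(self, label_key=None, label_value=None):
        self.label_key = label_key
        self.label_value = label_value

    @classmethod
    def from_dict(cls, a_dict):
        # Assumption: dictionary keys mirror the constructor argument names.
        return cls(
            label_key=a_dict.get("label_key"),
            label_value=a_dict.get("label_value"),
        )

    def to_dict(self):
        return {"label_key": self.label_key, "label_value": self.label_value}
```

Calling LabelSketch.from_dict(d).to_dict() returns a dictionary equal to d when d contains exactly those two keys.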

crux.models.model module

Module defines abstract CruxModel.

class crux.models.model.CruxModel

Bases: object

Abstract Crux Model.

to_dict()

Abstract to_dict method.

to_str()

Abstract to_str method.

crux.models.permission module

Module contains Permission model.

class crux.models.permission.Permission(target_id=None, identity_id=None, permission_name=None)

Bases: crux.models.model.CruxModel

Permission Model.

classmethod from_dict(a_dict)

Transforms Permission Dictionary to Permission object.

Parameters:a_dict (dict) – Permission Dictionary.
Returns:Permission Object.
Return type:crux.models.Permission
identity_id

Gets the Identity ID.

Type:str
permission_name

Gets the Permission Name.

Type:str
target_id

Gets the Target ID.

Type:str
to_dict()

Transforms Permission object to Permission Dictionary.

Returns:Permission Dictionary.
Return type:dict

crux.models.query module

Module contains Query model.

class crux.models.query.Query(id=None, dataset_id=None, folder_id=None, folder=None, name=None, size=None, type=None, config=None, provenance=None, as_of=None, created_at=None, modified_at=None, storage_id=None, description=None, media_type=None, tags=None, labels=None, connection=None, raw_response=None)

Bases: crux.models.resource.Resource

Query Model.

download(dest, format='csv', params=None)

Downloads the Query results.

Parameters:
  • dest (str) – Local OS path at which resource will be downloaded.
  • format (str) – Output format of the query. Defaults to csv.
  • params (dict) – Run parameters. Defaults to None.
Returns:

True if it is downloaded.

Return type:

bool

run(format='csv', params=None, chunk_size=10485760, decode_unicode=False)

Streams the Query results.

Parameters:
  • format (str) – Output format of the query. Defaults to csv.
  • params (dict) – Run parameters. Defaults to None.
  • chunk_size (int) – Chunk size for the stream. Defaults to 10485760.
  • decode_unicode (bool) – If decode_unicode is True, content will be decoded using the best available encoding based on the response. Defaults to False.
Yields:

bytes – Bytes of content.

Raises:

ValueError – If chunk size is not multiple of 256 KiB.
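Since run yields the content in chunks, a caller can write those chunks to disk without holding the whole result in memory. The sketch below works on any iterable of bytes; write_chunks is a hypothetical helper, not part of the library, and it assumes decode_unicode is left at False so that chunks are bytes.

```python
def write_chunks(chunks, dest_path):
    """Write an iterable of bytes chunks to dest_path and return the byte count."""
    total = 0
    with open(dest_path, "wb") as handle:
        for chunk in chunks:
            if not chunk:
                # Skip empty chunks, which carry no data.
                continue
            handle.write(chunk)
            total += len(chunk)
    return total
```

With a Query object, the iterable could be the generator returned by run(format="csv"); the same helper would also accept the chunks from File.iter_content.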

to_dict()

Transforms Query object to Query dictionary.

Returns:Query dictionary.
Return type:dict

crux.models.resource module

Module contains Resource model.

class crux.models.resource.MediaType

Bases: enum.Enum

MediaType Enumeration Model.

AVRO = 'avro/binary'
CSV = 'text/csv'
JSON = 'application/json'
NDJSON = 'application/x-ndjson'
PARQUET = 'application/parquet'
detect = <bound method MediaType.detect of <enum 'MediaType'>>
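The enumeration values above can be mirrored locally to show how a content-type string maps back to a member. MediaTypeSketch and media_type_from_value are hypothetical names for illustration; the behaviour of the library's own detect method is not documented here, so this sketch does not attempt to reproduce it.

```python
from enum import Enum


class MediaTypeSketch(Enum):
    """Local mirror of the documented MediaType values (illustrative only)."""

    AVRO = "avro/binary"
    CSV = "text/csv"
    JSON = "application/json"
    NDJSON = "application/x-ndjson"
    PARQUET = "application/parquet"


def media_type_from_value(value):
    """Return the member whose value equals the string, or None if unknown."""
    try:
        return MediaTypeSketch(value)
    except ValueError:
        return None
```

Looking up "text/csv" returns the CSV member, and an unlisted string such as "image/png" returns None in this sketch.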
class crux.models.resource.Resource(id=None, dataset_id=None, folder_id=None, folder=None, name=None, size=None, type=None, config=None, provenance=None, as_of=None, created_at=None, modified_at=None, storage_id=None, description=None, media_type=None, tags=None, labels=None, connection=None, raw_response=None)

Bases: crux.models.model.CruxModel

Resource Model.

add_label(label_key, label_value)

Adds label to Resource.

Parameters:
  • label_key (str) – Label Key for Resource.
  • label_value (str) – Label Value for Resource.
Returns:

True if label is added, False otherwise.

Return type:

bool

add_permission(identity_id='_subscribed_', permission='Read')

Adds permission to the resource.

Parameters:
  • identity_id (str) – Identity Id to be set. Defaults to _subscribed_.
  • permission (str) – Permission to be set. Defaults to Read.
Returns:

Permission Object.

Return type:

crux.models.Permission

as_of

Gets the as_of.

Type:str
config

Gets the config.

Type:str
created_at

Gets created_at.

Type:str
dataset_id

Gets the Dataset ID.

Type:str
delete()

Deletes Resource from Dataset.

Returns:True if it is deleted.
Return type:bool
delete_label(label_key)

Deletes label from Resource.

Parameters:label_key (str) – Label Key for Resource.
Returns:True if label is deleted, False otherwise.
Return type:bool
delete_permission(identity_id='_subscribed_', permission='Read')

Deletes permission from the resource.

Parameters:
  • identity_id (str) – Identity Id for the deletion. Defaults to _subscribed_.
  • permission (str) – Permission for the deletion. Defaults to Read.
Returns:

True if it is able to delete it.

Return type:

bool

description

Gets the Resource Description.

Type:str
folder

Compute or Get the folder name.

Type:str
folder_id

Gets the Folder ID.

Type:str
classmethod from_dict(a_dict)

Transforms Resource Dictionary to Resource object.

Parameters:a_dict (dict) – Resource Dictionary.
Returns:Resource Object.
Return type:crux.models.Resource
id

Gets the Resource ID.

Type:str
labels

Gets the Resource labels.

Type:dict
list_permissions()

Lists the permissions on the resource.

Returns:List of Permission Objects.
Return type:list (crux.models.Permission)
media_type

Gets the Resource Media Type.

Type:str
modified_at

Gets modified_at.

Type:str
name

Gets the Resource Name.

Type:str
path

Compute or Get the resource path.

Type:str
provenance

Gets the Provenance.

Type:str
refresh()

Refresh Resource model from API backend.

Returns:True if it is able to refresh the model, False otherwise.
Return type:bool
size

Gets the size.

Type:int
storage_id

Gets the Storage ID.

Type:str
tags

Gets the Resource Tags.

Type:list of str
to_dict()

Transforms Resource object to Resource Dictionary.

Returns:Resource Dictionary.
Return type:dict
type

Gets the Resource Type.

Type:str
update(name=None, description=None, tags=None)

Updates the metadata for Resource.

Parameters:
  • name (str) – Name of resource. Defaults to None.
  • description (str) – Description of the resource. Defaults to None.
  • tags (list of str) – List of tags. Defaults to None.
Returns:

True, if resource is updated.

Return type:

bool

Raises:
  • ValueError – It is raised if name, description or tags are unset.
  • TypeError – It is raised if tags are not of type List.

crux.models.table module

Module contains Table model.

class crux.models.table.Table(id=None, dataset_id=None, folder_id=None, folder=None, name=None, size=None, type=None, config=None, provenance=None, as_of=None, created_at=None, modified_at=None, storage_id=None, description=None, media_type=None, tags=None, labels=None, connection=None, raw_response=None)

Bases: crux.models.resource.Resource

Table model.

download(dest, media_type, chunk_size=10485760)

Downloads the table resource.

Parameters:
  • dest (str or file) – Local OS path at which the table resource will be downloaded.
  • media_type (str) – Content Type for download.
  • chunk_size (int) – Number of bytes to be read in memory.
Returns:

True if it is downloaded.

Return type:

bool

Raises:

TypeError – If dest is not a file-like object or a string.

to_dict()

Transforms Table object to Table Dictionary.

Returns:Table Dictionary.
Return type:dict

Module contents

Module containing models that represent objects returned by the API.

class crux.models.Identity(identity_id=None, parent_identity_id=None, description=None, company_name=None, first_name=None, last_name=None, role=None, phone=None, email=None, type=None, website=None, landing_page=None, connection=None, raw_response=None)

Bases: crux.models.model.CruxModel

Identity Model.

company_name

Gets the Company name.

Type:str
description

Gets the Description.

Type:str
email

Gets the Email.

Type:str
first_name

Gets the First name.

Type:str
classmethod from_dict(a_dict)

Transforms Identity Dictionary to Identity object.

Parameters:a_dict (dict) – Identity Dictionary.
Returns:Identity Object.
Return type:crux.models.Identity
identity_id

Gets the Identity Id.

Type:str
landing_page

Gets the Landing Page.

Type:str
last_name

Gets the Last name.

Type:str
parent_identity_id

Gets the Parent Identity Id.

Type:str
phone

Gets the phone.

Type:str
role

Gets the Role.

Type:str
to_dict()

Transforms Identity object to Identity Dictionary.

Returns:Identity Dictionary.
Return type:dict
type

Gets the Type.

Type:str
website

Gets the Website.

Type:str
class crux.models.Permission(target_id=None, identity_id=None, permission_name=None)

Bases: crux.models.model.CruxModel

Permission Model.

classmethod from_dict(a_dict)

Transforms Permission Dictionary to Permission object.

Parameters:a_dict (dict) – Permission Dictionary.
Returns:Permission Object.
Return type:crux.models.Permission
identity_id

Gets the Identity ID.

Type:str
permission_name

Gets the Permission Name.

Type:str
target_id

Gets the Target ID.

Type:str
to_dict()

Transforms Permission object to Permission Dictionary.

Returns:Permission Dictionary.
Return type:dict
class crux.models.LoadJob(job_id=None, job_url=None)

Bases: crux.models.job.AbstractJob

LoadJob Model.

classmethod from_dict(a_dict)

Transforms LoadJob Dictionary to LoadJob object.

Parameters:a_dict (dict) – LoadJob Dictionary.
Returns:LoadJob Object.
Return type:crux.models.LoadJob
job_id

Gets the Job Id.

Type:str
job_url

Gets the Job URL.

Type:str
class crux.models.StitchJob(job_id=None, status=None)

Bases: crux.models.job.AbstractJob

Stitch Job Model.

classmethod from_dict(a_dict)

Transforms Stitch Job Dictionary to Stitch Job object.

Parameters:a_dict (dict) – Stitch Job Dictionary.
Returns:Stitch Job Object.
Return type:crux.models.job.StitchJob
class crux.models.Job(job_id=None, status=None, statistics=None, connection=None)

Bases: crux.models.job.AbstractJob

Job Model.

classmethod from_dict(a_dict)

Transforms Job Dictionary to Job object.

Parameters:a_dict (dict) – Job Dictionary.
Returns:Job Object.
Return type:crux.models.Job
class crux.models.Resource(id=None, dataset_id=None, folder_id=None, folder=None, name=None, size=None, type=None, config=None, provenance=None, as_of=None, created_at=None, modified_at=None, storage_id=None, description=None, media_type=None, tags=None, labels=None, connection=None, raw_response=None)

Bases: crux.models.model.CruxModel

Resource Model.

add_label(label_key, label_value)

Adds label to Resource.

Parameters:
  • label_key (str) – Label Key for Resource.
  • label_value (str) – Label Value for Resource.
Returns:

True if label is added, False otherwise.

Return type:

bool

add_permission(identity_id='_subscribed_', permission='Read')

Adds permission to the resource.

Parameters:
  • identity_id (str) – Identity Id to be set. Defaults to _subscribed_.
  • permission (str) – Permission to be set. Defaults to Read.
Returns:

Permission Object.

Return type:

crux.models.Permission

as_of

Gets the as_of.

Type:str
config

Gets the config.

Type:str
created_at

Gets created_at.

Type:str
dataset_id

Gets the Dataset ID.

Type:str
delete()

Deletes Resource from Dataset.

Returns:True if it is deleted.
Return type:bool
delete_label(label_key)

Deletes label from Resource.

Parameters:label_key (str) – Label Key for Resource.
Returns:True if label is deleted, False otherwise.
Return type:bool
delete_permission(identity_id='_subscribed_', permission='Read')

Deletes permission from the resource.

Parameters:
  • identity_id (str) – Identity Id for the deletion. Defaults to _subscribed_.
  • permission (str) – Permission for the deletion. Defaults to Read.
Returns:

True if it is able to delete it.

Return type:

bool

description

Gets the Resource Description.

Type:str
folder

Compute or Get the folder name.

Type:str
folder_id

Gets the Folder ID.

Type:str
classmethod from_dict(a_dict)

Transforms Resource Dictionary to Resource object.

Parameters:a_dict (dict) – Resource Dictionary.
Returns:Resource Object.
Return type:crux.models.Resource
id

Gets the Resource ID.

Type:str
labels

Gets the Resource labels.

Type:dict
list_permissions()

Lists the permissions on the resource.

Returns:List of Permission Objects.
Return type:list (crux.models.Permission)
media_type

Gets the Resource Media Type.

Type:str
modified_at

Gets modified_at.

Type:str
name

Gets the Resource Name.

Type:str
path

Compute or Get the resource path.

Type:str
provenance

Gets the Provenance.

Type:str
refresh()

Refresh Resource model from API backend.

Returns:True if it is able to refresh the model, False otherwise.
Return type:bool
size

Gets the size.

Type:int
storage_id

Gets the Storage ID.

Type:str
tags

Gets the Resource Tags.

Type:list of str
to_dict()

Transforms Resource object to Resource Dictionary.

Returns:Resource Dictionary.
Return type:dict
type

Gets the Resource Type.

Type:str
update(name=None, description=None, tags=None)

Updates the metadata for Resource.

Parameters:
  • name (str) – Name of resource. Defaults to None.
  • description (str) – Description of the resource. Defaults to None.
  • tags (list of str) – List of tags. Defaults to None.
Returns:

True, if resource is updated.

Return type:

bool

Raises:
  • ValueError – It is raised if name, description or tags are unset.
  • TypeError – It is raised if tags is not of type list.
class crux.models.File(id=None, dataset_id=None, folder_id=None, folder=None, name=None, size=None, type=None, config=None, provenance=None, as_of=None, created_at=None, modified_at=None, storage_id=None, description=None, media_type=None, tags=None, labels=None, connection=None, raw_response=None)

Bases: crux.models.resource.Resource

File Model.

download(dest, chunk_size=10485760)

Downloads the file resource.

Parameters:
  • dest (str or file) – Local OS path at which file resource will be downloaded.
  • chunk_size (int) – Number of bytes to be read in memory.
Returns:

True if it is downloaded.

Return type:

bool

Raises:

TypeError – If dest is not a file-like object or a string.

iter_content(chunk_size=10485760)

Streams the file resource.

Parameters:chunk_size (int) – Chunk Size for the stream.
Yields:bytes – Bytes of file resource.
Raises:ValueError – If chunk_size is not a multiple of 256 KiB.
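Since iter_content raises ValueError when chunk_size is not a multiple of 256 KiB, it can help to round a requested size before streaming. The following is a minimal sketch, not part of the crux API: align_chunk_size and stream_to_file are hypothetical helpers, and file_resource is assumed to be any object exposing iter_content(chunk_size=...) as documented above.

```python
KIB_256 = 256 * 1024  # 262144 bytes, the granularity named in the Raises note


def align_chunk_size(requested):
    """Round a requested size up to the nearest multiple of 256 KiB.

    Hypothetical helper; crux itself only documents that it raises ValueError.
    """
    if requested <= 0:
        raise ValueError("requested chunk size must be positive")
    # Ceiling division, then scale back up to bytes.
    return -(-requested // KIB_256) * KIB_256


def stream_to_file(file_resource, out, requested_chunk_size=10485760):
    """Write every chunk yielded by iter_content to a binary file-like object.

    Returns the number of bytes written.
    """
    chunk_size = align_chunk_size(requested_chunk_size)
    written = 0
    for chunk in file_resource.iter_content(chunk_size=chunk_size):
        out.write(chunk)
        written += len(chunk)
    return written
```

The default of 10485760 bytes is already aligned (40 × 256 KiB), so rounding only matters for custom sizes.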
to_dict()

Transforms File object to File Dictionary.

Returns:File Dictionary.
Return type:dict
upload(src, media_type=None)

Uploads the content to empty file resource.

Parameters:
  • src (str or file) – Local OS path whose content is to be uploaded.
  • media_type (str) – Content type of the file. Defaults to None.
Returns:File model object.
Return type:crux.models.File
Raises:TypeError – If src type is invalid.
class crux.models.Folder(id=None, dataset_id=None, folder_id=None, folder=None, name=None, size=None, type=None, config=None, provenance=None, as_of=None, created_at=None, modified_at=None, storage_id=None, description=None, media_type=None, tags=None, labels=None, connection=None, raw_response=None)

Bases: crux.models.resource.Resource

Folder Model.

add_permission(identity_id='_subscribed_', permission='Read', recursive=False)

Adds permission to the Folder resource.

Parameters:
  • identity_id (str) – Identity Id to be set. Defaults to _subscribed_.
  • permission (str) – Permission to be set. Defaults to Read.
  • recursive (bool) – If recursive is set to True, it will recursively apply permission to all resources under the folder resource. Defaults to False.
Returns:

If recursive is set then it returns True.

If recursive is unset then it returns Permission object.

Return type:

bool or crux.models.Permission

delete_permission(identity_id='_subscribed_', permission='Read', recursive=False)

Deletes permission from Folder resource.

Parameters:
  • identity_id (str) – Identity Id for the deletion. Defaults to _subscribed_.
  • permission (str) – Permission for deletion. Defaults to Read.
  • recursive (bool) – If recursive is set to True, it will recursively delete permission from all resources under the folder resource. Defaults to False.
Returns:

True if it is able to delete it.

Return type:

bool

to_dict()

Transforms Folder object to Folder Dictionary.

Returns:Folder Dictionary.
Return type:dict
class crux.models.Table(id=None, dataset_id=None, folder_id=None, folder=None, name=None, size=None, type=None, config=None, provenance=None, as_of=None, created_at=None, modified_at=None, storage_id=None, description=None, media_type=None, tags=None, labels=None, connection=None, raw_response=None)

Bases: crux.models.resource.Resource

Table model.

download(dest, media_type, chunk_size=10485760)

Downloads the table resource.

Parameters:
  • dest (str or file) – Local OS path at which table resource will be downloaded.
  • media_type (str) – Content Type for download.
  • chunk_size (int) – Number of bytes to be read in memory.
Returns:

True if it is downloaded.

Return type:

bool

Raises:

TypeError – If dest is not a file-like object or a string.

to_dict()

Transforms Table object to Table Dictionary.

Returns:Table Dictionary.
Return type:dict
class crux.models.Dataset(id=None, owner_identity_id=None, contact_identity_id=None, name=None, description=None, website=None, created_at=None, modified_at=None, connection=None, raw_response=None, tags=None)

Bases: crux.models.model.CruxModel

Dataset Model.

add_label(label_key, label_value)

Adds label to Dataset.

Parameters:
  • label_key (str) – Label Key for Dataset.
  • label_value (str) – Label Value for Dataset.
Returns:

True if labels are added.

Return type:

bool

add_permission(identity_id='_subscribed_', permission='Read', resource_paths=None, resource_objects=None, resource_ids=None)

Adds permission to all or specific Dataset resources.

Parameters:
  • identity_id (str) – Identity Id to be set. Defaults to _subscribed_.
  • permission (str) – Permission to be set. Defaults to Read.
  • resource_paths (list of str) – List of resource paths on which the permission should be applied. If none of resource_paths, resource_objects or resource_ids parameter is set, then it will apply the permission to whole dataset.
  • resource_objects (list of crux.models.Resource) – List of resource objects on which the permission should be applied. Overrides resource_paths. If none of resource_paths, resource_objects or resource_ids parameter is set, then it will apply the permission to whole dataset.
  • resource_ids (list of str) – List of resource ids on which permission should be applied. Overrides resource_paths and resource_objects. If none of resource_paths, resource_objects or resource_ids parameter is set, then it will apply the permission to whole dataset.
Returns:

True if permission is applied.

Return type:

bool
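The precedence among the three resource arguments described above can be sketched as a small pure function. This is an illustration of the documented rules only, not the library's implementation: select_permission_target is a hypothetical name, and treating an empty list the same as an unset argument is an assumption.

```python
def select_permission_target(resource_paths=None, resource_objects=None,
                             resource_ids=None):
    """Return (kind, values) for the argument that would take effect.

    Mirrors the documented precedence: resource_ids overrides
    resource_objects, which overrides resource_paths. If none is set,
    the permission applies to the whole dataset.
    Assumption: an empty list counts as "not set".
    """
    if resource_ids:
        return ("resource_ids", list(resource_ids))
    if resource_objects:
        return ("resource_objects", list(resource_objects))
    if resource_paths:
        return ("resource_paths", list(resource_paths))
    return ("dataset", [])
```

Passing only one of the three arguments in a call avoids relying on the override order at all.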

contact_identity_id

Gets the Contact Identity ID.

Type:str
create_file(path, tags=None, description=None)

Creates File resource in Dataset.

Parameters:
  • path (str) – Path of the file resource.
  • tags (list of str) – Tags of the file resource. Defaults to None.
  • description (str) – Description of the file resource. Defaults to None.
Returns:

File Object.

Return type:

crux.models.File

create_folder(path, folder='/', tags=None, description=None)

Creates Folder resource in Dataset.

Parameters:
  • path (str) – Path of the Folder resource.
  • folder (str) – Parent folder of the Folder resource. Defaults to /.
  • tags (list of str) – Tags of the Folder resource. Defaults to None.
  • description (str) – Description of the Folder resource. Defaults to None.
Returns:

Folder Object.

Return type:

crux.models.Folder

create_query(path, config, tags=None, description=None)

Creates Query resource in Dataset.

Parameters:
  • path (str) – Query resource Path.
  • config (dict) – Query configuration.
  • tags (list of str) – Tags of the Query resource. Defaults to None.
  • description (str) – Description of the Query resource. Defaults to None.
Returns:

Query Object.

Return type:

crux.models.Query

create_table(path, config, tags=None, description=None)

Creates Table resource in Dataset.

Parameters:
  • path (str) – Table resource Path.
  • config (dict) – Table Schema Configuration.
  • tags (list of str) – Tags of the Table resource. Defaults to None.
  • description (str) – Description of the Table resource. Defaults to None.
Returns:

Table Object.

Return type:

crux.models.Table

created_at

Gets the Dataset created_at.

Type:str
delete()

Deletes the dataset.

Returns:True if dataset is deleted.
Return type:bool
delete_label(label_key)

Deletes label from Dataset.

Parameters:label_key (str) – Label Key for Dataset.
Returns:True if labels are deleted.
Return type:bool
delete_permission(identity_id='_subscribed_', permission='Read', resource_paths=None, resource_objects=None, resource_ids=None)

Deletes permission from all or specific Dataset resources.

Parameters:
  • identity_id (str) – Identity Id for the deletion. Defaults to _subscribed_.
  • permission (str) – Permission for the deletion. Defaults to Read.
  • resource_paths (list of str) – List of resource paths from which the permission should be deleted. If none of resource_paths, resource_objects or resource_ids parameter is set, then it will delete the permission from whole dataset.
  • resource_objects (list of crux.models.Resource) – List of resource objects from which the permission should be deleted. Overrides resource_paths. If none of resource_paths, resource_objects or resource_ids parameter is set, then it will delete the permission from whole dataset.
  • resource_ids (list of str) – List of resource ids from which the permission should be deleted. Overrides resource_paths and resource_objects. If none of resource_paths, resource_objects or resource_ids parameter is set, then it will delete the permission from whole dataset.
Returns:

True if it is able to delete the permission.

Return type:

bool

description

Gets the Dataset Description.

Type:str
download_files(folder, local_path)

Downloads the resources recursively.

Parameters:
  • folder (str) – Crux Dataset Folder from where the file resources should be recursively downloaded.
  • local_path (str) – Local OS Path where the file resources should be downloaded.
Returns:

List of locations of the downloaded files.

Return type:

list (str)

Raises:
  • ValueError – If folder or local_path is None.
  • OSError – If local_path is an invalid directory location.
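Given the two documented failure modes, a caller may want to validate arguments and prepare the target directory before calling download_files. The sketch below is one defensive approach, not library behaviour: download_folder is a hypothetical wrapper, and whether crux creates missing directories itself is not stated in this reference, so the wrapper creates them as a precaution.

```python
import os


def download_folder(dataset, folder, local_path):
    """Ensure local_path exists, then delegate to dataset.download_files.

    dataset is assumed to expose download_files(folder, local_path)
    as documented above.
    """
    if folder is None or local_path is None:
        raise ValueError("folder and local_path are required")
    # Create the directory tree up front; a no-op if it already exists.
    os.makedirs(local_path, exist_ok=True)
    return dataset.download_files(folder=folder, local_path=local_path)
```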
find_resources_by_label(predicates, max_per_page=1000)

Searches the resources in the Dataset for the given labels.

Each predicate can be either:

  • Lexicographical equal
  • Lexicographical not equal
  • Lexicographical less than
  • Lexicographical less than or equal to
  • Lexicographical greater than
  • Lexicographical greater than or equal to
  • A list of OR predicates
  • A list of AND predicates
predicates = [
    {"op": "eq", "key": "key1", "val": "abcd"},
    {"op": "ne", "key": "key1", "val": "zzzz"},
    {"op": "lt", "key": "key1", "val": "abd"},
    {"op": "gt", "key": "key1", "val": "abc"},
    {"op": "lte", "key": "key1", "val": "abd"},
    {"op": "gte", "key": "key1", "val": "abc"},
    {"op": "or", "in":
        [
            {"op": "eq", "key": "key1", "val": "abcd"},
            # more OR predicates...
        ]
    },
    {"op": "and", "in":
        [
            {"op": "eq", "key": "key1", "val": "abcd"},
            # more AND predicates...
        ]
    }
]
Parameters:
  • predicates (list of dict) – List of dictionary predicates for finding resources.
  • max_per_page (int) – Pagination limit. Defaults to 1000.
Returns:

List of resources matching the query parameters.

Return type:

list (crux.models.Resource)

Example

from crux import Crux

conn = Crux()
dataset_object = conn.get_dataset(id="dataset_id")
predicates=[
    {"op":"eq","key":"test_label1","val":"test_value1"}
]
resource_objects = dataset_object.find_resources_by_label(
    predicates=predicates
)
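Nested predicates are easy to mistype by hand, so small builder functions can help. The helpers below (compare, any_of, all_of) are hypothetical and not part of crux; they only produce dictionaries in the shape shown in the predicate listing above.

```python
COMPARISON_OPS = {"eq", "ne", "lt", "gt", "lte", "gte"}


def compare(op, key, val):
    """Build a single comparison predicate, e.g. compare("eq", "key1", "abcd")."""
    if op not in COMPARISON_OPS:
        raise ValueError("unsupported comparison op: %r" % (op,))
    return {"op": op, "key": key, "val": val}


def any_of(*predicates):
    """Combine predicates with OR."""
    return {"op": "or", "in": list(predicates)}


def all_of(*predicates):
    """Combine predicates with AND."""
    return {"op": "and", "in": list(predicates)}


# Labels whose "key1" value falls in the lexicographical range ["abc", "abd").
predicates = [
    all_of(compare("gte", "key1", "abc"), compare("lt", "key1", "abd")),
]
```

The resulting list can be passed as the predicates argument of find_resources_by_label.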
classmethod from_dict(a_dict)

Transforms Dataset Dictionary to Dataset object.

Parameters:a_dict (dict) – Dataset Dictionary.
Returns:Dataset Object.
Return type:crux.models.Dataset
get_file(path)

Gets the File resource object.

Parameters:path (str) – File resource path.
Returns:File Object.
Return type:crux.models.File
get_folder(path)

Gets the Folder resource object.

Parameters:path (str) – Folder resource path.
Returns:Folder Object.
Return type:crux.models.Folder
get_label(label_key)

Gets label value of Dataset.

Parameters:label_key (str) – Label Key for Dataset.
Returns:Label Object.
Return type:crux.models.Label
get_query(path)

Gets the Query resource object.

Parameters:path (str) – Query resource path.
Returns:Query Object.
Return type:crux.models.Query
get_stitch_job(job_id)

Gets the Stitch Job details.

Parameters:job_id (str) – Job ID of the Stitch Job.
Returns:StitchJob object.
Return type:crux.models.StitchJob
get_table(path)

Gets the Table resource object.

Parameters:path (str) – Table resource path.
Returns:Table Object.
Return type:crux.models.Table
id

Gets the Dataset ID.

Type:str
list_files(sort=None, folder='/', offset=0, limit=100)

Lists the files.

Parameters:
  • sort (str) – Sets whether to sort or not. Defaults to None.
  • folder (str) – Folder for which resource should be listed. Defaults to /.
  • offset (int) – Sets the offset. Defaults to 0.
  • limit (int) – Sets the limit. Defaults to 100.
Returns:

List of File objects.

Return type:

list (crux.models.File)

list_resources(folder='/', offset=0, limit=1, include_folders=False, sort=None)

Lists the resources in Dataset.

Parameters:
  • folder (str) – Folder for which resource should be listed. Defaults to /.
  • offset (int) – Sets the offset. Defaults to 0.
  • limit (int) – Sets the limit. Defaults to 1.
  • include_folders (bool) – Sets whether to include folders or not. Defaults to False.
  • sort (str) – Sets whether to sort or not. Defaults to None.
Returns:

List of resource objects.

Return type:

list (crux.models.Resource)
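Because limit defaults to 1, listing a whole folder usually means paging with offset and limit. The generator below is a minimal sketch under two assumptions that this reference does not confirm: offset counts resources rather than pages, and a page shorter than the requested limit marks the end of the listing. iter_resources is a hypothetical helper name.

```python
def iter_resources(dataset, folder="/", page_size=100, include_folders=False):
    """Yield every resource in a folder by paging through list_resources.

    dataset is assumed to expose list_resources(folder, offset, limit,
    include_folders) as documented above.
    """
    if page_size < 1:
        raise ValueError("page_size must be at least 1")
    offset = 0
    while True:
        page = dataset.list_resources(
            folder=folder,
            offset=offset,
            limit=page_size,
            include_folders=include_folders,
        )
        for resource in page:
            yield resource
        # Assumption: a short (or empty) page means nothing is left.
        if len(page) < page_size:
            break
        offset += page_size
```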

load_table_from_file(source_file, dest_table, append=False)

Loads table from file resource.

Parameters:
  • source_file (str or file) – Source File path as a string, or a File object.
  • dest_table (str or crux.models.Table) – Destination Table path as a string, or a Table object.
  • append (bool) – Sets whether to append to existing table. Defaults to False.
Returns:

LoadJob Object.

Return type:

crux.models.LoadJob

Raises:

TypeError – If source_file or dest_table is not file or string object.

modified_at

Gets the Dataset modified_at.

Type:str
name

Gets the Dataset Name.

Type:str
owner_identity_id

Gets the Owner Identity ID.

Type:str
provenance

Computes or gets the provenance.

Type:str
stitch(source_resources, destination_resource, labels=None, tags=None, description=None)

Stitches multiple Avro resources into a single Avro resource.

Parameters:
  • source_resources (list of str or file) – List of resource paths which are to be stitched.
  • destination_resource (str) – Resource Path to load the stitched output.
  • labels (dict) – Key/Value labels that should be applied to stitched resource.
  • tags (list of str) – List of tags to be applied on destination resource. Taken into consideration if resource is required to be created.
  • description (str) – Description to be applied to the destination resource. Taken into consideration if resource is required to be created.
Returns:

File object of destination resource.

Job ID for background running job.

Return type:

tuple (crux.models.File, str)
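Since stitch returns a tuple rather than a single object, the two parts are typically unpacked and the job ID handed to get_stitch_job. The wrapper below is a hedged sketch: stitch_and_get_job is a hypothetical name, and it uses only the parameters documented above. The attributes of the returned StitchJob are not described in this section, so the sketch does not inspect them.

```python
def stitch_and_get_job(dataset, source_paths, destination_path, labels=None):
    """Start a stitch and return (destination_file, stitch_job).

    dataset is assumed to expose stitch(...) and get_stitch_job(job_id)
    as documented in this reference.
    """
    # stitch returns a (File, job_id) tuple; the job runs in the background.
    destination_file, job_id = dataset.stitch(
        source_resources=source_paths,
        destination_resource=destination_path,
        labels=labels,
    )
    job = dataset.get_stitch_job(job_id)
    return destination_file, job
```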

tags

Gets the tags.

Type:list of str
to_dict()

Transforms Dataset object to Dataset Dictionary.

Returns:Dataset Dictionary.
Return type:dict
update(name=None, description=None, tags=None)

Updates the metadata of dataset.

Parameters:
  • name (str) – Name of the dataset. Defaults to None.
  • description (str) – Description of the dataset. Defaults to None.
  • tags (list of str) – List of tags. Defaults to None.
Returns:

True, if dataset is updated.

Return type:

bool

Raises:
  • ValueError – It is raised if name, description or tags are unset.
  • TypeError – It is raised if tags is not of type list.
upload_file(src, dest, media_type=None, description=None, tags=None)

Uploads the File.

Parameters:
  • src (str or file) – Local OS path whose content is to be uploaded to file resource.
  • dest (str) – File resource path.
  • media_type (str) – Content type of the file. Defaults to None.
  • description (str) – Description of the file. Defaults to None.
  • tags (list of str) – Tags to be attached to the file resource.
Returns:

File Object.

Return type:

crux.models.File

upload_files(local_path, folder, media_type=None, description=None, tags=None)

Uploads the resources recursively.

Parameters:
  • local_path (str) – Local OS Path from where the file resources should be uploaded.
  • media_type (str) – Content Types of File resources to be uploaded. Defaults to None.
  • folder (str) – Crux Dataset Folder where file resources should be recursively uploaded.
  • description (str) – Description to be set on uploaded resources. Defaults to None.
  • tags (list of str) – Tags to be set on uploaded resources. Defaults to None.
Returns:

List of uploaded file objects.

Return type:

list (crux.models.File)

Raises:
  • ValueError – If folder or local_path is None.
  • OSError – If local_path is an invalid directory location.
upload_query(sql_file, path, description=None, tags=None)

Uploads the Query File.

Parameters:
  • path (str) – Query resource path.
  • sql_file (str) – Local OS SQL file to be uploaded as query resource.
  • description (str) – Description for the Query resource. Defaults to None.
  • tags (list of str) – Tags for the Query resource. Defaults to None.
Returns:

Query Object.

Return type:

crux.models.Query

website

Gets the Dataset Website.

Type:str
class crux.models.Query(id=None, dataset_id=None, folder_id=None, folder=None, name=None, size=None, type=None, config=None, provenance=None, as_of=None, created_at=None, modified_at=None, storage_id=None, description=None, media_type=None, tags=None, labels=None, connection=None, raw_response=None)

Bases: crux.models.resource.Resource

Query Model.

download(dest, format='csv', params=None)

Downloads the Query output.

Parameters:
  • dest (str) – Local OS path at which resource will be downloaded.
  • format (str) – Output format of the query. Defaults to csv.
  • params (dict) – Run parameters. Defaults to None.
Returns:

True if it is downloaded.

Return type:

bool

run(format='csv', params=None, chunk_size=10485760, decode_unicode=False)

Streams the Query output.

Parameters:
  • format (str) – Output format of the query. Defaults to csv.
  • params (dict) – Run parameters. Defaults to None.
  • chunk_size (int) – Chunk Size for the stream.
  • decode_unicode (bool) – If decode_unicode is True, content will be decoded using the best available encoding based on the response. Defaults to False.
Yields:

bytes – Bytes of content.

Raises:

ValueError – If chunk size is not a multiple of 256 KiB.
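Because run yields raw chunks, CSV output has to be reassembled before it can be parsed row by row. The following sketch assumes the query output is UTF-8 encoded CSV and small enough to hold in memory; query_rows is a hypothetical helper, and query is any object exposing run(format, params, chunk_size) as documented above. decode_unicode is left at its default so the chunks arrive as bytes.

```python
import csv
import io


def query_rows(query, params=None, chunk_size=10485760):
    """Run a query as CSV and return its rows as lists of strings.

    Assumption: the output is UTF-8 and fits in memory.
    """
    # Chunk boundaries can fall mid-row, so join everything before parsing.
    raw = b"".join(query.run(format="csv", params=params, chunk_size=chunk_size))
    text = raw.decode("utf-8")
    # newline="" lets the csv module handle line endings itself.
    return list(csv.reader(io.StringIO(text, newline="")))
```

For large results, writing the chunks to disk with download is likely the better fit.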

to_dict()

Transforms Query object to Query dictionary.

Returns:Query dictionary.
Return type:dict
class crux.models.Label(label_key=None, label_value=None)

Bases: crux.models.model.CruxModel

Label Model.

classmethod from_dict(a_dict)

Transforms Label Dictionary to Label object.

Parameters:a_dict (dict) – Label Dictionary.
Returns:Label Object.
Return type:crux.models.Label
to_dict()

Transforms Label object to Label Dictionary.

Returns:Label Dictionary.
Return type:dict