crux.models package

Submodules

crux.models.dataset module

Module contains Dataset model.

class crux.models.dataset.Dataset(id=None, owner_identity_id=None, contact_identity_id=None, name=None, description=None, website=None, created_at=None, modified_at=None, connection=None, raw_response=None, tags=None)

Bases: crux.models.model.CruxModel

Dataset Model.

add_label(label_key, label_value)

Adds label to Dataset.

Parameters
  • label_key (str) – Label Key for Dataset.

  • label_value (str) – Label Value for Dataset.

Returns

True if labels are added.

Return type

bool

add_permission(identity_id, permission)

Adds permission to the Dataset.

Parameters
  • identity_id (str) – Identity Id to be set.

  • permission (str) – Permission to be set.

Returns

Permission Object.

Return type

crux.models.Permission

add_permission_to_resources(identity_id, permission, resource_paths=None, resource_objects=None, resource_ids=None)

Adds permission to all or specific Dataset resources.

Parameters
  • identity_id (str) – Identity Id to be set.

  • permission (str) – Permission to be set.

  • resource_paths (list of str) – List of resource paths to which the permission should be applied. If none of resource_paths, resource_objects, or resource_ids is set, the permission is applied to the whole dataset.

  • resource_objects (list of crux.models.Resource) – List of resource objects to which the permission should be applied. Overrides resource_paths. If none of resource_paths, resource_objects, or resource_ids is set, the permission is applied to the whole dataset.

  • resource_ids (list of str) – List of resource ids to which the permission should be applied. Overrides resource_paths and resource_objects. If none of resource_paths, resource_objects, or resource_ids is set, the permission is applied to the whole dataset.

Returns

True if permission is applied.

Return type

bool
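The override order described above (resource_ids over resource_objects over resource_paths, and the whole dataset when none is given) can be sketched as follows. This only illustrates the documented precedence; it is not the library's implementation, pick_permission_target is a hypothetical helper name, and treating an empty list the same as None is an assumption.

```python
def pick_permission_target(resource_paths=None, resource_objects=None, resource_ids=None):
    """Return (selector_name, values) for the selector that takes effect.

    Hypothetical helper: it mirrors the documented precedence only and
    does not call the Crux API.
    """
    if resource_ids:
        # resource_ids overrides both resource_objects and resource_paths.
        return "resource_ids", list(resource_ids)
    if resource_objects:
        # resource_objects overrides resource_paths.
        return "resource_objects", list(resource_objects)
    if resource_paths:
        return "resource_paths", list(resource_paths)
    # Nothing was supplied, so the permission targets the whole dataset.
    return "dataset", []
```

Passing only one of the three selectors in real calls avoids relying on the override order at all.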

contact_identity_id

Gets the Contact Identity ID.

Type

str

create_file(path, tags=None, description=None)

Creates File resource in Dataset.

Parameters
  • path (str) – Path of the file resource.

  • tags (list of str) – Tags of the file resource. Defaults to None.

  • description (str) – Description of the file resource. Defaults to None.

Returns

File Object.

Return type

crux.models.File

create_folder(path, folder='/', tags=None, description=None)

Creates Folder resource in Dataset.

Parameters
  • path (str) – Path of the Folder resource.

  • folder (str) – Parent folder of the Folder resource. Defaults to /.

  • tags (list of str) – Tags of the Folder resource. Defaults to None.

  • description (str) – Description of the Folder resource. Defaults to None.

Returns

Folder Object.

Return type

crux.models.Folder

create_query(path, config, tags=None, description=None)

Creates Query resource in Dataset.

Parameters
  • path (str) – Query resource Path.

  • config (dict) – Query configuration.

  • tags (list of str) – Tags of the Query resource. Defaults to None.

  • description (str) – Description of the Query resource. Defaults to None.

Returns

Query Object.

Return type

crux.models.Query

create_table(path, config, tags=None, description=None)

Creates Table resource in Dataset.

Parameters
  • path (str) – Table resource Path.

  • config (dict) – Table Schema Configuration.

  • tags (list of str) – Tags of the Table resource. Defaults to None.

  • description (str) – Description of the Table resource. Defaults to None.

Returns

Table Object

Return type

crux.models.Table

created_at

Gets the Dataset created_at.

Type

str

delete()

Deletes the dataset.

Returns

True if dataset is deleted.

Return type

bool

delete_label(label_key)

Deletes label from Dataset.

Parameters

label_key (str) – Label Key for Dataset.

Returns

True if labels are deleted.

Return type

bool

delete_permission(identity_id, permission)

Deletes permission from the Dataset.

Parameters
  • identity_id (str) – Identity Id for the deletion.

  • permission (str) – Permission for the deletion.

Returns

True if it is able to delete it.

Return type

bool

delete_permission_from_resources(identity_id, permission, resource_paths=None, resource_objects=None, resource_ids=None)

Deletes permission from all or specific Dataset resources.

Parameters
  • identity_id (str) – Identity Id for the deletion.

  • permission (str) – Permission for the deletion.

  • resource_paths (list of str) – List of resource paths from which the permission should be deleted. If none of resource_paths, resource_objects, or resource_ids is set, the permission is deleted from the whole dataset.

  • resource_objects (list of crux.models.Resource) – List of resource objects from which the permission should be deleted. Overrides resource_paths. If none of resource_paths, resource_objects, or resource_ids is set, the permission is deleted from the whole dataset.

  • resource_ids (list of str) – List of resource ids from which the permission should be deleted. Overrides resource_paths and resource_objects. If none of resource_paths, resource_objects, or resource_ids is set, the permission is deleted from the whole dataset.

Returns

True if it is able to delete the permission.

Return type

bool

description

Gets the Dataset Description.

Type

str

download_files(folder, local_path)

Downloads the resources recursively.

Parameters
  • folder (str) – Crux Dataset Folder from where the file resources should be recursively downloaded.

  • local_path (str) – Local OS Path where the file resources should be downloaded.

Returns

List of locations of the downloaded files.

Return type

list (str)

Raises
  • ValueError – If folder or local_path is None.

  • OSError – If local_path is an invalid directory location.

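Because download_files raises OSError for an invalid local directory, one option is to create the directory before calling it. The wrapper below is a minimal sketch under that assumption; download_folder is a hypothetical name, and dataset stands for a crux.models.Dataset (or any object with a compatible download_files method).

```python
import os


def download_folder(dataset, folder, local_path):
    """Ensure local_path exists, then download folder recursively.

    Hypothetical wrapper around Dataset.download_files; returns whatever
    download_files returns (documented as a list of file locations).
    """
    if folder is None or local_path is None:
        # Same condition the reference documents for ValueError.
        raise ValueError("folder and local_path are required")
    # Create the target directory (and parents) if it does not exist yet.
    os.makedirs(local_path, exist_ok=True)
    return dataset.download_files(folder=folder, local_path=local_path)
```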
find_resources_by_label(predicates, max_per_page=1000)

Searches the resources in the Dataset for the given labels.

Each predicate can be either:

  • Lexicographical equal

  • Lexicographical not equal

  • Lexicographical less than

  • Lexicographical less than or equal to

  • Lexicographical greater than

  • Lexicographical greater than or equal to

  • A list of OR predicates

  • A list of AND predicates

predicates = [
    {"op": "eq", "key": "key1", "val": "abcd"},
    {"op": "ne", "key": "key1", "val": "zzzz"},
    {"op": "lt", "key": "key1", "val": "abd"},
    {"op": "gt", "key": "key1", "val": "abc"},
    {"op": "lte", "key": "key1", "val": "abd"},
    {"op": "gte", "key": "key1", "val": "abc"},
    {"op": "or", "in":
        [
            {"op": "eq", "key": "key1", "val": "abcd"},
            # more OR predicates...
        ]
    },
    {"op": "and", "in":
        [
            {"op": "eq", "key": "key1", "val": "abcd"},
            # more AND predicates...
        ]
    }
]

Parameters
  • predicates (list of dict) – List of dictionary predicates for finding resources.

  • max_per_page (int) – Pagination limit. Defaults to 1000.

Returns

List of resources matching the predicates.

Return type

list (crux.models.Resource)
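To make the predicate format concrete, the sketch below evaluates a single predicate dictionary against a plain dictionary of labels on the client side. It is an illustration only: matching happens on the Crux backend, matches is a hypothetical helper, Python's string ordering stands in for the backend's lexicographical comparison, and treating a missing label as a non-match is an assumption.

```python
import operator

# Comparison operators used by the predicate format. Python compares
# strings lexicographically (by code point), which stands in for the
# backend's ordering here.
_COMPARE = {
    "eq": operator.eq,
    "ne": operator.ne,
    "lt": operator.lt,
    "lte": operator.le,
    "gt": operator.gt,
    "gte": operator.ge,
}


def matches(predicate, labels):
    """Evaluate one predicate dict against a {key: value} labels dict."""
    op = predicate["op"]
    if op == "or":
        # An OR predicate holds if any nested predicate holds.
        return any(matches(nested, labels) for nested in predicate["in"])
    if op == "and":
        # An AND predicate holds only if every nested predicate holds.
        return all(matches(nested, labels) for nested in predicate["in"])
    key = predicate["key"]
    if key not in labels:
        return False  # assumption: a missing label never matches
    return _COMPARE[op](labels[key], predicate["val"])
```

With labels {"key1": "abcd"}, the "lt" predicate against "abd" and the "gt" predicate against "abc" from the listing above both hold, since "abc" < "abcd" < "abd" in lexicographical order.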

Example

from crux import Crux

conn = Crux()
dataset_object = conn.get_dataset(id="dataset_id")
predicates=[
    {"op":"eq","key":"test_label1","val":"test_value1"}
]
resource_objects = dataset_object.find_resources_by_label(
    predicates=predicates
)

classmethod from_dict(a_dict)

Transforms Dataset Dictionary to Dataset object.

Parameters

a_dict (dict) – Dataset Dictionary.

Returns

Dataset Object.

Return type

crux.models.Dataset

get_file(path)

Gets the File resource object.

Parameters

path (str) – File resource path.

Returns

File Object.

Return type

crux.models.File

get_folder(path)

Gets the Folder resource object.

Parameters

path (str) – Folder resource path.

Returns

Folder Object.

Return type

crux.models.Folder

get_label(label_key)

Gets label value of Dataset.

Parameters

label_key (str) – Label Key for Dataset.

Returns

Label Object.

Return type

crux.models.Label

get_query(path)

Gets the Query resource object.

Parameters

path (str) – Query resource path.

Returns

Query Object.

Return type

crux.models.Query

get_stitch_job(job_id)

Stitch Job Details.

Parameters

job_id (str) – Job ID of the Stitch Job.

Returns

StitchJob object.

Return type

crux.models.StitchJob

get_table(path)

Gets the Table resource object.

Parameters

path (str) – Table resource path.

Returns

Table Object

Return type

crux.models.Table

id

Gets the Dataset ID.

Type

str

list_files(sort=None, folder='/', offset=0, limit=100)

Lists the files.

Parameters
  • sort (str) – Sets whether to sort or not. Defaults to None.

  • folder (str) – Folder for which resource should be listed. Defaults to /.

  • offset (int) – Sets the offset. Defaults to 0.

  • limit (int) – Sets the limit. Defaults to 100.

Returns

List of File objects.

Return type

list (crux.models.File)

list_permissions()

Lists the permission on the Dataset.

Returns

List of Permission Objects.

Return type

list (crux.models.Permission)

list_resources(folder='/', offset=0, limit=1, include_folders=False, sort=None)

Lists the resources in Dataset.

Parameters
  • folder (str) – Folder for which resource should be listed. Defaults to /.

  • offset (int) – Sets the offset. Defaults to 0.

  • limit (int) – Sets the limit. Defaults to 1.

  • include_folders (bool) – Sets whether to include folders or not. Defaults to False.

  • sort (str) – Sets whether to sort or not. Defaults to None.

Returns

List of resource objects.

Return type

list (crux.models.Resource)
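Since limit defaults to 1, listing a whole folder usually means paging with offset and limit. The generator below is a minimal sketch of that loop; iter_resources is a hypothetical helper, and it assumes that offset counts resources (not pages) and that a page shorter than the requested size marks the end of the listing.

```python
def iter_resources(dataset, folder="/", page_size=100, include_folders=False):
    """Yield every resource under folder, fetching one page at a time.

    Hypothetical helper; dataset is expected to behave like
    crux.models.Dataset.
    """
    if page_size < 1:
        raise ValueError("page_size must be at least 1")
    offset = 0
    while True:
        page = dataset.list_resources(
            folder=folder,
            offset=offset,
            limit=page_size,
            include_folders=include_folders,
        )
        for resource in page:
            yield resource
        # Assumption: a short (or empty) page means there is nothing left.
        if len(page) < page_size:
            break
        offset += page_size
```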

load_table_from_file(source_file, dest_table, append=False)

Loads table from file resource.

Parameters
  • source_file (str or file) – Source File Path in string or File Object.

  • dest_table (str or crux.models.Table) – Destination table path as a string, or a Table object.

  • append (bool) – Sets whether to append to existing table. Defaults to False.

Returns

LoadJob Object.

Return type

crux.models.LoadJob

Raises

TypeError – If source_file or dest_table is not file or string object.

modified_at

Gets the Dataset modified_at.

Type

str

name

Gets the Dataset Name.

Type

str

owner_identity_id

Gets the Owner Identity ID.

Type

str

provenance

Compute or Get the provenance.

Type

str

stitch(source_resources, destination_resource, labels=None, tags=None, description=None)

Stitches multiple Avro resources into a single Avro resource.

Parameters
  • source_resources (list of str or file) – List of resource paths which are to be stitched.

  • destination_resource (str) – Resource path to load the stitched output into.

  • labels (dict) – Key/value labels to be applied to the stitched resource.

  • tags (list of str) – List of tags to be applied to the destination resource. Used only if the destination resource has to be created.

  • description (str) – Description to be applied to the destination resource. Used only if the destination resource has to be created.

Returns

File object of destination resource.

Job ID for background running job.

Return type

tuple (crux.models.File, str)
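Because stitch returns a job ID for a job that runs in the background, callers typically poll get_stitch_job until the job has finished. The loop below is a sketch under stated assumptions: wait_for_stitch_job is a hypothetical helper, and the completion check is passed in by the caller because the possible status values are not listed in this reference.

```python
import time


def wait_for_stitch_job(dataset, job_id, is_finished, poll_seconds=5.0, max_polls=60):
    """Poll get_stitch_job until is_finished(job) is true or polls run out.

    is_finished is a caller-supplied function that inspects the returned
    StitchJob; this sketch makes no assumption about status values.
    """
    for attempt in range(max_polls):
        job = dataset.get_stitch_job(job_id)
        if is_finished(job):
            return job
        if attempt < max_polls - 1:
            # Wait before asking again, except after the final attempt.
            time.sleep(poll_seconds)
    raise TimeoutError("stitch job %s did not finish in time" % job_id)
```

The job_id argument is the second element of the tuple returned by stitch, so a call would unpack the result first, for example file_object, job_id = dataset.stitch(...).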

tags

Gets the tags.

Type

list of str

to_dict()

Transforms Dataset object to Dataset Dictionary.

Returns

Dataset Dictionary.

Return type

dict

update(name=None, description=None, tags=None)

Updates the metadata of dataset.

Parameters
  • name (str) – Name of the dataset. Defaults to None.

  • description (str) – Description of the dataset. Defaults to None.

  • tags (list of str) – List of tags. Defaults to None.

Returns

True, if dataset is updated.

Return type

bool

Raises
  • ValueError – It is raised if name, description or tags are unset.

  • TypeError – It is raised if tags is not of type list.

upload_file(src, dest, media_type=None, description=None, tags=None)

Uploads the File.

Parameters
  • src (str or file) – Local OS path whose content is to be uploaded to file resource.

  • dest (str) – File resource path.

  • media_type (str) – Content type of the file. Defaults to None.

  • description (str) – Description of the file. Defaults to None.

  • tags (list of str) – Tags to be attached to the file resource.

Returns

File Object.

Return type

crux.models.File
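When media_type is passed explicitly, the value is a content type string such as those listed for crux.models.resource.MediaType. The helper below is a minimal sketch that picks one of those strings from a file extension. The extension mapping and the name guess_media_type are assumptions for illustration; this is not a description of how MediaType.detect behaves.

```python
import os

# Media type strings as listed for crux.models.resource.MediaType.
# The extension keys are an assumption made for this sketch.
MEDIA_TYPES_BY_EXTENSION = {
    ".avro": "avro/binary",
    ".csv": "text/csv",
    ".json": "application/json",
    ".ndjson": "application/x-ndjson",
    ".parquet": "application/parquet",
}


def guess_media_type(path):
    """Return a media type string for path, or None if the extension is unknown."""
    extension = os.path.splitext(path)[1].lower()
    return MEDIA_TYPES_BY_EXTENSION.get(extension)
```

A None result can be passed straight through as media_type=None, which is the documented default.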

upload_files(local_path, folder, media_type=None, description=None, tags=None)

Uploads the resources recursively.

Parameters
  • local_path (str) – Local OS Path from where the file resources should be uploaded.

  • media_type (str) – Content Types of File resources to be uploaded. Defaults to None.

  • folder (str) – Crux Dataset Folder where file resources should be recursively uploaded.

  • description (str) – Description to be set on uploaded resources. Defaults to None.

  • tags (list of str) – Tags to be set on uploaded resources. Defaults to None.

Returns

List of uploaded file objects.

Return type

list (crux.models.File)

Raises
  • ValueError – If folder or local_path is None.

  • OSError – If local_path is an invalid directory location.

upload_query(sql_file, path, description=None, tags=None)

Uploads the Query File.

Parameters
  • path (str) – Query resource path.

  • sql_file (str) – Local OS SQL file to be uploaded as query resource.

  • description (str) – Description for the Query resource. Defaults to None.

  • tags (list of str) – Tags for the Query resource. Defaults to None.

Returns

Query Object.

Return type

crux.models.Query

website

Gets the Dataset Website.

Type

str

crux.models.file module

Module contains File model.

class crux.models.file.File(id=None, dataset_id=None, folder_id=None, folder=None, name=None, size=None, type=None, config=None, provenance=None, as_of=None, created_at=None, modified_at=None, storage_id=None, description=None, media_type=None, tags=None, labels=None, connection=None, raw_response=None)

Bases: crux.models.resource.Resource

File Model.

download(dest, chunk_size=10485760)

Downloads the file resource.

Parameters
  • dest (str or file) – Local OS path at which file resource will be downloaded.

  • chunk_size (int) – Number of bytes to be read in memory.

Returns

True if it is downloaded.

Return type

bool

Raises

TypeError – If dest is not a file-like object or a string.

iter_content(chunk_size=10485760)

Streams the file resource.

Parameters

chunk_size (int) – Chunk Size for the stream.

Yields

bytes – Bytes of file resource.

Raises

ValueError – If chunk_size is not a multiple of 256 KiB.
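The chunk size constraint is easy to satisfy by rounding up to the next multiple of 256 KiB (the default, 10485760 bytes, is 40 such units). The sketch below shows that rounding together with one way to consume the stream, hashing the content chunk by chunk. Both function names are hypothetical, and file_resource stands for a crux.models.File or any object with a compatible iter_content method.

```python
import hashlib

CHUNK_UNIT = 256 * 1024  # 256 KiB, the granularity required for chunk_size


def round_up_chunk_size(requested_bytes):
    """Round requested_bytes up to the nearest positive multiple of 256 KiB."""
    if requested_bytes < 1:
        raise ValueError("requested_bytes must be positive")
    units = -(-requested_bytes // CHUNK_UNIT)  # ceiling division
    return units * CHUNK_UNIT


def sha256_of_resource(file_resource, chunk_size=10485760):
    """Stream a file resource and return the SHA-256 hex digest of its bytes."""
    digest = hashlib.sha256()
    for chunk in file_resource.iter_content(chunk_size=chunk_size):
        # Each chunk is bytes, so memory use stays bounded by chunk_size.
        digest.update(chunk)
    return digest.hexdigest()
```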

to_dict()

Transforms File object to File Dictionary.

Returns

File Dictionary.

Return type

dict

upload(src, media_type=None)

Uploads the content to empty file resource.

Parameters
  • src (str or file) – Local OS path whose content is to be uploaded.

  • media_type (str) – Content type of the file. Defaults to None.

Returns

File Object.

Return type

crux.models.File

Raises

TypeError – If src type is invalid.

crux.models.folder module

Module contains Folder model.

class crux.models.folder.Folder(id=None, dataset_id=None, folder_id=None, folder=None, name=None, size=None, type=None, config=None, provenance=None, as_of=None, created_at=None, modified_at=None, storage_id=None, description=None, media_type=None, tags=None, labels=None, connection=None, raw_response=None)

Bases: crux.models.resource.Resource

Folder Model.

add_permission(identity_id, permission, recursive=False)

Adds permission to the Folder resource.

Parameters
  • identity_id (str) – Identity Id to be set.

  • permission (str) – Permission to be set.

  • recursive (bool) – If recursive is set to True, it will recursively apply permission to all resources under the folder resource. Defaults to False.

Returns

If recursive is set then it returns True.

If recursive is unset then it returns Permission object.

Return type

bool or crux.models.Permission

delete_permission(identity_id, permission, recursive=False)

Deletes permission from Folder resource.

Parameters
  • identity_id (str) – Identity Id for the deletion.

  • permission (str) – Permission for deletion.

  • recursive (bool) – If recursive is set to True, it will recursively delete permission from all resources under the folder resource. Defaults to False.

Returns

True if it is able to delete it.

Return type

bool

to_dict()

Transforms Folder object to Folder Dictionary.

Returns

Folder Dictionary.

Return type

dict

crux.models.identity module

Module contains Identity model.

class crux.models.identity.Identity(identity_id=None, parent_identity_id=None, description=None, company_name=None, first_name=None, last_name=None, role=None, phone=None, email=None, type=None, website=None, landing_page=None, connection=None, raw_response=None)

Bases: crux.models.model.CruxModel

Identity Model.

company_name

Gets the Company name.

Type

str

description

Gets the Description.

Type

str

email

Gets the Email.

Type

str

first_name

Gets the First name.

Type

str

classmethod from_dict(a_dict)

Transforms Identity Dictionary to Identity object.

Parameters

a_dict (dict) – Identity Dictionary.

Returns

Identity Object.

Return type

crux.models.Identity

identity_id

Gets the Identity Id.

Type

str

landing_page

Gets the Landing Page.

Type

str

last_name

Gets the Last name.

Type

str

parent_identity_id

Gets the Parent Identity Id.

Type

str

phone

Gets the phone.

Type

str

role

Gets the Role.

Type

str

to_dict()

Transforms Identity object to Identity Dictionary.

Returns

Identity Dictionary.

Return type

dict

type

Gets the Type.

Type

str

website

Gets the Website.

Type

str

crux.models.job module

Module contains AbstractJob, Job, LoadJob Model.

class crux.models.job.AbstractJob

Bases: crux.models.model.CruxModel

AbstractJob Model.

class crux.models.job.Job(job_id=None, status=None, statistics=None, connection=None)

Bases: crux.models.job.AbstractJob

Job Model.

classmethod from_dict(a_dict)

Transforms Job Dictionary to Job object.

Parameters

a_dict (dict) – Job Dictionary.

Returns

Job Object.

Return type

crux.models.Job

class crux.models.job.Load(input_files=None, input_file_bytes=None, output_rows=None, output_bytes=None, bad_records=None)

Bases: object

Job Load Model.

classmethod from_dict(a_dict)

Transforms Job Load Dictionary to Job Load object.

Parameters

a_dict (dict) – Job Load Dictionary.

Returns

Job Load Object.

Return type

crux.models.job.Load

class crux.models.job.LoadJob(job_id=None, job_url=None)

Bases: crux.models.job.AbstractJob

LoadJob Model.

classmethod from_dict(a_dict)

Transforms LoadJob Dictionary to LoadJob object.

Parameters

a_dict (dict) – LoadJob Dictionary.

Returns

LoadJob Object.

Return type

crux.models.LoadJob

job_id

Gets the Job Id.

Type

str

job_url

Gets the Job URL.

Type

str

class crux.models.job.Statistics(creation_time=None, start_time=None, end_time=None, load=None)

Bases: object

Job Statistic Model.

classmethod from_dict(a_dict)

Transforms Job Statistics Dictionary to Job Statistics object.

Parameters

a_dict (dict) – Job Statistics Dictionary.

Returns

Job Statistics Object.

Return type

crux.models.job.Statistics

class crux.models.job.Status(state=None)

Bases: object

Job Status Model.

classmethod from_dict(a_dict)

Transforms Job Status Dictionary to Job Status object.

Parameters

a_dict (dict) – Job Status Dictionary.

Returns

Job Status Object.

Return type

crux.models.job.Status

class crux.models.job.StitchJob(job_id=None, status=None)

Bases: crux.models.job.AbstractJob

Stitch Job Model.

classmethod from_dict(a_dict)

Transforms Stitch Job Dictionary to Stitch Job object.

Parameters

a_dict (dict) – Stitch Job Dictionary.

Returns

Stitch Job Object.

Return type

crux.models.job.StitchJob

crux.models.label module

Module contains Label model.

class crux.models.label.Label(label_key=None, label_value=None)

Bases: crux.models.model.CruxModel

Label Model.

classmethod from_dict(a_dict)

Transforms Label Dictionary to Label object.

Parameters

a_dict (dict) – Label Dictionary.

Returns

Label Object.

Return type

crux.models.Label

to_dict()

Transforms Label object to Label Dictionary.

Returns

Label Dictionary.

Return type

dict

crux.models.model module

Module defines abstract CruxModel.

class crux.models.model.CruxModel

Bases: object

Abstract Crux Model.

to_dict()

Abstract to_dict method.

to_str()

Abstract to_str method.
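The models in this package share a pattern: an abstract base with to_dict and to_str, and concrete classes that add a from_dict classmethod. The sketch below illustrates that pattern with stand-in classes. ModelSketch and LabelSketch are hypothetical names, the dictionary keys are assumptions, and giving to_str a JSON-based default is a choice made for this example rather than a description of CruxModel.

```python
import json
from abc import ABC, abstractmethod


class ModelSketch(ABC):
    """Stand-in for an abstract model base (not the real CruxModel)."""

    @abstractmethod
    def to_dict(self):
        """Return the model as a plain dictionary."""

    def to_str(self):
        # Assumption for this sketch: the string form is the JSON of to_dict().
        return json.dumps(self.to_dict(), sort_keys=True)


class LabelSketch(ModelSketch):
    """Stand-in for a concrete model with a from_dict/to_dict round trip."""

    def __init__(self, label_key=None, label_value=None):
        self.label_key = label_key
        self.label_value = label_value

    @classmethod
    def from_dict(cls, a_dict):
        # The key names here are assumed, not taken from the Crux API.
        return cls(
            label_key=a_dict.get("label_key"),
            label_value=a_dict.get("label_value"),
        )

    def to_dict(self):
        return {"label_key": self.label_key, "label_value": self.label_value}
```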

crux.models.permission module

Module contains Permission model.

class crux.models.permission.Permission(target_id=None, identity_id=None, permission_name=None)

Bases: crux.models.model.CruxModel

Permission Model.

classmethod from_dict(a_dict)

Transforms Permission Dictionary to Permission object.

Parameters

a_dict (dict) – Permission Dictionary.

Returns

Permission Object.

Return type

crux.models.Permission

identity_id

Gets the Identity ID.

Type

str

permission_name

Gets the Permission Name.

Type

str

target_id

Gets the Target ID.

Type

str

to_dict()

Transforms Permission object to Permission Dictionary.

Returns

Permission Dictionary.

Return type

dict

crux.models.query module

Module contains Query model.

class crux.models.query.Query(id=None, dataset_id=None, folder_id=None, folder=None, name=None, size=None, type=None, config=None, provenance=None, as_of=None, created_at=None, modified_at=None, storage_id=None, description=None, media_type=None, tags=None, labels=None, connection=None, raw_response=None)

Bases: crux.models.resource.Resource

Query Model.

download(dest, format='csv', params=None)

Runs the Query and downloads its output.

Parameters
  • dest (str) – Local OS path at which resource will be downloaded.

  • format (str) – Output format of the query. Defaults to csv.

  • params (dict) – Run parameters. Defaults to None.

Returns

True if it is downloaded.

Return type

bool

run(format='csv', params=None, chunk_size=10485760, decode_unicode=False)

Runs the Query and streams its output.

Parameters
  • format (str) – Output format of the query. Defaults to csv.

  • params (dict) – Run parameters. Defaults to None.

  • chunk_size (int) – Chunk size for the stream.

  • decode_unicode (bool) – If decode_unicode is True, content will be decoded using the best available encoding based on the response. Defaults to False.

Yields

bytes – Bytes of content.

Raises

ValueError – If chunk_size is not a multiple of 256 KiB.
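Since run yields bytes when decode_unicode is left at False, small results can be collected into a single string. The helper below is a minimal sketch: run_query_to_text is a hypothetical name, UTF-8 output is an assumption, and buffering everything in memory is only suitable for small result sets.

```python
def run_query_to_text(query, params=None, encoding="utf-8"):
    """Run a Query resource and return its CSV output as one string.

    query is expected to behave like crux.models.Query.
    """
    chunks = query.run(format="csv", params=params)
    # Join the streamed byte chunks, then decode once at the end so that
    # multi-byte characters split across chunks are handled correctly.
    return b"".join(chunks).decode(encoding)
```

For large results, Query.download writes straight to a local path and avoids holding the output in memory.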

to_dict()

Transforms Query object to Query dictionary.

Returns

Query dictionary.

Return type

dict

crux.models.resource module

Module contains Resource model.

class crux.models.resource.MediaType

Bases: enum.Enum

MediaType Enumeration Model.

AVRO = 'avro/binary'
CSV = 'text/csv'
JSON = 'application/json'
NDJSON = 'application/x-ndjson'
PARQUET = 'application/parquet'
detect = <bound method MediaType.detect of <enum 'MediaType'>>

class crux.models.resource.Resource(id=None, dataset_id=None, folder_id=None, folder=None, name=None, size=None, type=None, config=None, provenance=None, as_of=None, created_at=None, modified_at=None, storage_id=None, description=None, media_type=None, tags=None, labels=None, connection=None, raw_response=None)

Bases: crux.models.model.CruxModel

Resource Model.

add_label(label_key, label_value)

Adds label to Resource.

Parameters
  • label_key (str) – Label Key for Resource.

  • label_value (str) – Label Value for Resource.

Returns

True if label is added, False otherwise.

Return type

bool

add_labels(labels_dict)

Adds multiple labels to Resource.

Parameters

labels_dict (dict) – Labels (key/value pairs) to add to the Resource.

Returns

True if the labels were added, False otherwise.

Return type

bool

add_permission(identity_id, permission)

Adds permission to the resource.

Parameters
  • identity_id (str) – Identity Id to be set.

  • permission (str) – Permission to be set.

Returns

Permission Object.

Return type

crux.models.Permission

as_of

Gets the as_of.

Type

str

config

Gets the config.

Type

str

created_at

Gets created_at.

Type

str

dataset_id

Gets the Dataset ID.

Type

str

delete()

Deletes Resource from Dataset.

Returns

True if it is deleted.

Return type

bool

delete_label(label_key)

Deletes label from Resource.

Parameters

label_key (str) – Label Key for Resource.

Returns

True if label is deleted, False otherwise.

Return type

bool

delete_permission(identity_id, permission)

Deletes permission from the resource.

Parameters
  • identity_id (str) – Identity Id for the deletion.

  • permission (str) – Permission for the deletion.

Returns

True if it is able to delete it.

Return type

bool

description

Gets the Resource Description.

Type

str

folder

Compute or Get the folder name.

Type

str

folder_id

Gets the Folder ID.

Type

str

classmethod from_dict(a_dict)

Transforms Resource Dictionary to Resource object.

Parameters

a_dict (dict) – Resource Dictionary.

Returns

Resource Object.

Return type

crux.models.Resource

id

Gets the Resource ID.

Type

str

labels

Gets the Resource labels.

Type

dict

list_permissions()

Lists the permission on the resource.

Returns

List of Permission Objects.

Return type

list (crux.models.Permission)

media_type

Gets the Resource Media Type.

Type

str

modified_at

Gets modified_at.

Type

str

name

Gets the Resource Name.

Type

str

path

Compute or Get the resource path.

Type

str

provenance

Gets the Provenance.

Type

str

refresh()

Refresh Resource model from API backend.

Returns

True if it is able to refresh the model, False otherwise.

Return type

bool

size

Gets the size.

Type

int

storage_id

Gets the Storage ID.

Type

str

tags

Gets the Resource Tags.

Type

list of str

to_dict()

Transforms Resource object to Resource Dictionary.

Returns

Resource Dictionary.

Return type

dict

type

Gets the Resource Type.

Type

str

update(name=None, description=None, tags=None, provenance=None)

Updates the metadata for Resource.

Parameters
  • name (str) – Name of resource. Defaults to None.

  • description (str) – Description of the resource. Defaults to None.

  • tags (list of str) – List of tags. Defaults to None.

  • provenance (str) – Provenance for a resource. Defaults to None.

Returns

True, if resource is updated.

Return type

bool

Raises
  • ValueError – It is raised if name, description or tags are unset.

  • TypeError – It is raised if tags are not of type List.

crux.models.table module

Module contains Table model.

class crux.models.table.Table(id=None, dataset_id=None, folder_id=None, folder=None, name=None, size=None, type=None, config=None, provenance=None, as_of=None, created_at=None, modified_at=None, storage_id=None, description=None, media_type=None, tags=None, labels=None, connection=None, raw_response=None)

Bases: crux.models.resource.Resource

Table model.

download(dest, media_type, chunk_size=10485760)

Downloads the table resource.

Parameters
  • dest (str or file) – Local OS path at which table resource will be downloaded.

  • media_type (str) – Content Type for download.

  • chunk_size (int) – Number of bytes to be read in memory.

Returns

True if it is downloaded.

Return type

bool

Raises

TypeError – If dest is not a file-like object or a string.

to_dict()

Transforms Table object to Table Dictionary.

Returns

Table Dictionary.

Return type

dict

Module contents

Module containing models that represent objects returned by the API.

class crux.models.Identity(identity_id=None, parent_identity_id=None, description=None, company_name=None, first_name=None, last_name=None, role=None, phone=None, email=None, type=None, website=None, landing_page=None, connection=None, raw_response=None)

Bases: crux.models.model.CruxModel

Identity Model.

company_name

Gets the Company name.

Type

str

description

Gets the Description.

Type

str

email

Gets the Email.

Type

str

first_name

Gets the First name.

Type

str

classmethod from_dict(a_dict)

Transforms Identity Dictionary to Identity object.

Parameters

a_dict (dict) – Identity Dictionary.

Returns

Identity Object.

Return type

crux.models.Identity

identity_id

Gets the Identity Id.

Type

str

landing_page

Gets the Landing Page.

Type

str

last_name

Gets the Last name.

Type

str

parent_identity_id

Gets the Parent Identity Id.

Type

str

phone

Gets the phone.

Type

str

role

Gets the Role.

Type

str

to_dict()

Transforms Identity object to Identity Dictionary.

Returns

Identity Dictionary.

Return type

dict

type

Gets the Type.

Type

str

website

Gets the Website.

Type

str

class crux.models.Permission(target_id=None, identity_id=None, permission_name=None)

Bases: crux.models.model.CruxModel

Permission Model.

classmethod from_dict(a_dict)

Transforms Permission Dictionary to Permission object.

Parameters

a_dict (dict) – Permission Dictionary.

Returns

Permission Object.

Return type

crux.models.Permission

identity_id

Gets the Identity ID.

Type

str

permission_name

Gets the Permission Name.

Type

str

target_id

Gets the Target ID.

Type

str

to_dict()

Transforms Permission object to Permission Dictionary.

Returns

Permission Dictionary.

Return type

dict

class crux.models.LoadJob(job_id=None, job_url=None)

Bases: crux.models.job.AbstractJob

LoadJob Model.

classmethod from_dict(a_dict)

Transforms LoadJob Dictionary to LoadJob object.

Parameters

a_dict (dict) – LoadJob Dictionary.

Returns

LoadJob Object.

Return type

crux.models.LoadJob

job_id

Gets the Job Id.

Type

str

job_url

Gets the Job URL.

Type

str

class crux.models.StitchJob(job_id=None, status=None)

Bases: crux.models.job.AbstractJob

Stitch Job Model.

classmethod from_dict(a_dict)

Transforms Stitch Job Dictionary to Stitch Job object.

Parameters

a_dict (dict) – Stitch Job Dictionary.

Returns

Stitch Job Object.

Return type

crux.models.job.StitchJob

class crux.models.Job(job_id=None, status=None, statistics=None, connection=None)

Bases: crux.models.job.AbstractJob

Job Model.

classmethod from_dict(a_dict)

Transforms Job Dictionary to Job object.

Parameters

a_dict (dict) – Job Dictionary.

Returns

Job Object.

Return type

crux.models.Job

class crux.models.Resource(id=None, dataset_id=None, folder_id=None, folder=None, name=None, size=None, type=None, config=None, provenance=None, as_of=None, created_at=None, modified_at=None, storage_id=None, description=None, media_type=None, tags=None, labels=None, connection=None, raw_response=None)

Bases: crux.models.model.CruxModel

Resource Model.

add_label(label_key, label_value)

Adds label to Resource.

Parameters
  • label_key (str) – Label Key for Resource.

  • label_value (str) – Label Value for Resource.

Returns

True if label is added, False otherwise.

Return type

bool

add_labels(labels_dict)

Adds multiple labels to Resource.

Parameters

labels_dict (dict) – Labels (key/value pairs) to add to the Resource.

Returns

True if the labels were added, False otherwise.

Return type

bool

add_permission(identity_id, permission)

Adds permission to the resource.

Parameters
  • identity_id (str) – Identity Id to be set.

  • permission (str) – Permission to be set.

Returns

Permission Object.

Return type

crux.models.Permission

as_of

Gets the as_of.

Type

str

config

Gets the config.

Type

str

created_at

Gets created_at.

Type

str

dataset_id

Gets the Dataset ID.

Type

str

delete()

Deletes Resource from Dataset.

Returns

True if it is deleted.

Return type

bool

delete_label(label_key)

Deletes label from Resource.

Parameters

label_key (str) – Label Key for Resource.

Returns

True if label is deleted, False otherwise.

Return type

bool

delete_permission(identity_id, permission)

Deletes permission from the resource.

Parameters
  • identity_id (str) – Identity Id for the deletion.

  • permission (str) – Permission for the deletion.

Returns

True if it is able to delete it.

Return type

bool

description

Gets the Resource Description.

Type

str

folder

Compute or Get the folder name.

Type

str

folder_id

Gets the Folder ID.

Type

str

classmethod from_dict(a_dict)

Transforms Resource Dictionary to Resource object.

Parameters

a_dict (dict) – Resource Dictionary.

Returns

Resource Object.

Return type

crux.models.Resource

id

Gets the Resource ID.

Type

str

labels

Gets the Resource labels.

Type

dict

list_permissions()

Lists the permissions on the resource.

Returns

List of Permission Objects.

Return type

list (crux.models.Permission)

media_type

Gets the Resource Media Type.

Type

str

modified_at

Gets modified_at.

Type

str

name

Gets the Resource Name.

Type

str

path

Computes or gets the resource path.

Type

str

provenance

Gets the Provenance.

Type

str

refresh()

Refresh Resource model from API backend.

Returns

True if it is able to refresh the model, False otherwise.

Return type

bool

size

Gets the size.

Type

int

storage_id

Gets the Storage ID.

Type

str

tags

Gets the Resource Tags.

Type

list of str

to_dict()

Transforms Resource object to Resource Dictionary.

Returns

Resource Dictionary.

Return type

dict

type

Gets the Resource Type.

Type

str

update(name=None, description=None, tags=None, provenance=None)

Updates the metadata for Resource.

Parameters
  • name (str) – Name of resource. Defaults to None.

  • description (str) – Description of the resource. Defaults to None.

  • tags (list of str) – List of tags. Defaults to None.

  • provenance (str) – Provenance for a resource. Defaults to None.

Returns

True, if resource is updated.

Return type

bool

Raises
  • ValueError – It is raised if name, description or tags are unset.

  • TypeError – It is raised if tags are not of type List.

class crux.models.File(id=None, dataset_id=None, folder_id=None, folder=None, name=None, size=None, type=None, config=None, provenance=None, as_of=None, created_at=None, modified_at=None, storage_id=None, description=None, media_type=None, tags=None, labels=None, connection=None, raw_response=None)

Bases: crux.models.resource.Resource

File Model.

download(dest, chunk_size=10485760)

Downloads the file resource.

Parameters
  • dest (str or file) – Local OS path at which file resource will be downloaded.

  • chunk_size (int) – Number of bytes to be read in memory.

Returns

True if it is downloaded.

Return type

bool

Raises

TypeError – If dest is neither a file-like object nor a string.

iter_content(chunk_size=10485760)

Streams the file resource.

Parameters

chunk_size (int) – Chunk Size for the stream.

Yields

bytes – Bytes of file resource.

Raises

ValueError – If chunk_size is not a multiple of 256 KiB.
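
A minimal sketch of streaming a file resource to disk with iter_content. The helper names align_chunk_size and stream_to_path are hypothetical and not part of crux. The 256 KiB alignment mirrors the ValueError condition documented above. file_resource is assumed to be a crux.models.File obtained elsewhere, for example from Dataset.get_file.

```python
KIB_256 = 256 * 1024  # iter_content documents chunk sizes as multiples of 256 KiB


def align_chunk_size(requested):
    """Round a requested size up to the next multiple of 256 KiB (hypothetical helper)."""
    if requested <= 0:
        raise ValueError("requested must be positive")
    blocks = -(-requested // KIB_256)  # ceiling division
    return blocks * KIB_256


def stream_to_path(file_resource, local_path, requested_chunk_size=10485760):
    """Write the streamed bytes to local_path and return the number of bytes written."""
    chunk_size = align_chunk_size(requested_chunk_size)
    written = 0
    with open(local_path, "wb") as handle:
        for chunk in file_resource.iter_content(chunk_size=chunk_size):
            handle.write(chunk)
            written += len(chunk)
    return written
```

Rounding up rather than rejecting the value is a design choice of this sketch; a caller that prefers strict behaviour can pass an already aligned size and let the library raise.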

to_dict()

Transforms File object to File Dictionary.

Returns

File Dictionary.

Return type

dict

upload(src, media_type=None)

Uploads the content to empty file resource.

Parameters
  • src (str or file) – Local OS path whose content is to be uploaded.

  • media_type (str) – Content type of the file. Defaults to None.

Returns

File Object.

Return type

crux.models.File

Raises

TypeError – If src type is invalid.

class crux.models.Folder(id=None, dataset_id=None, folder_id=None, folder=None, name=None, size=None, type=None, config=None, provenance=None, as_of=None, created_at=None, modified_at=None, storage_id=None, description=None, media_type=None, tags=None, labels=None, connection=None, raw_response=None)

Bases: crux.models.resource.Resource

Folder Model.

add_permission(identity_id, permission, recursive=False)

Adds permission to the Folder resource.

Parameters
  • identity_id (str) – Identity Id to be set.

  • permission (str) – Permission to be set.

  • recursive (bool) – If recursive is set to True, it will recursively apply permission to all resources under the folder resource. Defaults to False.

Returns

If recursive is set then it returns True.

If recursive is unset then it returns Permission object.

Return type

bool or crux.models.Permission
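
Because the return type depends on recursive, calling code may want one consistent shape. The following is a minimal sketch, not part of crux: grant_folder_permission is a hypothetical wrapper, and folder is assumed to be a crux.models.Folder.

```python
def grant_folder_permission(folder, identity_id, permission, recursive=False):
    """Call Folder.add_permission and normalize its two documented return shapes.

    Returns (applied, permission_object). permission_object is None for
    recursive calls, which are documented to return only True.
    """
    result = folder.add_permission(identity_id, permission, recursive=recursive)
    if recursive:
        return bool(result), None
    # Non-recursive calls are documented to return a Permission object.
    return result is not None, result
```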

delete_permission(identity_id, permission, recursive=False)

Deletes permission from Folder resource.

Parameters
  • identity_id (str) – Identity Id for the deletion.

  • permission (str) – Permission for deletion.

  • recursive (bool) – If recursive is set to True, it will recursively delete permission from all resources under the folder resource. Defaults to False.

Returns

True if it is able to delete it.

Return type

bool

to_dict()

Transforms Folder object to Folder Dictionary.

Returns

Folder Dictionary.

Return type

dict

class crux.models.Table(id=None, dataset_id=None, folder_id=None, folder=None, name=None, size=None, type=None, config=None, provenance=None, as_of=None, created_at=None, modified_at=None, storage_id=None, description=None, media_type=None, tags=None, labels=None, connection=None, raw_response=None)

Bases: crux.models.resource.Resource

Table model.

download(dest, media_type, chunk_size=10485760)

Downloads the table resource.

Parameters
  • dest (str or file) – Local OS path at which table resource will be downloaded.

  • media_type (str) – Content Type for download.

  • chunk_size (int) – Number of bytes to be read in memory.

Returns

True if it is downloaded.

Return type

bool

Raises

TypeError – If dest is neither a file-like object nor a string.

to_dict()

Transforms Table object to Table Dictionary.

Returns

Table Dictionary.

Return type

dict

class crux.models.Dataset(id=None, owner_identity_id=None, contact_identity_id=None, name=None, description=None, website=None, created_at=None, modified_at=None, connection=None, raw_response=None, tags=None)

Bases: crux.models.model.CruxModel

Dataset Model.

add_label(label_key, label_value)

Adds label to Dataset.

Parameters
  • label_key (str) – Label Key for Dataset.

  • label_value (str) – Label Value for Dataset.

Returns

True if the label is added.

Return type

bool

add_permission(identity_id, permission)

Adds permission to the Dataset.

Parameters
  • identity_id (str) – Identity Id to be set.

  • permission (str) – Permission to be set.

Returns

Permission Object.

Return type

crux.models.Permission

add_permission_to_resources(identity_id, permission, resource_paths=None, resource_objects=None, resource_ids=None)

Adds permission to all or specific Dataset resources.

Parameters
  • identity_id (str) – Identity Id to be set.

  • permission (str) – Permission to be set.

  • resource_paths (list of str) – List of resource paths on which the permission should be applied. If none of resource_paths, resource_objects or resource_ids parameter is set, then it will apply the permission to whole dataset.

  • resource_objects (list of crux.models.Resource) – List of resource objects on which the permission should be applied. Overrides resource_paths. If none of resource_paths, resource_objects or resource_ids parameter is set, then it will apply the permission to whole dataset.

  • resource_ids (list of str) – List of resource ids on which permission should be applied. Overrides resource_paths and resource_objects. If none of resource_paths, resource_objects or resource_ids parameter is set, then it will apply the permission to whole dataset.

Returns

True if permission is applied.

Return type

bool
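
The precedence among the three selectors described above can be restated as a small function. This is only an illustration of the documented rules, not the library's code: effective_target is a hypothetical name, and treating an empty list the same as an unset parameter is an assumption.

```python
def effective_target(resource_paths=None, resource_objects=None, resource_ids=None):
    """Return (kind, values) for the selector that wins under the documented precedence."""
    if resource_ids:
        # resource_ids overrides both resource_paths and resource_objects.
        return "resource_ids", list(resource_ids)
    if resource_objects:
        # resource_objects overrides resource_paths.
        return "resource_objects", list(resource_objects)
    if resource_paths:
        return "resource_paths", list(resource_paths)
    # Nothing set: the permission applies to the whole dataset.
    return "dataset", []
```

For example, passing both resource_paths and resource_ids yields the resource_ids selector, so the paths are ignored.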

contact_identity_id

Gets the Contact Identity ID.

Type

str

create_file(path, tags=None, description=None)

Creates File resource in Dataset.

Parameters
  • path (str) – Path of the file resource.

  • tags (list of str) – Tags of the file resource. Defaults to None.

  • description (str) – Description of the file resource. Defaults to None.

Returns

File Object.

Return type

crux.models.File

create_folder(path, folder='/', tags=None, description=None)

Creates Folder resource in Dataset.

Parameters
  • path (str) – Path of the Folder resource.

  • folder (str) – Parent folder of the Folder resource. Defaults to /.

  • tags (list of str) – Tags of the Folder resource. Defaults to None.

  • description (str) – Description of the Folder resource. Defaults to None.

Returns

Folder Object.

Return type

crux.models.Folder

create_query(path, config, tags=None, description=None)

Creates Query resource in Dataset.

Parameters
  • path (str) – Query resource Path.

  • config (dict) – Query configuration.

  • tags (list of str) – Tags of the Query resource. Defaults to None.

  • description (str) – Description of the Query resource. Defaults to None.

Returns

Query Object.

Return type

crux.models.Query

create_table(path, config, tags=None, description=None)

Creates Table resource in Dataset.

Parameters
  • path (str) – Table resource Path.

  • config (dict) – Table Schema Configuration.

  • tags (list of str) – Tags of the Table resource. Defaults to None.

  • description (str) – Description of the Table resource. Defaults to None.

Returns

Table Object.

Return type

crux.models.Table

created_at

Gets the Dataset created_at.

Type

str

delete()

Deletes the dataset.

Returns

True if dataset is deleted.

Return type

bool

delete_label(label_key)

Deletes label from Dataset.

Parameters

label_key (str) – Label Key for Dataset.

Returns

True if the label is deleted.

Return type

bool

delete_permission(identity_id, permission)

Deletes permission from the Dataset.

Parameters
  • identity_id (str) – Identity Id for the deletion.

  • permission (str) – Permission for the deletion.

Returns

True if it is able to delete it.

Return type

bool

delete_permission_from_resources(identity_id, permission, resource_paths=None, resource_objects=None, resource_ids=None)

Deletes permission from all or specific Dataset resources.

Parameters
  • identity_id (str) – Identity Id for the deletion.

  • permission (str) – Permission for the deletion.

  • resource_paths (list of str) – List of resource paths from which the permission should be deleted. If none of resource_paths, resource_objects or resource_ids parameter is set, then it will delete the permission from whole dataset.

  • resource_objects (list of crux.models.Resource) – List of resource objects from which the permission should be deleted. Overrides resource_paths. If none of resource_paths, resource_objects or resource_ids parameter is set, then it will delete the permission from whole dataset.

  • resource_ids (list of str) – List of resource ids from which the permission should be deleted. Overrides resource_paths and resource_objects. If none of resource_paths, resource_objects or resource_ids parameter is set, then it will delete the permission from whole dataset.

Returns

True if it is able to delete the permission.

Return type

bool

description

Gets the Dataset Description.

Type

str

download_files(folder, local_path)

Downloads the resources recursively.

Parameters
  • folder (str) – Crux Dataset Folder from where the file resources should be recursively downloaded.

  • local_path (str) – Local OS Path where the file resources should be downloaded.

Returns

List of locations of the downloaded files.

Return type

list (str)

Raises
  • ValueError – If folder or local_path is None.

  • OSError – If local_path is an invalid directory location.
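
A minimal sketch of guarding a recursive download against the two documented errors. download_folder is a hypothetical wrapper, not part of crux. Creating the directory up front assumes that a missing directory is what makes local_path an invalid location, which the source does not state.

```python
import os


def download_folder(dataset, folder, local_path):
    """Create local_path if needed, then delegate to Dataset.download_files."""
    if folder is None or local_path is None:
        # Mirrors the documented ValueError before any remote call is made.
        raise ValueError("folder and local_path are both required")
    # Assumption: an existing directory is a valid download location.
    os.makedirs(local_path, exist_ok=True)
    return dataset.download_files(folder=folder, local_path=local_path)
```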

find_resources_by_label(predicates, max_per_page=1000)

Searches the Dataset for resources matching the given label predicates.

Each predicate can be either:

  • Lexicographical equal

  • Lexicographical not equal

  • Lexicographical less than

  • Lexicographical less than or equal to

  • Lexicographical greater than

  • Lexicographical greater than or equal to

  • A list of OR predicates

  • A list of AND predicates

predicates = [
    {"op": "eq", "key": "key1", "val": "abcd"},
    {"op": "ne", "key": "key1", "val": "zzzz"},
    {"op": "lt", "key": "key1", "val": "abd"},
    {"op": "gt", "key": "key1", "val": "abc"},
    {"op": "lte", "key": "key1", "val": "abd"},
    {"op": "gte", "key": "key1", "val": "abc"},
    {"op": "or", "in":
        [
            {"op": "eq", "key": "key1", "val": "abcd"},
            # more OR predicates...
        ]
    },
    {"op": "and", "in":
        [
            {"op": "eq", "key": "key1", "val": "abcd"},
            # more AND predicates...
        ]
    }
]

Parameters
  • predicates (list of dict) – List of dictionary predicates for finding resources.

  • max_per_page (int) – Pagination limit. Defaults to 1000.

Returns

List of resource matching the query parameters.

Return type

list (crux.models.Resource)

Example

from crux import Crux

conn = Crux()
dataset_object = conn.get_dataset(id="dataset_id")
predicates = [
    {"op": "eq", "key": "test_label1", "val": "test_value1"}
]
resource_objects = dataset_object.find_resources_by_label(
    predicates=predicates
)
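
The predicate dictionaries are easy to mistype by hand, so small builder functions can help. This is a minimal sketch that produces the structure shown in this entry; predicate, any_of and all_of are hypothetical helpers, not part of crux.

```python
COMPARISON_OPS = {"eq", "ne", "lt", "gt", "lte", "gte"}


def predicate(op, key, val):
    """Build one comparison predicate using an op from the listing above."""
    if op not in COMPARISON_OPS:
        raise ValueError("unsupported op: %r" % (op,))
    return {"op": op, "key": key, "val": val}


def any_of(*predicates):
    """Combine predicates into a single OR predicate."""
    return {"op": "or", "in": list(predicates)}


def all_of(*predicates):
    """Combine predicates into a single AND predicate."""
    return {"op": "and", "in": list(predicates)}


# Label keys and values here are made up for illustration.
predicates = [
    any_of(predicate("eq", "region", "us"), predicate("eq", "region", "eu")),
    predicate("gte", "version", "2"),
]
```

The resulting list can be passed as the predicates argument of find_resources_by_label.
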

classmethod from_dict(a_dict)

Transforms Dataset Dictionary to Dataset object.

Parameters

a_dict (dict) – Dataset Dictionary.

Returns

Dataset Object.

Return type

crux.models.Dataset

get_file(path)

Gets the File resource object.

Parameters

path (str) – File resource path.

Returns

File Object.

Return type

crux.models.File

get_folder(path)

Gets the Folder resource object.

Parameters

path (str) – Folder resource path.

Returns

Folder Object.

Return type

crux.models.Folder

get_label(label_key)

Gets label value of Dataset.

Parameters

label_key (str) – Label Key for Dataset.

Returns

Label Object.

Return type

crux.models.Label

get_query(path)

Gets the Query resource object.

Parameters

path (str) – Query resource path.

Returns

Query Object.

Return type

crux.models.Query

get_stitch_job(job_id)

Gets the Stitch Job details.

Parameters

job_id (str) – Job ID of the Stitch Job.

Returns

StitchJob object.

Return type

crux.models.StitchJob

get_table(path)

Gets the Table resource object.

Parameters

path (str) – Table resource path.

Returns

Table Object.

Return type

crux.models.Table

id

Gets the Dataset ID.

Type

str

list_files(sort=None, folder='/', offset=0, limit=100)

Lists the files.

Parameters
  • sort (str) – Sets whether to sort or not. Defaults to None.

  • folder (str) – Folder for which resource should be listed. Defaults to /.

  • offset (int) – Sets the offset. Defaults to 0.

  • limit (int) – Sets the limit. Defaults to 100.

Returns

List of File objects.

Return type

list (crux.models.File)

list_permissions()

Lists the permissions on the Dataset.

Returns

List of Permission Objects.

Return type

list (crux.models.Permission)

list_resources(folder='/', offset=0, limit=1, include_folders=False, sort=None)

Lists the resources in Dataset.

Parameters
  • folder (str) – Folder for which resource should be listed. Defaults to /.

  • offset (int) – Sets the offset. Defaults to 0.

  • limit (int) – Sets the limit. Defaults to 1.

  • include_folders (bool) – Sets whether to include folders or not. Defaults to False.

  • sort (str) – Sets whether to sort or not. Defaults to None.

Returns

List of resource objects.

Return type

list (crux.models.Resource)
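
Since limit defaults to 1, listing a whole folder needs explicit paging. The following minimal sketch walks the pages with offset and limit. iter_resources is a hypothetical helper, not part of crux. It assumes that offset counts resources rather than pages, and that a page shorter than the requested limit marks the end of the listing.

```python
def iter_resources(dataset, folder="/", page_size=100, include_folders=False):
    """Yield every resource under folder by advancing offset one page at a time."""
    offset = 0
    while True:
        page = dataset.list_resources(
            folder=folder,
            offset=offset,
            limit=page_size,
            include_folders=include_folders,
        )
        for resource in page:
            yield resource
        if len(page) < page_size:
            # Assumption: a short (or empty) page means nothing is left.
            return
        offset += len(page)
```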

load_table_from_file(source_file, dest_table, append=False)

Loads table from file resource.

Parameters
  • source_file (str or file) – Source file path as a string, or a File object.

  • dest_table (str or crux.models.Table) – Destination table path as a string, or a Table object.

  • append (bool) – Sets whether to append to existing table. Defaults to False.

Returns

LoadJob Object.

Return type

crux.models.LoadJob

Raises

TypeError – If source_file or dest_table is not file or string object.

modified_at

Gets the Dataset modified_at.

Type

str

name

Gets the Dataset Name.

Type

str

owner_identity_id

Gets the Owner Identity ID.

Type

str

provenance

Computes or gets the provenance.

Type

str

stitch(source_resources, destination_resource, labels=None, tags=None, description=None)

Stitches multiple Avro resources into a single Avro resource.

Parameters
  • source_resources (list of str or file) – List of resource paths which are to be stitched.

  • destination_resource (str) – Resource path to load the stitched output.

  • labels (dict) – Key/Value labels that should be applied to the stitched resource.

  • tags (list of str) – List of tags to be applied on destination resource. Taken into consideration if resource is required to be created.

  • description (str) – Description to be applied to the created destination resource. Taken into consideration if resource is required to be created.

Returns

File object of destination resource.

Job ID for background running job.

Return type

tuple (crux.models.File, str)
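
A minimal sketch of using the returned tuple together with get_stitch_job. stitch_and_get_job is a hypothetical helper, not part of crux, and dataset is assumed to be a crux.models.Dataset. The attributes of the returned StitchJob are not described in this section, so the sketch only passes the object back.

```python
def stitch_and_get_job(dataset, source_paths, destination_path, labels=None):
    """Start a stitch and return (destination_file, stitch_job)."""
    destination_file, job_id = dataset.stitch(
        source_resources=source_paths,
        destination_resource=destination_path,
        labels=labels,
    )
    # The job runs in the background; look up its details by the returned ID.
    stitch_job = dataset.get_stitch_job(job_id)
    return destination_file, stitch_job
```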

tags

Gets the tags.

Type

list of str

to_dict()

Transforms Dataset object to Dataset Dictionary.

Returns

Dataset Dictionary.

Return type

dict

update(name=None, description=None, tags=None)

Updates the metadata of dataset.

Parameters
  • name (str) – Name of the dataset. Defaults to None.

  • description (str) – Description of the dataset. Defaults to None.

  • tags (list of str) – List of tags. Defaults to None.

Returns

True, if dataset is updated.

Return type

bool

Raises
  • ValueError – It is raised if name, description or tags are unset.

  • TypeError – It is raised if tags is not of type list.

upload_file(src, dest, media_type=None, description=None, tags=None)

Uploads the File.

Parameters
  • src (str or file) – Local OS path whose content is to be uploaded to file resource.

  • dest (str) – File resource path.

  • media_type (str) – Content type of the file. Defaults to None.

  • description (str) – Description of the file. Defaults to None.

  • tags (list of str) – Tags to be attached to the file resource.

Returns

File Object.

Return type

crux.models.File
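
When media_type is left as None the caller supplies no content type. A minimal sketch of deriving one from the file name with the standard library follows. upload_with_guessed_type is a hypothetical helper, not part of crux, and whether the backend accepts every guessed type is an assumption.

```python
import mimetypes


def upload_with_guessed_type(dataset, src_path, dest_path, description=None, tags=None):
    """Upload src_path, passing a media type guessed from its extension."""
    # guess_type returns (type, encoding); type is None for unknown extensions,
    # which matches the documented default of media_type.
    media_type, _encoding = mimetypes.guess_type(src_path)
    return dataset.upload_file(
        src=src_path,
        dest=dest_path,
        media_type=media_type,
        description=description,
        tags=tags,
    )
```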

upload_files(local_path, folder, media_type=None, description=None, tags=None)

Uploads the resources recursively.

Parameters
  • local_path (str) – Local OS Path from where the file resources should be uploaded.

  • media_type (str) – Content Types of File resources to be uploaded. Defaults to None.

  • folder (str) – Crux Dataset Folder where file resources should be recursively uploaded.

  • description (str) – Description to be set on uploaded resources. Defaults to None.

  • tags (list of str) – Tags to be set on uploaded resources. Defaults to None.

Returns

List of uploaded file objects.

Return type

list (crux.models.File)

Raises
  • ValueError – If folder or local_path is None.

  • OSError – If local_path is an invalid directory location.

upload_query(sql_file, path, description=None, tags=None)

Uploads the Query File.

Parameters
  • path (str) – Query resource path.

  • sql_file (str) – Local OS SQL file to be uploaded as query resource.

  • description (str) – Description for the Query resource. Defaults to None.

  • tags (list of str) – Tags for the Query resource. Defaults to None.

Returns

Query Object.

Return type

crux.models.Query

website

Gets the Dataset Website.

Type

str

class crux.models.Query(id=None, dataset_id=None, folder_id=None, folder=None, name=None, size=None, type=None, config=None, provenance=None, as_of=None, created_at=None, modified_at=None, storage_id=None, description=None, media_type=None, tags=None, labels=None, connection=None, raw_response=None)

Bases: crux.models.resource.Resource

Query Model.

download(dest, format='csv', params=None)

Downloads the Query output.

Parameters
  • dest (str) – Local OS path at which resource will be downloaded.

  • format (str) – Output format of the query. Defaults to csv.

  • params (dict) – Run parameters. Defaults to None.

Returns

True if it is downloaded.

Return type

bool

run(format='csv', params=None, chunk_size=10485760, decode_unicode=False)

Streams the Query output.

Parameters
  • format (str) – Output format of the query. Defaults to csv.

  • params (dict) – Run parameters. Defaults to None.

  • chunk_size (int) – Chunk size for the stream. Defaults to 10485760.

  • decode_unicode (bool) – If decode_unicode is True, content will be decoded using the best available encoding based on the response. Defaults to False.

Yields

bytes – Bytes of content.

Raises

ValueError – If chunk_size is not a multiple of 256 KiB.
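
A minimal sketch of turning the streamed CSV output into rows. read_query_rows is a hypothetical helper, not part of crux. It assumes the result fits in memory and is UTF-8 encoded, neither of which the source states. Chunks are joined before parsing because a chunk boundary may fall in the middle of a row.

```python
import csv
import io


def read_query_rows(query, params=None, encoding="utf-8"):
    """Run the query as CSV and return its rows as lists of strings."""
    buffer = io.BytesIO()
    # decode_unicode is left at its default (False), so chunks arrive as bytes.
    for chunk in query.run(format="csv", params=params):
        buffer.write(chunk)
    text = buffer.getvalue().decode(encoding)
    # newline="" lets the csv module handle line endings itself.
    return list(csv.reader(io.StringIO(text, newline="")))
```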

to_dict()

Transforms Query object to Query dictionary.

Returns

Query dictionary.

Return type

dict

class crux.models.Label(label_key=None, label_value=None)

Bases: crux.models.model.CruxModel

Label Model.

classmethod from_dict(a_dict)

Transforms Label Dictionary to Label object.

Parameters

a_dict (dict) – Label Dictionary.

Returns

Label Object.

Return type

crux.models.Label

to_dict()

Transforms Label object to Label Dictionary.

Returns

Label Dictionary.

Return type

dict