kartothek.core.cube.cube module

class kartothek.core.cube.cube.Cube(dimension_columns: Optional[Union[Iterable[str], str]], partition_columns: Optional[Union[Iterable[str], str]], uuid_prefix, seed_dataset='seed', index_columns=None, suppress_index_on=None)

Bases: object

OLAP-like cube that fuses multiple datasets.

Parameters
  • dimension_columns (Tuple[str, ..]) – Columns that span dimensions. This will imply index columns for the seed dataset, unless the automatic index creation is suppressed via suppress_index_on.

  • partition_columns (Tuple[str, ..]) – Columns that are used to partition the data. They also create (implicit) primary indices.

  • uuid_prefix (str) – All datasets that are part of the cube will have UUIDs of form 'uuid_prefix++ktk_cube_dataset_id'.

  • seed_dataset (str) – Dataset that presents the ground truth regarding the cells present in the cube.

  • index_columns (Tuple[str, ..]) – Columns for which secondary indices will be created. They may also be part of non-seed datasets.

  • suppress_index_on (Tuple[str, ..]) – Suppress auto-creation of an index on the given dimension columns. Must be a subset of dimension_columns (other columns are not subject to automatic index creation).
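
The interplay between dimension_columns and suppress_index_on can be sketched in plain Python. This is a minimal stand-in and not Kartothek's implementation: the helper names as_column_tuple and seed_auto_index_columns are hypothetical, and both the treatment of a lone string as a one-element tuple and the use of ValueError for a non-subset are assumptions.

```python
from typing import Iterable, Optional, Tuple, Union

Columns = Optional[Union[Iterable[str], str]]


def as_column_tuple(value: Columns) -> Tuple[str, ...]:
    """Hypothetical helper: turn None, a single string, or an iterable into a tuple."""
    if value is None:
        return ()
    if isinstance(value, str):
        # Assumption: a lone string names exactly one column.
        return (value,)
    return tuple(value)


def seed_auto_index_columns(
    dimension_columns: Columns, suppress_index_on: Columns = None
) -> Tuple[str, ...]:
    """Hypothetical helper: dimension columns that would get an automatic index."""
    dims = as_column_tuple(dimension_columns)
    suppressed = set(as_column_tuple(suppress_index_on))

    # suppress_index_on must be a subset of dimension_columns.
    unknown = suppressed - set(dims)
    if unknown:
        # Assumption: the error type is chosen for this sketch only.
        raise ValueError(f"not dimension columns: {sorted(unknown)}")

    # Keep the original column order, dropping the suppressed columns.
    return tuple(col for col in dims if col not in suppressed)


auto_indexed = seed_auto_index_columns(("x", "y", "z"), suppress_index_on="y")
```

Here auto_indexed contains "x" and "z" only, because the automatic index on "y" was suppressed.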

copy(**kwargs)

Create a new cube specification with changed attributes.

This does not trigger any I/O operation; it only affects the cube specification.

Parameters

kwargs (Dict[str, Any]) – Attributes that should be changed.

Returns

cube – New abstract cube.

Return type

Cube
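
The copy-with-changes pattern described for copy() can be illustrated with a small immutable stand-in. CubeSpec below is a hypothetical class built on the standard library's dataclasses, not Kartothek's Cube, and is only meant to show that the original specification stays untouched and no I/O is involved.

```python
from dataclasses import dataclass, replace
from typing import Tuple


@dataclass(frozen=True)
class CubeSpec:
    """Hypothetical, simplified stand-in for a cube specification."""

    dimension_columns: Tuple[str, ...]
    partition_columns: Tuple[str, ...]
    uuid_prefix: str
    seed_dataset: str = "seed"

    def copy(self, **kwargs) -> "CubeSpec":
        # Build a new specification; the current one is left unchanged.
        return replace(self, **kwargs)


original = CubeSpec(
    dimension_columns=("x", "y"),
    partition_columns=("p",),
    uuid_prefix="my_cube",
)
renamed = original.copy(uuid_prefix="my_cube_v2")
```

After this, renamed carries the new prefix while original still reports "my_cube"; all other attributes are shared unchanged.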

ktk_dataset_uuid(ktk_cube_dataset_id)

Get the Kartothek dataset UUID for the given cube dataset ID, i.e. the ID with the cube's UUID prefix included.

Parameters

ktk_cube_dataset_id (str) – Dataset ID without the prefix.

Returns

ktk_dataset_uuid – Prefixed dataset UUID for Kartothek.

Return type

str

Raises

ValueError – If ktk_cube_dataset_id is not a string or if it is not a valid UUID.
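
A rough sketch of the prefixing rule, based on the 'uuid_prefix++ktk_cube_dataset_id' form given for uuid_prefix. The function name prefixed_dataset_uuid is hypothetical, and the character set accepted as a valid ID is an assumption made for this example; Kartothek's actual validation may differ.

```python
import re

UUID_SEPARATOR = "++"

# Assumption: an ID made of letters, digits, underscores and hyphens is valid.
_VALID_ID = re.compile(r"[A-Za-z0-9_\-]+")


def prefixed_dataset_uuid(uuid_prefix: str, ktk_cube_dataset_id: str) -> str:
    """Hypothetical helper: combine the cube prefix and a dataset ID."""
    if not isinstance(ktk_cube_dataset_id, str):
        raise ValueError(
            f"ktk_cube_dataset_id must be a string, got {type(ktk_cube_dataset_id).__name__}"
        )
    if not _VALID_ID.fullmatch(ktk_cube_dataset_id):
        raise ValueError(f"ktk_cube_dataset_id is not valid: {ktk_cube_dataset_id!r}")
    return f"{uuid_prefix}{UUID_SEPARATOR}{ktk_cube_dataset_id}"


seed_uuid = prefixed_dataset_uuid("my_cube", "seed")
```

With the prefix "my_cube" and the dataset ID "seed", the result is "my_cube++seed". A non-string argument or an ID outside the assumed character set raises ValueError.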

property ktk_index_columns

Set of all index columns available through Kartothek, both primary and secondary.
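
Read together with the parameter descriptions of the class, this set can be pictured as the union of the partition columns (primary indices), the explicitly requested index_columns, and the dimension columns whose automatic index was not suppressed. The function all_index_columns below is a hypothetical stand-in that encodes this reading as an assumption, not Kartothek's code.

```python
from typing import Iterable, Set


def all_index_columns(
    partition_columns: Iterable[str],
    dimension_columns: Iterable[str],
    index_columns: Iterable[str] = (),
    suppress_index_on: Iterable[str] = (),
) -> Set[str]:
    """Hypothetical helper: union of primary and secondary index columns."""
    # Partition columns create (implicit) primary indices.
    primary = set(partition_columns)
    # Secondary indices: non-suppressed dimension columns plus index_columns.
    secondary = (set(dimension_columns) - set(suppress_index_on)) | set(index_columns)
    return primary | secondary


columns = all_index_columns(
    partition_columns=("p",),
    dimension_columns=("x", "y"),
    index_columns=("v",),
    suppress_index_on=("y",),
)
```

In this example columns holds "p", "x" and "v"; "y" is missing because its automatic index was suppressed and it was not listed in index_columns.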