kartothek.core.cube.cube module

class kartothek.core.cube.cube.Cube(dimension_columns: Optional[Union[Iterable[str], str]], partition_columns: Optional[Union[Iterable[str], str]], uuid_prefix, seed_dataset='seed', index_columns=None, suppress_index_on=None)

Bases: object

OLAP-like cube that fuses multiple datasets.

Parameters

dimension_columns (Tuple[str, ...]) – Columns that span dimensions. This will imply index columns for the seed dataset, unless the automatic index creation is suppressed via suppress_index_on.
partition_columns (Tuple[str, ...]) – Columns that are used to partition the data. They also create (implicit) primary indices.
uuid_prefix (str) – All datasets that are part of the cube will have UUIDs of the form 'uuid_prefix++ktk_cube_dataset_id'.
seed_dataset (str) – Dataset that presents the ground truth regarding cells present in the cube.
index_columns (Tuple[str, ...]) – Columns for which secondary indices will be created. They may also be part of non-seed datasets.
suppress_index_on (Tuple[str, ...]) – Suppress auto-creation of an index on the given dimension columns. Must be a subset of dimension_columns (other columns are not subject to automatic index creation).

copy(**kwargs)

Create a new cube specification with changed attributes.

This will not trigger any IO operation, but only affects the cube specification.

Parameters: kwargs (Dict[str, Any]) – Attributes that should be changed.
Returns: cube – New abstract cube.
Return type: Cube
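
As a rough illustration of the copy semantics described above, the sketch below uses a small frozen dataclass as a stand-in for a cube specification. The class name CubeSpec, its reduced set of fields, and the example column names are assumptions made for this example and are not Kartothek's implementation. The point is only that changing attributes yields a new specification object and performs no IO.

```python
from dataclasses import dataclass, replace
from typing import Tuple


@dataclass(frozen=True)
class CubeSpec:
    """Hypothetical stand-in for a cube specification (not kartothek's Cube)."""

    dimension_columns: Tuple[str, ...]
    partition_columns: Tuple[str, ...]
    uuid_prefix: str
    seed_dataset: str = "seed"

    def copy(self, **kwargs):
        # Build a new specification with the given attributes changed.
        # Nothing is read from or written to storage here.
        return replace(self, **kwargs)


original = CubeSpec(
    dimension_columns=("city", "day"),
    partition_columns=("country",),
    uuid_prefix="geodata",
)
# The original specification stays untouched; only the copy differs.
renamed = original.copy(uuid_prefix="geodata_v2")
```

Because the stand-in is immutable, the original specification can be shared safely while variants are derived from it.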

ktk_dataset_uuid(ktk_cube_dataset_id)

Get the Kartothek dataset UUID for the given dataset ID, so that the prefix is included.

Parameters: ktk_cube_dataset_id (str) – Dataset ID without prefix.
Returns: ktk_dataset_uuid – Prefixed dataset UUID for Kartothek.
Return type: str
Raises: ValueError – If ktk_cube_dataset_id is not a string or if it is not a valid UUID.
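
The prefixing rule can be sketched in a few lines, using the 'uuid_prefix++ktk_cube_dataset_id' form given in the class parameters. The function name prefixed_dataset_uuid is hypothetical and this is not the library's code. Only the string type check is modelled, because the rules for what counts as a valid UUID are not spelled out here.

```python
def prefixed_dataset_uuid(uuid_prefix, ktk_cube_dataset_id):
    """Hypothetical helper mirroring the documented UUID layout."""
    if not isinstance(ktk_cube_dataset_id, str):
        # The documented method raises ValueError for non-string IDs.
        raise ValueError("ktk_cube_dataset_id must be a string")
    # Documented form: 'uuid_prefix++ktk_cube_dataset_id'.
    # Further UUID validity checks are intentionally omitted in this sketch.
    return f"{uuid_prefix}++{ktk_cube_dataset_id}"


seed_uuid = prefixed_dataset_uuid("geodata", "seed")  # "geodata++seed"
```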

property ktk_index_columns: Set of all index columns available through Kartothek, both primary and secondary.
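
One plausible reading of the parameter descriptions is that this set is the union of the partition columns (primary indices), the explicitly requested index columns (secondary indices), and the dimension columns that are not listed in suppress_index_on. The sketch below encodes that reading. The function name index_columns_sketch and the example column names are hypothetical, and treating the property as exactly this union is an assumption rather than a statement about the library's code.

```python
def index_columns_sketch(dimension_columns, partition_columns,
                         index_columns=(), suppress_index_on=()):
    """Hypothetical model of which columns end up indexed."""
    dims = set(dimension_columns)
    suppressed = set(suppress_index_on)
    # Documented constraint: suppress_index_on must be a subset of
    # dimension_columns.
    if not suppressed <= dims:
        raise ValueError("suppress_index_on must be a subset of dimension_columns")
    # Assumed union: primary indices from partition columns, secondary
    # indices from index_columns, plus non-suppressed dimension columns.
    return frozenset(set(partition_columns) | set(index_columns) | (dims - suppressed))


indexed = index_columns_sketch(
    dimension_columns=("city", "day"),
    partition_columns=("country",),
    index_columns=("avg_temp",),
    suppress_index_on=("day",),
)
```

With these inputs, "day" is left out of the result because its automatic index is suppressed, while "city", "country" and "avg_temp" are included.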