kartothek.core.cube.cube module
-
class kartothek.core.cube.cube.Cube(dimension_columns: Optional[Union[Iterable[str], str]], partition_columns: Optional[Union[Iterable[str], str]], uuid_prefix, seed_dataset='seed', index_columns=None, suppress_index_on=None)
Bases: object
OLAP-like cube that fuses multiple datasets.
- Parameters
dimension_columns (Tuple[str, ...]) – Columns that span dimensions. This implies index columns for the seed dataset, unless automatic index creation is suppressed via suppress_index_on.
partition_columns (Tuple[str, ...]) – Columns that are used to partition the data. They also create (implicit) primary indices.
uuid_prefix (str) – All datasets that are part of the cube will have UUIDs of the form 'uuid_prefix++ktk_cube_dataset_id'.
seed_dataset (str) – Dataset that presents the ground truth regarding which cells are present in the cube.
index_columns (Tuple[str, ...]) – Columns for which secondary indices will be created. They may also be part of non-seed datasets.
suppress_index_on (Tuple[str, ...]) – Suppress auto-creation of an index on the given dimension columns. Must be a subset of dimension_columns (other columns are not subject to automatic index creation).
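The signature accepts either a single string or an iterable of strings for the column arguments, while the parameter list documents tuples. The sketch below shows one plausible way such input could be normalized, together with the documented subset rule for suppress_index_on. The helper names to_str_tuple and check_suppress_index_on are hypothetical and not part of Kartothek; the library's own conversion and validation may differ.

```python
from typing import Iterable, Optional, Tuple, Union

ColumnsArg = Optional[Union[Iterable[str], str]]


def to_str_tuple(value: ColumnsArg) -> Tuple[str, ...]:
    """Hypothetical helper: turn None, a str, or an iterable of str into a tuple."""
    if value is None:
        return ()
    if isinstance(value, str):
        # A bare string is one column name, not an iterable of characters.
        return (value,)
    return tuple(value)


def check_suppress_index_on(
    dimension_columns: ColumnsArg, suppress_index_on: ColumnsArg
) -> Tuple[str, ...]:
    """Hypothetical check: suppress_index_on must be a subset of dimension_columns."""
    dims = to_str_tuple(dimension_columns)
    suppressed = to_str_tuple(suppress_index_on)
    unknown = sorted(set(suppressed) - set(dims))
    if unknown:
        raise ValueError(f"suppress_index_on has non-dimension columns: {unknown}")
    return suppressed
```

Treating a bare string as a single column name avoids the common pitfall of iterating over its characters.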
-
copy(**kwargs)
Create a new cube specification with changed attributes.
This does not trigger any I/O operation; it only affects the cube specification.
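The copy semantics can be illustrated with a small immutable stand-in. This is a sketch only: CubeSpec is a hypothetical class built on the standard library's dataclasses, not Kartothek's Cube, and it assumes that copying means building a new object with selected attributes replaced.

```python
from dataclasses import dataclass, replace
from typing import Tuple


@dataclass(frozen=True)
class CubeSpec:
    """Hypothetical stand-in for a cube specification (not Kartothek's Cube)."""

    dimension_columns: Tuple[str, ...]
    partition_columns: Tuple[str, ...]
    uuid_prefix: str
    seed_dataset: str = "seed"

    def copy(self, **kwargs) -> "CubeSpec":
        # Build a new specification object; nothing is read from or written to storage.
        return replace(self, **kwargs)


original = CubeSpec(("city", "day"), ("country",), "weather_cube")
renamed = original.copy(uuid_prefix="weather_cube_v2")
```

Here original is left untouched, and renamed differs from it only in uuid_prefix.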
-
ktk_dataset_uuid(ktk_cube_dataset_id)
Get the Kartothek dataset UUID for the given cube dataset ID, so that the prefix is included.
- Parameters
ktk_cube_dataset_id (str) – Dataset ID without the prefix.
- Returns
ktk_dataset_uuid – Prefixed dataset UUID for Kartothek.
- Return type
str
- Raises
ValueError – If ktk_cube_dataset_id is not a string or if it is not a valid UUID.
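A minimal sketch of the prefixing behaviour, based on the 'uuid_prefix++ktk_cube_dataset_id' form given in the class parameters. The free function prefixed_dataset_uuid is hypothetical, and the validity rule it applies (non-empty, no '++' inside the ID) is an assumption made for illustration; the reference above does not spell out what Kartothek considers a valid UUID.

```python
def prefixed_dataset_uuid(uuid_prefix: str, ktk_cube_dataset_id: str) -> str:
    """Hypothetical function following the documented 'uuid_prefix++ktk_cube_dataset_id' form."""
    if not isinstance(ktk_cube_dataset_id, str):
        raise ValueError("ktk_cube_dataset_id must be a string")
    # Assumed validity rule for this sketch only: non-empty and free of the separator.
    if not ktk_cube_dataset_id or "++" in ktk_cube_dataset_id:
        raise ValueError(f"invalid dataset ID: {ktk_cube_dataset_id!r}")
    return f"{uuid_prefix}++{ktk_cube_dataset_id}"
```

With these assumptions, prefixed_dataset_uuid("weather_cube", "seed") yields "weather_cube++seed".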
-
property ktk_index_columns
Set of all available index columns through Kartothek, primary and secondary.
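Read together with the class parameters, this set can be pictured as the union of the partition columns (primary indices), the explicit index_columns (secondary indices), and the dimension columns whose automatic index was not suppressed. The function below is a sketch of that reading; all_index_columns is a hypothetical name, and the exact composition used by Kartothek is an assumption here.

```python
from typing import Iterable, Set


def all_index_columns(
    dimension_columns: Iterable[str],
    partition_columns: Iterable[str],
    index_columns: Iterable[str] = (),
    suppress_index_on: Iterable[str] = (),
) -> Set[str]:
    """Hypothetical helper: combine primary and secondary index columns."""
    # Partition columns create (implicit) primary indices.
    primary = set(partition_columns)
    # Secondary indices: explicit index columns plus non-suppressed dimension columns.
    secondary = set(index_columns) | (set(dimension_columns) - set(suppress_index_on))
    return primary | secondary


columns = all_index_columns(
    dimension_columns=["city", "day"],
    partition_columns=["country"],
    index_columns=["avg_temp"],
    suppress_index_on=["day"],
)
```

In this example "day" is excluded because its automatic index is suppressed, while "city", "country" and "avg_temp" are included.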