kartothek.api.consistency module
Methods to check persisted cubes for consistency.
kartothek.api.consistency.check_datasets(datasets: Dict[str, kartothek.core.dataset.DatasetMetadata], cube: kartothek.core.cube.cube.Cube) → Dict[str, kartothek.core.dataset.DatasetMetadata]
Apply sanity checks to persisted Kartothek datasets.
The following checks will be applied:
seed dataset present
metadata version correct
only the cube-specific table is present
partition keys are correct
no overlapping payload columns exist
datatypes are consistent
dimension columns are present everywhere
required index structures are present (more are allowed):
    PartitionIndex for every partition key
    for seed dataset, ExplicitSecondaryIndex for every dimension column
    for all datasets, ExplicitSecondaryIndex for every index column
- Parameters
datasets – Datasets.
cube – Cube specification.
- Returns
datasets – Same as input, but with partition indices loaded.
- Return type
Dict[str, DatasetMetadata]
- Raises
ValueError – If a sanity check fails.
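To make two of the listed checks concrete, here is a minimal sketch in plain Python. It does not use or reproduce Kartothek's implementation: the helper names `find_payload_overlaps` and `check_columns` are hypothetical, and datasets are modelled as plain sets of column names rather than `DatasetMetadata` objects. It also assumes that "payload" means every column that is neither a dimension column nor a partition column.

```python
from itertools import combinations
from typing import Dict, Set, Tuple


def find_payload_overlaps(
    columns_by_dataset: Dict[str, Set[str]],
    dimension_columns: Set[str],
    partition_columns: Set[str],
) -> Dict[Tuple[str, str], Set[str]]:
    """Return payload columns shared by any pair of datasets (hypothetical helper)."""
    # Assumption: dimension and partition columns may repeat across datasets,
    # everything else counts as payload.
    shared = dimension_columns | partition_columns
    payload = {name: cols - shared for name, cols in columns_by_dataset.items()}

    overlaps = {}
    for a, b in combinations(sorted(payload), 2):
        common = payload[a] & payload[b]
        if common:
            overlaps[(a, b)] = common
    return overlaps


def check_columns(
    columns_by_dataset: Dict[str, Set[str]],
    dimension_columns: Set[str],
    partition_columns: Set[str],
) -> None:
    """Raise ValueError if a column-level check fails (hypothetical helper)."""
    # Check: dimension columns are present everywhere.
    for name in sorted(columns_by_dataset):
        missing = dimension_columns - columns_by_dataset[name]
        if missing:
            raise ValueError(
                f"Dataset {name!r} lacks dimension columns: {sorted(missing)}"
            )

    # Check: no overlapping payload columns exist.
    overlaps = find_payload_overlaps(
        columns_by_dataset, dimension_columns, partition_columns
    )
    if overlaps:
        raise ValueError(f"Overlapping payload columns: {overlaps}")


# Usage: two datasets that share only the dimension and partition columns pass.
check_columns(
    {"seed": {"x", "p", "v1"}, "enrich": {"x", "p", "v2"}},
    dimension_columns={"x"},
    partition_columns={"p"},
)
```

Like `check_datasets`, the sketch reports a failure by raising `ValueError`, so callers can wrap the call in a single `try`/`except` block.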
kartothek.api.consistency.get_cube_payload(datasets: Dict[str, kartothek.core.dataset.DatasetMetadata], cube: kartothek.core.cube.cube.Cube) → Set[str]
Get payload columns of the whole cube.
- Parameters
datasets – Datasets.
cube – Cube specification.
- Returns
payload – Payload columns.
- Return type
Set[str]
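As a rough illustration of what "payload columns of the whole cube" could mean, the sketch below computes the union of per-dataset payload columns over plain sets of column names. It is an assumption-laden stand-in, not Kartothek's code: the function name `cube_payload` is hypothetical, and payload is assumed to be every column that is neither a dimension column nor a partition column.

```python
from typing import Dict, Iterable, Set


def cube_payload(
    columns_by_dataset: Dict[str, Iterable[str]],
    dimension_columns: Iterable[str],
    partition_columns: Iterable[str],
) -> Set[str]:
    """Union of payload columns across all datasets (hypothetical helper)."""
    # Assumption: dimension and partition columns are never payload.
    excluded = set(dimension_columns) | set(partition_columns)

    payload: Set[str] = set()
    for columns in columns_by_dataset.values():
        payload |= set(columns) - excluded
    return payload


# Usage: "x" is a dimension column and "p" a partition column,
# so only "v1" and "v2" remain as payload.
payload = cube_payload(
    {"seed": ["x", "p", "v1"], "enrich": ["x", "p", "v2"]},
    dimension_columns=["x"],
    partition_columns=["p"],
)
```

Returning a set mirrors the documented `Set[str]` return type and makes the result independent of dataset order.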