kartothek.utils.ktk_adapters module
Methods to make working with Kartothek easier.
-
kartothek.utils.ktk_adapters.get_dataset_columns(dataset)
Get columns present in a Kartothek_Cube-compatible Kartothek dataset.
- Parameters
dataset (kartothek.core.dataset.DatasetMetadata) – Dataset to get the columns from.
- Returns
columns – Usable columns.
- Return type
Set[str]
-
kartothek.utils.ktk_adapters.get_dataset_keys(dataset)
Get store keys that belong to the given Kartothek dataset.
- Parameters
dataset (kartothek.core.dataset.DatasetMetadata) – Dataset to scan for keys.
- Returns
keys – Storage keys.
- Return type
Set[str]
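Because the result is a plain set of strings, it composes with ordinary set operations. The following is a minimal sketch of one possible use, finding store keys that no known dataset references. The `find_orphaned_keys` helper and all key strings are hypothetical and not part of Kartothek; in practice the dataset keys would come from `get_dataset_keys(dataset)` and the store keys from listing the store.

```python
from typing import Iterable, Set


def find_orphaned_keys(store_keys: Iterable[str], dataset_keys: Set[str]) -> Set[str]:
    """Return keys present in the store that the dataset does not reference."""
    return set(store_keys) - dataset_keys


# Made-up key names for illustration only; they do not reflect
# Kartothek's actual key layout.
dataset_keys = {"ds/metadata.json", "ds/part-0.parquet"}
store_keys = [
    "ds/metadata.json",
    "ds/part-0.parquet",
    "ds/part-stale.parquet",
]

orphans = find_orphaned_keys(store_keys, dataset_keys)
print(sorted(orphans))
```

With several datasets in one store, the same idea applies after taking the union of their key sets.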
-
kartothek.utils.ktk_adapters.get_dataset_schema(dataset)
Get schema from a Kartothek_Cube-compatible Kartothek dataset.
- Parameters
dataset (kartothek.core.dataset.DatasetMetadata) – Dataset to get the schema from.
- Returns
schema – Schema data.
- Return type
Deprecated since version 5.3: get_dataset_schema is deprecated and will be removed in 6.0.
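Callers who want to find remaining uses of a deprecated function before an upgrade can record warnings in a test. This is a sketch only: it assumes the deprecation is surfaced as a Python `DeprecationWarning`, which this page does not confirm, and `get_dataset_schema_stub` is a hypothetical stand-in rather than the Kartothek function.

```python
import warnings


def get_dataset_schema_stub(dataset):
    """Hypothetical stand-in that warns the way a deprecated function might."""
    warnings.warn(
        "get_dataset_schema is deprecated and will be removed in 6.0.",
        DeprecationWarning,
        stacklevel=2,
    )
    return getattr(dataset, "schema", None)


# Record warnings instead of printing them, so call sites can be audited.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    get_dataset_schema_stub(object())

deprecations = [w for w in caught if issubclass(w.category, DeprecationWarning)]
print(len(deprecations))
```

Using `warnings.simplefilter("error", DeprecationWarning)` instead turns each such call into an exception, which can make leftover call sites fail loudly in a test suite.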
-
kartothek.utils.ktk_adapters.get_partition_dataframe(dataset, cube)
Create a DataFrame that represents the partitioning of the dataset.
The row index, named "partition", contains the partition labels; the columns are the physical partition columns.
- Parameters
dataset (kartothek.core.dataset.DatasetMetadata) – Dataset to analyze, with partition indices pre-loaded.
cube (kartothek.core.cube.cube.Cube) – Cube spec.
- Returns
df – DataFrame with partition data.
- Return type
-
kartothek.utils.ktk_adapters.get_physical_partition_stats(metapartitions, store)
Get statistics for a physical partition.
Hint: To get the metapartitions pre-aligned, use concat_partitions_on_primary_index=True during dispatch.
- Parameters
metapartitions (Iterable[kartothek.io_components.metapartition.MetaPartition]) – Iterable of metapartitions belonging to the same physical partition.
store (Union[simplekv.KeyValueStore, Callable[[], simplekv.KeyValueStore]]) – KV store.
- Returns
stats – Statistics for the current partition.
- Return type
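The `store` parameter accepts either a store object or a zero-argument factory that returns one. A minimal sketch of how such an argument can be normalized is shown here; `InMemoryStore` and `resolve_store` are hypothetical names used for illustration and are neither simplekv's API nor Kartothek's implementation.

```python
from typing import Callable, Dict, Union


class InMemoryStore:
    """Hypothetical stand-in for a key-value store; not simplekv's API."""

    def __init__(self) -> None:
        self._data: Dict[str, bytes] = {}

    def put(self, key: str, value: bytes) -> None:
        self._data[key] = value

    def get(self, key: str) -> bytes:
        return self._data[key]


StoreOrFactory = Union[InMemoryStore, Callable[[], InMemoryStore]]


def resolve_store(store: StoreOrFactory) -> InMemoryStore:
    """Return a store instance, calling the factory if one was passed."""
    return store() if callable(store) else store


shared = InMemoryStore()
shared.put("k", b"v")

# Both call styles resolve to the same underlying store.
direct = resolve_store(shared)
via_factory = resolve_store(lambda: shared)
print(via_factory.get("k"))
```

Passing a factory rather than an instance is a common choice when the function may run in another process, since a factory can be re-invoked there instead of serializing an open connection.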
-
kartothek.utils.ktk_adapters.metadata_factory_from_dataset(dataset, with_schema=True, store=None)
Create a metadata factory from an already loaded DatasetMetadata.
- Parameters
dataset (kartothek.core.dataset.DatasetMetadata) – Already loaded dataset.
with_schema (bool) – Whether the dataset was loaded with load_schema.
store (Optional[Callable[[], simplekv.KeyValueStore]]) – Optional store factory.
- Returns
factory – Metadata factory with caches pre-filled.
- Return type