kartothek.core.factory module
-
class kartothek.core.factory.DatasetFactory(dataset_uuid: str, store_factory: Union[str, simplekv.KeyValueStore, Callable[[], simplekv.KeyValueStore]], load_schema: bool = True, load_all_indices: bool = False, load_dataset_metadata: bool = True)
Bases: kartothek.core.dataset.DatasetMetadataBase
Container that holds dataset metadata and caches storage access.
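The signature above allows `store_factory` to be a zero-argument callable that returns a store. A minimal sketch of why that is useful for caching storage access, assuming a lazy "create once, then reuse" pattern. `DictStore` and `LazyStoreHolder` are hypothetical names for illustration and are not part of kartothek; this is not a description of how `DatasetFactory` is implemented.

```python
from typing import Callable, Dict, Optional


class DictStore:
    """Tiny in-memory stand-in for a key-value store (hypothetical)."""

    def __init__(self) -> None:
        self._data: Dict[str, bytes] = {}

    def put(self, key: str, value: bytes) -> None:
        self._data[key] = value

    def get(self, key: str) -> bytes:
        return self._data[key]


class LazyStoreHolder:
    """Create the store from a factory on first access, then reuse it."""

    def __init__(self, store_factory: Callable[[], DictStore]) -> None:
        self._store_factory = store_factory
        self._store: Optional[DictStore] = None

    @property
    def store(self) -> DictStore:
        # Only call the factory the first time the store is needed.
        if self._store is None:
            self._store = self._store_factory()
        return self._store


calls = []


def make_store() -> DictStore:
    # Record each invocation so the caching behaviour can be observed.
    calls.append(1)
    return DictStore()


holder = LazyStoreHolder(make_store)
first = holder.store
second = holder.store
```

Passing a callable rather than a store instance defers opening the connection until it is needed, and the holder itself stays cheap to construct and pass around.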
-
property dataset_metadata
-
load_all_indices(store: Any = None, load_partition_indices: bool = True) → T
Load all registered indices into memory.
Note: External indices need to be preloaded before they can be queried.
- Parameters
store – Object that implements the .get method for file/object loading.
load_partition_indices – Whether filename-encoded indices should also be loaded. Default is True.
- Returns
dataset_metadata – Mutated metadata object with the loaded indices.
- Return type
-
load_index(column, store=None) → T
Load an index into memory.
Note: External indices need to be preloaded before they can be queried.
- Parameters
column – Name of the column for which the index should be loaded.
store – Object that implements the .get method for file/object loading.
- Returns
dataset_metadata – Mutated metadata object with the loaded index.
- Return type
-
load_partition_indices() → T
Load all filename-encoded indices into RAM. Filename-encoded indices can be extracted from datasets whose partitions are stored under keys of the form
`dataset_uuid/table/IndexCol=IndexValue/SecondIndexCol=Value/partition_label.parquet`
which results in an in-memory index holding the information
{ "IndexCol": { IndexValue: ["partition_label"] }, "SecondIndexCol": { Value: ["partition_label"] } }
Deprecated since version 5.3: This will be removed in 6.0. The load_partition_indices keyword is deprecated and will be removed.
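The mapping from storage keys to the in-memory index described above can be sketched in plain Python. This is an illustration only: `partition_indices_from_keys` is a hypothetical helper, not kartothek's implementation, and it makes simplifying assumptions (values are kept as raw strings, and the first two path segments are always the dataset UUID and the table name).

```python
from collections import defaultdict
from typing import Dict, Iterable, List


def partition_indices_from_keys(keys: Iterable[str]) -> Dict[str, Dict[str, List[str]]]:
    """Build {column: {value: [partition_label, ...]}} from storage keys.

    Hypothetical helper; assumes keys look like
    dataset_uuid/table/Col=Value/.../partition_label.parquet
    """
    indices = defaultdict(lambda: defaultdict(list))
    for key in keys:
        parts = key.split("/")
        # The last segment is the file name; drop its extension to get the label.
        label = parts[-1].rsplit(".", 1)[0]
        # Skip dataset_uuid and table (first two segments) and the file name.
        for segment in parts[2:-1]:
            if "=" not in segment:
                continue
            column, value = segment.split("=", 1)
            if label not in indices[column][value]:
                indices[column][value].append(label)
    # Convert the nested defaultdicts to plain dicts before returning.
    return {column: dict(values) for column, values in indices.items()}


keys = [
    "dataset_uuid/table/IndexCol=1/SecondIndexCol=a/part_0.parquet",
    "dataset_uuid/table/IndexCol=1/SecondIndexCol=b/part_1.parquet",
    "dataset_uuid/table/IndexCol=2/SecondIndexCol=a/part_2.parquet",
]
index = partition_indices_from_keys(keys)
```

Because the index is derived from the keys alone, no partition file has to be opened to build it.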
-
property store
-
property