kartothek.core.factory module

class kartothek.core.factory.DatasetFactory(dataset_uuid: str, store_factory: Union[str, simplekv.KeyValueStore, Callable[[], simplekv.KeyValueStore]], load_schema: bool = True, load_all_indices: bool = False, load_dataset_metadata: bool = True)

Bases: kartothek.core.dataset.DatasetMetadataBase

Container holding dataset metadata and caching storage access.
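The signature accepts a store instance or a zero-argument callable returning one (it also accepts a string, which this sketch does not handle). The hypothetical sketch below shows one plausible way a container could resolve such a factory lazily, cache the metadata it reads, and drop that cache in invalidate(). LazyMetadataContainer, the DictStore stand-in and the metadata key format are assumptions for illustration, not kartothek's implementation.

```python
import json
from typing import Any, Callable, Dict, Optional, Union

# Hypothetical sketch, not kartothek's implementation.


class DictStore:
    """Stand-in for a key-value store; only .get is used here."""

    def __init__(self, data: Dict[str, bytes]) -> None:
        self._data = data
        self.get_calls = 0  # counts storage reads, to make caching visible

    def get(self, key: str) -> bytes:
        self.get_calls += 1
        return self._data[key]


StoreOrFactory = Union[DictStore, Callable[[], DictStore]]


class LazyMetadataContainer:
    def __init__(self, dataset_uuid: str, store_factory: StoreOrFactory) -> None:
        self.dataset_uuid = dataset_uuid
        # Accept either a store instance or a zero-argument callable returning one.
        if callable(store_factory):
            self._store_factory = store_factory
        else:
            self._store_factory = lambda: store_factory
        self._store: Optional[DictStore] = None
        self._metadata: Optional[Dict[str, Any]] = None

    @property
    def store(self) -> DictStore:
        # Create the store only once, on first access.
        if self._store is None:
            self._store = self._store_factory()
        return self._store

    @property
    def dataset_metadata(self) -> Dict[str, Any]:
        # Read and parse metadata once; later accesses are served from the cache.
        if self._metadata is None:
            raw = self.store.get(f"{self.dataset_uuid}.metadata.json")  # made-up key
            self._metadata = json.loads(raw)
        return self._metadata

    def invalidate(self) -> None:
        # Drop the cached metadata so the next access reloads it from storage.
        self._metadata = None
```

Repeated reads of dataset_metadata hit the store once; after invalidate() the next read goes back to storage.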

property dataset_metadata
invalidate() → None
load_all_indices(store: Any = None, load_partition_indices: bool = True) → T

Load all registered indices into memory.

Note: External indices need to be preloaded before they can be queried.

Parameters
  • store – Object that implements the .get method for file/object loading.

  • load_partition_indices – Whether filename-encoded (partition) indices should also be loaded. Default is True.

Returns

dataset_metadata – Mutated metadata object with the loaded indices.

Return type

DatasetMetadata
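To illustrate the contract described above (a store that only needs a .get method, and a return value that is the same object after mutation), here is a hypothetical sketch. IndexedMetadata, the JSON index files and their keys are assumptions; partition indices are passed in already decoded rather than parsed from filenames.

```python
import json
from typing import Dict, List, Mapping

# Hypothetical sketch, not kartothek's implementation.
IndexDict = Dict[str, List[str]]  # index value -> partition labels


class IndexedMetadata:
    """Metadata object whose external indices start out unloaded."""

    def __init__(self, index_keys: Mapping[str, str],
                 partition_indices: Dict[str, IndexDict]) -> None:
        self.index_keys = dict(index_keys)  # column -> storage key of its index file
        self._partition_indices = partition_indices  # assumed already decoded from filenames
        self.indices: Dict[str, IndexDict] = {}  # only loaded indices live here

    def load_all_indices(self, store, load_partition_indices: bool = True) -> "IndexedMetadata":
        # `store` only needs a .get(key) method, so a plain dict of bytes works too.
        for column, key in self.index_keys.items():
            self.indices[column] = json.loads(store.get(key))
        if load_partition_indices:
            self.indices.update(self._partition_indices)
        return self  # mutated in place and returned, so calls can be chained


# Usage with a dict standing in for the store (keys are made up).
store = {"ds/indices/color.json": b'{"red": ["p1"], "blue": ["p2"]}'}
meta = IndexedMetadata({"color": "ds/indices/color.json"}, {"year": {"2020": ["p1", "p2"]}})
meta.load_all_indices(store)
```

Passing load_partition_indices=False in this sketch leaves only the externally stored indices in memory.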

load_index(column, store=None) → T

Load an index into memory.

Note: External indices need to be preloaded before they can be queried.

Parameters
  • column – Name of the column for which the index should be loaded.

  • store – Object that implements the .get method for file/object loading.

Returns

dataset_metadata – Mutated metadata object with the loaded index.

Return type

DatasetMetadata
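The preload requirement in the note can be sketched as follows. LazyIndexes, its query method, the storage keys, and the choice to raise RuntimeError for an unloaded index are all hypothetical; kartothek's own behavior may differ.

```python
import json
from typing import Dict, List

# Hypothetical sketch, not kartothek's implementation.


class LazyIndexes:
    """Holds external indices that must be loaded before they can be queried."""

    def __init__(self, index_keys: Dict[str, str]) -> None:
        self.index_keys = index_keys  # column -> storage key (made-up layout)
        self.loaded: Dict[str, Dict[str, List[str]]] = {}

    def load_index(self, column: str, store) -> "LazyIndexes":
        # Fetch only this column's index via store.get and keep it in memory.
        self.loaded[column] = json.loads(store.get(self.index_keys[column]))
        return self  # return the mutated object to allow chaining

    def query(self, column: str, value: str) -> List[str]:
        # In this sketch, querying an index that was never loaded is an error.
        if column not in self.loaded:
            raise RuntimeError(f"index for {column!r} has not been loaded")
        return self.loaded[column].get(value, [])


store = {"ds/indices/color.json": b'{"red": ["p1", "p3"]}'}
indexes = LazyIndexes({"color": "ds/indices/color.json"})
labels = indexes.load_index("color", store).query("color", "red")
```

Loading one column at a time keeps memory bounded to the indices a query actually needs.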

load_partition_indices() → T

Load all filename-encoded indices into RAM. Filename-encoded indices can be extracted from datasets whose partitions are stored in a format like

`dataset_uuid/table/IndexCol=IndexValue/SecondIndexCol=Value/partition_label.parquet`

which results in an in-memory index holding the following information:

{
    "IndexCol": {
        IndexValue: ["partition_label"]
    },
    "SecondIndexCol": {
        Value: ["partition_label"]
    }
}
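A hypothetical helper that derives this structure from partition paths in the format above. partition_indices_from_paths is not a kartothek function; it keeps values as raw strings and does not URL-decode or type-cast them.

```python
from collections import defaultdict
from typing import Dict, Iterable, List


def partition_indices_from_paths(paths: Iterable[str]) -> Dict[str, Dict[str, List[str]]]:
    """Hypothetical helper (not part of kartothek): build {column: {value: [labels]}}."""
    indices = defaultdict(lambda: defaultdict(list))
    for path in paths:
        *directories, filename = path.split("/")
        # "partition_label.parquet" -> "partition_label"
        label = filename.rsplit(".", 1)[0]
        for part in directories:
            if "=" not in part:
                continue  # dataset_uuid and table components carry no index info
            column, value = part.split("=", 1)
            indices[column][value].append(label)  # values stay raw strings
    # Convert the nested defaultdicts into plain dicts.
    return {column: dict(values) for column, values in indices.items()}


index = partition_indices_from_paths(
    ["dataset_uuid/table/IndexCol=IndexValue/SecondIndexCol=Value/partition_label.parquet"]
)
```

With several partitions sharing a value, each label is appended to that value's list, so one index entry can point to many partitions.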

Deprecated since version 5.3: The load_partition_indices keyword is deprecated and will be removed in 6.0.

property store