kartothek.api.discover module

Tooling to quickly discover datasets in a given blob store.

kartothek.api.discover.discover_cube(uuid_prefix: str, store: Union[str, simplekv.KeyValueStore, Callable[[], simplekv.KeyValueStore]], filter_ktk_cube_dataset_ids: Optional[Union[Iterable[str], str]] = None) → Tuple[kartothek.core.cube.cube.Cube, Dict[str, kartothek.core.dataset.DatasetMetadata]]

Recover cube information from store.

Parameters
  • uuid_prefix – Dataset UUID prefix.

  • store – Key-value store: a string, a simplekv.KeyValueStore, or a zero-argument callable returning one.

  • filter_ktk_cube_dataset_ids – Optional selection of datasets to include.

Returns

  • cube (Cube) – Cube specification.

  • datasets (Dict[str, DatasetMetadata]) – All discovered datasets.
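
The store annotation in the signature admits three forms: a string, a store object, or a zero-argument callable that returns a store. The following is a minimal sketch of how caller-side code might normalize the last two forms. DictStore and resolve_store are hypothetical names used only for illustration and are not part of kartothek or simplekv; treating strings as store URLs is an assumption, and resolving them is left out.

```python
from typing import Callable, Dict, Union


class DictStore:
    """Hypothetical stand-in for a simplekv.KeyValueStore (illustration only)."""

    def __init__(self) -> None:
        self._data: Dict[str, bytes] = {}

    def put(self, key: str, value: bytes) -> str:
        self._data[key] = value
        return key

    def get(self, key: str) -> bytes:
        return self._data[key]


StoreInput = Union[str, DictStore, Callable[[], DictStore]]


def resolve_store(store: StoreInput) -> DictStore:
    """Hypothetical helper: turn a store or a store factory into a store."""
    if isinstance(store, str):
        # Assumption: strings are store URLs; resolving them is out of scope here.
        raise NotImplementedError("store URLs are not handled in this sketch")
    if callable(store):
        # A zero-argument factory defers store creation until it is needed.
        return store()
    return store


shared = DictStore()
shared.put("k", b"v")
print(resolve_store(shared) is shared)
print(resolve_store(lambda: shared).get("k"))
```

Passing a factory rather than a store instance can be useful when the store object itself cannot be shared across processes, since each caller then builds its own instance on demand.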

kartothek.api.discover.discover_datasets(cube: kartothek.core.cube.cube.Cube, store: Union[str, simplekv.KeyValueStore, Callable[[], simplekv.KeyValueStore]], filter_ktk_cube_dataset_ids: Optional[Union[Iterable[str], str]] = None) → Dict[str, kartothek.core.dataset.DatasetMetadata]

Get all known datasets that belong to a given cube.

Parameters
  • cube – Cube specification.

  • store – Key-value store: a string, a simplekv.KeyValueStore, or a zero-argument callable returning one.

  • filter_ktk_cube_dataset_ids – Optional selection of datasets to include.

Returns

datasets – All discovered datasets.

Return type

Dict[str, DatasetMetadata]

Raises

ValueError – In case no valid cube could be discovered.

kartothek.api.discover.discover_datasets_unchecked(uuid_prefix: str, store: Union[str, simplekv.KeyValueStore, Callable[[], simplekv.KeyValueStore]], filter_ktk_cube_dataset_ids: Optional[Union[Iterable[str], str]] = None) → Dict[str, kartothek.core.dataset.DatasetMetadata]

Get all known datasets that may belong to a given cube, without applying any checks.

Warning

The results are not checked for validity. Found datasets may be incompatible with the given cube. Use check_datasets() to check the results, or use discover_datasets() in the first place.

Parameters
  • uuid_prefix – Dataset UUID prefix.

  • store – Key-value store: a string, a simplekv.KeyValueStore, or a zero-argument callable returning one.

  • filter_ktk_cube_dataset_ids – Optional selection of datasets to include.

Returns

datasets – All discovered datasets. An empty dict if no dataset is found.

Return type

Dict[str, DatasetMetadata]
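
The filter_ktk_cube_dataset_ids annotation accepts None, a single string, or an iterable of strings. A minimal sketch of how such a filter could be normalized and applied to a mapping of discovered datasets is shown below. normalize_filter and apply_filter are hypothetical helpers, not kartothek functions, and silently ignoring requested ids that were not found is an assumption of this sketch rather than documented behavior.

```python
from typing import Dict, Iterable, Optional, Set, Union

FilterInput = Optional[Union[Iterable[str], str]]


def normalize_filter(filter_ids: FilterInput) -> Optional[Set[str]]:
    """Hypothetical helper: None means 'no filter'; a bare string means one id."""
    if filter_ids is None:
        return None
    if isinstance(filter_ids, str):
        # Guard against iterating a string character by character.
        return {filter_ids}
    return set(filter_ids)


def apply_filter(datasets: Dict[str, object], filter_ids: FilterInput) -> Dict[str, object]:
    """Hypothetical helper: keep only the datasets whose id is selected."""
    wanted = normalize_filter(filter_ids)
    if wanted is None:
        return dict(datasets)
    # Assumption: requested ids that were not discovered are simply skipped.
    return {name: ds for name, ds in datasets.items() if name in wanted}


# Placeholder values stand in for DatasetMetadata objects.
found = {"seed": "meta-a", "enrich": "meta-b"}
print(apply_filter(found, "seed"))
print(apply_filter(found, ["enrich", "missing"]))
print(apply_filter(found, None))
```

The explicit string check matters because a str is itself an iterable of characters; without it, a filter of "seed" would be treated as the four ids "s", "e", "e", "d".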

kartothek.api.discover.discover_ktk_cube_dataset_ids(uuid_prefix: str, store: Union[str, simplekv.KeyValueStore, Callable[[], simplekv.KeyValueStore]]) → Set[str]

Get ktk_cube dataset ids for all datasets.

Parameters
  • uuid_prefix – Dataset UUID prefix.

  • store – Key-value store: a string, a simplekv.KeyValueStore, or a zero-argument callable returning one.

Returns

names – The ktk_cube dataset ids found under the given UUID prefix.

Return type

Set[str]
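
To illustrate how dataset ids can be derived from a UUID prefix, here is a minimal sketch that scans a list of store keys. It assumes that each dataset UUID has the form `<uuid_prefix>++<ktk_cube_dataset_id>` and that every dataset has a metadata key ending in `.by-dataset-metadata.json`; neither convention is stated in the text above. ids_from_keys is a hypothetical helper, not the library's implementation.

```python
from typing import Iterable, Set

# Assumptions for this sketch, not confirmed by the text above.
UUID_SEPARATOR = "++"
METADATA_SUFFIX = ".by-dataset-metadata.json"


def ids_from_keys(uuid_prefix: str, keys: Iterable[str]) -> Set[str]:
    """Hypothetical helper: collect dataset ids from metadata keys under a prefix."""
    head = uuid_prefix + UUID_SEPARATOR
    ids: Set[str] = set()
    for key in keys:
        # Only metadata keys that start with "<uuid_prefix>++" are considered.
        if key.startswith(head) and key.endswith(METADATA_SUFFIX):
            ids.add(key[len(head):-len(METADATA_SUFFIX)])
    return ids


keys = [
    "my_cube++seed.by-dataset-metadata.json",
    "my_cube++enrich.by-dataset-metadata.json",
    "my_cube++seed/table/part.parquet",
    "other_cube++seed.by-dataset-metadata.json",
]
print(sorted(ids_from_keys("my_cube", keys)))
```

Matching on the prefix plus separator, rather than on the bare prefix, keeps a cube named my_cube from picking up datasets of a cube named my_cube_v2.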