kartothek.api.discover module
Tooling to quickly discover datasets in a given blob store.
-
kartothek.api.discover.discover_cube(uuid_prefix: str, store: Union[str, simplekv.KeyValueStore, Callable[[], simplekv.KeyValueStore]], filter_ktk_cube_dataset_ids: Optional[Union[Iterable[str], str]] = None) → Tuple[kartothek.core.cube.cube.Cube, Dict[str, kartothek.core.dataset.DatasetMetadata]]
Recover cube information from store.
- Parameters
uuid_prefix – Dataset UUID prefix.
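store – KV store, given as a string, a simplekv.KeyValueStore instance, or a zero-argument callable that returns one.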
filter_ktk_cube_dataset_ids – Optional selection of datasets to include.
- Returns
cube (Cube) – Cube specification.
datasets (Dict[str, DatasetMetadata]) – All discovered datasets.
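A minimal usage sketch, assuming only what the signature above states: discover_cube takes a UUID prefix and a store and returns a (cube, datasets) tuple whose second element is keyed by ktk_cube dataset id. The helper name list_cube_datasets and its discover parameter are hypothetical; the parameter exists only so the helper can be tried with a stand-in function when no store is at hand.

```python
from typing import Any, Callable, Dict, List, Optional, Tuple


def list_cube_datasets(
    uuid_prefix: str,
    store: Any,
    discover: Optional[Callable[[str, Any], Tuple[Any, Dict[str, Any]]]] = None,
) -> List[str]:
    """Return the sorted ktk_cube dataset ids found for ``uuid_prefix``.

    Hypothetical helper, not part of kartothek.
    """
    if discover is None:
        # Imported lazily so the helper can be exercised without kartothek installed.
        from kartothek.api.discover import discover_cube as discover

    # discover_cube returns a (cube, datasets) tuple; the datasets dict is
    # keyed by ktk_cube dataset id, so sorting the dict yields the sorted ids.
    _cube, datasets = discover(uuid_prefix, store)
    return sorted(datasets)
```

With kartothek installed, calling list_cube_datasets("my_cube", store) would delegate to discover_cube; "my_cube" is a placeholder prefix.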
-
kartothek.api.discover.discover_datasets(cube: kartothek.core.cube.cube.Cube, store: Union[str, simplekv.KeyValueStore, Callable[[], simplekv.KeyValueStore]], filter_ktk_cube_dataset_ids: Optional[Union[Iterable[str], str]] = None) → Dict[str, kartothek.core.dataset.DatasetMetadata]
Get all known datasets that belong to a given cube.
- Parameters
cube – Cube specification.
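store – KV store, given as a string, a simplekv.KeyValueStore instance, or a zero-argument callable that returns one.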
filter_ktk_cube_dataset_ids – Optional selection of datasets to include.
- Returns
datasets – All discovered datasets.
- Return type
Dict[str, DatasetMetadata]
- Raises
ValueError – In case no valid cube could be discovered.
-
kartothek.api.discover.discover_datasets_unchecked(uuid_prefix: str, store: Union[str, simplekv.KeyValueStore, Callable[[], simplekv.KeyValueStore]], filter_ktk_cube_dataset_ids: Optional[Union[Iterable[str], str]] = None) → Dict[str, kartothek.core.dataset.DatasetMetadata]
Get all known datasets that may belong to a given cube without applying any checks.
Warning
The results are not checked for validity. Found datasets may be incompatible with the given cube. Use check_datasets() to check the results, or use discover_datasets() in the first place.
- Parameters
uuid_prefix – Dataset UUID prefix.
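store – KV store, given as a string, a simplekv.KeyValueStore instance, or a zero-argument callable that returns one.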
filter_ktk_cube_dataset_ids – Optional selection of datasets to include.
- Returns
datasets – All discovered datasets; an empty dict if no dataset is found.
- Return type
Dict[str, DatasetMetadata]
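The filter_ktk_cube_dataset_ids parameter accepts None, a single string, or an iterable of strings. The sketch below illustrates one plausible way to interpret those three shapes, assuming that a bare string means "exactly this one dataset id" and None means "no filtering". Both function names are hypothetical and this is not taken from kartothek's implementation.

```python
from typing import Any, Dict, Iterable, Optional, Set, Union


def normalize_filter(
    value: Optional[Union[Iterable[str], str]]
) -> Optional[Set[str]]:
    """Turn the accepted filter shapes into a set, or None for "no filter".

    Hypothetical helper for illustration only.
    """
    if value is None:
        return None
    if isinstance(value, str):
        # A str is itself iterable; treat it as one id, not as characters.
        return {value}
    return set(value)


def apply_filter(
    found: Dict[str, Any], value: Optional[Union[Iterable[str], str]]
) -> Dict[str, Any]:
    """Keep only the entries of ``found`` whose key passes the filter."""
    wanted = normalize_filter(value)
    if wanted is None:
        return dict(found)
    return {key: meta for key, meta in found.items() if key in wanted}
```

The isinstance check comes first because iterating a plain string would otherwise split it into single characters.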
-
kartothek.api.discover.discover_ktk_cube_dataset_ids(uuid_prefix: str, store: Union[str, simplekv.KeyValueStore, Callable[[], simplekv.KeyValueStore]]) → Set[str]
Get ktk_cube dataset ids for all datasets.
- Parameters
uuid_prefix – Dataset UUID prefix.
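store – KV store, given as a string, a simplekv.KeyValueStore instance, or a zero-argument callable that returns one.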
- Returns
names – The ktk_cube dataset ids.
- Return type
Set[str]
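To make the idea of prefix-based discovery concrete, here is a plain-Python sketch that extracts dataset ids from a list of store keys. It rests on an assumed key layout, which should be checked against the kartothek source: that a dataset UUID is the prefix and the ktk_cube dataset id joined by "++", and that metadata keys end in ".by-dataset-metadata.json" or ".by-dataset-metadata.msgpack.zstd". The function and constant names are hypothetical.

```python
from typing import Iterable, Set

# Assumed naming conventions; verify against the kartothek source before relying on them.
UUID_SEPARATOR = "++"
METADATA_SUFFIXES = (
    ".by-dataset-metadata.json",
    ".by-dataset-metadata.msgpack.zstd",
)


def dataset_ids_from_keys(uuid_prefix: str, keys: Iterable[str]) -> Set[str]:
    """Collect the dataset ids whose metadata key sits under ``uuid_prefix``."""
    start = uuid_prefix + UUID_SEPARATOR
    ids: Set[str] = set()
    for key in keys:
        if not key.startswith(start):
            continue  # belongs to another cube or is unrelated
        for suffix in METADATA_SUFFIXES:
            if key.endswith(suffix):
                # Strip "<prefix>++" from the front and the suffix from the end.
                ids.add(key[len(start):-len(suffix)])
                break
    return ids
```

Keys that share the prefix but are not metadata files, such as data files of a dataset, are skipped because they match none of the suffixes.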