kartothek.io_components.cube.copy module
kartothek.io_components.cube.copy.get_copy_keys(cube: kartothek.core.cube.cube.Cube, src_store: Union[Callable[[], simplekv.KeyValueStore], simplekv.KeyValueStore], tgt_store: Union[Callable[[], simplekv.KeyValueStore], simplekv.KeyValueStore], overwrite: bool, datasets: Optional[Union[Iterable[str], Dict[str, kartothek.core.dataset.DatasetMetadata]]] = None)
Get and check keys that should be copied from one store to another.
- Parameters
cube – Cube specification.
src_store – Source KV store.
tgt_store – Target KV store.
overwrite – Whether datasets that already exist in the target store should be overwritten.
datasets – Datasets to copy; all must be part of the cube. May be the result of discover_datasets(), an iterable of Ktk_cube dataset IDs, or None (in which case the entire cube is copied).
- Returns
keys – Set of keys to copy.
- Return type
Set[str]
- Raises
RuntimeError – If the copy would not succeed or if there is no cube in src_store.
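The checks described above can be illustrated with a minimal, self-contained sketch. It does not use kartothek and is not its implementation: the function name sketch_copy_keys and the plain dictionaries standing in for the source and target stores (dataset ID mapped to the keys stored for it) are assumptions made purely for illustration. It only mirrors the documented behaviour, namely a RuntimeError when the source holds no cube or when the copy would clobber existing datasets while overwrite is False.

```python
from typing import Dict, Set


def sketch_copy_keys(
    src: Dict[str, Set[str]],
    tgt: Dict[str, Set[str]],
    overwrite: bool,
) -> Set[str]:
    """Hypothetical stand-in for the pre-copy check.

    src and tgt map a dataset ID to the keys stored for that dataset.
    """
    # No datasets in the source means there is no cube to copy.
    if not src:
        raise RuntimeError("no cube found in source store")

    # Without overwrite, any dataset already present in the target is an error.
    if not overwrite:
        clashes = sorted(set(src) & set(tgt))
        if clashes:
            raise RuntimeError(
                "datasets already exist in target store: {}".format(clashes)
            )

    # Collect the keys of every dataset that will be copied.
    keys: Set[str] = set()
    for dataset_keys in src.values():
        keys |= dataset_keys
    return keys
```

Running the check before any key is written means a failing copy is rejected up front, rather than leaving the target store partially populated.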
kartothek.io_components.cube.copy.get_datasets_to_copy(cube: kartothek.core.cube.cube.Cube, src_store: Union[Callable[[], simplekv.KeyValueStore], simplekv.KeyValueStore], tgt_store: Union[Callable[[], simplekv.KeyValueStore], simplekv.KeyValueStore], overwrite: bool, datasets: Optional[Union[Iterable[str], Dict[str, kartothek.core.dataset.DatasetMetadata]]] = None) → Dict[str, kartothek.core.dataset.DatasetMetadata]
Determine all dataset names of a given cube that should be copied and apply additional consistency checks. To copy only a specific set of datasets, pass their names via the datasets parameter.
- Parameters
cube – Cube specification.
src_store – Source KV store.
tgt_store – Target KV store.
overwrite – Whether datasets that already exist in the target store should be overwritten.
datasets – Datasets to copy; all must be part of the cube. May be the result of discover_datasets(), an iterable of Ktk_cube dataset IDs, or None (in which case the entire cube is copied).
- Returns
all_datasets – All datasets that should be copied.
- Return type
Dict[str, DatasetMetadata]
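The three accepted forms of the datasets parameter can be illustrated with a small sketch. It is independent of kartothek: sketch_select and the available mapping (dataset ID mapped to some metadata object) are hypothetical names, and raising RuntimeError for unknown dataset IDs is an assumption based on the requirement that all datasets be part of the cube.

```python
from typing import Dict, Iterable, Optional, Union


def sketch_select(
    available: Dict[str, object],
    datasets: Optional[Union[Iterable[str], Dict[str, object]]] = None,
) -> Dict[str, object]:
    """Hypothetical normalisation of the ``datasets`` argument."""
    # None selects the entire cube.
    if datasets is None:
        return dict(available)

    # A mapping (e.g. a previously discovered result) is taken as given.
    if isinstance(datasets, dict):
        return dict(datasets)

    # Otherwise treat the argument as an iterable of dataset IDs.
    wanted = set(datasets)
    unknown = sorted(wanted - set(available))
    if unknown:
        # Assumption: requesting IDs outside the cube is rejected.
        raise RuntimeError("datasets not part of the cube: {}".format(unknown))
    return {name: meta for name, meta in available.items() if name in wanted}
```

For example, sketch_select(available, ["seed"]) keeps only the entry for "seed", while sketch_select(available) returns a copy of everything in available.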