kartothek.io_components.cube.copy module

kartothek.io_components.cube.copy.get_copy_keys(cube: kartothek.core.cube.cube.Cube, src_store: Union[Callable[[], simplekv.KeyValueStore], simplekv.KeyValueStore], tgt_store: Union[Callable[[], simplekv.KeyValueStore], simplekv.KeyValueStore], overwrite: bool, datasets: Optional[Union[Iterable[str], Dict[str, kartothek.core.dataset.DatasetMetadata]]] = None)

Get and check keys that should be copied from one store to another.

Parameters
  • cube – Cube specification.

  • src_store – Source KV store.

  • tgt_store – Target KV store.

  • overwrite – Whether datasets that may already exist in the target store should be overwritten.

  • datasets – Datasets to copy; all must be part of the cube. May be either the result of discover_datasets(), an iterable of Ktk_cube dataset IDs, or None (in which case the entire cube will be copied).

Returns

keys – Set of keys to copy.

Return type

Set[str]

Raises

RuntimeError – If the copy would not succeed, or if there is no cube in src_store.
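The key selection and the overwrite check can be pictured with the minimal sketch below. It is not kartothek's implementation: plain dicts stand in for simplekv stores, and resolve_store and sketch_copy_keys are hypothetical helpers. It assumes cube datasets are stored under the UUID "<uuid_prefix>++<dataset_id>", with metadata keys starting with "<uuid>." and data keys with "<uuid>/". It also shows why src_store and tgt_store accept either a store or a zero-argument factory returning one.

```python
from typing import Callable, Dict, Iterable, Set, Union

# Hypothetical stand-in for a simplekv store: a dict mapping key -> bytes.
Store = Dict[str, bytes]
StoreOrFactory = Union[Callable[[], Store], Store]


def resolve_store(store: StoreOrFactory) -> Store:
    """Call the factory if a callable was passed, else return the store as-is."""
    return store() if callable(store) else store


def _dataset_keys(store: Store, ds_uuid: str) -> Set[str]:
    """Keys belonging to one dataset (assumed '<uuid>.' / '<uuid>/' layout)."""
    return {k for k in store if k.startswith(ds_uuid + ".") or k.startswith(ds_uuid + "/")}


def sketch_copy_keys(
    uuid_prefix: str,
    dataset_ids: Iterable[str],
    src_store: StoreOrFactory,
    tgt_store: StoreOrFactory,
    overwrite: bool,
) -> Set[str]:
    src = resolve_store(src_store)
    tgt = resolve_store(tgt_store)
    keys: Set[str] = set()
    for ds_id in dataset_ids:
        # Assumption: cube dataset UUIDs follow "<uuid_prefix>++<dataset_id>".
        ds_uuid = f"{uuid_prefix}++{ds_id}"
        ds_keys = _dataset_keys(src, ds_uuid)
        if not ds_keys:
            raise RuntimeError(f"Dataset {ds_uuid} not found in source store")
        # Refuse to clobber an existing dataset unless overwrite=True.
        if not overwrite and _dataset_keys(tgt, ds_uuid):
            raise RuntimeError(f"Dataset {ds_uuid} already exists in target store")
        keys |= ds_keys
    return keys
```

Passing a factory such as `lambda: tgt` defers store creation until the function actually needs it, which is the design choice the Union[Callable[[], KeyValueStore], KeyValueStore] annotation expresses.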

kartothek.io_components.cube.copy.get_datasets_to_copy(cube: kartothek.core.cube.cube.Cube, src_store: Union[Callable[[], simplekv.KeyValueStore], simplekv.KeyValueStore], tgt_store: Union[Callable[[], simplekv.KeyValueStore], simplekv.KeyValueStore], overwrite: bool, datasets: Optional[Union[Iterable[str], Dict[str, kartothek.core.dataset.DatasetMetadata]]] = None) → Dict[str, kartothek.core.dataset.DatasetMetadata]

Determine all datasets of a given cube that should be copied, and apply additional consistency checks. To copy only a specific set of datasets, pass a list of dataset names via the datasets parameter.

Parameters
  • cube – Cube specification.

  • src_store – Source KV store.

  • tgt_store – Target KV store.

  • overwrite – Whether datasets that may already exist in the target store should be overwritten.

  • datasets – Datasets to copy; all must be part of the cube. May be either the result of discover_datasets(), an iterable of Ktk_cube dataset IDs, or None (in which case the entire cube will be copied).

Returns

all_datasets – All datasets that should be copied.

Return type

Dict[str, DatasetMetadata]
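How the three accepted forms of datasets might be normalized into one mapping is sketched below. This is an illustration under stated assumptions, not kartothek's code. sketch_datasets_to_copy is a hypothetical helper, and plain objects stand in for DatasetMetadata. The discovered argument plays the role of the datasets found in src_store, and the only consistency check modeled is that every selected dataset belongs to the cube.

```python
import collections.abc
from typing import Dict, Iterable, Mapping, Optional, Union


def sketch_datasets_to_copy(
    cube_dataset_ids: Iterable[str],
    discovered: Mapping[str, object],  # stand-in for Dict[str, DatasetMetadata] found in src_store
    datasets: Optional[Union[Iterable[str], Mapping[str, object]]] = None,
) -> Dict[str, object]:
    cube_ids = set(cube_dataset_ids)
    if datasets is None:
        # None: copy every dataset discovered for the cube.
        selected = dict(discovered)
    elif isinstance(datasets, collections.abc.Mapping):
        # Already a discover_datasets()-style mapping: use it as-is.
        selected = dict(datasets)
    else:
        # Iterable of Ktk_cube dataset IDs: look each one up.
        ids = set(datasets)
        missing = ids - set(discovered)
        if missing:
            raise RuntimeError(f"Datasets not found in source store: {sorted(missing)}")
        selected = {ds_id: discovered[ds_id] for ds_id in ids}
    # Consistency check: everything selected must be part of the cube.
    unknown = set(selected) - cube_ids
    if unknown:
        raise RuntimeError(f"Datasets are not part of the cube: {sorted(unknown)}")
    return selected
```

Normalizing to a single mapping first keeps the later checks (and the key collection shown for get_copy_keys) independent of which form the caller used.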