kartothek.cli package
Module contents
Kartothek CLI code.
Important
This module does not contain any public APIs.
Kartothek comes with a CLI tool named kartothek_cube. To use it, create a YAML file that contains a dictionary of storefact
stores: the keys are the store names and the values are dicts with the corresponding store configuration. By default,
Kartothek reads a file called skv.yml and uses the store called dataset; pass --skv and --store to change these. An
example file could look like this:
dataset:
  type: hazure
  account_name: my_account_name
  account_key: foobar
  container: my_container
  use_sas: False
  create_if_missing: False
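To illustrate the expected structure, the sketch below shows the mapping that yaml.safe_load would return for an skv.yml holding the store above plus a hypothetical second store named backup (for example, as a target for copy --tgt_store, assuming that option names another entry in the same file). select_store_config is a hypothetical helper that mimics what --store selects. A real setup would load the file with PyYAML and hand the chosen dict to storefact, e.g. storefact.get_store(**cfg); both steps are left out so the sketch runs with the standard library only.

```python
from typing import Any, Dict

# What yaml.safe_load() would return for an skv.yml with two stores.
# "backup" is a hypothetical second store, e.g. for copy --tgt_store.
SKV_CONFIG: Dict[str, Dict[str, Any]] = {
    "dataset": {
        "type": "hazure",
        "account_name": "my_account_name",
        "account_key": "foobar",
        "container": "my_container",
        "use_sas": False,
        "create_if_missing": False,
    },
    "backup": {
        "type": "hazure",
        "account_name": "my_account_name",
        "account_key": "foobar",
        "container": "my_backup_container",
        "use_sas": False,
        "create_if_missing": False,
    },
}


def select_store_config(config: Dict[str, Dict[str, Any]], store: str = "dataset") -> Dict[str, Any]:
    """Return the config dict of one named store (hypothetical helper)."""
    if store not in config:
        raise KeyError(f"store {store!r} not found; available stores: {sorted(config)}")
    # Return a copy so callers cannot mutate the shared mapping.
    return dict(config[store])


print(select_store_config(SKV_CONFIG)["container"])            # my_container
print(select_store_config(SKV_CONFIG, "backup")["container"])  # my_backup_container
```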
The CLI uses Dask to parallelize some operations. By default, it uses one thread per CPU core; you can set the number of
threads with -j (--n_threads).
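A minimal sketch of what -j 0 means, assuming 0 is resolved to the CPU count reported by the operating system. The CLI itself uses Dask; here the standard library's ThreadPoolExecutor stands in for Dask's threaded scheduler, and resolve_n_threads and run_parallel are hypothetical helpers, not part of kartothek.

```python
import os
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Iterable, List, TypeVar

T = TypeVar("T")
R = TypeVar("R")


def resolve_n_threads(n_threads: int = 0) -> int:
    """Map a -j/--n_threads value to a concrete thread count (0 = all cores)."""
    if n_threads < 0:
        raise ValueError("n_threads must be >= 0")
    # os.cpu_count() may return None if the count cannot be determined.
    return n_threads or (os.cpu_count() or 1)


def run_parallel(func: Callable[[T], R], items: Iterable[T], n_threads: int = 0) -> List[R]:
    """Apply func to each item on a thread pool (stand-in for Dask, hypothetical)."""
    with ThreadPoolExecutor(max_workers=resolve_n_threads(n_threads)) as pool:
        # pool.map preserves input order in its results.
        return list(pool.map(func, items))


print(resolve_n_threads(4))                                 # 4
print(run_parallel(len, ["a", "bb", "ccc"], n_threads=2))   # [1, 2, 3]
```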
The following section describes all kartothek_cube operations.
kartothek_cube
Execute certain operations on the given Kartothek cube.
If possible, the operations will be performed in parallel on the current machine.
kartothek_cube [OPTIONS] CUBE COMMAND [ARGS]...
Options
- --skv <skv>: Storefact config file. Default: skv.yml
- --store <store>: Store to use. Default: dataset
- -j, --n_threads <n_threads>: Number of threads to use (use 0 for number of cores). Default: 0
- --color <color>: Whether to use colorized output. One of always | auto | off. Default: auto
Arguments
- CUBE: Required argument.
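A sketch of how the three --color values could be interpreted. It assumes the common convention that auto colorizes only when the output stream is a terminal; use_color is a hypothetical helper, not kartothek's implementation.

```python
import io
import sys
from typing import TextIO


def use_color(mode: str = "auto", stream: TextIO = sys.stdout) -> bool:
    """Decide whether to emit ANSI colors for a --color value (hypothetical helper)."""
    if mode == "always":
        return True
    if mode == "off":
        return False
    if mode == "auto":
        # Assumed convention: colorize only when writing to a terminal.
        isatty = getattr(stream, "isatty", None)
        return bool(isatty and isatty())
    raise ValueError(f"invalid --color value {mode!r}; expected always, auto or off")


print(use_color("always"))               # True
print(use_color("auto", io.StringIO()))  # False, a StringIO is not a terminal
```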
copy
Copy a cube from one store to another.
kartothek_cube CUBE copy [OPTIONS]
Options
- --tgt_store <tgt_store>: Required. Target store to use.
- --overwrite, --no-overwrite: Whether a cube already present in tgt_store is overwritten. If --no-overwrite is given (the default) and the cube is already present, the operation fails. Default: False
- --cleanup, --no-cleanup: For an overwrite operation, whether the cube in tgt_store is removed first, so that no previously tracked files remain after the copy. Default: True
- --include <include>: Comma-separated list of dataset IDs to copy, e.g. --include enrich,enrich_cl. Glob patterns are also supported.
- --exclude <exclude>: Copy all datasets except those in this comma-separated list, e.g. --exclude enrich,enrich_cl. Glob patterns are also supported.
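The interplay of --overwrite and --cleanup can be modelled with plain dicts standing in for storefact stores. This is a toy sketch: copy_cube and the my_cube/ key prefix are hypothetical and do not reflect how kartothek lays out cube files; only the decision logic follows the option descriptions above.

```python
from typing import Dict, List

Store = Dict[str, bytes]  # hypothetical stand-in for a store: key -> blob


def copy_cube(src: Store, tgt: Store, prefix: str, overwrite: bool = False, cleanup: bool = True) -> None:
    """Toy model of copy's --overwrite/--cleanup semantics (hypothetical)."""
    existing = [key for key in tgt if key.startswith(prefix)]
    if existing and not overwrite:
        # --no-overwrite (default): refuse to touch a cube that already exists.
        raise RuntimeError(f"cube {prefix!r} already present in target store")
    if existing and cleanup:
        # --cleanup (default): drop the old cube first so no stale files survive.
        for key in existing:
            del tgt[key]
    for key, value in src.items():
        if key.startswith(prefix):
            tgt[key] = value


def demo(cleanup: bool) -> List[str]:
    tgt = {"my_cube/part-0": b"old"}
    copy_cube({"my_cube/part-1": b"new"}, tgt, "my_cube/", overwrite=True, cleanup=cleanup)
    return sorted(tgt)


print(demo(cleanup=True))   # ['my_cube/part-1']
print(demo(cleanup=False))  # ['my_cube/part-0', 'my_cube/part-1']
```

With --no-cleanup, the file left over from the previous cube (part-0 here) survives next to the newly copied one, which is what the default --cleanup avoids.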
delete
Delete a cube from a store.
kartothek_cube CUBE delete [OPTIONS]
Options
- --include <include>: Comma-separated list of dataset IDs to delete, e.g. --include enrich,enrich_cl. Glob patterns are also supported.
- --exclude <exclude>: Delete all datasets except those in this comma-separated list, e.g. --exclude enrich,enrich_cl. Glob patterns are also supported.
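copy, delete and stats all accept --include and --exclude. The sketch below shows one plausible reading of these options: split the value on commas and match each dataset ID against the resulting glob patterns with Python's fnmatch. select_datasets and parse_patterns are hypothetical helpers; kartothek's own filtering may differ, for example in how it reports patterns that match nothing.

```python
from fnmatch import fnmatchcase
from typing import Iterable, List, Optional


def parse_patterns(option: Optional[str]) -> List[str]:
    """Split a comma-separated option value such as 'enrich,enrich_cl'."""
    if not option:
        return []
    return [p.strip() for p in option.split(",") if p.strip()]


def select_datasets(
    dataset_ids: Iterable[str],
    include: Optional[str] = None,
    exclude: Optional[str] = None,
) -> List[str]:
    """Filter dataset IDs by --include/--exclude glob lists (hypothetical helper)."""
    include_patterns = parse_patterns(include)
    exclude_patterns = parse_patterns(exclude)
    selected = []
    for ds in sorted(dataset_ids):
        # Without --include, every dataset is a candidate.
        if include_patterns and not any(fnmatchcase(ds, p) for p in include_patterns):
            continue
        # --exclude drops anything matching one of its patterns.
        if any(fnmatchcase(ds, p) for p in exclude_patterns):
            continue
        selected.append(ds)
    return selected


ids = ["seed", "enrich", "enrich_cl", "enrich_dense"]
print(select_datasets(ids, include="enrich*"))            # ['enrich', 'enrich_cl', 'enrich_dense']
print(select_datasets(ids, exclude="enrich,enrich_cl"))   # ['enrich_dense', 'seed']
```

fnmatchcase is used instead of fnmatch so that matching is case-sensitive on every platform.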
index
Build an index for the given columns.
kartothek_cube CUBE index [OPTIONS] DATASET COLUMNS
Arguments
- DATASET: Required argument.
- COLUMNS: Required argument.
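Conceptually, a Kartothek secondary index maps each value of an indexed column to the partitions that contain it, so that queries filtering on that column can skip the other partitions. The toy sketch below builds such an inverted index over in-memory rows; build_secondary_index, the partition labels and the column names are hypothetical, and the sketch does not use Kartothek's on-disk index format.

```python
from collections import defaultdict
from typing import Dict, List, Mapping, Sequence, Set


def build_secondary_index(
    partitions: Mapping[str, Sequence[Mapping[str, object]]], column: str
) -> Dict[object, List[str]]:
    """Map each value of `column` to the sorted labels of partitions containing it."""
    index: Dict[object, Set[str]] = defaultdict(set)
    for label, rows in partitions.items():
        for row in rows:
            index[row[column]].add(label)
    return {value: sorted(labels) for value, labels in index.items()}


partitions = {
    "part-0": [{"country": "DE", "x": 1}, {"country": "FR", "x": 2}],
    "part-1": [{"country": "DE", "x": 3}],
}
print(build_secondary_index(partitions, "country"))
# {'DE': ['part-0', 'part-1'], 'FR': ['part-0']}
```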
stats
Collect technical statistics from the cube.
kartothek_cube CUBE stats [OPTIONS]
Options
- --include <include>: Comma-separated list of dataset IDs to scan, e.g. --include enrich,enrich_cl. Glob patterns are also supported.
- --exclude <exclude>: Scan all datasets except those in this comma-separated list, e.g. --exclude enrich,enrich_cl. Glob patterns are also supported.