kartothek.cli package
Module contents
Kartothek CLI code.
Important
This module does not contain any public APIs.
Kartothek comes with a CLI tool named kartothek_cube. To use it, create a YAML file that contains a dictionary of storefact stores (keys are the store names and values are dicts that contain the store config). By default, Kartothek uses a YAML file called skv.yml and a store called dataset, but you may pass --skv and --store to change these. An example file could look like:
dataset:
    type: hazure
    account_name: my_account_name
    account_key: foobar
    container: my_container
    use_sas: False
    create_if_missing: False
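If the config file is generated from a script rather than written by hand, the following minimal sketch renders the same store dictionary as YAML text. The helpers render_skv and write_skv are hypothetical names introduced for illustration; they are not part of Kartothek or storefact. The renderer only handles one level of nesting with plain scalar values, so values containing characters that are special in YAML would need quoting or a real YAML library.

```python
from pathlib import Path

# Store definitions keyed by store name; the values mirror the example file.
STORES = {
    "dataset": {
        "type": "hazure",
        "account_name": "my_account_name",
        "account_key": "foobar",
        "container": "my_container",
        "use_sas": False,
        "create_if_missing": False,
    }
}


def render_skv(stores):
    """Render a {store_name: {key: value}} mapping as two-level YAML text.

    Hypothetical helper: it performs no quoting or escaping.
    """
    lines = []
    for name, config in stores.items():
        lines.append(f"{name}:")
        for key, value in config.items():
            lines.append(f"    {key}: {value}")
    return "\n".join(lines) + "\n"


def write_skv(stores, path="skv.yml"):
    """Write the rendered config to `path` (skv.yml is the CLI default)."""
    Path(path).write_text(render_skv(stores), encoding="utf-8")
```

Calling write_skv(STORES) creates skv.yml in the current directory, which is where the CLI looks unless --skv points elsewhere.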
The CLI uses Dask to parallelize some operations and by default uses as many threads as there are CPU cores. You can control the number of threads using -j.
The following section describes all kartothek_cube operations.
kartothek_cube
Execute certain operations on the given Kartothek cube.
If possible, the operations will be performed in parallel on the current machine.
kartothek_cube [OPTIONS] CUBE COMMAND [ARGS]...
Options
--skv <skv>
    Storefact config file.
    Default: skv.yml
--store <store>
    Store to use.
    Default: dataset
-j, --n_threads <n_threads>
    Number of threads to use (use 0 for number of cores).
    Default: 0
--color <color>
    Whether to use colorized outputs or not. Use always, auto (default), or off.
    Default: auto
    Options: always | auto | off
Arguments
CUBE
    Required argument
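To drive the CLI from a script, the usage line above can be turned into an argument list. The following is a minimal sketch, not part of Kartothek: cube_argv is a hypothetical helper and my_cube is a placeholder cube name. It only assembles the arguments in the documented order (global options, then CUBE, then COMMAND and its arguments) and does not run anything.

```python
import shlex


def cube_argv(cube, command, *command_args,
              skv="skv.yml", store="dataset", n_threads=0):
    """Assemble arguments for `kartothek_cube [OPTIONS] CUBE COMMAND [ARGS]...`.

    Hypothetical helper; the keyword defaults mirror the documented defaults.
    """
    return [
        "kartothek_cube",
        "--skv", skv,
        "--store", store,
        "-j", str(n_threads),
        cube,
        command,
        *command_args,
    ]


# Placeholder cube name "my_cube"; stats is one of the documented commands.
argv = cube_argv("my_cube", "stats", n_threads=4)
print(shlex.join(argv))
```

The resulting list can be passed to subprocess.run(argv, check=True) on a machine where kartothek_cube is installed.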
copy
Copy cube from one store to another.
kartothek_cube CUBE copy [OPTIONS]
Options
--tgt_store <tgt_store>
    Required. Target store to use.
--overwrite, --no-overwrite
    Flags whether cubes potentially present in tgt_store are overwritten. If --no-overwrite is given (default) and a cube is already present, the operation will fail.
    Default: False
--cleanup, --no-cleanup
    Flags whether, in case of an overwrite operation, the cube in tgt_store is removed first so that no previously tracked files are present after the copy operation.
    Default: True
--include <include>
    Comma-separated list of dataset IDs to be copied, e.g. --include enrich,enrich_cl. Glob patterns are also supported.
--exclude <exclude>
    Copy all datasets except the items in this comma-separated list, e.g. --exclude enrich,enrich_cl. Glob patterns are also supported.
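The effect of --include and --exclude can be pictured as a filter over the cube's dataset IDs. The sketch below illustrates that idea with the standard-library fnmatch module; it is not Kartothek's implementation. The function select_datasets, the dataset ID seed, and the rule that excludes are applied after includes are all assumptions made for illustration.

```python
from fnmatch import fnmatchcase


def select_datasets(dataset_ids, include=None, exclude=None):
    """Filter dataset IDs by comma-separated glob patterns.

    Hypothetical helper: `include` keeps only matching IDs (all IDs if None),
    then `exclude` drops matching IDs. The precedence is an assumption.
    """
    include_patterns = include.split(",") if include else None
    exclude_patterns = exclude.split(",") if exclude else []

    selected = []
    for dataset_id in dataset_ids:
        if include_patterns is not None and not any(
            fnmatchcase(dataset_id, pattern) for pattern in include_patterns
        ):
            continue  # not matched by any include pattern
        if any(fnmatchcase(dataset_id, pattern) for pattern in exclude_patterns):
            continue  # matched by an exclude pattern
        selected.append(dataset_id)
    return selected


# "seed" is a made-up ID; enrich and enrich_cl come from the option help text.
datasets = ["seed", "enrich", "enrich_cl"]
print(select_datasets(datasets, include="enrich*"))
print(select_datasets(datasets, exclude="enrich_cl"))
```

Under these assumptions, the glob pattern enrich* selects both enrich and enrich_cl, while excluding enrich_cl leaves seed and enrich. When passing glob patterns on a shell command line, quote them so the shell does not expand them first.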
delete
Delete cube from store.
kartothek_cube CUBE delete [OPTIONS]
Options
--include <include>
    Comma-separated list of dataset IDs to be deleted, e.g. --include enrich,enrich_cl. Glob patterns are also supported.
--exclude <exclude>
    Delete all datasets except the items in this comma-separated list, e.g. --exclude enrich,enrich_cl. Glob patterns are also supported.
index
Build index for given columns.
kartothek_cube CUBE index [OPTIONS] DATASET COLUMNS
Arguments
DATASET
    Required argument
COLUMNS
    Required argument
stats
Collect technical statistics from the cube.
kartothek_cube CUBE stats [OPTIONS]
Options
--include <include>
    Comma-separated list of dataset IDs to be scanned, e.g. --include enrich,enrich_cl. Glob patterns are also supported.
--exclude <exclude>
    Scan all datasets except the items in this comma-separated list, e.g. --exclude enrich,enrich_cl. Glob patterns are also supported.