kartothek.cli package¶

Module contents¶

Kartothek CLI code.

Important

This module does not contain any public APIs.

Kartothek comes with a CLI tool named kartothek_cube. To use it, create a YAML file containing a dictionary of storefact stores: each key is a store name and each value is a dict holding that store's configuration. By default, Kartothek reads a YAML file called skv.yml and uses a store called dataset; pass --skv and --store to change these. An example file could look like:

dataset:
   type: hazure
   account_name: my_account_name
   account_key: foobar
   container: my_container
   use_sas: False
   create_if_missing: False
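To make the "dictionary of stores" layout concrete, here is a minimal Python sketch of the example file in its parsed form, together with a lookup that mirrors the --store default of dataset. The helper name select_store_config is hypothetical and is not part of Kartothek or storefact; the values are the placeholders from the example above.

```python
from typing import Any, Dict

# Parsed form of the example skv.yml: store name -> store config dict.
# All values are the placeholders from the example, not real credentials.
EXAMPLE_CONFIG: Dict[str, Dict[str, Any]] = {
    "dataset": {
        "type": "hazure",
        "account_name": "my_account_name",
        "account_key": "foobar",
        "container": "my_container",
        "use_sas": False,
        "create_if_missing": False,
    }
}


def select_store_config(
    config: Dict[str, Dict[str, Any]], store_name: str = "dataset"
) -> Dict[str, Any]:
    """Hypothetical helper: return the config dict of one named store."""
    if store_name not in config:
        known = ", ".join(sorted(config)) or "<none>"
        raise KeyError(f"store {store_name!r} not found; available: {known}")
    return config[store_name]


if __name__ == "__main__":
    print(select_store_config(EXAMPLE_CONFIG)["container"])
```

A file with several top-level keys would define several stores, and --store would then pick one of them by name.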

The CLI uses Dask to parallelize some operations and, by default, uses as many threads as there are CPU cores. You can control the number of threads using -j.
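The "0 means number of cores" convention used by -j can be sketched in a few lines of Python. This is an illustration of the convention only, not Kartothek's actual code; the function name resolve_n_threads is an assumption.

```python
import os


def resolve_n_threads(n_threads: int = 0) -> int:
    """Hypothetical helper: map the -j value to a concrete thread count."""
    if n_threads < 0:
        raise ValueError("n_threads must be 0 or a positive integer")
    if n_threads == 0:
        # os.cpu_count() may return None on some platforms; fall back to 1.
        return os.cpu_count() or 1
    return n_threads


if __name__ == "__main__":
    print(resolve_n_threads(0))
```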

The following section describes all kartothek_cube operations.

kartothek_cube¶

Execute certain operations on the given Kartothek cube.

If possible, the operations will be performed in parallel on the current machine.

kartothek_cube [OPTIONS] CUBE COMMAND [ARGS]...

Options

--skv <skv>¶

Storefact config file.

Default: skv.yml

--store <store>¶

Store to use.

Default: dataset

-j, --n_threads <n_threads>¶

Number of threads to use (use 0 for number of cores).

Default: 0

--color <color>¶

Whether to use colorized output. Use always, auto (default), or off.

Default: auto
Options: always | auto | off
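The three values follow a common CLI convention. The sketch below shows one plausible interpretation, in which auto enables color only when the output stream is a terminal. That reading of auto is an assumption, and use_color is a hypothetical name rather than a Kartothek function.

```python
import sys
from typing import Optional, TextIO


def use_color(mode: str = "auto", stream: Optional[TextIO] = None) -> bool:
    """Hypothetical helper: decide whether to emit colorized output."""
    if mode == "always":
        return True
    if mode == "off":
        return False
    if mode == "auto":
        # Assumption: "auto" means "colorize only when writing to a terminal".
        stream = sys.stdout if stream is None else stream
        isatty = getattr(stream, "isatty", None)
        return bool(isatty and isatty())
    raise ValueError(f"unknown color mode: {mode!r}")


if __name__ == "__main__":
    print(use_color("auto"))
```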

Arguments

CUBE¶: Required argument
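When calling the tool from a script, the ordering in the usage line matters: global options come first, then CUBE, then the command and its own arguments. The following sketch assembles such an argument list in Python. The helper build_argv and the cube name my_cube are hypothetical; only flags documented above are used.

```python
from typing import List, Optional, Sequence


def build_argv(
    cube: str,
    command: str,
    command_args: Sequence[str] = (),
    skv: Optional[str] = None,
    store: Optional[str] = None,
    n_threads: Optional[int] = None,
    color: Optional[str] = None,
) -> List[str]:
    """Hypothetical helper: build a kartothek_cube argument list."""
    argv = ["kartothek_cube"]
    # Global options go before CUBE; omitted ones fall back to CLI defaults.
    if skv is not None:
        argv += ["--skv", skv]
    if store is not None:
        argv += ["--store", store]
    if n_threads is not None:
        argv += ["--n_threads", str(n_threads)]
    if color is not None:
        argv += ["--color", color]
    # CUBE, then the command and its own options/arguments.
    argv += [cube, command, *command_args]
    return argv


if __name__ == "__main__":
    print(build_argv("my_cube", "info", store="dataset", n_threads=4))
```

The resulting list can be handed to subprocess.run(argv, check=True) on a machine where kartothek_cube is installed.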

cleanup¶

Remove files that are not required from the store.

kartothek_cube CUBE cleanup [OPTIONS]

copy¶

Copy cube from one store to another.

kartothek_cube CUBE copy [OPTIONS]

Options

--tgt_store <tgt_store>¶: Required. Target store to use.

--overwrite, --no-overwrite¶

Whether a cube that already exists in tgt_store is overwritten. If --no-overwrite is given (default) and a cube is already present, the operation will fail.

Default: False

--cleanup, --no-cleanup¶

Whether, in case of an overwrite operation, the cube in tgt_store is removed first so that no previously tracked files remain after the copy operation.

Default: True
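The interplay of the overwrite and cleanup flags can be illustrated with a toy in-memory model in Python, where each store is a plain dict mapping file keys to bytes. This is only a sketch of the documented behavior under that simplification; it is not how Kartothek implements the copy, and copy_cube and the file keys are hypothetical.

```python
from typing import Dict


def copy_cube(
    src: Dict[str, bytes],
    tgt: Dict[str, bytes],
    overwrite: bool = False,
    cleanup: bool = True,
) -> None:
    """Toy model of copy: stores are dicts of file key -> content."""
    if tgt and not overwrite:
        # Mirrors --no-overwrite (default): fail if a cube is already present.
        raise RuntimeError("target already contains a cube; pass overwrite=True")
    if overwrite and cleanup:
        # Mirrors --cleanup (default): remove the old cube before copying.
        tgt.clear()
    tgt.update(src)


if __name__ == "__main__":
    source = {"new_file": b"new"}
    target = {"stale_file": b"old"}
    copy_cube(source, target, overwrite=True, cleanup=False)
    print(sorted(target))  # the stale file survives without cleanup
```

With cleanup left at its default, the same call leaves only the files of the source cube in the target.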

--include <include>¶: Comma-separated list of dataset IDs to copy, e.g., --include enrich,enrich_cl. Glob patterns are also supported.

--exclude <exclude>¶: Copy all datasets except those in this comma-separated list, e.g., --exclude enrich,enrich_cl. Glob patterns are also supported.
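How a comma-separated list with glob patterns might select dataset IDs can be sketched with Python's fnmatch module. The exact matching rules of kartothek_cube are not specified here, so this is an assumption-laden illustration: select_datasets is a hypothetical helper, seed is a made-up dataset ID, and applying include before exclude when both are given is a choice made for the sketch only.

```python
from fnmatch import fnmatchcase
from typing import Iterable, List, Optional


def _split(patterns: str) -> List[str]:
    """Split a comma-separated option value into non-empty patterns."""
    return [p.strip() for p in patterns.split(",") if p.strip()]


def select_datasets(
    dataset_ids: Iterable[str],
    include: Optional[str] = None,
    exclude: Optional[str] = None,
) -> List[str]:
    """Hypothetical helper: filter dataset IDs by include/exclude globs."""
    selected = list(dataset_ids)
    if include is not None:
        pats = _split(include)
        selected = [d for d in selected if any(fnmatchcase(d, p) for p in pats)]
    if exclude is not None:
        pats = _split(exclude)
        selected = [d for d in selected if not any(fnmatchcase(d, p) for p in pats)]
    return selected


if __name__ == "__main__":
    ids = ["seed", "enrich", "enrich_cl"]
    print(select_datasets(ids, include="enrich*"))
    print(select_datasets(ids, exclude="enrich,enrich_cl"))
```

The same kind of filtering applies conceptually to the --include and --exclude options of the other commands that offer them.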

delete¶

Delete cube from store.

kartothek_cube CUBE delete [OPTIONS]

Options

--include <include>¶: Comma-separated list of dataset IDs to delete, e.g., --include enrich,enrich_cl. Glob patterns are also supported.

--exclude <exclude>¶: Delete all datasets except those in this comma-separated list, e.g., --exclude enrich,enrich_cl. Glob patterns are also supported.

index¶

Build index for given columns.

kartothek_cube CUBE index [OPTIONS] DATASET COLUMNS

Arguments

DATASET¶: Required argument

COLUMNS¶: Required argument

info¶

Show information about the cube.

kartothek_cube CUBE info [OPTIONS]

query¶

Run interactive cube queries in IPython.

kartothek_cube CUBE query [OPTIONS]

stats¶

Collect technical statistics from the cube.

kartothek_cube CUBE stats [OPTIONS]

Options

--include <include>¶: Comma-separated list of dataset IDs to scan, e.g., --include enrich,enrich_cl. Glob patterns are also supported.

--exclude <exclude>¶: Scan all datasets except those in this comma-separated list, e.g., --exclude enrich,enrich_cl. Glob patterns are also supported.