kartothek.cli package

Module contents

Kartothek CLI code.

Important

This module does not contain any public APIs.

Kartothek comes with a CLI tool named kartothek_cube. To use it, create a YAML file containing a dictionary of storefact stores: each key is a store name, and each value is a dictionary holding that store's configuration. By default, Kartothek reads a file called skv.yml and uses the store called dataset; pass --skv and --store to change these. An example file could look like:

dataset:
   type: hazure
   account_name: my_account_name
   account_key: foobar
   container: my_container
   use_sas: False
   create_if_missing: False
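The following minimal Python sketch shows how a --store name picks one entry out of such a file. It assumes the YAML has already been parsed into a plain dict (the literal below mirrors the example file). select_store_config and DEFAULT_STORE are hypothetical helpers for illustration, not part of Kartothek or storefact.

```python
from typing import Any, Dict

# Hypothetical constant mirroring the documented --store default.
DEFAULT_STORE = "dataset"


def select_store_config(skv: Dict[str, Dict[str, Any]], store: str = DEFAULT_STORE) -> Dict[str, Any]:
    """Return the config dict for `store` (hypothetical helper), failing loudly if it is undefined."""
    try:
        return skv[store]
    except KeyError:
        available = ", ".join(sorted(skv)) or "<none>"
        raise KeyError(f"store {store!r} not found; available stores: {available}") from None


# Stands in for the parsed contents of skv.yml shown above.
skv = {
    "dataset": {
        "type": "hazure",
        "account_name": "my_account_name",
        "account_key": "foobar",
        "container": "my_container",
        "use_sas": False,
        "create_if_missing": False,
    },
}

cfg = select_store_config(skv)
print(cfg["type"])  # hazure
```

Listing the available store names in the error keeps a mistyped --store value easy to diagnose.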

The CLI uses Dask to parallelize some operations and, by default, runs as many threads as there are CPU cores. Use -j to set the number of threads.
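The -j value follows the rule documented for --n_threads (0 means one thread per CPU core). The mapping can be sketched in Python as follows. resolve_n_threads is a hypothetical helper, rejecting negative values is an assumption, and how the resulting count is handed to Dask is not shown here.

```python
import os


def resolve_n_threads(n_threads: int = 0) -> int:
    """Map a -j/--n_threads value to a concrete thread count (hypothetical helper)."""
    if n_threads < 0:
        # Assumption: the documentation does not say what negative values do.
        raise ValueError("n_threads must be >= 0")
    if n_threads == 0:
        # 0 means "use all cores"; os.cpu_count() can return None, so fall back to 1.
        return os.cpu_count() or 1
    return n_threads


print(resolve_n_threads(4))       # 4
print(resolve_n_threads(0) >= 1)  # True
```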

The following section describes all kartothek_cube operations.

kartothek_cube

Execute certain operations on the given Kartothek cube.

If possible, the operations will be performed in parallel on the current machine.

kartothek_cube [OPTIONS] CUBE COMMAND [ARGS]...

Options

--skv <skv>

Storefact config file.

Default

skv.yml

--store <store>

Store to use.

Default

dataset

-j, --n_threads <n_threads>

Number of threads to use (0 uses one thread per CPU core).

Default

0

--color <color>

Whether to colorize the output: always, auto (default), or off.

Default

auto

Options

always | auto | off
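The documentation does not define what auto checks. The following Python sketch assumes the common convention of colorizing only when the output stream is an interactive terminal. use_color and COLOR_CHOICES are hypothetical names, not Kartothek's implementation.

```python
import io
import sys
from typing import TextIO

# Documented choices for --color.
COLOR_CHOICES = ("always", "auto", "off")


def use_color(mode: str = "auto", stream: TextIO = sys.stdout) -> bool:
    """Decide whether to emit colored output (hypothetical helper)."""
    if mode not in COLOR_CHOICES:
        raise ValueError(f"color must be one of {COLOR_CHOICES}, got {mode!r}")
    if mode == "always":
        return True
    if mode == "off":
        return False
    # "auto": assumed heuristic, colorize only when writing to a terminal.
    isatty = getattr(stream, "isatty", None)
    return bool(isatty and isatty())


print(use_color("always"))               # True
print(use_color("off"))                  # False
print(use_color("auto", io.StringIO()))  # False, a StringIO is not a terminal
```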

Arguments

CUBE

Required argument

cleanup

Remove non-required files from the store.

kartothek_cube CUBE cleanup [OPTIONS]
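The documentation does not spell out what counts as "non-required." The following Python sketch assumes it means files under the cube's key prefix that no dataset of the cube references. Under that assumption, the selection reduces to a set difference. The key layout and the helper name non_required_keys are hypothetical.

```python
from typing import Iterable, Set


def non_required_keys(store_keys: Iterable[str], referenced_keys: Iterable[str], cube_prefix: str) -> Set[str]:
    """Keys under `cube_prefix` that no dataset references (hypothetical helper)."""
    referenced = set(referenced_keys)
    # Restrict to the cube's prefix so files belonging to other cubes are never touched.
    return {k for k in store_keys if k.startswith(cube_prefix) and k not in referenced}


# Hypothetical key layout, for illustration only.
store_keys = [
    "my_cube/seed/meta.json",
    "my_cube/seed/part-0.parquet",
    "my_cube/seed/part-old.parquet",
    "other_cube/x.parquet",
]
referenced = ["my_cube/seed/meta.json", "my_cube/seed/part-0.parquet"]

print(sorted(non_required_keys(store_keys, referenced, "my_cube/")))  # ['my_cube/seed/part-old.parquet']
```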

copy

Copy the cube from one store to another.

kartothek_cube CUBE copy [OPTIONS]

Options

--tgt_store <tgt_store>

Required. Target store to use.

--overwrite, --no-overwrite

Whether to overwrite a cube that is already present in tgt_store. If --no-overwrite is given (the default) and the cube already exists, the operation fails.

Default

False

--cleanup, --no-cleanup

When overwriting, whether to first remove the existing cube from tgt_store, so that no previously tracked files remain after the copy.

Default

True
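The following Python sketch models the documented interplay of --overwrite and --cleanup as a decision function. It only returns a list of planned steps and touches no store. plan_copy and CubeAlreadyExists are hypothetical names.

```python
from typing import List


class CubeAlreadyExists(Exception):
    """Hypothetical error for --no-overwrite when tgt_store already holds the cube."""


def plan_copy(target_has_cube: bool, overwrite: bool = False, cleanup: bool = True) -> List[str]:
    """Return the steps a copy would take under the documented flag semantics (hypothetical)."""
    if target_has_cube and not overwrite:
        # --no-overwrite (default) with an existing cube: fail.
        raise CubeAlreadyExists("cube already present in tgt_store; pass --overwrite")
    steps = []
    if target_has_cube and cleanup:
        # Reached only when overwriting; cleanup removes previously tracked files first.
        steps.append("delete existing cube in tgt_store")
    steps.append("copy cube files to tgt_store")
    return steps


print(plan_copy(False))                                # ['copy cube files to tgt_store']
print(plan_copy(True, overwrite=True))                 # ['delete existing cube in tgt_store', 'copy cube files to tgt_store']
print(plan_copy(True, overwrite=True, cleanup=False))  # ['copy cube files to tgt_store']
```

With --no-cleanup, files from the previous cube that the new copy does not overwrite can remain in tgt_store, which is what the cleanup default guards against.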

--include <include>

Comma-separated list of dataset IDs to copy, e.g. --include enrich,enrich_cl. Glob patterns are also supported.

--exclude <exclude>

Copy all datasets except those in this comma-separated list, e.g. --exclude enrich,enrich_cl. Glob patterns are also supported.
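The following minimal Python sketch models the --include/--exclude selection. It assumes shell-style globs as implemented by the standard library's fnmatch. It also assumes that both options may be combined, with include applied first. Neither assumption is confirmed above, and delete and stats document the same pair of options. parse_patterns and select_datasets are hypothetical helpers, and the dataset IDs other than enrich and enrich_cl are made up.

```python
from fnmatch import fnmatchcase
from typing import Iterable, List, Optional


def parse_patterns(value: Optional[str]) -> List[str]:
    """Split a comma-separated option value into patterns, dropping blanks."""
    if not value:
        return []
    return [p.strip() for p in value.split(",") if p.strip()]


def select_datasets(dataset_ids: Iterable[str], include: Optional[str] = None,
                    exclude: Optional[str] = None) -> List[str]:
    """Keep IDs matching any include pattern (if given) and no exclude pattern."""
    inc = parse_patterns(include)
    exc = parse_patterns(exclude)
    selected = []
    for ds in dataset_ids:
        if inc and not any(fnmatchcase(ds, p) for p in inc):
            continue  # not matched by --include
        if any(fnmatchcase(ds, p) for p in exc):
            continue  # matched by --exclude
        selected.append(ds)
    return selected


ids = ["seed", "enrich", "enrich_cl", "enrich_geo"]  # hypothetical dataset IDs
print(select_datasets(ids, include="enrich,enrich_cl"))              # ['enrich', 'enrich_cl']
print(select_datasets(ids, exclude="enrich*"))                       # ['seed']
print(select_datasets(ids, include="enrich*", exclude="enrich_cl"))  # ['enrich', 'enrich_geo']
```

fnmatchcase matches the whole ID, so enrich does not select enrich_geo unless a wildcard such as enrich* is used.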

delete

Delete cube from store.

kartothek_cube CUBE delete [OPTIONS]

Options

--include <include>

Comma-separated list of dataset IDs to delete, e.g. --include enrich,enrich_cl. Glob patterns are also supported.

--exclude <exclude>

Delete all datasets except those in this comma-separated list, e.g. --exclude enrich,enrich_cl. Glob patterns are also supported.

index

Build an index for the given columns.

kartothek_cube CUBE index [OPTIONS] DATASET COLUMNS

Arguments

DATASET

Required argument

COLUMNS

Required argument

info

Show information about the cube.

kartothek_cube CUBE info [OPTIONS]

query

Query the cube interactively in an IPython shell.

kartothek_cube CUBE query [OPTIONS]

stats

Collect technical statistics from the cube.

kartothek_cube CUBE stats [OPTIONS]

Options

--include <include>

Comma-separated list of dataset IDs to scan, e.g. --include enrich,enrich_cl. Glob patterns are also supported.

--exclude <exclude>

Scan all datasets except those in this comma-separated list, e.g. --exclude enrich,enrich_cl. Glob patterns are also supported.