Command Line Features

Kartothek Cube also features a command line interface (CLI) for some cube operations. To use it, create a skv.yml file that describes storefact stores:

dataset:
   type: hfs
   path: path/to/data
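
If the store file needs to be generated from a script, something like the following could work. This is a minimal sketch using only the Python standard library; the helper name `write_store_config` is hypothetical and not part of Kartothek or storefact, and it simply writes the same keys as the YAML shown above as plain text.

```python
from pathlib import Path


def write_store_config(directory, data_path="path/to/data"):
    """Write a skv.yml that describes one filesystem-backed store named 'dataset'.

    Hypothetical helper for illustration only; it mirrors the YAML shown
    above and does not validate the store type or the path.
    """
    config = (
        "dataset:\n"
        "   type: hfs\n"
        f"   path: {data_path}\n"
    )
    target = Path(directory) / "skv.yml"
    target.write_text(config, encoding="utf-8")
    return target
```

Calling `write_store_config(".")` would place the file in the current working directory, which is assumed here to be the directory from which kartothek_cube is run.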

Now use the kartothek_cube command to print information about a cube, here the cube with the UUID prefix geodata:

kartothek_cube geodata info
Infos
UUID Prefix:        geodata
Dimension Columns:
  - city: string
  - day: timestamp[ns]
Partition Columns:
  - country: string
Index Columns:
 
Seed Dataset:      seed
 
Dataset: latlong
Partition Keys:
  - country: string
Partitions: 4
Metadata:
  {
    "creation_time": "2019-10-01T12:11:38.263496",
    "klee_dimension_columns": [
      "city",
      "day"
    ],
    "klee_is_seed": false,
    "klee_partition_columns": [
      "country"
    ]
  }
Dimension Columns:
  - city: string
Payload Columns:
  - latitude: double
  - longitude: double
 
Dataset: seed
Partition Keys:
  - country: string
Partitions: 3
Metadata:
  {
    "creation_time": "2019-10-01T12:11:38.206653",
    "klee_dimension_columns": [
      "city",
      "day"
    ],
    "klee_is_seed": true,
    "klee_partition_columns": [
      "country"
    ]
  }
Dimension Columns:
  - city: string
  - day: timestamp[ns]
Payload Columns:
  - avg_temp: int64
 
Dataset: time
Partitions: 1
Metadata:
  {
    "creation_time": "2019-10-01T12:11:41.734913",
    "klee_dimension_columns": [
      "city",
      "day"
    ],
    "klee_is_seed": false,
    "klee_partition_columns": [
      "country"
    ]
  }
Dimension Columns:
  - day: timestamp[ns]
Payload Columns:
  - month: int64
  - weekday: int64
  - year: int64

Some information is not available from the schema alone and requires a cube scan:

kartothek_cube geodata stats
[########################################] | 100% Completed |  0.1s
latlong
blobsize:  5,690
files:  4
partitions:  4
rows:  4
 
seed
blobsize:  4,589
files:  3
partitions:  3
rows:  8
 
time
blobsize:  3,958
files:  1
partitions:  1
rows:  366
 
__total__
blobsize:  14,237
files:  8
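
The `__total__` entry is the sum over the individual datasets, which can be checked by parsing the text. The following is a rough sketch, assuming the output keeps the layout of the listing above (a dataset name on its own line, followed by `key:  value` lines with thousands separators); `parse_stats` is a hypothetical helper and not part of Kartothek.

```python
def parse_stats(text):
    """Parse stats output into a mapping {dataset: {metric: int}}.

    Assumes the layout of the listing above; this is not an official parser.
    """
    stats = {}
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("["):
            # Skip blank separator lines and the progress bar line.
            continue
        if ":" in line:
            if current is None:
                # Metric line before any dataset name: ignore it.
                continue
            key, value = line.split(":", 1)
            stats[current][key.strip()] = int(value.replace(",", ""))
        else:
            # A line without a colon starts a new dataset section.
            current = line
            stats[current] = {}
    return stats


SAMPLE = """\
latlong
blobsize:  5,690
files:  4
partitions:  4
rows:  4

seed
blobsize:  4,589
files:  3
partitions:  3
rows:  8

time
blobsize:  3,958
files:  1
partitions:  1
rows:  366

__total__
blobsize:  14,237
files:  8
"""

stats = parse_stats(SAMPLE)
total = stats.pop("__total__")
for metric, expected in total.items():
    # Each total should equal the sum of that metric over all datasets.
    assert sum(dataset[metric] for dataset in stats.values()) == expected
```

For the listing above, the blob sizes 5,690 + 4,589 + 3,958 add up to 14,237 and the file counts 4 + 3 + 1 add up to 8, so both assertions hold.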

Use kartothek_cube --help to get a list of all commands, or see kartothek_cube.cli.