Command Line Features
Kartothek Cube also features a command line interface (CLI) for some cube operations. To use it, create a skv.yml file that describes storefact stores:
dataset:
    type: hfs
    path: path/to/data
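The layout of this file can be sketched in Python. This is a minimal illustration, not part of Kartothek: it assumes the top-level key (dataset) is the store name and that type and path are nested under it as the store's parameters. The helper render_skv is hypothetical and only handles flat string values.

```python
from typing import Dict


def render_skv(stores: Dict[str, Dict[str, str]]) -> str:
    """Render {store name: {parameter: value}} as simple two-level YAML text.

    Hypothetical helper for illustration; it does no quoting or escaping.
    """
    lines = []
    for name, params in stores.items():
        lines.append(f"{name}:")
        for key, value in params.items():
            # Assumption: store parameters are indented under the store name.
            lines.append(f"    {key}: {value}")
    return "\n".join(lines) + "\n"


skv_text = render_skv({"dataset": {"type": "hfs", "path": "path/to/data"}})
print(skv_text, end="")
```

Writing skv_text to a file named skv.yml in the working directory would give a file with the same content as the example above.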
Now use the kartothek_cube command to gather certain cube information:
kartothek_cube geodata info
Infos
UUID Prefix: geodata
Dimension Columns:
  - city: string
  - day: timestamp[ns]
Partition Columns:
  - country: string
Index Columns:

Seed Dataset: seed

Dataset: latlong
Partition Keys:
  - country: string
Partitions: 4
Metadata:
  {
    "creation_time": "2019-10-01T12:11:38.263496",
    "klee_dimension_columns": [
      "city",
      "day"
    ],
    "klee_is_seed": false,
    "klee_partition_columns": [
      "country"
    ]
  }
Dimension Columns:
  - city: string
Payload Columns:
  - latitude: double
  - longitude: double

Dataset: seed
Partition Keys:
  - country: string
Partitions: 3
Metadata:
  {
    "creation_time": "2019-10-01T12:11:38.206653",
    "klee_dimension_columns": [
      "city",
      "day"
    ],
    "klee_is_seed": true,
    "klee_partition_columns": [
      "country"
    ]
  }
Dimension Columns:
  - city: string
  - day: timestamp[ns]
Payload Columns:
  - avg_temp: int64

Dataset: time
Partitions: 1
Metadata:
  {
    "creation_time": "2019-10-01T12:11:41.734913",
    "klee_dimension_columns": [
      "city",
      "day"
    ],
    "klee_is_seed": false,
    "klee_partition_columns": [
      "country"
    ]
  }
Dimension Columns:
  - day: timestamp[ns]
Payload Columns:
  - month: int64
  - weekday: int64
  - year: int64
Some information cannot be derived from the schema alone and requires a cube scan:
kartothek_cube geodata stats
[########################################] | 100% Completed | 0.1s
latlong
blobsize: 5,690
files: 4
partitions: 4
rows: 4

seed
blobsize: 4,589
files: 3
partitions: 3
rows: 8

time
blobsize: 3,958
files: 1
partitions: 1
rows: 366

__total__
blobsize: 14,237
files: 8
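For scripting, the stats output can be turned into a dictionary. The following is a rough sketch rather than a Kartothek API: parse_stats is a hypothetical helper, and it assumes the output prints each dataset name on its own line, followed by one "key: value" pair per line with comma thousands separators. The sample text is copied from the run above.

```python
from typing import Dict, Optional


def parse_stats(text: str) -> Dict[str, Dict[str, int]]:
    """Parse stats-style output into {dataset: {metric: value}}."""
    stats: Dict[str, Dict[str, int]] = {}
    current: Optional[str] = None
    for raw in text.splitlines():
        line = raw.strip()
        # Skip blank lines and the progress bar line.
        if not line or line.startswith("["):
            continue
        if ":" in line:
            if current is None:
                raise ValueError(f"metric before any dataset name: {line!r}")
            key, value = line.split(":", 1)
            # Values such as "5,690" use a comma as thousands separator.
            stats[current][key.strip()] = int(value.strip().replace(",", ""))
        else:
            # A line without a colon starts a new dataset section.
            current = line
            stats[current] = {}
    return stats


SAMPLE = """\
[########################################] | 100% Completed | 0.1s
latlong
blobsize: 5,690
files: 4
partitions: 4
rows: 4

seed
blobsize: 4,589
files: 3
partitions: 3
rows: 8

time
blobsize: 3,958
files: 1
partitions: 1
rows: 366

__total__
blobsize: 14,237
files: 8
"""

stats = parse_stats(SAMPLE)
datasets = {name: s for name, s in stats.items() if name != "__total__"}
blob_sum = sum(s["blobsize"] for s in datasets.values())
file_sum = sum(s["files"] for s in datasets.values())
print(blob_sum == stats["__total__"]["blobsize"], file_sum == stats["__total__"]["files"])
```

In this sample the __total__ entry matches the per-dataset sums for blobsize and files, which makes it a convenient consistency check.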
Use kartothek_cube --help to get a list of all commands, or see cli.