Glossary

Build
Process of creating a new cube.
Cell
A unique combination of Dimension values. Results in a single row in the input and output DataFrames.
Cube
A combination of multiple datasets that models a Data Cube-like construct. The core data structure of kartothek cube.
Dataset ID
The ID of a dataset that belongs to the cube, without the Uuid Prefix.
Dimension
Part of the address of a certain cube Cell. Usually referred to as a Dimension Column. Different dimensions should describe orthogonal attributes.
Dimension Column
DataFrame column that contains values for a certain Dimension.
Dimension Columns
Ordered list of all Dimension Columns of a Cube.
Extend
Process of adding new datasets to an existing cube.
Index Column
Column for which additional index structures are built.
Kartothek Dataset UUID
Name that makes a dataset unique in a store, includes Uuid Prefix and Dataset ID as <UUID Prefix>++<Dataset ID>.
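
The naming scheme can be illustrated with a minimal sketch. The helper names `make_dataset_uuid` and `split_dataset_uuid` are hypothetical and not part of kartothek, and rejecting parts that already contain the separator is an assumption made here to keep the split unambiguous.

```python
from typing import Tuple

# Separator between Uuid Prefix and Dataset ID, as given in the definition above.
SEPARATOR = "++"


def make_dataset_uuid(uuid_prefix: str, dataset_id: str) -> str:
    """Join a Uuid Prefix and a Dataset ID into a Kartothek Dataset UUID."""
    # Assumption for this sketch: neither part may contain the separator.
    if SEPARATOR in uuid_prefix or SEPARATOR in dataset_id:
        raise ValueError("prefix and dataset ID must not contain '++'")
    return f"{uuid_prefix}{SEPARATOR}{dataset_id}"


def split_dataset_uuid(dataset_uuid: str) -> Tuple[str, str]:
    """Split a Kartothek Dataset UUID back into (Uuid Prefix, Dataset ID)."""
    uuid_prefix, sep, dataset_id = dataset_uuid.partition(SEPARATOR)
    if not sep:
        raise ValueError("not a cube dataset UUID: missing '++'")
    return uuid_prefix, dataset_id
```

For example, a cube with the Uuid Prefix `my_cube` and a dataset with the Dataset ID `seed` would yield the name `my_cube++seed`.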
Logical Partition
Partition that was created by the partition_by argument of a Query.
Physical Partition
A single chunk of data that is stored to the blob store. May contain multiple Parquet files.
Partition Column
DataFrame column that contains one part of what makes up a Physical Partition.
Partition Columns
Ordered list of all Partition Columns of a Cube.
Projection
Process of dimension reduction of a cube (like a 3D object casting a shadow on a wall). Only works if the involved payload exists solely in the subdimensional space, since no automatic aggregation is supported.
Seed
Dataset that provides the ground truth about which Cells are in a Cube.
Store Factory
A callable that takes no arguments and creates a new simplekv store when called. Its type is Callable[[], simplekv.KeyValueStore].
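
A minimal sketch of the zero-argument factory pattern follows. To stay self-contained it uses a hypothetical stand-in class, `InMemoryStore`, instead of a real simplekv store. In practice the factory would return a `simplekv.KeyValueStore` instance, for example `simplekv.memory.DictStore()`.

```python
from functools import partial
from typing import Callable, Dict


class InMemoryStore:
    """Hypothetical stand-in for a simplekv.KeyValueStore (illustration only)."""

    def __init__(self, name: str = "default") -> None:
        self.name = name
        self._data: Dict[str, bytes] = {}

    def put(self, key: str, data: bytes) -> str:
        self._data[key] = data
        return key

    def get(self, key: str) -> bytes:
        return self._data[key]


def store_factory() -> InMemoryStore:
    """Takes no arguments and creates a new store object on every call."""
    return InMemoryStore()


# functools.partial is another way to obtain a zero-argument callable
# when the store constructor itself needs configuration.
named_factory: Callable[[], InMemoryStore] = partial(InMemoryStore, name="cube_store")
```

Passing a factory rather than a store instance means the store object is only constructed where it is needed, which is presumably why the glossary defines the term as a callable.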
Query
A request for data from the cube, including things like “payload columns”, “conditions”, and more.
Query Execution
Process of reading out data from a Cube, aka the execution of a Query.
Query Intention

The actual intention of a Query, e.g.:

  • if the user queries “all columns”, the intention includes the concrete set of columns
  • if the user does not specify dimension columns, the intention uses the Cube's Dimension Columns (aka “no Projection”)
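
The two cases above can be sketched as a small resolution step. This is an illustration only: `Intention` and `resolve_intention` are hypothetical names, not kartothek's actual API, and `None` is assumed to mean "not specified by the user".

```python
from typing import NamedTuple, Optional, Sequence, Tuple


class Intention(NamedTuple):
    """Hypothetical container for a fully resolved Query."""

    dimension_columns: Tuple[str, ...]
    payload_columns: Tuple[str, ...]


def resolve_intention(
    cube_dimension_columns: Sequence[str],
    all_payload_columns: Sequence[str],
    dimension_columns: Optional[Sequence[str]] = None,
    payload_columns: Optional[Sequence[str]] = None,
) -> Intention:
    """Replace unspecified Query arguments with their concrete values."""
    # No dimension columns given: use the cube's own ("no Projection").
    if dimension_columns is None:
        dimension_columns = cube_dimension_columns
    # "All columns" requested: expand to the concrete set of columns.
    if payload_columns is None:
        payload_columns = all_payload_columns
    return Intention(tuple(dimension_columns), tuple(payload_columns))
```

Calling `resolve_intention(["x", "y"], ["v1", "v2"])` without further arguments resolves to the full dimension and payload column sets, while passing `dimension_columns=["x"]` expresses a Projection onto `x`.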
Uuid Prefix
Common prefix for all datasets that belong to a Cube.