Glossary

Build

Process of creating a new cube.

Cell

A unique combination of Dimension values. Each cell results in a single row in the input and output DataFrames.
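
To make the "one row per cell" rule concrete, here is a minimal sketch in plain Python. The column names (city, day, avg_temp) and the helpers cell_of and duplicate_cells are hypothetical and are not part of kartothek; rows are modelled as dictionaries instead of DataFrame rows.

```python
from collections import Counter

# Hypothetical cube with two dimensions; the column names are illustrative.
dimension_columns = ("city", "day")

rows = [
    {"city": "Hamburg", "day": "2018-01-01", "avg_temp": 3.5},
    {"city": "Hamburg", "day": "2018-01-02", "avg_temp": 4.1},
    {"city": "Dresden", "day": "2018-01-01", "avg_temp": 1.0},
]


def cell_of(row, dimension_columns):
    """Return the cell address of a row: its dimension values, in order."""
    return tuple(row[col] for col in dimension_columns)


def duplicate_cells(rows, dimension_columns):
    """Return all cell addresses that occur in more than one row."""
    counts = Counter(cell_of(row, dimension_columns) for row in rows)
    return sorted(cell for cell, n in counts.items() if n > 1)
```

An empty result from duplicate_cells means every cell is represented by exactly one row.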

Cube

A combination of multiple datasets that models a Data Cube-like construct. The core data structure of kartothek cube.

Dataset ID

The ID of a dataset that belongs to the cube, without any Uuid Prefix.

Dimension

Part of the address for a certain cube Cell. Usually referred to as Dimension Column. Different dimensions should describe orthogonal attributes.

Dimension Column

DataFrame column that contains values for a certain Dimension.

Dimension Columns

Ordered list of every Dimension Column of a Cube.

Extend

Process of adding new datasets to an existing cube.

Index Column

Column for which additional index structures are built.

Kartothek Dataset UUID

Name that makes a dataset unique in a store. It consists of the Uuid Prefix and the Dataset ID, joined as <UUID Prefix>++<Dataset ID>.
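
The naming scheme can be sketched as a pair of small helpers. The function names below are hypothetical and not kartothek API; the sketch also assumes that the Uuid Prefix itself never contains the "++" separator, so splitting at the first occurrence is unambiguous.

```python
SEPARATOR = "++"


def make_dataset_uuid(uuid_prefix, dataset_id):
    """Join Uuid Prefix and Dataset ID as <UUID Prefix>++<Dataset ID>."""
    return f"{uuid_prefix}{SEPARATOR}{dataset_id}"


def split_dataset_uuid(dataset_uuid):
    """Split a dataset UUID at the first separator into (prefix, dataset ID)."""
    prefix, sep, dataset_id = dataset_uuid.partition(SEPARATOR)
    if not sep:
        raise ValueError(f"{dataset_uuid!r} does not contain {SEPARATOR!r}")
    return prefix, dataset_id
```

For example, a cube with the prefix my_cube and a dataset with the ID seed would be stored under my_cube++seed.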

Logical Partition

Partition that was created by partition_by arguments to the Query.
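
As a rough illustration, logical partitioning can be thought of as grouping rows by the values of the partition_by columns. The helper below is a hypothetical sketch in plain Python and does not reflect how kartothek implements it.

```python
from itertools import groupby


def logical_partitions(rows, partition_by):
    """Group rows by their values in the partition_by columns.

    Returns a list of (key, rows) pairs, sorted by key.
    """
    def key(row):
        return tuple(row[col] for col in partition_by)

    # groupby only merges adjacent items, so sort by the same key first.
    return [(k, list(group)) for k, group in groupby(sorted(rows, key=key), key=key)]


# Illustrative data; the column names are assumptions.
example_rows = [
    {"city": "Hamburg", "day": "2018-01-01"},
    {"city": "Dresden", "day": "2018-01-01"},
    {"city": "Hamburg", "day": "2018-01-02"},
]
```

Calling logical_partitions(example_rows, ["city"]) yields one group per city, regardless of how the rows are laid out in Physical Partitions.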

Physical Partition

A single chunk of data that is stored to the blob store. May contain multiple Parquet files.

Partition Column

DataFrame column that contains one part that makes a Physical Partition.

Partition Columns

Ordered list of every Partition Column of a Cube.

Projection

Process of reducing the dimensions of a cube (like a 3D object casting a shadow on a wall). Only works if the involved payload exists solely in the subdimensional space, since no automatic aggregation is supported.
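
The restriction can be illustrated with a small sketch: dropping a dimension is only well defined if the payload does not vary along the dropped dimension. The project helper and the column names are hypothetical and not kartothek API.

```python
def project(rows, keep_dimensions, payload_columns):
    """Drop all dimensions except keep_dimensions.

    Raises ValueError if a payload value differs within one projected cell,
    because resolving that would require aggregation.
    """
    keep_dimensions = tuple(keep_dimensions)
    payload_columns = tuple(payload_columns)
    projected = {}
    for row in rows:
        cell = tuple(row[d] for d in keep_dimensions)
        payload = tuple(row[p] for p in payload_columns)
        # setdefault stores the first payload seen for a cell and returns it.
        if projected.setdefault(cell, payload) != payload:
            raise ValueError(f"payload is ambiguous for cell {cell}")
    columns = keep_dimensions + payload_columns
    return [dict(zip(columns, cell + payload)) for cell, payload in projected.items()]


# Illustrative data: country depends only on city, avg_temp also depends on day.
example_rows = [
    {"city": "Hamburg", "day": "2018-01-01", "country": "DE", "avg_temp": 3.5},
    {"city": "Hamburg", "day": "2018-01-02", "country": "DE", "avg_temp": 4.1},
    {"city": "Dresden", "day": "2018-01-01", "country": "DE", "avg_temp": 1.0},
]
```

Here, projecting onto city works for the country payload, while projecting avg_temp onto city raises an error because the two Hamburg rows would have to be aggregated.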

Seed

Dataset that provides the ground truth about which Cells are in a Cube.

Store Factory

A callable that takes no arguments and creates a new simplekv store when called. Its type is Callable[[], simplekv.KeyValueStore].
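
A minimal sketch of the pattern, using functools.partial. To keep the example self-contained, InMemoryStore is a hypothetical stand-in for a simplekv.KeyValueStore that only mimics put and get; a real factory would return a simplekv store instead.

```python
from functools import partial
from typing import Callable, Dict


class InMemoryStore:
    """Hypothetical stand-in for a simplekv store; only put/get are sketched."""

    def __init__(self, backing: Dict[str, bytes]):
        self._data = backing

    def put(self, key: str, data: bytes) -> str:
        self._data[key] = data
        return key

    def get(self, key: str) -> bytes:
        return self._data[key]


# All stores created by the factory share one backing dict, so data written
# through one store object is visible through another.
_shared_backing: Dict[str, bytes] = {}

# The factory takes no arguments and returns a new store object on each call.
store_factory: Callable[[], InMemoryStore] = partial(InMemoryStore, _shared_backing)
```

Each call to store_factory() returns a fresh store object, so the factory can be handed to other code in place of a store instance and called wherever a store is needed.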

Query

A request for data from the cube, including things like “payload columns”, “conditions”, and more.

Query Execution

Process of reading out data from a Cube, aka the execution of a Query.

Query Intention

The actual intention of a Query, e.g.:

  • if the user queries “all columns”, the intention includes the concrete set of columns

  • if the user does not specify the dimension columns, the intention uses the cube's Dimension Columns (aka “no Projection”)
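
The two cases above can be sketched as a function that turns optional user input into concrete values. The function name, its parameters, and the use of None for "not specified" are assumptions made for illustration; this is not the kartothek implementation.

```python
def resolve_intention(
    cube_dimension_columns,
    available_payload_columns,
    dimension_columns=None,
    payload_columns=None,
):
    """Replace unspecified query arguments with their concrete meaning."""
    return {
        # No explicit dimension columns: use the cube's own, i.e. no Projection.
        "dimension_columns": tuple(
            cube_dimension_columns if dimension_columns is None else dimension_columns
        ),
        # "All columns": expand to the concrete set of available columns.
        "payload_columns": tuple(
            sorted(available_payload_columns) if payload_columns is None else payload_columns
        ),
    }
```

With no optional arguments given, the result lists the full set of cube dimensions and every available payload column explicitly.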

Uuid Prefix

Common prefix for all datasets that belong to a Cube.