Glossary¶

Build¶

Process of creating a new cube.

Cell¶

A unique combination of Dimension values. Each cell results in a single row in input and output DataFrames.
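
As a rough illustration in plain Python (not kartothek code), the sketch below treats each row as a dict and derives its cell as the tuple of dimension values. The column names `x`, `y`, and `v` and the helper `cell_of` are made up for this example.

```python
from collections import Counter

# Hypothetical dimension columns and rows; the names are illustrative only.
dimension_columns = ["x", "y"]
rows = [
    {"x": 0, "y": 0, "v": 1.5},
    {"x": 0, "y": 1, "v": 2.5},
    {"x": 1, "y": 0, "v": 3.5},
]


def cell_of(row, dims):
    """Return the cell address of a row: its dimension values, in order."""
    return tuple(row[d] for d in dims)


# Count how often each cell occurs; a cell should map to exactly one row.
counts = Counter(cell_of(row, dimension_columns) for row in rows)
duplicates = [cell for cell, n in counts.items() if n > 1]
```

An empty `duplicates` list means every combination of dimension values appears in exactly one row.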

Cube¶

A combination of multiple datasets that model a Data Cube-like construct. The core data structure of kartothek cube.

Dataset ID¶

The ID of a dataset that belongs to the cube, without the Uuid Prefix.

Dimension¶

Part of the address for a certain cube Cell. Usually referred to as Dimension Column. Different dimensions should describe orthogonal attributes.

Dimension Column¶

DataFrame column that contains values for a certain Dimension.

Dimension Columns¶

Ordered list of all Dimension Columns of a Cube.

Extend¶

Process of adding new datasets to an existing cube.

Index Column¶

Column for which additional index structures are built.

Kartothek Dataset UUID¶

Name that makes a dataset unique in a store. It consists of the Uuid Prefix and the Dataset ID, joined as <UUID Prefix>++<Dataset ID>.
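
The naming scheme can be sketched with two small helpers. The function names and the example values `my_cube` and `seed` are hypothetical, and the sketch assumes that neither part contains the `++` separator itself.

```python
from typing import Tuple

SEPARATOR = "++"


def make_dataset_uuid(uuid_prefix: str, dataset_id: str) -> str:
    """Join prefix and dataset ID as <UUID Prefix>++<Dataset ID>."""
    return f"{uuid_prefix}{SEPARATOR}{dataset_id}"


def split_dataset_uuid(dataset_uuid: str) -> Tuple[str, str]:
    """Split a dataset UUID at the first separator (assumed to be the only one)."""
    prefix, sep, dataset_id = dataset_uuid.partition(SEPARATOR)
    if not sep:
        raise ValueError(f"no {SEPARATOR!r} separator in {dataset_uuid!r}")
    return prefix, dataset_id


dataset_uuid = make_dataset_uuid("my_cube", "seed")
```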

Logical Partition¶

Partition that was created by the partition_by arguments to the Query.
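
Conceptually, a logical partition can be pictured as the group of rows that share the same values in the partition_by columns. The following plain-Python sketch only illustrates that idea; it does not show how kartothek computes partitions, and the helper name and sample columns are assumptions.

```python
from collections import defaultdict


def logical_partitions(rows, partition_by):
    """Group rows by their values in the partition_by columns."""
    groups = defaultdict(list)
    for row in rows:
        key = tuple(row[column] for column in partition_by)
        groups[key].append(row)
    return dict(groups)


# Hypothetical query result, partitioned by a made-up "country" column.
rows = [
    {"country": "DE", "city": "Hamburg", "v": 1},
    {"country": "DE", "city": "Berlin", "v": 2},
    {"country": "UK", "city": "London", "v": 3},
]
partitions = logical_partitions(rows, partition_by=["country"])
```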

Physical Partition¶

A single chunk of data that is stored to the blob store. May contain multiple Parquet files.

Partition Column¶

DataFrame column that contains one part of what makes up a Physical Partition.

Partition Columns¶

Ordered list of all Partition Columns of a Cube.

Projection¶

Process of dimension reduction of a cube (like a 3D object casting a shadow on the wall). Only works if the involved payload exists solely in the lower-dimensional subspace, since no automatic aggregation is supported.
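
The restriction can be illustrated with a small plain-Python sketch. It is a conceptual model only, not kartothek's implementation: the function `project` and the sample columns are hypothetical. Dropping a dimension succeeds when the payload is fully determined by the remaining dimensions, and fails when rows would have to be aggregated.

```python
def project(rows, keep_dims, payload_cols):
    """Drop all dimensions not in keep_dims; fail if the payload is ambiguous."""
    projected = {}
    for row in rows:
        cell = tuple(row[d] for d in keep_dims)
        payload = tuple(row[p] for p in payload_cols)
        # The first payload seen for a cell is stored; later ones must match it.
        if projected.setdefault(cell, payload) != payload:
            raise ValueError(
                f"payload differs within projected cell {cell}; "
                "this would require aggregation"
            )
    return [
        {**dict(zip(keep_dims, cell)), **dict(zip(payload_cols, payload))}
        for cell, payload in projected.items()
    ]


rows = [
    {"x": 0, "y": 0, "x_label": "a", "v": 1},
    {"x": 0, "y": 1, "x_label": "a", "v": 2},
    {"x": 1, "y": 0, "x_label": "b", "v": 3},
]

# "x_label" depends only on "x", so projecting away "y" is well defined.
projected = project(rows, keep_dims=["x"], payload_cols=["x_label"])
```

Calling `project(rows, keep_dims=["x"], payload_cols=["v"])` raises a `ValueError` in this sketch, because `v` varies along the dropped dimension `y`.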

Seed¶

Dataset that provides the ground truth about which Cells are in a Cube.

Store Factory¶

A callable that does not take any arguments and creates a new simplekv store when called. Its type is Callable[[], simplekv.KeyValueStore].
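
The pattern can be sketched without any dependencies. `InMemoryStore` below is a hypothetical stand-in for a `simplekv.KeyValueStore` that mimics only `put` and `get`; in real code the factory would return an actual simplekv store. `functools.partial` binds the constructor arguments so that the resulting callable takes none.

```python
from functools import partial
from typing import Callable, Dict


class InMemoryStore:
    """Hypothetical stand-in for a simplekv.KeyValueStore (illustration only)."""

    def __init__(self, backing: Dict[str, bytes]) -> None:
        self._backing = backing

    def put(self, key: str, data: bytes) -> str:
        self._backing[key] = data
        return key

    def get(self, key: str) -> bytes:
        return self._backing[key]


# A shared dict plays the role of the external blob storage.
blob_storage: Dict[str, bytes] = {}

# Zero-argument callable; each call builds a new store object.
store_factory: Callable[[], InMemoryStore] = partial(InMemoryStore, blob_storage)

store_a = store_factory()
store_b = store_factory()
store_a.put("key", b"value")
```

Both stores are distinct objects but talk to the same backing storage, so `store_b.get("key")` returns the data written through `store_a`.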

Query¶

A request for data from the cube, including things like “payload columns”, “conditions”, and more.

Query Execution¶

Process of reading out data from a Cube, aka the execution of a Query.

Query Intention¶

The actual intention of a Query, e.g.:

If the user queries “all columns”, the intention includes the concrete set of columns.
If the user does not specify the dimension columns, the intention uses the cube's Dimension Columns (aka “no Projection”).
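
The two cases above can be sketched as a small resolution step. The function `resolve_intention`, its parameters, and the convention that `None` means "not specified" are assumptions made for this illustration and are not taken from the kartothek API.

```python
def resolve_intention(
    cube_dimension_columns,
    available_columns,
    payload_columns=None,
    dimension_columns=None,
):
    """Turn an underspecified query into concrete column lists."""
    # No dimension columns given: use the cube's own ones (no projection).
    if dimension_columns is None:
        resolved_dims = list(cube_dimension_columns)
    else:
        resolved_dims = list(dimension_columns)

    # No payload columns given ("all columns"): list every non-dimension column.
    if payload_columns is None:
        resolved_payload = sorted(
            set(available_columns) - set(cube_dimension_columns)
        )
    else:
        resolved_payload = sorted(payload_columns)

    return {
        "dimension_columns": resolved_dims,
        "payload_columns": resolved_payload,
    }


# Hypothetical cube with dimensions x, y and payload columns v1, v2.
intention = resolve_intention(["x", "y"], ["x", "y", "v1", "v2"])
```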

Uuid Prefix¶

Common prefix for all datasets that belong to a Cube.