- Build
Process of creating a new cube.
- Cell
A unique combination of Dimension values. Results in a single row in the input and output DataFrames.
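The "one row per combination of Dimension values" rule can be illustrated with a small check. This is a plain-Python sketch, not kartothek code: the column names `x`, `y`, `v` and the helpers `cell_address` and `duplicated_cells` are hypothetical.

```python
from collections import Counter

# Hypothetical cube with two dimension columns; "v" is a payload column.
dimension_columns = ("x", "y")
rows = [
    {"x": 0, "y": "a", "v": 1.5},
    {"x": 0, "y": "b", "v": 2.5},
    {"x": 1, "y": "a", "v": 3.5},
]


def cell_address(row, dims):
    """Return the tuple of dimension values that identifies the row's cell."""
    return tuple(row[d] for d in dims)


def duplicated_cells(rows, dims):
    """Return the cell addresses that occur in more than one row."""
    counts = Counter(cell_address(r, dims) for r in rows)
    return sorted(addr for addr, n in counts.items() if n > 1)


# Every combination of (x, y) appears once, so no address is reported.
print(duplicated_cells(rows, dimension_columns))
```

If the first row were appended a second time, its address `(0, "a")` would be reported, because two rows would then claim the same cell.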
- Cube
A combination of multiple datasets that model a Data Cube-like construct. The core data structure of kartothek cube.
- Dataset ID
The ID of a dataset that belongs to the cube, without any Uuid Prefix.
- Dimension Column
DataFrame column that contains values for a certain Dimension.
- Dimension Columns
- Extend
Process of adding new datasets to an existing cube.
- Index Column
Column for which additional index structures are built.
- Kartothek Dataset UUID
- Logical Partition
Partition that was created by partition_by arguments to the Query.
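The grouping idea behind a logical partition can be sketched in plain Python. This does not reproduce kartothek's query implementation; `logical_partitions` and the column names are hypothetical and only show how rows split by the values of the `partition_by` columns.

```python
from collections import defaultdict


def logical_partitions(rows, partition_by):
    """Group rows by the values of the partition_by columns.

    Each distinct combination of values yields one logical partition,
    regardless of how the rows were physically stored.
    """
    groups = defaultdict(list)
    for row in rows:
        key = tuple(row[col] for col in partition_by)
        groups[key].append(row)
    return dict(groups)


rows = [
    {"country": "DE", "day": 1, "v": 10},
    {"country": "DE", "day": 2, "v": 11},
    {"country": "UK", "day": 1, "v": 20},
]

for key, part in logical_partitions(rows, ["country"]).items():
    print(key, len(part))
```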
- Physical Partition
A single chunk of data that is stored to the blob store. May contain multiple Parquet files.
- Partition Column
DataFrame column that contains one part that makes a Physical Partition.
- Partition Columns
- Projection
Process of dimension reduction of a cube (like a 3D object projecting a shadow onto a wall). Only works if the involved payload only exists in the subdimensional space, since no automatic aggregation is supported.
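Why projection cannot aggregate can be seen in a small sketch. It assumes a simplified model in which a payload column "exists in the subdimensional space" when its value is the same for every row that maps to the same projected cell; the function `project` and all column names are hypothetical, and kartothek's actual check may differ.

```python
def project(rows, dimension_columns, payload_columns):
    """Project rows onto a subset of dimensions without aggregating.

    Raises ValueError if the payload differs between rows that fall into
    the same projected cell, since that would require an aggregation.
    """
    projected = {}
    for row in rows:
        address = tuple(row[d] for d in dimension_columns)
        payload = tuple(row[p] for p in payload_columns)
        # setdefault stores the first payload seen for this address.
        if projected.setdefault(address, payload) != payload:
            raise ValueError(
                f"payload differs within projected cell {address!r}; "
                "aggregation would be required"
            )
    names = tuple(dimension_columns) + tuple(payload_columns)
    return [dict(zip(names, addr + pay)) for addr, pay in projected.items()]


rows = [
    {"x": 0, "y": 0, "x_name": "zero", "v": 1.0},
    {"x": 0, "y": 1, "x_name": "zero", "v": 2.0},
    {"x": 1, "y": 0, "x_name": "one", "v": 3.0},
]

# "x_name" depends only on "x", so projecting onto ("x",) is well defined.
print(project(rows, ("x",), ("x_name",)))
```

Projecting the same rows onto `("x",)` with payload `("v",)` raises `ValueError` in this sketch, because `v` varies along `y` and there is no rule for combining the two values.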
- Store Factory
A callable that does not take any arguments and creates a new simplekv store when being called. Its type is
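The factory pattern described here can be sketched without any third-party dependency. `InMemoryStore` below is a hypothetical stand-in for a simplekv store (with simplekv installed, a class such as `simplekv.memory.DictStore` could take its place), and the `StoreFactory` alias is an illustrative annotation, not a name taken from the library.

```python
from functools import partial
from typing import Callable, Dict


class InMemoryStore:
    """Hypothetical stand-in for a simplekv store (put/get only)."""

    def __init__(self, name: str = "default") -> None:
        self.name = name
        self._data: Dict[str, bytes] = {}

    def put(self, key: str, data: bytes) -> str:
        self._data[key] = data
        return key

    def get(self, key: str) -> bytes:
        return self._data[key]


# Illustrative alias: a callable without arguments that returns a store.
StoreFactory = Callable[[], InMemoryStore]

# functools.partial binds the constructor arguments up front, so the
# resulting callable takes no arguments.
store_factory: StoreFactory = partial(InMemoryStore, name="cube-store")

store_a = store_factory()
store_b = store_factory()
store_a.put("key", b"value")
print(store_a.get("key"), store_a is store_b)
```

Passing a factory rather than a store instance means each caller creates its own store object on demand, which is convenient when the callable has to be handed to worker processes.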
- Query
A request for data from the cube, including things like “payload columns”, “conditions”, and more.
- Query Execution
- Query Intention
The actual intention of a Query, e.g.:
  - if the user queries “all columns”, the intention includes the concrete set of columns
  - if the user does not specify the dimension columns, the intention uses the cube's Dimension Columns (aka “no Projection”)
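The two cases above amount to replacing "not specified" with concrete values. The following is a minimal sketch of that resolution step; the `Intention` class, the `resolve_intention` function, and the convention that `None` means "not specified" are assumptions made for illustration and do not reflect kartothek's API.

```python
from dataclasses import dataclass
from typing import Optional, Sequence, Tuple


@dataclass(frozen=True)
class Intention:
    """Hypothetical container for the resolved parts of a query."""

    dimension_columns: Tuple[str, ...]
    columns: Tuple[str, ...]


def resolve_intention(
    cube_dimension_columns: Sequence[str],
    cube_columns: Sequence[str],
    dimension_columns: Optional[Sequence[str]] = None,
    columns: Optional[Sequence[str]] = None,
) -> Intention:
    """Turn optional user input into concrete column sets."""
    # No dimension columns given: use the cube's own ("no Projection").
    if dimension_columns is None:
        dimension_columns = cube_dimension_columns
    # No columns given ("all columns"): list them explicitly.
    if columns is None:
        columns = cube_columns
    return Intention(tuple(dimension_columns), tuple(columns))


intention = resolve_intention(
    cube_dimension_columns=["x", "y"],
    cube_columns=["x", "y", "v1", "v2"],
)
print(intention)
```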
- Uuid Prefix
Common prefix for all datasets that belong to a Cube.
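How a Uuid Prefix and a Dataset ID might combine into one store-wide dataset name can be sketched as follows. The separator `++`, the helper names, and the example values are assumptions for illustration; the library's own constants should be consulted for the actual format.

```python
# Assumption: prefix and dataset ID are joined by a fixed separator.
UUID_SEPARATOR = "++"


def make_dataset_uuid(uuid_prefix: str, dataset_id: str) -> str:
    """Join the cube-wide prefix and a dataset ID into one name."""
    if UUID_SEPARATOR in uuid_prefix or UUID_SEPARATOR in dataset_id:
        raise ValueError("separator must not occur in prefix or dataset ID")
    return f"{uuid_prefix}{UUID_SEPARATOR}{dataset_id}"


def split_dataset_uuid(dataset_uuid: str) -> tuple:
    """Recover (uuid_prefix, dataset_id) from a joined name."""
    prefix, sep, dataset_id = dataset_uuid.partition(UUID_SEPARATOR)
    if not sep:
        raise ValueError(f"not a prefixed dataset name: {dataset_uuid!r}")
    return prefix, dataset_id


# Hypothetical cube "my_cube" with two datasets.
names = [make_dataset_uuid("my_cube", ds) for ds in ("seed", "enrich")]
print(names)
```

Because all datasets of a cube share the prefix, listing the names that start with the prefix plus separator is enough to find every dataset that belongs to the cube under this assumed scheme.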