kartothek.io_components.cube.stats module

kartothek.io_components.cube.stats.collect_stats_block(metapartitions, store)

Gather statistics data for multiple metapartitions.

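Parameters

metapartitions – Metapartitions to gather statistics for.

store – Store to read the metapartition data from.

Returns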

stats – Statistics per ktk_cube dataset ID.

Return type

Dict[str, Dict[str, int]]

kartothek.io_components.cube.stats.get_metapartitions_for_stats(datasets)

Get all metapartitions that need to be scanned to gather cube stats.

Parameters

datasets (Dict[str, kartothek.core.dataset.DatasetMetadata]) – Datasets that are present.

Returns

metapartitions – Pre-aligned metapartitions (by primary index / physical partitions) and the ktk_cube dataset ID belonging to them.

Return type

Tuple[Tuple[str, Tuple[kartothek.io_components.metapartition.MetaPartition, ...]], ...]

kartothek.io_components.cube.stats.reduce_stats(stats_iter)

Sum up stats data.

Parameters

stats_iter (Iterable[Dict[str, Dict[str, int]]]) – Iterable of stats objects, either resulting from collect_stats_block() or previous reduce_stats() calls.

Returns

stats – Statistics per ktk_cube dataset ID.

Return type

Dict[str, Dict[str, int]]
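The summing behaviour described for reduce_stats() can be illustrated with a minimal, self-contained sketch. It does not import kartothek: sum_stats() is a hypothetical stand-in, and the dataset IDs and counter names ("seed", "enrich", "rows", "files") are made up for illustration. That the real function adds counters key by key in exactly this way is an assumption based on the description above.

```python
from collections import defaultdict
from typing import Dict, Iterable

Stats = Dict[str, Dict[str, int]]


def sum_stats(stats_iter: Iterable[Stats]) -> Stats:
    """Hypothetical stand-in for reduce_stats(): add counters key by key."""
    totals: Dict[str, Dict[str, int]] = defaultdict(lambda: defaultdict(int))
    for stats in stats_iter:
        for dataset_id, counters in stats.items():
            for name, value in counters.items():
                totals[dataset_id][name] += value
    # Convert the nested defaultdicts back to plain dicts.
    return {dataset_id: dict(counters) for dataset_id, counters in totals.items()}


# Made-up per-block results, shaped like Dict[str, Dict[str, int]].
block_a = {"seed": {"rows": 10, "files": 1}}
block_b = {"seed": {"rows": 5, "files": 1}, "enrich": {"rows": 7, "files": 2}}
block_c = {"enrich": {"rows": 3, "files": 1}}

# Reduce two blocks, then feed the partial result back in with a third block.
partial = sum_stats([block_a, block_b])
total = sum_stats([partial, block_c])
print(total)
```

Because the input and output share the same shape, partial results can be reduced again, which matches the note that stats_iter may contain the output of previous reduce_stats() calls.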