kartothek.io.testing.update_cube module

kartothek.io.testing.update_cube.test_compression_is_compatible_on_update_cube(driver, function_store)[source]

Test that partitions written with different compression algorithms are compatible

The compression algorithms are not parametrized because their availability depends on the Arrow build. ‘SNAPPY’ and ‘GZIP’ are already assumed to be available in parts of the code. A fully parametrized test would also increase runtime and test complexity unnecessarily.

kartothek.io.testing.update_cube.test_cube_blacklist_dimension_index(function_store, driver)[source]
kartothek.io.testing.update_cube.test_cube_update_secondary_indices_subset(function_store, driver)[source]
kartothek.io.testing.update_cube.test_rowgroups_are_applied_when_df_serializer_is_passed_to_update_cube(driver, function_store, chunk_size_build, chunk_size_update)[source]

Test that the dataset is split into row groups depending on the chunk size

Partitions built with chunk_size=None should keep a single row group if they are not touched by the update. Partitions that are newly created or replaced with chunk_size>0 should be split into row groups accordingly.
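The expected chunking behavior can be sketched independently of kartothek. The helper below is a hypothetical illustration of the semantics the test asserts, not kartothek's actual serializer code: chunk_size=None keeps a partition as a single row group, while chunk_size=n splits its rows into groups of at most n rows.

```python
from typing import List, Optional


def expected_row_group_sizes(num_rows: int, chunk_size: Optional[int]) -> List[int]:
    """Return the expected row-group sizes for a partition.

    chunk_size=None keeps the partition as one row group;
    chunk_size=n splits it into groups of at most n rows.
    (Hypothetical helper mirroring the tested behavior.)
    """
    if chunk_size is None:
        return [num_rows]
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive or None")
    return [
        min(chunk_size, num_rows - start)
        for start in range(0, num_rows, chunk_size)
    ]


# A partition built with chunk_size=None stays a single row group:
print(expected_row_group_sizes(10, None))  # [10]
# A partition written with chunk_size=3 is split accordingly:
print(expected_row_group_sizes(10, 3))     # [3, 3, 3, 1]
```

Partitions untouched by the update keep whatever layout they were built with; only newly written or replaced partitions follow the chunk_size of the update call.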

kartothek.io.testing.update_cube.test_single_rowgroup_when_df_serializer_is_not_passed_to_update_cube(driver, function_store)[source]

Test that the dataset has a single row group as default path

kartothek.io.testing.update_cube.test_update_partitions(driver, function_store, remove_partitions, new_partitions)[source]
kartothek.io.testing.update_cube.test_update_respects_ktk_cube_dataset_ids(driver, function_store, ktk_cube_dataset_ids)[source]