kartothek.io.testing.append_cube module

kartothek.io.testing.append_cube.existing_cube(function_store)[source]
kartothek.io.testing.append_cube.test_append_partitions(driver, function_store, existing_cube)[source]
kartothek.io.testing.append_cube.test_append_partitions_no_ts(driver, function_store)[source]
kartothek.io.testing.append_cube.test_compression_is_compatible_on_append_cube(driver, function_store)[source]

Test that partitions written with different compression algorithms are compatible.

The compression algorithms are not parametrized because their availability depends on the arrow build. ‘SNAPPY’ and ‘GZIP’ are already assumed to be available in parts of the code. A fully parametrized test would also increase runtime and test complexity unnecessarily.
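A minimal sketch of the behaviour this test covers (not taken from the test module itself): a cube built with SNAPPY-compressed Parquet files can be appended to with GZIP-compressed files. The column names, uuid_prefix, the in-memory store factory, and the availability of the df_serializer argument on build_cube/append_to_cube in your kartothek version are assumptions made for illustration:

    from functools import partial

    import pandas as pd
    from storefact import get_store_from_url

    from kartothek.core.cube.cube import Cube
    from kartothek.io.eager_cube import append_to_cube, build_cube
    from kartothek.serialization import ParquetSerializer

    # Assumed store factory: any simplekv-compatible factory works here.
    store_factory = partial(get_store_from_url, "hmemory://")

    cube = Cube(
        dimension_columns=["x"],
        partition_columns=["p"],
        uuid_prefix="compression_cube",
    )

    # Build the seed dataset with SNAPPY-compressed Parquet files.
    build_cube(
        data=pd.DataFrame({"x": [0, 1], "p": [0, 0], "v": [10, 11]}),
        cube=cube,
        store=store_factory,
        df_serializer=ParquetSerializer(compression="SNAPPY"),
    )

    # Append a new partition compressed with GZIP; both codecs must coexist
    # within the same dataset.
    append_to_cube(
        data=pd.DataFrame({"x": [2, 3], "p": [1, 1], "v": [12, 13]}),
        cube=cube,
        store=store_factory,
        df_serializer=ParquetSerializer(compression="GZIP"),
    )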

kartothek.io.testing.append_cube.test_fails_incompatible_dtypes(driver, function_store, existing_cube)[source]

Should also cross-check with the seed dataset.

kartothek.io.testing.append_cube.test_fails_missing_column(driver, function_store, existing_cube)[source]
kartothek.io.testing.append_cube.test_fails_unknown_dataset(driver, function_store, existing_cube)[source]
kartothek.io.testing.append_cube.test_indices(driver, function_store, existing_cube)[source]
kartothek.io.testing.append_cube.test_metadata(driver, function_store, existing_cube)[source]

Test auto- and user-generated metadata.

kartothek.io.testing.append_cube.test_rowgroups_are_applied_when_df_serializer_is_passed_to_append_cube(driver, function_store, chunk_size_build, chunk_size_append)[source]

Test that the dataset is split into row groups depending on the chunk size.

Partitions built with chunk_size=None should keep a single row group after the append. Partitions that are newly created with chunk_size>0 should be split into row groups accordingly.
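A minimal sketch of the row-group behaviour this test covers, again with illustrative names (store_factory, column names, uuid_prefix) and under the assumption that build_cube/append_to_cube accept a df_serializer argument in the kartothek version at hand:

    from functools import partial

    import pandas as pd
    from storefact import get_store_from_url

    from kartothek.core.cube.cube import Cube
    from kartothek.io.eager_cube import append_to_cube, build_cube
    from kartothek.serialization import ParquetSerializer

    store_factory = partial(get_store_from_url, "hmemory://")  # assumed in-memory store
    cube = Cube(dimension_columns=["x"], partition_columns=["p"], uuid_prefix="rowgroup_cube")

    # Built without a chunk_size: the existing partition keeps a single row group.
    build_cube(
        data=pd.DataFrame({"x": [0, 1], "p": [0, 0], "v": [10, 11]}),
        cube=cube,
        store=store_factory,
        df_serializer=ParquetSerializer(chunk_size=None),
    )

    # Appended with chunk_size=2: the new 4-row partition is written as two
    # row groups of two rows each, while the old partition stays untouched.
    append_to_cube(
        data=pd.DataFrame({"x": [2, 3, 4, 5], "p": [1] * 4, "v": [1, 2, 3, 4]}),
        cube=cube,
        store=store_factory,
        df_serializer=ParquetSerializer(chunk_size=2),
    )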

kartothek.io.testing.append_cube.test_single_rowgroup_when_df_serializer_is_not_passed_to_append_cube(driver, function_store)[source]

Test that the dataset is written with a single row group on the default path (no df_serializer passed).