kartothek.io.testing.build_cube module

kartothek.io.testing.build_cube.test_accept_projected_duplicates(driver, function_store)[source]

Otherwise, partitioning does not work with projected data.

kartothek.io.testing.build_cube.test_distinct_branches(driver, function_store)[source]

Just check this actually works.

kartothek.io.testing.build_cube.test_do_not_modify_df(driver, function_store)[source]

Functions should not modify their inputs.

kartothek.io.testing.build_cube.test_empty_df(driver, function_store, empty_first)[source]

Might happen during DB queries.

kartothek.io.testing.build_cube.test_fail_all_empty(driver, driver_name, function_store)[source]

Might happen due to DB-based filters.

kartothek.io.testing.build_cube.test_fail_duplicates_global(driver_name, driver, function_store)[source]

Might happen due to bugs.

kartothek.io.testing.build_cube.test_fail_duplicates_local(driver, driver_name, function_store)[source]

Might happen during DB queries.

kartothek.io.testing.build_cube.test_fail_no_store_factory(driver, function_store, skip_eager)[source]
kartothek.io.testing.build_cube.test_fail_nondistinc_payload(driver, function_store)[source]

This would lead to problems during the query phase.

kartothek.io.testing.build_cube.test_fail_not_a_df(driver, function_store)[source]

Pass some weird objects in.

kartothek.io.testing.build_cube.test_fail_partial_build(driver, function_store)[source]

Either overwrite all or no datasets.

kartothek.io.testing.build_cube.test_fail_partial_overwrite(driver, function_store)[source]

Either overwrite all or no datasets.

kartothek.io.testing.build_cube.test_fail_partition_on_1(driver, function_store)[source]
kartothek.io.testing.build_cube.test_fail_partition_on_3(driver, function_store)[source]
kartothek.io.testing.build_cube.test_fail_partition_on_4(driver, function_store)[source]
kartothek.io.testing.build_cube.test_fail_partition_on_nondistinc_payload(driver, function_store)[source]

This would lead to problems during the query phase.

kartothek.io.testing.build_cube.test_fail_sparse(driver, driver_name, function_store)[source]

Ensure that sparse dataframes are rejected.

kartothek.io.testing.build_cube.test_fail_wrong_dataset_ids(driver, function_store, skip_eager, driver_name)[source]
kartothek.io.testing.build_cube.test_fail_wrong_types(driver, function_store)[source]

Might catch nasty pandas and other type bugs.

kartothek.io.testing.build_cube.test_fails_duplicate_columns(driver, function_store, driver_name)[source]

Catch weird pandas behavior.

kartothek.io.testing.build_cube.test_fails_metadata_nested_wrong_type(driver, function_store)[source]
kartothek.io.testing.build_cube.test_fails_metadata_unknown_id(driver, function_store)[source]
kartothek.io.testing.build_cube.test_fails_metadata_wrong_type(driver, function_store)[source]
kartothek.io.testing.build_cube.test_fails_missing_dimension_columns(driver, function_store)[source]

Ensure that we catch missing dimension columns early.

kartothek.io.testing.build_cube.test_fails_missing_partition_columns(driver, function_store)[source]

Just make the Kartothek error nicer.

kartothek.io.testing.build_cube.test_fails_missing_seed(driver, function_store)[source]

A cube must contain its seed dataset, check this constraint as early as possible.

kartothek.io.testing.build_cube.test_fails_no_dimension_columns(driver, function_store)[source]

Ensure that we catch missing dimension columns early.

kartothek.io.testing.build_cube.test_fails_null_dimension(driver, function_store)[source]

Since we do not allow NULL values in queries, it should be banned from dimension columns in the first place.

kartothek.io.testing.build_cube.test_fails_null_index(driver, function_store)[source]

Since we do not allow NULL values in queries, it should be banned from index columns in the first place.

kartothek.io.testing.build_cube.test_fails_null_partition(driver, function_store)[source]

Since we do not allow NULL values in queries, it should be banned from partition columns in the first place.

kartothek.io.testing.build_cube.test_fails_projected_duplicates(driver, driver_name, function_store)[source]

Test that the duplicate check also works with projected data (this was a regression).

kartothek.io.testing.build_cube.test_indices(driver, function_store)[source]

Test that index structures are created correctly.

kartothek.io.testing.build_cube.test_metadata(driver, function_store)[source]

Test auto- and user-generated metadata.

kartothek.io.testing.build_cube.test_nones(driver, function_store, none_first, driver_name)[source]

Test what happens if the user passes None to ktk_cube.

kartothek.io.testing.build_cube.test_overwrite(driver, function_store)[source]

Test overwrite behavior, i.e. calling the build function when the cube already exists.

kartothek.io.testing.build_cube.test_overwrite_rollback_ktk(driver, function_store)[source]

Checks that require a rollback (like overlapping columns) should recover the former state correctly.

kartothek.io.testing.build_cube.test_overwrite_rollback_ktk_cube(driver, function_store)[source]

Checks that require a rollback (like overlapping columns) should recover the former state correctly.

kartothek.io.testing.build_cube.test_parquet(driver, function_store)[source]

Ensure the parquet files we generate are properly normalized.

kartothek.io.testing.build_cube.test_partition_on_enrich_extra(driver, function_store)[source]
kartothek.io.testing.build_cube.test_partition_on_enrich_none(driver, function_store)[source]
kartothek.io.testing.build_cube.test_partition_on_index_column(driver, function_store)[source]
kartothek.io.testing.build_cube.test_projected_data(driver, function_store)[source]

Projected dataset (useful for de-duplication).

kartothek.io.testing.build_cube.test_regression_pseudo_duplicates(driver, function_store)[source]

Might happen due to bugs.

kartothek.io.testing.build_cube.test_rowgroups_are_applied_when_df_serializer_is_passed_to_build_cube(driver, function_store, chunk_size)[source]

Test that the dataset is split into row groups depending on the chunk size.

kartothek.io.testing.build_cube.test_simple_seed_only(driver, function_store)[source]

Simple integration test with a seed dataset only. This is the simplest way to create a cube.

kartothek.io.testing.build_cube.test_simple_two_datasets(driver, function_store)[source]

Simple integration test with two datasets.

kartothek.io.testing.build_cube.test_single_rowgroup_when_df_serializer_is_not_passed_to_build_cube(driver, function_store)[source]

Test that, by default, the dataset is written with a single row group.

kartothek.io.testing.build_cube.test_split(driver, function_store)[source]

Imagine the user already splits the data.