kartothek.io_components.utils module

This module is a collection of helper functions.

class kartothek.io_components.utils.InvalidObject

Bases: object

Sentinel used to mark keys for removal.

kartothek.io_components.utils.align_categories(dfs, categoricals)

Takes a list of dataframes with categorical columns and determines the superset of categories across them. All specified columns are then cast to the same pd.CategoricalDtype.

Parameters

dfs (List[pd.DataFrame]) – A list of dataframes for which the categoricals should be aligned
categoricals (List[str]) – Columns holding categoricals which should be aligned

Returns

A list with aligned dataframes

Return type

List[pd.DataFrame]
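
The alignment described above can be illustrated with plain pandas. This is a minimal sketch of the idea, not kartothek's implementation: the function name align_categories_sketch is hypothetical, and the category order (first occurrence across the input frames) is an assumption.

```python
import pandas as pd


def align_categories_sketch(dfs, categoricals):
    """Cast the given columns of all frames to one shared CategoricalDtype.

    Hypothetical helper for illustration only; not part of kartothek.
    """
    aligned = [df.copy() for df in dfs]  # leave the inputs untouched
    for column in categoricals:
        # Collect the superset of categories in order of first occurrence.
        categories = []
        seen = set()
        for df in dfs:
            for cat in df[column].astype("category").cat.categories:
                if cat not in seen:
                    seen.add(cat)
                    categories.append(cat)
        dtype = pd.CategoricalDtype(categories=categories)
        for df in aligned:
            df[column] = df[column].astype(dtype)
    return aligned


df1 = pd.DataFrame({"c": pd.Categorical(["a", "b"])})
df2 = pd.DataFrame({"c": pd.Categorical(["b", "c"])})
out = align_categories_sketch([df1, df2], ["c"])
print(out[0]["c"].dtype == out[1]["c"].dtype)
```

Sharing one dtype matters mainly when the frames are later concatenated: pandas keeps a column categorical only if all inputs carry identical categories.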

kartothek.io_components.utils.check_single_table_dataset(dataset, expected_table=None)

Raise if the given dataset is not a single-table dataset.

Parameters

dataset (kartothek.core.dataset.DatasetMetadata) – The dataset to be validated
expected_table (Optional[str]) – Ensure that the table in the dataset is the same as the given one.

Deprecated since version 5.3: check_single_table_dataset is deprecated and will be removed in 6.0.

kartothek.io_components.utils.combine_metadata(dataset_metadata: List[Dict], append_to_list: bool = True) → Dict

Merge a list of dictionaries.

The merge keeps only those keys that are present in all dictionaries.

If lists are encountered, the values of the result will be the concatenation of all list values in the order of the supplied dictionary list. This behaviour can be changed with append_to_list.

Parameters

dataset_metadata – The list of dictionaries (usually metadata) to be combined.
append_to_list – If True, all list values are concatenated. If False, only unique values are kept.
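
The documented merge rules can be sketched in pure Python. This is a simplified illustration under stated assumptions, not kartothek's code: the name merge_common_keys is hypothetical, nested dictionaries are not handled, and dropping a key whose non-list values conflict is an assumption made here.

```python
from typing import Any, Dict, List


def merge_common_keys(
    dicts: List[Dict[str, Any]], append_to_list: bool = True
) -> Dict[str, Any]:
    """Merge flat dictionaries, keeping only keys present in all of them.

    Hypothetical helper for illustration only; not part of kartothek.
    """
    if not dicts:
        return {}
    common = set(dicts[0])
    for d in dicts[1:]:
        common &= set(d)

    result: Dict[str, Any] = {}
    for key in dicts[0]:  # keep the key order of the first dictionary
        if key not in common:
            continue
        values = [d[key] for d in dicts]
        if all(isinstance(v, list) for v in values):
            # Concatenate in the order of the supplied dictionaries.
            merged = [item for v in values for item in v]
            if not append_to_list:
                # Keep unique items only (items must be hashable).
                merged = list(dict.fromkeys(merged))
            result[key] = merged
        elif all(v == values[0] for v in values):
            result[key] = values[0]
        # Assumption: conflicting non-list values are dropped.
    return result


a = {"creator": "x", "tags": ["a", "b"], "only_a": 1}
b = {"creator": "x", "tags": ["b", "c"], "version": 2}
print(merge_common_keys([a, b]))
print(merge_common_keys([a, b], append_to_list=False))
```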

kartothek.io_components.utils.extract_duplicates(lst)

Return all items of a list that occur more than once.

Parameters

lst (List[Any]) – The list to search for duplicates

Returns

The items that occur more than once

Return type

List[Any]
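
One way to compute such a result is with collections.Counter. The sketch below is an illustration of the described behaviour, not necessarily how kartothek implements it. The name find_duplicates is hypothetical, and returning each duplicated item once, in order of first occurrence, is an assumption.

```python
from collections import Counter
from typing import Any, List


def find_duplicates(lst: List[Any]) -> List[Any]:
    """Return each item that occurs more than once (items must be hashable).

    Hypothetical helper for illustration only; not part of kartothek.
    """
    counts = Counter(lst)
    return [item for item, count in counts.items() if count > 1]


print(find_duplicates(["a", "b", "a", "c", "b", "a"]))  # ['a', 'b']
```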

kartothek.io_components.utils.raise_if_indices_overlap(partition_on, secondary_indices)

kartothek.io_components.utils.sort_values_categorical(df: pandas.core.frame.DataFrame, columns: Union[List[str], str]) → pandas.core.frame.DataFrame

Sort a dataframe lexicographically by the categories of the given column or columns.
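
By default, pandas sorts a categorical column by the declared order of its categories rather than by their values. The sketch below shows one way to get a lexicographic sort instead. It is an illustration only: the name sort_by_sorted_categories is hypothetical, and copying the input and resetting the index are assumptions, not documented behaviour of kartothek.

```python
from typing import List, Union

import pandas as pd


def sort_by_sorted_categories(
    df: pd.DataFrame, columns: Union[List[str], str]
) -> pd.DataFrame:
    """Sort by categorical columns after putting their categories in sorted order.

    Hypothetical helper for illustration only; not part of kartothek.
    """
    if isinstance(columns, str):
        columns = [columns]
    df = df.copy()  # assumption: do not modify the caller's frame
    for col in columns:
        if isinstance(df[col].dtype, pd.CategoricalDtype):
            # Reorder the categories so the sort follows their values.
            df[col] = df[col].cat.reorder_categories(
                sorted(df[col].cat.categories), ordered=True
            )
    return df.sort_values(by=columns).reset_index(drop=True)


frame = pd.DataFrame(
    {
        "c": pd.Categorical(["b", "a", "c"], categories=["c", "b", "a"]),
        "v": [1, 2, 3],
    }
)
result = sort_by_sorted_categories(frame, "c")
print(result)
```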

kartothek.io_components.utils.validate_partition_keys(dataset_uuid, store, ds_factory, default_metadata_version, partition_on, load_dataset_metadata=True)