kartothek.io_components.utils module
This module is a collection of helper functions.
-
class kartothek.io_components.utils.InvalidObject
Bases: object
Sentinel to mark keys for removal.
-
kartothek.io_components.utils.align_categories(dfs, categoricals)
Takes a list of dataframes with categorical columns and determines the superset of categories. All specified columns are then cast to the same pd.CategoricalDtype.
- Parameters
dfs (List[pd.DataFrame]) – A list of dataframes for which the categoricals should be aligned
categoricals (List[str]) – Columns holding categoricals which should be aligned
- Returns
A list with aligned dataframes
- Return type
List[pd.DataFrame]
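The alignment described above can be illustrated with a minimal sketch. This is not the kartothek implementation: the helper name align_categories_sketch is hypothetical, and ordering the combined categories by first appearance is an assumption made only for this example.

```python
import pandas as pd


def align_categories_sketch(dfs, categoricals):
    """Hypothetical re-implementation for illustration only."""
    # Work on copies so the caller's dataframes stay untouched.
    aligned = [df.copy() for df in dfs]
    for column in categoricals:
        # Collect the superset of categories across all dataframes
        # (assumption: keep them in order of first appearance).
        categories = []
        for df in aligned:
            for cat in df[column].astype("category").cat.categories:
                if cat not in categories:
                    categories.append(cat)
        dtype = pd.CategoricalDtype(categories=categories)
        # Cast the column in every dataframe to the shared dtype.
        for df in aligned:
            df[column] = df[column].astype(dtype)
    return aligned


df1 = pd.DataFrame({"color": pd.Categorical(["red", "blue"])})
df2 = pd.DataFrame({"color": pd.Categorical(["green", "red"])})
out1, out2 = align_categories_sketch([df1, df2], ["color"])
```

After the call, both output columns share one dtype whose categories cover red, blue and green, while the values in each dataframe are unchanged.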
-
kartothek.io_components.utils.check_single_table_dataset(dataset, expected_table=None)
Raise if the given dataset is not a single-table dataset.
- Parameters
dataset (kartothek.core.dataset.DatasetMetadata) – The dataset to be validated
expected_table (Optional[str]) – Ensure that the table in the dataset is the same as the given one.
Deprecated since version 5.3: This will be removed in 6.0. The check_single_table_dataset keyword is deprecated and will be removed.
-
kartothek.io_components.utils.combine_metadata(dataset_metadata: List[Dict], append_to_list: bool = True) → Dict
Merge a list of dictionaries.
The merge keeps only those keys that are present in all dictionaries.
If lists are encountered, the values of the result are the concatenation of all list values, in the order of the supplied dictionary list. This behaviour can be changed with append_to_list.
- Parameters
dataset_metadata – The list of dictionaries (usually metadata) to be combined.
append_to_list – If True, all list values are concatenated. If False, only unique values are kept.
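The merge rules above can be sketched in plain Python. This is a simplified illustration, not the library's code: the name combine_metadata_sketch is hypothetical, nested dictionaries are not handled, and dropping keys whose non-list values disagree is an assumption of this sketch.

```python
from typing import Any, Dict, List


def combine_metadata_sketch(
    dicts: List[Dict[str, Any]], append_to_list: bool = True
) -> Dict[str, Any]:
    """Hypothetical, flat-dictionary illustration of the merge rules."""
    if not dicts:
        return {}
    # Keep only keys that are present in every dictionary.
    common = [k for k in dicts[0] if all(k in d for d in dicts[1:])]
    result: Dict[str, Any] = {}
    for key in common:
        values = [d[key] for d in dicts]
        if all(isinstance(v, list) for v in values):
            # Concatenate lists in the order of the supplied dictionaries.
            merged = [item for v in values for item in v]
            if not append_to_list:
                # Keep only unique values, preserving first-seen order.
                unique = []
                for item in merged:
                    if item not in unique:
                        unique.append(item)
                merged = unique
            result[key] = merged
        elif all(v == values[0] for v in values[1:]):
            result[key] = values[0]
        # Assumption: conflicting non-list values are dropped.
    return result


a = {"creator": "etl", "tags": ["x", "y"], "only_a": 1}
b = {"creator": "etl", "tags": ["y", "z"], "only_b": 2}
combined = combine_metadata_sketch([a, b])
deduplicated = combine_metadata_sketch([a, b], append_to_list=False)
```

Here combined holds the tags x, y, y, z, deduplicated holds x, y, z, and the keys only_a and only_b are absent from both results because they do not appear in every input.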
-
kartothek.io_components.utils.extract_duplicates(lst)
Return all items of a list that occur more than once.
- Parameters
lst (List[Any]) – The list to search for duplicates
- Returns
The items of lst that occur more than once
- Return type
List[Any]
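A minimal sketch of duplicate extraction with collections.Counter follows. The helper name duplicates_sketch is hypothetical, and returning each duplicated item once, in order of first occurrence, is an assumption of this example rather than documented behaviour.

```python
from collections import Counter
from typing import Any, List


def duplicates_sketch(lst: List[Any]) -> List[Any]:
    """Hypothetical illustration: items that occur more than once."""
    # Counter keeps keys in order of first occurrence (Python 3.7+).
    counts = Counter(lst)
    # Assumption: report each duplicated item a single time.
    return [item for item, count in counts.items() if count > 1]


result = duplicates_sketch(["a", "b", "a", "c", "b", "a"])
no_duplicates = duplicates_sketch([1, 2, 3])
```

Note that this approach requires hashable items.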
-
kartothek.io_components.utils.sort_values_categorical(df: pandas.core.frame.DataFrame, columns: Union[List[str], str]) → pandas.core.frame.DataFrame
Sort a dataframe lexicographically by the categories of the given column or columns.
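The difference between sorting by category order and sorting lexicographically by category can be shown with a short sketch. The helper name sort_by_categories_sketch is hypothetical, and both copying the input and resetting the index are choices made for this example, not confirmed behaviour of the library function.

```python
from typing import List, Union

import pandas as pd


def sort_by_categories_sketch(
    df: pd.DataFrame, columns: Union[List[str], str]
) -> pd.DataFrame:
    """Hypothetical illustration of a lexicographic categorical sort."""
    if isinstance(columns, str):
        columns = [columns]
    df = df.copy()
    for col in columns:
        if isinstance(df[col].dtype, pd.CategoricalDtype):
            # Reorder the categories lexicographically so that sorting
            # follows the sorted labels, not the stored category order.
            df[col] = df[col].cat.reorder_categories(
                sorted(df[col].cat.categories), ordered=True
            )
    # Assumption: return a fresh 0..n-1 index after sorting.
    return df.sort_values(by=columns).reset_index(drop=True)


frame = pd.DataFrame(
    {
        "key": pd.Categorical(["b", "a", "c"], categories=["c", "b", "a"]),
        "value": [1, 2, 3],
    }
)
sorted_frame = sort_by_categories_sketch(frame, "key")
```

A plain sort_values on the key column would follow the stored category order c, b, a. The sketch orders the rows a, b, c instead.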