kartothek.core.cube.conditions module

The condition sublanguage.

kartothek.core.cube.conditions.C

alias of kartothek.core.cube.conditions.VirtualColumn

class kartothek.core.cube.conditions.Condition(column)[source]

Bases: object

An abstract condition on a column.

Multiple conditions may be combined using &:

(C('a') == 1) & (C('b') == 2)

Parameters

column (str) – Column name.

filter_df(df)[source]

Filter the given DataFrame with the condition.

Parameters

df (pandas.DataFrame) – DataFrame to evaluate on; must contain the required column.

Returns

result – Part of the DataFrame for which the condition holds.

Return type

pandas.DataFrame

static from_string(s, all_types)[source]

Parse string as condition object.

Parameters
  • s (str) – String to parse.

  • all_types (Dict[str, pyarrow.DataType]) – Mapping from all known columns to pyarrow types.

Returns

condition – Parsed condition.

Return type

Condition

Raises

ValueError – If the condition cannot be parsed.

class kartothek.core.cube.conditions.Conjunction(conditions)[source]

Bases: object

Conjunction of multiple Condition objects.

Parameters

conditions (Tuple[Condition]) – Tuple of conditions that must all be satisfied at the same time. Can address multiple columns.

property columns

Columns that are checked by this conjunction.

filter_df(df)[source]

Filter the given DataFrame with the conjunction.

NULL values are always treated as non-matching.

Parameters

df (pandas.DataFrame) – DataFrame to evaluate on; must contain the required columns.

Returns

result – Part of the DataFrame for which the conjunction holds.

Return type

pandas.DataFrame
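The NULL rule described above can be illustrated without pandas. The following is a minimal plain-Python sketch that works on a list of dict rows; `filter_rows`, `OPS` and the `(column, op, value)` tuples are hypothetical stand-ins, not kartothek's implementation.

```python
import operator

# Hypothetical mapping from operator strings to Python comparisons.
OPS = {
    "==": operator.eq,
    "!=": operator.ne,
    "<": operator.lt,
    "<=": operator.le,
    ">": operator.gt,
    ">=": operator.ge,
}


def filter_rows(rows, conditions):
    """Keep the rows for which every (column, op, value) condition holds."""
    kept = []
    for row in rows:
        ok = True
        for column, op, value in conditions:
            cell = row[column]
            # A missing value never matches, whatever the operator is.
            if cell is None or not OPS[op](cell, value):
                ok = False
                break
        if ok:
            kept.append(row)
    return kept


rows = [
    {"x": 1, "y": "foo"},
    {"x": 5, "y": None},
    {"x": 7, "y": "bar"},
    {"x": None, "y": "bar"},
]
result = filter_rows(rows, [("x", ">", 1), ("y", "!=", "foo")])
print(result)  # [{'x': 7, 'y': 'bar'}]
```

Note that the second row is dropped even though its condition is an inequality: a NULL in `y` counts as non-matching rather than as "different from foo".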

static from_jsonarray(array)[source]

Recover conjunction from JSON-compatible array.

Parameters

array (List[Dict[str, Any]]) – JSON-compatible array.

Returns

conjunction – Recovered conjunction.

Return type

Conjunction

Raises
  • TypeError – If a wrong or unknown condition type was passed.

  • ValueError – If the "type" attribute within a condition is missing.

See also

to_jsonarray

Creates array, illustrates format.
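The error contract listed under Raises can be sketched as a small validator. Only the two exception classes come from the text above; the function `check_entry`, the `KNOWN_TYPES` set and the order of the checks are assumptions made for illustration.

```python
# Hypothetical subset of type names, taken from class names in this module.
KNOWN_TYPES = {"EqualityCondition", "GreaterThanCondition", "IsInCondition"}


def check_entry(entry):
    """Validate one array entry before it is turned into a condition."""
    if "type" not in entry:
        raise ValueError("condition entry has no 'type' attribute")
    if entry["type"] not in KNOWN_TYPES:
        raise TypeError("unknown condition type: {!r}".format(entry["type"]))
    return entry["type"]


array = [
    {"column": "x", "type": "GreaterThanCondition", "value": 1},
    {"column": "y", "value": 2},  # "type" is missing
    {"column": "z", "type": "Bogus", "value": 3},  # unknown type
]

outcomes = []
for entry in array:
    try:
        outcomes.append(check_entry(entry))
    except (TypeError, ValueError) as exc:
        outcomes.append(type(exc).__name__)
print(outcomes)  # ['GreaterThanCondition', 'ValueError', 'TypeError']
```

Callers that load arrays from untrusted files would typically catch both exception types around the real from_jsonarray() call in the same way.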

static from_string(s, all_types)[source]

Parse string as conjunction object.

Important

This is intended for human interaction (e.g. CLIs). Do not use it to serialize and deserialize conditions, since it does not support all conditions and is not guaranteed to be roundtrip-safe. For serialization, use to_jsonarray() and from_jsonarray() instead.

Parameters
  • s (str) – String to parse.

  • all_types (Dict[str, pyarrow.DataType]) – Mapping from all known columns to pyarrow types.

Returns

conjunction – Parsed conjunction.

Return type

Conjunction

Raises

ValueError – If the condition cannot be parsed.

classmethod from_two(left: Union[kartothek.core.cube.conditions.Condition, kartothek.core.cube.conditions.Conjunction], right: Union[kartothek.core.cube.conditions.Condition, kartothek.core.cube.conditions.Conjunction]) → kartothek.core.cube.conditions.Conjunction[source]

Create conjunction from two elements.

Parameters
  • left – Left part.

  • right – Right part.

Returns

conjunction – Conjunction of the two given parts.

Return type

Conjunction
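Since both arguments may be either a single condition or an existing conjunction, one plausible way to build the result is to flatten both sides into a single tuple. The sketch below models that idea, together with the `&` operator; `SketchCondition` and `SketchConjunction` are hypothetical stand-ins, and the flattening behaviour is an assumption rather than something stated in this reference.

```python
from dataclasses import dataclass
from typing import Any, Tuple


@dataclass(frozen=True)
class SketchCondition:
    """Hypothetical stand-in for a single-column condition."""

    column: str
    op: str
    value: Any

    def __and__(self, other):
        return SketchConjunction.from_two(self, other)


@dataclass(frozen=True)
class SketchConjunction:
    """Hypothetical stand-in holding a flat tuple of conditions."""

    conditions: Tuple[SketchCondition, ...]

    @classmethod
    def from_two(cls, left, right):
        # Flatten either side so that conjunctions never nest (assumption).
        parts = []
        for item in (left, right):
            if isinstance(item, SketchConjunction):
                parts.extend(item.conditions)
            else:
                parts.append(item)
        return cls(tuple(parts))

    def __and__(self, other):
        return SketchConjunction.from_two(self, other)


a = SketchCondition("a", "==", 1)
b = SketchCondition("b", "==", 2)
c = SketchCondition("c", ">", 3)
combined = (a & b) & c
print([cond.column for cond in combined.conditions])  # ['a', 'b', 'c']
```

With this design, `(a & b) & c` and `a & (b & c)` produce the same flat tuple, so the grouping of the parentheses does not matter to later consumers.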

property predicate

Predicate to be consumed by Kartothek and the DataFrame serializer.

split_by_column()[source]

Split conjunction by column.

Non-active conditions will be dropped.

Returns

split – Conjunctions by affected column.

Return type

Dict[str, Conjunction]
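The grouping step can be pictured with plain tuples. This is a minimal sketch under the assumption that each condition is represented as a `(column, op, value)` tuple; the helper `split_by_column_sketch` is hypothetical and does not model the dropping of non-active conditions.

```python
from collections import defaultdict


def split_by_column_sketch(conditions):
    """Group (column, op, value) tuples by the column they refer to."""
    grouped = defaultdict(list)
    for column, op, value in conditions:
        grouped[column].append((column, op, value))
    # Freeze each group so that it resembles a tuple of conditions.
    return {column: tuple(parts) for column, parts in grouped.items()}


conditions = [("x", ">", 1), ("y", "==", "foo"), ("x", "<=", 10)]
split = split_by_column_sketch(conditions)
print(sorted(split))  # ['x', 'y']
print(split["x"])     # (('x', '>', 1), ('x', '<=', 10))
```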

to_jsonarray()[source]

Converts conjunction to a list that can be used for JSON/YAML serialization.

Important

Not all value types that can be used within conditions are JSON-serializable (e.g. datetime objects). The user is responsible for ensuring that these values can pass through functions like json.dump, or for implementing proper error handling.

Returns

jsonarray – JSON-compatible array.

Return type

List[Dict[str, Any]]

Example

>>> import json
>>> from kartothek.core.cube.conditions import C
>>> conjunction = (
...     (C("x") > 1)
...     & (C("y").isin(["foo", "bar"]))
... )
>>> array = conjunction.to_jsonarray()
>>> print(json.dumps(array, indent=True, sort_keys=True))
[
 {
  "column": "x",
  "type": "GreaterThanCondition",
  "value": 1
 },
 {
  "column": "y",
  "type": "IsInCondition",
  "value": [
   "foo",
   "bar"
  ]
 }
]

See also

from_jsonarray

Converts array back into a conjunction.
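The caveat about values that are not JSON-serializable can be reproduced with the standard library alone. The array below is hand-written in the format shown in the example, not produced by kartothek, and the helper `encode_datetime` is a hypothetical name; encoding datetimes as ISO strings is one possible choice among several.

```python
import json
from datetime import datetime

# Hand-written array in the documented format, holding a datetime value.
array = [
    {
        "column": "ts",
        "type": "GreaterThanCondition",
        "value": datetime(2020, 1, 1),
    }
]

error = None
try:
    json.dumps(array)
except TypeError as exc:
    # The default encoder rejects datetime objects.
    error = str(exc)


def encode_datetime(obj):
    """Fallback encoder: turn datetimes into ISO 8601 strings."""
    if isinstance(obj, datetime):
        return obj.isoformat()
    raise TypeError("Cannot serialize {}".format(type(obj).__name__))


text = json.dumps(array, default=encode_datetime, sort_keys=True)
print(text)
```

After decoding, the value comes back as a plain string, so the caller has to convert it to a datetime again before the array is handed to from_jsonarray().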

class kartothek.core.cube.conditions.EqualityCondition(column, value)[source]

Bases: kartothek.core.cube.conditions.SimpleCondition

Condition on column equality.

Parameters
  • column (str) – Column name.

  • value (Any) – Value to compare the column against.

OP = '=='

class kartothek.core.cube.conditions.GreaterEqualCondition(column, value)[source]

Bases: kartothek.core.cube.conditions.SimpleCondition

Condition that describes that a column should be greater than or equal to the given value.

Parameters
  • column (str) – Column name.

  • value (Any) – Value to compare the column against.

OP = '>='

class kartothek.core.cube.conditions.GreaterThanCondition(column, value)[source]

Bases: kartothek.core.cube.conditions.SimpleCondition

Condition that describes that a column should be strictly greater than the given value.

Parameters
  • column (str) – Column name.

  • value (Any) – Value to compare the column against.

OP = '>'

class kartothek.core.cube.conditions.InIntervalCondition(column, start=None, stop=None)[source]

Bases: kartothek.core.cube.conditions.Condition

Condition expressing that values of a column should be in a given interval.

Parameters
  • column (str) – Column name.

  • start (Any) – Inclusive start of the interval, optional.

  • stop (Any) – Exclusive stop of the interval, optional.

property active

property predicate_part

Part of the inner list for Kartothek predicate pushdown.
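A half-open interval with optional bounds maps naturally onto at most two comparisons. The sketch below shows that mapping with `(column, op, value)` tuples; the function `interval_parts` is hypothetical, and it assumes that an omitted bound contributes no comparison and that an interval without any bound is what `active` would report as inactive. Neither assumption is stated in this reference.

```python
def interval_parts(column, start=None, stop=None):
    """Translate a half-open interval [start, stop) into comparison tuples."""
    parts = []
    if start is not None:
        # The start of the interval is inclusive.
        parts.append((column, ">=", start))
    if stop is not None:
        # The stop of the interval is exclusive.
        parts.append((column, "<", stop))
    return parts


both = interval_parts("a", 0, 100)
upper_only = interval_parts("a", stop=5)
unbounded = interval_parts("a")

print(both)             # [('a', '>=', 0), ('a', '<', 100)]
print(upper_only)       # [('a', '<', 5)]
print(bool(unbounded))  # False
```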

class kartothek.core.cube.conditions.InequalityCondition(column, value)[source]

Bases: kartothek.core.cube.conditions.SimpleCondition

Condition on column inequality.

Parameters
  • column (str) – Column name.

  • value (Any) – Value to compare the column against.

OP = '!='

class kartothek.core.cube.conditions.IsInCondition(column, value)[source]

Bases: kartothek.core.cube.conditions.SimpleCondition

Condition that describes that values in a column should be within the given list.

Parameters
  • column (str) – Column name.

  • value (Tuple[Any]) – Tuple of values to check for.

OP = 'in'

class kartothek.core.cube.conditions.LessEqualCondition(column, value)[source]

Bases: kartothek.core.cube.conditions.SimpleCondition

Condition that describes that a column should be less than or equal to the given value.

Parameters
  • column (str) – Column name.

  • value (Any) – Value to compare the column against.

OP = '<='

class kartothek.core.cube.conditions.LessThanCondition(column, value)[source]

Bases: kartothek.core.cube.conditions.SimpleCondition

Condition that describes that a column should be strictly less than the given value.

Parameters
  • column (str) – Column name.

  • value (Any) – Value to compare the column against.

OP = '<'

class kartothek.core.cube.conditions.SimpleCondition(column, value)[source]

Bases: kartothek.core.cube.conditions.Condition

A simple condition that only emits a single predicate part. Must be subclassed.

Parameters
  • column (str) – Column name.

  • value (Any) – Value to compare the column against.

active = True

property predicate_part

Part of the inner list for Kartothek predicate pushdown.

class kartothek.core.cube.conditions.VirtualColumn(name)[source]

Bases: object

Virtual column that can be used to easily construct conditions.

The following operations are supported:

=============  ==========================  =====================
Operation      Python Example              Result Class
=============  ==========================  =====================
Equal          C("a") == 42                EqualityCondition
Not Equal      C("a") != 42                InequalityCondition
Less Than      C("a") < 42                 LessThanCondition
Less Equal     C("a") <= 42                LessEqualCondition
Greater Than   C("a") > 42                 GreaterThanCondition
Greater Equal  C("a") >= 42                GreaterEqualCondition
Is In          C("a").isin([1, 2])         IsInCondition
In Interval    C("a").in_interval(0, 100)  InIntervalCondition
=============  ==========================  =====================

Parameters

name (str) – Column name.

in_interval(start=None, stop=None)[source]
isin(other)[source]
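The operations listed for the virtual column rely on Python's comparison hooks. As a rough illustration of the mechanism, the sketch below overloads those hooks on a hypothetical `SketchColumn` class and returns plain tuples tagged with the result class name; it is a stand-in written for this example, not kartothek's code.

```python
class SketchColumn:
    """Hypothetical stand-in for a virtual column."""

    def __init__(self, name):
        self.name = name

    def _make(self, kind, value):
        # A real implementation would build a condition object here.
        return (kind, self.name, value)

    def __eq__(self, other):
        return self._make("EqualityCondition", other)

    def __ne__(self, other):
        return self._make("InequalityCondition", other)

    def __lt__(self, other):
        return self._make("LessThanCondition", other)

    def __le__(self, other):
        return self._make("LessEqualCondition", other)

    def __gt__(self, other):
        return self._make("GreaterThanCondition", other)

    def __ge__(self, other):
        return self._make("GreaterEqualCondition", other)

    def isin(self, other):
        return self._make("IsInCondition", tuple(other))

    def in_interval(self, start=None, stop=None):
        return ("InIntervalCondition", self.name, start, stop)


col = SketchColumn("a")
print(col == 42)                 # ('EqualityCondition', 'a', 42)
print(col >= 42)                 # ('GreaterEqualCondition', 'a', 42)
print(col.isin([1, 2]))          # ('IsInCondition', 'a', (1, 2))
print(col.in_interval(0, 100))   # ('InIntervalCondition', 'a', 0, 100)
```

Because `==` no longer returns a boolean, objects of this kind are meant for building conditions only and should not be compared for identity with `==`.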