kartothek.core.cube.conditions module¶
The condition sublanguage.
-
kartothek.core.cube.conditions.
C
¶
-
class
kartothek.core.cube.conditions.
Condition
(column)[source]¶ Bases:
object
An abstract condition on a column.
Multiple conditions may be combined using &:
(C('a') == 1) & (C('b') == 2)
- Parameters
column (str) – Column name.
-
filter_df
(df)[source]¶ Filter given DataFrame with condition.
- Parameters
df (pandas.DataFrame) – DataFrame to evaluate on, must contain required column.
- Returns
result – Part of the DataFrame for which the condition holds.
- Return type
pandas.DataFrame
-
static
from_string
(s, all_types)[source]¶ Parse string as condition object.
- Parameters
s (str) – String to parse.
all_types (Dict[str, pyarrow.DataType]) – Mapping from all known columns to pyarrow types.
- Returns
condition – Parsed condition.
- Return type
Condition
- Raises
ValueError – If condition cannot be parsed.
-
class
kartothek.core.cube.conditions.
Conjunction
(conditions)[source]¶ Bases:
object
Conjunction of multiple Condition objects.
- Parameters
conditions (Tuple[Condition]) – Tuple of conditions that must all be satisfied at the same time. Can address multiple columns.
-
property
columns
¶ Columns that are checked by this conjunction.
-
filter_df
(df)[source]¶ Filter given DataFrame with conjunction.
NULL values will always be treated as non-matching.
- Parameters
df (pandas.DataFrame) – DataFrame to evaluate on, must contain required column.
- Returns
result – Part of the DataFrame for which the conjunction holds.
- Return type
pandas.DataFrame
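The NULL rule above can be illustrated without pandas. The following is a minimal sketch, not kartothek's implementation: rows are plain dicts, `None` stands in for NULL, and conditions are hypothetical `(column, op, value)` tuples. The helper `row_matches` is an illustrative name and not part of the library.

```python
import operator

# Comparison operators keyed by the OP strings listed in this module.
OPS = {
    "==": operator.eq,
    "!=": operator.ne,
    "<": operator.lt,
    "<=": operator.le,
    ">": operator.gt,
    ">=": operator.ge,
}


def row_matches(row, conditions):
    """Return True only if every condition holds; NULL (None) never matches."""
    for column, op, value in conditions:
        cell = row[column]
        if cell is None:
            # A NULL cell is non-matching, even for "!=".
            return False
        if not OPS[op](cell, value):
            return False
    return True


rows = [
    {"x": 2, "y": "foo"},
    {"x": None, "y": "foo"},
    {"x": 0, "y": "foo"},
    {"x": 5, "y": None},
]
conditions = [("x", ">", 1), ("y", "==", "foo")]
kept = [row for row in rows if row_matches(row, conditions)]
```

Only the first row survives: the second and fourth contain a NULL in a checked column, and the third fails `x > 1`.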
-
static
from_jsonarray
(array)[source]¶ Recover conjunction from JSON-compatible array.
- Parameters
array (List[Dict[str, Any]]) – JSON-compatible array.
- Returns
conjunction – Recovered conjunction.
- Return type
Conjunction
- Raises
TypeError – If a wrong or unknown condition type was passed.
ValueError – If the "type" attribute within a condition is missing.
See also
to_jsonarray
Creates array, illustrates format.
-
static
from_string
(s, all_types)[source]¶ Parse string as conjunction object.
Important
This is intended for human interaction (e.g. CLIs). Do not use it for serializing and deserializing conditions, since it does not support all conditions and is not guaranteed to be roundtrip-safe. For serialization, use to_jsonarray() and from_jsonarray() instead.
- Parameters
s (str) – String to parse.
all_types (Dict[str, pyarrow.DataType]) – Mapping from all known columns to pyarrow types.
- Returns
conjunction – Parsed conjunction.
- Return type
Conjunction
- Raises
ValueError – If condition cannot be parsed.
-
classmethod
from_two
(left: Union[kartothek.core.cube.conditions.Condition, kartothek.core.cube.conditions.Conjunction], right: Union[kartothek.core.cube.conditions.Condition, kartothek.core.cube.conditions.Conjunction]) → kartothek.core.cube.conditions.Conjunction[source]¶ Create conjunction from two elements.
- Parameters
left – Left part.
right – Right part.
- Returns
conjunction – Conjunction of the two given parts.
- Return type
Conjunction
-
property
predicate
¶ Predicate to be consumed by Kartothek and DataFrame serializer.
-
split_by_column
()[source]¶ Split conjunction by column.
Non-active conditions will be dropped.
- Returns
split – Conjunctions by affected column.
- Return type
Dict[str, Conjunction]
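As a rough illustration of the grouping that split_by_column describes, the sketch below partitions conditions by the column they address. It is a stand-in under stated assumptions: conditions are hypothetical `(column, op, value)` tuples rather than Condition objects, and the dropping of non-active conditions is not modelled.

```python
from collections import defaultdict


def split_by_column(conditions):
    """Group (column, op, value) tuples by column, preserving their order."""
    grouped = defaultdict(list)
    for column, op, value in conditions:
        grouped[column].append((column, op, value))
    # Freeze each group into a tuple, mirroring a conjunction per column.
    return {column: tuple(parts) for column, parts in grouped.items()}


conditions = [("x", ">", 1), ("y", "==", "foo"), ("x", "<", 10)]
split = split_by_column(conditions)
```

Here `split` has one entry for `x` holding both of its conditions and one entry for `y`.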
-
to_jsonarray
()[source]¶ Converts conjunction to a list that can be used for JSON/YAML serialization.
Important
Not all value types that can be used within conditions are JSON-serializable (e.g. datetime objects). The user is responsible for ensuring that these values can be passed to functions like json.dump, or has to implement proper error handling.
- Returns
jsonarray – JSON-compatible array.
- Return type
List[Dict[str, Any]]
Example
>>> import json
>>> from kartothek.core.cube.conditions import C
>>> conjunction = (
...     (C("x") > 1)
...     & (C("y").isin(["foo", "bar"]))
... )
>>> array = conjunction.to_jsonarray()
>>> print(json.dumps(array, indent=True, sort_keys=True))
[
 {
  "column": "x",
  "type": "GreaterThanCondition",
  "value": 1
 },
 {
  "column": "y",
  "type": "IsInCondition",
  "value": [
   "foo",
   "bar"
  ]
 }
]
See also
from_jsonarray
Converts array back into a conjunction.
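The note about non-serializable values can be made concrete with the standard library alone. The sketch below builds an array by hand in the format shown in the example (it does not call kartothek), puts a datetime into it, and shows one possible way to handle it via the `default` hook of `json.dumps`. The column name `ts` and the helper `encode_value` are assumptions for illustration.

```python
import json
from datetime import datetime

# Hand-written array in the documented format; "ts" is a hypothetical column.
array = [
    {"column": "x", "type": "GreaterThanCondition", "value": 1},
    {"column": "ts", "type": "GreaterEqualCondition", "value": datetime(2020, 1, 1)},
]

# Plain json.dumps rejects datetime objects with a TypeError.
try:
    json.dumps(array)
    failed = False
except TypeError:
    failed = True


def encode_value(obj):
    """Fallback encoder: store datetimes as ISO 8601 strings."""
    if isinstance(obj, datetime):
        return obj.isoformat()
    raise TypeError(f"Cannot serialize {type(obj).__name__}")


encoded = json.dumps(array, default=encode_value, sort_keys=True)
decoded = json.loads(encoded)
```

After decoding, the value is a string rather than a datetime, so the caller would have to convert it back before recovering a conjunction from the array.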
-
class
kartothek.core.cube.conditions.
EqualityCondition
(column, value)[source]¶ Bases:
kartothek.core.cube.conditions.SimpleCondition
Condition on column equality.
- Parameters
column (str) – Column name.
value (Any) – Value the column is compared to.
-
OP
= '=='¶
-
class
kartothek.core.cube.conditions.
GreaterEqualCondition
(column, value)[source]¶ Bases:
kartothek.core.cube.conditions.SimpleCondition
Condition that describes that a column should be greater than or equal to the given value.
- Parameters
column (str) – Column name.
value (Any) – Value the column is compared to.
-
OP
= '>='¶
-
class
kartothek.core.cube.conditions.
GreaterThanCondition
(column, value)[source]¶ Bases:
kartothek.core.cube.conditions.SimpleCondition
Condition that describes that a column should be strictly greater than the given value.
- Parameters
column (str) – Column name.
value (Any) – Value the column is compared to.
-
OP
= '>'¶
-
class
kartothek.core.cube.conditions.
InIntervalCondition
(column, start=None, stop=None)[source]¶ Bases:
kartothek.core.cube.conditions.Condition
Condition expressing that values of a column should be in a given interval.
- Parameters
column (str) – Column name.
start (Any) – Inclusive start of the interval, optional.
stop (Any) – Exclusive stop of the interval, optional.
-
property
active
¶
-
property
predicate_part
¶ Part of the inner list for Kartothek predicate pushdown.
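The interval semantics documented for start and stop (inclusive start, exclusive stop, both optional) amount to a half-open range check. A minimal sketch in plain Python, where `in_interval` is an illustrative free function and not the library's method:

```python
def in_interval(value, start=None, stop=None):
    """Half-open check: start <= value < stop, with either bound optional."""
    if start is not None and value < start:
        return False
    if stop is not None and value >= stop:
        return False
    return True


checks = {
    "at_start": in_interval(0, 0, 100),    # start is inclusive
    "at_stop": in_interval(100, 0, 100),   # stop is exclusive
    "no_start": in_interval(-5, stop=0),   # only an upper bound
    "no_bounds": in_interval(5),           # nothing to restrict
}
```

A half-open interval lets adjacent ranges such as `[0, 100)` and `[100, 200)` tile a column without overlap.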
-
class
kartothek.core.cube.conditions.
InequalityCondition
(column, value)[source]¶ Bases:
kartothek.core.cube.conditions.SimpleCondition
Condition on column inequality.
- Parameters
column (str) – Column name.
value (Any) – Value the column is compared to.
-
OP
= '!='¶
-
class
kartothek.core.cube.conditions.
IsInCondition
(column, value)[source]¶ Bases:
kartothek.core.cube.conditions.SimpleCondition
Condition that describes that values in a column should be within the given list.
- Parameters
column (str) – Column name.
value (Tuple[Any]) – Tuple of values to check for membership.
-
OP
= 'in'¶
-
class
kartothek.core.cube.conditions.
LessEqualCondition
(column, value)[source]¶ Bases:
kartothek.core.cube.conditions.SimpleCondition
Condition that describes that a column should be less than or equal to the given value.
- Parameters
column (str) – Column name.
value (Any) – Value the column is compared to.
-
OP
= '<='¶
-
class
kartothek.core.cube.conditions.
LessThanCondition
(column, value)[source]¶ Bases:
kartothek.core.cube.conditions.SimpleCondition
Condition that describes that a column should be strictly less than the given value.
- Parameters
column (str) – Column name.
value (Any) – Value the column is compared to.
-
OP
= '<'¶
-
class
kartothek.core.cube.conditions.
SimpleCondition
(column, value)[source]¶ Bases:
kartothek.core.cube.conditions.Condition
A simple condition that only emits a single predicate part. Must be subclassed.
- Parameters
column (str) – Column name.
value (Any) – Value the column is compared to.
-
active
= True¶
-
property
predicate_part
¶ Part of the inner list for Kartothek predicate pushdown.
-
class
kartothek.core.cube.conditions.
VirtualColumn
(name)[source]¶ Bases:
object
Virtual column that can be used to easily construct conditions.
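The following operations are supported:
Operation      Python Example               Result Class
Equal          C("a") == 42                 EqualityCondition
Not Equal      C("a") != 42                 InequalityCondition
Less Than      C("a") < 42                  LessThanCondition
Less Equal     C("a") <= 42                 LessEqualCondition
Greater Than   C("a") > 42                  GreaterThanCondition
Greater Equal  C("a") >= 42                 GreaterEqualCondition
Is In          C("a").isin([1, 2])          IsInCondition
In Interval    C("a").in_interval(0, 100)   InIntervalCondition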
- Parameters
name (str) – Column name.
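The pattern behind a virtual column, where comparison operators return condition objects instead of booleans, can be sketched with operator overloading. This is an illustrative stand-in and not kartothek's source: `SketchColumn`, `SketchCondition` and `SketchConjunction` are hypothetical names, and only a few operators are shown.

```python
from dataclasses import dataclass
from typing import Any, Tuple


@dataclass(frozen=True)
class SketchCondition:
    """Hypothetical simple condition: column, operator string, value."""
    column: str
    op: str
    value: Any

    def __and__(self, other):
        # Combining with & produces a conjunction of both sides.
        return SketchConjunction((self,)) & other


@dataclass(frozen=True)
class SketchConjunction:
    """Hypothetical conjunction: all contained conditions must hold."""
    conditions: Tuple[SketchCondition, ...]

    def __and__(self, other):
        if isinstance(other, SketchConjunction):
            extra = other.conditions
        else:
            extra = (other,)
        return SketchConjunction(self.conditions + extra)


class SketchColumn:
    """Hypothetical virtual column: comparisons build conditions."""

    def __init__(self, name):
        self.name = name

    def __eq__(self, value):
        return SketchCondition(self.name, "==", value)

    def __lt__(self, value):
        return SketchCondition(self.name, "<", value)

    def __gt__(self, value):
        return SketchCondition(self.name, ">", value)

    def isin(self, values):
        # Store the allowed values as a tuple, as in the IsIn parameter docs.
        return SketchCondition(self.name, "in", tuple(values))


conj = (SketchColumn("a") == 1) & (SketchColumn("b").isin(["foo", "bar"]))
```

The parentheses around each comparison matter because `&` binds more tightly than `==` in Python, which is why the documented example writes `(C('a') == 1) & (C('b') == 2)`.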