kartothek.core.cube.conditions module
The condition sublanguage.
-
kartothek.core.cube.conditions.C
-
class kartothek.core.cube.conditions.Condition(column)
Bases: object
An abstract condition on a column.
Multiple conditions may be combined using &:
(C('a') == 1) & (C('b') == 2)
- Parameters
column (str) – Column name.
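A minimal sketch of how combining with & can work. CondSketch and ConjSketch are hypothetical stand-ins, not Kartothek's classes; they only illustrate operator overloading and how chained combinations could stay flat.

```python
class CondSketch:
    """Hypothetical stand-in for a condition on a single column."""

    def __init__(self, column, op, value):
        self.column = column
        self.op = op
        self.value = value

    def __and__(self, other):
        # `cond & something` starts a conjunction and delegates the merge.
        return ConjSketch((self,)) & other


class ConjSketch:
    """Hypothetical stand-in: a flat tuple of conditions that must all hold."""

    def __init__(self, conditions):
        self.conditions = tuple(conditions)

    def __and__(self, other):
        # Flatten so that chained `&` never nests one conjunction in another.
        extra = other.conditions if isinstance(other, ConjSketch) else (other,)
        return ConjSketch(self.conditions + extra)


a = CondSketch("a", "==", 1)
b = CondSketch("b", "==", 2)
c = CondSketch("c", ">", 3)

combined = (a & b) & c
columns = [cond.column for cond in combined.conditions]
```

The parentheses in (C('a') == 1) & (C('b') == 2) are required because Python's & binds more tightly than ==.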
-
filter_df(df)
Filter the given DataFrame with the condition.
- Parameters
df (pandas.DataFrame) – DataFrame to evaluate on, must contain required column.
- Returns
result – Part of the DataFrame for which the condition holds.
- Return type
-
static from_string(s, all_types)
Parse string as condition object.
- Parameters
s (str) – String to parse.
all_types (Dict[str, pyarrow.DataType]) – Mapping from all known columns to pyarrow types.
- Returns
condition – Parsed condition.
- Return type
- Raises
ValueError – If the condition cannot be parsed.
-
class kartothek.core.cube.conditions.Conjunction(conditions)
Bases: object
Conjunction of multiple Condition objects.
- Parameters
conditions (Tuple[Condition]) – Tuple of conditions that must all be satisfied at the same time. Can address multiple columns.
-
property columns
Columns that are checked by this conjunction.
-
filter_df(df)
Filter the given DataFrame with the conjunction.
NULL values are always treated as non-matching.
- Parameters
df (pandas.DataFrame) – DataFrame to evaluate on, must contain required column.
- Returns
result – Part of the DataFrame for which the conjunction holds.
- Return type
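To illustrate the NULL rule, here is a plain-Python model that uses a list of dicts instead of a pandas.DataFrame. The helpers is_null and filter_rows are hypothetical and are not part of Kartothek. The sketch assumes that both None and float NaN count as NULL.

```python
import math
import operator


def is_null(value):
    """Assumption: None and float NaN both count as NULL."""
    return value is None or (isinstance(value, float) and math.isnan(value))


def filter_rows(rows, conjunction):
    """Keep rows for which every (column, compare, value) triple holds.

    A NULL in any checked column makes the row non-matching.
    """
    kept = []
    for row in rows:
        ok = True
        for column, compare, value in conjunction:
            cell = row[column]
            if is_null(cell) or not compare(cell, value):
                ok = False
                break
        if ok:
            kept.append(row)
    return kept


rows = [
    {"x": 5, "y": "foo"},
    {"x": None, "y": "foo"},
    {"x": float("nan"), "y": "bar"},
    {"x": 0, "y": "foo"},
]

matching = filter_rows(rows, [("x", operator.gt, 1), ("y", operator.eq, "foo")])
# Even an inequality check does not let NULL rows through.
not_five = filter_rows(rows, [("x", operator.ne, 5)])
```

Under this model a row with a NULL in x is dropped for both x > 1 and x != 5, which is what "always non-matching" implies.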
-
static from_jsonarray(array)
Recover conjunction from JSON-compatible array.
- Parameters
array (List[Dict[str, Any]]) – JSON-compatible array.
- Returns
conjunction – Recovered conjunction.
- Return type
- Raises
TypeError – If a wrong or unknown condition type was passed.
ValueError – If the "type" attribute within a condition is missing.
See also
to_jsonarray: Creates array, illustrates format.
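The two Raises entries above can be modelled with a small validator. This is a sketch only: check_jsonarray and KNOWN_TYPES are hypothetical names, and the real method builds condition objects rather than tuples.

```python
# Hypothetical subset of accepted type names, taken from the classes on this page.
KNOWN_TYPES = {"EqualityCondition", "GreaterThanCondition", "IsInCondition"}


def check_jsonarray(array):
    """Validate entries following the documented error contract."""
    for entry in array:
        if "type" not in entry:
            raise ValueError("condition entry is missing the 'type' attribute")
        if entry["type"] not in KNOWN_TYPES:
            raise TypeError("unknown condition type: {}".format(entry["type"]))
    return [(entry["type"], entry["column"], entry["value"]) for entry in array]


def error_name(array):
    """Return the name of the raised exception class, or None on success."""
    try:
        check_jsonarray(array)
    except (TypeError, ValueError) as exc:
        return type(exc).__name__
    return None


parsed = check_jsonarray(
    [{"column": "x", "type": "GreaterThanCondition", "value": 1}]
)
missing_type = error_name([{"column": "x", "value": 1}])
unknown_type = error_name([{"column": "x", "type": "Nope", "value": 1}])
```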
-
static from_string(s, all_types)
Parse string as conjunction object.
Important
This is intended for human interaction (e.g. CLIs). Do not use it for serializing and deserializing conditions, since it does not support all conditions and is not guaranteed to be roundtrip-safe. For serialization, use to_jsonarray() and from_jsonarray() instead.
- Parameters
s (str) – String to parse.
all_types (Dict[str, pyarrow.DataType]) – Mapping from all known columns to pyarrow types.
- Returns
conjunction – Parsed conjunction.
- Return type
- Raises
ValueError – If the condition cannot be parsed.
-
classmethod from_two(left: Union[kartothek.core.cube.conditions.Condition, kartothek.core.cube.conditions.Conjunction], right: Union[kartothek.core.cube.conditions.Condition, kartothek.core.cube.conditions.Conjunction]) → kartothek.core.cube.conditions.Conjunction
Create conjunction from two elements.
- Parameters
left – Left part.
right – Right part.
- Returns
conjunction – Conjunction of the two given parts.
- Return type
-
property predicate
Predicate to be consumed by Kartothek and DataFrame serializer.
-
split_by_column()
Split conjunction by column.
Non-active conditions are dropped.
- Returns
split – Conjunctions by affected column.
- Return type
Dict[str, Conjunction]
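A rough model of the grouping, using plain tuples in place of condition objects. The function and the tuple layout (column, op, value, active) are hypothetical; the real method returns Conjunction objects as values.

```python
from collections import defaultdict


def split_by_column_sketch(conditions):
    """Group (column, op, value, active) tuples by column, skipping inactive ones."""
    split = defaultdict(list)
    for column, op, value, active in conditions:
        if active:
            split[column].append((column, op, value))
    return dict(split)


conditions = [
    ("x", ">", 1, True),
    ("y", "in", ("foo", "bar"), True),
    ("x", "<", 10, True),
    ("z", "interval", None, False),  # inactive, so it is dropped
]
split = split_by_column_sketch(conditions)
```

Columns that carry only inactive conditions do not appear in the result at all.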
-
to_jsonarray()
Converts conjunction to a list that can be used for JSON/YAML serialization.
Important
Not all value types that can be used within conditions are JSON-serializable (e.g. datetime objects). The user is responsible for ensuring that these values can pass functions like json.dump, or has to implement proper error handling.
- Returns
jsonarray – JSON-compatible array.
- Return type
List[Dict[str, Any]]
Example
>>> import json
>>> from kartothek.core.cube.conditions import C
>>> conjunction = (
...     (C("x") > 1)
...     & (C("y").isin(["foo", "bar"]))
... )
>>> array = conjunction.to_jsonarray()
>>> print(json.dumps(array, indent=True, sort_keys=True))
[
 {
  "column": "x",
  "type": "GreaterThanCondition",
  "value": 1
 },
 {
  "column": "y",
  "type": "IsInCondition",
  "value": [
   "foo",
   "bar"
  ]
 }
]
See also
from_jsonarray: Converts array back into a conjunction.
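The serialization caveat can be reproduced with the standard library alone. The array below is hand-written in the documented shape and is not produced by Kartothek. The encode_extra hook is a hypothetical helper showing one way to handle datetime values.

```python
import datetime
import json

# Hand-written array in the documented shape, carrying a datetime value.
array = [
    {
        "column": "ts",
        "type": "GreaterEqualCondition",
        "value": datetime.datetime(2020, 1, 1),
    },
]

# Plain json.dumps rejects datetime objects with a TypeError.
try:
    json.dumps(array)
    plain_dump_failed = False
except TypeError:
    plain_dump_failed = True


def encode_extra(obj):
    """Fallback encoder: ISO-format dates, reject everything else."""
    if isinstance(obj, (datetime.datetime, datetime.date)):
        return obj.isoformat()
    raise TypeError("not JSON-serializable: {!r}".format(obj))


text = json.dumps(array, default=encode_extra, sort_keys=True)
loaded = json.loads(text)
```

After loading, the value is a string rather than a datetime, so the caller has to convert it back before rebuilding a conjunction.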
-
class kartothek.core.cube.conditions.EqualityCondition(column, value)
Bases: kartothek.core.cube.conditions.SimpleCondition
Condition on column equality.
- Parameters
column (str) – Column name.
value (Any) – Value to compare the column against.
-
OP = '=='
-
class kartothek.core.cube.conditions.GreaterEqualCondition(column, value)
Bases: kartothek.core.cube.conditions.SimpleCondition
Condition that describes that a column should be greater than or equal to the given value.
- Parameters
column (str) – Column name.
value (Any) – Value to compare the column against.
-
OP = '>='
-
class kartothek.core.cube.conditions.GreaterThanCondition(column, value)
Bases: kartothek.core.cube.conditions.SimpleCondition
Condition that describes that a column should be strictly greater than the given value.
- Parameters
column (str) – Column name.
value (Any) – Value to compare the column against.
-
OP = '>'
-
class kartothek.core.cube.conditions.InIntervalCondition(column, start=None, stop=None)
Bases: kartothek.core.cube.conditions.Condition
Condition expressing that values of a column should be in a given interval.
- Parameters
column (str) – Column name.
start (Any) – Inclusive start of the interval, optional.
stop (Any) – Exclusive stop of the interval, optional.
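The interval is half-open: start is included and stop is excluded, and either bound may be omitted. A plain-Python sketch of that check follows. The in_interval function here is a hypothetical helper, not the library's implementation, and it assumes that None values never match.

```python
def in_interval(value, start=None, stop=None):
    """Half-open check: start <= value < stop; a bound of None is not applied."""
    if value is None:
        # Assumption: missing values never match.
        return False
    if start is not None and value < start:
        return False
    if stop is not None and value >= stop:
        return False
    return True


at_start = in_interval(0, 0, 100)        # the start bound is inclusive
at_stop = in_interval(100, 0, 100)       # the stop bound is exclusive
open_above = in_interval(500, start=0)   # no upper bound applied
open_below = in_interval(-1, stop=0)     # no lower bound applied
```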
-
property active
-
property predicate_part
Part of the inner list for Kartothek predicate pushdown.
-
class kartothek.core.cube.conditions.InequalityCondition(column, value)
Bases: kartothek.core.cube.conditions.SimpleCondition
Condition on column inequality.
- Parameters
column (str) – Column name.
value (Any) – Value to compare the column against.
-
OP = '!='
-
class kartothek.core.cube.conditions.IsInCondition(column, value)
Bases: kartothek.core.cube.conditions.SimpleCondition
Condition that describes that values in a column should be within the given list.
- Parameters
column (str) – Column name.
value (Tuple[Any]) – Tuple of values to check for.
-
OP = 'in'
-
class kartothek.core.cube.conditions.LessEqualCondition(column, value)
Bases: kartothek.core.cube.conditions.SimpleCondition
Condition that describes that a column should be less than or equal to the given value.
- Parameters
column (str) – Column name.
value (Any) – Value to compare the column against.
-
OP = '<='
-
class kartothek.core.cube.conditions.LessThanCondition(column, value)
Bases: kartothek.core.cube.conditions.SimpleCondition
Condition that describes that a column should be strictly less than the given value.
- Parameters
column (str) – Column name.
value (Any) – Value to compare the column against.
-
OP = '<'
-
class kartothek.core.cube.conditions.SimpleCondition(column, value)
Bases: kartothek.core.cube.conditions.Condition
A simple condition that only emits a single predicate part. Must be subclassed.
- Parameters
column (str) – Column name.
value (Any) – Value to compare the column against.
-
active = True
-
property predicate_part
Part of the inner list for Kartothek predicate pushdown.
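Kartothek predicates are lists of lists of (column, op, value) tuples, where the inner lists are AND-combined. The sketch below assumes that a simple condition contributes exactly one such tuple built from its OP attribute. The *Sketch classes are hypothetical and not the library's code.

```python
class SimpleConditionSketch:
    """Hypothetical base class: subclasses only supply the operator string."""

    OP = None

    def __init__(self, column, value):
        self.column = column
        self.value = value

    @property
    def predicate_part(self):
        if self.OP is None:
            raise NotImplementedError("subclasses must set OP")
        # Assumption: one tuple per simple condition.
        return [(self.column, self.OP, self.value)]


class GreaterThanSketch(SimpleConditionSketch):
    OP = ">"


class IsInSketch(SimpleConditionSketch):
    OP = "in"


conditions = [GreaterThanSketch("x", 1), IsInSketch("y", ("foo", "bar"))]

# Concatenate all parts into one inner (AND) list, then wrap it in the outer list.
inner = [part for cond in conditions for part in cond.predicate_part]
predicate = [inner]
```

Keeping the operator as a class attribute means each subclass differs only in OP, which matches how the simple condition classes on this page are documented.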
-
class kartothek.core.cube.conditions.VirtualColumn(name)
Bases: object
Virtual column that can be used to easily construct conditions.
The following operations are supported:

Operation       Python Example                Result Class
Equal           C("a") == 42                  EqualityCondition
Not Equal       C("a") != 42                  InequalityCondition
Less Than       C("a") < 42                   LessThanCondition
Less Equal      C("a") <= 42                  LessEqualCondition
Greater Than    C("a") > 42                   GreaterThanCondition
Greater Equal   C("a") >= 42                  GreaterEqualCondition
Is In           C("a").isin([1, 2])           IsInCondition
In Interval     C("a").in_interval(0, 100)    InIntervalCondition

- Parameters
name (str) – Column name.
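The supported operations map naturally onto Python's rich comparison methods. The following sketch shows the idea with a hypothetical ColumnSketch class that returns plain tuples instead of condition objects; it is not Kartothek's implementation.

```python
class ColumnSketch:
    """Hypothetical stand-in: comparisons build descriptions instead of booleans."""

    def __init__(self, name):
        self.name = name

    def _make(self, op, value):
        return (self.name, op, value)

    # Overriding __eq__ without __hash__ makes instances unhashable, which is
    # acceptable for a throwaway builder object like this one.
    def __eq__(self, other):
        return self._make("==", other)

    def __ne__(self, other):
        return self._make("!=", other)

    def __lt__(self, other):
        return self._make("<", other)

    def __le__(self, other):
        return self._make("<=", other)

    def __gt__(self, other):
        return self._make(">", other)

    def __ge__(self, other):
        return self._make(">=", other)

    def isin(self, values):
        return self._make("in", tuple(values))

    def in_interval(self, start=None, stop=None):
        return (self.name, "interval", start, stop)


col = ColumnSketch("a")
results = [col == 42, col != 42, col < 42, col <= 42, col > 42, col >= 42]
membership = col.isin([1, 2])
interval = col.in_interval(0, 100)
```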