kartothek - unified metadata for datasets

Date: Dec 03, 2019

A dataset is a collection of files that share the same schema and reside in a storage system. kartothek offers a metadata definition to handle these datasets efficiently. In addition, the kartothek.io module provides building blocks to create and modify these datasets. I/O, tracking of dataset partitions, and selection of data subsets are handled transparently.
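To make the idea of a dataset with shared schema and tracked partitions concrete, here is a minimal standard-library sketch. It is a toy model only and not kartothek's API or on-disk format: the function names (`store_dataset`, `read_dataset`, `write_partition`), the CSV files, and the JSON metadata layout are all assumptions made for illustration.

```python
import csv
import json
import tempfile
from pathlib import Path


def write_partition(root, label, rows, schema):
    """Write one partition file and return its metadata entry (hypothetical layout)."""
    for row in rows:
        # Every file in a dataset must follow the same schema.
        if set(row) != set(schema):
            raise ValueError(f"partition {label!r} does not match schema {schema}")
    path = Path(root) / f"{label}.csv"
    with path.open("w", newline="") as handle:
        writer = csv.DictWriter(handle, fieldnames=schema)
        writer.writeheader()
        writer.writerows(rows)
    return {"label": label, "file": path.name}


def store_dataset(root, uuid, schema, partitions):
    """Write all partitions plus one metadata file that tracks them."""
    entries = [
        write_partition(root, label, rows, schema)
        for label, rows in partitions.items()
    ]
    metadata = {"dataset_uuid": uuid, "schema": schema, "partitions": entries}
    (Path(root) / f"{uuid}.metadata.json").write_text(json.dumps(metadata, indent=2))
    return metadata


def read_dataset(root, uuid, labels=None):
    """Read the whole dataset, or only the partitions named in `labels`."""
    metadata = json.loads((Path(root) / f"{uuid}.metadata.json").read_text())
    rows = []
    for entry in metadata["partitions"]:
        if labels is not None and entry["label"] not in labels:
            continue  # Skip partitions outside the requested subset.
        with (Path(root) / entry["file"]).open(newline="") as handle:
            rows.extend(csv.DictReader(handle))
    return rows


# Illustrative data only; a temporary directory stands in for the storage.
with tempfile.TemporaryDirectory() as root:
    store_dataset(
        root,
        "weather",
        ["city", "temp"],
        {
            "2019-12-01": [{"city": "Hamburg", "temp": 4}],
            "2019-12-02": [
                {"city": "Hamburg", "temp": 6},
                {"city": "Berlin", "temp": 5},
            ],
        },
    )
    everything = read_dataset(root, "weather")
    subset = read_dataset(root, "weather", labels={"2019-12-02"})
```

The caller only names the dataset and, optionally, the partitions it wants; the metadata file resolves which files to open. This sketch mirrors that division of labor in spirit, without claiming to match how kartothek implements it.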

To get started, have a look at our Getting started guide or head to the description of the Specification. To learn about data pipelines in kartothek, read more about the In- and Output module.