Blog Watch: Datakin @ the Data & AI Summit 2021 Written by Ross Turk on June 3, 2021 Last week our CTO, Julien Le Dem, took the virtual stage at the 2021 Data & AI Summit to discuss data lineage with OpenLineage and Apache Spark. If you missed it, fear not! The video has now […]

Blog Data Pipeline Diffing with Datakin Written by Peter Hicks on June 3, 2021 Figure 1: Datakin enables pipeline diffing across runs Envision yourself with a suddenly failing ETL task that you haven’t touched in months. You look at the code and nobody else has touched it, and nothing comes to mind about how this […]

Blog Advantages of tracing data lineage Written by Ross Turk on June 1, 2021 There’s value in data, so the common wisdom goes. So your organization has started to collect, store, process and analyze data from any and all available sources. Over the past few years, the scale of these processes has steadily increased. But […]

Blog Introducing OpenLineage Written by Julien Le Dem on December 18, 2020 For anyone watching the space, the acceleration of the data revolution over the last few years has been very exciting. What started as experimental deployments of “big data” projects back in the early days of Hadoop has now morphed into full production, mission-critical deployments […]