Blog How I learned to stop worrying and love lineage Written by Laurent Paris on July 13, 2021 You’re a data engineer, and you’re dreading the coming week. This is the “week of hell”, the one when it’s your turn to be on-call. You will be responsible for any issues that might happen to the […]

Blog Datakin at Berlin Buzzwords 2021 Written by Amanda Bulger on June 30, 2021 Catch Datakin CTO Julien Le Dem speaking at Berlin Buzzwords on the importance of having a healthy data ecosystem before you can take advantage of and get value from your data. Berlin Buzzwords focuses on open source software projects in the field […]

Blog What is data lineage (and why should I care)? Written by Ross Turk on June 22, 2021 Any real-world data architecture is made up primarily of madness and chaos. Your most cared-for data pipeline, the one that you spend a lot of time keeping neat, the one that moves your most important data, […]

Blog Watch: Datakin @ the Data & AI Summit 2021 Written by Ross Turk on June 3, 2021 Last week our CTO, Julien Le Dem, took the virtual stage at the 2021 Data & AI Summit to discuss data lineage with OpenLineage and Apache Spark. If you missed it, fear not! The video has now […]

Blog Data Pipeline Diffing with Datakin Written by Peter Hicks on June 3, 2021 Figure 1: Datakin enables pipeline diffing across runs Envision yourself with a suddenly failing ETL task that you haven’t touched in months. You look at the code and nobody else has touched it, and nothing comes to mind about how this […]

Blog The business argument for data lineage Written by Ross Turk on June 1, 2021 There’s value in data, so the common wisdom goes. So your organization has started to collect, store, process and analyze data from any and all available sources. Over the past few years, the scale of these processes has steadily increased. […]

Blog Introducing OpenLineage Written by Julien Le Dem on December 18, 2020 For anyone watching the space, the acceleration of the data revolution over the last few years has been very exciting. What started as experimental deployments of “big data” projects back in the early days of Hadoop has now morphed into full production, mission-critical deployments […]