Datakin is now open to all!
Written by Laurent Paris on Sep 24, 2021

This is it! We’re officially out of beta and excited to announce the general availability of Datakin.

Our story began with the creation of Marquez over two years ago. We believed then, and still believe now, that a new approach to data lineage is essential to support today’s pipelines. The road ahead is even longer than the one behind us, and full of potential. The future of data lineage is bright, and we’re at the forefront of a new wave of data operations tools and practices.

This initial release of Datakin includes features that will help you get a fresh perspective on your data pipelines, and quickly troubleshoot and repair any issues that might arise. You can try it for free for 30 days.

Gain visibility of your entire ecosystem

Datakin integrates with your data processing and scheduling platforms to give you full visibility into the flow of data across teams and systems. By observing data transformations within your pipelines in real time, it surfaces up-to-date information about your datasets and the logic that processes them.

Seeing a lineage graph of your pipeline is illuminating! It makes cause and effect apparent, showing bottlenecks and illustrating how far bad data has spread.

Datakin currently integrates with the following systems:

  • Scheduling: Airflow, dbt
  • Data processing: Spark
  • Data sources: Snowflake, Google BigQuery, Amazon Redshift, PostgreSQL
  • Data quality checks: Great Expectations

This is only the beginning. We are continuously developing additional OpenLineage integrations and plan to release new ones regularly.

Quickly identify and resolve data lateness issues

Because Datakin traces data lineage in real time, it can identify all upstream dependencies involved in the production of a specific dataset. It clearly shows the time spent in each stage of the relevant data pipelines.

This information is summarized in a Gantt-style chart that lets you quickly identify performance bottlenecks and the root cause of delays in data production.
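To make the idea concrete, here is a minimal, hypothetical Python sketch of how lineage can explain lateness. It is not Datakin’s implementation: the `Run` record, the helper names (`upstream_closure`, `critical_path`, `bottleneck`, `gantt`) and the example jobs are all invented for illustration. It assumes each job run has a start time, an end time and a list of upstream jobs, collects every upstream dependency of a dataset, walks back along the dependency that finished last at each step (the one that gated the downstream run), and prints a rough text Gantt chart of that path.

```python
from dataclasses import dataclass


@dataclass
class Run:
    """Hypothetical record of one job run; times are minutes since midnight."""
    start: int
    end: int
    upstream: list[str]


def upstream_closure(runs: dict[str, Run], job: str) -> set[str]:
    """Return every job that `job` depends on, directly or transitively."""
    seen: set[str] = set()
    stack = list(runs[job].upstream)
    while stack:
        name = stack.pop()
        if name not in seen:
            seen.add(name)
            stack.extend(runs[name].upstream)
    return seen


def critical_path(runs: dict[str, Run], job: str) -> list[str]:
    """Walk back from `job`, always following the upstream run that finished
    last, i.e. the dependency that actually delayed the downstream start."""
    path = [job]
    while runs[path[-1]].upstream:
        path.append(max(runs[path[-1]].upstream, key=lambda u: runs[u].end))
    return list(reversed(path))


def bottleneck(runs: dict[str, Run], path: list[str]) -> str:
    """The stage on the path with the longest duration."""
    return max(path, key=lambda name: runs[name].end - runs[name].start)


def gantt(runs: dict[str, Run], path: list[str], scale: int = 5) -> str:
    """One text row per stage: indent = start time, bar = duration,
    with one character per `scale` minutes."""
    rows = []
    for name in path:
        r = runs[name]
        bar = "#" * max(1, (r.end - r.start) // scale)
        rows.append(f"{name:<18}|{' ' * (r.start // scale)}{bar}")
    return "\n".join(rows)


# Hypothetical pipeline: a daily report built from two extracts.
runs = {
    "extract_orders": Run(0, 10, []),
    "extract_users": Run(0, 45, []),
    "join_orders_users": Run(45, 60, ["extract_orders", "extract_users"]),
    "daily_report": Run(60, 65, ["join_orders_users"]),
}

path = critical_path(runs, "daily_report")
print(gantt(runs, path))
print("bottleneck:", bottleneck(runs, path))
```

Here the slow `extract_users` job gates everything downstream, so it shows up both on the critical path and as the bottleneck, while the fast `extract_orders` job, although upstream, does not explain the delay.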

Monitor and identify the root cause of data quality issues

Through our integration with Great Expectations (and, in the future, other data quality frameworks), Datakin tracks the evolution of quality metrics over time and checks the integrity of your data as soon as it is updated.

Because Datakin also knows the lineage of every dataset, it can trace a failing quality check back through its upstream dependencies, making it easy to quickly identify the root cause of data quality issues.
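As an illustration of why lineage helps here, the following is a minimal, hypothetical Python sketch, not Datakin’s actual algorithm. It assumes the lineage graph is a plain dict mapping each dataset to its direct upstream datasets, and that we already know which datasets are failing their quality checks. Among the failing datasets in a dataset’s lineage, it keeps only those with no failing ancestors of their own: the earliest points where bad data appears. The function names and dataset names are invented for the example.

```python
def ancestors(upstream: dict[str, list[str]], dataset: str) -> set[str]:
    """All datasets that `dataset` is derived from, directly or transitively."""
    seen: set[str] = set()
    stack = list(upstream.get(dataset, []))
    while stack:
        d = stack.pop()
        if d not in seen:
            seen.add(d)
            stack.extend(upstream.get(d, []))
    return seen


def root_causes(upstream: dict[str, list[str]], failing: set[str], dataset: str) -> set[str]:
    """Failing datasets in `dataset`'s lineage (including itself) that have no
    failing ancestors: the most upstream places where checks start failing."""
    lineage = ancestors(upstream, dataset) | {dataset}
    return {
        d for d in lineage
        if d in failing and not (ancestors(upstream, d) & failing)
    }


# Hypothetical lineage: a dashboard built from enriched orders,
# which in turn combine raw orders with exchange rates.
upstream = {
    "revenue_dashboard": ["orders_enriched"],
    "orders_enriched": ["orders_raw", "fx_rates"],
    "orders_raw": [],
    "fx_rates": [],
}
failing = {"revenue_dashboard", "orders_enriched", "fx_rates"}

print(root_causes(upstream, failing, "revenue_dashboard"))
```

Three datasets fail their checks, but only `fx_rates` has no failing ancestor, so it is reported as the likely root cause; the other two failures are downstream consequences of it.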

Tell us what you think!

We’re proud of how Datakin has evolved over the past few months, and we welcome your feedback! Sign up now to see it for yourself, and let us know what you think by clicking the orange circle in the lower right corner of any Datakin page. If none of us is online, our bot will pass along your message.