How Airbyte Powers Datadog’s Self-Serve Analytics Tool
Jean-Mathieu Saponaro
To learn more about the platform they’ve built, take a look at Datadog's original article.
The Problem
The tech industry moves fast, so being able to supply each team with the data it needs as your company scales is crucial. It is just as important to deliver the correct data in a timely manner, formatted for each team and accessible to every team individually. But what challenges would prompt you to build a tool to support this in the first place?
As your company grows, so does each team, and data can end up scattered across many different applications and services. That hurts cross-functional collaboration: none of the data is organized, and the problem only gets worse as the company grows. You also lack a single, unified place for data visualization, which matters when you need real-time information on the company’s KPIs, statistics for a specific department, and so on.
Datadog has experienced this growth firsthand: they’ve gone from 200 employees and 1 product in 2015 to 5,000 employees worldwide and dozens of products today. That growth prompted them to build their own self-serve analytics tooling at scale, so that each org in the company can get the data it needs on its own and make data-driven decisions without help from others. Teams no longer have to field one-off data questions all the time and can instead focus on more impactful tasks that keep driving the company forward.
The cost of building one of these platforms in-house, from scratch, is more than most data organizations can afford. But you don’t have to build your platform from scratch anymore - today, there are a handful of open-source components that can be assembled into a highly customizable self-serve analytics platform. Datadog chose this approach when building their own solution, and it all starts with the ingestion layer.
The Ingestion Layer
The first concern for Datadog’s self-serve analytics platform was how to ingest data from a large number of diverse data sources. Their ingestion layer had to be flexible enough to accommodate all these sources - and any new sources in the future.
Another concern for the Datadog team was usability. The technical proficiency of their platform’s users varies, so to achieve a truly self-serve experience for adding new data sources, they needed a solution that was simple for end users to operate.
Ideally, users would be able to choose from a catalog of connectors for their data source - and easily build new connectors or customize existing connectors for their specific use case.
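To make the "catalog of connectors" idea concrete, here is a minimal sketch in plain Python. It is not Airbyte's API or Datadog's implementation: the `Connector` and `Catalog` classes, the `customize` method, and the toy `inline-rows` source are all hypothetical names invented for illustration. It only shows the shape of the requirement: pick a connector by name, or derive a customized one from an existing entry.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, Iterator, List

Record = Dict[str, str]
ReadFn = Callable[[Dict[str, str]], Iterator[Record]]


@dataclass
class Connector:
    """Hypothetical connector: a name plus a function that yields records."""
    name: str
    read: ReadFn


@dataclass
class Catalog:
    """Hypothetical catalog that users browse to pick a connector."""
    connectors: Dict[str, Connector] = field(default_factory=dict)

    def register(self, connector: Connector) -> None:
        self.connectors[connector.name] = connector

    def names(self) -> List[str]:
        return sorted(self.connectors)

    def customize(self, base_name: str, new_name: str,
                  transform: Callable[[Record], Record]) -> Connector:
        """Derive a new connector that post-processes an existing one."""
        base = self.connectors[base_name]

        def read(config: Dict[str, str]) -> Iterator[Record]:
            for record in base.read(config):
                yield transform(record)

        custom = Connector(new_name, read)
        self.register(custom)
        return custom


def read_inline_rows(config: Dict[str, str]) -> Iterator[Record]:
    """Toy source: parses 'key,value;key,value' from config['rows']."""
    for row in config["rows"].split(";"):
        key, value = row.split(",")
        yield {"key": key, "value": value}


catalog = Catalog()
catalog.register(Connector("inline-rows", read_inline_rows))

# Customize an existing connector instead of writing one from scratch.
catalog.customize(
    "inline-rows",
    "inline-rows-upper",
    lambda record: {**record, "value": record["value"].upper()},
)

records = list(catalog.connectors["inline-rows-upper"].read({"rows": "a,x;b,y"}))
```

In this sketch the customized connector wraps the original rather than copying it, so a fix to the base connector carries over to every variant built on top of it.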
Building an in-house solution that meets these criteria is often one of the most challenging parts of creating your own self-serve analytics platform. What Datadog needed was a reliable, extensible, and user-friendly solution that would empower any department to own its data ingestion pipelines and integrate easily with the rest of their data tool ecosystem.
That’s where Airbyte comes in.
How Airbyte Comes In
Datadog’s self-serve intake layer is largely built on Airbyte. Airbyte’s ease of use and extensibility allowed any team in the company to push their data into the platform - without assistance from the data team!
Because Airbyte already had connectors for most of the integrations they were looking for, and makes it easy to add new ones, the data team at Datadog was able to avoid building fragile in-house connectors that they would ultimately have to maintain themselves.
The connectors are maintained by Airbyte, with the help of the open source community. That means less of a headache for the Data & Analytics team at Datadog, which aligns with our goal of enabling teams to focus only on high-impact tasks rather than mundane, time-consuming ones. With Airbyte in the platform, it’s now easy for teams to extract data from their sources and load it into the rest of the pipeline, which in this case hands off to two of Airbyte's Technology Partners: dbt for transformation and Snowflake for storage.
Conclusion
Datadog's journey from a small team to a global force of 5,000 employees with a diverse product range shows the critical importance of scalable, efficient data management. Their "Bring Your Own Data" (BYOD) tool stands as a testament to the power of self-serve analytics and open-source solutions, such as Airbyte, in driving a company's data-driven decision-making process.
Datadog's approach, with Airbyte at the head of their data-intake platform, serves as a blueprint for other companies seeking to scale their data infrastructure. It shows the value of open-source tools in simplifying data integration, and the importance of equipping teams with self-serve data tools to foster a data-driven culture that drives business growth and innovation.