The ultimate starter for building a full data stack using Airbyte, Dagster, dbt, and BigQuery.
Download our free guide and discover the best approach for your needs, whether it's building your ELT solution in-house or opting for Airbyte Open Source or Airbyte Cloud.
Dagster and dbt work best in ELT (Extract, Load, Transform) pipelines. In an ELT setup, you first load raw data into your warehouse, then use dbt for in-warehouse transformations orchestrated by Dagster. This article is a quick-start guide to building an orchestrated data pipeline with Airbyte, Dagster, dbt, and BigQuery. In this example, we extract data from Postgres, load it into BigQuery, and apply the necessary transformations with dbt, all orchestrated seamlessly by Dagster. While this template doesn't delve into specific data or transformations, its goal is to showcase how these tools work together.
dbt converts raw data in your warehouse into analytics-ready models through SQL transformations, while handling testing, documentation, and dependencies. Dagster orchestrates entire data pipelines, including dbt runs, managing execution flow, monitoring, and scheduling. Together, they let you build reliable data pipelines where dbt handles the SQL transformations, and Dagster controls when and how everything runs.
Airbyte extracts data from various sources and loads it into your data warehouse using pre-built connectors, while Dagster orchestrates when and how these extracts run within your broader pipeline. Dagster can trigger Airbyte syncs, monitor their status, and coordinate subsequent transformations (like dbt) once the data lands. This integration lets you build end-to-end pipelines where Airbyte handles the EL (Extract-Load), and Dagster manages the workflow, ensuring data flows correctly from source to final transformation.
Before you embark on this integration, ensure you have the following set up and ready:
Get the project up and running on your local machine by following these steps:
For Mac:
For Windows:
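As a rough sketch (the repository URL and dependency file are placeholders; substitute your own), the local setup typically looks like this, with the virtual-environment activation differing between Mac and Windows:

```shell
# Clone the project and enter it (URL is a placeholder -- use your actual repository)
git clone https://github.com/<your-org>/<quickstart-repo>.git
cd <quickstart-repo>

# Create a virtual environment
python -m venv venv

# Activate it -- Mac/Linux:
source venv/bin/activate
# Windows (cmd): venv\Scripts\activate
# Windows (PowerShell): venv\Scripts\Activate.ps1

# Install the project's Python dependencies (file name may differ in your repo)
pip install -r requirements.txt
```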
Airbyte allows you to create connectors for sources and destinations, facilitating data synchronization between various platforms. In this project, we're harnessing the power of Terraform to automate the creation of these connectors and the connections between them. Here's how you can set this up:
Change to the relevant directory containing the Terraform configuration for Airbyte:
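Given the `infra/airbyte` layout mentioned below, that is:

```shell
# Move into the Airbyte Terraform configuration directory
cd infra/airbyte
```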
Within the infra/airbyte directory, you'll find three crucial Terraform files:
Adjust the configurations in these files to suit your project's needs. Specifically, provide credentials for your Postgres and BigQuery connections. You can utilize the variables.tf file to manage these credentials.
This step prepares Terraform to create the resources defined in your configuration files.
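Initialization uses the standard Terraform command, which downloads the providers declared in the configuration:

```shell
# Download providers and initialize the working directory
terraform init
```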
Before applying any changes, review the plan to understand what Terraform will do.
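The plan is produced with:

```shell
# Show the resources Terraform would create, change, or destroy
terraform plan
```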
After reviewing and confirming the plan, apply the Terraform configurations to create the necessary Airbyte resources.
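Applying is likewise a single command; Terraform will prompt for confirmation before making changes:

```shell
# Create the Airbyte source, destination, and connection resources
terraform apply
```

Type `yes` at the prompt to confirm.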
Once Terraform completes its tasks, navigate to the Airbyte UI. Here, you should see your source and destination connectors, as well as the connection between them, set up and ready to go.
dbt (data build tool) allows you to transform your data by writing, documenting, and executing SQL workflows. Setting up the dbt project requires specifying connection details for your data platform, in this case, BigQuery. Here’s a step-by-step guide to help you set this up:
Change to the directory containing the dbt configuration:
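The exact directory name depends on your project layout; assuming a `dbt_project` directory at the repository root (an illustrative name), that's:

```shell
# Path is illustrative -- use your project's actual dbt directory
cd dbt_project
```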
You'll find a profiles.yml file within the directory. This file contains configurations for dbt to connect with your data platform. Update this file with your BigQuery connection details.
To keep your credentials secure, you can leverage environment variables. An example is provided within the profiles.yml file.
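For instance, if your `profiles.yml` reads values via `env_var()`, you might export them before running dbt. The variable names and values below are illustrative; match whatever your `profiles.yml` actually references:

```shell
# Illustrative names -- align these with the env_var() calls in your profiles.yml
export DBT_BIGQUERY_PROJECT="my-gcp-project"
export DBT_BIGQUERY_DATASET="my_dataset"
export DBT_BIGQUERY_KEYFILE="/path/to/service-account.json"
```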
Once you’ve updated the connection details, you can test the connection to your BigQuery instance using:
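dbt ships a built-in connection check for exactly this:

```shell
# Validate profiles.yml, the project config, and the warehouse connection
dbt debug
```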
If everything is set up correctly, this command should report a successful connection to BigQuery.
Dagster is a modern data orchestrator designed to help you build, test, and monitor your data workflows. In this section, we'll walk you through setting up Dagster to oversee both the Airbyte and dbt workflows:
Switch to the directory containing the Dagster orchestration configurations:
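Assuming an `orchestration` directory for the Dagster code (the name is illustrative; use your project's actual path):

```shell
# Path is illustrative -- use your project's actual Dagster directory
cd orchestration
```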
Dagster requires certain environment variables to be set to interact with other tools like dbt and Airbyte. Set the following variables:
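The exact variables depend on how the Dagster code is written; check the project's code or README for the names it reads. As one plausible sketch:

```shell
# Illustrative -- confirm the exact variable names your Dagster project expects.
# Tells the dagster-dbt integration to parse the dbt project when definitions load:
export DAGSTER_DBT_PARSE_PROJECT_ON_LOAD=1
# Credentials Dagster uses to talk to a local Airbyte instance:
export AIRBYTE_PASSWORD=password
```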
With the environment variables in place, kick-start the Dagster UI:
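Dagster's local development server (UI plus daemon) is started with:

```shell
# Launch the Dagster webserver and daemon for local development
dagster dev
```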
Open your browser and navigate to the Dagster UI (by default, `dagster dev` serves it at http://127.0.0.1:3000):
Here, you should see assets for both Airbyte and dbt. To get an overview of how these assets interrelate, click on "view global asset lineage". This will give you a clear picture of the data lineage, visualizing how data flows between the tools.
Once you've set up and launched this initial integration, the real power lies in its adaptability and extensibility. Here’s a roadmap to help you customize and harness this project tailored to your specific data needs: