How to Set Up a dbt Staging Environment

November 26, 2024
20 min read

The data build tool, or dbt, is an essential component of data engineering workflows. With its robust capabilities, you can transform raw data into an analysis-ready format that can be used to extract actionable insights.

dbt allows you to create different environments within your workflow so that you can maintain separation between the development, testing, and production stages. One crucial dbt environment is the staging environment, where you validate the results of the development environment before changes reach production.

This article will highlight the concept of a dbt staging environment, its benefits, and how you can configure it using two different methods: dbt Cloud and dbt Core.

What Are the Different dbt Environments?


dbt environments are crucial components of data workflows as they help segregate development, staging, and production stages. Partitioning environments this way enables you to ensure end users interact with a separate environment from the one engineering teams work with. This is an essential step to maintain the integrity of the production environment while allowing proper development and testing techniques.

To achieve this, you must create a YAML file, profiles.yml, to configure different environments. By default, dbt Core looks for profiles.yml in the ~/.dbt/ directory (it can also be kept in the project directory), so the file must be in a location dbt can read to create the different environments.

What Is a dbt Staging Environment?


The dbt staging environment is a testing environment that sits between development and production. Similar to development and production, this environment includes three different layers: staging, intermediate, and mart.

  • The staging layer is the next stage after data acquisition that enables you to perform basic transformation operations like data cleaning, standardization, and more. According to dbt Labs, staging models are atomic, representing the basic building blocks for any dbt project.
  • The intermediate layer stacks layers of transformation logic to prepare staging models for further processing.
  • Data marts, on the other hand, combine modular pieces into a rich version of entities that matter for business use cases.
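For example, a staging model is typically a light rename-and-clean pass over a single source table. A minimal sketch of one such model (the source, table, and column names here are hypothetical):

```sql
-- models/staging/stg_customers.sql
-- Light cleaning of one raw source table; names are illustrative.
with source as (
    select * from {{ source('raw', 'customers') }}
),
renamed as (
    select
        id as customer_id,
        lower(email) as email,
        created_at
    from source
)
select * from renamed
```

Keeping each staging model atomic like this makes it a reliable building block for the intermediate and mart layers.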

What Are the Benefits of a Staging Environment?

Here are a few benefits of using a staging environment:

  • Adding a layer before deployment lets you test and explore your models in the staging layer before changes reach production.
  • Isolating development and production environments enables you to work with features like deferral and cross-project references without touching data in the production environment. The deferral feature lets you run a subset of models without building their upstream dependencies, reusing artifacts from another environment instead.
  • The dbt staging environment allows you to define, edit, and trigger ad hoc jobs while separating the production environment using environment-level permissions.
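As an illustration, deferral is invoked from the command line by pointing dbt at the artifacts of a previous run. A hedged sketch, assuming dbt Core is installed and a production run's artifacts have been saved locally (the model selector and artifact path are hypothetical):

```shell
# Build only the selected model in staging; unbuilt upstream models
# are resolved from the production run's artifacts instead.
dbt run --select my_model --target staging --defer --state prod-run-artifacts/
```

This avoids rebuilding the entire upstream graph just to test one change.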

Enhance Your Data Migration Journey with Airbyte

Before starting the data transformation process with dbt, it’s crucial to establish a proper data extraction and loading strategy; following a few best practices here makes the downstream transformation of raw data far easier.

A preliminary step is creating a centralized repository that stores all the necessary information for easy accessibility. Migrating data manually from different sources to warehouses or databases is possible, but it is error-prone and time-consuming. This is where SaaS-based tools like Airbyte can be beneficial.


Airbyte is a no-code data integration tool that enables you to migrate data from different sources into your preferred data store. It offers 400+ pre-built connectors, allowing you to extract structured, semi-structured, and unstructured data from various sources and create a centralized repository. If the connector you seek is unavailable, you can build custom connectors using Airbyte’s Connector Development Kits (CDKs) and Connector Builder.

By integrating Airbyte with dbt, you can migrate your data into a centralized location and immediately execute custom modifications. This integration streamlines ELT data workflows and helps you generate useful business insights.

Unique features of Airbyte:

  • AI-Powered Connector Builder: The Connector Builder's AI assist functionality automatically reads through your preferred connector’s API documentation and auto-fills most configuration fields. This feature simplifies the process of developing custom connectors for you.
  • Vector Databases Support: Airbyte supports popular vector databases, including Pinecone, Weaviate, Qdrant, and Milvus. These databases store vector embeddings that power retrieval-augmented generation (RAG) and other LLM applications.
  • Automated RAG Transformations: With automatic chunking, embedding, and indexing operations, Airbyte allows you to transform raw data into vector embeddings, which help ground the output of LLM-powered applications.

After moving the data to your preferred destination, you can write custom queries to extract valuable insights and generate reports. To further enhance the model evaluation step, orchestrate dbt projects by integrating dbt with Apache Airflow.

Configuring a dbt Staging Environment

To set up a staging environment in dbt, you can use dbt Cloud or dbt Core. Follow the steps in this section to set up a staging environment with either approach.

Create a Staging Environment in dbt Cloud

  • On the dbt Cloud interface, navigate to the Deploy option and select Environments.
  • Click on Create Environment and select Deployment as the environment type.
Create new Environment in dbt Cloud
  • Select Staging for the Set deployment type option.
  • Choose the dbt version and click Save.
Deployment Credentials
  • To complete the rest of the steps, you must provide the necessary credentials to work with the staging environment. For example, the above image highlights the credentials required to set up a Postgres deployment.

Create a Staging Environment in dbt Core

To create a staging environment in dbt Core, you can create a profiles.yml file with the following content:


my_project:
  outputs:
    dev:
      type: [data_store_type]
      host: [host]
      user: [user]
      password: [dev_password]
      port: 5432
      dbname: [dev_database]
      schema: [dev_schema]
    staging:
      type: [data_store_type]
      host: [host]
      user: [user]
      password: [staging_password]
      port: 5432
      dbname: [staging_database]
      schema: [staging_schema]
    prod:
      type: [data_store_type]
      host: [host]
      user: [user]
      password: [prod_password]
      port: 5432
      dbname: [prod_database]
      schema: [prod_schema]
  target: dev

In the above file structure, replace the [placeholder] values with your credentials. This code defines three different dbt environments: development (dev), staging, and production (prod). The target: dev line makes dev the default environment when no --target flag is passed.
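Once the placeholder values are filled in, you can confirm that dbt can locate the file and connect to each environment with dbt debug (this assumes dbt Core is installed locally):

```shell
# Check profile resolution and test the staging connection
dbt debug --target staging
```

Running this for each target before your first dbt run catches credential and connectivity mistakes early.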

Finally, to run the models against the staging environment from the command line, execute this command:

dbt run --target staging
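As a best practice, avoid committing real passwords in profiles.yml; dbt's env_var() function lets you read them from environment variables instead. A partial sketch of the staging output using this pattern (the environment variable names are illustrative):

```yaml
staging:
  type: postgres
  host: "{{ env_var('DBT_STAGING_HOST') }}"
  user: "{{ env_var('DBT_STAGING_USER') }}"
  password: "{{ env_var('DBT_STAGING_PASSWORD') }}"
  port: 5432
  dbname: staging_database
  schema: staging_schema
```

This keeps secrets out of version control while leaving the rest of the profile structure unchanged.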

Conclusion

Creating a dbt staging environment provides a separate space for evaluating data quality before changes reach production. This article covered how to set up a dbt staging environment in both dbt Cloud and dbt Core. Although dbt makes the process straightforward, you should follow certain best practices to get the best results.
