Connecting your base models to their corresponding source files is pretty simple if you've followed all my advice thus far. You just need to follow some simple syntax. However, if you haven’t been organized about following proper naming conventions, you could run into some trouble.
Base models are the models that reference these source files that you created. You do so by selecting from them using the following syntax:
{{ source(‘<source_name>’,‘<table_name>’)
Source name is the name at the very top of your source file listed under “sources”.
The table name is one of the name blocks you defined under the “tables” section in this file.
These data sources and table names can get confusing to remember if you aren’t following proper naming conventions. That’s why I always recommend creating your own style guide to follow. It’s always nice to have a document to reference and hold yourself accountable to the standards you’ve set.
Another thing to note, if you have sources or tables with the same names, dbt will throw an error when you try to run your model. For example, you can’t have two different campaigns tables under two different sources. You know they are different, but dbt does not. This is why I recommend prefixing your source and table names with the data source itself.
In terms of the content in a base model, like I mentioned earlier, you only want to include basic transformations. This is where you can cast dates or timestamps, rename columns, and round any decimals. Base models are not the place for aggregate functions and any type of filtering.
Here’s what my Mailchimp campaigns social card base model looks like:
select
title AS social_card_title,
image_url AS social_card_image_url,
description AS social_card_descriptio
from {{ source('mailchimp_campaigns',
'madison_mailchimp_campaigns_social_card')}}
Notice how I rename the columns to follow guidelines that I outlined in my style guide. I included a “social_card” prefix here in case I ever join these columns to another table. While it is rare this table will be joined, since it lacks a primary key, it is a good habit to practice in your models for reserved keywords like “title” and “name”.
Defining your models in your source files
After writing your base models, you want to revisit your source files. In addition to the “sources” block within this file, you can add a “models” block. This is a best practice to keep track of which models reference the sources you have defined.
In the corresponding source file, you will want to list the names of the base models that reference the source. You follow the same syntax you did for sources and tables. It should look something like this:
Again, I recommend adding as many details as possible. Always include a description so whoever is looking at your data models knows exactly why that model exists. This is also important in case you ever deploy your documentation.
While I don’t always do this myself, it is also a great habit to add the names of the base model columns here. This is a great thing to do once all of your models have been fully built out and you have some extra time on your hands. It can definitely be a hard thing to prioritize when you have lots of impact to make within your organization. But, the further you move from the start of this process, the more important this documentation will become.
Improving your data pipelines with a style guide
Utilizing source files helps keep the integrity of your raw data. By always selecting from the base models instead of the sources, you ensure your raw data is never touched and always remains the same as when it was ingested. Having a copy of your raw data sources at all times will help prevent any data losses in the case that something goes wrong.
In addition, base models help keep consistency across all data models. Raw data sources all ingest data using different standards and naming conventions. However, by utilizing base models you are the one to set the style of your data models. You put in place the correct column names and data types before building your more complex models.
dbt also has a feature called dbt test which allows you to test your base models to ensure they are following the style guide you have put in place. We will get into these more in another article.
Conclusion
Organization is key when it comes to writing your data models using dbt. And where does organization come from? Your style guide that applies the best practices to organize your dbt projects. It is imperative that you create standards for your dbt environment before starting any coding. Naming conventions and data type casting shouldn’t be left on the back burner as something you think about briefly when writing your models.
The ease of use of your models all comes down to how well everything is documented and how readily available it is to everyone on your team. Make sure your style guide is available to all members of your team to reference whenever they need it. Personally, I push my dbt style guide to a shared Github repo along with all of our SQL and source files. This way it is always there for whoever wants to read it.
dbt is an essential part of your ELT process when you combine with data ingestion tool like Airbyte. It has amazing documentation features, upholds the integrity of your data, and makes data models easy to debug when something goes wrong. Take full advantage of the power it has to offer your team. Set it up right from the beginning and your analytics workflow will be smooth sailing.
The data movement infrastructure for the modern data teams.