Snowflake is a hot tool in the data world. It’s used by analytics engineers, data engineers, and data analysts. While it is considered a data warehouse, it is a tool that can be used to optimize run time and save money during the ingestion and transformation processes of your data stack.
Utilizing the right resources within Snowflake puts the power back into your hands with a tool that can otherwise get expensive fast. A simple thing like storing queries as views is an architectural change that can make all the difference for a business.
In this article, we will discuss the difference between views and tables, the different types of views that exist in Snowflake, and the use cases for each of them. By the end of this article, you should be able to identify the best option for storing your different datasets within Snowflake.
What is a view?
A view is a defined query that sits on top of a table. Unlike a table, it doesn’t store the actual data. It always contains the latest data because it reruns every time it is queried. Whereas a table is only as fresh as the last time it was created or updated, no matter when you query it.
There are two main types of views- non-materialized and materialized views.
Non-materialized views are what people typically think of when they think about a view. This type only runs when the view is actually queried, otherwise, it is not stored in the database.
Non-materialized views are great because they take up no storage space, which means you don’t have to worry about paying for a lot of storage. They also only run when they are needed, saving you money in computing resources. This means, if a source table isn’t needed for months or weeks at a time, you won’t have to pay to maintain it. You only pay for it once the analyst or analytics engineer resumes working with that table.
The best part? Non-materialized views still have all of the same functions as a table! You can perform joins, aggregations, and window functions on them if need be. You can also control who can read and write to the view based on their Snowflake user and role.
Unfortunately, just like with everything, there is always a con that comes with all of the pros. Non-materialized views are not ideal for large amounts of data with complex logic since this logic is run every time the view is queried.
For example, I typically create all of my source data tables as non-materialized views that reference my raw data. These are simple SELECT statements that contain basic functions such as column renaming, casting, and data cleaning. Because their underlying logic is simple, they run fast whenever I query these source tables.
If I were to create complex data models containing joins and window functions as views, chances are my views would never load when I queried them. Or they would just take an extremely long time! Obviously, this isn’t ideal. You would end up using way more computing power to run this query on a view than you would by creating that view as a table instead.
Remember: Non-materialized views are great to utilize, but only when the logic creating them is a simple SELECT statement.
Materialized views are the less common view out of the two we discuss. Materialized views behave more like a table. They are faster to query and considered more accessible than non-materialized views. And, just like a table, they take up more storage space in your data warehouse and require more computing resources. This in turn means they are the more expensive option out of the two types of views.
It’s not often that you will want to utilize them. In fact, I’ve never come across a use case where it made sense to use them. According to Snowflake’s documentation, you should only use materialized views if ALL of the following are true:
- The results of the view are used frequently
- The query powering the view uses a lot of resources
- The view changes frequently
It’s very rare for all three of these to be the case with your base/staging, intermediate, and core dbt models. Base/staging models don’t consume a lot of resources and intermediate and core data models don’t change frequently. Of course, there are always exceptions to this, but I have yet to experience a scenario when this is true.
How these views should be used in data modeling
If you are an analytics engineer, then you may be wondering how un-materialized and materialized views can be used in data modeling. Let’s look at dbt base (or staging) models as well as core models.
dbt Base Models
Dbt base models exist as views on top of your raw data. They are created as un-materialized views in order to keep the integrity of the raw data while utilizing proper naming conventions and company standards. The code in these models is basic SQL select statements that read directly from the raw data ingested into your warehouse via ELT from ingestion tools like Airbyte. A typical base model looks like this:
If you look at the underlying logic of this file in dbt, it actually compiles in Snowflake to look like this:
Because you are only using basic date functions and renaming columns, the views are still fast to query on demand. This in turn saves storage space that you would otherwise use to save an almost identical copy of the raw data.
dbt Core Models
Your core models in dbt are more complex than your base models and often contain multiple CTEs, joins, and window functions. While you may have a specific use case to create these as materialized views, you will most likely create these as a table in your data warehouse. Tables are ideal for handling complex transformations that will take a long time to run if stored as a view.
Here is a code example of one of my core data models:
When compiled in Snowflake as SQL, the code will look like this:
Notice that this is being created as a table within Snowflake rather than a view. This is ideal for any data that will be directly used in a BI tool, which most core data models are. They can be easily queried on demand without the underlying logic needing to be run. This ensures fast dashboards that stakeholders can trust.
Views and tables exist for different reasons in your data warehouse. Views do not store the actual data and can be used as a tool to save money with simple queries that sit on top of other tables. Tables should be utilized to store data generated by more complex logic, ensuring performance and availability are always high.
When used correctly, non-materialized views are a great tool for saving money within Snowflake without sacrificing performance. I highly recommend using them for your base models within dbt in order to create high-quality data that follows all of the company standards you’ve put in place. And, don’t forget to use tables with your core dbt models. The performance increase is worth the higher cost!