
Snowflake Architecture: Organize Databases, Schemas & Tables

Madison Schott
March 22, 2022
15 min read

How to organize Snowflake schemas?

Now that you’ve created your databases, you can determine the schemas you need to build within each one. Snowflake schemas act as a more granular way to organize tables and views within your database. They should help you write your queries faster, knowing exactly where to find certain data.

Raw data and base models

Your “RAW” and “BASE” databases should be made up of the exact same schemas. “BASE” is essentially a copy of your raw database with basic transformations applied. In both of these databases, we recommend creating a new schema for every data source.

For example, if you ingest data from Google Ads, Facebook, Bing Ads, and Mailchimp, you would create a different schema for each of these sources.
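
As a rough sketch, assuming the “RAW” and “BASE” databases already exist and using the example sources above, those schemas could be created like this:

-- Illustrative only: one schema per ingested source, mirrored in RAW and BASE
CREATE SCHEMA IF NOT EXISTS RAW.GOOGLE_ADS;
CREATE SCHEMA IF NOT EXISTS RAW.FACEBOOK;
CREATE SCHEMA IF NOT EXISTS RAW.BING_ADS;
CREATE SCHEMA IF NOT EXISTS RAW.MAILCHIMP;

-- BASE mirrors RAW, so the same schema names are created there as well
CREATE SCHEMA IF NOT EXISTS BASE.GOOGLE_ADS;
CREATE SCHEMA IF NOT EXISTS BASE.FACEBOOK;
CREATE SCHEMA IF NOT EXISTS BASE.BING_ADS;
CREATE SCHEMA IF NOT EXISTS BASE.MAILCHIMP;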

Each Snowflake database contains the default schemas “PUBLIC” and “INFORMATION_SCHEMA”. While we recommend deleting “PUBLIC”, “INFORMATION_SCHEMA” contains views describing the objects in every schema of the database. This can come in handy, so there’s no harm in keeping it (Snowflake won’t let you drop it anyway).
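
If you want to see what a given schema contains, the “INFORMATION_SCHEMA” views can be queried directly. A small sketch using the Facebook example from above (names are illustrative):

-- List the tables and views that live in the FACEBOOK schema of the BASE database
SELECT table_name, table_type
FROM BASE.INFORMATION_SCHEMA.TABLES
WHERE table_schema = 'FACEBOOK';

-- And if you follow the recommendation above, the unused default schema can go
DROP SCHEMA IF EXISTS BASE.PUBLIC;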

As you add new data sources, you can easily create a new schema for each one. Now, when accessing raw data or base models, you know exactly where to look for every piece of data you need.

Development and production models

Similar to “RAW” and “BASE”, “DATA_MART_DEV” and “DATA_MART_PROD” should have the same schemas. They are exact copies of one another except one is for development purposes and the other is validated and used by business teams. Within these, we only use two schemas: “INTERMEDIATE” and “CORE”.

These terms are commonly used in dbt documentation. Intermediate models are those that sit between base models and the final product, or core data model. They are the output tables of SQL files that don’t get used directly for analysis but are an important step in building the final model. The only person who really needs access to these is the analytics engineer who coded them.

Core models are the final product of a data model: the tables that result from the very last SQL file in a chain of transformations. These are the ones that data analysts and business users will need to access in your production environment. All analyses, dashboards, and reports will be built from these data models.
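
A minimal sketch of that layout, assuming the database names used above:

-- Development and production data marts share the same two schemas
CREATE SCHEMA IF NOT EXISTS DATA_MART_DEV.INTERMEDIATE;
CREATE SCHEMA IF NOT EXISTS DATA_MART_DEV.CORE;

CREATE SCHEMA IF NOT EXISTS DATA_MART_PROD.INTERMEDIATE;
CREATE SCHEMA IF NOT EXISTS DATA_MART_PROD.CORE;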

How to organize Snowflake tables & views?

Before deciding on how you want to name your tables and views, you need to decide when to use each. If you aren’t familiar, a view acts like a table but isn’t a physical copy of your data; it is a SQL query that sits on top of the underlying table. That makes views a great option when you want to restrict what data users can see or save on storage costs.
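
To make the difference concrete, here is a sketch of a base model defined as a view; the table and column names are made up for illustration:

-- A view stores only the query text; no second physical copy of the data is created
CREATE OR REPLACE VIEW BASE.MAILCHIMP.MC_CAMPAIGNS AS
SELECT
    id          AS campaign_id,
    send_time   AS sent_at,
    emails_sent
FROM RAW.MAILCHIMP.CAMPAIGNS;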

We use views for all of our base and intermediate models in Snowflake. Because base models only apply basic transformations to the raw data, there isn’t much need to store another complete copy. As for intermediate models, because analysts don’t query them directly, there is no need to waste storage space on them.

However, raw data and core data models are always tables. Your raw data needs to be stored as a physical copy in the form of a table. This is important for your core data models as well since these are typically run each day and frequently accessed. You want a historical record to exist.
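
If you build these models with dbt (as the next section assumes), the view-versus-table decision is expressed as a materialization. A sketch of a hypothetical intermediate model; the model and column names are assumptions:

-- int_ad_performance.sql: analysts never query this directly, so it is
-- materialized as a view to avoid storing an extra copy of the data.
-- A core model would use materialized='table' instead.
{{ config(materialized='view') }}

SELECT
    ad_id,
    SUM(spend) AS total_spend
FROM {{ ref('fb_ads') }}
GROUP BY ad_id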

Naming conventions for tables and views

Your Snowflake tables and views take their names from your dbt SQL files: dbt automatically generates each table or view with the same name as the file that contains its code. Because of this, it is imperative that you create a consistent, strong dbt style guide, as we've shared in a previous article.

If you look at the names of the SQL files within our dbt project, you will see those exact file names appear as objects within our Snowflake schemas. For example, we have fb_ads.sql, fb_campaigns.sql, and fb_ad_actions.sql as SQL files that contain the code for our Facebook base models. If you look at the “FACEBOOK” schema in the “BASE” database, you can see objects with these exact names.
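
As a sketch of how that mapping works, a hypothetical fb_ads.sql base model might look like this; the source definition and column names are assumptions, but whatever the file is called becomes the object name in Snowflake:

-- fb_ads.sql: dbt names the resulting object after this file, so it appears
-- as FB_ADS inside the schema the model is configured to build into
{{ config(materialized='view') }}

SELECT
    id          AS ad_id,
    campaign_id,
    name        AS ad_name,
    spend
FROM {{ source('facebook', 'ads') }}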

Conclusion

Starting the process of building your Snowflake architecture involves careful planning and design of databases, schemas, and tables/views. Documenting each component and its purpose is crucial. Reviewing this documentation with your team helps in deciding on the best architecture. As business needs evolve, you'll continually add new databases, schemas, and tables.

Don't hesitate to make changes if something isn't working for you. There are various organizational approaches to Snowflake architecture. Consider what aligns best with your use cases. Stay tuned for part two on setting up users, roles, and permissions within your Snowflake data warehouse.

A well-organized Snowflake architecture also pays off beyond reporting: data science work such as predictive analytics and machine learning is much easier to stand up when clean, production-ready data lives in a predictable place. Organizing your databases, schemas, and tables with those consumers in mind leads to faster, better-informed decision making.
