Announcing our acquisition of Grouparoo to accelerate Data Movement. Learn more
Security within your data warehouse is the most important part of your Snowflake architecture and one that often gets put to the back burner. A lot of times we don’t see it as being as important as the data itself. However, setting up the correct roles and permissions is what keeps data integrity strong. Configuring this from the start ensures problems are prevented.
Let me paint a picture of what can go wrong. You create one role within Snowflake that all users within the business use. An analyst, marketer, and engineer all have the same types of access to each database. You have a reporting and data analysis (RDA) database, used for one-off analysis and a database where all your raw data is stored. A marketer is looking for a report that was run in the RDA database but accidentally runs a command on the RAW database. They overwrite the data and now that raw data is gone forever.
You get the point. Not everyone within your organization is going to know how to use Snowflake or properly query the data. You want to mitigate the risks so that they can use the data while you can live in peace knowing it’s safe.
When configuring your Snowflake environment I find it is easier to start by creating your users, then using the users to determine the Snowflake roles needed for your environment. From there you can create your corresponding warehouse and reset the defaults of each user. Let’s start with creating our users.
Every person and tool that accesses your data warehouse should have its own user. This means Josh the analyst needs his own Snowflake user, Sarah the engineer needs her own user, and each individual tool connecting to your data warehouse needs its own user. This will keep the reigns tight on who has permission to access certain resources. It will also allow you to track credit usage and changes made.
It is a best security practice to have each person and tool with their own username and password. This way passwords are not being thrown around and you know each person that has access to your company’s data. Also, keep in mind that you may not want everyone in your company to have access to Snowflake.
To create a user in Snowflake, run the following command:
When creating a user, you will have to assign them a temporary password. Make sure you set MUST_CHANGE_PASSWORD=true for security purposes. This will force a user to change their password the first time they log in. You can also do this on the UI and simply check the box at the bottom:
However, if you are creating a user for a tool, you do not want to select this. For example, if you are using Airbyte as your data integration tool of choice, you would create a user, AIRBYTE_USER. You would not want to check the box that requires a password change. The tool won’t be able to change its password on login, so make sure you assign it a secure password from the start and store it in a password manager application like 1Password.
Notice above when creating the user with the command how I also set certain defaults such as a role and warehouse. This is important, especially when users are first learning how to use Snowflake. They may not understand how to change their role or use the correct one. Assign them a default role to help manage the permissions they have. For example, an analyst will be most often analyzing the data, so their default role should be the one related to reporting, even though they may be transforming data at times.
You will also want to set a default warehouse to help manage costs and usage of your data warehouse. This isn’t necessarily a security feature, but it will help you understand where your Snowflake credits are being utilized. Assign the user the warehouse related to their role. We will talk more about specific roles next.
The warehouses you create should match the roles you create. This means, for every role, you should have a corresponding warehouse. I have a “LOAD_WH”, “TRANSFORM_WH”, “ANALYZE_WH”, and “REPORT_WH”. Each warehouse correspond to the roles I will mention in the next part. These will help keep resources organized and allow you to track which parts of your data pipeline are using the most Snowflake credits.
We just discussed roles as relating to the default warehouse you should set, but what roles should you be creating in your warehouse in the first place? Here we will go over the different types of roles and the specific permissions they should have. While this varies from business to business, and depends on the types of users accessing your warehouse, the principles remain the same.
Before diving into the types of roles and the permissions they should be granted, let’s briefly discuss how I organize my Snowflake data warehouse. I have a database RAW which ingests all of my raw data. My BASE database reads from this RAW database to create views to be used by my data models. I then have DATA_MART_DEV and DATA_MART_PROD databases for running my core data models in both development and production. Lastly, I have an RDA (Reporting & Data Analysis) database for one-time reports and queries. You can read more about why I organize my Snowflake data warehouse this way here.
Now, let’s talk roles!
It's a best practice to create a different role for the ingestion tool, analyst, engineer, and business user within your organization. You can also create specific roles for BI tools, or reverse-ETL tools is you use one. Creating a different role for each user and tool will allow you to control what each of these users can and cannot do within each database and its schemas.
Ingestion tool permissions:
Yes, you read that right - your ingestion tool should have it’s own role! This is the only role that should be allowed to write to your RAW database. Your ingestion tool is the only thing that should be creating schemas and tables within your RAW data location. This should never be allowed to be written to by anyone within the company.
The ingestion tool is the only way to write raw data to the database.
This is the most important role within your database because of the power it has to access raw data. Your raw data needs to be as secure as possible because the integrity of all your data models depends on it. If people within the company are poking and prodding this raw data, you can never be sure that it is 100% accurate. Having this role in place increases confidence that this data is dependable.
I call this role “LOADER” because it is the tool (or tools) loading raw data into your data warehouse. This is the role you would feed Airbyte when ingesting data from a source into Snowflake. You can read more about the specific permissions you need to give the role in order to properly use Airbyte. When creating roles for specific tools, it is always helpful to read the requirements in that tool’s documentation.
Airbyte provides a script to set up all the needed permissions in order for the tool to ingest data into your warehouse.
Analytics Engineer permissions:
The analytics engineer, or whoever is writing the dbt data models, should be the only one who has access to the RAW database. However, they should only be able to view and select from it rather than create or delete from it. This way they won’t be able to alter the raw data.
I call this role “TRANSFORMER” in my data warehouse. It is used by the person and tool transforming the data. Those with this role have access to read from RAW and write to all other databases where your models are stored.
Because the analyst doesn’t create base models, there is no reason for them to have access to the raw data. Everything they need should be located in BASE, mitigating risks of accidently deleting raw data. In addition, they should only have view and select access on BASE and DATA_MART_PROD since they are only selecting from these in their queries rather than writing to them.
I call this role “ANALYZER”. They aren’t transforming any data directly within the data warehouse but instead running queries using the data models and writing them to “RDA”.
Business user permissions:
Business users should have the ability to view data and do basic selections, but not alter the data in any way.
I call this role “REPORTER” because this role only reports on the data rather than writes it. This role is also used by data visualization tools such as Tableau and Looker.
In order to create a role in Snowflake, you can run the following command:
In order to grant permissions to a role in Snowflake, you can run the command:
Lastly, make sure you assign your roles to the appropriate user. You can do so using this:
You need to grant certain permissions to the database, schema, tables/views, and future tables/views. Snowflake’s permissions are unique in that you can’t assign a permission to the database and expect it to also apply for the schemas and tables/views within that database.
First, let’s review the most important permission you can grant a role on the database level.
Now, let’s review important permissions on the schema level.
Lastly, here is where things get different. Tables and views have unique privileges from databases and schemas.
With tables, views and schemas, you also need to make sure you grant permission to future tables and views within a schema or database. This will make it so you don’t have to keep granting access on new objects that get created.
It’s a best practice to document your Snowflake architecture, especially the permissions of each role, before running any Snowflake command. Granting permissions can get confusing if you don’t have everything clearly documented. I know it can get frustrating when you think you’ve granted all the needed permissions and then a role still can’t access what you need.
Take your time. Don’t just grant a role all the permissions available because you are frustrated. Well-documented Snowflake users, roles, and permissions are key to a secure data warehouse. The initial work goes a long way as your organization grows and begins to rely more and more on the data ecosystem you’ve put into place.
Madison Schott is an Analytics Engineer, health & wellness enthusiast. She blogs about the modern data stack on Medium.
Get all your ELT data pipelines running in minutes with Airbyte.