If you work for a smaller company, or even a start-up, then you know every dollar matters. Even in larger companies, departments are always looking for ways to reduce spend so that they can put their money into larger growth initiatives. Every few dollars saved in one tool can be put towards a new tool that will uplevel the modern data stack or even another brain to deepen analysis. Dollars are valuable!
Costs can be saved in many different ways. You can switch from an enterprise data ingestion tool to an open-source tool. You can choose to downsize your data team from 5 people to 3 people. Or, you could optimize the resources you are already using in order to save costs while keeping your current data stack and team the same. Personally, I’d choose this option.
If you are a Snowflake user like myself, then you will be happy to know that there are many different ways you can save Snowflake costs without compromising the flow of your data. We will get into how to do exactly this but first let’s review some key components of how Snowflake pricing functions. Understanding Snowflake pricing model will allow you to optimize resources and compute power.
Snowflake pricing: How Snowflake costs are calculated
Snowflake pricing is based on your compute, storage, and cloud services usage. Compute costs are calculated depending on how long your warehouses are running. Storage costs are calculated based on the amount of data you are storing. The cloud services depend on the Snowflake features you are using. In this section, we will dive deeper into how Snowflake charges you for each of these components.
A virtual warehouse is simply a cluster of compute resources. Each warehouse provides the necessary resources such as CPU, memory, and temporary storage, to complete a task. Warehouses come in sizes x-small to 6X-large, doubling in cost and compute power between each tier. Your Snowflake compute costs depend on the amount of time that the warehouse is continually running, or “online”.
When the warehouse is running, it is available to compute resources. This means that if your warehouse runs for 10 minutes at the top of each hour, you will only be charged for those 10 minutes rather than the whole hour. A warehouse is spun up automatically when you run a query. However, it is not shut down automatically unless you set an auto-suspend period. Later in the article, we will discuss how to ensure your warehouses aren’t continually running and shut off.
Many people are surprised to know that Snowflake’s storage costs are actually pretty low. Snowflake storage costs are based on the daily average amount of data stored across database tables, clones, and failover regions. This cost is relatively low compared to that of virtual warehouses. When optimizing costs, it’s important to focus on the latter rather than storage. Plus, you probably don’t want to delete any historic data that could be of value to the business.
Data transfer costs
There can be a few other Snowflake costs depending on the features you and your team are utilizing like data transfer costs. If you are transferring data from one Snowflake region to another, or to another cloud data platform, that will cost a per-byte fee.
Serverless features costs
Snowflake also charges for serverless features such as Snowpipe, a file-loading service. However, you are charged for actual compute usage since Snowflake manages the warehouses here, rather than the customer-managed ones that can sit active but unused.
6 Ways to reduce costs in Snowflake
1. Change your warehouse size
An easy solution to reduce Snowflake costs is to reduce the size of your virtual warehouse. We often think larger is better and more efficient in terms of compute, but this isn’t always the case. A lot of times we can reduce our warehouse size without suffering from performance issues. However, this isn’t always the case. You really need to play around and see what works for you.
Each warehouse size has a different number of available nodes. A query will utilize all of the nodes available within the warehouse to properly execute itself. However, if the query is small, it may not even use all of the available nodes. If it’s not using all of the available nodes, then you are paying for unused resources.
Again, you really don’t know which size will work best for you until you try. I would start with a small or medium size warehouse, unless you are querying large files, and experiment from there. Let your data pipeline run a few days with one size and keep track of performance and cost. Then, switch the size and measure the results. After you’ve tested a few different sizes, you’ll be able to see which optimizes both cost and performance for your particular use case.
In order to change the size of your virtual Snowflake warehouse, you can use the following command:
One important thing to remember with warehouses is that increasing the size of your warehouse does not always result in an increase in performance. A lot of times engineers will increase their warehouse size in hopes of their data pipeline running faster, but this can backfire in terms of cost and compute.
I often recommend creating multiple Snowflake warehouses, one for each function of your data stack, so you can see which activities cost the most. For example, I create separate warehouses for data ingestion, transformation, and analysis. Separating you warehouses by function will allow you to hone in on which parts of your stack need to be optimized.
2. Decrease the number of warehouses running at the same time
As I mentioned earlier, you pay for the time your warehouses are running. If you have multiple processes running, such as transformation and analysis, you are paying for the cost of two warehouses running at once. Despite what I mentioned earlier about creating different warehouses for different processes, you can lower your Snowflake costs by bringing this number down. However, you won’t know which process is utilizing the most compute power, making it a little fuzzy about which parts can be optimized.
This all just depends which is best for you. I recommend trying both options and seeing how much the cost savings is and then assessing whether that is worth not knowing which processes are running up costs. To successfully condense your warehouses, you first want to think about which processes run at the same time.
For me, data ingestion and transformation run hand in hand with my automated data pipeline. I also use the transformation warehouse to test dbt development models during the day, at the same time the data analyst is performing analyses. Instead of having three separate data warehouses, it may benefit me cost-wise to only have one to two warehouses for data pipeline activities and another warehouse for development.
3. Decrease the sync frequency of your data ingestion tool
Previously, I would only run data syncs once every morning using a data ingestion tool like Airbyte, when my data pipeline was automated to run. Then, I had a need for semi real-time data within Snowflake. After increasing data syncs in Airbyte to once every 15 minutes, Snowflake costs skyrocketed. Because data was being ingested every 15 minutes, the data warehouse was almost always running. Rather than it running for an hour or so each day, it was running for 24 hours a day.
It’s important to assess how frequently you need fresh data and data models in your data warehouse. If your business is depending on them at a certain frequency, it is probably worth the extra cost. However, if you are running these syncs multiple times a day when nobody is using fresh data, you are quite literally throwing dollars out the door. Look at your most important dashboards and see how often they are refreshed or checked by business teams. Then talk to your analyst and ask them about the data they work with on a daily basis. Is it necessary to have hourly or multiple data syncs per day?
4. Decrease the warehouse’s auto-suspend period
Decreasing the warehouse’s auto suspend period is the tip that has had the biggest impact on my Snowflake costs. Like I mentioned earlier, I have multiple warehouses running at a time, each for different processes. Because of this, every minute counts. When you create a new virtual warehouse, you have the option to set an auto-suspend period. The lowest default value they provide for you on the UI is 5 minutes. This is actually a long time if you have multiple processes running.
Every minute is valuable to optimize your Snowflake costs! If you create your warehouse manually, rather than in the UI, you can actually set your own auto-suspend time.
I highly recommend setting it for 30 seconds. This is just long enough for back-to-back queries to execute without the warehouse being shut down but short enough so that you aren’t paying for extra downtime.
Now if you look at your warehouses on the UI, you can see that the “Auto Suspend” time is 30 seconds rather than 5 minutes.
5. Use materialized views
Optimizing the run time of your Snowflake queries is one way to reduce your Snowflake costs. Materialized views in Snowflake are only executed against all the data in a table once. After executing against all the available data on its initial run, it only runs against new data added. It does this by utilizing INSERT and UPDATE commands.
This is powerful because of the potential it has to save you in compute costs. Because the query isn’t being executed on all of the data, just new data, it takes a lot less time to run. Less run time means less time a warehouse needs to be active, saving you costs.
Materialized views can only be utilized on data models that use simple SELECT statements rather than those that use joins or aggregation. I highly recommend using them on your base or staging dbt models, since these are basic select statements from your raw data tables.
6. Change your Snowflake account plan
Lastly, you can always reevaluate the account plan that you are using. Depending on your company size and the amount of data you use, enterprise plans may not make sense for you. I’ve seen a lot of people get roped into long-term contracts, thinking they were getting a discount, but in reality, they didn’t use nearly as many credits as they were paying for. Before signing anything, make sure you have a clear understanding of your monthly and yearly usage. Sometimes it may be better to test Snowflake out for a year and then potentially sign a contract if that makes sense for you. If you have no idea how many credits you will be using each month, don’t lock yourself into anything for a small discount. You will probably end up losing money in the long run!
The best thing you can do to optimize costs in Snowflake is try new things! You definitely won’t save any money if you keep your structure the way things currently are. In this article we covered six ways to optimize your Snowflake costs:
- Change your warehouse size
- Decrease the number of warehouses running at the same time
- Decrease the sync frequency of your data ingestion tool
- Decrease the warehouse’s auto-resume period
- Use materialized views
- Change your Snowflake account plan
Pick a few of these Snowflake optimization tips that you can easily implement and keep track of how they perform over the course of a few days. It’s all a balancing game between cost and performance. Performance may suffer, or it may soar while saving hundreds of dollars a week. You don’t know unless you try!
The beauty of the modern data stack is that it’s so easy to try new things. You don’t have any physical location or hardware you have to deal with. With a simple click of a button you can change the compute power of your entire data stack. And, you can change it right back a few days later! Take advantage of this in order to save whatever costs you can. There’s nothing that a business loves more than hearing about money saved or money made. Good luck!