Snowflake, the Data Cloud company, has had an incredible journey since it was founded in 2012. With triple-digit growth, sky-high net retention rates, and a record-breaking IPO, it's no wonder Snowflake is the talk of the data world.
This blog post will closely examine Snowflake's success story, current architecture, features, and what truly sets it apart. I’ll also discuss what the future may hold for the platform. To give you a firsthand perspective, I talked with Madison Schott, analytics engineer and long-time Snowflake user, to get her experience and insights.
A Brief History of Snowflake
Back in 2012, Benoit Dageville and Thierry Cruanes, both seasoned data architects at Oracle, noticed that data warehouses were struggling to keep up in the era of big data. They knew they had to come up with something new. And so, Snowflake was born – a data platform built specifically for the cloud. This was a game-changer because traditional data warehouses were on-premises, which made them expensive and time-consuming to set up and scale.
Dageville and Cruanes had a clear goal when they founded Snowflake: to create a data warehouse service that was fast, easy to use, and affordable. They also wanted to take full advantage of cloud computing.
One of the things that set Snowflake apart was that it was designed to separate storage from compute so that they could scale independently. That meant customers would only pay for what they used. Plus, users could create virtual data warehouses – independent compute resources – in real-time and turn them off when they didn't need them. Finally, Snowflake was designed to work across different cloud platforms, so customers wouldn't be "locked in" to just one service provider.
At first, Snowflake was built on top of Amazon Web Services (AWS). This created a bit of a conflict, since Snowflake was also competing with AWS's own product, Redshift. Investors might have been worried, but Snowflake had a secret weapon: it would run on other cloud platforms like Microsoft Azure and Google Cloud Platform (GCP) as well.
Today, Snowflake and AWS are key partners and even co-sell each other's services. In fact, Snowflake committed to spending $1.2 billion on AWS through 2025. The ball is in Snowflake's court, with GCP and Azure also vying for Snowflake's business.
Snowflake uses a consumption-based pricing model, which means users only pay for what they use. Pricing is based on the separation of storage and compute: compute is charged in processing units called "credits," while storage is billed separately. This flexible and straightforward pricing model is one of the reasons Snowflake has taken the tech world by storm.
Snowflake's impressive triple-digit growth and sky-high net retention rates have attracted heavyweight investors, making it one of Silicon Valley's biggest success stories. The numbers speak for themselves: Snowflake set the record for the biggest software IPO in the US, raising over $3 billion and reaching a valuation of around $70 billion. Even Warren Buffett's Berkshire Hathaway invested!
The relevance of these events cannot be overstated, and Benn Stancil put it best: “When the history books on the modern data stack get written, two moments will thus far define its arc. One is dbt Labs stampeding through the ecosystem in 2020 and 2021, and the other is Snowflake’s IPO.”
Recently, Snowflake expanded its vision beyond being just a cloud data warehouse and became the “Data Cloud.” The Data Cloud is a global network that allows organizations to process data with near-limitless scale, concurrency, and performance. By unifying "siloed" datasets within the Data Cloud, businesses can quickly find and securely share governed data and run various analytics workloads.
With the Data Cloud, Snowflake offers a comprehensive solution for data warehousing, data lakes, data engineering, data science, data application development, and data sharing—more details in the coming sections.
The largest data engineering survey of recent times, the State of Data Engineering, conducted by Airbyte at the end of 2022, received responses from nearly 900 participants. The results revealed that Snowflake is the clear leader when it comes to brand recognition and adoption in the cloud data warehousing realm. Most respondents were either current Snowflake users or expressed an interest in trying the platform.
As of January 2023, Snowflake has more than 7,820 customers globally. The Snowflake Partner Network (services, technology, and data provider partners) includes over 820 Powered by Snowflake registrants.
Snowflake Architecture Overview
Snowflake's Data Cloud offers solutions for storing, processing, and analyzing data. Unlike traditional offerings, Snowflake is built on a new SQL query engine and a unique cloud-based architecture.
The best part? Snowflake is a self-managed service, so you don't have to worry about selecting, installing, configuring, or managing hardware or software. Snowflake handles everything, including ongoing maintenance, management, upgrades, and tuning.
One thing to remember: Snowflake runs entirely on public cloud infrastructures and cannot be run on private ones.
Snowflake combines traditional shared-disk and shared-nothing architectures to provide the best of both worlds. It uses a central data repository accessible from all compute nodes, like shared-disk architectures. But it also processes queries using MPP (massively parallel processing) compute clusters, where each node stores a portion of the entire dataset locally, similar to shared-nothing architectures. This approach offers data management simplicity, performance, and scaling benefits.
The architecture comprises three essential layers: database storage, query processing, and cloud services.
- Database storage. Data is loaded into Snowflake and reorganized into an optimized, compressed columnar format. Snowflake manages all storage aspects, including organization, file size, compression, and metadata.
- Query processing. Each virtual warehouse is an independent MPP compute cluster that does not share resources with other virtual warehouses. This ensures that each virtual warehouse has no impact on the performance of the rest.
- Cloud services. These services manage authentication, infrastructure, metadata, query parsing, optimization, and access control.
All three layers of Snowflake's architecture are deployed and managed entirely on a selected cloud platform. As mentioned, Snowflake can be hosted on AWS, GCP, and Azure.
Snowflake Key Features
Security and governance
Snowflake takes security, governance, and data protection seriously. That's why they offer various features to help you keep your data safe. Madison Schott comments on this: “I like that you have a customizability that allows you to keep your data safe and protected. You can separate roles for external tools and those for the analysts working directly within the warehouse.”
For instance, you can choose where your data is stored based on your region. And user authentication is easy – you can use standard user/password credentials or take advantage of more advanced options like multi-factor authentication, federated authentication, single sign-on, and OAuth.
All communication between clients and the server is protected through TLS. And data isolation during loading and unloading is done through different controls depending on your cloud platform.
Snowflake complies with HIPAA and HITRUST CSF regulations and offers automatic data encryption using Snowflake-managed keys. You can also use object-level access control and Snowflake Time Travel to query, restore, and clone historical data. Plus, Snowflake Fail-safe offers disaster recovery of historical data.
If you need even more advanced features, you can use column-level security to apply masking policies to columns in tables or views. And row access policies let you restrict which rows of a table or view are visible to particular roles.
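As a sketch, here is what Time Travel and these policies look like in Snowflake SQL. All table, column, and role names below are hypothetical, chosen purely for illustration:

```sql
-- Time Travel: query a table as it existed one hour ago.
SELECT * FROM orders AT (OFFSET => -3600);

-- Restore an accidentally dropped table from Time Travel history.
UNDROP TABLE orders;

-- Column-level security: mask emails for everyone except a privileged role.
CREATE MASKING POLICY email_mask AS (val STRING) RETURNS STRING ->
  CASE
    WHEN CURRENT_ROLE() = 'PII_ADMIN' THEN val
    ELSE '***MASKED***'
  END;

ALTER TABLE customers MODIFY COLUMN email
  SET MASKING POLICY email_mask;

-- Row access policy: analysts only see their own region's rows.
CREATE ROW ACCESS POLICY region_policy AS (region STRING) RETURNS BOOLEAN ->
  CURRENT_ROLE() = 'GLOBAL_ADMIN'
  OR (CURRENT_ROLE() = 'EMEA_ANALYST' AND region = 'EMEA');

ALTER TABLE sales ADD ROW ACCESS POLICY region_policy ON (region);
```

Once attached, the policies are enforced transparently: the same `SELECT` returns masked or filtered results depending on the role that runs it.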
SQL and extended SQL
Snowflake offers comprehensive SQL support that makes managing and analyzing your data easy. You can use databases, schemas, tables, core data types, SET operations, CAST functions, and standard Data Manipulation Language (DML) like UPDATE, DELETE, and INSERT. And if you need more advanced features, you can also use multi-table INSERT, MERGE, and multi-merge.
Other features that make Snowflake great include support for transactions, temporary and transient tables, lateral views, materialized views, aggregate statistical functions, and analytical aggregates. You can even use some of the SQL:2003 analytic extensions, like windowing functions and grouping sets.
Snowflake even supports recursive queries, including CONNECT BY and recursive CTEs, as well as collation and geospatial data. With such a wide range of SQL features, it's no wonder that Snowflake is so popular for managing and analyzing data.
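To make two of these extensions concrete, here is a quick sketch of a recursive CTE that walks a reporting hierarchy and a conditional multi-table INSERT. The tables (employees, staged_orders, and so on) are hypothetical:

```sql
-- Recursive CTE: traverse an org chart from the top down.
WITH RECURSIVE org_chart AS (
    SELECT employee_id, manager_id, name, 1 AS depth
    FROM employees
    WHERE manager_id IS NULL          -- start at the top of the hierarchy
  UNION ALL
    SELECT e.employee_id, e.manager_id, e.name, o.depth + 1
    FROM employees e
    JOIN org_chart o ON e.manager_id = o.employee_id
)
SELECT name, depth FROM org_chart ORDER BY depth;

-- Conditional multi-table INSERT: route each row to one target table.
INSERT FIRST
  WHEN amount >= 10000 THEN INTO large_orders
  ELSE INTO small_orders
SELECT order_id, amount FROM staged_orders;
```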
Tools and interfaces
Snowsight is a web-based interface that makes it easy to manage your account, monitor resources and system usage, and query data. And if you prefer a Python-based command line client, SnowSQL provides access to all of Snowflake's services.
Managing virtual warehouses is a breeze with Snowflake. You can use the GUI or the command line to create, resize (with zero downtime), suspend, and drop warehouses.
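The whole warehouse lifecycle can be sketched in a few statements. The warehouse name and settings below are illustrative, not recommendations:

```sql
-- Create a small warehouse that suspends itself after 60 seconds idle
-- and resumes automatically when the next query arrives.
CREATE WAREHOUSE IF NOT EXISTS analytics_wh
  WAREHOUSE_SIZE = 'XSMALL'
  AUTO_SUSPEND = 60
  AUTO_RESUME = TRUE
  INITIALLY_SUSPENDED = TRUE;

-- Resize on the fly, with zero downtime for running queries.
ALTER WAREHOUSE analytics_wh SET WAREHOUSE_SIZE = 'MEDIUM';

-- Suspend or drop the warehouse when it is no longer needed.
ALTER WAREHOUSE analytics_wh SUSPEND;
DROP WAREHOUSE analytics_wh;
```

Since compute is billed per second while a warehouse runs, the AUTO_SUSPEND and AUTO_RESUME settings shown here are also the main levers for keeping costs under control.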
If you're a Visual Studio Code user, you're in luck. Just install the Snowflake Extension, and you'll get detailed instructions on configuring and using it.
Apps and extensibility
Snowflake makes it easy to build applications and process data without moving the data out to the system where your application code runs. And with a range of APIs and tools available, Snowflake has you covered no matter your preferences. APIs are available for Java, Python, and Scala.
But wait, there's more. Snowflake also offers a REST API for accessing and updating data. And to make things even easier, an extensive set of client connectors and drivers is available for different languages and interfaces, including Python, Spark, Node.js, Go, .NET, JDBC, ODBC, and PHP PDO.
Data import and export
Snowflake makes importing and exporting data easy with bulk loading and unloading into/out of tables. You can load any data that uses a supported character encoding and data from compressed files. Most flat, delimited data files like CSV and TSV are supported, as well as files in JSON, Avro, ORC, Parquet, and XML format.
Using the web interface or command line client, you can load files from cloud storage or local files. And if you need continuous data loading, you can use Snowpipe to load data in micro-batches from internal or external stages like Amazon S3, Google Cloud Storage, or Microsoft Azure Storage.
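A minimal sketch of both loading paths, assuming a hypothetical S3 bucket, stage, and target table (a real external stage would also need credentials or a storage integration, and auto-ingest relies on cloud event notifications):

```sql
-- Define how the incoming CSV files are parsed.
CREATE FILE FORMAT csv_format TYPE = 'CSV' SKIP_HEADER = 1;

-- Point an external stage at the bucket where files land.
CREATE STAGE raw_stage
  URL = 's3://my-bucket/exports/'
  FILE_FORMAT = (FORMAT_NAME = 'csv_format');

-- Bulk load: copy every matching file from the stage into a table.
COPY INTO raw_events
  FROM @raw_stage
  PATTERN = '.*events.*[.]csv';

-- Continuous loading: a Snowpipe wraps the same COPY statement and
-- ingests new files in micro-batches as they arrive in the stage.
CREATE PIPE events_pipe AUTO_INGEST = TRUE AS
  COPY INTO raw_events FROM @raw_stage;
```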
Snowflake also provides a broad ecosystem of supported third-party partners and technologies, which makes connecting with a wide range of tools and services easy. Data migration from other databases like MySQL is easy and seamless.
As Madison explains: “Every tool I've used has a built-out Snowflake integration. You simply specify a username, password, role, and default warehouse. This lets you connect whatever tool you use directly to your Snowflake warehouse. I've used it with Airbyte, dbt, Prefect, Castor, and Datafold.” She continues: “Snowflake works great with dbt. I personally really like how the two tools work together. They make it easy to build and test data models.”
Data Sharing and The Snowflake Marketplace
Secure Data Sharing is a feature that lets you selectively share different objects in your Snowflake database with other Snowflake accounts. You can share tables, external tables, views, materialized views, and user-defined functions (UDFs).
Snowflake uses shares created by data providers and made available to data consumers. The shares ensure that data isn't copied or transferred between accounts, and all sharing is done through Snowflake's services layer and metadata store. So, you can relax knowing your data is safe and sound.
Setting up Secure Data Sharing is a breeze for providers, and accessing the shared data is almost instantaneous for consumers. On the consumer side, a read-only database is created from the share. You can access this database using Snowflake's standard role-based access control, which guarantees the shared data remains secure and protected.
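A rough sketch of both sides of a share in Snowflake SQL, with hypothetical database, share, and account names:

```sql
-- Provider side: create a share and expose one table through it.
CREATE SHARE sales_share;
GRANT USAGE ON DATABASE sales_db TO SHARE sales_share;
GRANT USAGE ON SCHEMA sales_db.public TO SHARE sales_share;
GRANT SELECT ON TABLE sales_db.public.orders TO SHARE sales_share;

-- Make the share visible to a specific consumer account.
ALTER SHARE sales_share ADD ACCOUNTS = partner_account;

-- Consumer side: mount the share as a read-only database.
CREATE DATABASE shared_sales FROM SHARE provider_account.sales_share;
```

No data is copied at any point: the consumer's database is a read-only view onto the provider's storage, governed by the grants attached to the share.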
If you're looking for third-party datasets, check out the Snowflake Marketplace. It's a great way to discover and access all data offerings. And the best part? The Marketplace uses Secure Data Sharing to connect data providers with consumers across clouds and regions.
The Data Cloud Explained
Snowflake's core data warehouse has expanded into the Data Cloud by allowing best-of-breed service providers to integrate with the platform. Users get access to partner services through Snowflake's Partner Connect, and to premium market data via zero-copy replication in the Data Marketplace. This way, the Data Cloud allows all Snowflake accounts to be part of a single data universe.
Having a competitive advantage is crucial in business. It's all about focusing on what you do best and letting third parties handle the rest. And that's Snowflake’s core offering with the Data Cloud.
Let's say you're an e-commerce company. You're great at curating product offerings to meet customers' needs. But you want to take it up a notch. That's where a recommendation engine can help. Instead of building your engine and infrastructure, you can rely on Snowflake-trusted partners like Amazon Personalize or Google Recommendations AI to get you up and running in no time.
But what about the latest market trends? You need that information to make your recommendations even more accurate. No worries – Snowflake's Data Marketplace has got you covered. It makes that data instantly available as if it were a local table.
The future of Snowflake
Now, let's look at what's in store for Snowflake. There are some exciting features and acquisitions on the horizon that could shape the future of the company and the data industry as a whole.
Data sharing and collaboration
When we talk about the future of Snowflake, we're talking about the future of data sharing and collaboration. Snowflake is leading the charge, revolutionizing the way companies share data.
As mentioned, the Snowflake Marketplace is like social networking for big data, democratizing access and allowing companies to monetize their data in new ways – changing the game and creating new business opportunities across industries.
As Snowflake continues to develop its data-sharing functionality, we're moving towards a future where data interconnectivity is the norm and data sets can be shared in near real-time to unlimited data consumers.
Data and business professionals must move away from corporate data silos and towards a more interconnected, collaborative mindset. It may take time, but once the shift happens, it will enable digital transformations that tremendously improve business decisions and bottom lines.
This shift will also result in a decrease in the duration of the data value chain, allowing businesses to capture value more quickly. Removing friction caused by data sharing and collaboration will lead to faster, cheaper, and more secure data-sharing practices.
In mid-2022, Snowflake announced an exciting new development: the Native Application Framework. This framework is changing how we build, distribute, and use applications in the Data Cloud.
With the Native Application Framework, application providers can use Snowflake's familiar core functionalities to build their applications. They can then distribute and monetize them in Snowflake Marketplace and even deploy them directly into a customer's Snowflake account.
This framework is beneficial for both application providers and customers. Providers get immediate exposure to thousands of Snowflake customers worldwide, while customers can keep their data centralized and simplify their application procurement process.
As Benn Stancil puts it, “Snowflake could be on the cusp of changing what a database is, what data apps are, how they get built and sold, and what we can do with them. They could be building a platform that stokes the industry once more and leads to another explosion of ideas and products.”
The world of machine learning is growing fast, and it will only get better with near real-time data-sharing solutions.
Snowflake is leading the way with its machine learning (ML) capabilities and Snowpark framework. The recent public preview of Snowpark for Python lets data scientists and ML engineers tackle a wide range of use cases, from feature engineering to training to serving batch inference.
These product innovations are opening up new, more efficient ways to generate and operationalize ML-powered insights. Data scientists, engineers, and developers can collaborate more effectively to take ML models to production, using their language of choice and with unmatched ease to develop interactive applications that turn insights into actions.
But what's more exciting is that partners like dbt can leverage Snowpark to unify data pipelines for analytics and ML use cases. We can expect even more powerful and efficient ML models in the future.
Snowflake is making data management easier with two new tools for low-latency streaming ingestion: Snowpipe Streaming and the Snowflake Connector for Kafka with Snowpipe Streaming support. With these tools, data can be streamed directly into Snowflake tables, making silos and complex infrastructure management a thing of the past.
But Snowpipe Streaming is just the beginning. Snowflake has big plans for its streaming ingestion capabilities, and we can be sure they will be working to improve the platform for even better performance, functionality, and cost-efficiency.
One way to anticipate a company's future direction is by examining its recent significant acquisitions, which can provide valuable insights into its priorities and investment areas.
In early 2023, Snowflake announced the acquisition of Mobilize.Net's SnowConvert software tools. With this acquisition, Snowflake can enhance SnowConvert's tools, integrating them more effectively into the Snowflake Data Cloud and ultimately simplifying the migration process for new customers. The acquisition signals that Snowflake's long-term objectives include making data migration as simple and efficient as possible.
Last year, they also acquired Streamlit, a Python framework that makes web development easy for ML engineers and data scientists. With Streamlit, developers can build powerful applications without the traditional complexity of web development. Continuing that ML push earlier this year, Snowflake announced a focus on building ML extensibility into the Data Cloud. To that end, they plan to acquire Myst, a company specializing in time series forecasting.
Overall, Snowflake is committed to simplifying data management and making it more accessible to everyone. These acquisitions are just a few examples of how the company is working to achieve that goal.
A big question about the future of Snowflake is, how will cost evolve? The answer is not trivial.
When I asked Madison Schott about challenges working with Snowflake, she said: “Managing costs is the biggest issue by far. Snowflake can be very expensive if you don't use it right. However, once you find a good balance between cost savings and performance, it works great. This involves a lot of play around with active warehouses, warehouse sizing, auto-resume, and suspend periods.”
We will certainly keep an eye on what the future holds for Snowflake regarding costs. Benn Stancil shares some possibilities in the famous How Snowflake fails blog post.
Snowflake is at the forefront of the data industry, with its innovative products and strategic acquisitions leading the way toward a future where data sharing and collaboration are commonplace. By democratizing big data and making it more accessible, Snowflake is unlocking new business opportunities and driving amazing business transformations that can significantly improve decision-making and bottom lines.