The migration from on-premise to serverless data storage has gained momentum over the last few years, with the cloud data storage market projected to approach US$97.5 billion by 2022.
The dispensability of physical infrastructure and its excellent scalability have made serverless data warehousing a fantastic choice for businesses. There are many providers to choose from, and Snowflake, a flexible data warehouse (DWH) solution, is one of the best, providing businesses with a range of benefits, including scalability and optimal pricing. So let’s look in-depth at this cloud-based solution and analyze the cases where Snowflake outperforms its competitors.
What is Snowflake Computing?
To outline the value of Snowflake for your business, let’s first state its core principle. Snowflake is a solution available in AWS, Microsoft Azure, and Google Cloud. It aims to fulfill the majority of data analytic needs, such as processing any scale of structured or semi-structured data, while dramatically minimizing efforts in its handling and analysis. Snowflake provides Data Warehouse-as-a-Service, thus requires no installation, configuration, or management of software or hardware.
However, being a SaaS data warehouse for storing and processing massive data volumes does not make Snowflake unique. There are many other solutions that also do this. To find what puts Snowflake above its competitors, let’s dig deeper.
What Makes Snowflake Unique?
Snowflake is an excellent option for many companies because it offers:
- Scalable universally shared data storage. Organizational data for analytics is not locked in separate clusters or stores and can be easily connected.
- Perfect elasticity of storage and compute, meaning instant scalability. When scale needs to be ramped up or down, there is no need to stop current workloads or wait while cluster rebalancing occurs
- A pay-for-what-you-use pricing model, with one-second resource usage granularity, transparency, and speed control. There is no need to guesstimate your analytics’ needs to correctly size your warehouse.
- An ability to store and process semi-structured data in the same way as relational data and with comparable performance.
The balance between these benefits and its flexible pricing policy is why Snowflake outperforms its competitors on analytical workloads — both traditional on-premise storage, and other cloud-based solutions, such as Oracle, RedShift, Teradata, MS SQL Server, or Azure DWH.
Let’s dive deeper to better understand each benefit of the Snowflake data warehouse solution for your business as compared to other solutions on the market.
Benefit 1: Snowflake Distinguishes Storage and Computing Entities
The primary advantage of Snowflake is its ability to decouple storage and computing functions. For example, RedShift bundles the two features together, making it difficult to optimize or scale each independently. As a result, users pay more.
But what does this mean for your business? Chances are, your storage and compute needs may not scale together. Some companies accumulate data over time but run similar workloads year-to-year. Other companies may have very high or uneven needs for analytical processing, while their data volume remains stable. In most of the data warehouses, you need to provision clusters with servers, each server adds some storage and compute, and they are linked together. Either you underutilize your planned capacity, or your business users have to wait as they compete for resources to run their analytics. But not in Snowflake.
Snowflake’s architecture includes storage and computing layers that can be scaled up or down, independently. So you can keep your data balanced while applying the required CPU power to process it.
Snowflake stores all the data you upload, and it can be either structured or semi-structured. The warehouse automatically controls the data and structures it by parameters such as metadata, organization, file size, statistics, compression, and so on. You can run this layer independently of computing layers.
This layer includes virtual warehouses (clusters) that run the data processing based on user queries. Each cluster can access data from the storage layer and analyze it independently (without sharing between other clusters).
Benefit 2: Snowflake Enables On-Demand Scaling
As Snowflake distinguishes between storage and computing options, the Snowflake data warehouse provides another unique advantage: on-demand scaling.
While processing user queries from the storage layer, the computing power for managing them can be scaled on-demand, without redistributing or rebalancing data. Thus, you scale resources automatically and without harm to the data.
While traditional data warehouses need hours or even days to scale (for example, by physically adding hardware, extending servers, or upgrading the entire infrastructure), Snowflake does so in minutes.
Benefit 3: Snowflake Allows Optimized Pricing
Many organizations emphasize that Snowflake’s pricing policy is a primary factor when selecting it as their data warehouse solution.
While there are two layers — storage and computing — you pay separately for each function. There is no single package to pay for; you only pay for the number of resources you need and expand or scale them down when necessary.
For example, Snowflake storage costs can start at a flat rate of $23 per Terabyte. The use of computing power is charged per second. This is in comparison to Amazon RedShift, which requires users to pay hourly, and has the lowest price of $0.25 for 15 Gibibyte of storage space.
Situations Where Snowflake Outperforms and Underperforms Its Competitors
Regardless of Snowflake’s many advantages, no single technology can be the best choice for every situation. Let’s analyze some cases where businesses need to use Snowflake, and some where other solutions are likely to perform better.
USE CASE 1
Situation: To share data with third-party accounts.
Solution: If you need to share your data with, say, your customers, Snowflake can do this without the need to leave a copy of that data.
USE CASE 2
Situation: A DWH that handles semi-structured data.
Solution: If your DWH deals a lot with semi-structured data like XML and JSON, Snowflake has better support, compared to other solutions.
USE CASE 3
Situation: Certainty about DHW volume and workload.
Solution: If you are confident in your DWH volume and workload over time, use Redshift, as it is likely to be less expensive overall via discounts on reserved long-term commitments.
USE CASE 4
Situation: A vast cloud-native infrastructure, e.g., in AWS.
Solution: If you already have a vast infrastructure built on top of AWS, Redshift is likely to cope with Athena, Data Migration Service, Glue, and other services in a more manageable way. Snowflake requires additional efforts to build ETL/ELT processes but does guarantee easier maintenance. The same applies to other clouds.
USE CASE 5
Situation: A large legacy DWH on a specific technology that needs to be migrated quickly to the cloud.
Solution: If you base your current DWH on, for example, an MS SQL Server, you may require quite a bit of rework to adjust to Snowflake SQL and DB specifics. It is potentially worthwhile to consider Azure DWH options, for better compatibility and short term results.
USE CASE 6
Situation: Online transactional processing workloads.
Solution: Snowflake isn’t suitable for online transactional processing (OLTP) and running operational data stores because of its focus on OLAP and lack of support for DB constraints. For these scenarios, it’s worth considering Postgres, MySQL, Amazon Aurora, MS SQL, or others.
USE CASE 7
Situation: Demand for real-time/streaming or native processing for unstructured or semi-structured data.
Solution: Firstly, Snowflake shows excellent performance for typical BI, Reporting, and Analytical workloads. These workloads usually take a second or more to run. Snowflake isn’t a match however for sub-second and low latency workloads. Secondly, Snowflake deals with semi-structured data effectively by translating it to tabular views, which then allows you to join it with your relational data. If your use case requires comprehensive support for specific non-relational data structures, for example traversing graphs or semantic data, Snowflake may not be a good match for both syntax and compute performance. You may consider building a data lake for such low latency and/or non-relational data structures processing.
Case Study: Why DataArt’s Client Migrated from Amazon RDS to Snowflake
Let’s consider the migration process in a real-world example.
A leading company in the media industry has worked with DataArt over several years on the implementation and evolution of a big data analytics system for the music industry, as well as a range of data-driven apps and BI tools that face internal and external clients.
Snowflake was first introduced as a data engine for integration, enrichment, processing, and aggregation of terabytes per day of raw data, and proved to be a superior technology because of its scalability and flexibility. In its original implementation, Snowflake was used to produce Data Marts hosted in Amazon RDS (PostgreSQL). The Data Marts powered the consumption layer BI and reporting for internal and external users and data-driven apps.
Soon after the launch of the original implementation, the demand for the consumption layer grew significantly, and RDS started to hit scalability issues. Apps, Dashboards, and analytical queries were running slowly, and demands of the engineering team for optimization were becoming untenable. As a result, the company decided to migrate the Data Marts from RDS to Snowflake.
- Data volume for daily updates exceeded 1TB and was continually growing.
- A lot of KPIs needed to be calculated on-the-fly.
- The ETL process execution time for RDS/PostgreSQL (data preparation, aggregates building, KPIs precalculation) grew from 6 hours to 18 hours.
- Adding new entities and data to RDS/PostgreSQL became painful in terms of the ETL duration and poor user experience.
- Running ETL affected user experience in apps and BI.
- Team of data engineers spent much time tuning data partitioning setup, indices, and other optimizations. RDS infrastructure costs were surging.
The migration to Snowflake allowed the platform to work much faster and more effectively:
- In general, the app processing speed increased 2–3 times, as compared to RDS/PostgreSQL.
- The ETL duration reduced from 6–18 hours to 4 hours.
- The ETL process doesn’t affect the user application anymore since they use different Snowflake virtual data warehouses.
- New entities can be added quickly and smoothly.
- BI users in the business are not limited when deriving new insights from the data.
- Product teams for data-driven apps are not limited when delivering more sophisticated analytical features to customers.
- Snowflake infrastructure costs are lower than RDS.
- The amount of optimization effort by the data engineering team is significantly reduced, although still required for the most complex workloads.
- Engineering teams benefit from the ability to copy and support prod/test/dev environments easily.
This case illustrates some of the advantages of Snowflake, but it does not guarantee that Snowflake will be the best choice for your specific business scenario. To choose the optimal architecture option for you, consider the issues discussed in the article Data Warehouse Concepts and Tools: Do’s and Don’ts While Building Your Solution, from the idea stage to building your satisfactory solution.
DataArt can act as a consultant and help you discover which data warehouse (or more generally, data architecture) to choose, depending on your individual needs and requirements. Read more about our expertise in Data, BI and Analytics, and get in touch to discuss your options.