Technical considerations for Snowflake Gen2 migration

Snowflake’s Generation 2 Standard Warehouses (Gen2) introduce an enhanced execution layer leveraging faster hardware and improved software optimizations. These changes are designed to improve performance for workloads involving large scans and Data Manipulation Language (DML) operations such as DELETE, UPDATE and MERGE.

This post provides a technical overview of Gen2 warehouses, potential cost implications and workload evaluation strategies based on findings from a Slingshot analysis of query profiles.

Columnar storage and Snowflake warehouses

Snowflake uses columnar storage, which is ideal for analytical, read‑intensive workloads such as reporting dashboards, ad‑hoc data exploration and large aggregations (e.g., scanning billions of rows to compute metrics or join fact and dimension tables for business intelligence). In these scenarios, columnar layouts allow the engine to read only the specific columns required and apply predicate pushdown, resulting in high scan efficiency and reduced I/O.

For example, consider a sales_fact table with 100 columns and thousands of rows, where an analytics query needs only two columns (customer_id and total_spend):

    SELECT customer_id, total_spend
    FROM sales_fact
    WHERE region = 'US';

Because of the columnar layout, Snowflake reads only the customer_id, total_spend and region columns from storage rather than scanning all 100 columns, significantly reducing I/O and improving query execution time.

This efficiency is enabled by immutable micro‑partitions, Snowflake’s internal storage units that each hold a subset of a table’s rows organized column by column. Once written, a micro‑partition is never modified in place; instead, new partitions are created for changes. This immutability allows for aggressive compression, rich clustering metadata and fast pruning of irrelevant data during scans, which results in faster query execution.
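
A practical way to check how much pruning a workload actually gets is to compare the micro‑partitions scanned against the table's total, which Snowflake records for every query. Below is a minimal sketch, assuming access to the SNOWFLAKE.ACCOUNT_USAGE.QUERY_HISTORY view and using the sales_fact table from the example above; the 7‑day window and result limit are arbitrary:

    -- Pruning effectiveness for recent queries touching sales_fact:
    -- a low scanned/total ratio means most micro-partitions were pruned.
    SELECT query_id,
           partitions_scanned,
           partitions_total,
           ROUND(partitions_scanned / NULLIF(partitions_total, 0), 3) AS scan_ratio
    FROM snowflake.account_usage.query_history
    WHERE query_text ILIKE '%sales_fact%'
      AND start_time >= DATEADD('day', -7, CURRENT_TIMESTAMP())
    ORDER BY start_time DESC
    LIMIT 20;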

When columnar storage becomes less efficient

While columnar storage excels at read‑intensive analytics, it is less suited for wide or non‑selective scans and for frequent data modifications.

In fact, these same design choices introduce several challenges for other workload patterns, such as:

  • Complex updates: Updates and deletes often require replacement or merging of entire column chunks.

  • Overhead for transactional use: Columnar systems are less efficient for workloads needing high‑frequency updates or low‑latency writes.

  • I/O amplification in wide scans: When queries select most or all columns, the engine must read from many separate column segments, increasing scan I/O and CPU cost compared to row‑oriented layouts.

  • Extra CPU resources to reassemble rows: Columnar formats sacrifice row locality; reconstructing complete rows from multiple column files adds overhead during large joins or full‑row projections.

  • Decoding overhead for compressed data: The heavy compression and encoding schemes that boost analytic performance can increase CPU usage and latency during large scans without selective filters.

  • Partition proliferation over time: Because micro‑partitions are immutable, frequent updates and deletes create more partitions and metadata, which increases planning time and scan footprint.

  • Non‑selective scans: Queries that scan most or all rows in a table without strong filtering conditions (e.g., no highly selective WHERE clause) defeat some of the pruning benefits of columnar storage and can be costly on very large tables.

Limitations for large scans

Certain query patterns force the engine to scan large portions of a table, which increases execution time for these workloads on Gen1 warehouses. Below are examples that highlight these limitations:

Wide scan across many columns

When a query requests almost every column, columnar storage loses its usual advantage of reading only a subset of columns. Each column segment must be accessed and decoded:

    SELECT *
    FROM sales_fact
    WHERE order_date BETWEEN '2025-01-01' AND '2025-06-30';

(This example touches nearly all columns for every row in the date range, increasing I/O and CPU overhead.)

Non‑selective scan without filters

When no filtering predicate is applied, Snowflake must read every micro-partition, even if only a few columns are selected:

    SELECT customer_id, total_spend
    FROM sales_fact;

(With no WHERE clause, the engine scans all rows in all relevant partitions, which can be expensive on very large tables.)

Low selectivity predicate

Even with a filter, if the predicate matches most of the table, partition pruning provides little benefit:

    SELECT customer_id, total_spend, region
    FROM sales_fact
    WHERE region IN ('US', 'EU', 'APAC');

(If these regions cover nearly all rows, almost every partition still needs to be scanned.)

Limitations for DML operations

DML operations typically involve rewriting significant portions of data and scanning more metadata than a row‑oriented system would. As a result, these operations may take longer and consume more compute resources. Consider the following example queries:

UPDATE queries

Adjusting a column value for many rows forces new partitions to be written:

    UPDATE sales_fact
    SET total_spend = total_spend * 1.05
    WHERE region = 'EU';

DELETE queries

Removing rows requires rewriting partitions without the deleted data:

    DELETE FROM sales_fact
    WHERE order_status = 'CANCELLED';

MERGE queries

Combining updates and inserts often triggers both scanning and rewriting:

    MERGE INTO sales_fact AS target
    USING staging_updates AS src
    ON target.order_id = src.order_id
    WHEN MATCHED THEN
      UPDATE SET target.total_spend = src.total_spend
    WHEN NOT MATCHED THEN
      INSERT (order_id, customer_id, total_spend)
      VALUES (src.order_id, src.customer_id, src.total_spend);

In each case, the columnar format amplifies the cost compared to row‑oriented systems because entire column segments are rewritten rather than individual rows. Gen2 warehouses help by executing these operations more efficiently, but the underlying storage characteristics remain the same.

How Generation 2 Standard Warehouses enhance columnar storage

Snowflake’s Gen2 warehouses build on these columnar foundations to deliver even better performance and scalability, leveraging an optimized execution layer and faster hardware. This includes:

  • Faster columnar scans: Improved query execution and parallelism in Gen2 allow columnar scans to run faster, with more efficient CPU scheduling and better resource utilization.

  • Smarter pruning: Enhanced metadata access and pruning algorithms mean fewer irrelevant micro‑partitions are scanned, amplifying the benefits of columnar layouts.

  • Improved concurrency: Gen2 warehouses can scale compute clusters dynamically to support more concurrent columnar scan queries without degrading performance. Because Gen2 runs on faster hardware, more compute resources are freed up for query processing.

  • Lower DML overhead: While columnar storage still rewrites micro‑partitions for updates or merges, Gen2’s execution engine mitigates some of the runtime cost with better resource management and faster planning. This is due to optimized execution paths for DELETE, UPDATE and MERGE operations.

These changes allow DML-heavy and scan-heavy workloads to complete in less time.

Insights from Slingshot analysis

Slingshot examined more than 2 billion query profiles across more than 14,000 warehouses in April 2025 and found:

  • Significant proportion of eligible workloads: Approximately 30% of queries fell into categories that could benefit from Gen2—specifically those involving DELETE, UPDATE or MERGE.

  • Long-running DML operations: A significant subset of queries running longer than 30 minutes were performing complex DML tasks. These are prime candidates for Gen2 migration.

  • Large scans: Numerous queries scanned over 1 TB of data, where Gen2’s improved scan performance can provide material runtime reduction.

  • Combined heavy patterns: A smaller but critical set of queries both scanned over 1 TB and ran for 30 minutes or more, often involving DML operations—these represent the highest potential gain from Gen2.

These findings support a targeted migration strategy: focus on warehouses with a significant concentration of eligible workloads rather than migrating all warehouses indiscriminately.

Gen2 cost considerations

If you believe your workloads could benefit from Gen2 warehouses, you should also weigh the costs against the potential benefits. According to Snowflake’s August 2025 pricing guide, Gen2 warehouses consume 25% more credits per hour than standard warehouses on Azure and 35% more on AWS and GCP.

Size    | Standard (Credits/hr) | Azure - Gen2 (Credits/hr) | AWS/GCP - Gen2 (Credits/hr)
XSmall  | 1                     | 1.25                      | 1.35
Small   | 2                     | 2.5                       | 2.7
Medium  | 4                     | 5                         | 5.4
Large   | 8                     | 10                        | 10.8
XLarge  | 16                    | 20                        | 21.6
2XLarge | 32                    | 40                        | 43.2
3XLarge | 64                    | 80                        | 86.4
4XLarge | 128                   | 160                       | 172.8
5XLarge | 256                   | 320                       | 345.6
6XLarge | 512                   | 640                       | 691.2

Source: Snowflake credit consumption table

Potential implications:

Based on this pricing schedule, it is possible that:

  • In environments with heavy DML workloads, shorter runtimes may offset the higher per‑hour rate (see the break-even example below).

  • In warehouses that run continuously with multiple clusters, runtime improvements may not result in lower cost and total spend may increase.
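
As a quick break-even check using the table above, consider a Medium warehouse on AWS: a job that runs for 2 hours on Gen1 consumes 8 credits (2 hours at 4 credits/hr). On Gen2 at 5.4 credits/hr, the same job must finish in under roughly 1.48 hours (8 ÷ 5.4), about a 26% runtime reduction, just to match the Gen1 cost; any larger speedup translates into savings. On Azure, with a 1.25x multiplier, the break-even point is a 20% runtime reduction.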

Gen2 regional availability

Regional availability will also impact your decision to migrate. As of August 2025, Gen2 warehouses are only available in the following cloud regions: 

  • AWS US‑West (Oregon)

  • AWS EU‑Central (Frankfurt)

  • Azure East US 2 (Virginia)

  • Azure West Europe (Netherlands)

For the latest regional availability information, refer to Snowflake’s Gen2 standard warehouses documentation.

Migrating to Gen2

If migrating to Gen2 is worth the investment for your business, refer to the following pre-migration checklist, Gen2 migration script and post-migration recommendations to help you get started and create your own targeted migration strategy.

Pre-migration checklist

  1. Verify regional availability: Ensure Gen2 is supported in both your primary and disaster recovery Snowflake regions.
  2. Identify migration candidates: Use QUERY_HISTORY (with query_type and execution_time filters, as in the example query after this checklist) to locate warehouses with:

    • High counts of large joins, MERGE and UPDATE operations.

    • Queries running over 30 minutes and/or scanning >1 TB.

  3. Analyze long‑running queries: Review EXPLAIN plans for:

    • Complex transformations

    • CPU‑intensive calculations

    • Window functions

    • Large GROUP BY / ORDER BY clauses

  4. Prepare migration scripts: Update warehouse definitions as shown below.
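
To support step 2, the following sketch counts potentially Gen2-eligible queries per warehouse over the last 30 days. It assumes access to the SNOWFLAKE.ACCOUNT_USAGE.QUERY_HISTORY view; the 30-minute and 1 TB thresholds mirror the patterns described above and can be adjusted, and large joins still need to be reviewed separately in query profiles:

    -- Count queries per warehouse that match Gen2-friendly patterns:
    -- DML statements (DELETE/UPDATE/MERGE), executions longer than 30 minutes,
    -- and scans larger than 1 TB (execution_time is in milliseconds).
    SELECT warehouse_name,
           COUNT_IF(query_type IN ('DELETE', 'UPDATE', 'MERGE')) AS dml_queries,
           COUNT_IF(execution_time > 30 * 60 * 1000)             AS long_running_queries,
           COUNT_IF(bytes_scanned > 1e12)                        AS large_scan_queries,
           COUNT(*)                                              AS total_queries
    FROM snowflake.account_usage.query_history
    WHERE start_time >= DATEADD('day', -30, CURRENT_TIMESTAMP())
      AND warehouse_name IS NOT NULL
    GROUP BY warehouse_name
    ORDER BY dml_queries + long_running_queries + large_scan_queries DESC;

For step 3, Snowflake's EXPLAIN command generates the query plan for a candidate statement without executing it, for example:

    EXPLAIN USING TEXT
    SELECT customer_id, SUM(total_spend)
    FROM sales_fact
    GROUP BY customer_id
    ORDER BY 2 DESC;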

Gen2 migration script

You can enable Gen2 on new or existing Snowflake warehouses by running:

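    -- Create a new Gen2 warehouse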
    CREATE OR REPLACE WAREHOUSE new_gen2_warehouse_small
      RESOURCE_CONSTRAINT = STANDARD_GEN_2
      WAREHOUSE_SIZE = SMALL;

    -- Migrate an existing warehouse to Gen2
    ALTER WAREHOUSE existing_warehouse_xlarge
      SET RESOURCE_CONSTRAINT = STANDARD_GEN_2;

    -- Revert a Gen2 warehouse to Gen1
    ALTER WAREHOUSE existing_warehouse_xlarge
      SET RESOURCE_CONSTRAINT = STANDARD_GEN_1;

Post‑migration recommendations

After migration, it is important to continually analyze cost trends and runtime to decide if further migrations are needed. Slingshot can help you measure cost trends and even provide query-level cost breakdowns for more granular insights.

  • Compare cost trends: Track warehouse credit usage at least one week before and after migration, adjusting for query volume or operational changes (see the example query after this list).

  • Measure runtime changes: Use query‑ID fingerprints or tooling to validate reductions in execution time.

  • Refine decisions: Apply findings to determine if further migrations are warranted or if some workloads are better left on standard warehouses.
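
To compare cost trends concretely, one option is to pull daily credit consumption for a migrated warehouse from the metering history and compare the averages before and after the change. A minimal sketch, assuming access to SNOWFLAKE.ACCOUNT_USAGE.WAREHOUSE_METERING_HISTORY and using existing_warehouse_xlarge from the migration script as a placeholder name:

    -- Daily credit consumption for one warehouse over the last 14 days;
    -- compare the days before the ALTER WAREHOUSE change with the days after,
    -- adjusting for differences in query volume.
    SELECT DATE_TRUNC('day', start_time) AS usage_day,
           SUM(credits_used)             AS credits
    FROM snowflake.account_usage.warehouse_metering_history
    WHERE warehouse_name = 'EXISTING_WAREHOUSE_XLARGE'  -- placeholder warehouse name
      AND start_time >= DATEADD('day', -14, CURRENT_TIMESTAMP())
    GROUP BY usage_day
    ORDER BY usage_day;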

Conclusion

Snowflake’s Generation 2 Standard Warehouses provide optimized performance for DML‑intensive and scan‑heavy workloads. But how do you determine if migrating to Gen2 is right for you?

A targeted approach—identifying warehouses with the highest concentration of long‑running, DML‑heavy or large‑scan queries—can help ensure that the higher per‑hour credit consumption of Gen2 is balanced by meaningful runtime reductions to potentially reduce query costs compared to Gen1.

Slingshot can help you validate when to use Gen2 warehouses automatically. Convert your existing Gen1 warehouses to Gen2 with a click of a button, compare cost and performance before and after migration and revert to Gen1 in a click if the results are not favorable.

Interested in learning more? Book a time to see Slingshot in action.


Ganesh Bharathan, Director of Data Engineering, Capital One Software

Ganesh Bharathan is a Director of Data Engineering for Capital One Software, the enterprise B2B software business of Capital One. He leads a team of engineers responsible for managing the Snowflake platform and Slingshot innovations. Ganesh has a master's degree in Information Technology from George Mason University and more than 20 years of experience in database architecture and performance optimization. He is a Snowflake subject matter expert, holding several of Snowflake’s SnowPro Advanced certifications, including Architect, Administrator and Data Engineer.