Thursday, December 9, 2021

Day 4: Virtual Warehouse(Part-1)

Virtual Warehouses, often referred to simply as a “warehouse”, is a cluster of compute resources in Snowflake. A warehouse provides the required resources, such as CPU, memory, and temporary storage, to perform the following operations in a Snowflake session:

  1. Executing SQL SELECT statements that require compute resources (e.g. retrieving rows from tables and views).
  2. Performing DML operations, such as
    • Updating rows in tables (DELETE , INSERT , UPDATE).
    • Loading data into tables (COPY INTO <table>).
    • Unloading data from tables (COPY INTO <location>).


Overview of Virtual Warehouse:

Warehouses are required for queries, as well as all DML operations, including loading data into tables. A warehouse is defined by its size, as well as the other properties that can be set to help control and automate warehouse activity.Warehouses can be started and stopped at any time. They can also be resized at any time, even while running, to accommodate the need for more or less compute resources, based on the type of operations being performed by the warehouse.

Impact on Credit Usage and Billing

There is a doubling of credit usage as you increase in size to the next larger warehouse size for each full hour that the warehouse runs; however, note that Snowflake utilizes per-second billing (with a 60-second minimum each time the warehouse starts) so warehouses are billed only for the credits they actually consume.

Note: For a multi-cluster warehouse, the number of credits billed is calculated based on the multi-cluster warehouse size and the number of warehouses that run within the time period. For example, if a 3X-Large multi-cluster warehouse runs 1 warehouse for one full hour and then runs 2 warehouses for the next full hour, the total number of credits billed would be 192 (i.e. 64 + 128).

Impact on Data Loading

Increasing the size of a warehouse does not always improve data loading performance. Data loading performance is influenced more by the number of files being loaded (and the size of each file) than the size of the warehouse.

Tip: Unless you are bulk loading a large number of files concurrently (i.e. hundreds or thousands of files), a smaller warehouse (Small, Medium, Large) is generally sufficient. Using a larger warehouse (X-Large, 2X-Large, etc.) will consume more credits and may not result in any performance increase.

Impact on Query Processing

The size of a warehouse can impact the amount of time required to execute queries submitted to the warehouse, particularly for larger, more complex queries. In general, query performance scales with warehouse size because larger warehouses have more compute resources available to process queries.

If queries processed by a warehouse are running slowly, you can always resize the warehouse to provision more compute resources. The additional resources do not impact any queries that are already running, but once they are fully provisioned they become available for use by any queries that are queued or newly submitted.

Auto-suspension and Auto-resumption

A warehouse can be set to automatically resume or suspend, based on activity:

  • By default, auto-suspend is enabled. Snowflake automatically suspends the warehouse if it is inactive for the specified period of time.
  • By default, auto-resume is enabled. Snowflake automatically resumes the warehouse when any statement that requires a warehouse is submitted and the warehouse is the current warehouse for the session.

What is a Multi-cluster Warehouse?

By default, the size of a virtual warehouse determines the compute resources available to the warehouse for executing queries. Each warehouse is a set of compute resources. As queries are submitted to a warehouse, the warehouse allocates resources to each query and begins executing the queries. If sufficient resources are not available to execute all the queries submitted to the warehouse, Snowflake queues the additional queries until the necessary resources become available.

With multi-cluster warehouses, Snowflake supports allocating, either statically or dynamically, additional warehouses to make a larger pool of compute resources available. A multi-cluster warehouse is defined by specifying the following properties:

  • Maximum number of warehouses, greater than 1 (up to 10).
  • Minimum number of warehouses, equal to or less than the maximum (up to 10).
Additionally, multi-cluster warehouses support all the same properties and actions as single warehouses, including:
  • Specifying a warehouse size.
  • Resizing a warehouse at any time.
  • Auto-suspending a running warehouse due to inactivity; note that this does not apply to individual warehouses, but rather the entire multi-cluster warehouse.
  • Auto-resuming a suspended warehouse when new queries are submitted.

Maximized vs. Auto-scale

You can choose to run a multi-cluster warehouse in either of the following modes:

Maximized

This mode is enabled by specifying the same value for both maximum and minimum number of warehouses (note that the specified value must be larger than 1). In this mode, when the multi-cluster warehouse is started, Snowflake starts all the warehouses so that maximum resources are available while the multi-cluster warehouse is running.

Auto-scale

This mode is enabled by specifying different values for maximum and minimum number of warehouses. In this mode, Snowflake starts and stops warehouses as needed to dynamically manage the load on the multi-cluster warehouse:
As the number of concurrent user sessions and/or queries for the multi-cluster warehouse increases, and queries start to queue due to insufficient resources, Snowflake automatically starts additional warehouses, up to the maximum number defined for the multi-cluster warehouse. Similarly, as the load on the multi-cluster warehouse decreases, Snowflake automatically shuts down warehouses to reduce the number of running warehouses.

Benefits of Multi-cluster Warehouses

With a standard, single-cluster warehouse, if your user/query load increases to the point where you need more compute resources:
  1. You must either increase the size of the warehouse or start additional warehouses and explicitly redirect the additional users/queries to these warehouses.
  2. Then, when the resources are no longer needed, to conserve credits, you must manually downsize the larger warehouse or suspend the additional warehouses.

In contrast, a multi-cluster warehouse enables larger numbers of users to connect to the same size warehouse. In addition:
  • In Auto-scale mode, a multi-cluster warehouse eliminates the need for resizing the warehouse or starting and stopping additional warehouses to handle fluctuating workloads. Snowflake automatically starts and stops additional warehouses as needed.
  • In Maximized mode, you can control the capacity of the multi-cluster warehouse by increasing or decreasing the number of warehouses as needed.

No comments:

Post a Comment

Mom :Difficult Word but easy to pronunce

Hi bloggers ,I know this is not good post to read but i know you will all relate with this emotion. When you around with your Mom,you feel s...