Snowflake - Best Practices For Modern Data Warehouse

Posted by Ram Narasimhan on Apr 8, 2021 2:10:34 PM


Software as a Service (SaaS) adoption keeps expanding as more and more enterprises embrace it, and Snowflake's cloud platform emerged as one such SaaS data warehouse. It enables the storage and processing of data and offers analytical solutions that are flexible, fast, and easy to use. Unlike traditional data warehouses, Snowflake is not built on an existing database technology or big data software platform; instead, it combines a SQL query engine with a cloud-native architectural design. Its three-layered architecture consists of database storage, query processing, and cloud services. As a user of Snowflake, you therefore get the functionality of an enterprise analytic database along with special features and unique capabilities.

How to utilize Snowflake's capabilities? 

Because Snowflake departs from a traditional database design, its compute and storage tiers are separate, and both are highly elastic. The core of Snowflake is its cloud platform, so it can be scaled dynamically to meet the present workload, especially when raw compute power or concurrency needs fluctuate with demand, and storage can grow or shrink over time across tables, metadata, and databases. Capacity is virtually unlimited, so account administrators should choose the level of restrictions they want in place to defend against rogue usage.

The best practices to follow while using Snowflake's modern data warehousing platform are:

  • Enable data warehouses to auto-suspend

Virtual warehouses consume credits the entire time they are running, even when idle, so you should set them to auto-suspend. With this option enabled, a warehouse turns off automatically once it has finished processing queries. You can run a query to identify the warehouses that do not have auto-suspend turned on and configure them to suspend themselves after queries are done processing.
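As a sketch of such a query, the following uses Snowflake's `SHOW WAREHOUSES` / `RESULT_SCAN` pattern to surface warehouses without auto-suspend, then enables it; `etl_wh` is a hypothetical warehouse name:

```sql
-- List all warehouses, then filter the output of SHOW via RESULT_SCAN;
-- an empty/zero "auto_suspend" value means auto-suspend is disabled.
SHOW WAREHOUSES;
SELECT "name", "auto_suspend"
FROM TABLE(RESULT_SCAN(LAST_QUERY_ID()))
WHERE "auto_suspend" IS NULL OR "auto_suspend" = 0;

-- Suspend the warehouse after 60 seconds of inactivity
-- (etl_wh is a placeholder warehouse name).
ALTER WAREHOUSE etl_wh SET AUTO_SUSPEND = 60;
```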

  • Enable data warehouses to auto-resume

As a companion to the practice above, warehouses that auto-suspend should also be set to auto-resume. If auto-resume is not enabled, a suspended warehouse stays offline, and users' queries against it will fail until someone resumes it manually. Run a query to find out which warehouses do not have auto-resume enabled, then turn it on for each of them.
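A minimal sketch of that check, again using the `SHOW WAREHOUSES` / `RESULT_SCAN` pattern (`bi_wh` is a hypothetical warehouse name):

```sql
-- Find warehouses with auto-resume disabled; the "auto_resume"
-- column of SHOW WAREHOUSES is reported as text.
SHOW WAREHOUSES;
SELECT "name"
FROM TABLE(RESULT_SCAN(LAST_QUERY_ID()))
WHERE "auto_resume" = 'false';

-- Enable auto-resume so an incoming query restarts the warehouse.
ALTER WAREHOUSE bi_wh SET AUTO_RESUME = TRUE;
```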

  • Set appropriate timeouts for workloads

Every virtual warehouse should have an auto-suspend timeout appropriate to its particular workload. For warehouses running Tasks, data loading, and ETL/ELT jobs, set a timeout that suspends them almost immediately upon completion. For BI and SELECT-query warehouses, keep the suspension timeout at around 10 minutes so that data caches stay warm for frequent access by end users. For warehouses serving DevOps, DataOps, and data science work, a suspension timeout of about 5 minutes is reasonable, since ad hoc and highly unique queries benefit less from warm caches.
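The per-workload settings above can be sketched as follows; the warehouse names are placeholders and the values in seconds mirror the guidance in this section:

```sql
-- Hypothetical per-workload auto-suspend settings (in seconds).
ALTER WAREHOUSE etl_wh     SET AUTO_SUSPEND = 60;   -- loading/ETL/ELT: suspend almost immediately
ALTER WAREHOUSE bi_wh      SET AUTO_SUSPEND = 600;  -- BI/SELECT: ~10 minutes keeps caches warm
ALTER WAREHOUSE datasci_wh SET AUTO_SUSPEND = 300;  -- DevOps/DataOps/data science: ~5 minutes
```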

  • Set account statement timeouts

Snowflake's statement-timeout parameters (STATEMENT_TIMEOUT_IN_SECONDS and STATEMENT_QUEUED_TIMEOUT_IN_SECONDS) let you automatically cancel queries that run or queue for too long, whether because of a frozen cluster or user error. You can set these limits at the account, data warehouse, user, and session levels to match your strategy for long-running queries.
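A sketch of these settings using real Snowflake parameters; the specific limits, warehouse, and user names here are illustrative:

```sql
-- Cap runaway queries account-wide (values are illustrative).
ALTER ACCOUNT SET STATEMENT_TIMEOUT_IN_SECONDS = 7200;         -- cancel queries running > 2 hours
ALTER ACCOUNT SET STATEMENT_QUEUED_TIMEOUT_IN_SECONDS = 1800;  -- cancel queries queued > 30 minutes

-- The same parameters can be overridden per warehouse or per user.
ALTER WAREHOUSE bi_wh SET STATEMENT_TIMEOUT_IN_SECONDS = 1800;
ALTER USER analyst    SET STATEMENT_TIMEOUT_IN_SECONDS = 3600;
```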

  • Check weekly average credit usage

During lengthy projects, credit usage can be difficult to keep track of, especially when consumption exceeds what you budgeted. A useful method is to write a query that computes your seven-day average of credit usage and flags the weeks that exceed it. This can reveal which workloads are consuming the most time and compute without running efficiently.
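One possible shape for that query, built on the SNOWFLAKE.ACCOUNT_USAGE.WAREHOUSE_METERING_HISTORY view (the three-month lookback is an illustrative choice):

```sql
-- Weeks where a warehouse's credit usage exceeds its own weekly average.
-- ACCOUNT_USAGE data can lag live activity by up to a few hours.
WITH weekly AS (
  SELECT WAREHOUSE_NAME,
         DATE_TRUNC('week', START_TIME) AS usage_week,
         SUM(CREDITS_USED)              AS weekly_credits
  FROM SNOWFLAKE.ACCOUNT_USAGE.WAREHOUSE_METERING_HISTORY
  WHERE START_TIME >= DATEADD('month', -3, CURRENT_TIMESTAMP())
  GROUP BY 1, 2
)
SELECT *
FROM weekly
QUALIFY weekly_credits > AVG(weekly_credits) OVER (PARTITION BY WAREHOUSE_NAME)
ORDER BY WAREHOUSE_NAME, usage_week;
```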

  • Monitor data warehouses crossing the billing threshold

In a practice similar to the one above, you can design a query to look into your warehouses and identify services whose costs make up a high percentage of the workload. Snowflake bills cloud services usage only for the portion that exceeds 10% of an account's daily virtual warehouse credit consumption. A query that flags days rising above that threshold can help you keep cloud services consumption within the daily allowance.
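A sketch of that threshold check against the same metering view, comparing cloud services credits to 10% of compute credits per day:

```sql
-- Days where cloud services credits exceed 10% of warehouse compute
-- credits; the excess over 10% is what actually gets billed.
SELECT DATE_TRUNC('day', START_TIME)    AS usage_day,
       SUM(CREDITS_USED_COMPUTE)        AS compute_credits,
       SUM(CREDITS_USED_CLOUD_SERVICES) AS cloud_services_credits
FROM SNOWFLAKE.ACCOUNT_USAGE.WAREHOUSE_METERING_HISTORY
GROUP BY 1
HAVING SUM(CREDITS_USED_CLOUD_SERVICES) > SUM(CREDITS_USED_COMPUTE) * 0.1
ORDER BY 1;
```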

  • Remove unused tables in the warehouses

Unused tables are the best candidates to be dropped from the system. Before removing a table, make sure no one is querying it; checking every table before deleting it is good practice. Because this tip operates at the database level, look at tables across all your databases.
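One way to sketch that check is with the ACCOUNT_USAGE ACCESS_HISTORY view (which requires Enterprise edition); the 90-day window is an illustrative choice, and you should still verify each candidate before dropping it:

```sql
-- Active tables with no recorded reads in the last 90 days.
SELECT t.TABLE_CATALOG, t.TABLE_SCHEMA, t.TABLE_NAME
FROM SNOWFLAKE.ACCOUNT_USAGE.TABLES t
WHERE t.DELETED IS NULL
  AND NOT EXISTS (
    SELECT 1
    FROM SNOWFLAKE.ACCOUNT_USAGE.ACCESS_HISTORY a,
         LATERAL FLATTEN(input => a.BASE_OBJECTS_ACCESSED) f
    WHERE f.value:"objectName"::STRING =
          t.TABLE_CATALOG || '.' || t.TABLE_SCHEMA || '.' || t.TABLE_NAME
      AND a.QUERY_START_TIME >= DATEADD('day', -90, CURRENT_TIMESTAMP())
  );

-- After confirming a table is truly unused (hypothetical name):
-- DROP TABLE my_db.my_schema.stale_table;
```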

  • Drop users that are dormant or inactive

There will be two kinds of dormant users in your Snowflake ecosystem. The first type logs in often but performs no activity. The second type has never logged in to Snowflake at all. Both types of users can be dropped from Snowflake.
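A sketch of how to find such users via the ACCOUNT_USAGE USERS view; the 90-day cutoff and the user name are illustrative:

```sql
-- Users who have never logged in, or not in the last 90 days.
SELECT NAME, CREATED_ON, LAST_SUCCESS_LOGIN
FROM SNOWFLAKE.ACCOUNT_USAGE.USERS
WHERE DELETED_ON IS NULL
  AND (LAST_SUCCESS_LOGIN IS NULL
       OR LAST_SUCCESS_LOGIN < DATEADD('day', -90, CURRENT_TIMESTAMP()));

-- Remove a confirmed-dormant user (hypothetical name).
DROP USER dormant_user;
```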

  • Detect data warehouses having no resource monitors

Resource monitors are extremely useful tools for tracking and controlling workload budgets and avoiding unexpected resource spikes, and they can track both user and service-account usage in Snowflake. Your workloads should run on separate warehouses for ELT, BI, data science, reporting, and so on. A monitor's credit quota can be enforced on a daily, weekly, monthly, yearly, or custom billing interval. You can create a query to identify all the data warehouses that do not have a resource monitor.
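A sketch of that query using the same `SHOW WAREHOUSES` / `RESULT_SCAN` pattern as earlier; this assumes the `resource_monitor` column reads as the text 'null' when no monitor is attached:

```sql
-- Warehouses with no resource monitor attached.
SHOW WAREHOUSES;
SELECT "name"
FROM TABLE(RESULT_SCAN(LAST_QUERY_ID()))
WHERE "resource_monitor" = 'null';
```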

  • Apply resource monitors to all the data warehouses

Once you have found the data warehouses without a resource monitor, apply your resource-monitor policy through the web UI or SQL. You can then configure notifications for when consumption reaches a low or high threshold, and have the monitor suspend the data warehouse, or the whole account, when it detects consumption above your upper limit.
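The SQL route can be sketched as follows; the quota, monitor name, and warehouse name are illustrative:

```sql
-- Create a monthly 100-credit quota that notifies at 75% of the
-- quota and suspends the attached warehouse at 100%.
CREATE RESOURCE MONITOR monthly_bi_monitor
  WITH CREDIT_QUOTA = 100
       FREQUENCY = MONTHLY
       START_TIMESTAMP = IMMEDIATELY
  TRIGGERS ON 75 PERCENT DO NOTIFY
           ON 100 PERCENT DO SUSPEND;

-- Attach the monitor to a (hypothetical) warehouse.
ALTER WAREHOUSE bi_wh SET RESOURCE_MONITOR = monthly_bi_monitor;
```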

Snowflake is a unique, three-layered, modern data warehouse, and the practices above can help you make the most of its services. Experimenting and revisiting Snowflake's comprehensive documentation will surface the little tricks that make using Snowflake a better experience for you. Until then, the practices above cover the essential points for avoiding overspending and running your queries efficiently.

Topics: Big Data & Data Science