Azure Synapse Analytics is a scalable cloud-based data warehousing solution from Microsoft. It is also called the advanced version of Azure SQL data warehouse.
Along with all the features of a data warehouse, Azure Synapse offers you:
- ETL & Data Integration – Develop and schedule data pipelines with Azure Data Factory
- Data Analytics – Perform SQL data analytics and Bigdata Analytics with Spark
- Machine Learning – Develop AI & ML models on data
- Business intelligence – Build Interactive reports through PowerBI
- Choice of language - T-SQL, Python, Scala, SparkSQL, & .NET
Look at the below diagram and see how these services are connected and useful in the project.
This architecture diagram shows that –
- We can bring data from any relational or non-relational database. 90+ data sources are available.
- Raw Data can be store on Azure Blob storage/ADLS Gen2
- Store data in Azure SQL data warehouse
- Then you can perform data analytics through analysis services
- Then share this data with PowerBI for interactive reports.
- Data orchestration and scheduling will be done through the Data factory.
All these services are offered as a bundle with Azure Synapse Analytics. Furthermore, we can perform all these activities in 1 single tool called “Azure Synapse Studio.” This doesn’t mean that you will be charged for all services at all the time.
Azure offers you two types of cost models:
- On-demand - serverless model: – Provide resources only when you need them and discard them once done. You can scale up and down as per requirement. Also, you can save costs if you need it for a shorter duration and limited time.
- Pre-provisioned - dedicated model: – Resources will be pre-provisioned and dedicated to you. It will save you time to directly run your job as there is no need to provision and configure resources. Save cost if you need it for a longer duration.
Now let’s talk about the core service of Azure synapse, which is Data warehousing. This service is also offered as “Dedicated SQL pool (formerly SQL DW).”
Dedicated SQL pool (formerly SQL DW)
in Azure Synapse
Dedicated SQL pool (formerly SQL DW): - It separates the Storage and Computes layer of the SQL database, allowing each to scale independently.
As a result, you can scale compute resources up and down to meet performance demands; if you have more concurrent users or running complex queries, you can scale up.
This also offers you the flexibility to pause the compute resources if you are not using them and resume them whenever required. This will save huge costs as you will be charged only for data stored in the storage layer.
Cost Model for Dedicated SQL pool (formerly SQL DW):
Storage resource: - Data storage will be charged in 1 TB allocation. If your data grow beyond 1TB, it will automatically scale to 2 TB.
Currently, it costs around 1520 INR per TB data for 1 Month (Subject to change as per region and Microsoft policies)
Compute resource: - Compute cost and performance are measured in DWU (Data warehouse units). It offers you to choose your compute requirement from “DW100c - DW3000c”.
Currently, DW100c will cost you around 80-100 INR per hour (Subject to change as per region and Microsoft policies)
In the above diagram, you can see other types of pools; let’s take a quick look:
Serverless SQL Pool: - You can use this for analysis when your data resides at some other storage layer, and you do not want a dedicated SQL database to store data. You can run the queries on demand, and it will charge you based on the data processed.
Currently, it charges around 330 INR per TB data processing (Subject to change as per region and Microsoft policies)
Apache Spark Pool: - To run spark jobs and perform big data analytics. The cost will be as per the vCore configuration you choose to run your spark jobs.
Massively parallel processing (MPP): - One of the best features of Azure synapse Data warehousing is the support of MPP database technology, which allows it to manage analytical workloads and process large volumes of data efficiently. In contrast to transactional databases, which store rows in a table as an object, MPP databases store each column as an object. MPP databases also distribute data across many nodes that operate in parallel to process different portions of queries. This database architecture facilitates complex, long-running analytical processes.
How to choose the best Data warehousing solution:
Azure Synapse vs. Azure SQL Database
We recommend the below factors to consider while choosing the data warehousing solution for your project requirement.
Database Size: - For smaller data sizes less than 1TB, which can grow up to 8 TB, consider Azure SQL db. If your requirement is min 1TB and can grow up to 100-200TB, then go for Azura Synapse.
Workload Type: - Azure synapse is best for analytical processing (OLAP) as it supports MPP and distributes the load into multiple nodes to process data. If your requirement is transactional processing (OLTP) or reading and writing a small amount of data, then consider Azure SQL db.
PolyBase: - Azure Synapse supports the PolyBase feature, which allows access to query and import external data using T-SQL. This is quite useful while importing data from Azure Blob storage or Data lake. Azure SQL database doesn’t support PolyBase at this stage.
Conclusion: - Let me conclude and suggest to you from my practical experience with Azure Synapse analytics and different data warehousing solutions.
- Azure SQL database for data warehousing is better to choose for smaller data size, simple query processing, and OLTP requirement. It provides ease of maintenance and predictable cost.
- Azure Synapse SQL Pool for data warehousing is better for large datasets, complex query processing, and OLAP requirements. If you know that your data size will grow or you will have more analytical jobs or data processing requirements in the future, then this is best. Also, you can start with a lower configuration and scale up based on demand.
- Azure Synapse Analytics is the complete package for data ingestion, analysis, modelling, orchestration, and business intelligence. If you are using multiple services of azure separately and looking for managed services with all these features, then go for complete Azure synapse Analytics.
- If your requirement is only data warehousing, then you can choose the data warehousing service of Azure synapse (Dedicated SQL pool (formerly SQL DW)) separately without looking for a complete Azure synapse Analytics solution.