Data engineering is one of the most in-demand and high-paying tech jobs nowadays, and, no wonder why; the knowledge a data engineer needs to acquire in order to be proficient can be quite overwhelming. It is not only the general software development and best practices, but also the know-how of the different products and services the cloud vendors offer.
In this 2-part blog series, you will learn some of the top services used by data engineers in one of the main cloud-computing platforms: Microsoft Azure . In this first part of the series, we will go through the main characteristics of relational-database services , and in part 2, we will review services focused on data storage, integration and analytics .
Relational-database services Within this category of services, there are two subcategories: one is focused to support Microsoft SQL Server databases* and the other to support open-source databases.
* Microsoft uses the term AZ SQL throughout its documentation to refer to this family of database services in Azure
Microsoft Azure relational-database services Microsoft SQL Server based database services SQL Server on Azure Virtual Machines (VMs) This is an infrastructure-as-a-service (IaaS) option. It is basically a Virtual Machine in Azure , which allows you to install a Microsoft SQL Server instance. If you need to have as much control as possible over your database server, this is the way to go. You can set up the Virtual Machine based on the Microsoft SQL Server version’s hardware requirements that you will install on it. This makes it a good option for a lift-and-shift migration of existing on-premises Microsoft SQL Server installations to the cloud. Just keep in mind that with great power, comes great responsibility. This means that you take full responsibility for things such as the operating system, licenses, updates, patching , and network controls, among others.
In preparation for leveraging Azure VMs effectively, you can benefit from obtaining relevant certifications. In fact, there are platforms like ExamSnap where you can learn how to manage Azure services proficiently. Also, it gives you the skills necessary for a successful implementation.
Azure SQL Managed Instance (MI) This is a platform-as-a-service (PaaS) option. You have less control over the service and also less responsibility since this is a fully managed platform and also an evergreen one, which means that Microsoft is constantly applying small and incremental updates to it without impacting its availability. You do not need to worry about planning the database server updates as you would with the SQL Server on Azure Virtual Machines option.
Something to mention here is that it is not the same as an on-premises Microsoft SQL Server Instance. Nevertheless, it offers near-100% compatibility, which makes it another good option for a lift-and-shift migration. It is always recommended to check the differences between an on-premises Microsoft SQL Server instance and an AZ SQL Managed Instance .
As expected due to its PaaS nature, this service offers fully automated updates, backups, and recovery.
Azure SQL Database This is another platform-as-a-service (PaaS) option. It provides the latest features of Microsoft SQL Server Engine. This is so to the point that they release new features first to Azure SQL Database and then to SQL Server itself. If you’re looking to always have the latest capabilities of Microsoft SQL Server Engine, this is the option to choose.
For Azure SQL Database, we have two deployment models to choose from, depending on our needs: Single Database and Elastic Pool. Both of them offer fully automated updates, backups, and recovery.
Single Database This is a fully managed (via a logical server ) and isolated database, similar to a contained database in the Microsoft on-premises world. If you are not familiar with this type of database, it means that they are isolated from other databases and from the instance of SQL Server that hosts the database. For instance, much of the database metadata is stored in the database itself and not in the master database. Due to this isolation characteristic, any instance-level access or features are not available.
This database has its own set of resources, and they can be dynamically scaled up or down, depending on your needs. The Serverless feature is also available in case you would like Azure to automatically scale the resources as required.
The service is designed to quickly set up a database in the cloud and have it available as soon as possible without spending too much time on configurations.
Elastic Pool If you have multiple databases with unpredictable and variable workloads, this is your best choice. This is a pool that allows you to have between 1 and up to 5,000 databases per server, all sharing a set number of resources (CPU, memory, etc.). The main benefit of the service is the cost-benefit it offers. The resources are assured and available to your databases when they need them. This helps to avoid over provisioning your database server to respond appropriately to the peak periods of usage. Also, if you need one of your databases within the pool to have its own resources, it can be removed and converted into a Single Database with minimum downtime (on the order of seconds).
Regarding the flexibility of the service, you can allocate or deallocate resources at any time, and even set the minimum and maximum resources for the databases.
Azure SQL Edge This is a database service oriented to IoT applications , and supports the streaming, processing and analysis of different types of data, such as relational, non-relational, graph and time-series one.
One great feature is the T-SQL support, which will help you cut down the learning curve when you start developing your new Azure SQL Edge-based application.
Before reviewing the open-source database services, below you can find a use-case comparison table to help you decide when to use one over the other:
AZ SQL services - common use cases comparison Open-source database services Azure Database for MySQL This is a platform-as-a-service (PaaS) option based on the MySQL community edition. It offers two deployment modes: Flexible Server and Single Server . Both provide some level of flexibility. However, the Flexible Server allows you to have more control over the database management functions and configuration settings, it runs on a Linux-based server and supports MySQL 5.7 and 8.0; the Single Server is designed for minimal customization, and it runs on a Windows-based server.
Both deployment modes offer high availability, automatic backups, high performance, enterprise-level security, and elastic scaling within seconds, among other great features for different needs. Yet, Microsoft has decided to focus its development efforts on the Flexible Server service, as the Single Server deployment mode is about to be retired.
Azure Database for MariaDB This is a platform-as-a-service (PaaS) option that currently supports two specific versions of the MariaDB community edition: 10.2 and 10.3. Please note that Microsoft manages for you only the patches updates. Major and Minor* updates are not supported.
The service is very similar to the Single Server deployment mode of MySQL in the sense that it aims to free up developers from having to deal with too many database administrative tasks . It requires minimum administration, and you can start with the basic service tier to try out its capabilities and adjust it based on the workloads thanks to its dynamic scaling feature.
* MariaDB follows the version naming convention X.Y.Z. X is the major version, Y is the minor version, and Z is the patch version.
Azure Database for PostgreSQL This is a platform-as-a-service (PaaS) option that offers two deployment modes just like Azure Database for MySQL: Single Server and Flexible Server . The first one supports versions 10 and 11, and the second one supports versions 11, 12, 13, and 14. Single Server mode also works with Windows as its underlying operating system, and Flexible Server with Linux. As with the other open-source options, the latter provides maximum flexibility and control over the databases.
Last, but not least, here it is a use-case comparison table of the open-source database services:
Open-Source database services in Azure - common use cases comparison Conclusion Microsoft Azure has a good offering of services to support relational databases. Most of it is in the form of PaaS, aiming to free up companies from having too many database administration tasks, and to allow them to focus on what really matters to them and generate value for their businesses.
Open-source based services could seem limited, but they will probably increase, and provide better features due to the increasing adoption of open-source systems in the IT industry.