Demystifying Database Options in AWS

Written by Anwar Haq

In today's digital landscape, data isn't just growing; it's exploding in both volume and complexity. For organisations, this exponential growth brings a fundamental dilemma - choose the most suitable database management system for their data. Traditionally what many organisations have done is to pick a relational database management system and use it to store diverse types of data; be that financial records, application logs, XML documents, customer communications, or large binary objects (BLOBs). Primarily designed for structured data and transactional consistency, traditional relational databases can buckle under the pressure of scale, leading to issues like poor performance, low availability, management overhead, and high licensing and maintenance costs.

To overcome these problems businesses look for alternatives like NoSQL databases, data lakes, and data warehouses. This quest for options beyond relational databases has limitations, particularly smaller business. Firstly, there is often no in-house technical expertise to install, configure and manage various database systems. Secondly, investing and maintaining new equipment is expensive, and this is where cloud service providers like AWS, Microsoft Azure, and Google Cloud, among others, come in. This is precisely why cloud service providers like AWS, Azure, and Google Cloud among others. They enable businesses to explore state-of-the-art technologies without having to buy costly equipment, or needing in-depth expertise to install, configure and manage systems.

AWS is currently the leading provider, offering over 200 cloud services which include several services specifically for database management. In this article we will discuss different database management services that AWS offer and how to choose the right service for a use case. We will also look at some examples of how businesses have successfully used these services. But before we go into details of each service let’s step back and establish a foundational understanding of the different types of database systems available.

This article provides an overview of Amazon Web Services (AWS) offerings for database management, structured into two distinct parts:.

Part I - Relational (SQL) Databases on AWS: The first section will begin with a brief discussion on the fundamental types of database management systems. Following this, we will delve into the relational, or SQL, database offerings available within AWS. This will cover services designed for structured data that adheres to a predefined schema and leverages SQL for querying.
Part II - NoSQL Databases on AWS: The second part of the article will shift focus to NoSQL database management systems provided by AWS. This section will explore the diverse range of non-relational database services, designed to handle unstructured, semi-structured, and rapidly evolving data, often without strict schema requirements.

Types of database management systems

Database management systems (DBMS) are broadly categorized into two primary types: SQL (Relational Databases)and NoSQL (Non-Relational Databases). SQL is short for Structured Query Language, which is widely used for querying highly structured data in relational databases. NoSQL is short for Not Only SQL, which is a term used for database management systems that support all types of data including structured, semi-structured and non-structured data.

Let's take a look at key differences between SQL and NoSQL databases.

SQL

NoSQL

Type of data

Used to store structured data with well defined schema, which is stored as rows and columns.

Generally no well defined schema where data is stored in key-value pairs.

Normalisation

Data is stored in normalised tables (often to 3rd normal form).

Data is stored denormalised.

Access patterns

Access patterns do not need to be defined in advance. Relationship between different sets of data are enforced through referential constraints. Data can be retrieved from different related tables through complex joins.

Access patterns need to be defined in advance. Data across collections/tables is typically not joined in queries. Queries are generally key searches from a particular table/collection. Relationship between different sets of data is maintained through embedding.

Consistency

Data is strongly consistent.

Data is generally eventually consistent, by default.

Scaling

Systems typically scale vertically. If the system does not have enough resources to cope with the workload, we get a bigger box with more CPU/memory. Scaling is limited to the biggest box available.

Systems scale horizontally. If the system is unable to cope, we add another box to the cluster. Scaling in many cases is virtually limitless.

Sharding

Typically do not have built-in data sharding mechanism. Sharding requires technical effort both on database as well as application side.

Sharding is in built in the systems, which is transparent to the applications.

Query language

Data is queried using SQL.

Data is queried using various APIs and query languages like MQL, CQL, PartiQL etc. SQL is also supported in some systems as NoSQL databases support both structured and non/semi-structed data.

Typical use cases for SQL databases are online transaction processing, finance, ERP, and data warehouses. Examples of relational databases include PostgreSQL, MySQL, SQL Server, and Oracle. Typical use cases for NoSQL databases are content management, key-value stores, OLTP at scale, social media and messaging apps, gaming application, IOT applications, real time analytics etc. Examples of non-relational databases include MongoDB, Cassandra, Neo4J, and Redis.

Database management in AWS

When deploying databases on AWS, it's vital to grasp the Shared Responsibility Model. This framework clearly delineates what AWS secures versus what customers secure. AWS handles the security of the cloud, meaning the global infrastructure that runs all services. Customers, conversely, are responsible for the security in the cloud, which encompasses their data, applications, and network configurations. This model applies across all AWS offerings, including database services. However, the specific customer responsibilities shift based on the particular database service category we select, and we can generally organize these into four distinct types.

Serverless: Requires least amount of management by users.
AWS Managed: Customer is responsible for things like provisioning and networking of resources.
Self Managed: Customer is responsible for most of the management. Similar to a database on a VM on-prem.
RDS Custom: This is a hybrid of AWS managed and self managed services. Customer has more control and management responsibilities than AWS managed, but less than self managed. This is available only available for RDS Oracle and RDS SQL Server.

Following table shows how responsibilities are shared for each type of option.

Feature

Self Managed

AWS Managed

Custom

Serverless

Application optimization

Customer

Network, Firewall configuration

Customer

AWS

Scaling

Customer

Shared

AWS

High availability

Customer

AWS

Shared*

AWS

Database backups

Customer

AWS

Shared*

AWS

Database software patching

Customer

AWS

Shared

AWS

Database software install

Customer

AWS

Shared

AWS

OS patching

Customer

AWS

Shared*

AWS

OS installation

Customer

AWS

Shared*

AWS

Server maintenance

AWS

* There are some differences between RDS for Oracle and SQL Server.

AWS services for database management

AWS, as we have discussed above, delivers a vast portfolio of services, including many for managing databases. A significant benefit of these database services is their inherent ability to integrate with other AWS services. This allows us to readily harness existing capabilities like encryption, monitoring, backups, and identity and access management. AWS database services, in particular, integrate easily with the following native AWS services:

Key Management System (KMS) for encryption of data at rest
CloudWatch for monitoring and observation
CloudFormation for deployment of infrastructure through code (IaC)
Identity and Access Management (IAM) for access control
AWS Backup for automatic and centralised backup of databases
Secrets Manager for securely storing credentials
Simple Storage Service (S3) for storing backups
CloudTrail for auditing, compliance and security

In following sections we will briefly take a look at features, use cases and limitation of different services in AWS for database management.

SQL

For over 40 years, SQL, or relational database management systems (RDBMS), have been the go-to choice for storing application data. They've consistently met the diverse needs of businesses, regardless of size. Over time, these database engines have evolved, introducing powerful features that enhance management, boost performance, streamline operations, and improve both availability and security. Because of their long-standing popularity, expertise in relational databases is widespread and easily accessible.

Relational databases remain an excellent choice, particularly when:

Access patterns can't be defined in advance.
Data to be stored is highly structured.
Strong consistency is always required for data access.

Relational Database Service (RDS)

RDS is a fully managed AWS service for relational databases. RDS supports following database engines:

PostgreSQL
MySQL
MariaDB
SQL Server
Oracle
DB2

Amazon RDS application architecture (source)

While standard Amazon RDS simplifies database management, it doesn't provide direct access to the operating system or full sysadmin/DBA privileges. For scenarios requiring OS-level access (e.g., to install custom software) or comprehensive database control, Amazon RDS Custom is available. Currently, RDS Custom supports only Oracle and SQL Server databases.

Important Features

Amazon RDS offers several key features that simplify database management and enhance reliability:

Automated Maintenance: AWS automatically handles maintenance for RDS databases, removing the need for manual patching. We can opt for automatic minor version upgrades, which AWS will perform during the designated maintenance window. For major version upgrades, AWS still manages the process, but we'll need to initiate them manually, giving us precise control over these significant changes.
Automated Backups: The automated backup feature ensures our databases are regularly protected. This includes daily full snapshots and continuous transaction log backups, providing comprehensive data recovery options.
Point-in-Time Recovery (PITR): RDS supports point-in-time recovery, allowing us to restore the database to any specific moment within a retention period of up to 35 days.
Performance Insights: This feature helps us diagnose performance issues in our database instances by providing detailed insights into database load and resource consumption. (Note: Performance Insights is not available for RDS Custom instances).
Storage Autoscaling: We can configure RDS instances to automatically scale storage capacity as the database grows. This prevents outages due to full storage. (Note: Storage autoscaling has limited support in RDS Custom instances).

Use Cases

Homogeneous Migration: For organizations utilizing an on-premises relational database engine, a homogeneous migration to a compatible Amazon RDS database can substantially decrease operational management overhead. This transition is also facilitated by the growing support for Amazon RDS among various commercial off-the-shelf (COTS) applications.
Heterogeneous Migration: Faced with the substantial licensing costs of commercial databases like Oracle and SQL Server, consider a heterogeneous migration to more cost-effective options like Amazon RDS for PostgreSQL, MySQL, or MariaDB. AWS streamlines this process by offering the Schema Conversion Tool (SCT), specifically designed to help convert database schema and application code from one engine to another.
OLTP: SQL databases are the right choice for most OLTP workloads like ERP, CRM, health records etc.

Who uses RDS: AWS RDS is a very popular service. Its customer include big companies like Toyota, Atlassian, SalesForce, and AT&T.

Limitations to Consider

While Amazon RDS offers significant advantages, it's important to be aware of its limitations:

No Operating System (OS) Access: RDS is a managed service, meaning we cannot directly access the underlying operating system. This prevents us from choosing a specific OS or installing any custom software, which impacts features like SQL Server's CLR (Common Language Runtime) that require OS-level dependencies.
Restricted Database Access: We do not receive full sysadmin (SQL Server) or SYSDBA (Oracle) privileges. This means applications or tasks that depend on these elevated administrative permissions will not function correctly with standard RDS.
Limited Database Versions: We can only select from the database engine versions officially supported by RDS. This might not be suitable for legacy applications that rely on older, unsupported database versions.
Maximum Storage Size: Depending on the database engine, the maximum storage capacity is limited to either 64 TiB or 128 TiB.

Aurora

While Amazon Aurora is technically a part of RDS, its significant architectural differences make it a highly appealing database choice. Aurora offers both fully managed and serverless deployment options, and it is compatible with these database engines:

PostgreSQL
MySQL

Important Features

As a member of the Amazon RDS family, Aurora inherits all the features available to standard RDS databases. However, it distinguishes itself with the following crucial enhancements:

Advanced Storage Architecture: Aurora's most significant feature is its highly resilient storage. For every Aurora cluster, six copies of the data are synchronously stored across three distinct Availability Zones (AZs). This design ensures exceptional data availability and fault tolerance, significantly reducing the risk of data loss.
Aurora Global Database: This feature streamlines disaster recovery by enabling us to set up Aurora clusters that replicate data across multiple AWS Regions. This provides rapid recovery capabilities and low-latency access for geographically dispersed applications.
Read Replica Autoscaling: Aurora allows us to define autoscaling rules to automatically add or remove read replicas based on the workload demands or a predefined schedule. An Aurora cluster can support up to 15 read replicas, efficiently distributing read traffic.
Zero-ETL Integration with Redshift: This powerful integration allows data to be loaded from Aurora into Amazon Redshift in near real-time. This simplifies and accelerates large-scale analytical processing, enabling us to gain insights from the operational data almost instantly.
Aurora Limitless: (Currently available in select regions) Aurora Limitless extends scaling capabilities beyond traditional throughput and storage limits by distributing the data horizontally across multiple shards. This is designed for applications with extremely high transaction volumes and massive datasets.

Aurora architecture (source)

Use cases

Use cases mentioned under the RDS also apply to Aurora. Additional use case are listed below.

Cost Saving: Many commercial applications support multiple database engines including Aurora PostgreSQL and MySQL, which makes it easy to migrate off expensive commercial databases.
Migration from SQL Server: Babelfish is an open source tool that helps us run TSQL queries against PostgreSQL as if it were a SQL Server database. This makes it easy to migrate from SQL Server to Aurora PostgreSQL.
Variable Workloads: For variable and dev/test workloads, Aurora serverless can help reduce the compute cost. Aurora serverless can scale to 0.
Limitless Storage: Aurora limitless is a PostgreSQL compatible implementation of Aurora. It allows us to store virtually unlimited amount of data in the cluster.

Who uses Aurora: Aurora has a large user base including organisations like UNO, Amdocs, and VmWare.

Limitations to Consider

While Amazon Aurora offers high performance and scalability, it's important to be aware of its specific limitations:

No Operating System (OS) Access: Like standard Amazon RDS, Aurora is a fully managed service, meaning we cannot directly access the underlying operating system. This restricts our ability to install custom software or perform OS-level configurations.
Limited Database Versions: We can only select from the database engine versions officially supported by Aurora. This might not suit applications dependent on specific, unsupported older or very new database versions.
PostgreSQL Extension Limitations: While Aurora PostgreSQL supports most common extensions, it has limitations. For instance, it supports only a limited number of Foreign Data Wrappers (FDWs), which might impact integrations requiring specific external data sources.
Maximum Storage Size: The storage size for most Aurora clusters (excluding Aurora Limitless) is capped at 128 TiB.

Redshift

Amazon Redshift is a fully managed, PostgreSQL-compatible service built for data warehousing. It allows us to store and analyze petabytes of data, leveraging a massively parallel processing (MPP) architecture for superior performance at a fraction of the cost of traditional commercial data warehouses. Redshift offers both provisioned (managed) and serverless deployment options.

Open image-20250406-005306.png

Redshift architecture (source)

Important Features

Amazon Redshift offers several key features that make it a powerful choice for data warehousing and analytics:

Massively Parallel Processing (MPP): Redshift's core strength lies in its MPP architecture. This allows it to process vast datasets by distributing the workload across many compute nodes (up to 128), enabling efficient parallel processing and faster query execution.
Zero-ETL Integrations: With zero-ETL integrations, we can load data into Redshift in near real-time from various sources, such as Amazon RDS, Aurora, DynamoDB, and even Salesforce, without writing any custom code. This streamlines data ingestion for analytics.
Workload Management (WLM) and Concurrency Scaling: Redshift's WLM feature enables us to assign different workloads to dedicated queues, preventing short, interactive queries from being held up by long-running, resource-intensive ones. Furthermore, Concurrency Scaling automatically adds additional compute resources during periods of high usage, ensuring consistent performance even under peak loads.
Redshift Spectrum: This capability allows us to directly query structured and semi-structured data stored in Amazon S3 buckets without first loading it into our Redshift cluster. Since S3 storage is significantly more cost-effective for large volumes of data, this feature helps us save costs by storing infrequently accessed historical data in S3 while still making it queryable.
Secure Data Sharing: Amazon Redshift allows us to securely share data with other Redshift clusters and various AWS services without the need to copy the data. For users accessing the shared data, it simply appears as another table, simplifying data access and collaboration.

Use Cases

Following are some important use cases for Redshift.

Massively Parallel Query: Because of its massively parallel query feature and petabyte scale storage size Redshift is suitable for any Data Warehousing workload.
Streaming Data Analytics: Redshift is integrated with Amazon Kinesis. Amazon Kinesis Data Firehose can deliver realtime streaming data to Redshift tables, making it a good option for streaming data analytics.
Machine Learning: Redshift is a good option to store data from machine learning because of Redshift ML. With Amazon Redshift ML, we can use data in Redshift cluster to train models with Amazon SageMaker AI.
Data Sharing: We can use Redshift for data analytics as service using its data sharing feature.

Who uses Redshift: Numerous companies including Merck, Careem, and Pizza Hut.

Limitations to Consider

Not suitable for OLTP: Amazon Redshift is not suitable for OLTP workloads. Consider RDS or NoSQL database services instead.

In Part II of this article we will take a look at non-relational database options in AWS.

Demystifying Database Options in AWS - Part I

SQL

Relational Database Service (RDS)

Aurora

Redshift