AWS RDS
Managed relational database service. Handles provisioning, patching, backups, and failover. Supports PostgreSQL, MySQL, MariaDB, Oracle, SQL Server, and Aurora.
Core Concepts
| Concept | Description |
|---|---|
| DB Instance | Isolated database environment. Runs one DB engine. |
| DB Instance Class | CPU/memory size (db.t3.micro → db.r6g.16xlarge) |
| Storage Type | gp2 (SSD), gp3 (better SSD), io1 (provisioned IOPS for high perf) |
| Multi-AZ | Synchronous standby in different AZ — automatic failover |
| Read Replica | Async copy for read scaling. Can be in different region. |
| Parameter Group | DB engine configuration (e.g., max_connections, work_mem) |
| Option Group | Optional engine features (MySQL, MariaDB, Oracle, SQL Server; not used by PostgreSQL) |
Multi-AZ vs Read Replicas
| | Multi-AZ | Read Replica |
|---|---|---|
| Purpose | HA + failover | Read scaling |
| Replication | Synchronous (no lag) | Asynchronous (some lag) |
| Failover | Automatic (~60s) | Manual promotion |
| Can serve reads? | No (standby is passive) | Yes |
| Cross-region? | No (same region, different AZ) | Yes |
| Cost | 2x instance cost | Additional instance cost |
Use Multi-AZ for production availability. Use Read Replicas to offload read traffic.
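Using both together usually means the application decides per query where to connect. A minimal sketch of that routing, assuming hypothetical endpoint names and a lag value the caller has already obtained (for example from the replica lag metric); it is an illustration, not a prescribed pattern:

```python
import random
from dataclasses import dataclass


@dataclass
class Replica:
    endpoint: str        # hypothetical replica hostname
    lag_seconds: float   # assumed to be supplied by the caller's monitoring


def pick_endpoint(is_write: bool, primary: str, replicas: list,
                  max_lag_seconds: float = 10.0) -> str:
    """Send writes to the primary; send reads to a replica that is not too stale."""
    if is_write:
        return primary
    fresh = [r for r in replicas if r.lag_seconds <= max_lag_seconds]
    if not fresh:
        # No sufficiently fresh replica: fall back to the primary
        # rather than serve stale reads.
        return primary
    return random.choice(fresh).endpoint
```

The Multi-AZ standby never appears in this list: it is reached only through the primary endpoint after a failover.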
Automated Backups vs Snapshots
| | Automated Backup | Manual Snapshot |
|---|---|---|
| Retention | 1–35 days (configurable) | Indefinite |
| Granularity | Point-in-time (any moment in the retention window, up to roughly the last 5 minutes) | Specific moment |
| Cost | Free up to DB size | S3 storage cost |
| Deleted with DB? | Yes (unless retain specified) | No |
Restore: Creates a NEW DB instance with a new endpoint (not in-place). Applications must be repointed, for example by updating a DNS record or connection string.
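Because a restore always targets a new instance, it helps to validate the request before calling the API. The sketch below builds the keyword arguments for boto3's `restore_db_instance_to_point_in_time`; the helper name and the instance identifiers are hypothetical, and the window check is an approximation (the authoritative upper bound is the instance's latest restorable time as reported by RDS):

```python
from datetime import datetime, timedelta, timezone
from typing import Optional


def build_pitr_request(source_id: str, target_id: str,
                       restore_time: Optional[datetime],
                       now: datetime, retention_days: int) -> dict:
    """Build kwargs for rds.restore_db_instance_to_point_in_time (hypothetical helper)."""
    if source_id == target_id:
        raise ValueError("restore creates a new instance; target must differ from source")
    if not 1 <= retention_days <= 35:
        raise ValueError("automated backup retention must be 1-35 days")

    request = {
        "SourceDBInstanceIdentifier": source_id,
        "TargetDBInstanceIdentifier": target_id,
    }
    if restore_time is None:
        # Restore to the most recent point RDS can offer.
        request["UseLatestRestorableTime"] = True
        return request

    earliest = now - timedelta(days=retention_days)
    if not earliest <= restore_time <= now:
        raise ValueError("restore_time is outside the retention window")
    request["RestoreTime"] = restore_time
    return request
```

The resulting dict would be passed as `client.restore_db_instance_to_point_in_time(**request)`, after which the application still has to be pointed at the new endpoint.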
Connection Management
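The default max_connections is derived from the instance's memory, so it grows with instance size (it can be overridden in the parameter group, but memory still limits what is practical). Common issue: connection exhaustion.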
db.t3.micro: ~85 max connections
db.r6g.large: ~1,700 max connections (PostgreSQL default for 16 GiB of memory)
Solutions:
- RDS Proxy: Connection pooler in front of RDS. Reduces DB connections by pooling app connections. Integrates with IAM auth and Secrets Manager.
- PgBouncer / pgpool: Self-managed connection pooler (for PostgreSQL)
- Right-size your instance class
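To see why exhaustion happens, it can help to estimate the connection budget and divide it across application instances. The sketch below assumes the RDS PostgreSQL default of roughly `LEAST(DBInstanceClassMemory / 9531392, 5000)`; treat that formula, the reserved-connection count, and the headroom factor as assumptions to verify against your own parameter group. Real defaults come out lower than this estimate because `DBInstanceClassMemory` excludes memory reserved for the OS and RDS processes.

```python
def default_max_connections(instance_memory_bytes: int,
                            bytes_per_connection: int = 9_531_392,
                            cap: int = 5000) -> int:
    """Estimate the default max_connections from memory (assumed formula)."""
    return min(instance_memory_bytes // bytes_per_connection, cap)


def pool_size_per_app_instance(max_connections: int, app_instances: int,
                               reserved: int = 10, headroom: float = 0.8) -> int:
    """Split the usable connection budget evenly across app instances.

    `reserved` keeps a few slots for admin and monitoring sessions, and
    `headroom` leaves slack for deploys; both values are illustrative.
    """
    usable = int((max_connections - reserved) * headroom)
    # Always allow at least one connection per instance.
    return max(usable // app_instances, 1)
```

For a 1 GiB instance the estimate is 112 connections, so four app instances would each get a pool of about 20; anything beyond that is a case for a pooler or a larger instance class.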
RDS Proxy
App instances → RDS Proxy → RDS Instance
(thousands of connections) (pool of connections) (max ~500 connections)
Benefits:
- Reduces failover time (proxy maintains pool during Multi-AZ failover — apps don't reconnect)
- IAM authentication support
- Secrets Manager integration
- Useful for Lambda → RDS (Lambda spins up thousands of short-lived functions)
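On the Lambda side, the proxy works best when each execution environment reuses one connection across warm invocations instead of opening a new one per request. A minimal sketch of that pattern follows; `FakeConnection`, `connect_to_proxy`, and the endpoint string are hypothetical stand-ins so the example runs without a database, and real code would call a driver's connect function against the proxy endpoint instead:

```python
import itertools

_ids = itertools.count(1)

# Hypothetical proxy endpoint, for illustration only.
PROXY_ENDPOINT = "my-proxy.proxy-example.us-east-1.rds.amazonaws.com"


class FakeConnection:
    """Stand-in for a real driver connection."""

    def __init__(self, host: str):
        self.host = host
        self.id = next(_ids)
        self.closed = False


def connect_to_proxy() -> FakeConnection:
    # Real code would open a driver connection to PROXY_ENDPOINT here.
    return FakeConnection(PROXY_ENDPOINT)


# Module-level state survives between warm invocations of the same
# execution environment, so the connection is opened once and reused.
_connection = None


def get_connection() -> FakeConnection:
    global _connection
    if _connection is None or _connection.closed:
        _connection = connect_to_proxy()
    return _connection


def handler(event, context):
    conn = get_connection()
    return {"connection_id": conn.id, "host": conn.host}
```

Each concurrent execution environment still holds its own connection, which is why the proxy's pooling matters once concurrency climbs into the thousands.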
Aurora — AWS-optimized RDS
Aurora is AWS's cloud-native relational DB (compatible with MySQL/PostgreSQL).
| Feature | RDS PostgreSQL | Aurora PostgreSQL |
|---|---|---|
| Storage scaling | Manual (or opt-in storage autoscaling up to a set limit) | Auto-scales 10GB–128TB |
| Replicas | Up to 5 | Up to 15 |
| Replication | Async | < 10ms lag |
| Failover | ~60s | < 30s |
| Cost | Baseline | ~20-30% more, but cheaper at scale |
| Serverless | No | Aurora Serverless v2 (auto-scale capacity) |
Aurora Global Database: Primary region + up to 5 read-only secondary regions. < 1s replication lag globally.
Security
VPC → Private Subnets → Security Group → RDS Instance
- Always deploy in private subnet (no public IP)
- Security group: allow only from app servers' security group on port 5432/3306
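- Encryption at rest: KMS (enable at creation; an existing unencrypted instance can only be encrypted by copying a snapshot with encryption enabled and restoring from the copy)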
- Encryption in transit: SSL/TLS (enforce with the rds.force_ssl=1 parameter on PostgreSQL)
- IAM authentication: Use a short-lived IAM token instead of a password (tokens expire after 15 minutes)
- Secrets Manager: Store DB password, auto-rotate every N days
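On the client side, encryption in transit only protects against interception if the client also verifies the server certificate. A small sketch of building a libpq-style connection string with `sslmode=verify-full` follows; the helper name, host, and CA bundle path are assumptions for illustration, and the password is deliberately left out so it can be supplied at runtime from Secrets Manager or an IAM token:

```python
def build_dsn(host: str, dbname: str, user: str, port: int = 5432,
              ca_bundle: str = "/opt/certs/global-bundle.pem") -> str:
    """Build a libpq keyword/value connection string that requires verified TLS.

    The ca_bundle path is a hypothetical location for the RDS CA certificates.
    """
    parts = {
        "host": host,
        "port": port,
        "dbname": dbname,
        "user": user,
        "sslmode": "verify-full",   # verify the certificate chain and the hostname
        "sslrootcert": ca_bundle,
    }
    for key, value in parts.items():
        # Keep the sketch simple: values with whitespace would need quoting.
        if any(ch.isspace() for ch in str(value)):
            raise ValueError(f"{key} must not contain whitespace")
    return " ".join(f"{key}={value}" for key, value in parts.items())
```

Server-side enforcement and client-side verification are complementary: the first rejects plaintext connections, the second guards against connecting to an impostor endpoint.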
Monitoring
| Metric | Normal | Alert when |
|---|---|---|
| CPU Utilization | < 80% | > 90% sustained |
| DB Connections | < 80% of max | > 90% of max |
| Free Storage | > 20% | < 10% |
| Read/Write Latency | < 5ms | > 20ms |
| Replica Lag | < 1s | > 10s |
Enhanced Monitoring: OS-level metrics (per-process CPU, memory). Granularity down to 1s, versus 60s for standard CloudWatch metrics.
Performance Insights: Query-level analysis of the top SQL statements by load.
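The alert thresholds in the table can be expressed as a simple check over one metrics sample. This is a sketch with hypothetical key names; it evaluates a single data point, so the "sustained" condition for CPU would still need to be applied over a time window by whatever alarm system consumes it:

```python
def evaluate_alerts(metrics: dict) -> list:
    """Return the names of metrics that cross the alert thresholds in the table."""
    alerts = []
    if metrics["cpu_percent"] > 90:
        alerts.append("cpu")
    if metrics["connections"] > 0.9 * metrics["max_connections"]:
        alerts.append("connections")
    if metrics["free_storage_percent"] < 10:
        alerts.append("free_storage")
    if max(metrics["read_latency_ms"], metrics["write_latency_ms"]) > 20:
        alerts.append("latency")
    if metrics["replica_lag_seconds"] > 10:
        alerts.append("replica_lag")
    return alerts
```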
Interview Talking Points
"When would you use Multi-AZ vs Read Replicas?" Multi-AZ for HA (if primary dies, failover in 60s). Read Replicas for horizontal read scaling (reporting, analytics, read-heavy workloads). Can use both together.
"What's Aurora vs RDS PostgreSQL?" Aurora is a PostgreSQL-compatible engine with higher availability (up to 15 replicas with < 10ms lag), auto-scaling storage, faster failover, and Aurora Serverless for variable workloads. More expensive per instance but cheaper at scale due to better hardware utilization.
"Lambda to RDS — what's the problem?" Lambda auto-scales to thousands of concurrent executions, each opening a DB connection. RDS has fixed max connections → exhaustion. Solution: RDS Proxy pools connections; Lambda connects to proxy, not RDS directly.
Related
- [[AWS/VPC]] — RDS must be in private subnet
- [[AWS/IAM]] — IAM auth for RDS, Secrets Manager
- [[AWS/Lambda]] — Lambda to RDS via RDS Proxy
- [[System Design/Backend 101/Data Base/Basics]] — DB fundamentals
- [[Distributed Systems Concepts]] — replication, consistency