RDS Cost Optimisation: Where Database Bills Explode

RDS is usually the second-largest line item on an AWS bill and the one teams touch the least. Here's where the savings actually live.

By Andrii Votiakov on 2026-03-03

RDS is the line item engineers fear to touch. Wrong move, prod down, weekend ruined. So it sits, over-provisioned, year after year. But the savings are real and the operations are safer than they look. Migrating RDS instances to Graviton adds another 20% saving on top of right-sizing — it's one of the lowest-effort wins in the database tier. And if you're running a self-managed Postgres setup alongside RDS, the Postgres cost optimisation guide covers the storage and query-tuning side.

Quick answer

Most RDS bills carry 30-50% waste from over-sized instances, unnecessary Multi-AZ on non-prod, forgotten manual snapshots, and missed Reserved Instance / Savings Plan opportunities. A thorough RDS audit typically cuts the database line item by 30-45% without sacrificing reliability.

The biggest waste sources, ranked

1. Over-sized instances (30-50% saving)

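Same logic as EC2: pull 14 days of CPU, memory, IOPS and connection count. A db.r6i.4xlarge (16 vCPU, 128 GiB) running at 12% CPU and 30% memory is a db.r6i.2xlarge waiting to happen: roughly 38 GiB in use fits in 64 GiB with headroom, but not in the 32 GiB of an xlarge.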
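A minimal sketch of that 14-day pull with the AWS CLI, assuming CloudWatch is your metric source. The function name is illustrative, memory is read through the FreeableMemory metric, and the one-hour period is an arbitrary choice.

```shell
# Print hourly Average and Maximum for the core RDS metrics over the last 14 days.
# Usage: pull_rds_metrics <db-instance-identifier>
pull_rds_metrics() {
  db_id=$1
  # GNU date first, BSD/macOS date as a fallback.
  start=$(date -u -d '14 days ago' +%Y-%m-%dT%H:%M:%SZ 2>/dev/null ||
          date -u -v-14d +%Y-%m-%dT%H:%M:%SZ)
  end=$(date -u +%Y-%m-%dT%H:%M:%SZ)

  for metric in CPUUtilization FreeableMemory ReadIOPS WriteIOPS DatabaseConnections; do
    echo "== $metric"
    aws cloudwatch get-metric-statistics \
      --namespace AWS/RDS \
      --metric-name "$metric" \
      --dimensions "Name=DBInstanceIdentifier,Value=$db_id" \
      --start-time "$start" --end-time "$end" \
      --period 3600 \
      --statistics Average Maximum \
      --query 'sort_by(Datapoints, &Timestamp)[].[Timestamp,Average,Maximum]' \
      --output text
  done
}
```

Call it as `pull_rds_metrics my-db-instance` (a placeholder identifier) and look at the Maximum column, not only the Average, before picking a smaller class.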

Postgres-specific tells:

pg_stat_activity shows < 30 active connections and you have 1,000 max — you're paying for memory you don't use
Buffer cache hit ratio > 99.5% with 64 GB RAM — you'd be fine with 16 GB
IOPS provisioned at 30k, actual usage 2k — drop to gp3 with sane baselines

MySQL-specific tells:

innodb_buffer_pool_size at 75% of huge RAM, hit rate 99.9% — your dataset fits in much less
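For the Postgres tells, a sketch along these lines reads the numbers straight from the statistics views. It assumes `psql` is installed and that the connection string you pass (a placeholder here) can read `pg_stat_activity` and `pg_stat_database`. The hit ratio is cumulative since the last statistics reset, so treat it as a rough signal.

```shell
# Print active connections, max_connections and the buffer cache hit percentage.
# Usage: pg_tells "postgresql://user@host:5432/dbname"   (placeholder DSN)
pg_tells() {
  psql "$1" -At -F ' ' <<'SQL'
SELECT 'active_connections', count(*) FROM pg_stat_activity WHERE state = 'active';
SELECT 'max_connections', current_setting('max_connections');
SELECT 'cache_hit_pct',
       round(100.0 * sum(blks_hit) / nullif(sum(blks_hit) + sum(blks_read), 0), 2)
FROM pg_stat_database;
SQL
}
```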

2. Multi-AZ on dev and staging ($200-2k/month each)

Multi-AZ doubles your instance cost. It exists for production resilience. There's no good reason to have it on a dev database that gets restored from snapshot if it dies.

Audit: filter RDS by Environment tag (or instance name pattern) and turn off Multi-AZ on anything that isn't prod. Each converted instance drops to roughly half its previous instance cost.
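The audit step can be sketched with the AWS CLI as follows. The tag key `Environment` and the values `prod` and `production` are assumptions about your tagging scheme, and both function names are illustrative. Untagged instances show up with `None` in the third column, so they are listed for review as well.

```shell
# List Multi-AZ instances whose Environment tag is not prod/production.
# Output columns (tab-separated): identifier, instance class, Environment tag value.
list_nonprod_multi_az() {
  aws rds describe-db-instances \
    --query 'DBInstances[?MultiAZ==`true`].[DBInstanceIdentifier,DBInstanceClass,TagList[?Key==`"Environment"`]|[0].Value]' \
    --output text |
    awk -F '\t' '$3 != "prod" && $3 != "production"'
}

# Convert one instance to Single-AZ. Review the list by hand before calling this.
disable_multi_az() {
  aws rds modify-db-instance \
    --db-instance-identifier "$1" \
    --no-multi-az \
    --apply-immediately
}
```

Run `list_nonprod_multi_az` first, then `disable_multi_az <identifier>` per instance once someone has confirmed it really is non-prod.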

3. Old snapshots ($100-2k/month)

Manual snapshots accumulate forever unless cleaned up. AWS doesn't tell you. Run:

aws rds describe-db-snapshots \
  --snapshot-type manual \
  --query 'DBSnapshots[?SnapshotCreateTime<`2025-01-01`].[DBSnapshotIdentifier,SnapshotCreateTime,AllocatedStorage]' \
  --output table

Anything older than your real retention requirement needs to go (after a quick check with ops/legal). I've seen accounts with 6 TB of snapshots from 2021: that's roughly $600/month for nothing.
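To put a rough number on the old snapshots, something like the sketch below sums their size and multiplies by a storage rate. Two assumptions to keep in mind: `AllocatedStorage` is the size of the source volume, so it gives an upper bound on billed backup storage, and the default rate of 0.095 USD per GiB-month is an assumed figure that you should replace with your region's price. Both function names are illustrative.

```shell
# Sum AllocatedStorage (GiB) of manual snapshots created before a cutoff date.
# Usage: old_snapshot_gib 2025-01-01
old_snapshot_gib() {
  cutoff=$1
  aws rds describe-db-snapshots \
    --snapshot-type manual \
    --query "DBSnapshots[?SnapshotCreateTime<\`$cutoff\`].AllocatedStorage" \
    --output text |
    tr '\t' '\n' |
    awk '{ total += $1 } END { print total + 0 }'
}

# Rough monthly cost in USD. The default rate is an assumption, not a quoted price.
# Usage: snapshot_monthly_usd <gib> [usd_per_gib_month]
snapshot_monthly_usd() {
  awk -v gib="$1" -v rate="${2:-0.095}" 'BEGIN { printf "%.2f\n", gib * rate }'
}
```

For example, `snapshot_monthly_usd "$(old_snapshot_gib 2025-01-01)"` gives a ceiling on what the pre-2025 snapshots cost per month.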

4. Burstable instances draining credits

db.t3 and db.t4g are great for low-traffic workloads. They are disastrous if your workload bursts above the baseline for hours. The CPU credit balance graph tells you instantly: if it spends time at zero, you're either throttled (standard mode) or paying surplus-credit charges (unlimited mode, which RDS uses for db.t3 and db.t4g). Move to a fixed-performance instance (m or r class); the database often runs faster on a smaller fixed instance than on a starved burstable one.
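If you would rather count than eyeball the graph, a sketch like this reports how many hours in the last 14 days the credit balance bottomed out. The function name and the default threshold of 1 credit are illustrative choices.

```shell
# Count hourly datapoints in the last 14 days where the minimum CPU credit
# balance fell below a threshold (default: 1 credit, i.e. effectively empty).
# Usage: credit_starved_hours <db-instance-identifier> [threshold]
credit_starved_hours() {
  db_id=$1
  threshold=${2:-1}
  # GNU date first, BSD/macOS date as a fallback.
  start=$(date -u -d '14 days ago' +%Y-%m-%dT%H:%M:%SZ 2>/dev/null ||
          date -u -v-14d +%Y-%m-%dT%H:%M:%SZ)
  end=$(date -u +%Y-%m-%dT%H:%M:%SZ)

  aws cloudwatch get-metric-statistics \
    --namespace AWS/RDS \
    --metric-name CPUCreditBalance \
    --dimensions "Name=DBInstanceIdentifier,Value=$db_id" \
    --start-time "$start" --end-time "$end" \
    --period 3600 \
    --statistics Minimum \
    --query 'Datapoints[].Minimum' \
    --output text |
    tr '\t' '\n' |
    awk -v t="$threshold" '$1 ~ /^[0-9]/ && $1 + 0 < t + 0 { n++ } END { print n + 0 }'
}
```

A result well above zero suggests the instance is a candidate for a fixed-performance class.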

5. gp2 instead of gp3

gp2 was the default for years. gp3 is 20% cheaper at the same throughput and lets you tune IOPS and throughput independently of size. There's no reason to create new gp2 volumes in 2026. Migrating existing gp2 volumes to gp3 is an online operation for most engines.
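Finding and converting the remaining gp2 instances could look like the sketch below. The function names are illustrative. The storage modification runs in the background and can take hours on large volumes, so try it on a non-prod instance first.

```shell
# List instances still on gp2: identifier, engine, allocated storage (GiB).
list_gp2_instances() {
  aws rds describe-db-instances \
    --query 'DBInstances[?StorageType==`"gp2"`].[DBInstanceIdentifier,Engine,AllocatedStorage]' \
    --output text
}

# Convert one instance to gp3.
# Usage: convert_to_gp3 <db-instance-identifier>
convert_to_gp3() {
  aws rds modify-db-instance \
    --db-instance-identifier "$1" \
    --storage-type gp3 \
    --apply-immediately
}
```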

6. Provisioned IOPS (io1/io2) you don't need

Teams move to Provisioned IOPS when they hit a performance issue, then never re-evaluate. If your IOPS utilisation sits below 50% of provisioned, lower the provisioned figure. If it sits below 30%, switch back to gp3 with explicit IOPS provisioning, which is usually about half the cost.
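The two thresholds translate into a small decision helper. This sketch assumes you feed it the peak observed IOPS (ReadIOPS plus WriteIOPS) rather than the average, which is the more conservative reading of "utilisation"; the function name and output labels are illustrative.

```shell
# Usage: iops_recommendation <provisioned_iops> <peak_observed_iops>
iops_recommendation() {
  provisioned=$1
  peak=$2
  [ "$provisioned" -gt 0 ] || { echo "provisioned IOPS must be > 0" >&2; return 1; }
  pct=$(( peak * 100 / provisioned ))   # integer percentage, rounded down
  if [ "$pct" -lt 30 ]; then
    echo "switch-to-gp3 (${pct}% used)"
  elif [ "$pct" -lt 50 ]; then
    echo "lower-provisioned-iops (${pct}% used)"
  else
    echo "keep (${pct}% used)"
  fi
}
```

For instance, `iops_recommendation 30000 2000` reports 6% utilisation and recommends the switch to gp3.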

7. Reserved Instances or Savings Plans not bought

After right-sizing, a 1-year all-upfront RDS RI/SP for steady-state production gives 30-40% off on-demand and pays back in roughly 7 to 8 months (the upfront payment equals 12 × (1 − discount) months of on-demand spend). The only RDS workloads I wouldn't reserve: short-lived dev, pre-launch products, anything you're about to migrate. The Savings Plans vs Reserved Instances guide covers the exact decision tree for which type to buy.
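The payback arithmetic can be checked with a simplified model: an all-upfront payment for a term of N months at discount d costs N × (1 − d) months of on-demand spend, which is also the month in which it breaks even. The sketch below ignores the time value of money and assumes the instance runs for the whole term; the function names are illustrative.

```shell
# Months of on-demand spend covered by the upfront payment (= break-even month).
# Usage: ri_breakeven_months <discount_fraction> [term_months]
ri_breakeven_months() {
  awk -v d="$1" -v term="${2:-12}" 'BEGIN { printf "%.1f\n", term * (1 - d) }'
}

# Average monthly saving against on-demand.
# Usage: ri_monthly_saving <monthly_on_demand_usd> <discount_fraction>
ri_monthly_saving() {
  awk -v od="$1" -v d="$2" 'BEGIN { printf "%.2f\n", od * d }'
}
```

For example, `ri_breakeven_months 0.40` gives 7.2 and `ri_monthly_saving 10000 0.35` gives 3500.00.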

Aurora-specific moves

If you're on Aurora rather than vanilla RDS:

Serverless v2 for spiky or low-utilisation databases: scales down to 0.5 ACU (or to 0 with auto-pause on supported engine versions); great for staging and side projects. Watch the upper bound and set it explicitly so you don't pay for a runaway query.
I/O-Optimised is cheaper than standard if I/O charges exceed 25% of your total Aurora spend. The switch is a single storage-type setting on the cluster.
Replicas are charged per hour: drop replicas you spun up "for safety" but never query.
The Backtrack window (Aurora MySQL only) is billed; if you don't use it, set it to 0 hours.
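Two of these moves can be sketched with the AWS CLI, plus a helper for the 25% rule. The function names are illustrative, the I/O and total figures are whatever you read off Cost Explorer, and the 0.5 and 8 ACU bounds are example values only.

```shell
# Prints "yes" when I/O is more than 25% of total Aurora spend, otherwise "no".
# Usage: io_optimised_pays_off <monthly_io_usd> <monthly_total_aurora_usd>
io_optimised_pays_off() {
  awk -v io="$1" -v total="$2" \
    'BEGIN { if (total > 0 && io / total > 0.25) print "yes"; else print "no" }'
}

# Switch a cluster to I/O-Optimised storage.
# Usage: switch_to_io_optimised <db-cluster-identifier>
switch_to_io_optimised() {
  aws rds modify-db-cluster \
    --db-cluster-identifier "$1" \
    --storage-type aurora-iopt1 \
    --apply-immediately
}

# Cap Serverless v2 scaling with an explicit upper bound.
# Usage: cap_serverless_v2 <db-cluster-identifier> [max_acu]
cap_serverless_v2() {
  aws rds modify-db-cluster \
    --db-cluster-identifier "$1" \
    --serverless-v2-scaling-configuration "MinCapacity=0.5,MaxCapacity=${2:-8}" \
    --apply-immediately
}
```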

What I check in an actual audit

Per-instance CPU, memory, IOPS, connections (14d)
Multi-AZ flag on every non-prod instance
Snapshot list older than 365 days
Storage type (gp2 vs gp3)
Reservation coverage (Cost Explorer → Reservations)
Aurora cluster mode and serverless config
Top queries by total time (pg_stat_statements) — sometimes a single bad query forces a too-big instance

Realistic numbers

On a recent client's bill (~$22k/month RDS):

Right-sizing 18 instances: $5,200/month
Multi-AZ off on 11 non-prod: $2,800/month
gp2 → gp3: $900/month
1-year RI on the new floor: $3,400/month

Total: $12,300/month saved, 56% of the original bill. Implementation took two weeks of part-time work.
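The total is easy to re-derive from the four line items. This sketch uses a hypothetical helper that takes the baseline first and the monthly savings after it.

```shell
# Usage: savings_summary <baseline_usd> <saving_usd>...
# Prints the total monthly saving and its share of the baseline in percent.
savings_summary() {
  baseline=$1; shift
  total=0
  for item in "$@"; do
    total=$(( total + item ))
  done
  awk -v t="$total" -v b="$baseline" 'BEGIN { printf "%d %.1f\n", t, 100 * t / b }'
}

# Right-sizing, Multi-AZ off, gp2 to gp3, 1-year RI, against a ~22k baseline.
savings_summary 22000 5200 2800 900 3400   # prints: 12300 55.9
```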

If you want me to find the same in your bill on a pay-for-savings basis, book a call.