RDS Cost Optimisation: Where Database Bills Explode
RDS is usually the second-largest line item on an AWS bill and the one teams touch the least. Here's where the savings actually live.
By Andrii Votiakov
RDS is the line item engineers fear to touch. Wrong move, prod down, weekend ruined. So it sits, over-provisioned, year after year. But the savings are real and the operations are safer than they look. Migrating RDS instances to Graviton adds another 20% saving on top of right-sizing — it's one of the lowest-effort wins in the database tier. And if you're running a self-managed Postgres setup alongside RDS, the Postgres cost optimisation guide covers the storage and query-tuning side.
Quick answer
Most RDS bills carry 30-50% waste from over-sized instances, unnecessary Multi-AZ on non-prod, unattached snapshots, and missed Reserved Instance / Savings Plan opportunities. A thorough RDS audit typically cuts the database line item by 30-45% without sacrificing reliability.
The biggest waste sources, ranked
1. Over-sized instances (30-50% saving)
Same logic as EC2: pull 14 days of CPU, memory, IOPS and connection count. A db.r6i.4xlarge running at 12% CPU and 30% memory is a db.r6i.xlarge waiting to happen.
Postgres-specific tells:
- pg_stat_activity shows < 30 active connections and you have 1,000 max: you're paying for memory you don't use
- Buffer cache hit ratio > 99.5% with 64 GB RAM: you'd be fine with 16 GB
- IOPS provisioned at 30k, actual usage 2k: drop to gp3 with sane baselines
MySQL-specific tells:
- innodb_buffer_pool_size at 75% of huge RAM, hit rate 99.9%: your dataset fits in much less
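As a rough illustration of the Postgres tells above, the buffer cache hit ratio can be derived from the blks_hit and blks_read counters in pg_stat_database. The function names and the 3% connection-use threshold below are assumptions for this sketch (the 3% comes from the 30-of-1,000 example), not fixed rules.

```python
def buffer_cache_hit_ratio(blks_hit: int, blks_read: int) -> float:
    """Fraction of block requests served from shared buffers (0.0 to 1.0)."""
    total = blks_hit + blks_read
    if total == 0:
        return 0.0  # no traffic recorded yet, nothing to conclude
    return blks_hit / total


def postgres_oversizing_tells(blks_hit: int, blks_read: int,
                              active_connections: int,
                              max_connections: int) -> list[str]:
    """Return the over-provisioning hints that apply to one database."""
    tells = []
    # Assumed threshold: a hit ratio above 99.5% suggests the working set
    # fits comfortably in memory.
    if buffer_cache_hit_ratio(blks_hit, blks_read) > 0.995:
        tells.append("buffer cache hit ratio above 99.5%")
    # Assumed threshold: fewer than 3% of allowed connections in use.
    if max_connections > 0 and active_connections / max_connections < 0.03:
        tells.append("connection use below 3% of max_connections")
    return tells
```

A hint from this check is a reason to look closer, not a verdict: test a smaller instance class in staging before resizing production.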
2. Multi-AZ on dev and staging ($200-2k/month each)
Multi-AZ doubles your instance cost. It exists for production resilience. There's no good reason to have it on a dev database that gets restored from snapshot if it dies.
Audit: filter RDS by Environment tag (or instance name pattern) and turn off Multi-AZ on anything not prod. Because Multi-AZ doubles the instance cost, turning it off roughly halves the instance charge for every database you change.
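The audit can be sketched as a filter over the DBInstances list returned by describe-db-instances. This version assumes the environment lives in a tag named Environment and that production is tagged "prod" or "production"; both are assumptions to adapt to your own tagging scheme.

```python
def non_prod_multi_az(instances, prod_values=("prod", "production")):
    """Return identifiers of Multi-AZ instances not tagged as production.

    `instances` is a list of dicts shaped like the DBInstances entries
    from describe-db-instances. Untagged instances are flagged too, so
    they get reviewed rather than silently skipped.
    """
    flagged = []
    for db in instances:
        tags = {t["Key"]: t["Value"] for t in db.get("TagList", [])}
        env = tags.get("Environment", "").lower()
        if db.get("MultiAZ") and env not in prod_values:
            flagged.append(db["DBInstanceIdentifier"])
    return flagged
```

Treat the result as a review list: confirm each instance really is non-production before modifying it.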
3. Old snapshots ($100-2k/month)
Manual snapshots accumulate forever unless cleaned up. AWS doesn't tell you. Run:
aws rds describe-db-snapshots \
--snapshot-type manual \
--query 'DBSnapshots[?SnapshotCreateTime<`2025-01-01`].[DBSnapshotIdentifier,SnapshotCreateTime,AllocatedStorage]' \
--output table
Anything older than your real RPO needs to go (after a quick check with ops/legal). I've seen accounts with 6 TB of snapshots from 2021 — that's $600/month for nothing.
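To put a number on the list that command returns, here is a small sketch that totals the old snapshots and estimates the monthly charge. The $0.10 per GB-month price is an assumption (it is the rate implied by the 6 TB for $600 figure above); check your region's backup storage pricing. AllocatedStorage is the size of the source volume, so the estimate is an upper bound on what the snapshots actually bill.

```python
from datetime import datetime, timezone

ASSUMED_PRICE_PER_GB_MONTH = 0.10  # assumption, not an official price


def old_snapshot_cost(snapshots, cutoff,
                      price_per_gb_month=ASSUMED_PRICE_PER_GB_MONTH):
    """Return (identifiers, estimated monthly cost) for snapshots before cutoff.

    `snapshots` is a list of dicts shaped like the DBSnapshots entries
    from describe-db-snapshots, with SnapshotCreateTime as a datetime.
    """
    old = [s for s in snapshots if s["SnapshotCreateTime"] < cutoff]
    total_gb = sum(s["AllocatedStorage"] for s in old)
    ids = [s["DBSnapshotIdentifier"] for s in old]
    return ids, round(total_gb * price_per_gb_month, 2)


# Example cutoff matching the date used in the CLI query.
CUTOFF = datetime(2025, 1, 1, tzinfo=timezone.utc)
```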
4. Burstable instances draining credits
db.t3 and db.t4g are great for low-traffic workloads. Disastrous if your workload bursts above the baseline for hours. The CPU credit balance graph tells you instantly: if it spends time at zero, you're either throttled or, in unlimited mode (the default for db.t3 and db.t4g on RDS), paying for surplus credits on top of the instance price. Move to a fixed-performance instance (m or r class) and the database often runs faster on a smaller fixed instance than a starved burstable one.
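The credit check can be automated over the CPUCreditBalance datapoints you export from CloudWatch. This is a minimal sketch; the function name and the floor of 1.0 credit are assumptions for illustration.

```python
def credit_starvation_ratio(credit_balances, floor=1.0):
    """Fraction of datapoints where the CPU credit balance is at or near zero.

    `credit_balances` is a sequence of CPUCreditBalance values, one per
    sampling period. `floor` is an assumed cut-off for "effectively empty".
    """
    if not credit_balances:
        return 0.0
    starved = sum(1 for balance in credit_balances if balance <= floor)
    return starved / len(credit_balances)
```

Any sustained non-zero ratio on a production database is a signal to price out a fixed-performance class.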
5. gp2 instead of gp3
gp2 was the default for years. gp3 is 20% cheaper at the same throughput and lets you tune IOPS and throughput independently of size. There's no reason to create new gp2 volumes in 2026. Existing gp2 → gp3 migration is online for most engines.
6. Provisioned IOPS (io1/io2) you don't need
Teams move to Provisioned IOPS when they hit a performance issue, then never re-evaluate. If your IOPS utilisation sits below 50% of provisioned, drop it. If it sits below 30%, switch back to gp3 with explicit IOPS provisioning — usually half the cost.
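The two thresholds above translate into a simple decision helper. This sketch assumes you compare the peak observed IOPS over the audit window against the provisioned figure; the function name and return strings are illustrative.

```python
def piops_recommendation(peak_iops: float, provisioned_iops: float) -> str:
    """Apply the 50% / 30% utilisation rules to a Provisioned IOPS volume."""
    if provisioned_iops <= 0:
        raise ValueError("provisioned_iops must be positive")
    utilisation = peak_iops / provisioned_iops
    if utilisation < 0.30:
        return "switch to gp3 with explicit IOPS"
    if utilisation < 0.50:
        return "reduce provisioned IOPS"
    return "keep current provisioning"
```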
7. Reserved Instances or Savings Plans not bought
After right-sizing, a 1-year all-upfront RDS RI/SP gives 30-40% off on-demand for steady-state production, which means the upfront payment breaks even after roughly 7 to 8 months. The only RDS workloads I wouldn't reserve: short-lived dev, pre-launch products, anything you're about to migrate. The Savings Plans vs Reserved Instances guide covers the exact decision tree for which type to buy.
Aurora-specific moves
If you're on Aurora rather than vanilla RDS:
- Serverless v2 for spiky or low-utilisation databases: scales down to 0.5 ACU; great for staging and side projects. Watch the upper bound — set it explicitly so you don't pay for a runaway query.
- I/O-Optimised is cheaper than standard once I/O charges exceed about 25% of your monthly Aurora bill. The switch is one setting change (the cluster storage type).
- Replicas charged per hour — drop replicas you spun up "for safety" but never query.
- Backtrack window is billed; if you don't use it, set to 0 hours.
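The I/O-Optimised rule of thumb from the list above is easy to check against a month of Cost Explorer figures. A minimal sketch, assuming you have already separated the I/O charges from the total Aurora spend for the cluster; the function name is illustrative.

```python
def prefer_io_optimised(io_cost: float, total_aurora_cost: float,
                        threshold: float = 0.25) -> bool:
    """True when I/O charges exceed the threshold share of Aurora spend."""
    if total_aurora_cost <= 0:
        return False  # nothing billed, nothing to optimise
    return io_cost / total_aurora_cost > threshold
```

Check more than one month before switching, since a single batch job can skew the I/O share.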
What I check in an actual audit
- Per-instance CPU, memory, IOPS, connections (14d)
- Multi-AZ flag on every non-prod instance
- Snapshot list older than 365 days
- Storage type (gp2 vs gp3)
- Reservation coverage (Cost Explorer → Reservations)
- Aurora cluster mode and serverless config
- Top queries by total time (pg_stat_statements): sometimes a single bad query forces a too-big instance
Realistic numbers
On a recent client's bill (~$22k/month RDS):
- Right-sizing 18 instances: $5,200/month
- Multi-AZ off on 11 non-prod: $2,800/month
- gp2 → gp3: $900/month
- 1-year RI on the new floor: $3,400/month
Total: $12,300/month saved, 56% of the original bill. Implementation took two weeks of part-time work.
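For anyone reproducing this on their own bill, the arithmetic is just a sum over the line items divided by the baseline. The dictionary keys below are labels chosen for this sketch; the figures are the ones listed above, with the baseline taken as the rounded $22k.

```python
# Monthly savings per audit item, in USD (figures from the list above).
savings = {
    "right-sizing": 5_200,
    "multi-az off on non-prod": 2_800,
    "gp2 to gp3": 900,
    "1-year RI": 3_400,
}
baseline = 22_000  # approximate monthly RDS bill before the audit

total = sum(savings.values())
share = total / baseline

print(f"${total:,}/month saved, {share:.0%} of the baseline")
# prints: $12,300/month saved, 56% of the baseline
```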
If you want me to find the same in your bill on a pay-for-savings basis, book a call.