RDS Cost Optimisation: Where Database Bills Explode
RDS is usually the second-largest line item on an AWS bill and the one teams touch the least. Here's where the savings actually live.
By Andrii Votiakov
RDS is the line item engineers fear to touch. Wrong move, prod down, weekend ruined. So it sits, over-provisioned, year after year. But the savings are real and the operations are safer than they look. Migrating RDS instances to Graviton adds another 20% saving on top of right-sizing — it's one of the lowest-effort wins in the database tier. And if you're running a self-managed Postgres setup alongside RDS, the Postgres cost optimisation guide covers the storage and query-tuning side.
Quick answer
Most RDS bills carry 30-50% waste from over-sized instances, unnecessary Multi-AZ on non-prod, forgotten manual snapshots, and missed Reserved Instance / Savings Plan opportunities. A thorough RDS audit typically cuts the database line item by 30-45% without sacrificing reliability.
The biggest waste sources, ranked
1. Over-sized instances (30-50% saving)
Same logic as EC2: pull 14 days of CPU, memory, IOPS and connection count. A db.r6i.4xlarge running at 12% CPU and 30% memory is a db.r6i.2xlarge waiting to happen: half the hourly price, and the same load lands at roughly 24% CPU and 60% memory.
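To sanity-check a downsize before touching anything, a rough projection helps. This is a minimal sketch, assuming usage stays constant in absolute terms (busy vCPUs and used GiB, with memory % derived from CloudWatch FreeableMemory) and a hypothetical 70% ceiling. The class table and helper names are illustrative, not an AWS API. Applied to the example above, it skips db.r6i.xlarge because 30% of 128 GiB (about 38 GiB) would not fit in 32 GiB.

```python
# Hypothetical right-sizing helper for the db.r6i family.
# (vCPU, GiB) per class; verify against the RDS instance class docs for your engine.
R6I_SIZES = {
    "db.r6i.large": (2, 16),
    "db.r6i.xlarge": (4, 32),
    "db.r6i.2xlarge": (8, 64),
    "db.r6i.4xlarge": (16, 128),
}


def project(current: str, target: str, cpu_pct: float, mem_pct: float) -> tuple[float, float]:
    """Project CPU% and memory% onto `target`, assuming busy vCPUs and used GiB
    stay the same. Caches and connection overhead don't scale this neatly,
    so treat the result as a first pass."""
    cur_vcpu, cur_gib = R6I_SIZES[current]
    tgt_vcpu, tgt_gib = R6I_SIZES[target]
    busy_vcpu = cur_vcpu * cpu_pct / 100
    used_gib = cur_gib * mem_pct / 100
    return 100 * busy_vcpu / tgt_vcpu, 100 * used_gib / tgt_gib


def smallest_fit(current: str, cpu_pct: float, mem_pct: float, ceiling_pct: float = 70) -> str:
    """Smallest class whose projected CPU and memory both stay under ceiling_pct."""
    for name in sorted(R6I_SIZES, key=lambda n: R6I_SIZES[n][0]):
        cpu, mem = project(current, name, cpu_pct, mem_pct)
        if cpu <= ceiling_pct and mem <= ceiling_pct:
            return name
    return current  # no class in the table fits under the ceiling


print(smallest_fit("db.r6i.4xlarge", cpu_pct=12, mem_pct=30))  # db.r6i.2xlarge
print([round(x) for x in project("db.r6i.4xlarge", "db.r6i.xlarge", 12, 30)])  # [48, 120]
```

The ceiling leaves room for peaks the 14-day window didn't catch. Raise or lower it to match your risk appetite.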
Postgres-specific tells:
- pg_stat_activity shows fewer than 30 active connections and you have max_connections at 1,000: you're paying for memory you don't use
- Buffer cache hit ratio above 99.5% with 64 GB RAM: the working set fits easily, and you'd likely be fine with 16 GB
- IOPS provisioned at 30k, actual usage 2k: drop to gp3 with sane baselines
MySQL-specific tells:
- innodb_buffer_pool_size at 75% of a huge amount of RAM (the RDS default) with a 99.9% hit rate: your dataset fits in much less
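Both hit ratios in the tells above come from counters the engines already expose. Here is a minimal sketch of the arithmetic, assuming you've already pulled the raw counters. The queries are in the docstrings, and the function names are hypothetical.

```python
def pg_hit_ratio(blks_hit: int, blks_read: int) -> float:
    """Postgres shared-buffer hit ratio from pg_stat_database:
        SELECT sum(blks_hit), sum(blks_read) FROM pg_stat_database;
    blks_read counts blocks fetched from outside shared_buffers, which may
    still have been served from the OS page cache."""
    total = blks_hit + blks_read
    return blks_hit / total if total else 1.0


def innodb_hit_ratio(read_requests: int, disk_reads: int) -> float:
    """InnoDB buffer pool hit ratio from SHOW GLOBAL STATUS:
    Innodb_buffer_pool_read_requests (logical reads) and
    Innodb_buffer_pool_reads (reads the buffer pool could not satisfy)."""
    return 1 - disk_reads / read_requests if read_requests else 1.0


print(f"{pg_hit_ratio(996_000, 4_000):.2%}")        # 99.60%
print(f"{innodb_hit_ratio(2_000_000, 1_500):.3%}")  # 99.925%
```

A high ratio says the working set fits today. It doesn't say how much smaller the instance can go, so pair it with the memory trend before downsizing.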
2. Multi-AZ on dev and staging ($200-2k/month each)
Multi-AZ doubles your instance cost. It exists for production resilience. There's no good reason to have it on a dev database that gets restored from snapshot if it dies.
Audit: filter RDS by Environment tag (or instance name pattern) and turn off Multi-AZ on anything not prod. Each instance you switch to Single-AZ roughly halves its instance and storage charges.
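The Multi-AZ audit can be scripted. This is a minimal sketch, assuming the instance list comes from boto3's describe_db_instances (commented out so the block runs offline). It also assumes your tag is literally Environment, with prod or production as the production values. The helper and sample instance names are hypothetical.

```python
# Assumed input, e.g.:
#   import boto3
#   rds = boto3.client("rds")
#   pages = rds.get_paginator("describe_db_instances").paginate()
#   instances = [db for page in pages for db in page["DBInstances"]]

PROD_VALUES = {"prod", "production"}  # assumption about your tagging scheme


def non_prod_multi_az(instances: list[dict], tag_key: str = "Environment") -> list[str]:
    """Return identifiers of Multi-AZ instances not tagged as production.
    Untagged instances are flagged too, so someone checks them by hand."""
    flagged = []
    for db in instances:
        if not db.get("MultiAZ"):
            continue
        tags = {t["Key"]: t["Value"] for t in db.get("TagList", [])}
        if tags.get(tag_key, "").lower() not in PROD_VALUES:
            flagged.append(db["DBInstanceIdentifier"])
    return flagged


# After review, each flagged instance can be switched off Multi-AZ, e.g.:
#   rds.modify_db_instance(DBInstanceIdentifier=name, MultiAZ=False, ApplyImmediately=True)

sample = [
    {"DBInstanceIdentifier": "orders-prod", "MultiAZ": True,
     "TagList": [{"Key": "Environment", "Value": "prod"}]},
    {"DBInstanceIdentifier": "orders-staging", "MultiAZ": True,
     "TagList": [{"Key": "Environment", "Value": "staging"}]},
    {"DBInstanceIdentifier": "scratch-db", "MultiAZ": False, "TagList": []},
]
print(non_prod_multi_az(sample))  # ['orders-staging']
```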
3. Old snapshots ($100-2k/month)
Manual snapshots accumulate forever unless cleaned up. AWS doesn't tell you. Run:
aws rds describe-db-snapshots \
--snapshot-type manual \
--query 'DBSnapshots[?SnapshotCreateTime<`2025-01-01`].[DBSnapshotIdentifier,SnapshotCreateTime,AllocatedStorage]' \
--output table
Anything older than your actual retention requirement needs to go (after a quick check with ops/legal). I've seen accounts with 6 TB of snapshots from 2021: roughly $600/month for nothing.
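The same cleanup can be done from Python, using an age cutoff instead of a hard-coded date. This is a minimal sketch, assuming snapshot dicts shaped like boto3's describe_db_snapshots output (timezone-aware SnapshotCreateTime, AllocatedStorage in GiB). It also assumes a backup storage price of $0.095 per GB-month, which you should check against your region's pricing. The estimate is an upper bound: RDS snapshots are incremental, so the billed size is usually smaller than the allocated storage. Snapshot names are hypothetical.

```python
from datetime import datetime, timedelta, timezone

# Assumption: RDS backup storage list price; check the pricing page for your region.
BACKUP_PRICE_PER_GB_MONTH = 0.095


def stale_snapshots(snapshots: list[dict], max_age_days: int, now: datetime) -> list[dict]:
    """Manual snapshots created more than max_age_days before `now`."""
    cutoff = now - timedelta(days=max_age_days)
    return [s for s in snapshots if s["SnapshotCreateTime"] < cutoff]


def monthly_cost_upper_bound(snapshots: list[dict]) -> float:
    # AllocatedStorage is the source volume size in GiB; incremental snapshot
    # storage means the real bill is usually lower than this.
    return sum(s["AllocatedStorage"] for s in snapshots) * BACKUP_PRICE_PER_GB_MONTH


# Real input would come from, e.g.:
#   rds.get_paginator("describe_db_snapshots").paginate(SnapshotType="manual")
# and, after sign-off, rds.delete_db_snapshot(DBSnapshotIdentifier=...).
now = datetime(2026, 1, 1, tzinfo=timezone.utc)
snaps = [
    {"DBSnapshotIdentifier": "pre-migration-2021", "AllocatedStorage": 6144,
     "SnapshotCreateTime": datetime(2021, 3, 1, tzinfo=timezone.utc)},
    {"DBSnapshotIdentifier": "pre-upgrade-2025", "AllocatedStorage": 500,
     "SnapshotCreateTime": datetime(2025, 11, 1, tzinfo=timezone.utc)},
]
old = stale_snapshots(snaps, max_age_days=365, now=now)
print([s["DBSnapshotIdentifier"] for s in old], round(monthly_cost_upper_bound(old), 2))
# ['pre-migration-2021'] 583.68
```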
4. Burstable instances draining credits
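db.t3 and db.t4g are great for low-traffic workloads. Disastrous if your workload bursts above the baseline for hours. The CPU credit balance graph tells you instantly: if it spends time at zero, you've outgrown the instance. On RDS, db.t3 and db.t4g run in Unlimited mode, so an empty balance shows up as surplus-credit charges on the bill rather than hard throttling. Move to a fixed-performance instance (m or r class); a right-sized fixed instance is often both faster and cheaper than a burstable one running on surplus credits.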
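Reading the credit graph can be reduced to one number: how much of the window the balance spent near zero. This is a minimal sketch, assuming you've exported 14 days of CPUCreditBalance datapoints. The 0.5-credit tolerance and the 5% threshold are hypothetical rules of thumb, not AWS guidance.

```python
# Samples are assumed to be CPUCreditBalance datapoints, fetched with e.g.:
#   boto3.client("cloudwatch").get_metric_statistics(
#       Namespace="AWS/RDS", MetricName="CPUCreditBalance",
#       Dimensions=[{"Name": "DBInstanceIdentifier", "Value": "my-db"}],
#       StartTime=start, EndTime=end, Period=300, Statistics=["Minimum"])


def share_at_zero(samples: list[float], epsilon: float = 0.5) -> float:
    """Fraction of datapoints where the credit balance is effectively exhausted."""
    if not samples:
        return 0.0
    return sum(1 for v in samples if v <= epsilon) / len(samples)


def burstable_verdict(samples: list[float], max_share: float = 0.05) -> str:
    # Hypothetical threshold: more than 5% of the window at zero suggests
    # the workload doesn't fit a burstable class.
    if share_at_zero(samples) > max_share:
        return "move to m/r class"
    return "burstable is fine"


balance = [120, 80, 30, 0, 0, 0, 15, 60]
print(share_at_zero(balance), burstable_verdict(balance))  # 0.375 move to m/r class
```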
5. gp2 instead of gp3
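gp2 was the default for years. gp3 lets you tune IOPS and throughput independently of size, while gp2 only gives you 3 baseline IOPS per GB, so teams over-allocate storage just to buy performance. Don't assume the 20% per-GB discount gp3 has on EBS carries over. On RDS, check your region's pricing: gp2 and gp3 storage can cost the same per GB there, and the saving comes from no longer buying IOPS through extra capacity or io1. There's no reason to create new gp2 volumes in 2026, and converting existing gp2 to gp3 is an online change for most engines.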
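The gp2 trap is easiest to see with numbers. This is a minimal sketch for a hypothetical 500 GB database that needs 9,000 IOPS, using gp2's 3 IOPS per GB baseline. It ignores burst credits and per-volume limits, so treat it as an illustration rather than a sizing tool.

```python
import math


def gp2_gb_for_iops(target_iops: int) -> int:
    """gp2 baseline is 3 IOPS per GB, so capacity is the only lever for IOPS."""
    return math.ceil(target_iops / 3)


def storage_needed(data_gb: int, target_iops: int) -> dict[str, int]:
    # Simplification: ignores gp2 burst credits and per-volume IOPS ceilings.
    # On gp3, IOPS above the included baseline are bought separately,
    # so storage only has to hold the data.
    return {
        "gp2": max(data_gb, gp2_gb_for_iops(target_iops)),
        "gp3": data_gb,
    }


print(storage_needed(data_gb=500, target_iops=9000))  # {'gp2': 3000, 'gp3': 500}
```

On gp2, the only way to reach 9,000 baseline IOPS is 3,000 GB of storage. On gp3 you keep 500 GB and pay for the IOPS directly.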
6. Provisioned IOPS (io1/io2) you don't need
Teams move to Provisioned IOPS when they hit a performance issue, then never re-evaluate. If your IOPS utilisation sits below 50% of provisioned, lower the provisioned IOPS. If it sits below 30%, switch back to gp3 with explicit IOPS provisioning, which usually costs about half as much.
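The 50% / 30% rule can be written as a small decision function. This is a minimal sketch, assuming utilisation is measured against the 14-day peak of ReadIOPS plus WriteIOPS from CloudWatch. The function name and returned labels are hypothetical.

```python
def piops_action(provisioned_iops: int, peak_used_iops: float) -> str:
    """Apply the 50% / 30% utilisation rule of thumb.
    peak_used_iops is assumed to be the 14-day peak of ReadIOPS + WriteIOPS."""
    utilisation = peak_used_iops / provisioned_iops
    if utilisation < 0.30:
        return "switch to gp3 with explicit IOPS"
    if utilisation < 0.50:
        return "lower provisioned IOPS"
    return "keep as is"


print(piops_action(30_000, 2_000))  # switch to gp3 with explicit IOPS
print(piops_action(10_000, 4_000))  # lower provisioned IOPS
print(piops_action(10_000, 8_000))  # keep as is
```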
7. Reserved Instances or Savings Plans not bought
After right-sizing, a 1-year all-upfront RDS RI/SP for steady-state production takes 30-40% off on-demand and breaks even after roughly 7-8 months. The only RDS workloads I wouldn't reserve: short-lived dev, pre-launch products, anything you're about to migrate. The Savings Plans vs Reserved Instances guide covers the exact decision tree for which type to buy.
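The break-even point for an all-upfront reservation depends only on the discount. This worked sketch assumes a 12-month term and an instance that would otherwise run on-demand all year:

```python
def breakeven_months(discount: float, term_months: int = 12) -> float:
    """Months of on-demand spend that equal an all-upfront reservation.
    Upfront cost = term_months * on_demand_monthly * (1 - discount),
    so break-even is term_months * (1 - discount), whatever the hourly price."""
    return term_months * (1 - discount)


for d in (0.30, 0.40):
    print(f"{d:.0%} discount -> break-even after {breakeven_months(d):.1f} months")
# 30% discount -> break-even after 8.4 months
# 40% discount -> break-even after 7.2 months
```

Every month the instance keeps running after break-even is pure saving. That's why the list of workloads not to reserve is all about things that might disappear before then.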
Aurora-specific moves
If you're on Aurora rather than vanilla RDS:
- Serverless v2 for spiky or low-utilisation databases: scales down to 0.5 ACU (or to 0 ACU with automatic pause on engine versions that support it); great for staging and side projects. Watch the upper bound: set it explicitly so you don't pay for a runaway query.
- I/O-Optimised is cheaper than Standard once I/O charges exceed about 25% of your monthly Aurora spend. The switch is a single storage-type change on the cluster.
- Replicas charged per hour — drop replicas you spun up "for safety" but never query.
- Backtrack (Aurora MySQL only) is billed; if you don't use it, set the window to 0 hours.
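The 25% rule for I/O-Optimised is simple enough to script across clusters. This is a minimal sketch, assuming you already have per-cluster I/O and total Aurora cost (for example from Cost Explorer). The boto3 call in the comments uses real modify_db_cluster parameters, but the cluster name and capacity values are placeholders.

```python
def io_optimized_candidate(io_cost: float, total_cost: float, threshold: float = 0.25) -> bool:
    """True when I/O charges exceed `threshold` of the cluster's Aurora spend."""
    return total_cost > 0 and io_cost / total_cost > threshold


# If a cluster qualifies, the switch is a storage-type change, e.g.:
#   boto3.client("rds").modify_db_cluster(
#       DBClusterIdentifier="my-cluster", StorageType="aurora-iopt1", ApplyImmediately=True)
# Serverless v2 bounds are set through the same call with
#   ServerlessV2ScalingConfiguration={"MinCapacity": 0.5, "MaxCapacity": 8}  # illustrative values

print(io_optimized_candidate(io_cost=3_000, total_cost=10_000))  # True
print(io_optimized_candidate(io_cost=1_000, total_cost=10_000))  # False
```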
What I check in an actual audit
- Per-instance CPU, memory, IOPS, connections (14d)
- Multi-AZ flag on every non-prod instance
- Manual snapshots older than 365 days
- Storage type (gp2 vs gp3)
- Reservation coverage (Cost Explorer → Reservations)
- Aurora cluster mode and serverless config
- Top queries by total time (pg_stat_statements): sometimes a single bad query forces a too-big instance
Realistic numbers
On a recent client's bill (~$22k/month RDS):
- Right-sizing 18 instances: $5,200/month
- Multi-AZ off on 11 non-prod: $2,800/month
- gp2 → gp3: $900/month
- 1-year RI on the new floor: $3,400/month
Total saving: $12,300/month, about 56% of the original RDS bill. Implementation took two weeks of part-time work.
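Here is a quick check of the arithmetic, using only the figures above:

```python
bill_before = 22_000  # approximate monthly RDS bill from the example
savings = {
    "right-sizing 18 instances": 5_200,
    "Multi-AZ off on 11 non-prod": 2_800,
    "gp2 -> gp3": 900,
    "1-year RI on the new floor": 3_400,
}
total = sum(savings.values())
print(total, f"{total / bill_before:.0%}")  # 12300 56%
```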
If you want me to find the same in your bill on a pay-for-savings basis, book a call.