Cloud Cost Optimisation in 30 Days: A Realistic Plan
30 days is enough to cut a typical cloud bill by 30-50% if you sequence the work correctly. Here's the day-by-day plan I run on every engagement.
By Andrii Votiakov
Treated as an open-ended project, cost optimisation never ends. Treated as a 30-day sprint with a defined sequence, it usually delivers 30-50% savings without much disruption. Here's the playbook.
Quick answer
Days 1-7: visibility and quick wins (kill zombies, set retention, fix obvious waste). Days 8-14: right-size compute and storage. Days 15-21: lock in commitment discounts and migrate where it pays. Days 22-30: guardrails to keep it cheap. By day 30: 30-50% off the bill, sustainable, with monitoring in place.
Days 1-3: Get the picture
Goal: understand where the money goes.
- Pull a 90-day Cost Explorer / Billing report grouped by service, then by linked account, then by tag
- Identify the top 5 services by spend (usually compute, database, network, storage, observability)
- For each top service, identify the top 5 resources (e.g., the 5 biggest EC2 instances, 5 biggest RDS, 5 biggest S3 buckets)
- Note any "AWS Cost" / "Other" / "Tax" line items — they'll often hide things like data transfer
You should be able to fit the result on a single page. If you can't, you don't have visibility yet.
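If you'd rather script the pull than click through the console, here's a minimal boto3 sketch of the service-level view. The date window is a placeholder; swap the GroupBy key to LINKED_ACCOUNT or a cost allocation tag for the other two groupings.

```python
import boto3

# Cost Explorer's API is served from us-east-1 regardless of where you run.
ce = boto3.client("ce", region_name="us-east-1")

resp = ce.get_cost_and_usage(
    TimePeriod={"Start": "2024-01-01", "End": "2024-04-01"},  # placeholder 90-day window
    Granularity="MONTHLY",
    Metrics=["UnblendedCost"],
    GroupBy=[{"Type": "DIMENSION", "Key": "SERVICE"}],
)

for month in resp["ResultsByTime"]:
    top = sorted(
        month["Groups"],
        key=lambda g: float(g["Metrics"]["UnblendedCost"]["Amount"]),
        reverse=True,
    )[:5]  # top 5 services per month
    print(month["TimePeriod"]["Start"])
    for g in top:
        amount = float(g["Metrics"]["UnblendedCost"]["Amount"])
        print(f"  {g['Keys'][0]}: ${amount:,.0f}")
```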
Days 4-7: Quick wins
Goal: free money. No risk, no architecture changes.
- Delete unattached EBS volumes and old snapshots
- Delete unused Elastic IPs / static IPs / load balancers
- Set retention on log groups (CloudWatch / Stackdriver / Azure Monitor)
- Stop dev/staging instances overnight and on weekends (Instance Scheduler or a 5-line Lambda; a sketch follows below)
- Delete idle NAT Gateways in test accounts and orphan TGW attachments
- Review Cost Anomaly Detection findings — usually there's a 2-month-old anomaly nobody actioned
This usually takes 5-10% off the monthly bill in 4 days.
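The "5-line Lambda" for the scheduler really is about that size. A sketch, assuming dev/staging instances carry an `env` tag (the tag values are whatever your convention is) and an EventBridge cron fires it every evening:

```python
import boto3

ec2 = boto3.client("ec2")

def handler(event, context):
    # Find running instances tagged as non-production (tag key/values are an assumption).
    resp = ec2.describe_instances(
        Filters=[
            {"Name": "tag:env", "Values": ["dev", "staging"]},
            {"Name": "instance-state-name", "Values": ["running"]},
        ]
    )
    ids = [i["InstanceId"] for r in resp["Reservations"] for i in r["Instances"]]
    if ids:
        ec2.stop_instances(InstanceIds=ids)
```

A second schedule calling `start_instances` with the same filter brings them back in the morning.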
Days 8-10: Right-size compute
Goal: pay for what you actually use.
- Pull 14 days of CPU + memory metrics for every compute resource
- Apply right-sizing rules (see the EC2 post)
- For containers: audit Kubernetes requests/limits, tighten where actual usage is far below request
- Roll changes one workload at a time, monitor for issues
Typical saving: another 15-25% off compute.
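The metric pull is scriptable too. A sketch for one instance's CPU; the instance ID is a placeholder, and memory needs the CloudWatch agent installed (its metrics land under the CWAgent namespace, not AWS/EC2):

```python
import boto3
from datetime import datetime, timedelta, timezone

cw = boto3.client("cloudwatch")
end = datetime.now(timezone.utc)

resp = cw.get_metric_statistics(
    Namespace="AWS/EC2",
    MetricName="CPUUtilization",
    Dimensions=[{"Name": "InstanceId", "Value": "i-0123456789abcdef0"}],  # placeholder
    StartTime=end - timedelta(days=14),
    EndTime=end,
    Period=3600,  # hourly datapoints, 336 over 14 days
    Statistics=["Average", "Maximum"],
)

# p95 of the hourly maxima is a conservative right-sizing signal.
maxima = sorted(p["Maximum"] for p in resp["Datapoints"])
if maxima:
    p95 = maxima[max(0, int(len(maxima) * 0.95) - 1)]
    print(f"p95 of hourly max CPU over 14 days: {p95:.1f}%")
```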
Days 11-14: Right-size storage and observability
- S3 / Cloud Storage / Blob lifecycle rules (see the S3 post)
- Database storage type (gp3 over gp2, lower provisioned IOPS where overprovisioned)
- CloudWatch / Datadog cardinality cleanup
- Log volume reduction (drop DEBUG, drop health checks, sample 200s)
Saves another 10-15% on the monitoring/storage tail.
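For the S3 piece, a lifecycle policy is one API call per bucket. A sketch with a placeholder bucket, prefix, and tiering ages; tune the day thresholds to your actual access patterns rather than copying these:

```python
import boto3

s3 = boto3.client("s3")

# Placeholder bucket/prefix; transition ages are a starting point, not a rule.
s3.put_bucket_lifecycle_configuration(
    Bucket="example-logs-bucket",
    LifecycleConfiguration={
        "Rules": [
            {
                "ID": "tier-and-expire-logs",
                "Status": "Enabled",
                "Filter": {"Prefix": "logs/"},
                "Transitions": [
                    {"Days": 30, "StorageClass": "STANDARD_IA"},
                    {"Days": 90, "StorageClass": "GLACIER"},
                ],
                "Expiration": {"Days": 365},
            }
        ]
    },
)
```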
Days 15-18: Migrate where it pays
These are bigger changes; only do them if confidence is high.
- Graviton / ARM migration for compatible workloads (see the Graviton post)
- VPC endpoints for S3, DynamoDB, ECR, Logs to cut NAT processing
- CloudFront / CDN in front of any high-egress workload
- Spot instances for stateless tier (with proper interruption handling)
Each of these takes days, not hours, but the ROI is high and lasting.
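The VPC endpoint change is the smallest of the four in code terms. Gateway endpoints for S3 and DynamoDB are free; ECR and Logs need Interface endpoints, which carry an hourly charge but usually undercut NAT processing at volume. A sketch with placeholder VPC and route table IDs:

```python
import boto3

ec2 = boto3.client("ec2")

# Gateway endpoint for S3: S3 traffic stops transiting the NAT Gateway entirely.
# VPC ID, region in the service name, and route table IDs are placeholders.
ec2.create_vpc_endpoint(
    VpcId="vpc-0123456789abcdef0",
    ServiceName="com.amazonaws.eu-west-1.s3",
    VpcEndpointType="Gateway",
    RouteTableIds=["rtb-0123456789abcdef0"],
)
```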
Days 19-21: Commitment discounts
Goal: lock in the savings.
- Buy 1-year Compute Savings Plans / CUDs on the new floor (after right-sizing!)
- Buy RDS / Cloud SQL / Azure SQL reservations on stable databases
- Don't 3-year-commit unless usage is predictably stable for 3 years
Saves another 25-40% on the steady-state portion that's commitment-eligible.
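Rather than sizing the commitment yourself, pull AWS's own recommendation once the right-sized usage has settled. A sketch; the seven-day lookback is deliberate, so the recommendation reflects the post-right-sizing floor rather than the old usage:

```python
import boto3

ce = boto3.client("ce", region_name="us-east-1")

resp = ce.get_savings_plans_purchase_recommendation(
    SavingsPlansType="COMPUTE_SP",
    TermInYears="ONE_YEAR",
    PaymentOption="NO_UPFRONT",
    LookbackPeriodInDays="SEVEN_DAYS",  # only look at post-right-sizing usage
)

rec = resp["SavingsPlansPurchaseRecommendation"]
for d in rec.get("SavingsPlansPurchaseRecommendationDetails", []):
    print(
        f"commit ${d['HourlyCommitmentToPurchase']}/hr, "
        f"est. ${d['EstimatedMonthlySavingsAmount']}/month saved"
    )
```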
Days 22-25: Tagging and chargeback
- Tag every resource with `team`, `service`, `env` (mandatory)
- Set up a weekly cost-by-team report
- Identify the team with the biggest "untagged" line — usually 20-40% of spend
- Run an hour-long session per team showing them their costs
Doesn't directly save money on day one, but shifts behaviour for the next quarter.
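Finding the untagged spend is mechanical. A sketch using the Resource Groups Tagging API to list everything missing one of the three mandatory keys; it only sees resource types that support the tagging API, but that covers the bulk of spend:

```python
import boto3

tagging = boto3.client("resourcegroupstaggingapi")
required = {"team", "service", "env"}  # the mandatory keys from above

for page in tagging.get_paginator("get_resources").paginate():
    for res in page["ResourceTagMappingList"]:
        missing = required - {t["Key"] for t in res.get("Tags", [])}
        if missing:
            print(f"{res['ResourceARN']} missing: {', '.join(sorted(missing))}")
```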
Days 26-28: Guardrails
Goal: stop bills from growing back.
- Budgets with alerts at 50/80/100% per linked account
- Anomaly detection turned on
- Resource monitor (Snowflake) / per-user query limit (BigQuery)
- IAM policy restricting expensive instance types in non-prod
- Reserved/RI/SP coverage report scheduled weekly
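The budget alerts are one call per linked account. A sketch with placeholder account ID, limit, and recipient address:

```python
import boto3

budgets = boto3.client("budgets")
account_id = "123456789012"  # placeholder linked account

# One ACTUAL-spend notification per threshold, all to the same address.
thresholds = [
    {
        "Notification": {
            "NotificationType": "ACTUAL",
            "ComparisonOperator": "GREATER_THAN",
            "Threshold": pct,
            "ThresholdType": "PERCENTAGE",
        },
        "Subscribers": [{"SubscriptionType": "EMAIL", "Address": "finops@example.com"}],
    }
    for pct in (50, 80, 100)
]

budgets.create_budget(
    AccountId=account_id,
    Budget={
        "BudgetName": "monthly-cost-guardrail",
        "BudgetLimit": {"Amount": "25000", "Unit": "USD"},  # placeholder limit
        "TimeUnit": "MONTHLY",
        "BudgetType": "COST",
    },
    NotificationsWithSubscribers=thresholds,
)
```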
Days 29-30: Document and hand off
- Write down what was changed, why, and the rollback path
- Create a "monthly review" checklist for ongoing maintenance
- Schedule the next review in 90 days
- Hand off the dashboards (cost-by-tag, RI/SP coverage, anomalies) to the team
What this looks like on a real bill
A recent client, starting at ~$42k/month:
| Phase | Saved | Cumulative |
|---|---|---|
| Days 1-7: zombies + retention | $2,400 | $2,400 |
| Days 8-14: right-sizing | $7,800 | $10,200 |
| Days 15-18: Graviton + VPC endpoints | $4,200 | $14,400 |
| Days 19-21: 1yr SPs | $5,100 | $19,500 |
| Days 22-30: tagging + guardrails | $0 (preventive) | $19,500 |
Final: $22,500/month, ~46% reduction in 30 calendar days.
What goes wrong
Common pitfalls:
- Trying to do everything in week 1. The sequence matters — don't buy commitment discounts before right-sizing.
- No tagging before chargeback. You can't charge back what you can't measure.
- Skipping guardrails. Without budgets and anomaly detection, you'll be back here in 6 months.
- Trying to right-size everything yourself. Use Compute Optimizer / Recommender first, challenge the recommendations, but don't reinvent the math.
- Forgetting Datadog / observability. It's often 15-25% of total cloud spend and gets ignored.
If you'd like me to run this 30-day plan on your account on a pay-for-savings basis, book a call. 30-minute conversation, no pitch.