7 Patterns of Cloud Waste I See in Every Audit
After 100+ audits, the same waste shows up everywhere. Here's the pattern catalogue — what to look for, where it hides, and how much it usually costs.
By Andrii Votiakov
Cloud architectures vary. Cloud waste is depressingly consistent. The same seven patterns show up on AWS, GCP, and Azure in different vocabulary but the same shape. If you can spot these in your own bill, you can usually recover 30-50% on your own before calling anyone in.
Quick answer
The seven patterns I see on every audit: forgotten experiments (idle resources from one-off tests), production-tier non-prod (Multi-AZ and full monitoring on dev), data-transfer black holes (cross-AZ chatter and NAT processing), over-provisioned compute (instances sized at launch and never revisited), storage accumulation (snapshots and logs with no expiry), observability cardinality bloat (high-cardinality metrics and debug logs), and no commitment discounts (on-demand pricing on steady-state workloads). Together, these typically account for 40-60% of the bill.
1. The forgotten experiment
Pattern: An engineer spun up a Spark cluster / GPU instance / managed service for a one-off experiment. They got what they needed and moved on. The resource is still running.
Where it hides:
- Idle EMR/Dataproc clusters
- Stopped-but-not-terminated instances (still billed for storage)
- Old SageMaker notebooks
- Dataproc clusters created via UI without auto-delete
- Firebase / Vercel projects with paid tiers nobody uses
Cost: $200-5,000/month per forgotten experiment. Multiply by company size.
Find it: Sort all resources by creation_date descending and uptime. Anything over 90 days old that hasn't been touched and has no owner tag needs a reason to exist.
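The sort-and-filter above is a one-liner once you have an inventory export. A minimal sketch, assuming records shaped like a cloud-asset dump with illustrative field names (`created`, `last_activity`, `tags`):

```python
from datetime import datetime, timedelta, timezone

# Hypothetical inventory records; field names are illustrative, not a real API.
resources = [
    {"id": "emr-a1", "created": "2023-01-10", "tags": {}},
    {"id": "ec2-b2", "created": "2025-06-01", "tags": {"owner": "data-eng"}},
]

def stale_candidates(resources, now=None, max_age_days=90):
    """Resources older than max_age_days with no owner tag: review or kill."""
    now = now or datetime.now(timezone.utc)
    cutoff = now - timedelta(days=max_age_days)
    return [
        r["id"]
        for r in resources
        if datetime.fromisoformat(r["created"]).replace(tzinfo=timezone.utc) < cutoff
        and "owner" not in r["tags"]
    ]

print(stale_candidates(resources, now=datetime(2025, 10, 1, tzinfo=timezone.utc)))
# → ['emr-a1']
```

The owned-but-old instance survives the filter; the untagged experiment doesn't. That's the review queue.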
2. Production-tier setup on dev/staging
Pattern: Multi-AZ RDS, full-tier monitoring agents, large instances, full retention. On dev and staging.
Where it hides:
- RDS Multi-AZ on staging
- Full-tier Datadog monitoring on dev hosts
- Large CloudWatch retention on dev log groups
- Read replicas attached to non-prod databases
- 24/7 always-on dev environments
Cost: 30-50% of non-prod spend. Often 10-15% of total cloud spend.
Find it: Filter by Environment tag (or instance-name pattern), then ask: "If this dies, would I notice in the next 4 hours?" If not, you're over-spending.
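To size the prize per resource, two rough rules cover most of it: Multi-AZ roughly doubles the instance cost, and an always-on dev box could run business hours only. A back-of-envelope sketch (the 30% discount-style numbers here are assumptions, not quotes):

```python
def nonprod_savings(monthly_cost, multi_az=True, always_on=True,
                    business_hours_fraction=60 / 168):
    """Rough monthly savings for one non-prod resource, assuming
    Multi-AZ ~doubles instance cost and a scheduled dev box runs
    ~60 of 168 hours a week (12h x 5 days)."""
    cost = monthly_cost
    if multi_az:
        cost /= 2                        # drop the standby replica
    if always_on:
        cost *= business_hours_fraction  # start/stop on a schedule
    return round(monthly_cost - cost, 2)

# A $600/month Multi-AZ staging database running 24/7:
print(nonprod_savings(600))
# → 492.86
```

Roughly 80% of that staging database's cost goes away with two config changes, which is why non-prod is usually the fastest win on the list.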
3. The data-transfer black hole
Pattern: Microservices chatting cross-AZ or cross-region without anyone noticing. NAT Gateway processing a small ocean. CDN bypassed in favour of direct egress.
Where it hides:
- Lambda calling RDS in another AZ
- EKS pods pulling images cross-AZ from public ECR
- Service A calling Service B 10K times/sec at 2KB each across the AZ boundary
- Cloud Storage in us-central1 read by Compute Engine in europe-west1
Cost: $1-50k/month depending on scale. Routinely 5-15% of bill.
Find it: Cost Explorer → Usage Type → filter for DataTransfer*. Cloud Logging → BillingExportSourceMetric. VPC Flow Logs analysed by Athena/BigQuery.
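The 10K req/s example above is worth pricing out, because the per-request numbers look harmless. A sketch assuming the common AWS cross-AZ rate of $0.01/GB charged in each direction ($0.02/GB total; verify against your region's pricing):

```python
def cross_az_monthly_cost(requests_per_sec, payload_bytes,
                          rate_per_gb=0.02, seconds_per_month=2_592_000):
    """Back-of-envelope cross-AZ bill for one chatty service pair.
    rate_per_gb assumes $0.01/GB in each direction -- an assumption
    based on typical AWS cross-AZ pricing, not a quote."""
    gb_per_month = requests_per_sec * payload_bytes * seconds_per_month / 1e9
    return round(gb_per_month * rate_per_gb, 2)

# Service A calling Service B 10K times/sec at 2KB each:
print(cross_az_monthly_cost(10_000, 2_000))
# → 1036.8
```

About $1,000/month for one service pair at modest payload sizes, which is how a mesh of a dozen such pairs quietly becomes a five-figure line item.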
4. Over-provisioned compute
Pattern: Instances sized at launch under uncertainty, never revisited. Average CPU below 20%, memory below 50%.
Where it hides:
- EC2 fleet with 200 m5.xlarge running at 8% CPU
- RDS db.r6i.4xlarge with buffer pool barely warm
- ECS tasks with 4 vCPU requested, using 0.5
- Kubernetes deployments with 2GB memory requested, using 200MB
- Compute Engine n2-standard-32 doing the work of an n2-standard-4
Cost: 30-60% of compute spend.
Find it: 14-day metrics analysis. Tools: Compute Optimizer (AWS), Recommender (GCP), Azure Advisor, kubectl top + Goldilocks/KRR (Kubernetes).
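The 14-day analysis reduces to one question per instance: how many sizes down can you go before p95 utilisation gets uncomfortable? A simplified sketch that assumes halving the instance doubles utilisation (true enough for CPU-bound steady workloads, wrong for bursty ones -- treat it as a first cut, then let the vendor tools confirm):

```python
def downsize_steps(cpu_samples, target_p95=0.60):
    """Instance sizes you can halve while projected p95 CPU stays
    under target. Assumes each step down doubles utilisation --
    a simplification for steady workloads, not a guarantee."""
    samples = sorted(cpu_samples)
    p95 = samples[int(0.95 * (len(samples) - 1))]
    steps = 0
    while p95 * 2 <= target_p95:
        p95 *= 2
        steps += 1
    return steps

# 14 days of hourly CPU readings hovering around 8%:
cpu = [0.06, 0.07, 0.08, 0.08, 0.09, 0.10, 0.08, 0.07] * 42
print(downsize_steps(cpu))
# → 2
```

Two steps down on an m5.xlarge fleet means m5.large territory at a quarter of the cost, which is why fleets "sized at launch" carry so much slack.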
5. Storage forever
Pattern: Snapshots, backups, logs, old objects with no expiry policy. Accumulating quietly.
Where it hides:
- Manual RDS snapshots from 2021
- Untagged S3 buckets with versioning on, no lifecycle
- CloudWatch logs with "Never expire"
- Failed multipart uploads still billed
- Old AMIs / images / container registry tags
- Redis snapshots in ElastiCache backups
Cost: $200-10,000/month depending on age and size.
Find it: AWS Storage Lens, GCP Storage Insights, Azure Storage Explorer + Lifecycle. List the largest buckets/volumes, check what's older than 90 days.
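Once you've listed the largest volumes, pricing the over-90-days tail is mechanical. A sketch assuming roughly $0.05/GB-month (in the neighbourhood of EBS snapshot list pricing; substitute your actual rate):

```python
from datetime import date

def snapshot_waste(snapshots, today, max_age_days=90, price_per_gb=0.05):
    """Monthly cost of snapshots older than max_age_days.
    price_per_gb is an assumed ~$0.05/GB-month; check your bill."""
    old = [s for s in snapshots if (today - s["created"]).days > max_age_days]
    return round(sum(s["size_gb"] for s in old) * price_per_gb, 2)

snaps = [
    {"id": "snap-2021", "created": date(2021, 3, 1), "size_gb": 500},
    {"id": "snap-new", "created": date(2025, 9, 15), "size_gb": 200},
]
print(snapshot_waste(snaps, today=date(2025, 10, 1)))
# → 25.0
```

$25/month sounds trivial until you remember it's per snapshot, compounding monthly, across every team since 2021. The fix is a lifecycle policy, not a one-off cleanup.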
6. Observability cardinality bloat
Pattern: Custom metrics tagged with high-cardinality dimensions (user_id, request_id, path), exploding the unique time-series count. Logs at DEBUG level in production.
Where it hides:
- Datadog Distinct Metrics over 100k for a single metric
- CloudWatch custom metrics paid per active series
- Prometheus metrics like up{...} with too many labels
- Console logging full request/response payloads
Cost: 30-70% of observability bill. Often 5-15% of total cloud spend.
Find it: Datadog → Usage → Top metrics by distinct count. CloudWatch → metric streams → top emitters. App-level: any log line over 5KB that ships to centralised logging.
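The reason one bad label sinks the bill: the worst-case series count for a metric is the product of its labels' distinct-value counts, so a single high-cardinality dimension multiplies everything else. A sketch with made-up but realistic cardinalities:

```python
from math import prod

def series_estimate(label_cardinalities):
    """Worst-case unique time-series count for one metric: the product
    of each label's distinct-value count. Real counts run lower (not
    every combination occurs), but the product shows the blast radius."""
    return prod(label_cardinalities.values())

# A request-latency metric with sane labels:
print(series_estimate({"service": 40, "endpoint": 120, "status": 5}))
# → 24000
# The same metric after someone adds user_id:
print(series_estimate({"service": 40, "endpoint": 120, "status": 5,
                       "user_id": 50_000}))
# → 1200000000
```

24K series to 1.2 billion from one tag. That's the whole pattern: the fix is dropping the dimension, not negotiating the contract.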
7. Commitment-discount black hole
Pattern: No Savings Plans, no Reserved Instances, no Committed Use Discounts. Pure on-demand for the entire steady-state baseline.
Where it hides:
- Companies that grew fast and never had time to set up commitment discounts
- Teams that bought a 3-year SP at the wrong shape and now overspend on what's not covered
- RDS instances running 24/7 forever without a single RI
- Serverless workloads at high volume with no Compute Savings Plan covering Lambda
Cost: 25-40% of compute and managed services spend, foregone.
Find it: Cost Explorer → Reservations → Coverage. Check the percentage of usage covered by reservations or savings plans. Anything below 60% on stable workloads is leaving money on the table.
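The coverage number converts straight into foregone dollars. A sketch assuming ~30% off on-demand, roughly what a 1-year no-upfront Compute Savings Plan yields (the exact discount varies by service, term, and payment option -- verify before committing):

```python
def foregone_savings(monthly_on_demand, coverage_pct, discount=0.30):
    """Money left on the table: the uncovered share of a steady-state
    baseline times an assumed commitment discount. discount=0.30 is
    an assumption in the 1yr no-upfront range, not a quoted rate."""
    uncovered = monthly_on_demand * (1 - coverage_pct / 100)
    return round(uncovered * discount, 2)

# $60k/month of steady-state compute at 0% coverage:
print(foregone_savings(60_000, 0))
# → 18000.0
# The same baseline at 60% coverage:
print(foregone_savings(60_000, 60))
# → 7200.0
```

Note the asymmetry: this only pays off on the stable baseline. Commit to the floor of your usage, not the average, and the shape-mismatch failure mode from the list above mostly disappears.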
How to use this list
Pick the three that look loudest in your environment. Spend a day on each. You'll usually surface 80% of the savings without needing fancy tools. If you want a sequenced action plan once you've identified the waste, the 30-day cloud cost optimisation plan breaks it into daily work. For compute specifically, the EC2 right-sizing 14-day method gives a step-by-step procedure. And once you've right-sized, the Savings Plans vs Reserved Instances guide tells you what to commit to and when.
If you find yourself unsure or short on time, that's exactly when an outside audit pays for itself — fresh eyes catch what familiar ones miss.
Real engagements
The patterns aren't theoretical. Recent finds across audits:
- Forgotten experiment: $4,200/month EMR cluster running 14 months
- Prod-tier non-prod: $9,800/month from over-monitored dev environments
- Data-transfer: $11,400/month from one chatty service pair
- Compute over-provisioning: $14,200/month from a 220-instance fleet
- Storage forever: $2,800/month in old RDS snapshots
- Observability cardinality: $7,600/month from one metric
- Commitment gap: $18,000/month foregone from no Savings Plans
That's a range, not a single client. But every one of those numbers is real.
Want me to find the same in your account on a pay-for-savings basis? Book a call.