7 Patterns of Cloud Waste I See in Every Audit

After 100+ audits, the same waste shows up everywhere. Here's the pattern catalogue — what to look for, where it hides, and how much it usually costs.

By Andrii Votiakov on 2026-04-22

Cloud architectures vary. Cloud waste is depressingly consistent. The same seven patterns show up on AWS, GCP, and Azure in different vocabulary but the same shape. If you can spot these in your own bill, you can usually self-recover 30-50% before calling anyone in.

Quick answer

The seven patterns I see on every audit: forgotten experiments (idle resources from one-off tests), production-tier non-prod (Multi-AZ and full monitoring on dev), data-transfer black holes (cross-AZ chatter and NAT processing), over-provisioned compute (instances sized at launch and never revisited), storage accumulation (snapshots and logs with no expiry), observability cardinality bloat (high-cardinality metrics and debug logs), and no commitment discounts (on-demand pricing on steady-state workloads). Together, these typically account for 40-60% of the bill.

1. The forgotten experiment

Pattern: An engineer spun up a Spark cluster / GPU instance / managed service for a one-off experiment. They got what they needed and moved on. The resource is still running.

Where it hides:

  • Idle EMR/Dataproc clusters
  • Stopped-but-not-terminated instances (still billed for storage)
  • Old SageMaker notebooks
  • Dataproc clusters created via UI without auto-delete
  • Firebase / Vercel projects with paid tiers nobody uses

Cost: $200-5,000/month per forgotten experiment. Multiply by company size.

Find it: Sort all resources by creation date, oldest first, and check uptime. Anything over 90 days old that hasn't been touched and has no owner tag needs a reason to exist.
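
The 90-day rule above can be sketched as a filter over an inventory export. This is a minimal illustration, not a specific tool's API: the `Resource` fields, the `owner` tag key, and the reading of "hasn't been touched" as "no activity inside the same 90-day window" are all assumptions.

```python
from dataclasses import dataclass, field
from datetime import datetime, timedelta, timezone


@dataclass
class Resource:
    # Hypothetical inventory record; field names are illustrative.
    resource_id: str
    created_at: datetime
    last_activity_at: datetime
    tags: dict = field(default_factory=dict)


def find_forgotten(resources, now, max_age_days=90, owner_tag="owner"):
    """Return resources older than the cutoff, idle since before it, with no owner tag."""
    cutoff = now - timedelta(days=max_age_days)
    suspects = [
        r for r in resources
        if r.created_at < cutoff
        and r.last_activity_at < cutoff
        and not r.tags.get(owner_tag)
    ]
    # Oldest first, so the longest-running leftovers are reviewed first.
    return sorted(suspects, key=lambda r: r.created_at)


now = datetime(2026, 4, 22, tzinfo=timezone.utc)
inventory = [
    Resource("emr-spark-test", now - timedelta(days=420), now - timedelta(days=400)),
    Resource("api-prod", now - timedelta(days=700), now - timedelta(days=1), {"owner": "platform"}),
    Resource("gpu-notebook", now - timedelta(days=30), now - timedelta(days=29)),
]

for r in find_forgotten(inventory, now):
    print(r.resource_id, (now - r.created_at).days, "days old")
```

Only `emr-spark-test` is flagged here: it is old, idle, and unowned. The other two are either owned and active or too young to judge.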

2. Production-tier setup on dev/staging

Pattern: Multi-AZ RDS, full-tier monitoring agents, large instances, full retention. On dev and staging.

Where it hides:

  • RDS Multi-AZ on staging
  • Full-tier Datadog monitoring on dev hosts
  • Large CloudWatch retention on dev log groups
  • Read replicas attached to non-prod databases
  • 24/7 always-on dev environments

Cost: 30-50% of non-prod spend. Often 10-15% of total cloud spend.

Find it: Filter by Environment tag (or instance name pattern). For each resource, ask: "If this dies, would I notice in the next 4 hours?" If not, you're overspending.
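
For the 24/7 always-on dev environments in the list above, a rough estimate of what a schedule would save looks like this. The 12-hours-a-day, 5-days-a-week schedule and the $1,000 starting cost are assumptions for illustration, and the arithmetic only applies to hourly-billed compute, since attached storage keeps billing while instances are stopped.

```python
HOURS_PER_WEEK = 24 * 7  # 168


def scheduled_fraction(hours_per_day, days_per_week):
    """Share of the week an environment runs under a given schedule."""
    return (hours_per_day * days_per_week) / HOURS_PER_WEEK


def monthly_saving(always_on_cost, hours_per_day=12, days_per_week=5):
    """Estimated saving on hourly-billed compute versus running 24/7."""
    return always_on_cost * (1 - scheduled_fraction(hours_per_day, days_per_week))


# Assumed schedule: 12 hours a day, weekdays only.
print(round(scheduled_fraction(12, 5), 3))  # 0.357
print(round(monthly_saving(1000.0), 2))     # 642.86
```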

3. The data-transfer black hole

Pattern: Microservices chatting cross-AZ or cross-region without anyone noticing. NAT Gateway processing a small ocean. CDN bypassed in favour of direct egress.

Where it hides:

  • Lambda calling RDS in another AZ
  • EKS pods pulling images cross-AZ from public ECR
  • Service A calling Service B 10K times/sec at 2KB each across the AZ boundary
  • Cloud Storage in us-central1 read by Compute Engine in europe-west1

Cost: $1,000-50,000/month depending on scale. Routinely 5-15% of the bill.
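
To see how the chatty-service case above (10K requests/sec at 2KB each across the AZ boundary) turns into a bill, here is a back-of-the-envelope sketch. The rate of $0.01/GB charged in each direction is an assumption to check against current pricing for your region, as are the 30-day month and decimal units. Only request payloads are counted, so responses would add to the total.

```python
def cross_az_monthly_cost(requests_per_sec, payload_kb, usd_per_gb_each_way=0.01, days=30):
    """Return (GB transferred per month, estimated monthly cost in USD)."""
    seconds = days * 24 * 3600
    # KB -> GB using decimal units (1 GB = 1,000,000 KB).
    gb = requests_per_sec * payload_kb * seconds / 1_000_000
    # Assumed: the transfer is billed once on the way out and once on the way in.
    return gb, gb * usd_per_gb_each_way * 2


gb, cost = cross_az_monthly_cost(10_000, 2)
print(f"{gb:,.0f} GB/month, about ${cost:,.2f}/month")
```

Under these assumptions one service pair moves roughly 51,840 GB a month, a little over $1,000 for requests alone. That is the low end of the range above, from a single call path.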

Find it: On AWS, Cost Explorer → Usage Type → filter for DataTransfer*. On GCP, the Cloud Billing export in BigQuery, filtered to network egress SKUs. For per-flow detail, VPC Flow Logs analysed with Athena or BigQuery.

4. Over-provisioned compute

Pattern: Instances sized at launch under uncertainty, never revisited. Average CPU below 20%, memory below 50%.

Where it hides:

  • EC2 fleet with 200 m5.xlarge running at 8% CPU
  • RDS db.r6i.4xlarge with buffer pool barely warm
  • ECS tasks with 4 vCPU requested, using 0.5
  • Kubernetes deployments with 2GB memory requested, using 200MB
  • Compute Engine n2-standard-32 doing the work of an n2-standard-4

Cost: 30-60% of compute spend.

Find it: 14-day metrics analysis. Tools: Compute Optimizer (AWS), Recommender (GCP), Azure Advisor, kubectl top + Goldilocks/KRR (Kubernetes).
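
The thresholds from the pattern (average CPU below 20%, memory below 50%) can be applied to exported utilisation samples with a few lines. This is a sketch under assumptions: the `fleet` structure and instance names are made up, and the samples stand in for 14 days of percentages pulled from whatever monitoring tool you use.

```python
from statistics import mean


def is_oversized(cpu_samples, mem_samples, cpu_threshold=20.0, mem_threshold=50.0):
    """True when both average CPU and average memory sit below the thresholds."""
    return mean(cpu_samples) < cpu_threshold and mean(mem_samples) < mem_threshold


# Hypothetical utilisation percentages per instance.
fleet = {
    "web-1": {"cpu": [8, 7, 9, 8], "mem": [30, 32, 31, 29]},
    "db-1": {"cpu": [55, 60, 58, 62], "mem": [70, 72, 71, 69]},
    "batch-1": {"cpu": [12, 15, 10, 11], "mem": [65, 70, 68, 66]},
}

candidates = sorted(name for name, m in fleet.items() if is_oversized(m["cpu"], m["mem"]))
print(candidates)  # ['web-1']
```

Averages hide peaks, so treat the output as a shortlist to review rather than a resize order. `batch-1` is left out here because its memory use is above the threshold even though its CPU is idle.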

5. Storage forever

Pattern: Snapshots, backups, logs, old objects with no expiry policy. Accumulating quietly.

Where it hides:

  • Manual RDS snapshots from 2021
  • Untagged S3 buckets with versioning on, no lifecycle
  • CloudWatch logs with "Never expire"
  • Failed multipart uploads still billed
  • Old AMIs / images / container registry tags
  • Redis snapshots in ElastiCache backups

Cost: $200-10,000/month depending on age and size.

Find it: AWS Storage Lens, GCP Storage Insights, Azure Storage Explorer + Lifecycle. List the largest buckets/volumes, check what's older than 90 days.
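
Two of the items above (failed multipart uploads, and versioned buckets with no lifecycle) are usually closed off with an S3 lifecycle configuration. The sketch below builds one as a plain dictionary. The rule IDs and the 7-day and 90-day windows are assumptions to adapt, not recommendations from the audits.

```python
import json


def build_lifecycle_rules(abort_multipart_days=7, noncurrent_days=90):
    """Build an S3 lifecycle configuration covering the whole bucket."""
    return {
        "Rules": [
            {
                "ID": "abort-stale-multipart-uploads",  # illustrative name
                "Status": "Enabled",
                "Filter": {"Prefix": ""},  # empty prefix = every object
                "AbortIncompleteMultipartUpload": {
                    "DaysAfterInitiation": abort_multipart_days
                },
            },
            {
                "ID": "expire-noncurrent-versions",  # illustrative name
                "Status": "Enabled",
                "Filter": {"Prefix": ""},
                "NoncurrentVersionExpiration": {"NoncurrentDays": noncurrent_days},
            },
        ]
    }


config = build_lifecycle_rules()
print(json.dumps(config, indent=2))

# With boto3 installed and credentials configured, this is the shape passed as
# LifecycleConfiguration to s3.put_bucket_lifecycle_configuration(Bucket=..., ...).
```

Applying a lifecycle configuration replaces whatever the bucket already has, so merge with existing rules before pushing it anywhere that matters.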

6. Observability cardinality bloat

Pattern: Custom metrics tagged with high-cardinality dimensions (user_id, request_id, path), exploding the unique time-series count. Logs at DEBUG level in production.

Where it hides:

  • Datadog Distinct Metrics over 100k for a single metric
  • CloudWatch custom metrics paid per active series
  • Prometheus up{...} with too many labels
  • Console logging full request/response payloads

Cost: 30-70% of observability bill. Often 5-15% of total cloud spend.

Find it: Datadog → Usage → Top metrics by distinct count. CloudWatch → metric streams → top emitters. App-level: any log line over 5KB that ships to centralised logging.
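
The reason one high-cardinality tag hurts so much is multiplication: the worst-case number of time series for a metric is the product of the distinct values of each label. A small sketch, with made-up label names and counts:

```python
from math import prod


def worst_case_series(label_cardinalities):
    """Upper bound on time series: product of distinct values per label."""
    return prod(label_cardinalities.values())


# Hypothetical label sets for one custom metric.
lean = {"service": 20, "status_code": 5, "region": 3}
bloated = {**lean, "user_id": 10_000}

print(worst_case_series(lean))     # 300
print(worst_case_series(bloated))  # 3000000
```

This is an upper bound, since not every combination occurs in practice, but it shows why adding user_id or request_id to an otherwise modest metric can dominate the bill.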

7. Commitment-discount black hole

Pattern: No Savings Plans, no Reserved Instances, no Committed Use Discounts. Pure on-demand for the entire steady-state baseline.

Where it hides:

  • Companies that grew fast and never had time to set up commitment discounts
  • Teams that bought a 3-year Savings Plan in the wrong shape and now pay on-demand rates for whatever it doesn't cover
  • RDS instances running 24/7 forever without a single RI
  • Serverless workloads at high volume with no Compute Savings Plan covering Lambda

Cost: 25-40% of compute and managed-services spend, forgone.

Find it: Cost Explorer → Reservations → Coverage. Check the percentage of usage covered by reservations or savings plans. Anything below 60% on stable workloads is leaving money on the table.
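
The 60% check above is simple enough to run on any coverage export. A minimal sketch, assuming a per-service breakdown of covered versus total eligible spend; the `usage` structure, key names, and figures are illustrative, not the Cost Explorer response format.

```python
def coverage_pct(covered, total):
    """Percentage of eligible spend covered by reservations or savings plans."""
    if total == 0:
        return 0.0
    return 100.0 * covered / total


def under_covered(services, threshold=60.0):
    """Return {service: coverage %} for services below the threshold."""
    result = {}
    for name, spend in services.items():
        pct = coverage_pct(spend["covered_usd"], spend["total_usd"])
        if pct < threshold:
            result[name] = pct
    return result


# Hypothetical monthly figures for steady-state workloads.
usage = {
    "ec2": {"covered_usd": 4200.0, "total_usd": 6000.0},
    "rds": {"covered_usd": 0.0, "total_usd": 2500.0},
    "lambda": {"covered_usd": 300.0, "total_usd": 1200.0},
}

print(under_covered(usage))  # {'rds': 0.0, 'lambda': 25.0}
```

The threshold only makes sense for stable workloads, so filter out spiky or short-lived usage before reading anything into a low number.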

How to use this list

Pick the three that look loudest in your environment. Spend a day on each. You'll usually surface 80% of the savings without needing fancy tools. If you want a sequenced action plan once you've identified the waste, the 30-day cloud cost optimisation plan breaks it into daily work. For compute specifically, the EC2 right-sizing 14-day method gives a step-by-step procedure. And once you've right-sized, the Savings Plans vs Reserved Instances guide tells you what to commit to and when.

If you find yourself unsure or short on time, that's exactly when an outside audit pays for itself — fresh eyes catch what familiar ones miss.

Real engagements

The patterns aren't theoretical. Recent finds across audits:

  • Forgotten experiment: $4,200/month EMR cluster running 14 months
  • Prod-tier non-prod: $9,800/month from over-monitored dev environments
  • Data-transfer: $11,400/month from one chatty service pair
  • Compute over-provisioning: $14,200/month from a 220-instance fleet
  • Storage forever: $2,800/month in old RDS snapshots
  • Observability cardinality: $7,600/month from one metric
  • Commitment gap: $18,000/month forgone from no Savings Plans

Those come from a range of clients, not a single one. But every one of those numbers is real.


Want me to find the same in your account on a pay-for-savings basis? Book a call.