CloudWatch Cost Optimisation: Logs, Metrics, Surprises

CloudWatch is the line item that surprises every engineering team. Here's where the money actually goes and how to cut it 60-80% without losing visibility.

By Andrii Votiakov on 2026-03-15

CloudWatch surprises engineering teams the same way every quarter: a new service ships with default logging, the bill jumps a few thousand dollars, and nobody notices for two months. When someone finally investigates, it turns out 90% of the spend is on logs nobody reads. If you're considering replacing CloudWatch for metrics and logs altogether, see replacing Datadog with a cheaper observability stack: the self-hosted alternative costs a fraction for teams already spending $20k+/month on Datadog.

Quick answer

CloudWatch bills for ingestion ($0.50/GB), storage ($0.03/GB-month), and queries. The single biggest line item is almost always Logs ingestion. Set retention, drop debug-level noise, and use subscription filters to S3 + Athena for cold queries instead of paying for CloudWatch storage.

What you're actually paying for

Five chargeable categories:

  1. Logs ingestion: $0.50/GB ingested
  2. Logs storage: $0.03/GB-month after ingestion
  3. Metrics: $0.30 per custom metric/month, plus API call costs
  4. Logs Insights queries: $0.005 per GB scanned
  5. Alarms: $0.10 per alarm/month (small but adds up at scale)

For most teams, Logs ingestion + storage is 70-90% of the CloudWatch bill.
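To see which of the five categories dominates your bill, the rates above reduce to simple arithmetic. A minimal sketch; the volume figures below are made-up inputs, so substitute your own numbers from Cost Explorer:

```shell
# Rough monthly CloudWatch estimate from the five published rates.
# All five input figures are hypothetical examples.
ingest_gb=500; stored_gb=2000; metrics=1200; scanned_gb=300; alarms=150

awk -v i="$ingest_gb" -v s="$stored_gb" -v m="$metrics" \
    -v q="$scanned_gb" -v a="$alarms" 'BEGIN {
  total = i*0.50 + s*0.03 + m*0.30 + q*0.005 + a*0.10
  printf "ingestion %.2f  storage %.2f  metrics %.2f  queries %.2f  alarms %.2f  total %.2f\n",
         i*0.50, s*0.03, m*0.30, q*0.005, a*0.10, total
}'
```

With these example inputs, ingestion and metrics dwarf queries and alarms, which is the usual shape of the bill.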

The fixes that actually move the needle

1. Set retention on every log group

Default retention is "Never expire". A team I worked with had 14 TB of accumulated logs going back 4 years — nobody had ever queried any of it. Cleanup saved $4,200/month immediately.

# Audit log groups with no retention set
aws logs describe-log-groups \
  --query 'logGroups[?!retentionInDays].[logGroupName,storedBytes]' \
  --output table

Recommended retention by type:

Log type              Retention
ALB/NLB access logs   30 days
Application logs      14-30 days
CloudTrail            90 days (or send to S3 + Glacier)
VPC Flow Logs         14 days (or S3 only)
Lambda logs           14 days
Compliance/audit      per legal requirements (usually S3, not CloudWatch)
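Once the audit command above has shown you the unconfigured groups, a sketch for enforcing a default follows naturally. The 14-day value is an assumption; the leading echo makes it a dry run:

```shell
# Set 14-day retention on every log group that has none.
# Dry run: remove the leading echo once the list looks right.
aws logs describe-log-groups \
  --query 'logGroups[?!retentionInDays].logGroupName' --output text |
tr '\t' '\n' |
while read -r lg; do
  echo aws logs put-retention-policy --log-group-name "$lg" --retention-in-days 14
done
```

For groups in the table above that need longer retention, run put-retention-policy on them afterwards with the larger value; a second pass is cheaper than missing one group entirely.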

2. Stop logging at DEBUG level in production

The single biggest ingestion cut. A typical Node app at INFO might emit 1 KB/request. At DEBUG it's 10-30 KB/request. At a million requests a day that's the difference between $15/month and $450/month — per service.

Turn it off. If you need debug for a specific incident, flip it on temporarily.
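The $15 vs $450 figures are just arithmetic on the $0.50/GB ingestion rate; a quick sketch to reproduce them (1 KB at INFO vs 30 KB at DEBUG, a million requests a day, 30 days):

```shell
# Per-service monthly ingestion cost at INFO (1 KB/req) vs DEBUG (30 KB/req),
# 1M requests/day, 30 days, $0.50/GB, decimal GB.
for kb in 1 30; do
  awk -v kb="$kb" 'BEGIN {
    gb = 1000000 * kb * 30 / 1000000   # KB/req * req/day * days -> GB/month
    printf "%2d KB/request -> %4.0f GB/mo -> $%.0f/mo\n", kb, gb, gb * 0.50
  }'
done
```

Run it with your own request volume and payload sizes to see what a log-level change is worth per service.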

3. Filter what gets shipped, not what gets logged

Use the CloudWatch agent, or a shipper like Fluent Bit or Vector, with filtering. Examples:

  • Drop health-check log lines (/health, /ready) before they hit CloudWatch
  • Drop Kubernetes liveness/readiness probe logs
  • Sample HTTP 200 access logs to 10%, keep 100% of 4xx and 5xx

This is usually a 30-50% ingestion cut on web tiers alone.
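As a sketch of the first two bullets, a Fluent Bit grep filter can drop health-check lines before they're shipped. This assumes records tagged app.* with the raw line in a field named log; adjust both to your pipeline:

```ini
[FILTER]
    Name     grep
    Match    app.*
    Exclude  log /(health|ready)
```

Sampling 200s down to 10% takes a bit more machinery: a lua filter in Fluent Bit, or Vector's sample transform, which can exclude 4xx/5xx records from sampling so errors stay at 100%.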

4. Send cold logs to S3, query with Athena

CloudWatch Logs storage is $0.03/GB-month. S3 Standard is $0.023/GB-month, S3 Standard-IA is $0.0125, and Glacier Instant Retrieval is $0.004.

For logs you query rarely (security audits, compliance), the move is:

  • 14 days in CloudWatch (hot, instant)
  • After 14 days, subscription filter ships to S3 with lifecycle to IA → Glacier
  • Query from Athena when needed (~$5 per TB scanned, infrequent)

At those rates, that's the difference between paying about $30/month for a TB of cold logs in CloudWatch and about $4/month in Glacier Instant Retrieval.
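The tiering step can be a single lifecycle rule on the archive bucket. A sketch, with hypothetical bucket layout (the logs/ prefix is an assumption):

```json
{
  "Rules": [{
    "ID": "cold-logs-tiering",
    "Status": "Enabled",
    "Filter": { "Prefix": "logs/" },
    "Transitions": [
      { "Days": 30, "StorageClass": "STANDARD_IA" },
      { "Days": 90, "StorageClass": "GLACIER_IR" }
    ]
  }]
}
```

Apply it with aws s3api put-bucket-lifecycle-configuration --bucket YOUR-BUCKET --lifecycle-configuration file://lifecycle.json. Note that S3 requires objects to sit in Standard for at least 30 days before an IA transition.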

5. Custom metrics: kill the cardinality

Each unique combination of metric name + dimensions = one custom metric = $0.30/month. Pay attention to:

  • High-cardinality dimensions (user_id, request_id) — these are bill bombs. Use logs and Logs Insights instead.
  • Per-pod metrics in Kubernetes when you have hundreds of pods — aggregate first.
  • Container Insights with its default cardinality can be a large line item on its own. It's tunable via the agent config.

One team I audited had 47,000 custom metrics, 90% of which were never queried. That was $14k/month in custom metrics alone.
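To find out where your custom metrics are coming from, a sketch that counts them by namespace (requires jq; note that list-metrics only returns metrics that received data in roughly the last two weeks, so long-dead metrics won't appear):

```shell
# Custom metrics by namespace. AWS/* namespaces are AWS-vended metrics;
# everything outside them bills at the custom-metric rate.
aws cloudwatch list-metrics --query 'Metrics[].Namespace' --output json |
jq -r '.[]' | grep -v '^AWS/' | sort | uniq -c | sort -rn | head
```

The namespaces at the top of that list are where a cardinality audit pays off first.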

6. Alarms on absent data

A common pattern: alarms on services that have been deleted. Each alarm is $0.10/month. Tiny individually, but I've seen accounts with 4,000+ orphan alarms = $400/month for nothing.

aws cloudwatch describe-alarms \
  --query 'MetricAlarms[?StateValue==`INSUFFICIENT_DATA`].[AlarmName]'

Anything in INSUFFICIENT_DATA for over 30 days is probably dead.
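Building on the command above, a sketch that adds the 30-day check and a dry-run delete. The date fallback covers GNU vs BSD date; the echo keeps it read-only until you've reviewed the list:

```shell
# Alarms stuck in INSUFFICIENT_DATA for 30+ days, with a dry-run delete.
cutoff=$(date -u -d '30 days ago' +%Y-%m-%dT%H:%M:%S 2>/dev/null ||
         date -u -v-30d +%Y-%m-%dT%H:%M:%S)   # GNU date, then BSD date
aws cloudwatch describe-alarms --state-value INSUFFICIENT_DATA \
  --query "MetricAlarms[?StateUpdatedTimestamp<'$cutoff'].AlarmName" \
  --output text |
tr '\t' '\n' |
while read -r alarm; do
  echo aws cloudwatch delete-alarms --alarm-names "$alarm"
done
```

The timestamp comparison works because ISO 8601 strings sort lexically, which AWS CLI JMESPath queries support.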

7. Logs Insights queries — scan less

Each query costs $0.005/GB scanned. If you're running ad-hoc queries across full retention every day, that's a real number. Two practical moves:

  • Always pin a time range. The default is 1 hour for a reason.
  • Filter early, parse late. Put filter @logStream like /api/ before parse @message; Logs Insights runs commands in pipeline order, so earlier filters reduce what later stages touch.
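As a concrete sketch of the filter-early pattern (the duration=* field format is an assumption about your log lines):

```
fields @timestamp, @message
| filter @logStream like /api/
| parse @message "duration=* ms" as duration_ms
| stats avg(duration_ms) by bin(5m)
```

Moving the filter line below the parse would make Logs Insights parse every message in the time range first, scanning the same bytes but doing strictly more work.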

Common surprises

  • Lambda functions logging entire event payloads — easy 5-10x ingestion increase. Strip before logging.
  • CloudTrail data events for every S3 object access — hundreds of GB/day on busy accounts. Targeted, not blanket.
  • Container Insights enabled cluster-wide without tuning — instant 5-figure addition to the bill on a big EKS cluster.
  • Forgotten cross-account log replication — log shipping to a security account that was set up once and never reviewed.

What I check on a real audit

  • Log groups without retention set (retentionInDays = null)
  • Top 10 log groups by ingestion (Insights gives you this)
  • Custom metric count by namespace (aws cloudwatch list-metrics, or the Metrics console)
  • Alarms in INSUFFICIENT_DATA for 30+ days
  • VPC Flow Logs going to CloudWatch (should be S3)
  • CloudTrail data events scope

Realistic numbers

Recent client (~$8.5k/month CloudWatch):

  • Setting retention everywhere: $2,200/month
  • DEBUG → INFO across 6 services: $1,400/month
  • VPC Flow Logs to S3: $650/month
  • Custom metrics audit (deleted 18k unused): $1,800/month
  • Orphan alarm cleanup: $120/month

Final: $2,330/month, 73% reduction.


If your CloudWatch bill has a mind of its own, book a call. I usually find half the savings within the first hour.