Datadog Cost Optimisation: Cardinality, Logs, and Custom Metrics
Datadog bills can rival the cloud bill they're supposed to monitor. Here's where the spend actually goes and how to cut it 50-70% without losing visibility.
By Andrii Votiakov
Datadog is the observability stack that ate the world. It's also the one most likely to be your second-biggest cloud bill. After a hundred audits, the pattern is depressingly consistent: nobody understands the pricing, the team enables features by default, and the bill compounds quietly.
Quick answer
Datadog charges separately for infrastructure hosts, APM hosts, log ingestion ($0.10/GB) and log retention, custom metrics (every unique tag combination), synthetic tests, and RUM. Cardinality on custom metrics and log volume are usually 70%+ of the bill. Cut both and the rest is small.
Where the money actually goes
Common bill breakdown (typical mid-size SaaS):
| Line | % of total |
|---|---|
| Infrastructure | 15-25% |
| APM | 15-25% |
| Logs (ingestion + retention) | 25-40% |
| Custom Metrics (cardinality) | 15-30% |
| RUM, Synthetic, CI Visibility | 5-15% |
Custom metrics and logs are where 60-70% of all savings live. Start there.
Custom metrics: the cardinality trap
Datadog charges per unique time series — every distinct combination of metric name + tag values. A metric with 50 hosts × 100 endpoints × 5 status codes = 25,000 time series.
Easy ways to blow up cardinality:
- Tagging metrics with `request_id`, `user_id`, or `trace_id` (unique per request → unbounded cardinality)
- Tagging with high-cardinality dimensions like `path` on a service that exposes 10,000 unique paths
- Histogram metrics emitting per-pod, per-status, per-route, per-method tags — multiplies fast
- Per-pod metrics in Kubernetes with hundreds of pods
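The multiplication is worth internalizing: the series count is just the product of the distinct values each tag can take. A minimal sketch of that arithmetic (the tag names and counts are the illustrative numbers from above):

```python
from math import prod

def estimated_series(tag_value_counts: dict[str, int]) -> int:
    """Upper-bound estimate of distinct time series for one metric:
    the product of how many values each tag can take."""
    return prod(tag_value_counts.values())

# The 50 hosts x 100 endpoints x 5 status codes example above:
print(estimated_series({"host": 50, "endpoint": 100, "status": 5}))  # 25000

# Add a per-request tag and the same metric becomes unbounded:
# at 1M distinct request_ids it already exceeds 25 billion series.
print(estimated_series({"host": 50, "endpoint": 100, "status": 5,
                        "request_id": 1_000_000}))
```

Every tag you add multiplies everything that came before it, which is why one per-request tag dwarfs all other dimensions combined.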
How to find offenders:
Datadog → Infrastructure → Metrics Summary → Sort by "Distinct Metrics"
Anything emitting > 10,000 distinct time series for a single metric is suspect. Drop high-cardinality tags or remove the metric.
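If you export that Metrics Summary view, ranking offenders against the 10,000-series threshold is a few lines. A sketch only: the `name` and `distinct_series` field names are hypothetical, so map them to whatever your export actually calls them:

```python
SUSPECT_THRESHOLD = 10_000  # distinct time series per metric, per the rule of thumb above

def cardinality_offenders(metrics: list[dict]) -> list[dict]:
    """Rank metrics by distinct time series, keeping only suspects.
    'name' / 'distinct_series' are assumed export field names."""
    ranked = sorted(metrics, key=lambda m: m["distinct_series"], reverse=True)
    return [m for m in ranked if m["distinct_series"] > SUSPECT_THRESHOLD]

summary = [
    {"name": "http.request.count", "distinct_series": 4_000_000},  # user_id tag leak
    {"name": "jvm.heap.used", "distinct_series": 320},
    {"name": "checkout.latency.histogram", "distinct_series": 48_000},
]
for m in cardinality_offenders(summary):
    print(m["name"], m["distinct_series"])
```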
Logs: ingestion + retention + indexing
Three separately priced line items:
- Ingestion: $0.10/GB
- Indexing (15 days hot search): $1.70/million events
- Retention (rehydration): tiered
The savings:
Drop noise before ingestion
Use Datadog's Log Pipelines + Exclusion Filters. Drop:
- Health check probes (`/health`, `/ready`)
- Successful 200 access logs (sample to 5-10%)
- DEBUG-level lines in production
- Kubernetes audit log noise
This is usually 30-50% of ingestion gone.
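In Datadog these are exclusion filters configured on the pipeline, but the decision logic is simple enough to sketch. Assuming `path`, `status`, and `level` as the log attribute names (adjust to your source's actual attributes):

```python
import random

HEALTH_PATHS = {"/health", "/ready"}
SUCCESS_SAMPLE_RATE = 0.10  # keep ~10% of successful access logs

def keep_log(record: dict, rng: random.Random) -> bool:
    """Ingest-or-drop decision mirroring the exclusion filters above.
    Field names ('path', 'status', 'level') are assumptions."""
    if record.get("path") in HEALTH_PATHS:
        return False  # health-check probes: drop outright
    if record.get("level") == "DEBUG":
        return False  # no DEBUG in production
    if record.get("status") == 200:
        return rng.random() < SUCCESS_SAMPLE_RATE  # sample healthy access logs
    return True  # errors, warnings, non-200s always kept
```

The same rules translate one-for-one into exclusion filter queries with a sampling percentage; doing it collector-side (Vector, Fluent Bit) instead saves the ingestion fee too, since Datadog never sees the dropped lines.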
Use Logging without Limits + Indexes only on important streams
Indexing is what makes logs searchable in real time. You don't need every log indexed for 15 days. Configure indexes by service:
- Production application logs: 15 days indexed
- Internal services: 7 days
- Batch jobs: 3 days
- Cold/security logs: 0 days indexed, just stored for rehydration
Forward cold logs to S3 (Live Tail not enabled)
If you need long-term retention for audit, ship to S3 from your collector instead of paying Datadog's retention tier. Standard log forwarder pattern with Vector or Fluent Bit. Cuts cost by 80%+ for cold logs.
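A sketch of the cold-path layout, assuming date/hour-partitioned keys (the naming here is illustrative; the S3 sinks in Vector and Fluent Bit produce the same shape natively). This version writes locally; in production the same bytes go to S3:

```python
import gzip
import json
from datetime import datetime
from pathlib import Path

def cold_log_key(service: str, ts: datetime) -> str:
    """Date/hour-partitioned key so rehydration or Athena can prune by time."""
    return f"cold-logs/{service}/dt={ts:%Y-%m-%d}/hour={ts:%H}/batch.ndjson.gz"

def write_batch(records: list[dict], service: str, ts: datetime, root: Path) -> Path:
    """Gzip a batch of NDJSON records to a partitioned path; swap the local
    write for an S3 put (boto3 or your forwarder) in production."""
    path = root / cold_log_key(service, ts)
    path.parent.mkdir(parents=True, exist_ok=True)
    body = "\n".join(json.dumps(r) for r in records).encode()
    path.write_bytes(gzip.compress(body))  # gzip typically shrinks JSON logs 85-95%
    return path
```

Gzipped NDJSON in S3 Standard-IA or Glacier is pennies per GB-month versus Datadog's retention tiers, and it stays rehydratable if you ever need to search it.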
APM: traces, ingestion, and indexing
APM is billed per APM host plus ingested and indexed spans. Sample aggressively:
- Default head-based sampling: keep 100% of error and slow traces, sample healthy ones at 1-5%
- Use Datadog's dynamic sampling, where the agent decides based on throughput rather than the client sampling randomly
- Drop high-volume internal-only spans (e.g., DB-driver internal spans)
- Watch for span count multiplication on async/messaging architectures (1 user request → 50 spans is normal but expensive at high QPS)
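The first rule above, sketched as a head-based sampling decision (the 1-second "slow" threshold and 5% rate are the illustrative numbers from the list; a real agent also rate-limits per second):

```python
import random

SLOW_MS = 1_000       # traces slower than this are always kept
HEALTHY_RATE = 0.05   # sample healthy traces at 5%

def keep_trace(duration_ms: float, has_error: bool, rng: random.Random) -> bool:
    """Keep 100% of error and slow traces; sample the healthy majority."""
    if has_error or duration_ms >= SLOW_MS:
        return True
    return rng.random() < HEALTHY_RATE
```

At high QPS almost all traces are fast and healthy, so this keeps the interesting 1% in full while cutting indexed span volume by roughly 95%.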
Infrastructure hosts
Datadog charges per host-hour. Two common wastes:
- Dev and staging on full pricing. Use the Pro plan SKU on lower-priority environments only if your team needs it. Otherwise drop monitoring entirely on ephemeral preview environments.
- Pause monitoring on weekends/nights for dev clusters that scale to zero — agent should also stop reporting (kube-downscaler handles this).
RUM, Synthetics, CI
These are smaller line items, but quick checks:
- RUM: per-session pricing. Disable on internal admin pages and bot traffic.
- Synthetics: per-test-run pricing. Don't run a 1-minute test for a metric you check daily.
- CI Visibility: per-test pricing. Enable selectively on important pipelines, not every PR build.
The audit checklist I run
- Pull the last 30 days from Usage → Cost & Usage, group by product
- Metrics → Sort by Distinct Metrics — find the cardinality offenders
- Logs → Indexes — look at index size and retention; right-size each
- Logs → Pipelines — find the high-volume sources, add exclusion filters
- APM → Sampling Rules — confirm aggressive sampling is in place
- Infrastructure → Hosts — check non-prod hosts; consider tier downgrade
- Synthetics → Tests — kill anything you don't actually look at
What I usually find
- One service emitting a `user_id` tag on a counter metric → 4M distinct time series → $3-8k/month
- DEBUG logging in prod on a busy service → $1-4k/month in ingestion
- 30-day retention on logs nobody queries past 7 days → $1-2k/month
- 100% APM sampling on a high-QPS API → $2-5k/month in indexed spans
- Dev environment paying same per-host as prod → $500-2k/month
Realistic numbers
Recent SaaS client (~$28k/month Datadog):
- Custom metric cardinality cleanup: $5,400/month
- Log exclusion + retention by index: $6,200/month
- APM sampling 100% → 5% on healthy traces: $2,800/month
- Dropped Synthetic tests not in use: $600/month
- Dev environment downgraded: $1,100/month
Final: $11,900/month, ~58% reduction.
If you decide Datadog still isn't worth what's left, the alternative is self-hosted — see the Datadog replacement post.
Want me to audit your Datadog usage on a pay-for-savings basis? Book a call.