Azure Cost Optimisation Playbook: Where the Money Actually Goes

Azure bills hide in oversized VMs, SQL thresholds, hot-tier storage, and Monitor ingestion. Here's where the spend actually goes and how to cut 30-60%.

By Andrii Votiakov on 2026-01-06

Azure is the second-largest cloud by revenue, but it punishes inattention differently from AWS. The pricing model is more opaque in places, the right-sizing tools are less obvious, and the discount paths (Reservations, Savings Plans, Hybrid Benefit) interact in ways that aren't well documented. I've reviewed dozens of Azure bills. The same waste patterns show up every time.

Quick answer

The biggest Azure waste categories are: oversized VMs with no Savings Plan or Reservation, Azure SQL running at full DTUs when serverless or a lower tier would do, storage at hot tier when most blobs are never accessed, and Azure Monitor / Log Analytics ingesting everything without a retention or sampling strategy. Fix those four and you'll typically cut 30-60% from the bill.

Where the money actually goes

Before optimising anything, pull the Azure Cost Management breakdown by service. This is what a typical mid-size workload looks like:

| Service | % of total bill |
| --- | --- |
| Virtual Machines | 35-50% |
| Azure SQL / Managed Instance | 15-25% |
| Storage (Blob, Disk, Files) | 10-20% |
| Azure Monitor / Log Analytics | 5-15% |
| Networking (Bandwidth, VPN, Firewall) | 5-10% |
| Other (App Service, AKS, Functions) | 5-15% |

Compute is still where the largest single saving lives, but Log Analytics is the one that sneaks up on you.
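If you'd rather work from an exported cost file than click through the portal, the same breakdown is a few lines of Python. This is a minimal sketch: the `service` and `cost` keys are hypothetical column names, and the sample rows are made up, so map them to whatever your export actually contains.

```python
from collections import defaultdict


def share_by_service(rows):
    """Return {service: percent of total cost}, largest share first."""
    totals = defaultdict(float)
    for row in rows:
        totals[row["service"]] += float(row["cost"])
    grand_total = sum(totals.values())
    if grand_total == 0:
        return {}
    ranked = sorted(totals.items(), key=lambda item: item[1], reverse=True)
    return {name: round(100 * cost / grand_total, 1) for name, cost in ranked}


# Hypothetical rows, shaped as if read with csv.DictReader from a cost export.
sample_rows = [
    {"service": "Virtual Machines", "cost": "7200"},
    {"service": "Azure SQL", "cost": "3600"},
    {"service": "Storage", "cost": "2700"},
    {"service": "Log Analytics", "cost": "1800"},
    {"service": "Virtual Machines", "cost": "900"},
    {"service": "Networking", "cost": "1800"},
]

shares = share_by_service(sample_rows)
for service, percent in shares.items():
    print(f"{service}: {percent}%")
```

Sorting by share first keeps the audit focused on the two or three services that dominate the bill.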

Virtual Machines: right-sizing and the Hybrid Benefit

VMs are the biggest line item and the easiest place to start. Most Azure VMs I review are running under 20% average CPU. The first step is the same as on AWS: pull 14 days of CPU and memory metrics from Azure Monitor, then drop one or two sizes for anything under 20% average utilisation.

But Azure has two more levers, one of which AWS has no real equivalent for. First, the Azure Hybrid Benefit lets you bring your own Windows Server or SQL Server licences to Azure, which cuts the VM price by 40-49% on Windows workloads. If your company has licences with Software Assurance sitting in an Enterprise Agreement, activating Hybrid Benefit is often the fastest win on any Azure audit: it takes minutes and requires no architecture change.

Second, Spot VMs for non-critical workloads (batch jobs, dev/test, CI runners) run at a 60-90% discount versus on-demand. Azure Spot has a slightly different eviction model from AWS Spot Instances: you get a 30-second eviction notice rather than 2 minutes, so it's less suitable for stateful workloads but fine for short-lived tasks.
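The 30-second notice only helps if something on the VM is listening for it. As far as I know, Azure surfaces it through the Instance Metadata Service's Scheduled Events endpoint as an event of type `Preempt`. The sketch below only parses a response body. The HTTP poll itself is left out, and the sample payload values are made up, so check the field names against the current Scheduled Events documentation before relying on it.

```python
import json


def pending_preemptions(scheduled_events_body):
    """Return the Preempt (Spot eviction) events found in a response body."""
    document = json.loads(scheduled_events_body)
    return [
        event
        for event in document.get("Events", [])
        if event.get("EventType") == "Preempt"
    ]


# Sample payload shaped like a Scheduled Events response (values are invented).
sample_body = json.dumps(
    {
        "DocumentIncarnation": 2,
        "Events": [
            {
                "EventId": "A1",
                "EventType": "Preempt",
                "ResourceType": "VirtualMachine",
                "Resources": ["ci-runner-03"],
                "EventStatus": "Scheduled",
            },
            {
                "EventId": "B2",
                "EventType": "Reboot",
                "ResourceType": "VirtualMachine",
                "Resources": ["ci-runner-04"],
                "EventStatus": "Scheduled",
            },
        ],
    }
)

evictions = pending_preemptions(sample_body)
for event in evictions:
    print("eviction pending for", ", ".join(event["Resources"]))
```

A worker that sees a `Preempt` event should stop taking new jobs and checkpoint or hand back whatever it's running.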

B-series: the most under-used VM family

The B-series (burstable) VMs are significantly cheaper than D-series equivalents and perfect for workloads that spike briefly and idle most of the time — dev environments, small APIs, admin backends. I regularly find teams running D4s_v3 where a B4ms would handle the same load at 40% less cost.
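The right-sizing rule from earlier in this section (14 days of metrics, flag anything under 20% average CPU) is easy to automate once the samples are exported. A minimal sketch, assuming you already have CPU percentages per VM in a dictionary; the VM names and numbers are hypothetical.

```python
from statistics import mean


def rightsizing_candidates(cpu_by_vm, threshold_pct=20.0):
    """Return VM names whose average CPU is below the threshold, lowest first."""
    # VMs with no samples are skipped rather than treated as idle.
    averages = {vm: mean(samples) for vm, samples in cpu_by_vm.items() if samples}
    flagged = [vm for vm, average in averages.items() if average < threshold_pct]
    return sorted(flagged, key=lambda vm: averages[vm])


# Hypothetical CPU percentage samples over the review window.
sample_metrics = {
    "api-prod-1": [55.0, 62.0, 48.0],
    "admin-backend": [4.0, 6.0, 5.0],
    "dev-box": [12.0, 18.0, 15.0],
    "new-vm": [],
}

candidates = rightsizing_candidates(sample_metrics)
print(candidates)
```

Average CPU is only a triage signal. Check memory and peak CPU before dropping a size, especially when moving to a burstable family.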

Azure SQL: the DTU and vCore trap

Azure SQL is where I find the most consistent over-provisioning. The legacy DTU model is especially opaque — most teams pick a tier during initial setup, load tests pass, and nobody revisits it for years.

What to check:

  • DTU usage: In the portal, look at DTU percentage over the last 30 days. Under 30% average? You're on the wrong tier.
  • Serverless tier: For dev databases and infrequently used production databases, the Serverless compute tier auto-pauses when idle and charges only for actual vCore-seconds. A dev database paused 16 hours a day drops its compute cost by 65-70%.
  • Elastic Pools: If you run many small databases (common in multi-tenant apps), an Elastic Pool shares capacity across them. Usually 30-50% cheaper than sizing each database individually.
  • Managed Instance vs SQL Database: Managed Instance is expensive. If you migrated from on-prem to get compatibility but aren't using most MI-specific features, a migration to Azure SQL Database (standard tier) can halve the cost.
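The Serverless figure is simple arithmetic: a database paused 16 of 24 hours is billed for compute a third of the time. The sketch below makes that explicit. The `price_ratio` parameter is my own addition for cases where the serverless rate per vCore-hour differs from the provisioned rate; 1.0 assumes the same rate, which is the assumption behind the 65-70% figure.

```python
def serverless_compute_saving(paused_hours_per_day, price_ratio=1.0):
    """Fraction of always-on compute cost saved by auto-pausing.

    price_ratio is serverless price / provisioned price per vCore-hour
    (an assumption you should replace with your region's real ratio).
    """
    if not 0 <= paused_hours_per_day <= 24:
        raise ValueError("paused_hours_per_day must be between 0 and 24")
    active_fraction = (24 - paused_hours_per_day) / 24
    return 1 - active_fraction * price_ratio


same_rate = serverless_compute_saving(16)
higher_rate = serverless_compute_saving(16, price_ratio=1.5)
print(f"same rate: {same_rate:.0%}, 1.5x rate: {higher_rate:.0%}")
```

If the serverless rate is meaningfully higher than provisioned, the saving shrinks, so a database that's busy most of the day can end up costing more on Serverless.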

Azure SQL Reservations are available for vCore-based databases and reduce cost by around 33% for 1-year and 48% for 3-year terms. Don't buy these before right-sizing — buy small and add, not the reverse.

Storage: hot vs cool vs cold vs archive

Azure Blob Storage has four tiers. Almost every account I look at is 80%+ on hot tier.

| Tier | Storage cost | Access cost | Use case |
| --- | --- | --- | --- |
| Hot | ~$0.018/GB/month | Low | Actively read daily |
| Cool | ~$0.01/GB/month | Medium | Access < once/month |
| Cold | ~$0.0045/GB/month | Higher | Access < once/quarter |
| Archive | ~$0.00099/GB/month | High + rehydration | Compliance/retention |
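To see what those per-GB prices mean for a real account, compare an all-hot account with a tiered one. This sketch uses the approximate prices from the table and a hypothetical 50 TB account with an invented tier split. It ignores access, transaction, and rehydration charges, so treat the result as an upper bound on the saving.

```python
# Approximate storage prices per GB per month, taken from the table above.
PRICE_PER_GB = {"hot": 0.018, "cool": 0.01, "cold": 0.0045, "archive": 0.00099}


def monthly_storage_cost(gb_by_tier):
    """Storage-only monthly cost for a {tier: gigabytes} mapping."""
    return sum(PRICE_PER_GB[tier] * gb for tier, gb in gb_by_tier.items())


# Hypothetical 50 TB account: everything hot versus a tiered split.
all_hot = monthly_storage_cost({"hot": 50_000})
tiered = monthly_storage_cost(
    {"hot": 5_000, "cool": 10_000, "cold": 15_000, "archive": 20_000}
)
saving_pct = 100 * (1 - tiered / all_hot)

print(f"all hot: ${all_hot:,.2f}, tiered: ${tiered:,.2f}, saving: {saving_pct:.1f}%")
```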

The fix is a Lifecycle Management policy in the Storage account. Set blobs to move to Cool after 30 days of no access, Cold after 90 days, Archive after 365 days. This is a 5-minute configuration change that typically saves 40-70% on storage costs for accounts holding logs, exports, backups, or user-uploaded content.
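Here is roughly what that policy looks like as JSON, generated from Python so the thresholds are easy to change. The field names follow the lifecycle policy schema as I understand it, and the rule name is hypothetical; verify both against the current Azure Storage documentation. Rules based on last access also need last-access-time tracking enabled on the account.

```python
import json


def lifecycle_policy(cool_days=30, cold_days=90, archive_days=365):
    """Build a lifecycle policy that tiers block blobs down by last access."""
    return {
        "rules": [
            {
                "enabled": True,
                "name": "tier-down-by-last-access",  # hypothetical rule name
                "type": "Lifecycle",
                "definition": {
                    "filters": {"blobTypes": ["blockBlob"]},
                    "actions": {
                        "baseBlob": {
                            "tierToCool": {
                                "daysAfterLastAccessTimeGreaterThan": cool_days
                            },
                            "tierToCold": {
                                "daysAfterLastAccessTimeGreaterThan": cold_days
                            },
                            "tierToArchive": {
                                "daysAfterLastAccessTimeGreaterThan": archive_days
                            },
                        }
                    },
                },
            }
        ]
    }


policy = lifecycle_policy()
print(json.dumps(policy, indent=2))
```

The printed JSON can be pasted into the policy's code view in the portal or kept in source control next to the rest of your infrastructure config.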

Managed Disks are a separate waste category. Check for:

  • Unattached disks (common after VM deletion — not automatically deleted)
  • Premium SSD where Standard SSD would do (often on dev VMs)
  • Oversized disk allocations (128 GB disk for an OS using 20 GB pays for the full 128 GB)
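The three disk checks above can be run against an inventory export. This is a sketch over simplified records: the keys (`managedBy`, `sku`, `env`, `sizeGb`, `usedGb`) are assumptions loosely modelled on a disk listing plus guest-level usage data, and the 25% utilisation cut-off is arbitrary.

```python
def disk_findings(disks, min_used_fraction=0.25):
    """Return (disk name, finding) pairs for the three waste patterns."""
    findings = []
    for disk in disks:
        name = disk["name"]
        if disk.get("managedBy") is None:
            # No owning VM: the disk is billed but nothing uses it.
            findings.append((name, "unattached"))
            continue
        if disk.get("sku") == "Premium_LRS" and disk.get("env") == "dev":
            findings.append((name, "premium SSD on dev"))
        used = disk.get("usedGb")
        if used is not None and used / disk["sizeGb"] < min_used_fraction:
            findings.append((name, "oversized"))
    return findings


# Hypothetical inventory records.
inventory = [
    {"name": "old-vm-osdisk", "managedBy": None, "sku": "Premium_LRS",
     "env": "prod", "sizeGb": 128},
    {"name": "dev-data", "managedBy": "vm-dev-1", "sku": "Premium_LRS",
     "env": "dev", "sizeGb": 64, "usedGb": 40},
    {"name": "prod-os", "managedBy": "vm-prod-1", "sku": "Premium_LRS",
     "env": "prod", "sizeGb": 128, "usedGb": 20},
    {"name": "prod-data", "managedBy": "vm-prod-1", "sku": "StandardSSD_LRS",
     "env": "prod", "sizeGb": 256, "usedGb": 200},
]

findings = disk_findings(inventory)
for name, finding in findings:
    print(f"{name}: {finding}")
```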

Azure Monitor and Log Analytics: the silent cost driver

This is the one that surprises people. Log Analytics charges by ingestion volume ($2.76/GB at pay-as-you-go in most regions) plus a separate retention charge after the free 31 days. At scale, this is serious money.

What I find on audits:

  • Verbose diagnostic logs enabled on everything: Every resource in Azure can stream diagnostics to a Log Analytics workspace. Most teams enable it during debugging and never turn it off.
  • AKS / container logs at 100% verbosity: A busy cluster emitting INFO-level container stdout to Log Analytics is a fast way to spend $5-15k/month on logs.
  • No workspace data cap: Azure Monitor lets you set a daily ingestion cap per workspace. Most workspaces have no cap. A misconfigured logging library or a runaway debug flag can spike your bill 10x in 48 hours.

Fixes:

  1. Set a daily cap on every Log Analytics workspace immediately
  2. Disable diagnostic settings for resources you don't actually query
  3. Reduce AKS log verbosity — route container stdout/stderr to a cheaper log store (Loki on a small VM, or just ship to cold storage) and keep only application-level structured logs in Log Analytics
  4. Set retention to 30 days for most workspaces; 90 days only where compliance requires it
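To pick a sensible daily cap, work backwards from a monthly budget. The sketch below uses the $2.76/GB pay-as-you-go rate quoted above and a 30-day month. The 100 GB/day volume and the $3,000 budget are hypothetical, and retention charges are left out.

```python
PRICE_PER_GB = 2.76  # pay-as-you-go ingestion rate quoted in this section


def monthly_ingestion_cost(gb_per_day, days=30, price_per_gb=PRICE_PER_GB):
    """Ingestion-only cost for a steady daily volume."""
    return gb_per_day * days * price_per_gb


def daily_cap_for_budget(monthly_budget, days=30, price_per_gb=PRICE_PER_GB):
    """Largest daily ingestion volume (GB) that stays within the budget."""
    return monthly_budget / (days * price_per_gb)


cost = monthly_ingestion_cost(100)   # a cluster emitting 100 GB/day
cap = daily_cap_for_budget(3000)     # a $3,000/month logging budget

print(f"100 GB/day costs about ${cost:,.0f}/month")
print(f"A $3,000 budget allows about {cap:.1f} GB/day")
```

Set the cap slightly above normal volume so it catches runaway logging without cutting off ingestion on an ordinary busy day.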

The CloudWatch cost optimisation post covers the AWS equivalent, and the same patterns apply here.

Networking: bandwidth and forgotten gateways

Azure egress pricing is competitive with AWS but still adds up. Cross-region traffic between Azure regions is billed. VNet Peering across regions is billed on both sides. ExpressRoute and VPN Gateway have fixed hourly charges regardless of utilisation.

Common waste:

  • Idle VPN Gateways: $140-560/month depending on SKU. Check whether anyone actually uses the VPN, and delete the gateway if not.
  • Unused Application Gateway or Azure Firewall: Fixed hourly + processing fees. If it's not handling traffic, delete it.
  • Data transfer for logs and telemetry: Shipping from Azure to an external SIEM or observability tool crosses the egress boundary. Route through a hub or compress before export.

Reservations and Savings Plans

Azure has two commitment-based discount paths, similar to AWS:

  • Reserved VM Instances (RVIs): 1- or 3-year term, tied to a specific VM family and region. 33-72% off on-demand.
  • Azure Savings Plans for Compute: More flexible — applies across VM series, regions, and even Azure Functions. 15-40% off on-demand.

The right choice depends on how predictable your VM mix is. See the dedicated Azure Reservations vs Savings Plans post for a full decision framework.

Do not buy Reservations before right-sizing. The typical mistake is reserving current oversized VMs, then right-sizing later — and the reservation no longer fits the new instance size.
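A quick way to see why the order matters: a reservation is paid for whether or not anything runs on it, so its real discount depends on how much of it you use. The sketch below is a deliberately simplified model in which unused reserved hours are wasted and used hours replace on-demand hours one for one. The 40% discount is a hypothetical figure inside the range quoted above.

```python
def effective_discount(reservation_discount, utilisation):
    """Discount versus on-demand once unused reserved hours are counted.

    utilisation is the fraction of reserved hours actually consumed (0-1].
    """
    if not 0 < utilisation <= 1:
        raise ValueError("utilisation must be in the range (0, 1]")
    cost_per_used_hour = (1 - reservation_discount) / utilisation
    return 1 - cost_per_used_hour


fully_used = effective_discount(0.40, 1.0)
half_used = effective_discount(0.40, 0.5)
break_even = effective_discount(0.40, 0.60)

print(f"fully used: {fully_used:.0%}")
print(f"half used: {half_used:.0%}")
print(f"60% used: {break_even:.0%}")
```

In this model, a reservation with a 40% discount stops saving money once utilisation falls below 60%, which is exactly what happens when the reserved size no longer matches what you run.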

The audit checklist I run on Azure

  1. Cost Management + Billing → break down by service and resource group, last 30 days
  2. Advisor → pull all cost recommendations; don't act blindly, but use as a triage list
  3. VMs → CPU/memory utilisation last 14 days; right-size or shut down idle
  4. Hybrid Benefit → check which VMs/SQL instances have it enabled; activate where missing
  5. Azure SQL → DTU/vCore utilisation; consider Serverless or Elastic Pool
  6. Storage accounts → enable Lifecycle Management policies on all accounts > 50 GB
  7. Log Analytics workspaces → set daily caps; cut diagnostic sources; reduce retention
  8. Reservations → check coverage; buy after right-sizing

Realistic numbers

Recent client, a 40-person SaaS running a mid-size Azure workload (~$18,000/month):

| Action | Monthly saving |
| --- | --- |
| Right-sized 12 oversized VMs (D-series → B-series or smaller D) | $1,800 |
| Activated Hybrid Benefit on 8 Windows VMs | $2,100 |
| Azure SQL Serverless on 5 dev databases | $900 |
| Storage Lifecycle Management on 3 accounts | $1,400 |
| Log Analytics: disabled 14 diagnostic sources, set daily caps | $2,600 |
| Deleted 2 idle VPN Gateways | $350 |
| 1-year Reserved Instances on right-sized prod VMs | $2,200 |

Total: $11,350/month saved, ~63% reduction. Timeline: 3 weeks.
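For anyone who wants to check the arithmetic or reuse it for their own audit, the total is just the sum of the line items against the ~$18,000 starting bill. The dictionary keys are shortened labels for the actions listed above.

```python
# Monthly savings per action, in USD, from the figures above.
monthly_savings = {
    "right-sized VMs": 1800,
    "Hybrid Benefit": 2100,
    "SQL Serverless": 900,
    "storage lifecycle": 1400,
    "Log Analytics": 2600,
    "VPN Gateways": 350,
    "Reserved Instances": 2200,
}

starting_bill = 18_000
total_saved = sum(monthly_savings.values())
reduction_pct = 100 * total_saved / starting_bill

print(f"${total_saved:,}/month saved, {reduction_pct:.0f}% reduction")
```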


If you want me to run this review on your Azure account on a pay-for-savings basis, book a call.