Cloud & DevOps · 7 min read

Cloud cost optimization: a practical framework

The average enterprise wastes roughly 30% of its cloud spend — not because the cloud is expensive, but because cost optimization is treated as an afterthought rather than an engineering discipline.
Pricing Models

Choose the right model for each workload

No single pricing model fits everything. The art is matching each workload to the right commitment level.

On-Demand

Best for: Unpredictable workloads, dev/test environments

Savings: 0%

Risk: Highest per-unit cost if used for steady-state


Reserved Instances

Best for: Predictable, always-on workloads

Savings: Up to 72%

Risk: Locked in for 1-3 years — forecast must be accurate


Savings Plans

Best for: Flexible commitment across instance families

Savings: Up to 66%

Risk: Still a commitment — need usage baselines


Spot / Preemptible

Best for: Batch processing, CI/CD runners, fault-tolerant jobs

Savings: Up to 90%

Risk: Can be interrupted with as little as two minutes' notice


Where Money Leaks

Top 5 sources of cloud waste

35% · Idle instances: running 24/7 but utilized less than 10%. Schedule or rightsize.
25% · Oversized resources: provisioned for peak, running at 15% average. Rightsize to actual usage.
15% · Unattached storage: EBS volumes, snapshots, and disks with no active attachment.
15% · Forgotten environments: dev/staging environments left running over weekends and holidays.
10% · Unoptimized data transfer: cross-region and cross-AZ traffic that could be colocated.
The Framework

Six steps to sustainable cloud cost management

01

Visibility

Tag everything. Know which team, project, and environment every dollar maps to.

02

Right-Sizing

Match instance types to actual CPU, memory, and I/O usage — not guesswork.

03

Scheduling

Auto-stop dev/staging environments outside working hours. Save up to 65% on non-prod.

04

Commitment

Use reserved instances or savings plans for steady-state workloads after 3 months of baselines.

05

Automation

Set up automated policies: scale-down, snapshot cleanup, orphan resource deletion.

06

Culture

Make cost a first-class engineering metric. Show spend on team dashboards alongside latency and uptime.
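The scheduling step's savings claim is easy to sanity-check with arithmetic, assuming non-prod only needs to run during business hours (the 12-hour weekday window here is an illustrative assumption):

```python
# Sanity check for the ~65% non-prod savings claim from the scheduling step.
# Assumption: dev/staging only needs to run 12 hours/day, Monday to Friday.
HOURS_PER_WEEK = 24 * 7   # 168 always-on hours
BUSINESS_HOURS = 12 * 5   # 60 running hours (e.g. 07:00-19:00, Mon-Fri)

savings = 1 - BUSINESS_HOURS / HOURS_PER_WEEK
print(f"Non-prod savings from scheduling: {savings:.0%}")  # ~64%
```

Tightening the window further (e.g. 10 hours/day) pushes the figure past 70%, which is why scheduling is usually the fastest win on non-production spend.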

Avoid These

Six cost optimization mistakes we see repeatedly

Optimizing before you have visibility — you can't cut what you can't see
Buying reserved instances in month one — wait for 3 months of usage data
Ignoring data transfer costs — they can be 20-30% of your bill
Treating cost optimization as a one-time project instead of ongoing practice
Centralizing all cost decisions in finance — engineers make the spending choices
Chasing the cheapest option instead of the best cost-per-outcome
Tagging Strategy

Building a cost allocation tagging strategy that actually works

Tagging is the foundation of cloud cost visibility. Without consistent, enforced tags, your cost reports are noise. Here is a tagging strategy that scales from 10 to 10,000 resources.

Start with four mandatory tags on every resource: team (who owns it), environment (production, staging, development), project (which business initiative it supports), and cost-center (which budget line absorbs the cost). These four tags answer the questions that finance, engineering leadership, and platform teams ask most frequently: who is spending, on what, and why.

Enforce tagging at provisioning time, not after the fact. In Terraform, use validation rules that reject resource creation without required tags. In AWS, deploy Service Control Policies that deny resource creation when mandatory tags are missing. In Azure, use Azure Policy to enforce tagging at the subscription or management group level. The key principle is that an untagged resource should be impossible to create, not something you chase down during monthly cost reviews.
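The provisioning-time check described above can be sketched in a few lines. This is a minimal illustration using the four mandatory tags from this article; the function name, error format, and environment values are assumptions, and in practice the same rule would live in a Terraform validation block or an admission policy:

```python
# Minimal sketch of a provisioning-time tag check. The mandatory tag set
# comes from the article; everything else here is illustrative.
MANDATORY_TAGS = {"team", "environment", "project", "cost-center"}
VALID_ENVIRONMENTS = {"production", "staging", "development"}

def validate_tags(tags: dict[str, str]) -> list[str]:
    """Return a list of violations; an empty list means creation may proceed."""
    errors = [f"missing mandatory tag: {key}"
              for key in sorted(MANDATORY_TAGS - tags.keys())]
    env = tags.get("environment")
    if env is not None and env not in VALID_ENVIRONMENTS:
        errors.append(f"invalid environment: {env!r}")
    return errors

# Usage: reject resource creation whenever the list is non-empty.
print(validate_tags({"team": "platform", "environment": "prod"}))
```

Because the check runs before creation, an untagged resource never exists to be chased down later — the property the paragraph above calls the key principle.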

Beyond the four mandatory tags, add contextual tags that support automation. A schedule tag indicating business-hours-only or always-on enables automated start/stop policies. An expiry tag on temporary resources triggers automated cleanup. A data-classification tag (public, internal, confidential, restricted) helps security teams identify high-value targets. Build tag governance into your platform team's responsibilities — publish a tag dictionary, audit compliance weekly, and report tag coverage as a platform health metric.
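The expiry-tag cleanup mentioned above reduces to a small predicate. This sketch assumes an ISO-8601 date in an `expiry` tag; the function name and tag format are illustrative:

```python
# Sketch of expiry-tag cleanup logic. Assumes temporary resources carry an
# "expiry" tag holding an ISO-8601 date, as described in the text.
from datetime import date

def is_expired(tags: dict[str, str], today: date) -> bool:
    """A resource whose expiry tag is in the past is a cleanup candidate."""
    expiry = tags.get("expiry")
    if expiry is None:
        return False  # no expiry tag: not a temporary resource
    return date.fromisoformat(expiry) < today

print(is_expired({"expiry": "2024-01-31"}, today=date(2024, 6, 1)))  # True
```

A nightly job that filters the resource inventory through this predicate and deletes (or flags) the matches is usually enough to keep temporary resources from becoming permanent.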

Spot Patterns

Spot instance patterns for batch and fault-tolerant workloads

Spot instances offer up to 90% savings, but using them effectively requires architectural patterns that handle interruption gracefully. These patterns work for CI/CD, data pipelines, and batch processing.

Diversified fleet strategy

Never depend on a single instance type for spot capacity. Spot pricing and availability vary independently across instance types and availability zones. Configure your auto-scaling groups or managed instance groups to bid across at least four to six instance types with similar compute profiles. On AWS, use Spot Fleet or EC2 Fleet with the capacity-optimized allocation strategy, which automatically selects from instance pools with the highest available capacity, reducing interruption frequency. On GCP, use managed instance groups with preemptible VMs spread across multiple zones.
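The diversification idea can be sketched as an EC2 Fleet-style request. The field names follow the shape of the AWS `CreateFleet` API (as passed to `boto3`'s `create_fleet`), but the instance types, launch template name, and capacity numbers are illustrative — verify the exact schema against current AWS documentation before use:

```python
# Sketch of a diversified spot request in the shape of the EC2 CreateFleet
# API. Five instance types x three zones = 15 capacity pools to draw from,
# with the capacity-optimized strategy picking the deepest pools.
INSTANCE_TYPES = ["m5.xlarge", "m5a.xlarge", "m6i.xlarge", "m6a.xlarge", "r5.xlarge"]
ZONES = ["us-east-1a", "us-east-1b", "us-east-1c"]

request = {
    "SpotOptions": {"AllocationStrategy": "capacity-optimized"},
    "LaunchTemplateConfigs": [{
        "LaunchTemplateSpecification": {
            "LaunchTemplateName": "batch-workers",  # hypothetical template
            "Version": "$Latest",
        },
        # One override per (instance type, zone) pair.
        "Overrides": [
            {"InstanceType": it, "AvailabilityZone": az}
            for it in INSTANCE_TYPES for az in ZONES
        ],
    }],
    "TargetCapacitySpecification": {
        "TotalTargetCapacity": 20,
        "DefaultTargetCapacityType": "spot",
    },
    "Type": "maintain",
}
print(len(request["LaunchTemplateConfigs"][0]["Overrides"]))  # 15 capacity pools
```

The point is the breadth: with 15 pools, a price spike or capacity crunch in any one pool rarely interrupts the whole fleet.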

Checkpointing for long-running jobs

For batch jobs that run longer than two hours, implement checkpointing — periodic saves of intermediate state to durable storage. When a spot interruption occurs (you get a two-minute warning on AWS, 30 seconds on GCP), the job saves its current state and terminates gracefully. When capacity returns, a new instance picks up from the last checkpoint rather than restarting from scratch. Apache Spark, TensorFlow, and most data processing frameworks support native checkpointing. For custom batch jobs, write state to S3 or GCS at regular intervals and implement resume logic in your job orchestrator.
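The checkpoint/resume loop for a custom batch job might look like the sketch below. The work unit, checkpoint path, and signal-based interruption hook are all illustrative — real spot handling polls the cloud metadata endpoint for the interruption notice:

```python
# Sketch of the checkpoint/resume pattern for an interruptible batch job.
# process() stands in for the real work; the checkpoint would normally
# live in S3 or GCS rather than a local file.
import json
import pathlib
import signal

def process(item: int) -> None:
    """Placeholder for the real unit of work (one file, one partition, ...)."""

CHECKPOINT = pathlib.Path("/tmp/batch-job-checkpoint.json")
TOTAL_ITEMS = 1_000
interrupted = False

def on_spot_warning(signum, frame):
    # Illustrative: production code reacts to the metadata endpoint's
    # two-minute (AWS) or 30-second (GCP) interruption notice.
    global interrupted
    interrupted = True

signal.signal(signal.SIGTERM, on_spot_warning)

# Resume from the last checkpoint if one exists, otherwise start fresh.
start = json.loads(CHECKPOINT.read_text())["next_item"] if CHECKPOINT.exists() else 0

for item in range(start, TOTAL_ITEMS):
    process(item)
    if item % 100 == 0 or interrupted:
        CHECKPOINT.write_text(json.dumps({"next_item": item + 1}))
    if interrupted:
        break  # terminate gracefully; a replacement instance resumes from here
```

The essential property is that the job is restartable at any checkpoint boundary, so an interruption costs at most one interval of rework rather than the whole run.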

Hybrid on-demand and spot for CI/CD

CI/CD runners are ideal spot candidates because builds are short-lived and idempotent. Run a small on-demand baseline (enough for your team's minimum daily workload) and burst into spot instances during peak hours. Tools like Karpenter for Kubernetes or custom Lambda-triggered scaling can provision spot runners in response to queue depth. If spot capacity is unavailable, fall back to on-demand for critical pipeline runs. This hybrid approach typically achieves 60 to 70 percent savings on CI/CD infrastructure without impacting developer velocity.
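The fallback decision described above is simple enough to state as a function. This is a hedged sketch of the policy only — the function name and inputs are hypothetical, and in practice the decision lives inside your autoscaler configuration (e.g. Karpenter node pools with weighted capacity types):

```python
# Sketch of the spot-first, on-demand-fallback decision for CI runners.
# Inputs and name are illustrative; an autoscaler usually encodes this
# policy declaratively rather than in application code.
def provision_runner(queue_depth: int, spot_available: bool, critical: bool) -> str:
    """Decide which capacity type the next CI runner should use."""
    if critical and not spot_available:
        return "on-demand"  # never block critical pipelines on spot capacity
    if spot_available:
        return "spot"       # default: cheapest capacity for idempotent builds
    return "on-demand" if queue_depth > 0 else "none"

print(provision_runner(queue_depth=5, spot_available=True, critical=False))  # spot
```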

Right-Sizing

Container right-sizing methodology

Kubernetes makes it easy to over-provision. Most teams set resource requests and limits once during initial deployment and never revisit them. Here is a systematic approach to right-sizing containers.

Start by collecting at least two weeks of actual resource utilization data using your monitoring stack — Prometheus with cAdvisor metrics, Datadog, or your cloud provider's container insights. Focus on four metrics per container: CPU usage (average and P99), memory usage (average and peak), CPU throttling percentage, and OOM kill frequency. These four data points tell you whether a container is over-provisioned, under-provisioned, or correctly sized.

Set CPU requests to the P95 usage value and CPU limits to two to three times the request. This gives containers enough headroom for traffic spikes without reserving capacity they rarely use. For memory, set requests to the P99 usage value plus a 10 to 15 percent buffer, and set limits equal to requests. Memory is incompressible — when a container exceeds its memory limit, it gets OOM-killed, which is worse than a brief CPU throttle. Conservative memory limits combined with generous CPU limits are the right trade-off for most workloads.
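The sizing rules above translate directly into a small calculation. The sample data below is illustrative (ten condensed utilization points with one spike), and the 2x CPU limit and 15% memory buffer are the choices from the text:

```python
# Sketch of the sizing rules from the text: CPU request at P95 with a 2x
# limit, memory request at P99 plus a 15% buffer with limit == request.
import statistics

def percentile(samples: list[float], p: int) -> float:
    """Pth percentile via interpolation over the observed samples."""
    return statistics.quantiles(samples, n=100, method="inclusive")[p - 1]

def recommend(cpu_millicores: list[float], memory_mib: list[float]) -> dict[str, int]:
    cpu_request = percentile(cpu_millicores, 95)
    mem_request = percentile(memory_mib, 99) * 1.15  # incompressible: add a buffer
    return {
        "cpu_request_m": round(cpu_request),
        "cpu_limit_m": round(cpu_request) * 2,    # headroom for traffic spikes
        "memory_request_mi": round(mem_request),
        "memory_limit_mi": round(mem_request),    # limit == request for memory
    }

# Illustrative utilization samples, condensed to ten points with one spike.
usage_cpu = [120, 150, 180, 200, 240, 260, 300, 340, 400, 900]
usage_mem = [512, 520, 530, 540, 555, 560, 570, 580, 590, 600]
print(recommend(usage_cpu, usage_mem))
```

Note how the percentile approach keeps the one CPU spike from inflating the request: the P95 sits well below the 900 m outlier, while memory, where an overshoot means an OOM kill, gets sized above the largest observed value.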

Automate this process using the Kubernetes Vertical Pod Autoscaler (VPA) in recommendation mode. VPA analyzes historical usage and suggests optimal requests and limits without automatically applying them. Review VPA recommendations weekly as part of your cost optimization cadence. For organizations managing hundreds of services, tools like Kubecost, StormForge, or Robusta provide fleet-wide right-sizing recommendations with estimated savings. The typical result of a first right-sizing pass is a 30 to 50 percent reduction in requested resources, which directly translates to fewer nodes and lower compute costs.

FinOps Practice

Building a FinOps team structure

Cloud cost optimization is not a one-time project — it is a practice that requires dedicated ownership. Here is how to structure a FinOps function that scales with your organization.

At its core, FinOps is a cultural practice that brings together engineering, finance, and business leadership to make informed spending decisions. The FinOps Foundation defines three personas: the FinOps practitioner who drives the practice day-to-day, the engineering teams who make the technical decisions that affect cost, and the finance and procurement teams who manage budgets and commitments. In smaller organizations (under 50 engineers), FinOps responsibility often sits with a senior platform engineer who dedicates 20 to 30 percent of their time to cost visibility and optimization. In larger organizations, a dedicated FinOps team of two to four people manages the practice full-time.

Regardless of team size, establish three recurring ceremonies. First, a weekly cost review where the FinOps practitioner reviews the past week's spending against forecast, identifies anomalies, and flags optimization opportunities. Second, a monthly cost allocation review where engineering leaders verify that their team's spending aligns with business priorities and committed budgets. Third, a quarterly commitment review where finance and engineering jointly evaluate reserved instance and savings plan coverage, assess upcoming capacity needs, and make purchasing decisions. These ceremonies create the feedback loop that prevents cloud spending from drifting.

The most effective FinOps teams operate on a show-back model before moving to charge-back. Show-back means every team sees their cloud costs attributed to them on dashboards and in monthly reports, but they are not formally billed for overages. This builds cost awareness without creating adversarial dynamics. Once teams are comfortable interpreting their cost data and taking action on optimization recommendations, transition to charge-back where each team's cloud spend is deducted from their budget. The transition typically takes six to twelve months and dramatically changes how engineers think about resource provisioning.

Spending more than you should on cloud?

We help organizations right-size infrastructure, implement cost automation, and build FinOps practices. Let's look at your cloud bill together.