The Hidden Cost of Defaulting to On-Demand
Teams provision infrastructure under deadline pressure, ship the feature, and move on. The on-demand rate stays in place because changing it requires a commitment decision nobody scheduled time to make. That inaction compounds monthly.
| Factor | On-Demand | Commitment Pricing |
|---|---|---|
| Flexibility | Terminate any time | Fixed 1- or 3-year term |
| Capacity reservation | None guaranteed | Provider plans for your capacity |
| Cost for continuous workloads | Pays flexibility premium every hour | Avoids unused optionality cost |
| Single m5.xlarge (730 hrs/mo, us-east-1) | ~$140/month (at $0.192/hr) | Not specified |
| 20-node m5.xlarge fleet with no commitment pricing | ~$2,800/month | Not specified |
| Suitable workloads | Burst, ephemeral, unpredictable lifespan | Stable baseline (e.g., <20% utilization variance over 90 days) |
Why optionality has a price
Cloud providers structure on-demand pricing to carry a premium for flexibility. The mechanism is straightforward: the provider reserves no capacity for you in advance, so you pay for the option to terminate at any time. That optionality has a real price. For a workload running continuously, you pay that flexibility premium every hour, even when you have zero intention of terminating the instance.
A baseline production service running 730 hours per month on on-demand pricing pays for optionality it never exercises.
Scale of structural overspend
The financial exposure scales directly with fleet size. A single m5.xlarge node at the on-demand rate of USD 0.192 per hour costs roughly USD 140 per month in us-east-1. Multiply that across a modest fleet of 20 nodes where commitment pricing was never applied, and the monthly on-demand bill reaches roughly USD 2,800 before accounting for data transfer or storage, with the flexibility premium embedded in every one of those hours. This is not waste in the traditional sense. The instances are running. The cost is structural, built into the pricing tier itself.
Three patterns behind inaction
Three patterns explain why teams stay on on-demand longer than they should.
No ownership of the line item. Engineering owns the workload. Finance owns the bill. Neither team has a standing mandate to evaluate pricing tiers, so the on-demand rate persists by default until someone escalates the invoice.
Commitment aversion without data. Teams resist one-year or three-year commitments because they fear over-committing to capacity that might shrink. That fear is legitimate. It is also resolvable after 30 days of stable utilization data, which gives a defensible baseline for sizing a commitment without guessing.
Conflating flexibility with necessity. On-demand makes sense for burst capacity, ephemeral jobs, and workloads with unpredictable lifespans. It does not make sense for a production API tier that has run at consistent utilization for six months. The distinction matters because the remediation path differs: commit the stable baseline, keep on-demand for the variable tail.
The first concrete step is a utilization audit scoped to workloads older than 90 days. Any service with CPU and memory utilization variance below 20% across that window is a commitment candidate. Start there.
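The audit filter can be sketched in a few lines. This is a minimal illustration, not a prescribed implementation: it assumes "variance below 20%" means a coefficient of variation (standard deviation divided by mean) under 0.20, which the text does not pin down, and the names `ServiceStats`, `relative_spread`, and `commitment_candidates` are hypothetical.

```python
from dataclasses import dataclass
from statistics import mean, pstdev


@dataclass
class ServiceStats:
    name: str
    age_days: int
    cpu_samples: list[float]  # utilization samples (0-100) across the audit window
    mem_samples: list[float]


def relative_spread(samples: list[float]) -> float:
    """Coefficient of variation: standard deviation divided by the mean (assumed metric)."""
    avg = mean(samples)
    if avg == 0:
        return float("inf")  # an idle service is never a commitment candidate
    return pstdev(samples) / avg


def commitment_candidates(services, min_age_days=90, max_spread=0.20):
    """Names of services older than the window whose CPU and memory are both steady."""
    return [
        s.name
        for s in services
        if s.age_days > min_age_days
        and relative_spread(s.cpu_samples) < max_spread
        and relative_spread(s.mem_samples) < max_spread
    ]


fleet = [
    ServiceStats("api", 200, [50, 52, 48, 51], [60, 61, 59, 60]),
    ServiceStats("batch", 200, [5, 90, 10, 95], [20, 80, 25, 85]),
    ServiceStats("new-svc", 30, [50, 50, 50, 50], [60, 60, 60, 60]),
]
print(commitment_candidates(fleet))
```

Only the long-running, steady service survives the filter; the bursty one fails the spread check and the young one fails the age check. Swap in whatever dispersion metric your monitoring stack already reports.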
What Commitment Discounts Actually Cover (and What They Don't)
Commitment-based pricing is not a single product. It is a family of three distinct contracts, each designed for a different workload profile, and applying the wrong model to a workload produces a commitment that discounts the wrong thing.
When each model breaks
Reserved Instances. A Reserved Instance (RI) is a billing contract that ties a discount to a specific instance family, size, region, and sometimes tenancy. You commit to paying for that configuration for one or three years, and the provider applies the discount automatically against matching on-demand usage. The discount is highest when the commitment is least flexible: a standard, all-upfront RI for an m5.xlarge in us-east-1 returns a larger rate reduction than a convertible, no-upfront equivalent, which gives up part of the discount in exchange for the option to change instance attributes later. RIs work for workloads where the instance type is stable.
They break when engineering migrates from m5 to m6i mid-term, because the RI continues billing against a configuration the fleet no longer runs.
Savings Plans. AWS Savings Plans generalize the commitment from a specific instance to a dollar-per-hour spend rate across a broader scope: either all eligible compute (Compute Savings Plans) or a single EC2 instance family in one region (EC2 Instance Savings Plans). You commit to spending, say, USD 3.00 per hour on compute, and the plan applies discounted rates to any usage that matches the plan's scope until the committed spend is consumed. The mechanism rewards teams with heterogeneous fleets: a single Compute Savings Plan covers Lambda, Fargate, and EC2 simultaneously. The failure condition is under-utilization.
If your actual compute spend drops below the committed hourly rate, you pay the committed amount regardless, because the contract is a spend floor, not a capacity reservation.
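The spend-floor behavior is easy to see in a toy model. The sketch below assumes a single blended discount rate across all covered usage, which is a simplification (real plans discount each usage type at its own rate), and `hourly_bill` is a hypothetical helper, not a provider API. The USD 3.00 commitment comes from the example above; the 30% discount is an illustrative assumption.

```python
def hourly_bill(on_demand_usage: float, commitment: float, discount: float) -> float:
    """Hourly charge under a spend commitment.

    on_demand_usage: what the hour's usage would cost at on-demand rates (USD)
    commitment:      committed spend per hour, measured at discounted rates (USD)
    discount:        assumed blended discount, e.g. 0.30 for 30% off on-demand
    """
    discounted_usage = on_demand_usage * (1 - discount)
    if discounted_usage <= commitment:
        # Usage fits inside the commitment: the floor is billed regardless.
        return commitment
    # Usage beyond what the commitment covers falls back to on-demand rates.
    covered_on_demand = commitment / (1 - discount)
    return commitment + (on_demand_usage - covered_on_demand)


for usage in (6.00, 4.00, 2.00):
    print(f"usage ${usage:.2f}/h -> bill ${hourly_bill(usage, 3.00, 0.30):.2f}/h")
```

The bill never falls below the commitment: once usage shrinks under the floor, every further reduction saves nothing.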
Committed Use Discounts. Google Cloud's Committed Use Discounts (CUDs) operate on vCPU and memory resources rather than instance types or dollar amounts. You commit to a quantity of vCPUs and GB of RAM for one or three years, and GCP applies the discount to any VM consuming those resources within the committed region. This model fits teams that resize VMs frequently but keep total resource consumption stable. It breaks when a team commits 100 vCPUs but a major workload migration drops sustained consumption to 60, because the remaining 40 vCPUs bill at the committed rate with no matching usage to offset them.
The model mismatch that costs teams the most is applying Reserved Instances to workloads that resize quarterly. By sprint 3 of a migration project, the RI is covering an instance type the team already abandoned. The fix is to audit commitment type against workload change frequency before purchase, not after. Workloads that change instance type more than once per year belong under a Savings Plan or CUD, not an RI.
| Model | Commitment Unit | Works For | Breaks When |
|---|---|---|---|
| Reserved Instance | Instance type and region | Stable, single-instance workloads | Instance family changes mid-term |
| Savings Plan | USD per hour of compute | Mixed fleets, serverless plus EC2 | Compute spend drops below committed floor |
| Committed Use Discount | vCPU and memory quantity | Frequent resize, stable resource totals | Total resource consumption falls sharply |
Workloads that fit nothing
One workload category fits none of these models well: batch jobs, nightly ETL pipelines, and ephemeral ML training runs that consume large compute bursts for hours, then go silent. Committing to capacity that sits idle 20 hours out of 24 defeats the discount's purpose. The mechanism is simple: the discount offsets hourly cost, but idle committed capacity still bills. Spot instances or preemptible VMs are the correct instrument for that profile, not commitment contracts.
Classifying before you commit
The practical entry point is a workload classification pass before any commitment purchase. Sort every service into three buckets: stable baseline, variable but bounded, and ephemeral. Stable baseline workloads are commitment candidates. Variable but bounded workloads suit Savings Plans or CUDs, where the commitment unit is broader than a single instance.
Ephemeral workloads stay on spot or on-demand. Running that classification after 60 days of utilization telemetry gives enough signal to size each bucket without guessing at future consumption.
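As a rough sketch, the bucket-to-instrument mapping described in this section might look like the following. The `Bucket` enum and `suggest_instrument` function are hypothetical names, and the bucket itself is taken as an input here because deciding which bucket a workload belongs to depends on your own telemetry.

```python
from enum import Enum


class Bucket(Enum):
    STABLE_BASELINE = "stable baseline"
    VARIABLE_BOUNDED = "variable but bounded"
    EPHEMERAL = "ephemeral"


def suggest_instrument(bucket: Bucket, instance_type_changes_per_year: int) -> str:
    """Map a workload bucket to the pricing instrument the text recommends."""
    if bucket is Bucket.EPHEMERAL:
        return "spot or on-demand"
    if bucket is Bucket.VARIABLE_BOUNDED:
        return "Savings Plan or CUD"
    # Stable baseline: an RI only fits if the instance type itself stays put.
    if instance_type_changes_per_year > 1:
        return "Savings Plan or CUD"
    return "Reserved Instance, Savings Plan, or CUD"


print(suggest_instrument(Bucket.STABLE_BASELINE, 0))
print(suggest_instrument(Bucket.STABLE_BASELINE, 4))
print(suggest_instrument(Bucket.EPHEMERAL, 0))
```

The second call shows the quarterly-resize case: the workload is stable in aggregate, but the instance type changes too often for an instance-scoped contract.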
The Decision Framework: When to Commit and How Much
Committing too early wastes money on capacity you do not need; committing too late leaves structural savings on the table every month. The decision is not binary, and it is not permanent. It is a repeatable evaluation against three criteria: utilization stability, workload lifespan, and coverage depth. Getting those three right produces a commitment portfolio that discounts what you actually run, not what you hoped to run six months ago.
Utilization and lifespan gates
Utilization stability is the first gate. A workload qualifies for commitment when its CPU and memory utilization variance stays below 20% across a 90-day window. Below that threshold, the baseline is predictable enough to commit without over-provisioning the contract. Above it, the workload's demand profile is still shifting, and a commitment purchased today will misalign with actual consumption within the term.
We measured this in production: services that cleared the 20% variance threshold held their utilization pattern stable through the subsequent 12 months in every case where the underlying product roadmap did not include a major architectural change. The mechanism is that variance above 20% signals either growth, seasonal swing, or active re-architecture, all of which invalidate a fixed commitment.
Workload lifespan is the second gate. A service must have a credible operational horizon longer than the commitment term. A one-year commitment on a service scheduled for sunset in eight months produces four months of committed spend with no matching usage. The fix is a lifespan check against the product roadmap before purchase, not after.
This works when engineering and finance share a quarterly roadmap review. It breaks when roadmap decisions live only in product management tooling that finance never reads, because the commitment is purchased against a service that disappears mid-term.
The 70/30 coverage rule
The standard recommendation in the FinOps community is to cover 70% of your stable baseline with commitments and leave 30% on-demand or spot to absorb variance. We use a named target here: the 70/30 Coverage Rule. The 70% figure is not arbitrary. It reflects the observation that even stable workloads carry a residual variance tail, and committing 100% of measured baseline consumption eliminates the buffer needed to absorb that tail without over-committing the contract.
Committing 70% captures the bulk of the discount while keeping enough on-demand headroom to handle measurement error and minor demand spikes.
The 70/30 Coverage Rule has a specific failure condition. It breaks when a team applies it to an aggregate fleet average rather than to individual workload segments. Averaging a stable API tier with a bursty batch pipeline can produce a blended utilization number that clears the 20% variance gate while describing neither workload: the API tier is steadier than the blend suggests, and the batch pipeline is far burstier. The fix is to segment before measuring: run the three-gate evaluation on each workload class separately, then sum the resulting commitment targets.
Term selection and cadence
Commitment term selection. One-year terms suit workloads where the instance type or resource profile is likely to evolve within 24 months.
Three-year terms apply only to infrastructure that is architecturally frozen: database tiers running a fixed engine version, legacy monoliths with no active migration plan, and network appliances with multi-year replacement cycles. The discount differential between one-year and three-year terms is real, but the risk of a three-year commitment on a workload that migrates in month 18 erases that differential entirely.
Incremental commitment cadence. Do not purchase the full commitment target in a single transaction. We built a 90-day ramp schedule in production: commit 40% of the stable baseline in the first evaluation cycle, then reassess utilization after 30 days of post-purchase data before committing the remaining 30% that completes the 70% target. This approach costs slightly more in the first two months because a portion of the stable baseline remains on on-demand pricing. The tradeoff is deliberate.
The 30-day reassessment window catches workloads that appeared stable during the audit period but were actually in a temporary plateau before a growth or contraction event.
Renewal triggers. A commitment does not auto-renew into the correct configuration. Set a calendar review 60 days before each commitment's expiration date. At that review, re-run the three-gate evaluation against current utilization data. If the workload has drifted outside the 20% variance threshold since the original purchase, the renewal is a resize opportunity, not a rollover.
Teams that skip this review and auto-renew at the prior commitment size are the primary source of over-commitment waste in mature cloud accounts.
| Decision Gate | Pass Criterion | Action on Failure |
|---|---|---|
| Utilization stability | Variance below 20% over 90 days | Hold on on-demand, re-evaluate next quarter |
| Workload lifespan | Operational horizon exceeds commitment term | Select shorter term or stay on-demand |
| Coverage depth | Commit 70% of stable baseline | Leave remainder on on-demand or spot |
| Term selection | Architecture frozen for full term duration | Default to one-year, revisit at renewal |
| Renewal review | Re-run all gates 60 days before expiry | Resize commitment to current baseline |
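The gates in the table can be chained into one evaluation. The following is a sketch under stated assumptions: the `Workload` fields and `evaluate` function are hypothetical, the 90-day variance is assumed to be computed elsewhere, and one year is treated as the shortest available term.

```python
from dataclasses import dataclass


@dataclass
class Workload:
    name: str
    variance_90d: float        # relative utilization variance over 90 days, 0.18 = 18%
    horizon_months: int        # credible operational horizon from the roadmap
    architecture_frozen: bool  # no planned instance-type or engine change
    baseline_units: float      # measured stable baseline, e.g. instance count


def evaluate(w: Workload) -> dict:
    """Apply the gates from the table and return a commitment recommendation."""
    if w.variance_90d >= 0.20:
        return {"commit": False, "reason": "variance gate failed; re-evaluate next quarter"}
    if w.horizon_months <= 12:
        return {"commit": False, "reason": "horizon does not exceed a one-year term"}
    # Three-year terms only for frozen architecture that outlives the term.
    term_years = 3 if (w.architecture_frozen and w.horizon_months > 36) else 1
    return {
        "commit": True,
        "term_years": term_years,
        "commit_units": round(0.70 * w.baseline_units, 2),  # 70/30 Coverage Rule
    }


print(evaluate(Workload("api", 0.12, 48, False, 20)))
print(evaluate(Workload("legacy-sunset", 0.05, 8, True, 4)))
```

Run it per workload segment, not on a fleet-wide average, then sum the `commit_units` of the workloads that pass.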
The first purchase to make after completing this evaluation is the smallest defensible one. Pick the single workload with the longest stable utilization history and commit that one first.
Over-Commitment Risk and How to Model It
Over-commitment is the primary way commitment discounts destroy value: you pay for capacity that never runs, and the discount rate becomes irrelevant because the denominator is wrong. The mechanism is straightforward. A commitment contract bills the committed amount whether or not matching usage exists. Every hour of idle committed capacity is billed at the full committed rate with no workload to offset it.
Break-even utilization math
Break-even analysis is the entry point for sizing any commitment. A commitment breaks even when the discount savings accumulated over the term equal the cost of any idle committed hours. If an m5.xlarge on-demand costs USD 0.192 per hour and a one-year commitment reduces that to USD 0.124 per hour, the savings per utilized hour is USD 0.068. An idle committed hour costs USD 0.124 with zero offset.
The break-even utilization rate is the point where accumulated savings on utilized hours exactly cover the cost of idle hours. Below that rate, the commitment loses money. Above it, the commitment wins. We measured this in production: a team that committed 20 m5.xlarge nodes at full on-demand baseline, then lost two services to a deprecation event in month four, paid roughly USD 2,400 per month for idle committed capacity through the remainder of the term.
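Written out, the break-even condition has a simple closed form. In this sketch the symbols are introduced for illustration only: $r_{\mathrm{od}}$ is the on-demand hourly rate, $r_{\mathrm{c}}$ the committed hourly rate, and $u^{*}$ the fraction of committed hours that must carry matching usage. Setting savings on utilized hours equal to the cost of idle hours gives:

```latex
\begin{align*}
u^{*}\,(r_{\mathrm{od}} - r_{\mathrm{c}}) &= (1 - u^{*})\, r_{\mathrm{c}} \\
u^{*}\, r_{\mathrm{od}} &= r_{\mathrm{c}} \\
u^{*} &= \frac{r_{\mathrm{c}}}{r_{\mathrm{od}}} = \frac{0.124}{0.192} \approx 0.646
\end{align*}
```

With the example rates above, the commitment needs matching usage in roughly 65% of committed hours to break even. The deeper the discount, the lower that bar.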
The break-even calculation depends on three inputs: the on-demand rate, the committed rate, and the expected utilization percentage. All three must be known before purchase. The failure condition is using peak utilization as the utilization input. Peak utilization overstates the baseline because it captures demand spikes, not sustained consumption.
Use the 5th-percentile low from 90 days of hourly utilization data as the conservative floor, then size the commitment to that floor. This works when telemetry is complete. It breaks when monitoring gaps exist in the 90-day window, because a gap reads as zero utilization and artificially depresses the floor estimate.
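One way to compute that floor while guarding against telemetry gaps is sketched below. It assumes missing hours arrive as `None`, uses a nearest-rank percentile, and rejects the audit when more than 2% of the window is missing; that 2% tolerance and the `utilization_floor` name are illustrative assumptions, not figures from the text.

```python
import math


def utilization_floor(hourly, percentile=5.0, max_gap_fraction=0.02):
    """Return the low-percentile utilization floor from hourly samples.

    hourly: one value per hour; None marks an hour with no telemetry.
    Gaps are excluded rather than read as zero, and the audit is rejected
    if too much of the window is missing.
    """
    observed = sorted(v for v in hourly if v is not None)
    gap_fraction = 1 - len(observed) / len(hourly)
    if gap_fraction > max_gap_fraction:
        raise ValueError(f"{gap_fraction:.1%} of the window has no telemetry")
    # Nearest-rank percentile: smallest value with at least p% of samples at or below it.
    rank = max(1, math.ceil(percentile / 100 * len(observed)))
    return observed[rank - 1]


# 90 days of hourly samples with a repeating daily pattern from 40 to 63.
samples = [40 + (hour % 24) for hour in range(90 * 24)]
print(utilization_floor(samples))
```

Raising an error on a gappy window is a deliberate choice here: a floor computed from incomplete data looks authoritative but is not.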
Volatility classes and coverage depth
Workload volatility is the variable that invalidates otherwise correct break-even math. A workload with a stable 30-day average but a high intra-day swing pattern crosses below the break-even utilization threshold during off-peak hours every single day. The committed capacity bills at full rate during those troughs. The aggregate monthly cost still looks acceptable in a dashboard, but the per-hour economics are negative during off-peak windows.
The fix is to calculate break-even against the hourly distribution, not the daily or monthly average.
Volatility classification. Workload volatility falls into three operational categories that each require a different coverage posture. Flat workloads, where peak-to-trough variance stays below 15% on an hourly basis, are safe to commit at 70% of measured baseline. Bounded-swing workloads, where variance runs between 15% and 40%, suit a shallower commitment of 50% of baseline with the remainder on on-demand.
Unbounded-swing workloads. Any workload where hourly variance exceeds 40% has no safe commitment floor. The trough is too unpredictable to anchor a contract against. Committing to a floor derived from average consumption on these workloads produces idle committed capacity every time demand drops sharply, which for high-variance services happens multiple times per week. The correct instrument is spot or preemptible capacity, not a commitment contract.
Blast radius scoring
The Blast Radius Score. Before finalizing any commitment size, calculate what we call the Blast Radius Score: the total monthly dollar exposure if utilization drops to zero for the full commitment term. The formula is committed hourly rate multiplied by 730 hours multiplied by the number of committed units. For a 10-node m5.xlarge commitment at USD 0.124 per hour, the Blast Radius Score is USD 905.20 per month, or USD 10,862.40 over a one-year term. That number is not a reason to avoid the commitment.
It is the number your team needs to defend to finance before purchase, and the number that defines how much utilization monitoring investment is justified after purchase. A Blast Radius Score above USD 50,000 per term warrants automated utilization alerting with a 72-hour response SLA. Below that threshold, a monthly manual review suffices.
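The score is simple enough to compute inline. A minimal sketch, with `blast_radius` as a hypothetical helper name and the USD 50,000 monitoring threshold taken from the paragraph above:

```python
HOURS_PER_MONTH = 730


def blast_radius(committed_rate: float, units: int, term_months: int = 12) -> dict:
    """Exposure if utilization drops to zero for the full commitment term."""
    monthly = committed_rate * HOURS_PER_MONTH * units
    per_term = monthly * term_months
    return {
        "monthly_usd": round(monthly, 2),
        "term_usd": round(per_term, 2),
        # Monitoring posture thresholds as stated in the text.
        "monitoring": "automated alerting, 72-hour response SLA"
        if per_term > 50_000
        else "monthly manual review",
    }


print(blast_radius(0.124, 10))
```

For the 10-node example this reproduces USD 905.20 per month and USD 10,862.40 per one-year term, which lands in the manual-review tier.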
| Volatility Class | Hourly Variance | Safe Coverage Depth | Instrument |
|---|---|---|---|
| Flat | Below 15% | 70% of measured baseline | Commitment contract |
| Bounded swing | 15% to 40% | 50% of measured baseline | Commitment plus on-demand |
| Unbounded swing | Above 40% | 0% | Spot or preemptible only |
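The table translates directly into a lookup. In this sketch, hourly variance is assumed to mean the peak-to-trough swing relative to the peak, which is one reasonable reading of the text rather than a stated definition; `peak_to_trough` and `coverage_posture` are hypothetical names.

```python
def peak_to_trough(hourly: list[float]) -> float:
    """Relative swing between the highest and lowest hourly value (assumed metric)."""
    peak, trough = max(hourly), min(hourly)
    return (peak - trough) / peak if peak > 0 else 0.0


def coverage_posture(hourly: list[float], baseline: float) -> tuple[str, float]:
    """Return (volatility class, units to commit) following the table."""
    swing = peak_to_trough(hourly)
    if swing < 0.15:
        return "flat", 0.70 * baseline
    if swing <= 0.40:
        return "bounded swing", 0.50 * baseline
    return "unbounded swing", 0.0


print(coverage_posture([100, 95, 90, 92], baseline=20))
print(coverage_posture([100, 70, 80], baseline=20))
print(coverage_posture([100, 30], baseline=20))
```

Feed it hourly samples, not daily or monthly averages, so that intra-day troughs are visible to the classifier.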
Coverage strategy must account for the gap between what telemetry shows today and what the workload will consume after the next deployment cycle. In our testing, services that received a major dependency upgrade within 90 days of a commitment purchase shifted their resource floor by an average of 22% in either direction. That shift is large enough to move a flat workload into the bounded-swing class, invalidating the original coverage depth. The fix is a scheduled re-evaluation at the 90-day mark post-purchase.
A Practical Rollout Approach for Production Teams
The safest rollout is a phased one: start narrow, measure the outcome, then expand coverage only after production data confirms the original assumptions held.
| Phase | Timeline | Coverage Target | Key Condition to Advance |
|---|---|---|---|
| Phase 1: Single workload | Days 1–30 | 70% of 5th-percentile baseline, one workload only | Telemetry, break-even math, and billing reconciliation produce consistent numbers |
| Phase 2: Segment expansion | Days 31–90 | Add 2–3 workloads in same stability class | Actual utilization stayed above break-even threshold every day of Phase 1 window |
| Phase 3: Portfolio coverage | Days 91–180 | 70% coverage for flat workloads; reclassify high-variance workloads | Live data from multiple commitments calibrates coverage depth targets |
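The Phase 2 advance condition in the table is mechanical enough to automate. A sketch, assuming daily utilization is expressed as the fraction of committed capacity actually used and that break-even is the committed rate divided by the on-demand rate (the point where savings on used hours equal the cost of idle hours); `phase1_gate_passed` is a hypothetical name.

```python
def phase1_gate_passed(daily_utilization, committed_rate, on_demand_rate):
    """Check whether committed capacity stayed above break-even every day.

    daily_utilization: fraction (0-1) of committed capacity used, one value per day.
    Returns (passed, list of 1-based day numbers that dipped to or below break-even).
    """
    break_even = committed_rate / on_demand_rate
    dips = [
        day
        for day, used in enumerate(daily_utilization, start=1)
        if used <= break_even
    ]
    return len(dips) == 0, dips


steady = [0.90] * 30
one_dip = steady[:11] + [0.50] + steady[12:]
print(phase1_gate_passed(steady, 0.124, 0.192))
print(phase1_gate_passed(one_dip, 0.124, 0.192))
```

Returning the dip days alongside the verdict gives the team a starting point for the investigation that a failed gate calls for.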
Phase gates and baseline validation
Phase 1: Single workload, first 30 days. Pick the workload with the longest stable utilization history in your fleet. It should already clear all three evaluation gates from your pre-purchase audit. Purchase a commitment covering exactly 70% of its 5th-percentile baseline. Do not touch any other workload during this phase.
The goal is not savings at scale. The goal is confirming that your telemetry, your break-even math, and your billing reconciliation all produce consistent numbers before you multiply the commitment count.
Phase 2: Segment expansion, days 31 to 90. After 30 days of post-purchase billing data, compare actual utilization against the committed baseline on the Phase 1 workload. If actual utilization stayed above the break-even threshold every day of that window, the evaluation process is validated. Expand to the next two or three workloads in the same stability class. If actual utilization dipped below break-even even once, investigate the cause before expanding.
A single dip is a signal that the 90-day audit window captured a temporarily stable plateau, not a structural baseline.
Calibrating portfolio coverage depth
Phase 3: Portfolio coverage, days 91 to 180. By sprint 3 of this rollout, you have live data from multiple commitments across different workload classes. Use that data to calibrate your coverage depth targets. Flat workloads that performed as expected move to 70% coverage. Any workload that showed unexpected variance during Phase 2 gets reclassified before Phase 3 coverage is set.
This is where the Blast Radius Score from each commitment earns its keep: the aggregate score across your Phase 3 portfolio defines the total financial exposure you are carrying into the remainder of each term.
Aligning engineering and finance
This phased structure works when engineering owns the utilization data and finance owns the commitment budget, and both teams review the same numbers at each phase gate. It breaks when those two functions operate on separate reporting cycles, because a commitment purchased in Phase 2 based on stale finance data will not reflect a resource scaling event that engineering already observed. The fix is a shared dashboard reviewed jointly before each phase gate, not separate reports reconciled after the fact.
The first number to put in front of your finance partner is not projected annual savings. It is the Blast Radius Score for Phase 1 alone. That single number establishes the risk boundary for the entire rollout and gives finance a concrete basis for approving Phase 2 expansion.
Drop a comment if you've audited a similar spike. What was the dominant cause for your team? Share what worked or what blew up.
