AI Strategy in Reality · Insight

The Cost of Autonomy: Why Four Autonomy Levels Produce Four Different Business Cases

Four autonomy levels produce four cost structures. Most business cases price only the compute — which is why pilots look cheap and scale-ups blow the budget.

April 21, 2026
HandsOn Insights

The most expensive AI mistake in the enterprise right now is a pricing error in the business case. Companies buy autonomy without pricing it properly and treat it like a typical technology implementation. The organizational system that has to carry the deployed autonomy (governance, monitoring, accountability, capability) is left out of scope.

BCG’s 2026 AI Radar, based on a survey of 2,400 executives, puts numbers on the pattern. AI spending is expected to rise from 0.8% to 1.7% of revenues in 2026 — roughly a doubling across the sample. In the same dataset, 70% of the value from AI comes from people and process change; only 10% comes from algorithms themselves. If the budget is doubling and seven-tenths of the value lives in the organization, the organizational investment has to move in lockstep. In the companies we work with, it rarely does. That delta is what we call the Cost of Autonomy.

Companies buy autonomy without pricing it. The organizational system that has to carry it is nowhere to be seen in the business case.

The technology cost line is not the whole truth

When a CFO signs off on an AI business case today, the P&L impact model usually has three lines: licences, computational costs, and technical implementation. What is missing is the operating cost that appears once the system is live and takes on autonomy. That cost is structural, not incidental, and it scales non-linearly with the autonomy level at which the system runs.

BCG AI Radar 2026 · 2,400 Executives
70% / 10%
70% of AI’s value comes from people and process change. Only 10% comes from algorithms. AI spending doubles from 0.8% to 1.7% of revenues in 2026 — and the organizational investment has to move in lockstep.

A system that purely recommends is cheap to govern because a human makes every decision. A system that executes end-to-end inside a policy is expensive to govern because the organization has to build a monitoring, escalation, and recalibration architecture that is defensible under EU AI Act scrutiny and operationally embedded into everyday work. Article 14 of the EU AI Act is explicit: human oversight must be assigned to a person with the necessary competence, training and authority to intervene in or stop a high-risk AI system. That authority requires headcount, role design, training hours, governance forums, and a reporting line. None of those live in the infrastructure budget.

“Human oversight shall be assigned to a natural person who has the necessary competence, training and authority to effectively exercise oversight of the high-risk AI system.”

— Article 14, EU AI Act · Regulation (EU) 2024/1689

The failure mode is predictable. A Level 1 pilot delivers a proof point at a flattering cost. The business case for scale assumes the cost structure of the pilot. The system is then promoted to a higher autonomy level, because that is where the ROI actually sits, and the organizational load appears as incidents, stalled deployments, and the CFO question nobody can answer: why did the scaled system cost three times the business case? To answer that question in advance, you need a vocabulary for autonomy.

Four autonomy levels, four cost profiles

The HandsOn AI Operating Model defines the Human-AI Interface — the organizational architecture of decision-making and accountability when humans and AI share responsibility — as the core design object of an AI-enabled organization. The interface expresses itself at one of four autonomy levels. Every AI-enabled decision type in the enterprise runs at exactly one of them.

Level 1 · Foundation
Human-in-the-Loop
AI recommends; a human approves every output. Safe by design. Governance load low, capability load moderate, monitoring load negligible. Unscalable in high-volume processes.
Level 2 · Foundation
AI Decides, Human Reviews
AI executes inside a defined scope; a human reviews a sample and handles exceptions. Looks cheapest on paper. Carries the heaviest hidden cost in the model.
Level 3 · Activation
AI Decides, Human Notified
AI runs end-to-end inside policy limits. Humans monitor at system level. Named accountability (AI Owner, AI Steward, override authority) becomes load-bearing.
Level 4 · Activation
Human-in-the-Exception
AI orchestrates multi-step workflows. Humans set objectives; intervene only on exceptions. Heaviest governance load. EU AI Act documentation obligations scale here.

Each level forces a different cost structure. The error most organizations make is to build the technological infrastructure for Level 3 and the governance infrastructure for Level 1, and then wonder why things go wrong.

Level 2 is where the business case usually breaks

Levels 1, 3 and 4 force clarity on the organization. Level 1 has no autonomy to govern. Levels 3 and 4 are obviously autonomous: you cannot deploy them without sample sizes, monitoring thresholds, kill switches, and an escalation path. The risk is visible, so the design work happens.

Level 2 is the trap. It looks like oversight because a human is formally in the loop. The oversight only works if three things are written down and resourced: the sampling rate, the exception trigger, and the authority to override or retrain the model. Skip any one of them and the review becomes a rubber stamp. That is where most AI incidents in mid-sized organizations come from: AI that is nominally reviewed by humans who have neither the time nor the mandate to actually check and, if necessary, reject the output.

Deloitte Human Capital · 2025
62% / 5%
62% of executives say AI already influences the majority of their decisions. Only 5% report meaningful progress on decision-making governance. The gap sits almost entirely at Level 2 — the zone organizations assume is safe because someone is technically reviewing the output.

Article 14 of the EU AI Act does not treat decorative review as real either. Competence, training, and authority are the standard the regulator will measure against, and a queue of flagged items with no one authorized to reject them does not meet it. The cost consequence is specific. At Level 2, an organization pays for three parallel systems at the same time: the AI execution layer, the human review layer, and the interface that keeps them coupled. If the review layer is built at the capacity and skill level required to actually reject outputs, Level 2 is often more expensive per decision than Level 3 — because at Level 3 the humans have moved to system-level oversight and the per-decision labour has been engineered out.
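The per-decision comparison can be made concrete with a rough cost model. The sketch below is illustrative only: the function names, the cost components, and every number in the usage example are assumptions chosen for illustration, not figures from HandsOn engagements or from the surveys cited in this article.

```python
def level2_cost_per_decision(ai_cost, review_share, minutes_per_review, hourly_rate):
    """Per-decision cost when a human reviews a share of individual outputs.

    All parameters are hypothetical inputs: ai_cost is the execution cost per
    decision, review_share the fraction of outputs a human actually checks.
    """
    review_cost = review_share * (minutes_per_review / 60) * hourly_rate
    return ai_cost + review_cost


def level3_cost_per_decision(ai_cost, annual_oversight_cost, annual_decisions):
    """Per-decision cost when humans monitor at system level.

    Oversight is modelled as a fixed annual cost (monitoring, governance
    forum, named owners) spread across all decisions.
    """
    return ai_cost + annual_oversight_cost / annual_decisions


# Illustrative numbers only.
l2 = level2_cost_per_decision(ai_cost=0.05, review_share=0.10,
                              minutes_per_review=6, hourly_rate=60.0)
l3 = level3_cost_per_decision(ai_cost=0.05, annual_oversight_cost=150_000,
                              annual_decisions=500_000)
print(f"Level 2: {l2:.2f} per decision, Level 3: {l3:.2f} per decision")
```

Under these assumed inputs Level 2 comes out at roughly 0.65 per decision and Level 3 at roughly 0.35. The ordering depends on volume: at 100,000 decisions a year the fixed oversight cost dominates and Level 3 becomes the more expensive option in this model.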

The autonomy level that feels cautious at first is frequently the most expensive one to run

Why the review architecture has to be real, not decorative

Two things determine whether Level 2 oversight is real: the authority of the reviewer and the design of the exception trigger. Both are organizational and governance variables.

Authority is about decision rights. If the reviewer cannot overrule the model, request retraining, or suspend the system without a three-layer escalation, the review workflow is functioning as throughput management, not oversight. The HandsOn Decision Rights Registry treats this as a design artifact: for every AI-enabled decision type, there is a named authority, an evidence standard, and a recalibration trigger. A company that cannot produce this registry for its top five AI use cases does not have Level 2 oversight; it has a queue.
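A registry of this kind could be represented as a small data structure with a completeness check. This is a minimal sketch, not the HandsOn Decision Rights Registry itself: the class, the field names, and the example entries are hypothetical, and only the three required elements (named authority, evidence standard, recalibration trigger) come from the text above.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class DecisionRightsEntry:
    """One registry row per AI-enabled decision type (field names are assumed)."""
    decision_type: str
    named_authority: Optional[str] = None        # person who can override or suspend
    evidence_standard: Optional[str] = None      # what justifies an override
    recalibration_trigger: Optional[str] = None  # condition that forces a model review

    def missing_fields(self) -> list[str]:
        """Return the names of the required fields that are empty."""
        required = {
            "named_authority": self.named_authority,
            "evidence_standard": self.evidence_standard,
            "recalibration_trigger": self.recalibration_trigger,
        }
        return [name for name, value in required.items() if not value]


def registry_gaps(entries: list[DecisionRightsEntry]) -> dict[str, list[str]]:
    """Map each incomplete decision type to the fields it still lacks."""
    return {e.decision_type: e.missing_fields() for e in entries if e.missing_fields()}


# Hypothetical entries for illustration.
registry = [
    DecisionRightsEntry("credit-limit increase", "Head of Credit Risk",
                        "two documented mismatches against policy",
                        "override rate above 3% in a month"),
    DecisionRightsEntry("invoice matching", named_authority="AP Team Lead"),
]
print(registry_gaps(registry))
```

In this example the second entry names an authority but has no evidence standard or recalibration trigger, so it would be reported as a gap.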

The exception trigger is the second variable. A review gate that flags 1% of cases and is resourced to review every flagged case is expensive but functional. A review gate that flags 20% of cases and is resourced to review only 5% of total volume is a compliance exposure, because 15% of all outputs are flagged yet pass through unreviewed while the organization tells itself a human is in the loop. The trigger rate, the resourcing, and the statistical sampling design all have to be published, monitored, and adjusted as model performance drifts. That work is invisible in most business cases. It is also the single largest driver of Level 2 run-rate cost in our engagements.
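The arithmetic behind the two review gates can be checked in a few lines. The sketch below reads both percentages as shares of total decision volume, which is an assumption about the example above; the function name and return keys are hypothetical.

```python
def review_coverage(flag_rate: float, review_capacity: float) -> dict[str, float]:
    """Compare the share of outputs flagged with the share humans can review.

    Both arguments are fractions of total decision volume (an assumption of
    this sketch; the source text does not fix the base).
    """
    if not (0 <= flag_rate <= 1 and 0 <= review_capacity <= 1):
        raise ValueError("rates must be fractions between 0 and 1")
    reviewed = min(flag_rate, review_capacity)
    return {
        "flagged": flag_rate,
        "reviewed": reviewed,
        "flagged_but_unreviewed": flag_rate - reviewed,
    }


functional = review_coverage(flag_rate=0.01, review_capacity=0.01)
exposed = review_coverage(flag_rate=0.20, review_capacity=0.05)
print(f"{exposed['flagged_but_unreviewed']:.0%} of outputs flagged but never reviewed")
```

Tracking the flagged-but-unreviewed share over time is one simple way to see whether resourcing is keeping pace with the trigger rate as model performance drifts.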

McKinsey’s State of AI data indicates that roughly 80% of AI-using organizations have not redesigned a single workflow around their AI deployments. That is a stark way of saying the same thing: the review architecture exists in slideware, not in the operating model. As long as that is true, the Cost of Autonomy will show up as incidents, rework, and regulatory exposure rather than as a line in the business case.

How to price autonomy before you buy it

The practical question is what a COO or CFO should actually do in the next thirty days. Three decisions turn the Cost of Autonomy from a post-hoc surprise into a priced line item.

Decision 1
Classify every AI system by target autonomy level
Not the level it runs at today. The level required to deliver the business case. A forecasting system with a human approving every output is Level 1; if the ROI depends on approving 10,000 outputs a week, the real target is Level 2 or 3. This exercise alone reliably surfaces 5–15 systems per mid-sized enterprise that are planned at one level and budgeted at another.
Decision 2
Price the organizational load per level
For each target level, estimate governance, monitoring, capability, and accountability cost as explicit line items: review headcount at Level 2, dashboard and drift tooling at Level 3, exception engineering at Level 4, and training programs calibrated to the target level rather than generic AI literacy. The exercise forces the conversation out of the compute budget and into the operating budget where the real cost lives.
Decision 3
Install classification governance
Who is authorized to advance a system from one autonomy level to the next, and under what evidence standard? The most frequently skipped step, and where most regulatory exposure gets built. A Level 2 system that quietly becomes Level 3 without a governance review is a non-compliance event waiting to be discovered. One page is usually enough — but it has to name an authority, an evidence threshold, and a cadence.
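Decisions 1 and 3 above can be sketched as a portfolio check and a promotion gate. This is an illustration under stated assumptions: the `AISystem` fields, the numeric evidence score, the threshold, and the example names are all hypothetical stand-ins for whatever classification scheme and evidence standard an organization actually adopts.

```python
from dataclasses import dataclass


@dataclass
class AISystem:
    name: str
    current_level: int    # autonomy level the system runs at today
    target_level: int     # level the business case depends on
    budgeted_level: int   # level the organizational budget was priced for


def mispriced(portfolio: list[AISystem]) -> list[str]:
    """Decision 1: systems whose target level exceeds the budgeted level."""
    return [s.name for s in portfolio if s.target_level > s.budgeted_level]


def may_advance(system: AISystem, approver: str, authorized: set[str],
                evidence_score: float, threshold: float) -> bool:
    """Decision 3: allow a one-level promotion only with a named authority
    and evidence at or above the agreed threshold (both hypothetical)."""
    return (system.current_level < 4
            and approver in authorized
            and evidence_score >= threshold)


# Hypothetical portfolio for illustration.
portfolio = [
    AISystem("demand forecast", current_level=1, target_level=3, budgeted_level=1),
    AISystem("invoice matching", current_level=2, target_level=2, budgeted_level=2),
]
authorized = {"AI Governance Board"}

print(mispriced(portfolio))
print(may_advance(portfolio[0], "AI Governance Board", authorized,
                  evidence_score=0.97, threshold=0.95))
print(may_advance(portfolio[0], "Project Lead", authorized,
                  evidence_score=0.99, threshold=0.95))
```

The gate deliberately rejects a promotion requested by someone outside the authorized set even when the evidence is strong, which mirrors the point that a quiet move from Level 2 to Level 3 without a governance review should not be possible.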

These three decisions are within reach of any Vorstand or executive committee this quarter. None of them require a new tool stack. All three fail politely if the organization has not already decided who is accountable for AI outcomes at the board level — which is, separately, the precondition for any of this to work.

The Cost of Autonomy is a line item, not a surprise

The Cost of Autonomy decides whether your AI portfolio earns its business case or embarrasses it. Four autonomy levels; four organizational cost structures. A pilot priced at Level 1 and scaled to Level 2 or 3 without rebuilding the cost model is the single most common cause of AI business cases that look strong on paper and fail in production.

If you run an AI portfolio: take the five largest initiatives, classify each by target autonomy level, and ask your CFO to price the organizational load per level. If the full cost comes out at less than twice the compute budget, the exercise is incomplete. If you are at Vorstand or board level: put the three decisions above on the agenda (classification, organizational load pricing, classification governance). None of them require new technology. All of them are cheaper to take now than after the first incident.
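The 2x rule of thumb can be checked with simple arithmetic. The sketch below interprets it as full cost divided by compute budget, which is one reading of the heuristic; the four line items follow the categories named in Decision 2, and all amounts are invented for illustration.

```python
def full_cost_multiple(compute_budget: float, organizational_load: dict[str, float]) -> float:
    """Full cost (compute plus organizational line items) as a multiple of compute."""
    if compute_budget <= 0:
        raise ValueError("compute_budget must be positive")
    return (compute_budget + sum(organizational_load.values())) / compute_budget


# Hypothetical annual figures for one initiative.
load = {"governance": 120_000, "monitoring": 90_000,
        "capability": 60_000, "accountability": 30_000}
multiple = full_cost_multiple(compute_budget=200_000, organizational_load=load)
print(f"Full cost is {multiple:.1f}x the compute budget")
if multiple < 2:
    print("Below 2x: revisit the organizational line items")
```

With these assumed figures the organizational load adds 300,000 to a 200,000 compute budget, giving a multiple of 2.5.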

HandsOn · AI Maturity Assessment

What autonomy level is your next AI check going into?

The HandsOn AI Maturity Assessment maps your AI portfolio against the four autonomy levels — classification, organizational load pricing, and classification governance — and returns a board-ready view of where the Cost of Autonomy is unpriced in your business case.
