
Why is the AI Revolution Stalling? Blame the Agentic Last Mile

Robert Cesafsky, Chief Operating Officer at Certinia
May 6 · 7 Minute Read

Summary

In boardrooms I've sat in over the past year, the conversation around AI has quietly shifted. We've stopped asking "What’s possible?" and started asking something far more worrisome: "Where the heck is the ROI?"

The data is expensive and unambiguous. According to Gartner, worldwide AI spending is forecast to reach $2.5 trillion this year. And yet enterprises across every sector are mired in what analysts call the "Trough of Disillusionment." Capital is pouring in. Value keeps leaking out. The gap between the two has become the defining business problem of our era.

A staggering volume of AI investment is yielding frustratingly little return — and the root cause sits at the layer where agents actually have to perform inside a real enterprise. That gap has a name: the Agentic Last Mile.

The $13 Billion Hallucination

The first wave of enterprise AI followed a horizontal logic: take a large, generic model, attach it to a specialized business process, and expect transformation. Finance, resource management, legal compliance — the same approach applied everywhere.

The bill for that strategy is now due. A 2025 study by AllAboutAI found that between 2023 and 2025, companies spent $12.8 billion specifically to address hallucination problems — and in my view, that figure is conservative by a factor of four or five. Models operating without domain context made consequential mistakes in revenue forecasting, compliance reporting, and logistics — the kind of systematic failures that accumulate quietly until they're impossible to ignore.

The broader picture reinforces this. An MIT report found that 95% of generative AI pilots failed to deliver measurable bottom-line impact, and McKinsey found that only 5.5% of organizations qualify as "high performers" seeing real EBIT lift. The pattern is consistent enough that it has a name: pilot purgatory. Execution, specifically at the layer where agents have to perform inside a live enterprise, is where most implementations run out of road.

The Trap of the 85% Generalist

The failure rate starts to make sense when you understand what generic AI actually is. These models are probabilistic engines trained on open internet data to predict the next likely word or output. Extraordinarily broad in knowledge, but genuinely limited in depth. Brilliant generalists, built for breadth rather than precision.

In professional services or finance, an AI that is right 85% of the time functions more as a liability than a productivity tool. That remaining 15% is exactly what the Agentic Last Mile represents: the domain-specific context, institutional rules, and operational metadata that separate a plausible answer from a correct one.

The verification cost of that gap is tangible. A 2026 Foxit report found that the time employees spend reviewing and correcting AI outputs translates to roughly $12,400 per employee, per year in lost productivity, a sort of "verification tax" that quietly offsets most of the efficiency gains AI was supposed to deliver. When senior consultants and controllers are manually auditing AI-generated project margins and revenue recognition schedules, the automation promise has already unraveled.




Why "Agentic" Changes the Question

Building an AI prototype is fast and cheap. Getting that agent to function reliably inside a real enterprise with compliance obligations, privacy requirements, scale constraints, and institutional complexity is an entirely different undertaking. That transition is the last mile.

Think of the enterprise-ready agent as the top of a pyramid, rather than a standalone product. What sits underneath determines whether the agent at the peak actually works.

The foundation is data and context: structured customer records, unstructured information, and what I'd call context engineering — the deliberate work of shaping inputs so the agent understands relationships and intent, not just raw facts. Above that sits a layer many overlook entirely: metadata and telemetry. Most organizations recognize the value of their data. Far fewer appreciate that metadata (the information about their data) is what gives an agent the context to act with precision.

The next layer is what makes action possible: application logic and APIs. This is the operational connective tissue, the system's hands and legs. Without it, the agent can reason but cannot execute. At the center of everything is the reasoning layer, the brain that determines which specific action to take at which moment. LLMs make statistically likely guesses; enterprise execution demands consistent, auditable outputs tied to specific rules and logic. The reasoning layer is what provides that determinism, routing the right instruction to the right system at the right time with a reliability that probabilistic inference alone cannot offer.
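To make the idea of a deterministic reasoning layer concrete, here is a minimal sketch. All names in it (ActionRouter, the "psa" system, the intent strings) are hypothetical illustrations, not a real product API: a model proposes an intent, and a rule-bound router either maps it to one specific, auditable action or refuses to act.

```python
# Hypothetical sketch: a deterministic routing layer sitting between a
# probabilistic model and enterprise systems. Names are illustrative only.
from dataclasses import dataclass


@dataclass
class Action:
    system: str      # which enterprise system receives the instruction
    operation: str   # the specific, auditable operation to run
    params: dict


class ActionRouter:
    """Maps model-proposed intents to specific, rule-bound actions."""

    def __init__(self):
        self._rules = {}

    def register(self, intent: str, system: str, operation: str):
        self._rules[intent] = (system, operation)

    def route(self, intent: str, params: dict) -> Action:
        # Unknown intents are rejected rather than guessed at -- the
        # determinism the reasoning layer is meant to provide.
        if intent not in self._rules:
            raise ValueError(f"No rule for intent {intent!r}; refusing to act")
        system, operation = self._rules[intent]
        return Action(system=system, operation=operation, params=params)


router = ActionRouter()
router.register("update_project_margin", system="psa",
                operation="margin.recalculate")

# The model might output "update_project_margin" as its best statistical
# guess; the router turns that guess into one specific, logged instruction.
action = router.route("update_project_margin", {"project_id": "P-1024"})
print(action.system, action.operation)  # psa margin.recalculate
```

The design point is the refusal branch: a probabilistic model will always produce *something*, so the deterministic layer is what converts "plausible" into "permitted."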

An agent sitting on top of that full stack is something worth investing in. But a UI layered over a search engine and labeled an agent is what most vendors are actually selling.

The Need for Domain-Specific Guardrails

Intelligence without context is noise. The enterprises leading the race have internalized this.

The path forward runs through specificity, anchoring AI within the ontologies of your business: the relationships between your people, your contracts, your financials, and the rules governing how they interact. When you build that framework carefully, using curated, high-authority data, the model's behavior changes materially. Hallucinations tied to domain logic decrease. Outputs become auditable. The agent starts to feel less like a research assistant and more like a colleague who actually understands your business.

McKinsey's 2025 State of AI report found that high performers were nearly three times as likely to have fundamentally rebuilt their workflows rather than layered AI on top of existing ones. That finding points to something more than a technical preference; it reflects a different theory of what AI is actually for. Conquering the Agentic Last Mile requires AI that is natively woven into your business logic from the start.

Seeking the Expertise of the Specialist

The great AI correction of 2026 is clarifying something we should have known from the start: generalists are good for exploration; specialists are built for execution.

If your company needed to audit global tax compliance, you wouldn't hire someone who had read everything but never worked a balance sheet. You'd hire an expert. The same reasoning should govern your technology decisions.

The companies scaling AI successfully are identifying trusted domain partners that have spent years, sometimes decades, encoding the specific logic, metadata, and process intelligence of their industries. Those partners provide the guardrails that allow an agent to move from making suggestions to taking action, something a generic platform structurally cannot offer.

The Agentic Last Mile is where the $2.5 trillion bet on AI either pays off or gets written down. The enterprises that close it will do so by building the pyramid underneath the agent — and then letting the one at the top actually work.

If you’re done with generalists and ready for a specialist built for the services economy, it's time to see these principles in practice. Watch the demo introducing Veda, Certinia’s new AI engine designed to solve the agentic last mile by grounding AI in your actual services data and logic.

