Agentic Disconnect: The Latency Crisis Facing Modern AI Architecture

Jun 24, 2026

Written by Jon Alexander

Jon Alexander is Senior Vice President of Product for the Cloud Technology Group at Akamai. He is responsible for the strategy, roadmap, and success of the cloud computing and delivery products. Jon joined Akamai in 2017 and led various product teams inside Akamai, starting within the media division. Previously, he worked in several roles focused on building large-scale internet infrastructure. Jon spent 10 years running the media business at one of the world’s largest telecommunications carriers and has led product teams at start-ups as they defined, launched, and grew new solutions. He is passionate about fostering innovation and building customer-centric product teams. He holds a Master of Arts degree and a Master of Engineering degree from Cambridge University.

The technology sector has reached a critical architectural impasse. While many stakeholders envision a future of autonomous AI agents that anticipate needs in real time, the underlying infrastructure remains tethered to a centralized topology that makes this vision physically impossible.

The industry markets instantaneous machine intelligence while delivering it over a framework built for human patience — a fundamental mismatch that threatens the viability of the next generation of enterprise AI.

For three decades, the cloud was optimized for the human threshold, comfortably absorbing the 100-ms delays inherent in cross-continental data transit. But an AI agent does not operate on a human clock.

How multi-agent AI frameworks compound latency

Discussions around AI latency often focus on model optimization — the time required for a GPU to generate a token. Existing large language model (LLM)–serving engines optimize individual calls in isolation, while multi-agent frameworks focus on orchestration without system-level performance planning.

CPU overhead and GPU idle time

As a result, repeated prompts, overlapping contexts, and fragmented CPU-GPU execution create substantial redundancy and poor hardware utilization. Research has shown that CPU-side processing accounts for up to 90.6% of total latency and 44% of total dynamic energy in agentic workloads. A far more dangerous bottleneck exists within the agentic layer itself.

Modern agents combine multistep reasoning, heterogeneous tool use, and collaboration across multiple specialized agents. GPUs sit idle while long-duration tool calls running on the CPU dominate execution latency, often causing sharp spikes in time to first token (TTFT) once the key-value (KV) cache is evicted. When each machine-to-machine call requires a round trip to a distant data center, transit time stacks up with every call.
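As a rough check on the CPU-side figure cited above, Amdahl's law bounds how much end-to-end latency can improve when only one part of the pipeline gets faster. The sketch below takes the 90.6% upper bound from the text and assumes, purely for illustration, that the remaining 9.4% is the GPU portion; the function name `max_speedup` is hypothetical.

```python
import math


def max_speedup(accelerated_fraction: float, factor: float = math.inf) -> float:
    """Amdahl's law: overall speedup when only a fraction of latency is accelerated."""
    if not 0.0 <= accelerated_fraction <= 1.0:
        raise ValueError("accelerated_fraction must be between 0 and 1")
    if factor <= 0:
        raise ValueError("factor must be positive")
    untouched = 1.0 - accelerated_fraction
    accelerated = 0.0 if math.isinf(factor) else accelerated_fraction / factor
    remaining = untouched + accelerated
    return math.inf if remaining == 0 else 1.0 / remaining


CPU_SHARE = 0.906            # upper-bound CPU-side share of latency, from the text
GPU_SHARE = 1.0 - CPU_SHARE  # assumption: everything else is GPU time

# Even an infinitely fast GPU barely moves end-to-end latency.
print(f"GPU infinitely fast: {max_speedup(GPU_SHARE):.2f}x")
# Halving CPU-side time helps far more.
print(f"CPU side 2x faster:  {max_speedup(CPU_SHARE, 2.0):.2f}x")
```

Under these assumptions, model-side optimization alone yields roughly a 1.1x gain, while halving the CPU-side work yields roughly 1.8x.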

According to Akamai’s The State of AI Inference 2026 report, a single workflow requiring 50 sequential calls quickly incurs seconds of transport latency. That delay renders AI applications unusable in production for the 82% of organizations whose critical use cases require end-to-end response times of 500 ms or less.

The enterprise market is largely unprepared for this reality, as even tighter constraints are emerging: 64% of operators are now targeting 250 ms or less.
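The arithmetic behind the sequential-call problem can be sketched in a few lines. The 50 calls and the 500 ms and 250 ms budgets come from the text, as does the 100 ms cross-continental delay; the 20 ms and 5 ms round-trip times are illustrative assumptions, and the function names are hypothetical. The sketch counts transport time only and ignores model and tool compute.

```python
def transport_latency_ms(sequential_calls: int, round_trip_ms: float) -> float:
    """Total transit time when each call must wait for the previous one."""
    return sequential_calls * round_trip_ms


def max_round_trip_ms(budget_ms: float, sequential_calls: int) -> float:
    """Largest per-call round trip that still fits the end-to-end budget."""
    return budget_ms / sequential_calls


CALLS = 50                       # from the text
BUDGETS_MS = (500, 250)          # from the text
ASSUMED_RTTS_MS = (100, 20, 5)   # 100 ms from the text; 20 and 5 are assumptions

for rtt in ASSUMED_RTTS_MS:
    total = transport_latency_ms(CALLS, rtt)
    verdicts = ", ".join(
        f"{budget} ms budget {'met' if total <= budget else 'exceeded'}"
        for budget in BUDGETS_MS
    )
    print(f"RTT {rtt:>3} ms -> {total:>6.0f} ms transport ({verdicts})")

for budget in BUDGETS_MS:
    print(f"{budget} ms budget allows at most {max_round_trip_ms(budget, CALLS):.0f} ms per call")
```

At 100 ms per round trip, 50 sequential calls spend 5,000 ms in transit alone. Fitting the same workflow into a 500 ms budget leaves at most 10 ms per call, and a 250 ms budget leaves 5 ms.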

The structural bottleneck

The persistence of centralized compute fortresses is not a technical necessity, but a legacy business incentive. Hyperscaler models prioritize centralized storage to protect established revenue streams, which creates a bottleneck that hybrid “edge extensions” fail to solve.

These centralized systems work well for monolithic applications where users and data are all colocated. However, for production agents where users are distributed, data is localized, and external APIs are regional, the centralized model fails. Centralization does not guarantee security; it guarantees delay. True Zero Trust security requires enforcement at the point of execution, not a distant cluster.

By defending an infrastructure physically too slow for the applications they claim to enable, incumbents force enterprises into a cycle of overprovisioning and unpredictable performance. This is no longer sustainable.

The path forward requires a compute continuum. We must build a seamless spectrum of intelligence that spans the distance from the massive centralized core to the exact millimeter of the network closest to the user. Distributed execution is the only way to bridge the agentic disconnect.

Decoupling brains from hands

The industry's current fixation on the GPU overlooks a critical distinction: GPUs provide raw intelligence, but they do not provide execution. While deep reasoning belongs in the core, the operational orchestration of an agent belongs at the edge.

Calling tools, reading local files, and executing code securely does not require a hypercluster of GPUs. It requires high-performance CPUs connected by a high-speed private fiber backbone.

An optimized blueprint for the agentic era relies on a clear division of labor:

- Centralized core: Reserves massive resources for heavy reasoning and foundation model training
- Regional edge: Deploys distributed GPU clusters for localized, low-latency inference
- Distributed edge: Uses high-performance CPUs to execute tools and handle real-time data orchestration

In this topology, the GPU brings the intelligence, while the edge infrastructure coordinates the action. The edge turns raw model outputs into functional agentic behavior.
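The division of labor above can be expressed as a simple placement table. This is a minimal sketch, not an Akamai API: the `Tier` and `Step` enums, the `PLACEMENT` mapping, and the `place` function are all hypothetical names that restate the three-tier blueprint from the text.

```python
from enum import Enum


class Tier(Enum):
    CENTRALIZED_CORE = "centralized core"
    REGIONAL_EDGE = "regional edge"
    DISTRIBUTED_EDGE = "distributed edge"


class Step(Enum):
    MODEL_TRAINING = "foundation model training"
    HEAVY_REASONING = "heavy reasoning"
    LOW_LATENCY_INFERENCE = "low-latency inference"
    TOOL_CALL = "tool execution"
    DATA_ORCHESTRATION = "real-time data orchestration"


# Placement follows the blueprint in the text: GPUs in the core and regional
# edge supply intelligence; CPUs at the distributed edge carry out the actions.
PLACEMENT = {
    Step.MODEL_TRAINING: Tier.CENTRALIZED_CORE,
    Step.HEAVY_REASONING: Tier.CENTRALIZED_CORE,
    Step.LOW_LATENCY_INFERENCE: Tier.REGIONAL_EDGE,
    Step.TOOL_CALL: Tier.DISTRIBUTED_EDGE,
    Step.DATA_ORCHESTRATION: Tier.DISTRIBUTED_EDGE,
}


def place(step: Step) -> Tier:
    """Return the tier that should run a given step of an agent workflow."""
    return PLACEMENT[step]


# A hypothetical agent workflow: plan once, then alternate tools and inference.
workflow = [
    Step.HEAVY_REASONING,
    Step.TOOL_CALL,
    Step.LOW_LATENCY_INFERENCE,
    Step.DATA_ORCHESTRATION,
]
for step in workflow:
    print(f"{step.value:<30} -> {place(step).value}")
```

A production scheduler would also weigh data locality, load, and network distance; the static table only captures the default assignment described in the text.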

Solving the problem of the “World Wide Wait”

We have seen this play out before. In the late 1990s, the early internet suffered from the “World Wide Wait” because it relied solely on centralized web servers. The industry didn't solve the problem by making centralized servers bigger; it solved it by creating content delivery networks (CDNs) like Akamai to push content to the edge.

We face the exact same inflection point today, but we are distributing compute rather than content.

Three architectural requirements for modern AI agents

This shift toward distributed compute will manifest first in high-value enterprise applications: personalization, real-time financial systems, and localized fraud detection. These are the exact use cases where latency directly impacts revenue per visitor and customer satisfaction.

When these applications fail under production loads, the conversation will shift. The important metric will no longer be which model has the highest benchmark score, but whose architecture functions in the physical world.

Three architectural recommendations drive success for modern agents:

- Dynamic tiered memory: Distributed memory tiers allow for graceful KV cache offload from HBM to DRAM to NVMe and seamless resumption
- Adaptive co-scheduling: Orchestrate fragmented handoffs between distributed CPU tool workers and regional GPU inference nodes to shorten end-to-end critical paths
- Distributed execution: Spread LLM execution shards and tool calls across diverse, decentralized compute nodes to maximize concurrent throughput and minimize latency
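The first of these recommendations, tiered KV cache offload, can be illustrated with a toy model. The sketch below is an assumption-laden simplification: the `TieredKVCache` class is hypothetical, capacities are counted in sessions rather than bytes, and demotion uses plain least-recently-used order. It shows only the shape of the idea: a session's cache is pushed down from HBM to DRAM to NVMe instead of being discarded, and is promoted back to HBM when the session resumes.

```python
from collections import OrderedDict


class TieredKVCache:
    """Toy KV-cache placement across HBM -> DRAM -> NVMe with LRU demotion."""

    TIERS = ("HBM", "DRAM", "NVMe")  # fastest to slowest

    def __init__(self, capacities):
        # capacities: max sessions per tier, in TIERS order (assumed unit)
        self.capacities = dict(zip(self.TIERS, capacities))
        self.tiers = {name: OrderedDict() for name in self.TIERS}

    def _insert(self, level, session_id, kv_blob):
        if level == len(self.TIERS):
            return  # fell off the slowest tier: evicted, must be recomputed
        name = self.TIERS[level]
        store = self.tiers[name]
        store[session_id] = kv_blob
        if len(store) > self.capacities[name]:
            victim, blob = store.popitem(last=False)  # least recently used
            self._insert(level + 1, victim, blob)     # offload instead of drop

    def locate(self, session_id):
        """Return the tier currently holding the session, or None."""
        for name, store in self.tiers.items():
            if session_id in store:
                return name
        return None

    def put(self, session_id, kv_blob):
        """Store or refresh a session's KV cache in the fastest tier."""
        tier = self.locate(session_id)
        if tier is not None:
            del self.tiers[tier][session_id]
        self._insert(0, session_id, kv_blob)

    def resume(self, session_id):
        """Promote a session back to HBM; return (blob, source tier) or None."""
        tier = self.locate(session_id)
        if tier is None:
            return None  # full miss: prefill must be recomputed
        blob = self.tiers[tier].pop(session_id)
        self._insert(0, session_id, blob)
        return blob, tier


cache = TieredKVCache(capacities=(1, 1, 1))
for session in ("a", "b", "c"):
    cache.put(session, f"kv-{session}")

print({s: cache.locate(s) for s in ("a", "b", "c")})
print(cache.resume("a"))  # found in the slowest tier, promoted back to HBM
```

With one slot per tier, session "a" ends up on NVMe after two newer sessions arrive, yet it can still be resumed without recomputing its prefill. Only a session pushed past the last tier is a true miss.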

Architect for distribution

Although the industry focuses heavily on token pricing, the smart utilization of physically distributed GPU and CPU assets dictates viability. As AI workloads transition to interconnected systems, success belongs to the leaders who refuse to let the limits of centralized infrastructure dictate the speed of their business. The time to architect for distribution is now.
