Most AI ROI Gets Lost in the Infrastructure, Not the Model

Jul 02, 2026

Written by Ari Weil

Ari Weil is a product strategy and go-to-market executive with experience across various management and operational disciplines. He brings more than 20 years of cross-functional enterprise management expertise, across every aspect of the product and marketing lifecycle, to his role. His key areas of focus include data security, compliance, risk management, cloud adoption, digital transformation, and modern application architectures.

AI gains rarely reach the profit and loss statement, and that is usually an architectural problem, not a model problem. Teams ship a working pilot, then watch the returns get eaten by latency, runaway compute, and security incidents that nobody budgeted for. The model performs. But the underlying infrastructure was never built to carry the workload into production.

Akamai commissioned two studies to pressure-test where that breakdown happens. The State of AI Inference 2026 surveyed 200 practitioners running inference in production; three-quarters of them are engineers and architects, and most are deployment decision-makers. The 2026 API Security Impact Study surveyed 1,840 security professionals across six industries and 10 countries.

Together, the two reports point to one conclusion: Most teams are scaling on infrastructure designed for training and stretched to cover inference, and the gaps show up as cost, latency, and exposure.

Those gaps close when three things come together:

An architecture that adapts to live conditions
Security built into that architecture instead of bolted on
Inference placed close enough to users to hit real-time targets

None of these is novel on its own. The problem is treating them as separate projects owned by separate teams.

Start with the anatomy. Inference is the live step where a trained model takes new input and returns an output. Every inference call travels as an API call, so the model and the API are not separate concerns. They share the same request path, the same failure modes, and the same attack surface. That thread runs through both surveys.

Where centralization truly breaks

Centralized inference is not doomed. For training, batch jobs, and latency-tolerant workloads, concentrating compute in a few large regions is often the right call — and Akamai runs plenty of workloads that way. The story told by the data is narrower and more useful: Centralization breaks for a specific and growing class of workloads, and many teams have not re-architected for it.

The State of AI Inference report found that 75% of organizations have moved generative AI (GenAI) into production, yet their infrastructure has not kept pace. A growing share of those workloads now carry hard real-time latency requirements that a round trip to a distant region cannot meet, and 60% of practitioners rate proximity to users and data as “important” or “critical.”

Even so, 46% still run inference from a single centralized region. That is the mismatch — not that centralization fails everywhere, but that a rising share of production inference needs to run closer to the user than a centralized footprint allows.

The workloads that break first are predictable: fraud scoring inside a live transaction, voice agents, real-time personalization, and any agentic pipeline that chains several model calls before returning an answer. Each hop inherits the latency of the last. Concentrate that in one region, and queuing delays compound as use climbs — adding hundreds of milliseconds exactly when the workload can least afford it.
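To make the compounding concrete, here is a minimal sketch that models each hop as an M/M/1 queue. That is a textbook simplification we are assuming here; neither survey specifies a queueing model. Mean time in an M/M/1 system is 1/(μ − λ), so per-hop delay climbs steeply as utilization λ/μ approaches 1, and a chain of sequential hops adds those delays together. The hop count, round-trip time, and service rate are hypothetical inputs.

```python
def mm1_time_in_system_ms(service_rate_per_s: float, arrival_rate_per_s: float) -> float:
    """Mean time a request spends waiting plus being served in an M/M/1 queue, in ms."""
    if arrival_rate_per_s >= service_rate_per_s:
        return float("inf")  # unstable: the queue grows without bound
    return 1000.0 / (service_rate_per_s - arrival_rate_per_s)


def chained_latency_ms(hops: int, rtt_ms: float, service_rate_per_s: float,
                       arrival_rate_per_s: float) -> float:
    """End-to-end latency when every sequential hop pays a network RTT plus queueing."""
    per_hop_ms = rtt_ms + mm1_time_in_system_ms(service_rate_per_s, arrival_rate_per_s)
    return hops * per_hop_ms


# Hypothetical inputs: a 4-call agentic pipeline, a 40 ms round trip to a
# distant region, and model servers that each complete 50 requests per second.
for utilization in (0.5, 0.8, 0.9):
    load = utilization * 50.0
    print(f"{utilization:.0%} utilization: {chained_latency_ms(4, 40.0, 50.0, load):.0f} ms")
```

Under these assumptions the same pipeline goes from 320 ms at 50% utilization to 560 ms at 80% and 960 ms at 90%, even though the network round trip never changes.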

So, the real question is not centralized vs. distributed as an article of faith. It is which workloads need proximity and which do not. The teams getting this right answer that question deliberately, workload by workload, instead of defaulting everything to one region and absorbing the latency tax.
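One way to make that workload-by-workload decision explicit is to compare each workload's latency budget with what a centralized round trip would cost it. The sketch below is hypothetical: the `Workload` fields, the 0.5 headroom factor, the example workloads, and the 80 ms centralized round-trip time are illustrative assumptions, not figures from either study.

```python
from dataclasses import dataclass


@dataclass
class Workload:
    name: str
    latency_budget_ms: float   # end-to-end target the user experiences
    compute_ms: float          # expected model execution time per call
    sequential_calls: int = 1  # model calls chained before the answer returns


def needs_proximity(w: Workload, central_rtt_ms: float, headroom: float = 0.5) -> bool:
    """True if serving from a centralized region would blow the latency budget.

    `headroom` is the share of the budget we are willing to spend on network
    transit; 0.5 is an arbitrary illustrative default.
    """
    transit_ms = w.sequential_calls * central_rtt_ms
    total_ms = w.sequential_calls * w.compute_ms + transit_ms
    return total_ms > w.latency_budget_ms or transit_ms > headroom * w.latency_budget_ms


workloads = [
    Workload("nightly-batch-summaries", latency_budget_ms=60_000, compute_ms=2_000),
    Workload("fraud-score-at-checkout", latency_budget_ms=150, compute_ms=30),
    Workload("voice-agent-turn", latency_budget_ms=300, compute_ms=40, sequential_calls=3),
]

for w in workloads:
    placement = "distributed" if needs_proximity(w, central_rtt_ms=80.0) else "centralized"
    print(f"{w.name}: {placement}")
```

With the assumed 80 ms round trip, the batch job stays centralized while the fraud-scoring and voice-agent workloads are flagged for distributed placement.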

Adaptable architecture and the cost you cannot see

Weak ROI is usually a sign that a team is compensating operationally instead of architecturally. The tell is unit economics: the cost of a single inference request, measured per token or per query. If you cannot see that number, you cannot optimize it, and you cannot catch the moment it runs away from you.
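A minimal sketch of per-request unit economics, assuming each inference request is logged with its model name and token counts and that you know a per-token price for each model. The model names, prices, and record fields are hypothetical placeholders, not real rates.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class InferenceRecord:
    model: str
    input_tokens: int
    output_tokens: int


# Hypothetical prices in US$ per 1,000 tokens: (input, output).
PRICE_PER_1K = {
    "large-model": (0.010, 0.030),
    "small-model": (0.001, 0.002),
}


def request_cost(r: InferenceRecord) -> float:
    """Cost of one inference request in US$."""
    in_price, out_price = PRICE_PER_1K[r.model]
    return (r.input_tokens * in_price + r.output_tokens * out_price) / 1000


def unit_economics(records: list[InferenceRecord]) -> dict[str, dict[str, float]]:
    """Per-model cost per query and cost per 1,000 tokens."""
    cost: dict[str, float] = defaultdict(float)
    tokens: dict[str, int] = defaultdict(int)
    queries: dict[str, int] = defaultdict(int)
    for r in records:
        cost[r.model] += request_cost(r)
        tokens[r.model] += r.input_tokens + r.output_tokens
        queries[r.model] += 1
    return {
        m: {
            "cost_per_query": cost[m] / queries[m],
            "cost_per_1k_tokens": 1000 * cost[m] / tokens[m],
        }
        for m in cost
    }
```

Tracking these two numbers per model over time is what shows whether a workload is getting cheaper or more expensive as it scales.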

Most teams cannot see it: 77% of organizations lack consistent unit-level economics for inference, which means the majority cannot say whether a given workload is getting cheaper or more expensive as it scales. That visibility gap is also a security gap. An unexplained spike in token consumption is often the first sign of a denial of wallet (DoW) attack, in which an attacker drives inference volume specifically to run up the bill.
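An unexplained token spike can be flagged with a simple baseline comparison. This sketch keeps an exponentially weighted moving average (EWMA) of tokens per minute and alerts when a minute exceeds that baseline by a multiple. The smoothing factor, threshold, and warm-up length are assumptions you would tune, and a real DoW defense would pair this with per-client rate limits.

```python
from typing import Optional


class TokenSpikeDetector:
    """Flags minutes whose token consumption far exceeds an EWMA baseline."""

    def __init__(self, alpha: float = 0.2, threshold: float = 3.0, warmup: int = 5):
        self.alpha = alpha          # EWMA smoothing factor (assumed)
        self.threshold = threshold  # alert when usage > threshold x baseline (assumed)
        self.warmup = warmup        # minutes observed before alerts are allowed
        self.baseline: Optional[float] = None
        self.seen = 0

    def observe(self, tokens_this_minute: int) -> bool:
        """Record one minute of usage; return True if it looks like a spike."""
        self.seen += 1
        if self.baseline is None:
            self.baseline = float(tokens_this_minute)
            return False
        spike = (self.seen > self.warmup
                 and tokens_this_minute > self.threshold * self.baseline)
        if not spike:
            # Only normal minutes update the baseline, so sustained attack
            # traffic cannot quietly drag the baseline upward.
            self.baseline = (self.alpha * tokens_this_minute
                             + (1 - self.alpha) * self.baseline)
        return spike
```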

When the architecture cannot adapt on its own, engineers fall back on manual intervention. They reroute traffic by hand when a region spikes. They degrade response quality to keep a strained server alive. When inference slows, 51% of teams retry the same model, which usually deepens the congestion rather than clearing it. This is triage, and it does not scale. Scaling a rigid system scales its losses.

The fix is programmatic:

Tag every inference request with model and token metadata that streams into real-time monitoring, so a runaway model surfaces before it erodes the budget.
Define fail-open and fail-closed behavior in code, so the system pivots to a cached result or a smaller local model when the primary is unresponsive, without waking an engineer at 2 AM.
64% of practitioners already rate automated traffic steering as a critical requirement, a sign that the market knows where this is heading.
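The fail-open and fail-closed behavior described above can be written as an explicit fallback chain. In this minimal sketch, `primary`, `small_local`, and the cache dictionary are hypothetical stand-ins for your own model clients and cache. A timeout or error on the primary triggers the fallbacks instead of a blind retry of the same model, and `fail_closed=True` refuses to serve a degraded answer, which a workload such as fraud scoring might require.

```python
import concurrent.futures as cf
from typing import Callable, Optional

ModelFn = Callable[[str], str]
_pool = cf.ThreadPoolExecutor(max_workers=8)


def infer_with_fallback(prompt: str, primary: ModelFn, small_local: ModelFn,
                        cache: dict[str, str], timeout_s: float = 0.2,
                        fail_closed: bool = False) -> Optional[str]:
    """Call the primary model once, then degrade instead of retrying it."""
    future = _pool.submit(primary, prompt)
    try:
        result = future.result(timeout=timeout_s)
    except Exception:  # the primary timed out or raised
        # cancel() only helps if the call has not started yet; a call that is
        # already running keeps its worker thread until it returns.
        future.cancel()
    else:
        cache[prompt] = result  # remember a last-known-good answer
        return result

    if fail_closed:
        return None  # refuse rather than serve a degraded answer
    if prompt in cache:
        return cache[prompt]  # fail open to the cached result
    try:
        return small_local(prompt)  # fail open to the smaller local model
    except Exception:
        return None  # nothing left to fall back to
```

Trying the primary once and then degrading avoids piling retries onto a model that is already congested.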

Distribution is the performance mechanism

When poor ROI shows up as high latency and low conversion, geography is usually the root cause. Real-time inference is bursty and latency-sensitive, so it cannot be reliably served over the public internet from a distant data center.

The math is unforgiving. If your end-to-end budget is 250 ms, computation takes 100 ms, and the API handshake takes 50 ms, you have 100 ms left for data to travel. Cross a continent and that budget is gone before the model does any work.
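The budget arithmetic above turns into a quick check. This is an illustrative sketch: the 250/100/50 ms figures come from the example, while the 20 ms and 150 ms round-trip times are hypothetical inputs standing in for a nearby point of presence and a distant region.

```python
def transit_allowance_ms(budget_ms: float, compute_ms: float, handshake_ms: float) -> float:
    """Milliseconds left for data to travel once compute and handshake are paid."""
    return budget_ms - compute_ms - handshake_ms


def fits_budget(budget_ms: float, compute_ms: float, handshake_ms: float,
                rtt_ms: float) -> bool:
    """True if the network round trip fits inside the remaining allowance."""
    return rtt_ms <= transit_allowance_ms(budget_ms, compute_ms, handshake_ms)


print(transit_allowance_ms(250, 100, 50))    # 100
print(fits_budget(250, 100, 50, rtt_ms=20))  # True: assumed nearby point of presence
print(fits_budget(250, 100, 50, rtt_ms=150)) # False: assumed distant region
```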

Placing inference on distributed points of presence keeps the heavy lifting in the same region as the user, bypasses public internet congestion, and removes the speed limit that a centralized footprint imposes. The response feels native because, in network terms, it is local.

When we talk about things like AI agents, it's said to take about six interactions before a task is actually accomplished. If the user is far enough away that an interaction takes 100 milliseconds, then you've got 600 milliseconds right there. That might be okay for some applications, but many applications being built at the edge today are latency-sensitive. For an AI system integrated with your vehicle, it is critical that all of your data and information transits in a much shorter time.

Distribution does one more thing that matters for security. When inference and enforcement run in the same location, the network can authenticate the user, inspect the API call, and run the model in one place. You close the loop instead of shipping the request across zones to be checked somewhere else.
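Co-locating enforcement with inference can be pictured as one request handler that runs every check before the model sees the input. In this sketch, `verify_token`, `inspect_call`, and `run_model` are hypothetical placeholders for whatever authentication, API inspection, and model runtime a given location provides. The point is only that all three run in one place, with no hop to another zone.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Request:
    token: str
    path: str
    body: str


class Rejected(Exception):
    """Raised when a request fails authentication or inspection."""


def handle_at_edge(req: Request,
                   verify_token: Callable[[str], bool],
                   inspect_call: Callable[[Request], bool],
                   run_model: Callable[[str], str]) -> str:
    """Authenticate, inspect, and infer in a single location."""
    if not verify_token(req.token):
        raise Rejected("authentication failed")
    if not inspect_call(req):
        raise Rejected("API inspection blocked the call")
    return run_model(req.body)
```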

Security is a property of the architecture, not a separate track

If you do not secure inference, you cannot scale it. If you treat security as a parallel workstream, you pay for it in performance. Both studies say the real failure mode is not the topology; it’s unsecured, untested, and invisible APIs.

The data is jarring: 87% of organizations experienced an API security incident in the past year, up from 76% in 2022. Among teams that had incidents, attacks on APIs linked to AI technologies were the most commonly cited incident type in the study, reported by 42%.

The average incident now costs US$700,000 a year, with the top quartile of incidents costing more than US$1.8 million. APIs linked to AI are not a future risk; they are a present one.

You can’t defend what you can’t see

Meanwhile, visibility is moving in the wrong direction. Only 23% of enterprises know which of their APIs return sensitive data, down from 40% in 2022. You cannot defend an estate you cannot see, and AI is expanding that estate faster than manual inventory can track it. Copilots spin up endpoints that never get a security review. Natural-language interfaces make data extraction through prompt injection trivial for an attacker who finds the right unguarded route.
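Part of closing that gap is diffing the endpoints seen in live traffic against the documented inventory, then flagging undocumented routes whose responses look like sensitive data. This sketch is deliberately naive: the `ObservedCall` shape, the inventory, and the two regular expressions (email-like strings and 16-digit card-like numbers) are illustrative assumptions, and production discovery tools classify far more data types.

```python
import re
from dataclasses import dataclass

SENSITIVE_PATTERNS = [
    re.compile(r"[\w.+-]+@[\w-]+\.[\w.]+"),  # email-like strings
    re.compile(r"\b(?:\d[ -]?){15}\d\b"),    # 16-digit card-like numbers
]


@dataclass
class ObservedCall:
    method: str
    path: str
    response_sample: str


def find_shadow_apis(observed: list[ObservedCall],
                     inventory: set[tuple[str, str]]) -> dict[tuple[str, str], bool]:
    """Map each undocumented (method, path) to whether it returned sensitive-looking data."""
    shadow: dict[tuple[str, str], bool] = {}
    for call in observed:
        key = (call.method, call.path)
        if key in inventory:
            continue  # documented and presumably reviewed
        leaks = any(p.search(call.response_sample) for p in SENSITIVE_PATTERNS)
        shadow[key] = shadow.get(key, False) or leaks
    return shadow
```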

Critically, the entry point is not the whole story. An attacker who breaches a single exposed API will try to move laterally into the components that hold real value: the feature stores that curate AI data and the repositories that hold model weights and logic.

Microsegmentation, the security best practice that isolates individual workloads, is what contains that blast radius. Most organizations have not implemented it, but the ones that have contain attacks materially faster. Legacy network-based segmentation is widely regarded as complex, laborious, and ineffective. Modern AI-powered microsegmentation addresses this, but it challenges ingrained biases. If you build that segmentation into the same network that delivers the inference, you avoid the trade-off between protection and performance.

The implementation pattern is identity-based. Organizations should:

Define access by workload identity rather than by IP address, so a specific inference service can reach a specific feature store and nothing else
Run continuous API discovery to find the abandoned test endpoints still wired to production data
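The identity-based pattern can be sketched as a default-deny policy keyed on workload identities rather than IP addresses. The SPIFFE-style URIs below are hypothetical, and so is the layout: model weights are assumed to be pushed by a separate deployer identity, so the serving path never needs a route to the weight repository. `blast_radius` computes everything an attacker could reach by chaining allowed flows from one compromised workload, which is the number microsegmentation is meant to shrink.

```python
# Hypothetical SPIFFE-style workload identities; a real deployment would derive
# them from certificates or platform attestation rather than from IP addresses.
GATEWAY = "spiffe://example.org/api-gateway"
INFERENCE = "spiffe://example.org/inference-service"
FEATURES = "spiffe://example.org/feature-store"
DEPLOYER = "spiffe://example.org/model-deployer"
WEIGHTS = "spiffe://example.org/model-weights"

# Default-deny: only these (source, destination) pairs may connect.
ALLOWED_FLOWS = {
    (GATEWAY, INFERENCE),
    (INFERENCE, FEATURES),
    (DEPLOYER, WEIGHTS),
}


def is_allowed(source_id: str, dest_id: str) -> bool:
    """A flow is permitted only if its identity pair is explicitly listed."""
    return (source_id, dest_id) in ALLOWED_FLOWS


def blast_radius(compromised: str) -> set[str]:
    """Every workload an attacker could reach by chaining allowed flows."""
    reached: set[str] = set()
    frontier = [compromised]
    while frontier:
        src = frontier.pop()
        for s, d in ALLOWED_FLOWS:
            if s == src and d not in reached and d != compromised:
                reached.add(d)
                frontier.append(d)
    return reached
```

In this layout, an attacker who breaches the gateway can reach the inference service and feature store, but no chain of allowed flows leads to the model weights.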

Security then becomes a standing property of the fabric instead of a point-in-time audit that developers route around.

A two-track problem

The deeper issue that both studies expose is organizational, not technical. Teams build traffic and security on separate tracks, and the seam between them is where things fail.

That seam shows up as a confidence gap: 40% of C-suite leaders report advanced API testing maturity, while only 28% of the DevSecOps teams doing the work agree. Leaders believe the problem is solved, so the foundation stays under-resourced while spend flows to adjacent tools. The distance between what leadership thinks is protected and what actually is protected keeps widening.

Closing that distance requires the teams managing traffic and the teams securing traffic to operate on the same, or at least a converged, control plane. When detection and containment share that plane, an anomalous request pattern typical of prompt injection or DoW can trigger automated isolation of the affected inference endpoint, with no human in the loop. That synchronization is only possible when distribution and security are not separate systems.
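A minimal sketch of that loop, assuming one in-process control plane object that both the traffic and security functions use. The detection rule (requests per client per window) and the isolation action (removing the endpoint from the routable set) are hypothetical simplifications of a real prompt-injection or DoW detector; what matters is that detection and containment act on the same state, with no hand-off between them.

```python
from collections import Counter


class ControlPlane:
    """Shared state for routing (traffic) and isolation (security)."""

    def __init__(self, endpoints: list[str], max_requests_per_window: int = 100):
        self.routable = set(endpoints)
        self.isolated: set[str] = set()
        self.limit = max_requests_per_window
        self.window: Counter = Counter()  # (endpoint, client) -> requests this window

    def route(self, endpoint: str, client: str) -> bool:
        """Admit a request if the endpoint is routable; isolate it on an anomaly."""
        if endpoint not in self.routable:
            return False
        self.window[(endpoint, client)] += 1
        if self.window[(endpoint, client)] > self.limit:
            self.isolate(endpoint)  # containment fires in the same call as detection
            return False
        return True

    def isolate(self, endpoint: str) -> None:
        """Stop routing to the endpoint until someone reviews it."""
        self.routable.discard(endpoint)
        self.isolated.add(endpoint)

    def new_window(self) -> None:
        """Reset per-window counters."""
        self.window.clear()
```

Isolating the whole endpoint mirrors the behavior described above; blocking only the offending client would be a less disruptive variant of the same loop.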

What scales from here

Two capabilities separate teams that scale from teams that stall:

Portability. Mature operators are measurably less locked in because they can move workloads across managed GPUs, hosted APIs, and serverless runtimes as cost and capacity shift.
Runtime governance. Controls on both the malicious prompts coming in and the data-leaking responses going out are enforced at the network layer rather than bolted onto the application.
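Runtime governance applies checks in both directions. The sketch below is a naive illustration only: the injection phrase list and the regular expression for US Social Security-style numbers are assumptions chosen for readability, and real enforcement would rely on far richer detection than keyword matching.

```python
import re
from typing import Callable

# Hypothetical, deliberately small rule sets for illustration.
INJECTION_PHRASES = (
    "ignore previous instructions",
    "reveal your system prompt",
)
SSN_LIKE = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")


def allow_prompt(prompt: str) -> bool:
    """Inbound check: block prompts containing known injection phrases."""
    lowered = prompt.lower()
    return not any(phrase in lowered for phrase in INJECTION_PHRASES)


def redact_response(response: str) -> str:
    """Outbound check: mask SSN-like numbers before the response leaves."""
    return SSN_LIKE.sub("[REDACTED]", response)


def governed_call(prompt: str, model: Callable[[str], str]) -> str:
    """Run the model only if the prompt passes, then filter what comes back."""
    if not allow_prompt(prompt):
        return "Request blocked by policy."
    return redact_response(model(prompt))
```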

Put those on a single platform where distribution, security, and traffic steering share one control plane, and the trade-off between protection and performance goes away. You get one view of what is running, where it runs, and whether it is under attack.

This is the problem that Akamai has been solving for content delivery and security, and is now solving for inference in production. The same global network that places inference close to users also runs the API security, microsegmentation, and distributed denial-of-service (DDoS) protection around it. That reach is what lets one network do both jobs at once.

Fragile AI ROI is a symptom

The cause is infrastructure that has not yet brought adaptable architecture, security, and distribution onto the same foundation. The performance and protection gap is widening, but it is neither structural nor permanent. That gap closes when the teams running the traffic are the teams securing it, on infrastructure built to carry inference rather than retrofitted to tolerate it.

Find out more

To go deeper, read The State of AI Inference 2026 and the API Security Impact Study.
