Your AI Cost Model Stops at the Token Price. The Bill Doesn't.

Jun 25, 2026

Written by Ari Weil

Ari Weil is a product strategy and go-to-market executive with experience across various management and operational disciplines. He brings more than 20 years of cross-functional enterprise management expertise, across every aspect of the product and marketing lifecycle, to his role. His key areas of focus include data security, compliance, risk management, cloud adoption, digital transformation, and modern application architectures.

AI spending has crossed a line that should change how you budget for it. Inference, not training, is now the dominant cost center, running close to 80% of total AI spend for any team with real production traffic. Training is the headline. Inference is the operating expense, and it shows up every time a model serves a request.

The hidden costs beyond the token price

Most enterprises model that expense as a compute problem. The metric of record is dollars per million tokens, and for good reason: GPU time is the majority of the bill, driven by model size, token volume, and how well you use the hardware. That math is correct as far as it goes; the problem is where it stops.

The line items that teams consistently miss sit one layer down: in data movement and latency. Egress and cross-region transfer are the clearest examples. Enterprises routinely underforecast these costs by three to five times, because the token price is printed on the pricing page and the network charges are not.
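As a rough illustration of where a token-only model stops, the sketch below puts a transfer line item next to the token line item. The function names and every input figure are hypothetical placeholders, not quoted prices; the only number taken from the text is the three-to-five-times underforecast range.

```python
def monthly_inference_bill(million_tokens, price_per_million_tokens,
                           transfer_gb, transfer_rate_per_gb):
    """Return (token_cost, transfer_cost, total) in dollars for one month."""
    token_cost = million_tokens * price_per_million_tokens
    transfer_cost = transfer_gb * transfer_rate_per_gb
    return token_cost, transfer_cost, token_cost + transfer_cost


def underforecast_range(forecast_transfer_cost, low=3, high=5):
    """Scale a transfer forecast by the 3x to 5x miss cited in the text."""
    return forecast_transfer_cost * low, forecast_transfer_cost * high


# Hypothetical inputs, chosen only to make the arithmetic easy to follow.
tokens, transfer, total = monthly_inference_bill(
    million_tokens=20_000, price_per_million_tokens=2,
    transfer_gb=100_000, transfer_rate_per_gb=0.10)
likely_low, likely_high = underforecast_range(forecast_transfer_cost=2_000)

print("token cost:", tokens, "transfer cost:", transfer, "total:", total)
print("a 2,000 transfer forecast may land between", likely_low, "and", likely_high)
```

The point of keeping the two terms separate is that they have different drivers: the first moves with token volume, the second with how much data the architecture moves and where.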

Data movement does not rival compute today, and I am not going to tell you it does. But it is the fastest-growing category in the bill, and it behaves differently than compute. It scales with the shape of your architecture, not just with the size of your model.

How agentic and RAG systems reshape the bill

That distinction matters more every quarter, because the workloads are changing shape. A first-generation AI deployment was one prompt, one model call, and one answer. An agentic or retrieval-augmented generation (RAG) system looks nothing like that. A single user request fans out: a vector search for context, several model calls as the agent reasons and revises, tool and API calls to external systems, and then a response.

Every hop that crosses a zone, a region, or a cloud boundary is a charge, and a place where latency compounds. The token price stays roughly flat. However, the number of hops behind each user action increases by an order of magnitude.
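One way to make the fan-out concrete is to list the hops behind a request and count the ones that cross a billing boundary. This is a minimal sketch: the `Hop` type, the payload sizes, the hop placement, and the per-gigabyte rate are all made-up assumptions for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Hop:
    name: str
    payload_mb: float        # data moved on this hop, in megabytes
    crosses_boundary: bool   # True if the hop leaves its zone, region, or cloud


def billable_hops(hops):
    """Hops that cross a zone, region, or cloud boundary."""
    return [h for h in hops if h.crosses_boundary]


def transfer_cost(hops, rate_per_gb):
    """Metered transfer cost in dollars for one request."""
    billable_mb = sum(h.payload_mb for h in billable_hops(hops))
    return billable_mb / 1000 * rate_per_gb


# Hypothetical request shapes; sizes and placement are invented.
first_gen = [Hop("model call", 0.02, True)]
agentic = [
    Hop("vector search", 0.5, True),
    Hop("model call: plan", 0.05, True),
    Hop("tool call: external API", 0.2, True),
    Hop("model call: revise", 0.05, True),
    Hop("model call: answer", 0.05, False),
]

for label, hops in (("first-gen", first_gen), ("agentic", agentic)):
    print(label, len(billable_hops(hops)), transfer_cost(hops, rate_per_gb=0.10))
```

The per-request amounts are tiny; what matters is that the count of billable hops, not the token price, is the term that changes between the two request shapes.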

High availability makes the same point from a different direction. Run inference in a single region and you accept the latency penalty for distant users. Replicate across regions to fix that, and you duplicate the most expensive thing you own. Cross-region replication for latency doubles the GPU cost of the capacity that you replicate. You are now paying twice for compute to paper over a problem that is fundamentally about distance.
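The replication arithmetic can be written down directly. The sketch below assumes GPU cost scales linearly with replicated capacity; the function name and the dollar figure are hypothetical.

```python
def gpu_cost_with_replication(base_monthly_gpu_cost, replicated_fraction,
                              extra_regions=1):
    """Monthly GPU cost after copying part of the capacity to other regions.

    replicated_fraction is the share of capacity that is duplicated (0 to 1).
    Assumes cost grows linearly with the capacity that is replicated.
    """
    if not 0 <= replicated_fraction <= 1:
        raise ValueError("replicated_fraction must be between 0 and 1")
    return base_monthly_gpu_cost * (1 + replicated_fraction * extra_regions)


base = 100_000  # hypothetical monthly GPU spend in one region
print(gpu_cost_with_replication(base, replicated_fraction=1.0))   # full copy in one more region
print(gpu_cost_with_replication(base, replicated_fraction=0.5))   # half the capacity copied
```

Replicating all capacity into one additional region doubles the figure, which is the case the paragraph above describes.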

This is the part that the centralized model handles poorly. Concentrating inference in a handful of regions made sense when AI was a batch job that you ran overnight. It does not fit workloads that are interactive, high fan-out, and latency sensitive.

A fraud system intercepting a transaction, a voice agent that needs to sound human, a retail experience personalizing in real time: None of these can absorb a round trip to a distant region. And none of them should pay a premium to move every byte of that interaction across the network.

Decoupling inference from centralized clouds

The answer is to stop treating inference as something that lives in three places and start running it where the workload actually is. That is the premise behind Akamai Inference Cloud, which we launched in October 2025 on NVIDIA Blackwell infrastructure: Extend inference from core data centers out to the edge, closer to users and devices, and across the distributed footprint Akamai already operates inside networks worldwide.

Put the model near the request and two costs fall at once. Latency drops because the round trip is shorter. The last-mile transfer cost drops because the data no longer traverses the public internet from a distant region to reach the user.
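The latency half of that claim has a physical floor that is easy to estimate. The sketch below uses the common approximation that light in optical fiber covers roughly 200 km per millisecond; it ignores routing detours, queuing, and processing time, so real round trips are longer. The distances and the number of sequential round trips are hypothetical.

```python
FIBER_KM_PER_MS = 200.0  # approximate propagation speed of light in optical fiber


def min_round_trip_ms(distance_km):
    """Lower bound on one network round trip, propagation delay only."""
    return 2 * distance_km / FIBER_KM_PER_MS


def latency_floor_ms(distance_km, sequential_round_trips):
    """Lower bound when a request needs several round trips in sequence."""
    return min_round_trip_ms(distance_km) * sequential_round_trips


# Hypothetical comparison: a distant region versus a nearby edge location,
# for a request that needs five sequential round trips.
for label, km in (("distant region", 4000), ("nearby edge", 100)):
    print(label, min_round_trip_ms(km), latency_floor_ms(km, 5))
```

Because the floor multiplies with each sequential round trip, distance weighs most heavily on the high fan-out workloads described earlier.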

Evaluating an AI vendor: Three critical questions

For enterprises, the practical move is no longer choosing a vendor but fixing the model you use to evaluate one. Three questions are worth posing to any AI infrastructure provider:

What does it cost to move data inside the environment?
What does it cost to leave?
Where does the inference actually run?

What does it cost to move data?

The first question to consider is “What does it cost to move data inside the environment?” Charging for traffic between zones in your own deployment turns every distributed-agent design into a variable, hard-to-forecast bill. That cost should be flat and predictable, not metered at the boundary.

What does it cost to leave?

Second, consider asking “What does it cost to leave?” Hyperscaler egress costs run roughly 8 to 12 cents per gigabyte, and that number is the mechanism that keeps data where it lands.

A neutral architecture lets you store data in one place, run retrieval in another, and serve inference where it makes sense, without an exit penalty engineered to prevent exactly that. For example, Akamai prices transfer on a flat, pooled model that runs close to an order of magnitude lower per gigabyte.
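To see what the per-gigabyte gap means at volume, the sketch below prices the same monthly transfer at the 8 to 12 cent range quoted above and at a flat rate one order of magnitude lower. The flat rate of 1 cent and the 50,000 GB volume are assumptions chosen for illustration, not published prices from any provider.

```python
HYPERSCALER_EGRESS_CENTS_PER_GB = (8, 12)  # range quoted in the text
ASSUMED_FLAT_CENTS_PER_GB = 1              # hypothetical, for illustration only


def egress_cost_dollars(gb, cents_per_gb):
    """Transfer cost in dollars for a volume in gigabytes."""
    return gb * cents_per_gb / 100


def egress_range_dollars(gb, cents_range=HYPERSCALER_EGRESS_CENTS_PER_GB):
    """Low and high cost in dollars across a per-gigabyte price range."""
    low, high = cents_range
    return egress_cost_dollars(gb, low), egress_cost_dollars(gb, high)


monthly_gb = 50_000  # hypothetical monthly egress volume
print("metered range:", egress_range_dollars(monthly_gb))
print("assumed flat rate:", egress_cost_dollars(monthly_gb, ASSUMED_FLAT_CENTS_PER_GB))
```

Rates are kept as whole cents so the arithmetic stays exact; swap in real quoted rates and measured volumes before drawing conclusions.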

By switching from standard hyperscaler setups to an open, distributed cloud platform, enterprises running interactive inference workloads can reduce structural infrastructure costs by up to 86%.

Where does the inference actually run?

Finally, consider the question “Where does the inference actually run?” If the answer is always a central region, you’ve accepted that both the latency and the transfer costs are fixed. They are not fixed. They are architectural choices.

None of this requires believing that the centralized cloud is going away. It is not. Training will keep concentrating in large clusters, and plenty of inference will run happily in a region near the user.

The argument is narrower and more useful than that. As AI workloads get more interactive and more chatty, the cost of moving data and the cost of distance stop being rounding errors and start being design decisions.

The underlying architecture decides the bill

The token price is what everyone quotes, but the underlying architecture is what decides the bill. The teams that win the next phase of AI will be the ones who modeled the whole cost, and built for where inference actually needs to happen.
