AI Inference Is Swallowing the Cloud

Jul 01, 2026

Written by Robert Blumofe

Dr. Robert Blumofe is Executive Vice President and Chief Technology Officer at Akamai. As CTO, he guides Akamai’s technology strategy, works with Akamai’s largest customers, and convenes technology leaders within the company to catalyze innovation. Previously, he led Akamai’s Platform organization and Enterprise Division, where he was responsible for developing and operating the distributed system underlying all Akamai products and services, as well as creating solutions for major enterprises to secure and improve performance. He holds a Ph.D. in Computer Science from Massachusetts Institute of Technology and a Bachelor of Science from Brown University.

More than a decade after software began eating the world, AI is now eating software. AI is changing the very nature of software: its role in the human ecosystem and how it serves humankind. The consequences of this change are profound and reach into nearly every aspect of human endeavor. The focus here, though, is on the consequences for infrastructure — the cloud, in particular.

As AI devours software, is the cloud the final course in that meal?

The short answer is yes, but not because AI will eliminate the cloud. Rather, AI will dramatically and irrevocably alter it. Specifically, it is generative AI (GenAI) — encompassing large language models (LLMs), image- and video-generation models, and their orchestration within autonomous agents — that is the most consequential form of AI for cloud infrastructure.

We all know that GenAI is ravenous: ravenous for power, ravenous for computation, and ravenous for storage. So as AI becomes part of everything we do, how do we feed the beast?

The current dominant approach is essentially brute force: Expend massive amounts of capital to build out massive, centralized data centers that host massive AI models. This approach will fail. It is economically unsustainable. It is ecologically disastrous. Most critically, it is architecturally incapable of scaling to meet the looming demand.

We refuse to accept a future defined by a new “World Wide Wait.” The industry must move beyond the illusion of hypercentralized infrastructure. We must be smart about matching infrastructure to use cases, tailoring the technology, and meeting agents where they actually live. The cloud must adapt, decentralize, and evolve … or it will be consumed.

Fortunately, there is a better way to answer this question. Not with brute force. With intelligence.

History repeating itself

In 1998, the World Wide Web was eating the internet. Before the web, the internet was mostly email, telnet (a remote access protocol), FTP (a file transfer protocol), and Usenet (a message board organized by topics, not unlike today’s forums and subreddits). There was no streaming media, no live video conferencing, and no online shopping, travel planning, banking, or healthcare.

As such, it was used almost exclusively by academics, computer scientists, and military and government personnel. The web changed all that. Though some of the web capabilities that we now use every day took longer to emerge, by 1998 (just 7 years after the web became available) websites were sprouting like weeds and everyone was taking notice.

The architectural flaw of the early web

That wild success, however, led to enormous demand. This, in turn, led to some widely publicized failures, which led to the joking refrain: WWW should stand for “World Wide Wait.” Some pundits even predicted that the web was going to kill the internet.

The problem was that websites were built and deployed in a centralized hub-and-spoke model where every user request had to route all the way back to a central origin server. The brute-force solution, building out massive amounts of centralized infrastructure, would be wildly expensive. And would it even work?

That’s when Professor Tom Leighton and his graduate student, Danny Lewin, stepped up and founded Akamai Technologies. They proposed an alternative to the brute-force solution: an intelligent solution using math, algorithms, and distributed systems.

Their solution, now known as a content delivery network (CDN), distributes storage and computation to the edge of the internet so that web applications can be delivered from locations that are near the people, devices, and things that are using those applications.

About a decade later, Akamai successfully brought this same intelligent approach to the problem of cybersecurity. With the clarity of hindsight, we can safely assert that brute force could never have solved these problems.

Fast forward to today. We see AI transforming the web every bit as profoundly (if not more so) as the web transformed the internet. So how can brute force be the answer? How can massive investment in massive, centralized data centers hosting massive models be the answer?

Geography doesn’t care about capital, and physics doesn’t care about dollars. If every invocation of a GenAI model has to traverse thousands of miles of network and then trillions of model weights, the result will be even worse than the “World Wide Wait.” We will slide inevitably into the very structural gridlock we’ve warned against: the “Large Language Molasses.”

We don’t have to. Not if we’re smart.
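To put rough numbers on the geography argument, here is a back-of-the-envelope sketch. It assumes light travels through optical fiber at about two-thirds of its vacuum speed and ignores routing detours, queuing, and model compute time, so real latencies will be higher. The function name and the example distances are illustrative assumptions, not measurements.

```python
# Back-of-the-envelope propagation delay; names and distances are illustrative.
SPEED_OF_LIGHT_KM_S = 299_792.458  # speed of light in vacuum
FIBER_FACTOR = 2 / 3               # assumption: light in fiber moves at roughly 2/3 c


def round_trip_ms(distance_km: float) -> float:
    """Lower bound on round-trip propagation time over fiber, in milliseconds."""
    one_way_s = distance_km / (SPEED_OF_LIGHT_KM_S * FIBER_FACTOR)
    return 2 * one_way_s * 1000


if __name__ == "__main__":
    for label, km in [("nearby edge location", 100), ("distant data center", 5000)]:
        print(f"{label}: {round_trip_ms(km):.1f} ms per round trip")
```

Under these assumptions, a location 100 km away costs about 1 ms per round trip, while one 5,000 km away costs about 50 ms before any computation happens. No amount of capital spent at the far end reduces that floor.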

Generative AI’s true superpower: Tools

For most of us, our first interaction with GenAI came in the form of a chatbot like ChatGPT. This was little more than a simple chat interface on top of an LLM, but it captured the popular imagination like very few other innovations have throughout history. That was just a few years ago, and GenAI has already gone far beyond the chatbot. We’ve swiftly moved from AI being the application to AI powering the application.

When AI is part of an application, we get agents. Agents can write code; agents can conduct research; agents can summarize email and draft replies; agents can find a great restaurant and make dinner reservations; agents can help you plan a trip, make the hotel, airline, and rental-car reservations, and give you turn-by-turn directions from the rental-car center to the hotel; and agents can help you shop for a great shirt that will go with your favorite pants (and make you look a bit younger for your upcoming college reunion).

Pretty much everything we do today by navigating web pages, filling out forms, and clicking links can and will be replaced by agents.

Deconstructing the anatomy of an agent

But an AI agent is not just a super powerful LLM with advanced LLM-reasoning capabilities. An agent is a system with many components, one of which is an LLM. An LLM typically plays the central role, managing the natural-language communication and making the decisions that guide the conversation and determine the sequence of steps. But an agent may have many other AI models.

For example, a fashion-consultant agent may use image or video generation to show you how a garment will look on you. And an agent will have tools: non-AI components that allow it to search the web, read and write files, run programs on the command line, and invoke APIs. For example, that fashion-consultant agent might use a tool to access information about your past purchases and preferences so it can show you a shirt in your favorite color paired with those pants you recently bought.
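The split between the decision-making model and its tools can be sketched in a few lines. This is a minimal illustration, not any particular framework's API: `stub_planner` stands in for the LLM and uses a keyword rule so the example runs without a model, and the `PURCHASES` data, the tool name, and the request format are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import Callable

# Hypothetical data standing in for a real purchase-history service.
PURCHASES = {"alice": [{"item": "pants", "color": "navy"}]}


def purchase_history(user: str) -> list[dict]:
    """A non-AI tool: a plain lookup, no model involved."""
    return PURCHASES.get(user, [])


def stub_planner(request: str) -> list[tuple[str, str]]:
    """Stand-in for the LLM: decides which tools to call and in what order.

    A real agent would ask a language model; this stub uses a keyword rule.
    Requests are assumed to look like "<user>: <text>".
    """
    user = request.split(":", 1)[0]
    return [("purchase_history", user)] if "shirt" in request else []


@dataclass
class Agent:
    planner: Callable[[str], list[tuple[str, str]]]
    tools: dict[str, Callable[[str], object]] = field(default_factory=dict)

    def handle(self, request: str) -> list[object]:
        # The planner picks the steps; the tools do the actual work.
        return [self.tools[name](arg) for name, arg in self.planner(request)]


agent = Agent(planner=stub_planner, tools={"purchase_history": purchase_history})
print(agent.handle("alice: find a shirt to match my new pants"))
```

The point of the structure is that swapping the stub for a real model changes only the planner; the tools stay ordinary, cheap, deterministic code.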

Architectural intelligence dictates a clear rule: When building an agent, put as much functionality as possible into the non-AI tools. We must reject the impulse to use AI for everything. As powerful as today’s AI is, it can’t do everything, and even for the things it can do, it’s almost always wildly inefficient.

Consider arithmetic. It’s actually quite amazing that LLMs can often produce correct answers to arithmetic problems, but they sometimes get the wrong answer. Moreover, even when they do get the right answer, they’re expending many orders of magnitude more compute and energy than a calculator.

Similarly, we do not need a trillion-parameter model to scan text when a regular expression works perfectly, nor do we need deep learning to map a route when a shortest-path algorithm is already optimized for the job.
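To make those three cases concrete, here is a sketch of what such non-AI tools might look like using only the Python standard library. The function names, the `ORD-` identifier format, and the graph layout are assumptions made for illustration; the shortest-path routine is a plain Dijkstra implementation.

```python
import ast
import heapq
import operator
import re

_OPS = {ast.Add: operator.add, ast.Sub: operator.sub,
        ast.Mult: operator.mul, ast.Div: operator.truediv}


def calculate(expression: str) -> float:
    """Evaluate binary +, -, *, / arithmetic exactly, without calling a model."""
    def walk(node: ast.AST) -> float:
        if isinstance(node, ast.Constant) and isinstance(node.value, (int, float)):
            return node.value
        if isinstance(node, ast.BinOp) and type(node.op) in _OPS:
            return _OPS[type(node.op)](walk(node.left), walk(node.right))
        raise ValueError("unsupported expression")
    return walk(ast.parse(expression, mode="eval").body)


def find_order_ids(text: str) -> list[str]:
    """Scan text with a regular expression (the ID format is made up)."""
    return re.findall(r"\bORD-\d{6}\b", text)


def shortest_path_cost(graph: dict[str, dict[str, float]], start: str, goal: str) -> float:
    """Dijkstra's algorithm over a weighted adjacency dict; inf if unreachable."""
    best = {start: 0.0}
    queue = [(0.0, start)]
    while queue:
        cost, node = heapq.heappop(queue)
        if node == goal:
            return cost
        if cost > best.get(node, float("inf")):
            continue  # stale queue entry, a cheaper route was already found
        for neighbor, weight in graph.get(node, {}).items():
            new_cost = cost + weight
            if new_cost < best.get(neighbor, float("inf")):
                best[neighbor] = new_cost
                heapq.heappush(queue, (new_cost, neighbor))
    return float("inf")
```

Each of these returns the same answer every time for the same input, which is exactly the property a language model cannot guarantee.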

This is the true superpower of modern GenAI: LLMs can invoke tools. If not for that, the industry would still be blissfully unaware of the word “agentic.” We must give the LLM the tools it needs and let it do only what it is uniquely good at. We must stop expending megawatts to solve problems that can be solved with milliwatts.

The great compute misallocation

The same megawatts-versus-milliwatts logic applies to infrastructure: we need to be smart about matching infrastructure to use cases. The recent AI mania has fueled massive investment in massive, centralized data centers hosting massive models on massive, dense GPU clusters.

Recent cloud investments have been dominated by GPUs, and we’ve even seen the emergence of a new kind of cloud, the neo-cloud, focused almost exclusively on GPU infrastructure for AI. But not everything requires a dense cluster of GPUs.

Dense GPU clusters made sense and still make sense when the primary AI use case is training, especially the pre-training of foundation LLMs. Of course, this was and continues to be a prerequisite for everything that’s happening in GenAI today.

But what about actually using the models, otherwise known as inference? That is the way we realize AI’s real-world value. And for many inference use cases, dense, centralized GPU clusters are an architectural mismatch.

To be sure, when doing inference with a massive GenAI model, a dense GPU cluster is likely the most efficient, and maybe the only, way to get acceptable levels of performance. But not everything requires a massive model. Many workloads function best with specialized models that are significantly smaller than “Ask me anything” models.

An AI application running inside a car to manage climate control and entertainment systems, for instance, does not need to understand theoretical physics. Likewise, an AI agent running in the cloud whose sole job is to help a patient schedule a dentist appointment has no reason to compose sonnets or summarize the plot of every episode of “M*A*S*H.”

Using a trillion-parameter model for these narrow tasks is profoundly wasteful. It is far smarter to deploy smaller, specialized models that run efficiently on less expensive GPUs, or even standard CPUs.
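One way to act on this is a simple routing rule: send each task to the cheapest model that can handle it, and fall back to a general-purpose model only when nothing smaller fits. The sketch below is hypothetical throughout; the model names, task labels, and relative costs are invented for illustration and do not describe real products or prices.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ModelOption:
    name: str              # hypothetical model identifier
    tasks: frozenset       # task labels it handles; empty means general purpose
    relative_cost: float   # assumed cost per request, in arbitrary units


# Made-up catalog: one large general model and two small specialists.
CATALOG = [
    ModelOption("general-large", frozenset(), 100.0),
    ModelOption("scheduling-small", frozenset({"scheduling"}), 1.0),
    ModelOption("cabin-control-tiny", frozenset({"climate", "entertainment"}), 0.2),
]


def pick_model(task: str, catalog=CATALOG) -> ModelOption:
    """Return the cheapest model able to handle the task.

    Assumes the catalog contains at least one general-purpose model,
    so the candidate list is never empty.
    """
    capable = [m for m in catalog if not m.tasks or task in m.tasks]
    return min(capable, key=lambda m: m.relative_cost)


for task in ("scheduling", "climate", "poetry"):
    print(task, "->", pick_model(task).name)
```

With this catalog, the dentist-appointment and in-car tasks land on the small specialists, and only the open-ended request reaches the large model.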

Why the agentic footprint is inherently hybrid

Furthermore, because an agent is an LLM integrated with tools, its infrastructure footprint is inherently hybrid. While the central LLM may require specific acceleration, the tools it invokes are not numerically dense and have no need for a GPU.

Consider, again, the fashion-consultant agent. It may leverage advanced image or video generation to demonstrate how a garment looks on a person, but it relies heavily on traditional tools to pull past purchase history, access user preferences, and query inventory databases. These tools run on CPUs, demand significant storage, and rely on constant communication with remote services.

An AI agent's infrastructure needs cannot be reduced to a single, one-size-fits-all infrastructure type. They are fundamentally hybrid: a dynamic combination of GPU, CPU, storage, and communication.

The cloud has always triumphed because it was built to be a combination of flexible, hybrid compute and networking. Just as we do not use AI for every task, we must stop forcing GPUs onto every workload. The hardware must bend to the use case, not the other way around.

The myth of Agentlandia

It is tempting to imagine AI agents existing in their own insulated realm, chattering among themselves and getting things done with little direct interaction with us humans. In this myth, the agents are massive LLMs, running on massive GPU clusters, in a handful of hypercentralized data centers. But this myth does not survive scrutiny.

The reality is that today's most popular agent frameworks — like OpenClaw, Hermes Agent, Claude Cowork, and Claude Code — are commonly installed and executed directly on the desktop. While they may query foundation LLMs running in centralized clouds, their other components run on the desktop.

Moreover, it’s become increasingly popular to use alternative LLMs that also run on the desktop. These agents can also be run in the cloud, any cloud, with a strong preference for a cloud location that is near the user.

A parallel trend is playing out in the enterprise.

Companies are building and deploying agents for use by their employees and customers, often with the help of agent frameworks such as LangChain, Pydantic AI, n8n, and CrewAI. These agents can run in any cloud of the company’s choosing and may be configured to use massive, centrally hosted LLMs from foundation-model providers. But again, it’s becoming increasingly popular to use alternative LLMs, specialized for the use case and running in the company’s cloud of choice.

There is no isolated “Agentlandia.” AI agents will run anywhere and everywhere: on edge devices, in cars, on desktops, and across highly distributed cloud environments chosen for proximity to the user.

Mapping the new distributed communication web

To get anything done, these agents must interact constantly with a distributed web of services. They search remote websites, query local databases, and invoke remote APIs. A single user request may trigger a flurry of back-and-forth communication between the LLM, local tools, and remote services.

Furthermore, agents must interact with other agents across different cloud ecosystems, all while maintaining rich, multimodal, conversational loops with human beings. The complete picture is a dizzying, highly interconnected, heavily utilized, and massively distributed web of communication channels spanning the globe.

The industry challenge: Meeting agents where they live

Infrastructure must meet agents where they live, and that means everywhere. We cannot force the agentic ecosystem into centralized clusters. We must support it with distributed infrastructure in the exact locations where the models run, where the tools execute, and where the users interact. These locations must provide low-latency, high-bandwidth connections to keep the ecosystem moving.
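As a rough illustration of choosing a location for proximity to the user, the sketch below picks the geographically closest site using the haversine great-circle formula. The list of sites and their approximate coordinates are hypothetical and do not represent any provider's actual footprint. Geographic distance is also only a proxy: a production system would more likely steer on measured network latency.

```python
import math

# Hypothetical deployment sites with approximate (latitude, longitude) in degrees.
LOCATIONS = {
    "frankfurt": (50.11, 8.68),
    "singapore": (1.35, 103.82),
    "chicago": (41.88, -87.63),
}

EARTH_RADIUS_KM = 6371.0  # mean Earth radius


def haversine_km(a: tuple[float, float], b: tuple[float, float]) -> float:
    """Great-circle distance between two (lat, lon) points, in kilometers."""
    lat1, lon1, lat2, lon2 = map(math.radians, (*a, *b))
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(h))


def nearest_location(user: tuple[float, float], locations=LOCATIONS) -> str:
    """Name of the site with the smallest great-circle distance to the user."""
    return min(locations, key=lambda name: haversine_km(user, locations[name]))


# A user near Paris (approximate coordinates) is routed to the European site.
print(nearest_location((48.86, 2.35)))
```

The more sites there are in the table, the shorter the worst-case distance becomes, which is the practical meaning of meeting agents where they live.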

This does not imply that centralized data centers are going away: Dense GPU clusters remain the correct solution for heavy training workloads and specific massive inference tasks. Rather, centralized hubs must be augmented with highly distributed edge infrastructure. The future of the cloud is a flexible continuum stretching from the core to the edge.

The brute-force build-out of hypercentralized megastructures simply does not align with the operational reality of modern GenAI. It is wasteful, it will not scale, and it will inevitably stall progress in the “Large Language Molasses.”

Our challenge to the industry is to build a cloud for the AI era that is smarter, more adaptable, and fiercely flexible. We must demand highly flexible hybrid combinations of CPU, GPU, storage, and connectivity, deployed across a fluid continuum of locations. The cloud must evolve to support the distributed reality of AI — or risk being entirely consumed by it.
