Optimize AI Inference: Real-Time NodeBalancers Metrics for AI Workloads

Jun 03, 2026

Written by Himanshu Gupta

Himanshu Gupta is a Senior Product Manager at Akamai.

Load balancers have long handled traffic distribution for traditional web applications — ensuring high availability, spreading load, and keeping services responsive. But AI inference workloads are changing the shape of traffic itself.

Inference-heavy systems generate highly concurrent, bursty, and latency-sensitive traffic. Requests are no longer tied to human interaction patterns; they are driven by autonomous systems making real-time decisions. In this environment, understanding how traffic behaves at the load balancer layer becomes critical.

Introducing Akamai Cloud Pulse metrics for NodeBalancers

We are excited to announce that Akamai Cloud Pulse metrics for NodeBalancers are now in limited availability across all core regions. With this integration, cloud architects, DevOps engineers, and site reliability engineers (SREs) gain centralized, quantitative data about Akamai NodeBalancers performance.

Whether you are monitoring complex session behaviors, mitigating traffic bottlenecks, or ensuring the back-end health of your AI application, Akamai Cloud Pulse gives you the visibility required to operate seamlessly.

Observability for the agentic web

In traditional applications, traffic patterns are largely predictable; they are driven by human activity and gradual changes in usage. In the era of the agentic web, however, AI agents act autonomously, and the resulting request and traffic patterns look different:

- Requests are triggered programmatically (not by users)
- AI agents generate parallel API calls (fan-out behavior)
- Traffic spikes are sudden and nonlinear
- Latency directly impacts model performance and user experience

This makes the load balancer a critical control point. Without visibility into connection-level behavior:

- Traffic bursts go undetected until latency spikes
- Back-end saturation appears too late
- Scaling decisions become reactive instead of proactive

If a load balancer cannot distribute traffic efficiently during a massive burst of inference requests, the entire AI application suffers from unacceptable latency, degrading the user experience and potentially interrupting critical autonomous tasks.

In this volatile environment, NodeBalancers metrics — specifically monitoring session spikes and back-end health — are critical for ensuring that latency-sensitive inference calls do not bottleneck.

Reference architecture: AI inference with distributed GPU clusters

A common architecture for AI inference systems includes:

- Global routing (DNS or edge) directing users to the nearest region
- Regional load balancing using Akamai NodeBalancers
- Back-end pools of GPU-enabled compute nodes handling inference

Each region operates independently for latency and fault isolation.

Why this architecture works

- Reduces inference latency by serving traffic locally
- Prevents cascading failures across regions
- Enables independent scaling of GPU clusters

Where NodeBalancers fits

At the regional level, NodeBalancers:

- Distributes incoming TCP/UDP traffic
- Routes traffic to healthy GPU nodes
- Removes failing nodes via health checks

With real-time metrics, NodeBalancers becomes more than a routing layer — it becomes a signal source for system health and scaling decisions.
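To make the routing role concrete, here is a minimal sketch of round-robin selection that skips back ends marked unhealthy. It is an illustration only, not how NodeBalancers is implemented; the `Backend` and `RoundRobinPool` names are hypothetical, and the `healthy` flag stands in for the result of a health check.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Backend:
    """Hypothetical back-end node; `healthy` mirrors the latest health check."""
    name: str
    healthy: bool = True


class RoundRobinPool:
    """Round-robin selection that skips back ends failing health checks."""

    def __init__(self, backends: List[Backend]) -> None:
        self._backends = list(backends)
        self._next = 0  # index where the next search starts

    def pick(self) -> Optional[Backend]:
        """Return the next healthy back end, or None if none are healthy."""
        count = len(self._backends)
        for offset in range(count):
            index = (self._next + offset) % count
            candidate = self._backends[index]
            if candidate.healthy:
                # Resume after the chosen node on the next call.
                self._next = (index + 1) % count
                return candidate
        return None


pool = RoundRobinPool([
    Backend("gpu-a"),
    Backend("gpu-b", healthy=False),  # removed from rotation
    Backend("gpu-c"),
])
picked = [pool.pick().name for _ in range(4)]
```

With `gpu-b` out of rotation, the remaining two nodes absorb all traffic, which is why a drop in healthy back ends shows up quickly as higher per-node load.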

Key metrics for AI workload observability

Akamai Cloud Pulse provides Layer 4 (transport layer) metrics that expose how traffic behaves before it impacts application performance:

- Traffic volume (ingress/egress)
- Burst detection (new sessions)
- Concurrency (active sessions)
- Serving capacity (active back ends)

This mapping helps teams reason about system behavior and make faster, more accurate scaling decisions.

Traffic volume (ingress/egress)

- Ingress traffic rate
- Egress traffic rate
- TCP/UDP breakdown

These metrics show how much data is flowing through the system.

Why it matters for AI
Sudden increases often indicate fan-out behavior, where AI agents trigger multiple inference requests simultaneously.

Figure 1 shows how these metrics appear in real-time dashboards.

Burst detection (new sessions)

- New sessions
- New TCP sessions
- New UDP sessions

These metrics act as early warning signals.

Why it matters for AI
Spikes in new sessions often precede:

- Back-end overload
- Increased latency
- Connection drops

This makes them critical for proactive scaling.
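One way to turn new-session counts into an early warning is to compare each sample against a rolling baseline. The sketch below is an assumption about how a team might do this, not a Cloud Pulse feature: the function name, the window size, and the threshold are all hypothetical and would need tuning against real traffic.

```python
from collections import deque
from statistics import mean, pstdev
from typing import Deque, Iterable, List


def detect_bursts(
    new_sessions: Iterable[float],
    window: int = 5,
    threshold: float = 3.0,
) -> List[int]:
    """Return indexes of samples far above the rolling baseline.

    A sample is flagged when it exceeds the mean of the previous `window`
    samples by more than `threshold` standard deviations.
    """
    history: Deque[float] = deque(maxlen=window)
    bursts: List[int] = []
    for index, value in enumerate(new_sessions):
        if len(history) == window:
            baseline = mean(history)
            # Floor the spread so a flat baseline does not flag tiny changes.
            spread = max(pstdev(history), 1.0)
            if value > baseline + threshold * spread:
                bursts.append(index)
        # The sample joins the baseline afterwards, flagged or not.
        history.append(value)
    return bursts


# New sessions per interval (made-up numbers): steady traffic, then a fan-out spike.
samples = [100, 102, 98, 101, 99, 100, 450, 103]
burst_indexes = detect_bursts(samples)
```

Here only the seventh sample (index 6) is flagged. Because flagged samples still enter the baseline, a sustained higher level stops alerting after a few intervals; whether that is desirable depends on the scaling policy.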

Concurrency (active sessions)

- Total active sessions
- Active TCP sessions
- Active UDP sessions

These represent real-time connection load on your infrastructure.

Why it matters for AI
High concurrency indicates:

- Sustained inference demand
- Long-lived connections
- Potential pressure on back-end systems
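Active and new sessions can be read together. Under steady-state conditions, Little's law relates them: average active sessions equal the session arrival rate multiplied by the average session duration. The sketch below applies that relationship; it assumes active sessions are a count and new sessions are expressed per second, and the function names are hypothetical.

```python
def mean_session_seconds(active_sessions: float, new_sessions_per_second: float) -> float:
    """Estimate average session duration via Little's law (L = lambda * W)."""
    if new_sessions_per_second <= 0:
        raise ValueError("new_sessions_per_second must be positive")
    return active_sessions / new_sessions_per_second


def projected_active_sessions(new_sessions_per_second: float, session_seconds: float) -> float:
    """Project steady-state concurrency for a given arrival rate and duration."""
    return new_sessions_per_second * session_seconds


# 1,200 active sessions at 40 new sessions per second implies ~30 s per session.
duration = mean_session_seconds(1200, 40)
# If the arrival rate doubles and duration holds, concurrency doubles too.
projected = projected_active_sessions(80, duration)
```

The estimate only holds when traffic is roughly stable over the measurement interval. During a burst, a rising implied duration at a constant arrival rate suggests connections are staying open longer, for example because back ends are slowing down.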

Serving capacity (active back ends)

- Total active back ends
- Active TCP back ends
- Active UDP back ends

These metrics show how many back-end nodes are healthy and serving traffic (Figure 2).

Why it matters for AI
A drop in active back ends is often the first sign of:

- GPU node failures
- Failed health checks
- Resource exhaustion
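A simple check on this metric is to compare the active back-end count against the expected pool size and to track how many sessions each remaining node carries. The following is a sketch under assumptions: the function names and the 75% floor are hypothetical, and the sample values are made up.

```python
from typing import List, Optional, Tuple


def capacity_alerts(
    active_backends: List[int],
    expected: int,
    min_ratio: float = 0.75,
) -> List[Tuple[int, int]]:
    """Return (sample index, count) where healthy back ends fall below the floor."""
    if expected <= 0:
        raise ValueError("expected must be positive")
    floor = expected * min_ratio
    return [
        (index, count)
        for index, count in enumerate(active_backends)
        if count < floor
    ]


def sessions_per_backend(active_sessions: int, active_backends: int) -> Optional[float]:
    """Average sessions carried by each healthy back end, or None if none are healthy."""
    if active_backends <= 0:
        return None
    return active_sessions / active_backends


# Eight GPU nodes expected; two samples dip below 75% of the pool.
alerts = capacity_alerts([8, 8, 7, 5, 4, 8], expected=8)
load_when_degraded = sessions_per_backend(1200, 4)
```

In this example the same 1,200 sessions that average 150 per node on a full pool of eight average 300 per node when only four nodes are healthy.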

Built for your workflow

We know that every engineering team has its own way of monitoring infrastructure. That’s why we’ve made NodeBalancers metrics highly flexible with:

- Native dashboards: View your metrics instantly within the Akamai Cloud Pulse user interface by using customizable dashboards, date ranges, and grouping filters (Figure 3).
- OpenTelemetry integrations: Export metrics using OTel Collector and expose them in Prometheus-compatible formats, enabling integration with tools like Grafana.
- Programmatic access: Retrieve your metrics programmatically using the Akamai Cloud API (built on legacy Linode infrastructure) to automate your monitoring and alerting workflows.
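As an illustration of what consuming a Prometheus-compatible export can look like, the sketch below parses a few lines of the Prometheus text exposition format and sums one metric across protocols. The metric and label names (`nodebalancer_active_sessions`, `protocol`, `region`) are hypothetical placeholders; the names actually exported for NodeBalancers may differ, so check the Cloud Pulse documentation. The parser covers only the common subset of the format and does not unescape label values.

```python
import re
from typing import Dict, List, NamedTuple


class Sample(NamedTuple):
    name: str
    labels: Dict[str, str]
    value: float


# metric_name{label="value",...} value [timestamp]
_LINE = re.compile(r'^([a-zA-Z_:][a-zA-Z0-9_:]*)(?:\{(.*)\})?\s+(\S+)')
_LABEL = re.compile(r'([a-zA-Z_][a-zA-Z0-9_]*)="((?:[^"\\]|\\.)*)"')


def parse_samples(text: str) -> List[Sample]:
    """Parse sample lines, skipping blank lines and # HELP / # TYPE comments."""
    samples: List[Sample] = []
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        match = _LINE.match(line)
        if match is None:
            continue  # ignore lines outside the supported subset
        name, label_text, value = match.groups()
        labels = dict(_LABEL.findall(label_text or ""))
        samples.append(Sample(name, labels, float(value)))
    return samples


# Hypothetical scrape output; names and values are made up for illustration.
EXAMPLE = """\
# TYPE nodebalancer_active_sessions gauge
nodebalancer_active_sessions{protocol="tcp",region="us-east"} 1200
nodebalancer_active_sessions{protocol="udp",region="us-east"} 35
nodebalancer_active_backends{region="us-east"} 8
"""

parsed = parse_samples(EXAMPLE)
total_active = sum(s.value for s in parsed if s.name == "nodebalancer_active_sessions")
```

In practice a Prometheus server or Grafana data source handles this parsing; the sketch is mainly useful for quick scripts and for checking what an exporter endpoint returns.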

A growing observability ecosystem

Akamai NodeBalancers joins a growing list of services supported by Akamai Cloud Pulse metrics, which currently includes Managed Database Services and Akamai Object Storage. Bringing these resources into a centralized monitoring environment makes it easier to monitor the overall health of your Akamai Cloud environment.

Get started today

Ready to gain deeper insights into your traffic routing? Check out the resources below to start exploring NodeBalancers metrics on Akamai Cloud Pulse:

As AI workloads continue to evolve, real-time visibility at the load balancer layer will play an increasingly critical role in maintaining performance and reliability. Log in to Cloud Manager today and explore NodeBalancers metrics in Akamai Cloud Pulse to better understand and optimize your traffic at scale.
