At NVIDIA GTC, the conversation is always centered on acceleration.
More GPUs. Faster interconnects. Larger models. Shorter training cycles.
The AI factory has quickly become the engine of modern innovation. Organizations are building high-performance environments designed to train, tune, and run AI models that power everything from enterprise AI applications to real-time decision systems.
AI is no longer an isolated research experiment running inside a lab. It has become industrial infrastructure.
Intelligence is now being manufactured
Across modern data centers, massive GPU clusters and distributed compute frameworks transform data into intelligence. AI pipelines ingest data, process it through machine learning and large language models (LLMs), and generate insights that drive operations across entire organizations.
In many ways, intelligence is now being manufactured.
These environments are designed to be scalable, automated, and optimized for large-scale AI workloads. They support everything from generative AI (GenAI) applications and AI agents to enterprise analytics and predictive systems.
But behind the rapid growth of this AI ecosystem, a quieter architectural shift is taking place.
AI factories are being built faster than they are being secured
As organizations accelerate the deployment of AI infrastructure, the attack surface expands in ways that traditional cybersecurity models were never designed to handle. High-speed east-west communication among workloads, shared compute fabrics, and real-time data pipelines introduce new types of vulnerabilities across the AI lifecycle.
The AI factory is not simply a larger data center. It behaves more like industrial infrastructure, where uptime, throughput, and reliability define the success of the entire system.
To secure modern AI environments, organizations first need to understand how their AI workloads actually behave. Training pipelines, inference services, data ingestion systems, and orchestration platforms all interact across distributed infrastructure.
Without clear visibility into how these components communicate, security teams cannot confidently define policies or enforce meaningful controls. Before protection can begin, teams must be able to answer fundamental questions about how workloads interact, what normal application behavior looks like, where vulnerabilities may exist within AI pipelines, and how far an attacker could move if a system were compromised.
And that reality requires a new approach to AI security.
The most expensive flat network you have ever built
Traditional enterprise networks evolved around layers of control. Traffic entered through the perimeter, moved through defined zones, and was inspected along the way by firewalls, endpoints, and other security controls.
AI infrastructure operates very differently.
Training clusters process enormous volumes of data across distributed AI workloads. AI pipelines move information among ingestion systems, preprocessing services, training clusters, and real-time inference environments.
East-west traffic dominates these architectures.
Kubernetes orchestrates dynamic workloads across nodes while automation frameworks spin up new compute instances in seconds. GPU-accelerated clusters coordinate thousands of parallel operations during model training.
In many environments, trust boundaries are implied rather than enforced. Teams prioritize performance and scalability as they deploy new AI capabilities across their enterprise ecosystem. Over time this creates an environment where everything can communicate with almost everything else.
Organizations often do not realize what they have created. They have built the most expensive flat network in their history.
Uptime: The new metric for intelligence
When vulnerabilities emerge, whether through a compromised container, a misconfigured identity, or a vulnerable library, lateral movement can spread rapidly across the environment. Attackers increasingly exploit east-west visibility gaps to deploy ransomware or other threats that propagate across compute clusters and storage platforms before defenders can respond.
High-performance fabrics move risk just as efficiently as they move data. Acceleration without containment creates fragility.
What makes this challenge even more significant is that AI factories are quickly becoming mission-critical infrastructure. These systems may power supply chains, healthcare diagnostics, financial services, or autonomous systems.
When these environments stop running, intelligence stops flowing. In the AI era, uptime becomes the new metric for intelligence.
Moving enforcement closer to the workload
NVIDIA has fundamentally reshaped modern computing through accelerated infrastructure. Alongside NVIDIA accelerated computing, the NVIDIA BlueField platform introduces a new architectural layer: the data processing unit, or DPU.
DPUs allow networking, storage, and cybersecurity functions to run directly within the infrastructure fabric instead of relying solely on centralized appliances or host CPUs.
This shift changes how security can operate inside modern AI infrastructure.
By operating directly in the data path, BlueField enables security controls to enforce policies at line speed. Enforcement moves closer to the workload itself, becoming part of the underlying infrastructure rather than a separate inspection layer.
This is particularly important in environments running thousands of GPUs, including NVIDIA Blackwell systems, where clusters must run tightly synchronized training jobs.
These environments are optimized for high-performance compute. Every clock cycle matters.
The benefits of offloading enforcement
Traditional host-based security agents introduce overhead that these systems cannot tolerate. Additional CPU consumption, latency, or jitter can disrupt synchronization across distributed workloads.
AI platforms require a different approach.
By offloading enforcement into the infrastructure itself, security operates without impacting the runtime performance of the GPUs or the training lifecycle of the AI models they support.
Intelligence before enforcement
Enforcement alone, however, is not enough.
To secure an AI factory, organizations must first understand how AI workloads, applications, and data interact across the environment.
Akamai Guardicore Segmentation provides the intelligence layer that enables this visibility.
Before security teams can apply policies, they must answer four critical questions:
Which workloads communicate with one another?
Which workflows represent normal application behavior?
Where do vulnerabilities exist within AI pipelines?
How far could an attacker move if a workload were compromised?
Akamai Guardicore Segmentation continuously maps communication relationships across hybrid environments that include data centers, cloud infrastructure, Kubernetes clusters, and edge systems.
By analyzing application behavior across the entire lifecycle of AI workloads, it creates a dynamic model of how systems interact. Policies can then be defined by using workload identity, application context, and runtime behavior rather than static network addresses.
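As a rough illustration of what identity-based policy means in practice (the workload labels and helper function below are hypothetical, not Guardicore's actual policy model), allowed flows can be expressed between workload identities rather than IP addresses:

```python
# Hypothetical sketch: segmentation policy keyed on workload identity.
# Each entry names an allowed (source, destination) flow; anything
# not listed is implicitly denied.

ALLOWED_FLOWS = {
    ("data-ingestion", "preprocessing"),
    ("preprocessing", "training"),
    ("training", "model-registry"),
    ("model-registry", "inference"),
}

def is_allowed(src_identity: str, dst_identity: str) -> bool:
    """Return True only if the policy explicitly permits this flow."""
    return (src_identity, dst_identity) in ALLOWED_FLOWS

# A training cluster may publish to the model registry...
assert is_allowed("training", "model-registry")
# ...but an inference service can never reach raw ingestion systems.
assert not is_allowed("inference", "data-ingestion")
```

Because the policy references identities instead of static addresses, it stays valid as workloads are rescheduled or scaled.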
Intelligent and resilient architecture
AI workloads are often opaque systems. Machine learning pipelines may span dozens of interconnected services across storage systems, compute clusters, and orchestration frameworks.
An agentless architecture allows Akamai Guardicore Segmentation to observe these interactions without interfering with the workloads themselves. This provides deep observability into how data moves through the AI factory.
Security teams gain visibility into abnormal behavior like unexpected data movement, suspicious communications between services, or early indicators of exploitation, such as prompt injection attempts or unauthorized access to sensitive data.
When paired with infrastructure enforcement through NVIDIA BlueField, this architecture becomes both intelligent and resilient.
Akamai Guardicore Segmentation defines how workloads should communicate. NVIDIA BlueField enforces those policies at line speed within the infrastructure.
Containment in real AI environments
Security architecture only matters if it works in real operational environments.
Consider a modern AI training pipeline: Data ingestion systems, preprocessing services, distributed training nodes, and storage systems all interact to build and refine models. Without segmentation, a vulnerability in one stage of the pipeline could allow attackers to move laterally across the entire environment.
Identity-based segmentation defines exactly which workloads are allowed to communicate.
A preprocessing node may access a dataset and training service but nothing beyond that scope. If compromised, the blast radius is contained within a small segment of the environment.
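To make "blast radius" concrete, a simple sketch (hypothetical workload names, not a real product API) can compute how far an attacker could move from a compromised workload by following only the flows the policy allows:

```python
from collections import deque

# Allowed flows: source identity -> identities it may reach.
POLICY = {
    "ingestion":     {"preprocessing"},
    "preprocessing": {"dataset-store", "training"},
    "training":      {"model-registry"},
}

def blast_radius(compromised: str) -> set:
    """All workloads reachable from a compromised one via allowed flows."""
    seen, queue = set(), deque([compromised])
    while queue:
        node = queue.popleft()
        for nxt in POLICY.get(node, set()):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen

# A compromised preprocessing node can reach only its declared
# dependencies (including transitive ones), never the whole fabric.
assert blast_radius("preprocessing") == {"dataset-store", "training",
                                         "model-registry"}
```

In a flat network, the equivalent reachability set would be every workload in the environment; segmentation is what shrinks it to a handful of nodes.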
The objective is not only to stop an attacker. It is to ensure the rest of the AI factory continues running uninterrupted.
Challenge: Separate research environments from production inference systems
Another common challenge is separating research environments from production inference systems. Research teams experiment with new AI models and GenAI capabilities, while production systems support real-time applications and enterprise AI workflows.
Segmentation allows these environments to coexist safely. Explicit access controls ensure that experimental workloads cannot access production data or systems.
Kubernetes-based AI workloads introduce additional complexity. Pods scale dynamically, services evolve, and automation continuously modifies infrastructure.
Identity-driven policies ensure security controls remain consistent even as infrastructure changes.
Containment becomes adaptive rather than static.
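The reason identity-driven policy survives infrastructure churn can be sketched as follows (the labels and pod records are hypothetical; a real Kubernetes integration would read labels from the API server):

```python
# Pods come and go; only their identity labels are stable.
# Policy is keyed on labels, so rescheduling a pod to a new node
# and a new IP changes nothing about what it may talk to.

ALLOWED = {("preprocessing", "training")}

def may_connect(src_pod: dict, dst_pod: dict) -> bool:
    """Decide on workload identity, ignoring the ephemeral IP entirely."""
    return (src_pod["label"], dst_pod["label"]) in ALLOWED

old_pod = {"label": "preprocessing", "ip": "10.0.1.17"}
# The autoscaler replaces the pod; it lands elsewhere with a new IP.
new_pod = {"label": "preprocessing", "ip": "10.0.9.42"}
trainer = {"label": "training", "ip": "10.0.2.5"}

# The decision is identical before and after rescheduling.
assert may_connect(old_pod, trainer) and may_connect(new_pod, trainer)
```

An IP-based firewall rule would have broken the moment the pod moved; the identity-based rule did not need to change at all.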
Why yesterday’s security model struggles in AI environments
Many traditional cybersecurity tools were designed for earlier generations of enterprise infrastructure; they struggle to keep pace with the behavior of modern AI workloads.
Perimeter firewalls still play a role, but they cannot see the east-west traffic moving between GPU nodes and distributed compute systems.
Detection tools may identify anomalies, but detection alone does not stop lateral movement across AI pipelines.
Traditional host-based security tools also introduce operational concerns.
In high-performance environments, these tools behave like a speed bump in a Formula 1 race. They consume CPU cycles, introduce latency, and disrupt synchronization across workloads.
For years, organizations accepted a trade-off between fast AI and secure AI. As AI becomes foundational infrastructure, that trade-off is no longer acceptable.
Security must operate at the same speed as accelerated computing.
Zero Trust for AI, built into the architecture
The concept of Zero Trust has become widely discussed across the cybersecurity industry. In AI environments, Zero Trust is most effective when implemented as an architectural principle rather than an additional product layer.
Workloads must have explicit identities.
Access must follow least privilege.
Communication paths must be verified continuously.
Trust must never be assumed simply because something exists inside the network.
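As a minimal sketch (hypothetical names, not any vendor's API), these four principles reduce to a deny-by-default decision that also consults live telemetry:

```python
# Explicit identities, least privilege: one narrowly scoped grant.
ALLOWED = {("inference-gateway", "model-server")}

def verify(src: str, dst: str, telemetry_ok: bool) -> bool:
    """Deny by default: a flow needs an explicit grant AND healthy
    runtime telemetry; being 'inside the network' grants nothing."""
    return (src, dst) in ALLOWED and telemetry_ok

assert verify("inference-gateway", "model-server", telemetry_ok=True)
# The same flow is re-verified continuously; degraded telemetry revokes it.
assert not verify("inference-gateway", "model-server", telemetry_ok=False)
# An unlisted internal flow is denied even with clean telemetry.
assert not verify("model-server", "billing-db", telemetry_ok=True)
```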
When segmentation is driven by workload identity and validated through real-time telemetry, Zero Trust becomes a natural outcome of the architecture.
Instead of slowing systems down, it ensures the AI factory continues to operate even when threats appear within the environment.
The CISO perspective on AI infrastructure risk
For security leaders, AI factories represent a new concentration of risk. Massive compute clusters, proprietary AI models, and sensitive datasets are consolidated within a single environment.
Protecting that infrastructure is not simply about preventing breaches. It is about maintaining the continuous availability of intelligence.
AI systems power customer applications, financial platforms, supply chains, and critical decision systems. If something goes wrong inside the AI factory, the impact can ripple across the entire organization.
The key question CISOs must answer is simple: If a system becomes compromised, how far can an attacker move?
Containment determines impact.
When segmentation limits lateral movement, and enforcement operates at infrastructure speed, incidents remain isolated rather than cascading across the environment.
Response becomes faster. Investigations become simpler. Systems remain operational. And security becomes part of operational resilience.
Building resilient AI factories
Data center modernization is often discussed in terms of compute performance and network bandwidth. Those elements are important, but they are only part of the story. Modern AI infrastructure also requires a security architecture that evolves alongside accelerated computing.
The AI factory represents the next evolution of digital infrastructure. Much like power plants or manufacturing lines, these environments must operate continuously to deliver value.
Security must support that mission.
By combining intelligent segmentation with infrastructure-level enforcement, organizations can protect AI workloads without slowing the systems that power them.
In the era of accelerated intelligence, resilience becomes just as important as performance.
A new chapter in the evolution of computing
The rise of the AI factory marks a new chapter in the evolution of computing. These environments are no longer experimental clusters running inside research labs. They are becoming the systems that power real economies, real services, and real decisions.
As organizations invest in accelerated computing, security must evolve alongside it.
The goal is not simply to stop attacks; it is to ensure that intelligence continues to flow, models continue to train, and the systems organizations depend on remain operational.
In the age of accelerated intelligence, the most successful AI factories will not simply be the fastest. They will be the ones designed to stay running.