Running AI inference at scale is no longer just about model size. Today's questions are: how efficiently can you run that model, how fast can you process real-world data, and how much infrastructure do you need to do it?
During the private beta testing of NVIDIA RTX PRO™ 6000 Blackwell GPUs on Akamai Cloud, Harmonic put these questions to the test with a demanding, image-based AI workload built around a 3-billion–parameter model.
The results were clear: Harmonic achieved high performance, efficient resource use, and the ability to push model optimization techniques without sacrificing accuracy.
“During the private beta, the NVIDIA RTX PRO 6000 Blackwell GPUs on Akamai Cloud enabled us to run our AI image workloads with accuracy, speed, and efficiency. We were able to process large volumes of images quickly while optimizing our models for performance and maintaining a very low false detection rate. The results gave us real confidence in scaling these workloads in production.”
— Moore Macauley, CTO, Video Business, Harmonic
The real test: Accuracy, efficiency, and speed
Harmonic’s workload was not a synthetic benchmark. It was a production-grade image processing pipeline where detection quality mattered.
The goals were straightforward:
- Maintain a very low false-detection rate with a 3B-parameter model
- Optimize GPU utilization and memory footprint
- Maximize throughput for large image batches
- Evaluate the impact of model quantization on performance and accuracy
What Harmonic found was a combination of low memory footprint, high Tensor Core utilization, and processing speed at scale that is difficult to achieve on traditional cloud GPU infrastructure.
Low memory footprint and high Tensor Core utilization
Despite the size of the model, Harmonic observed:
- GPU memory utilization below 10%
- Tensor Core utilization consistently in the 70% to 80% range
This is a strong signal that the GPUs were not bottlenecked by memory constraints and that the workload was able to fully leverage the Tensor Cores for high-throughput inference. In practical terms, Harmonic could run sophisticated models without overprovisioning infrastructure just to accommodate memory overhead.
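As an illustration only, the sketch below shows one common way to sample figures like these during inference, using NVIDIA's NVML Python bindings; it is not Harmonic's tooling. NVML reports overall GPU and memory utilization, while Tensor Core activity specifically is typically read through NVIDIA DCGM (for example, its DCGM_FI_PROF_PIPE_TENSOR_ACTIVE field).

```python
# Minimal sketch: sample GPU memory and overall utilization while a
# workload runs. Uses NVIDIA's NVML bindings (pip install nvidia-ml-py).
# Not Harmonic's tooling; Tensor Core activity requires DCGM instead.
import time
import pynvml

pynvml.nvmlInit()
handle = pynvml.nvmlDeviceGetHandleByIndex(0)  # first GPU

for _ in range(10):  # sample once per second for ~10 seconds
    mem = pynvml.nvmlDeviceGetMemoryInfo(handle)
    util = pynvml.nvmlDeviceGetUtilizationRates(handle)
    mem_pct = 100.0 * mem.used / mem.total
    print(f"memory used: {mem_pct:.1f}%  |  GPU busy: {util.gpu}%")
    time.sleep(1)

pynvml.nvmlShutdown()
```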
Processing speed at scale
- Harmonic processed 300 images in under one minute
This level of throughput demonstrates how Blackwell GPUs on Akamai Cloud can support real-time or near-real-time AI processing at scale, making them well suited for AI-driven production systems where both low latency and scalability are critical.
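For context, 300 images in under a minute works out to more than five images per second end to end. The sketch below shows a common pattern for measuring batched inference throughput in PyTorch; the model, batch size, and input shape are placeholders, not Harmonic's pipeline.

```python
# Hedged sketch of measuring batched inference throughput in PyTorch.
# The model is a stand-in (torchvision ResNet-50), not Harmonic's
# 3B-parameter model; the point is the timing pattern, not the network.
import time
import torch
from torchvision.models import resnet50

device = "cuda"
model = resnet50().to(device).eval()

num_images, batch_size = 300, 30
images = torch.randn(num_images, 3, 224, 224)  # dummy inputs

with torch.inference_mode():
    # Warm-up pass so one-time CUDA initialization doesn't skew the timing
    model(images[:batch_size].to(device))
    torch.cuda.synchronize()

    start = time.perf_counter()
    for i in range(0, num_images, batch_size):
        model(images[i : i + batch_size].to(device))
    torch.cuda.synchronize()  # wait for all GPU work before stopping the clock
    elapsed = time.perf_counter() - start

print(f"{num_images} images in {elapsed:.1f}s -> {num_images / elapsed:.1f} images/s")
```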
Quantization without compromise
One of the most telling findings from Harmonic's testing concerned model optimization.
Harmonic evaluated 4-bit integer quantization against traditional float16 precision and observed that, for its test workloads, there was:
- No significant loss in detection performance
- Slight gains in memory efficiency
- Improved processing speed
This is important because quantization is often viewed as a trade-off between efficiency and accuracy. Harmonic's results show that, on Blackwell GPUs, advanced quantization techniques can improve performance characteristics without degrading the quality of the results.
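To make the comparison concrete, the sketch below loads the same checkpoint at float16 and at 4-bit integer precision using Hugging Face Transformers with bitsandbytes. The checkpoint name and model class are placeholders (Harmonic's model is image-based), and this is not Harmonic's evaluation harness; accuracy would still be validated by running the same detection benchmark against both variants.

```python
# Minimal sketch: load one model at float16 vs. 4-bit integer precision
# with Hugging Face Transformers + bitsandbytes (also requires accelerate).
# The checkpoint name and model class are placeholders, not what Harmonic tested.
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig

checkpoint = "some-org/some-3b-model"  # hypothetical 3B-parameter checkpoint

# Baseline: traditional float16 weights
model_fp16 = AutoModelForCausalLM.from_pretrained(
    checkpoint, torch_dtype=torch.float16, device_map="auto"
)

# 4-bit integer quantization (NF4 is a common bitsandbytes choice)
quant_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.float16,
)
model_int4 = AutoModelForCausalLM.from_pretrained(
    checkpoint, quantization_config=quant_config, device_map="auto"
)

# Compare GPU memory footprints; detection quality would be checked by
# running the same benchmark against both models.
print(f"fp16 footprint: {model_fp16.get_memory_footprint() / 1e9:.2f} GB")
print(f"int4 footprint: {model_int4.get_memory_footprint() / 1e9:.2f} GB")
```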
That opens the door for teams to:
- Run larger models in smaller footprints
- Reduce infrastructure costs
- Increase throughput without sacrificing results
Why this matters for AI workloads at the edge and in the cloud
Harmonic’s testing highlights a broader pattern: Modern AI workloads need infrastructure that is built for inference efficiency, not just raw compute. This is achieved by combining:
- High Tensor Core performance
- Efficient memory use
- Support for advanced model optimization
- Fast processing at scale
Blackwell GPUs on Akamai Cloud provide a foundation for AI systems that must operate continuously, process large volumes of data, and maintain high accuracy. This is particularly relevant for AI workloads that run closer to users, devices, or data sources, where performance, efficiency, and cost all matter.
From private beta to production confidence
For Harmonic, the private beta testing was an opportunity to validate that their AI image processing workloads could run efficiently, accurately, and at high speed on Akamai’s GPU infrastructure.
The results gave them confidence that they could:
- Scale inference without scaling infrastructure linearly
- Optimize models aggressively by using quantization
- Maintain high detection quality while increasing throughput
These are the exact characteristics that teams look for when moving AI from experimentation to production.
Find out more
If running AI workloads efficiently at scale is core to your business, you can learn more about how NVIDIA Blackwell GPUs on Akamai Cloud can support your next generation of inference.