How Harmonic Proved High-Performance AI Inference on Akamai GPUs

Mar 05, 2026

Written by Danielle Cook

Danielle Cook has been a driving force in the cloud native industry since 2016, helping organizations adopt enterprise-ready technologies while communicating their business value. She co-authored and maintains the CNCF Cloud Native Maturity Model, co-chairs the CNCF Cartografos Working Group, and co-authored Admiral Bash’s Island Adventure. A CNCF Ambassador and founder of KubeCrash, a virtual bi-annual meeting, she champions open source and community-driven activism.

Running AI inference at scale is no longer just about model size. The questions that matter today are: How efficiently can you run the model? How fast can you process real-world data? And how much infrastructure do you need to do it?

During the private beta testing of NVIDIA RTX PRO™ 6000 Blackwell GPUs on Akamai Cloud, Harmonic put these questions to the test with a demanding, image-based AI workload built around a 3-billion-parameter model.

The results were clear: Harmonic achieved high performance, efficient resource use, and the ability to push model optimization techniques without sacrificing accuracy.

“During the private beta, the NVIDIA RTX PRO 6000 Blackwell GPUs on Akamai Cloud enabled us to run our AI image workloads with accuracy, speed, and efficiency. We were able to process large volumes of images quickly while optimizing our models for performance and maintaining a very low false detection rate. The results gave us real confidence in scaling these workloads in production.”

— Moore Macauley, CTO, Video Business, Harmonic

The real test: Accuracy, efficiency, and speed

Harmonic’s workload was not a synthetic benchmark. It was a production-grade image processing pipeline where detection quality mattered.

The goals were straightforward:

  • Maintain a very low false-detection rate with a 3B-parameter model
  • Optimize GPU use and memory footprint
  • Maximize throughput for large image batches
  • Evaluate the impact of model quantization on performance and accuracy

What Harmonic found was a combination of low memory footprint, high Tensor Core utilization, and processing speed at scale that is difficult to achieve on traditional cloud GPU infrastructure.

 

Low memory footprint and high Tensor Core utilization

Despite the size of the model, Harmonic observed:

  • GPU memory use below 10%
  • Tensor Core utilization consistently in the 70% to 80% range

This is a strong signal that the GPUs were not bottlenecked by memory constraints, and that the workload was able to fully leverage the Tensor cores for high-throughput inference. In practical terms, this means Harmonic could run sophisticated models without needing to overprovision infrastructure just to accommodate memory overhead.
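
Figures like these are typically read off `nvidia-smi`. As an illustrative sketch (not Harmonic's actual tooling), the snippet below parses one line of `nvidia-smi --query-gpu=memory.used,memory.total,utilization.gpu --format=csv,noheader` output; the sample values are hypothetical, chosen to land below 10% memory use on a 96 GB card.

```python
# Hypothetical sample line in the CSV format nvidia-smi emits for
# memory.used, memory.total, utilization.gpu (not Harmonic's data).
CSV_SAMPLE = "9216 MiB, 98304 MiB, 75 %"

def parse_gpu_stats(csv_line: str) -> dict:
    """Parse one CSV line of `nvidia-smi --query-gpu=...` output."""
    used, total, util = [field.strip() for field in csv_line.split(",")]
    used_mib = int(used.split()[0])    # e.g. "9216 MiB" -> 9216
    total_mib = int(total.split()[0])  # e.g. "98304 MiB" -> 98304
    return {
        "memory_pct": 100.0 * used_mib / total_mib,
        "gpu_util_pct": int(util.split()[0]),  # e.g. "75 %" -> 75
    }

stats = parse_gpu_stats(CSV_SAMPLE)
print(f"memory use: {stats['memory_pct']:.1f}%")   # → memory use: 9.4%
print(f"utilization: {stats['gpu_util_pct']}%")    # → utilization: 75%
```

In practice a monitoring loop would call `nvidia-smi` (or the NVML API) on an interval and feed lines like this into the parser.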

 

Processing speed at scale

  • Harmonic processed 300 images in under one minute
This level of throughput demonstrates how Blackwell GPUs on Akamai Cloud can support real-time or near-real-time AI processing at scale, making them well suited for AI-driven production systems where low latency and scalability are both critical to success.
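
The arithmetic behind that figure is simple, treating the full minute as a conservative bound:

```python
# 300 images in under one minute implies at least 5 images/second,
# or at most 200 ms per image on average.
images = 300
seconds = 60  # "under one minute" — use the full minute as an upper bound

throughput = images / seconds          # images/second (lower bound)
latency_ms = 1000 * seconds / images   # average ms per image (upper bound)

print(f"≥ {throughput:.0f} images/s, ≤ {latency_ms:.0f} ms/image")
# → ≥ 5 images/s, ≤ 200 ms/image
```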

Quantization without compromise

One of the most telling findings from Harmonic’s testing concerned model optimization.

Harmonic evaluated 4-bit integer quantization against traditional float16 precision and observed that for test workloads there was:

  • No significant loss in detection performance
  • Slight gains in memory efficiency
  • Improved processing speed

This is important because quantization is often viewed as a trade-off between efficiency and accuracy. Harmonic’s results show that on Blackwell GPUs, advanced quantization techniques can improve performance characteristics without degrading the quality of the results.
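
To make the idea concrete, here is a deliberately simplified sketch of symmetric 4-bit integer quantization in plain Python. It is not Harmonic's pipeline: production INT4 inference keeps weights quantized and uses specialized Tensor Core kernels rather than round-tripping to float, but the round-trip error below illustrates why accuracy can survive the precision cut.

```python
# Illustrative symmetric per-tensor quantization to signed 4-bit
# integers in [-8, 7]; the weight values are made up for the example.
def quantize_int4(weights: list[float]) -> tuple[list[int], float]:
    """Map floats to 16 integer levels with a shared scale factor."""
    scale = max(abs(w) for w in weights) / 7  # largest weight maps to ±7
    q = [max(-8, min(7, round(w / scale))) for w in weights]
    return q, scale

def dequantize(q: list[int], scale: float) -> list[float]:
    """Recover approximate float weights from the integer codes."""
    return [v * scale for v in q]

weights = [0.12, -0.45, 0.33, 0.71, -0.09]
q, scale = quantize_int4(weights)
restored = dequantize(q, scale)
max_err = max(abs(a - b) for a, b in zip(weights, restored))
print(f"codes: {q}, max round-trip error: {max_err:.3f}")
```

Each float becomes a 4-bit code plus one shared scale, a 4x reduction versus float16 storage, while the reconstruction stays within half a quantization step of the original value.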

That opens the door for teams to:

  • Run larger models in smaller footprints
  • Reduce infrastructure costs
  • Increase throughput without sacrificing results

Why this matters for AI workloads at the edge and in the cloud

Harmonic’s testing highlights a broader pattern: Modern AI workloads need infrastructure that is built for inference efficiency, not just raw compute. This is achieved by combining:

  • High Tensor Core performance
  • Efficient memory use
  • Support for advanced model optimization
  • Fast processing at scale

Blackwell GPUs on Akamai Cloud provide a foundation for AI systems that must operate continuously, process large volumes of data, and maintain high accuracy. This is particularly relevant for AI workloads that run closer to users, devices, or data sources, where performance, efficiency, and cost all matter.

From private beta to production confidence

For Harmonic, the private beta testing was an opportunity to validate that their AI image processing workloads could run efficiently, accurately, and at high speed on Akamai’s GPU infrastructure.

The results gave them confidence that they could:

  • Scale inference without scaling infrastructure linearly
  • Optimize models aggressively through quantization
  • Maintain high detection quality while increasing throughput

These are the exact characteristics that teams look for when moving AI from experimentation to production.

Find out more

If running AI workloads efficiently at scale is core to your business, you can learn more about how NVIDIA Blackwell GPUs on Akamai Cloud can support your next generation of inference.
