Running AI inference at scale is no longer just about model size. Today's questions are: how efficiently can you run that model, how fast can you process real-world data, and how much infrastructure do you need to do it?
During the private beta testing of NVIDIA RTX PRO™ 6000 Blackwell GPUs on Akamai Cloud, Harmonic put these questions to the test with a demanding, image-based AI workload built around a 3-billion–parameter model.
The results were clear: Harmonic achieved high performance, efficient resource use, and the ability to push model optimization techniques without sacrificing accuracy.
“During the private beta, the NVIDIA RTX PRO 6000 Blackwell GPUs on Akamai Cloud enabled us to run our AI image workloads with accuracy, speed, and efficiency. We were able to process large volumes of images quickly while optimizing our models for performance and maintaining a very low false detection rate. The results gave us real confidence in scaling these workloads in production.”
— Moore Macauley, CTO, Video Business, Harmonic
The real test: Accuracy, efficiency, and speed
Harmonic’s workload was not a synthetic benchmark. It was a production-grade image processing pipeline where detection quality mattered.
The goals were straightforward:
- Maintain a very low false-detection rate with a 3B-parameter model
- Optimize GPU utilization and memory footprint
- Maximize throughput for large image batches
- Evaluate the impact of model quantization on performance and accuracy
What Harmonic found was a combination of low memory footprint, high Tensor Core utilization, and processing speed at scale that is difficult to achieve on traditional cloud GPU infrastructure.
Low memory footprint and high Tensor Core utilization
Despite the size of the model, Harmonic observed:
- GPU memory utilization below 10%
- Tensor Core utilization consistently in the 70% to 80% range
This is a strong signal that the GPUs were not bottlenecked by memory constraints and that the workload was able to fully leverage the Tensor Cores for high-throughput inference. In practical terms, Harmonic could run sophisticated models without overprovisioning infrastructure just to accommodate memory overhead.
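As an illustration only, the sketch below shows one common way to sample figures like these during inference, using NVIDIA's NVML Python bindings; it is not Harmonic's tooling. NVML reports overall GPU and memory utilization, while Tensor Core activity specifically is typically read through NVIDIA DCGM (for example, its DCGM_FI_PROF_PIPE_TENSOR_ACTIVE field).

```python
# Minimal sketch: sample GPU memory and overall utilization while a
# workload runs. Uses NVIDIA's NVML bindings (pip install nvidia-ml-py).
# Not Harmonic's tooling; Tensor Core activity requires DCGM instead.
import time
import pynvml

pynvml.nvmlInit()
handle = pynvml.nvmlDeviceGetHandleByIndex(0)  # first GPU

for _ in range(10):  # sample once per second for ~10 seconds
    mem = pynvml.nvmlDeviceGetMemoryInfo(handle)
    util = pynvml.nvmlDeviceGetUtilizationRates(handle)
    mem_pct = 100.0 * mem.used / mem.total
    print(f"memory used: {mem_pct:.1f}%  |  GPU busy: {util.gpu}%")
    time.sleep(1)

pynvml.nvmlShutdown()
```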
Processing speed at scale
- Harmonic processed 300 images in under one minute
This level of throughput demonstrates how Blackwell GPUs on Akamai Cloud can support real-time or near-real-time AI processing at scale, making them well suited for AI-driven production systems where both low latency and scalability are critical.
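For context, 300 images in under a minute works out to more than five images per second end to end. The sketch below shows a common pattern for measuring batched inference throughput in PyTorch; the model, batch size, and input shape are placeholders, not Harmonic's pipeline.

```python
# Hedged sketch of measuring batched inference throughput in PyTorch.
# The model is a stand-in (torchvision ResNet-50), not Harmonic's
# 3B-parameter model; the point is the timing pattern, not the network.
import time
import torch
from torchvision.models import resnet50

device = "cuda"
model = resnet50().to(device).eval()

num_images, batch_size = 300, 30
images = torch.randn(num_images, 3, 224, 224)  # dummy inputs

with torch.inference_mode():
    # Warm-up pass so one-time CUDA initialization doesn't skew the timing
    model(images[:batch_size].to(device))
    torch.cuda.synchronize()

    start = time.perf_counter()
    for i in range(0, num_images, batch_size):
        model(images[i : i + batch_size].to(device))
    torch.cuda.synchronize()  # wait for all GPU work before stopping the clock
    elapsed = time.perf_counter() - start

print(f"{num_images} images in {elapsed:.1f}s -> {num_images / elapsed:.1f} images/s")
```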
Quantization without compromise
One of the most telling findings from Harmonic's testing concerned model optimization.
Harmonic evaluated 4-bit integer quantization against traditional float16 precision and observed that, for its test workloads, there was:
- No significant loss in detection performance
- Slight gains in memory efficiency
- Improved processing speed
This is important because quantization is often viewed as a trade-off between efficiency and accuracy. Harmonic's results show that, on Blackwell GPUs, advanced quantization techniques can improve performance characteristics without degrading the quality of the results.
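To make the comparison concrete, the sketch below loads the same checkpoint at float16 and at 4-bit integer precision using Hugging Face Transformers with bitsandbytes. The checkpoint name and model class are placeholders (Harmonic's model is image-based), and this is not Harmonic's evaluation harness; accuracy would still be validated by running the same detection benchmark against both variants.

```python
# Minimal sketch: load one model at float16 vs. 4-bit integer precision
# with Hugging Face Transformers + bitsandbytes (also requires accelerate).
# The checkpoint name and model class are placeholders, not what Harmonic tested.
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig

checkpoint = "some-org/some-3b-model"  # hypothetical 3B-parameter checkpoint

# Baseline: traditional float16 weights
model_fp16 = AutoModelForCausalLM.from_pretrained(
    checkpoint, torch_dtype=torch.float16, device_map="auto"
)

# 4-bit integer quantization (NF4 is a common bitsandbytes choice)
quant_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.float16,
)
model_int4 = AutoModelForCausalLM.from_pretrained(
    checkpoint, quantization_config=quant_config, device_map="auto"
)

# Compare GPU memory footprints; detection quality would be checked by
# running the same benchmark against both models.
print(f"fp16 footprint: {model_fp16.get_memory_footprint() / 1e9:.2f} GB")
print(f"int4 footprint: {model_int4.get_memory_footprint() / 1e9:.2f} GB")
```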
That opens the door for teams to:
- Run larger models in smaller footprints
- Reduce infrastructure costs
- Increase throughput without sacrificing results
Why this matters for AI workloads at the edge and in the cloud
Harmonic’s testing highlights a broader pattern: Modern AI workloads need infrastructure that is built for inference efficiency, not just raw compute. This is achieved by combining:
- High Tensor Core performance
- Efficient memory use
- Support for advanced model optimization
- Fast processing at scale
Blackwell GPUs on Akamai Cloud provide a foundation for AI systems that must operate continuously, process large volumes of data, and maintain high accuracy. This is particularly relevant for AI workloads that run closer to users, devices, or data sources, where performance, efficiency, and cost all matter.
From private beta to production confidence
For Harmonic, the private beta testing was an opportunity to validate that their AI image processing workloads could run efficiently, accurately, and at high speed on Akamai’s GPU infrastructure.
The results gave them confidence that they could:
- Scale inference without scaling infrastructure linearly
- Optimize models aggressively by using quantization
- Maintain high detection quality while increasing throughput
These are the exact characteristics that teams look for when moving AI from experimentation to production.
Find out more
If running AI workloads efficiently at scale is core to your business, you can learn more about how NVIDIA Blackwell GPUs on Akamai Cloud can support your next generation of inference.