SAN FRANCISCO, April 1, 2026 (Globe Newswire) -- Today, MLCommons® announced new results for its industry-standard MLPerf® Inference v6.0 benchmark suite. This release includes several important advances that ensure the benchmark suite tests current real-world scenarios for AI deployments and provides a comprehensive picture of AI system performance.
Five of the 11 data center tests in MLPerf Inference v6.0 are new or updated, and this release also includes new object detection tests for edge systems. The main changes are:
● A new open-weight large language model benchmark based on GPT-OSS 120B, which can be used for mathematics, scientific reasoning, and coding.
● An enhanced DeepSeek-R1 advanced reasoning benchmark that includes an interactive scenario and allows speculative decoding.
● DLRMv3, the third generation of the recommender benchmark and now the first continuous recommendation benchmark test in the suite. This test has been thoroughly modernized with significant engineering contributions from Meta, a world leader in recommender systems.
● The first text-to-video generation benchmark in the suite.
● A new vision language model (VLM) benchmark that converts unstructured multimodal data from Shopify’s extensive product catalog into structured metadata.
● An upgraded single-shot object detection benchmark for edge scenarios, based on Ultralytics’ YOLOv11 Large model.
“This is the most significant revision of our inference benchmark suite to date,” said Frank Han, Systems Development Engineering technical staff member at Dell Technologies and co-chair of the MLPerf Inference Working Group. “The decision to update so many benchmarks in this round was driven by the extraordinary enthusiasm and collaboration of our members, who have contributed an unprecedented amount of engineering effort and intellectual property towards building new inference benchmarks. Adding these new tests will better position MLPerf Inference to keep up with the breakneck pace of evolution of AI models and technology, ensuring that our benchmarks are relevant and representative of real-world deployments.”
The open source MLPerf Inference benchmark suite measures system performance in an architecture-independent, representative, and reproducible manner. The goal is to create a level playing field for competition that fosters innovation, performance, and energy efficiency across the industry. The published results provide important technical information for customers procuring and calibrating AI systems.
“We would like to thank Meta, Shopify, and Ultralytics for their tremendous help in making these changes to the MLPerf Inference benchmark suite, and for providing datasets, task definitions, and expertise,” said Miro Hodak, senior member of AMD’s technical staff and co-chair of the MLPerf Inference working group. “These partnerships were essential to ensure that our tests included scenarios and workloads that represented the current state of the industry.”
“The MLPerf Inference benchmark plays an important role in driving transparency and accountability across the AI industry,” said Glenn Jocher, CEO and Founder of Ultralytics. “At Ultralytics, rigorous, reproducible benchmarks are at the heart of the development and validation of Ultralytics YOLO models, enabling developers and organizations to make informed decisions about real-world performance. We’re proud to be part of an ecosystem that holds the entire field to higher standards.”
“Commerce is one of the most complex areas of AI, but researchers rarely have the data to reflect that complexity,” said Kshetrajna Raghavan, Principal Engineer, Applied ML at Shopify. “Shopify, at the intersection of millions of sellers and billions of products, is uniquely positioned to address this problem. By sharing this taxonomy, we can advance the entire field.”
New tools for submitters and consumers
Inference 6.0 gives submitters the option to complete benchmark tests using a newly available harness. The new system, LoadGen++, allows LLMs to run in a serving-style software stack, which is a common deployment approach today. “LoadGen++ is a significant upgrade from the previous generation and represents a significant investment by MLCommons that will allow us to remain agile as we continue to create benchmark tests that track cutting-edge technology,” said Han.
In addition, Inference 6.0 results can be viewed on the new online dashboard on the MLCommons site at https://mlcommons.org/visualizer. The dashboard brings a new level of interactivity to viewing results, including advanced filtering and customized performance graphs.
Large-scale multi-node systems attract attention
Submissions to Inference 6.0 indicate that technology providers want to showcase the performance of scaled-up multi-node systems running real-world inference workloads. This round set a new all-time high for multi-node system submissions, a 30% increase over the Inference 5.1 benchmark round six months ago. Furthermore, 10% of all systems submitted to Inference 6.0 contained 10 or more nodes, compared to just 2% in the previous round. The largest system submitted for Inference 6.0 had 72 nodes and 288 accelerators, four times the number of nodes in the previous round’s largest system.
“As more AI applications move into production and become widely available, there is a growing demand for large-scale, high-performance systems to run them,” Hodak said. “At the same time, multi-node systems pose a unique set of technical challenges over single-node systems, requiring configuration and optimization of system architecture, network interconnects, data storage, and software layers. Stakeholders are working diligently to address these challenges and run inference workloads at scale.”
The AI community continues to embrace and invest in MLPerf Inference
The MLPerf Inference 6.0 benchmark received submissions from a total of 24 participating organizations: AMD, ASUSTeK, Cisco, CoreWeave, Dell, GATEOverflow, GigaComputing, Google, Hewlett Packard Enterprise, Intel, Inventec Corporation, KRAI, Lambda, Lenovo, MangoBoost, MiTAC, Nebius, Netweb Technologies India Limited, NVIDIA, Oracle, Quanta Cloud Technology, Red Hat, Stevens Institute of Technology, and Supermicro.
“We would like to welcome our first-time submitters: Inventec Corporation, Netweb Technologies India Limited, and Stevens Institute of Technology,” said Han. “The AI ecosystem is large and diverse, and continues to grow and evolve rapidly. On behalf of MLCommons, we would like to thank our members, contributors, and Meta, Shopify, and Ultralytics for working with us to build and lead the most comprehensive and relevant performance benchmark suite for AI inference. We would like to thank all of our partners as we work together to ensure that our community stakeholders have access to valuable, real-world information that helps them make better decisions.”
See results
To view MLPerf Inference v6.0 results, visit the benchmark results dashboard at https://mlcommons.org/visualizer.
About MLCommons
MLCommons is the world leader in AI benchmarking. MLCommons is an open engineering consortium supported by over 130 members and affiliates with a proven track record of bringing together academia, industry, and civil society to measure and improve AI. The foundation of MLCommons began with the MLPerf benchmark in 2018 and has quickly grown into a set of industry metrics to measure machine learning performance and promote transparency in machine learning technology. Since then, MLCommons has used collective engineering to build the benchmarks and metrics needed for better AI, ultimately helping to evaluate and improve the accuracy, safety, speed, and efficiency of AI technology.
For more information about MLCommons and to become a member, please visit MLCommons.org or email join@mlcommons.org.
For press inquiries, please contact press@mlcommons.org.
