- Two papers on MoE-specific quantization algorithms were accepted at the workshop held in conjunction with ICML 2026
- Recognition following Nota AI’s overall win at NVIDIA Nemotron Hackathon
- Strengthens core optimization technologies that make large AI models smaller and more efficient to run
Seoul, South Korea, June 11, 2026 /PRNewswire/ — Nota AI, a company specializing in AI model compression and optimization, announced that two papers on MoE-specific quantization algorithms have been accepted to the Resource-Adaptive Foundation Model Inference (AdaptFM) workshop at ICML 2026, one of the world’s leading machine learning conferences.
Nota AI demonstrates global competitiveness in large-scale AI optimization with two MoE quantization papers accepted at ICML 2026 workshop
ICML is widely recognized as one of the leading global conferences on machine learning and artificial intelligence, bringing together the latest research from technology companies, universities, and research institutions worldwide. The AdaptFM workshop focuses on technologies that enable large-scale AI models to run efficiently under limited computing resources. Researchers from companies and research institutions such as Amazon and Meta serve on the organizing committee, and researchers from major AI companies such as NVIDIA, Qualcomm AI Research, OpenAI, Apple, and Microsoft serve on the program committee.
This achievement is significant because it recognizes Nota AI’s accumulated technical expertise in optimizing mixture-of-experts (MoE) models, an architecture that is gaining traction as a core structure for large language models (LLMs). An MoE model improves both performance and efficiency by activating only a subset of its expert sub-networks for each input. However, this more complex structure requires a different approach to quantization, the process of making models smaller and more efficient by storing their values at lower numerical precision, than traditional model architectures do.
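For readers unfamiliar with the two ideas in the paragraph above, the following is a minimal illustrative sketch in Python. It is not taken from Nota AI's papers: the function names (`route`, `quantize_symmetric`), the top-k routing rule, and the simple symmetric rounding scheme are assumptions chosen only to show what "activating a subset of experts" and "quantization" mean in general.

```python
import math

def softmax(xs):
    """Convert raw router scores into probabilities that sum to 1."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def route(router_logits, k):
    """Hypothetical top-k router: keep the k highest-scoring experts.

    Returns (expert_index, weight) pairs, with weights renormalized
    over the chosen experts. All other experts stay inactive.
    """
    probs = softmax(router_logits)
    chosen = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:k]
    total = sum(probs[i] for i in chosen)
    return [(i, probs[i] / total) for i in chosen]

def quantize_symmetric(weights, bits):
    """Illustrative symmetric quantization: snap each weight to a signed
    integer grid with 2**(bits-1)-1 positive levels, then map back to floats."""
    qmax = 2 ** (bits - 1) - 1
    max_abs = max(abs(w) for w in weights)
    scale = max_abs / qmax if max_abs > 0 else 1.0
    return [round(w / scale) * scale for w in weights]

# Four experts, only two are activated for this input.
active = route([0.1, 2.0, -1.0, 1.5], k=2)

# The same weights stored on a 4-bit grid; each value moves by at most
# half a grid step.
approx = quantize_symmetric([1.0, -0.5, 0.26, 0.0], bits=4)
```

In this sketch the rounding error introduced by `quantize_symmetric` is small per weight, but in an MoE model the same kind of error can also shift the router scores that decide which experts run, which is why MoE models call for quantization methods of their own.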
Nota AI previously won both the track and overall competitions at the NVIDIA Nemotron Hackathon using data-driven MoE quantization techniques. With the acceptance of these two papers, Nota AI will once again present research designed specifically for MoE architectures on the global research stage.
The first accepted paper, “DREAM-MoE,” proposes a method to mitigate changes in the model’s decision flow that can occur when a large AI model is quantized in multiple segments. The method builds on the observation that even a small error in an earlier segment can affect expert selection in a later segment. DREAM-MoE helps the quantized model select experts that closely match those chosen by the original model.
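The effect described above, where a small numerical error changes which expert is chosen, can be illustrated with a short Python sketch. This is not the DREAM-MoE algorithm, whose details are not given here. The helper `expert_agreement` and the example router scores are hypothetical and simply measure how often a quantized model picks the same experts as the original.

```python
def top_k_set(logits, k):
    """Indices of the k highest router scores, as an order-free set."""
    ranked = sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)
    return frozenset(ranked[:k])

def expert_agreement(original_logits, quantized_logits, k):
    """Fraction of tokens whose top-k expert set is unchanged after quantization."""
    matches = sum(
        top_k_set(o, k) == top_k_set(q, k)
        for o, q in zip(original_logits, quantized_logits)
    )
    return matches / len(original_logits)

# Made-up router scores for two tokens and three experts.
original = [[1.00, 0.98, 0.10], [2.00, 0.50, 0.10]]
# The same scores after a small, assumed quantization error.
quantized = [[0.97, 0.99, 0.10], [1.95, 0.55, 0.10]]

# Token 1 flips from expert 0 to expert 1 although no score moved by more
# than 0.03; token 2 keeps expert 0. The agreement rate is therefore 0.5.
rate = expert_agreement(original, quantized, k=1)
```

A method that targets routing fidelity would aim to push a rate like this toward 1.0, since tokens with closely spaced router scores are the ones most easily redirected by quantization error.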
The second paper, “SRA-MoE,” proposes a method to identify and prioritize critical inputs that have a significant impact on the final output of the model. Rather than treating all inputs equally, SRA-MoE is designed to prevent expert choices on these key inputs from being severely disrupted, allowing model quality to be more effectively maintained under limited resources.
Both studies outperformed state-of-the-art MoE-specific quantization methods. The results show that large-scale AI models can be run with less memory and fewer computing resources while losing less quality. As the cost, power consumption, and hardware burden of running large-scale AI models continue to increase, MoE-specific quantization technology is becoming increasingly important.
Nota AI has focused its research and development efforts on optimizing large-scale AI models that require large amounts of memory and computing resources. The company is optimizing large-scale models, including Solar MoE, as part of the Sovereign Infrastructure Model Project led by the Upstage consortium. It is also extending its quantization experience with NVIDIA Nemotron 3 Nano to newer, larger models such as Nemotron Ultra, further expanding the scope of its optimization technology.
“The acceptance of these papers reflects the continued advancement of Nota AI’s MoE-specific quantization technology,” said Myungsu Chae, CEO of Nota AI. “Following our overall win at the NVIDIA Nemotron Hackathon, we are excited to present our research at the ICML 2026 AdaptFM workshop. We will continue to develop optimization technologies that make large-scale AI models more efficient and practical to use.”
In addition, Nota AI will host “Nota AI – Korea Efficient Days” during ICML 2026 at COEX in Seoul. The event will bring together researchers, engineers, and business leaders from around the world who are visiting Korea to share research trends and industrial applications of efficient AI. Through the event, Nota AI plans to introduce its research results in large-scale AI model optimization and expand opportunities for technical and business collaboration.

