MiniMax M1 model claims Chinese LLM crown from DeepSeek • The Register

MiniMax, a Shanghai-based AI company, has released an open source reasoning model that it says challenges Chinese rival DeepSeek and US-based Anthropic, OpenAI, and Google on both performance and cost.

MiniMax-M1 was released on Monday under the Apache software license, which makes it genuinely open source. That sets it apart from Meta's Llama family, which is offered under a community license that is not open source, and from DeepSeek, whose models are only partially released under an open source license.

"In complex, productivity-oriented scenarios, M1's capabilities are top-tier among open source models, surpassing domestic closed-source models and approaching leading overseas models, while offering the industry's best cost-effectiveness," MiniMax says.

According to a blog post, M1 is competitive with OpenAI o3, Gemini 2.5 Pro, Claude 4 Opus, DeepSeek R1, DeepSeek R1-0528, and Qwen3-235B on benchmarks including AIME 2024, LiveCodeBench, SWE-bench Verified, Tau-bench, and MRCR, though it trails some of those models on some tests. As usual, take vendor-supplied benchmark results with a grain of salt. That said, the source code is available on GitHub for anyone who wants to check.

But MiniMax's real pitch for shaking up the industry is the model's context window (the amount of input it can process): 1 million tokens, comparable to Google Gemini 2.5 Pro and eight times the capacity of DeepSeek R1.
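As a quick sanity check on the "eight times" figure, the snippet below divides M1's advertised window by a DeepSeek R1 window of 128,000 tokens. That R1 value is an assumption for illustration; the article gives only the ratio.

```python
# Context-window comparison using the article's 1M-token figure for M1.
# Assumption: DeepSeek R1's context window is taken as 128,000 tokens;
# the article itself only states that M1's is "eight times" larger.
MINIMAX_M1_CONTEXT = 1_000_000
DEEPSEEK_R1_CONTEXT = 128_000  # assumed value, not from the article

ratio = MINIMAX_M1_CONTEXT / DEEPSEEK_R1_CONTEXT
print(f"M1 context window is {ratio:.2f}x R1's")  # 7.81x, roughly eight times
```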

Day 1/5 of #MiniMaxWeek: We're open-sourcing MiniMax-M1, our latest LLM, setting a new standard for long-context reasoning. - The world's longest context window: 1M-token input, 80K-token output - State-of-the-art agentic use among open source models - Unmatched efficiency: …… pic.twitter.com/bgfdlza54n

– MiniMax (official) (@minimax__ai) June 16, 2025

On the output side, the model can produce up to 80,000 tokens. That beats DeepSeek's 64,000-token capacity but falls short of OpenAI's o3, which can spit out 100,000 tokens in response to a prompt.

Backed by Alibaba Group, Tencent, and IDG Capital, MiniMax credits its Lightning Attention mechanism, a way of computing attention matrices that improves both training and inference efficiency, with giving the M1 model an advantage when processing long-context inputs.
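MiniMax's earlier technical work describes lightning attention as a tiled implementation of linear attention. The NumPy sketch below illustrates that general idea only and is not MiniMax's code: it drops softmax, normalization, decay terms, and GPU kernel details. The function names are hypothetical. Within each block, the causal product is computed directly. Every earlier block is folded into a running d-by-d state, so cost grows linearly with sequence length instead of quadratically.

```python
import numpy as np

def naive_causal_linear_attention(q, k, v):
    """O(n^2) reference: out = (causal_mask * (Q K^T)) V, with no softmax."""
    n = q.shape[0]
    scores = q @ k.T                      # (n, n) pairwise scores
    mask = np.tril(np.ones((n, n)))       # keep only j <= i
    return (scores * mask) @ v

def blockwise_linear_attention(q, k, v, block=64):
    """Tiled causal linear attention (illustrative, lightning-attention style).

    Intra-block: masked Q K^T V inside the current block.
    Inter-block: Q times a running state kv = sum over past blocks of K^T V.
    """
    n, d = q.shape
    out = np.zeros((n, v.shape[1]))
    kv = np.zeros((d, v.shape[1]))        # accumulated K^T V from earlier blocks
    for start in range(0, n, block):
        end = min(start + block, n)
        qb, kb, vb = q[start:end], k[start:end], v[start:end]
        m = end - start
        intra = ((qb @ kb.T) * np.tril(np.ones((m, m)))) @ vb  # within block, causal
        inter = qb @ kv                                          # all previous blocks
        out[start:end] = intra + inter
        kv += kb.T @ vb                                          # fold block into state
    return out

# Usage: both versions agree on random inputs.
rng = np.random.default_rng(0)
q, k, v = (rng.standard_normal((300, 16)) for _ in range(3))
max_diff = np.abs(naive_causal_linear_attention(q, k, v)
                  - blockwise_linear_attention(q, k, v)).max()
print(f"max difference: {max_diff:.2e}")
```

The blockwise version never materializes the full n-by-n score matrix. That is the property that makes long inputs cheaper.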

"For example, when performing deep reasoning over 80,000 tokens, it requires only about 30 percent of the computing power of DeepSeek R1," the company claims. "This feature gives us a substantial computational efficiency advantage in both training and inference."

This more efficient computation method, combined with an improved reinforcement learning algorithm called CISPO (detailed in M1's technical report [PDF]), translates into lower computing costs.
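As we understand the technical report, CISPO clips the importance-sampling weight itself and treats it as a constant, rather than clipping each token's update as PPO does. That means no token's gradient is zeroed out. The sketch below contrasts the per-token gradient coefficients of the two approaches. It is an illustration under stated assumptions, not MiniMax's implementation. The function names and the eps values are hypothetical, and sequence-level normalization is omitted.

```python
import numpy as np

def cispo_token_grads(logp_new, logp_old, advantages, eps_low=1.0, eps_high=0.2):
    """Gradient of a CISPO-style surrogate w.r.t. log pi, per token (sketch).

    Surrogate per token: sg(clip(r, 1 - eps_low, 1 + eps_high)) * A * log pi,
    where r = pi_new / pi_old and sg() treats the weight as a constant.
    Its derivative w.r.t. log pi is just clipped_weight * A.
    eps_low=1.0 (assumed) effectively removes the lower bound, since r >= 0.
    """
    ratio = np.exp(logp_new - logp_old)
    weight = np.clip(ratio, 1.0 - eps_low, 1.0 + eps_high)
    return weight * advantages

def ppo_token_grads(logp_new, logp_old, advantages, eps=0.2):
    """Gradient of PPO's min(r*A, clip(r)*A) w.r.t. log pi, per token.

    The unclipped branch has gradient r*A; when the clipped branch is
    selected by min(), the gradient for that token is 0.
    """
    ratio = np.exp(logp_new - logp_old)
    unclipped = ratio * advantages
    clipped = np.clip(ratio, 1.0 - eps, 1.0 + eps) * advantages
    return np.where(unclipped <= clipped, ratio * advantages, 0.0)

# Usage: token 0 has ratio 1.5 (outside PPO's trust region), token 1 has ratio 1.0.
logp_old = np.log([0.2, 0.4])
logp_new = np.log([0.3, 0.4])
adv = np.array([1.0, 1.0])
print("CISPO:", cispo_token_grads(logp_new, logp_old, adv))  # token 0 still contributes
print("PPO:  ", ppo_token_grads(logp_new, logp_old, adv))    # token 0 contributes nothing
```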

"The entire reinforcement learning phase used only 512 [Nvidia] H800s for three weeks, at a rental cost of just $537,400," MiniMax claims.
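To put MiniMax's figure in perspective, the back-of-the-envelope calculation below converts it into an implied price per GPU-hour. It assumes the 512 GPUs ran around the clock for exactly 21 days, which the company does not state.

```python
# Implied H800 rental rate from MiniMax's stated RL training cost.
# Assumption: 512 GPUs ran continuously for exactly three weeks (21 days).
NUM_GPUS = 512
DAYS = 21
TOTAL_COST_USD = 537_400

gpu_hours = NUM_GPUS * DAYS * 24          # 258,048 GPU-hours
rate = TOTAL_COST_USD / gpu_hours         # dollars per GPU-hour
print(f"{gpu_hours:,} GPU-hours at about ${rate:.2f} per GPU-hour")
# 258,048 GPU-hours at about $2.08 per GPU-hour
```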
