Minimax's Hailuo 02 outperforms Google Veo 3 in user benchmarks with much lower video costs

Minimax has introduced Hailuo 02, the second generation of its video AI model.

The new model uses an architecture called Noise-Aware Compute Redistribution (NCR). Minimax says this improves training and inference efficiency by a factor of 2.5. The NCR architecture handles long video sequences differently depending on the training stage. Early in training, when a large amount of artificial noise is added to the data, the video is compressed as much as possible. Later, as the training videos become clearer, the model processes them at full resolution.

Minimax diagram: joint training of downsampling and noise for early compression of ultra-long video tokens. Minimax spotlights the new NCR architecture as key to Hailuo 02, but hasn't shared any technical details yet. | Image: Minimax

Compared to previous versions, Hailuo 02 has three times as many parameters and was trained on four times as much data. Minimax says it also focused on improving data quality and diversity. The company does not disclose exact parameter counts or dataset sizes.

According to Minimax, Hailuo 02 shows clear improvements in handling complex prompts and simulating physical processes. The company claims it is currently the only model that can accurately generate complex scenes such as gymnastics routines.

Video: Minimax

Hailuo 02 comes in three variants: 768p at 6 seconds, 768p at 10 seconds, and 1080p at 6 seconds. Previous models were limited to 720p, 6-second videos at 25 fps.

In the Artificial Analysis Video Arena benchmark, where users rate videos from competing AI models, Hailuo 02 finished second in the image-to-video category, just behind Bytedance's Seedance and ahead of Google's much-hyped Veo 3. However, the benchmarked version of Veo 3 does not support audio, a major part of the model's appeal.

Table of key image-to-video AI models with Elo scores and 95% confidence intervals: Seedance 1.0 leads with 1351 points. In user benchmarks, Hailuo 02 outperforms Google Veo 3, but Veo also supports native audio generation. | Screenshot: Decoder

According to Minimax, users have created more than 3.7 billion videos on the Hailuo platform since the demo launched last August. The company describes its early development as fairly haphazard, but says it has attracted extensive attention from creators around the world.

The model can be accessed through a web interface, a mobile app, or an API. For API users, pricing is per generated video and depends on the variant; a 6-second video in the 1080p version costs $0.49. In comparison, creating an 8-second 1080p video with Google Veo 3 costs around $3, depending on the plan.
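
To put the two prices on a common footing, the figures quoted above can be normalized to cost per second of 1080p video. The short Python sketch below does only that arithmetic. The function name `cost_per_second` is illustrative, not part of any Minimax or Google API, and the Veo 3 figure is the approximate $3 cited above, which varies by plan.

```python
def cost_per_second(price_usd: float, duration_s: float) -> float:
    """Return the price of one second of generated video in US dollars."""
    if duration_s <= 0:
        raise ValueError("duration_s must be positive")
    return price_usd / duration_s


# Figures quoted in the article (the Veo 3 price is approximate and plan-dependent).
hailuo_1080p = cost_per_second(0.49, 6)  # 6-second 1080p clip via the Hailuo API
veo3_1080p = cost_per_second(3.00, 8)    # 8-second 1080p clip with Veo 3

# How many times more expensive one second of Veo 3 video is.
ratio = veo3_1080p / hailuo_1080p

print(f"Hailuo 02: ${hailuo_1080p:.3f} per second")
print(f"Veo 3:     ${veo3_1080p:.3f} per second")
print(f"Veo 3 costs about {ratio:.1f}x as much per second")
```

On these numbers, Veo 3 comes out at roughly 4.6 times the per-second price of Hailuo 02 at 1080p, although the comparison does not account for Veo 3's native audio generation.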

Minimax says it is working to improve generation speed and stability and to add new features beyond the current text-to-video and image-to-video options. Competing platforms like Runway already offer more advanced features, such as shot tracking.

The Hailuo 02 release is part of "Minimax Week," a five-day event during which the Chinese startup also announced Minimax-M1, an open-source language model released with its parameters and a technical paper. In contrast, technical details about the training architecture of Hailuo 02 remain unreleased.