NVIDIA Kaggle Grandmasters Win Artificial General Intelligence Competition



NVIDIA researchers on Friday won a key Kaggle competition that many in the field treat as a real-time gauge of humanity's progress toward artificial general intelligence (AGI).

Two members of the Kaggle Grandmasters of NVIDIA (KGMoN), Ivan Sorokin and Jean-Francois Puget, took first place on the Kaggle ARC Prize 2025 public leaderboard with a score of 27.64%. Their solution was evaluated on the same dataset that underlies the ARC-AGI-2 benchmark.

The team, called NVARC, fine-tuned a 4B model variant that outperforms much larger and more expensive models on the same benchmark at just 20 cents per task. The result is not only state of the art but also a breakthrough in scalable, economical AGI-style reasoning.

The ARC-AGI benchmark uses grid-based visual puzzles to measure how well an AI system can reason abstractly and generalize from a small number of examples. ARC-AGI-2 is a harder, updated version that removes overlap with public training data. It is explicitly designed to resist shortcuts and brute-force memorization, making it a more rigorous test of true systematic abstraction.

The ARC-AGI benchmark has become one of the most-watched indicators of real-world progress toward general reasoning in AI. Unlike common machine learning benchmarks, ARC-AGI tasks cannot be solved by scaling, memorizing, or pattern scraping. Each puzzle provides only a handful of example grids, forcing the system to infer abstract rules and apply them to entirely new test cases. The score on the more difficult ARC-AGI-2 is widely regarded as an indicator of how well an AI system can learn from almost nothing.
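To make the puzzle format concrete, here is a minimal sketch of an ARC-style task in Python. It assumes the publicly documented ARC layout (a `train` list of input/output grid pairs plus a `test` input, with colors as integers 0 to 9). The toy task and the three candidate rules are invented for illustration; real ARC-AGI tasks do not come from a fixed menu of rules, which is exactly why enumeration like this does not solve the benchmark.

```python
from typing import Callable, Dict, List, Optional

Grid = List[List[int]]  # each cell is a color index 0-9


def flip_horizontal(grid: Grid) -> Grid:
    """Mirror every row left to right."""
    return [list(reversed(row)) for row in grid]


def flip_vertical(grid: Grid) -> Grid:
    """Reverse the order of the rows."""
    return [list(row) for row in reversed(grid)]


def transpose(grid: Grid) -> Grid:
    """Swap rows and columns."""
    return [list(col) for col in zip(*grid)]


# Hypothetical rule menu, for illustration only.
CANDIDATE_RULES: Dict[str, Callable[[Grid], Grid]] = {
    "flip_horizontal": flip_horizontal,
    "flip_vertical": flip_vertical,
    "transpose": transpose,
}


def infer_rule(train_pairs: List[dict]) -> Optional[str]:
    """Return the first rule that reproduces every demonstration exactly."""
    for name, rule in CANDIDATE_RULES.items():
        if all(rule(p["input"]) == p["output"] for p in train_pairs):
            return name
    return None


# A toy task: two demonstrations and one unseen test input.
toy_task = {
    "train": [
        {"input": [[1, 0, 0], [2, 0, 0]], "output": [[0, 0, 1], [0, 0, 2]]},
        {"input": [[0, 3], [4, 0]], "output": [[3, 0], [0, 4]]},
    ],
    "test": [{"input": [[5, 0, 0, 0]]}],
}

rule_name = infer_rule(toy_task["train"])
prediction = CANDIDATE_RULES[rule_name](toy_task["test"][0]["input"])
print(rule_name, prediction)
```

A prediction counts only if the whole output grid matches exactly, so a solver has to get the rule right from just those few demonstrations.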

That's why the Kaggle ARC Prize 2025 leaderboards matter so much. They are the most open and reproducible arena for researchers to test AGI-style reasoning under strict compute and time constraints.

The winning NVIDIA NVARC solution did not rely on large models or brute-force search. Instead, it builds on three ideas that every developer can understand: synthetic data, test-time training, and disciplined engineering.

Powerful LLM reasoning techniques (chain-of-thought, tool use, even RL-style agents) did not fit within Kaggle's short runtime. So NVARC shifted strategy: move all the complex reasoning offline into a synthetic data pipeline, and train small models that can run quickly during evaluation.
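As a rough illustration of generating training puzzles offline, the sketch below builds a small corpus of ARC-style tasks by sampling a transformation and applying it to random grids. This is not NVARC's pipeline: the transformation set, the function names, and the `concept` field are all hypothetical, chosen only to show the shape of the idea.

```python
import random
from typing import Callable, Dict, List

Grid = List[List[int]]  # each cell is a color index 0-9


def flip_horizontal(grid: Grid) -> Grid:
    """Mirror every row left to right."""
    return [list(reversed(row)) for row in grid]


def flip_vertical(grid: Grid) -> Grid:
    """Reverse the order of the rows."""
    return [list(row) for row in reversed(grid)]


def increment_colors(grid: Grid) -> Grid:
    """Shift every color by one, wrapping 9 back to 0."""
    return [[(cell + 1) % 10 for cell in row] for row in grid]


# Hypothetical concept library; a real pipeline would be far richer.
TRANSFORMS: Dict[str, Callable[[Grid], Grid]] = {
    "flip_horizontal": flip_horizontal,
    "flip_vertical": flip_vertical,
    "increment_colors": increment_colors,
}


def random_grid(rng: random.Random, max_size: int = 5) -> Grid:
    """Sample a grid with random height, width, and colors."""
    height = rng.randint(1, max_size)
    width = rng.randint(1, max_size)
    return [[rng.randint(0, 9) for _ in range(width)] for _ in range(height)]


def make_synthetic_task(rng: random.Random, num_train: int = 3) -> dict:
    """Pick one hidden rule and emit demonstrations plus a held-out test pair."""
    concept = rng.choice(sorted(TRANSFORMS))
    rule = TRANSFORMS[concept]

    def make_pair() -> dict:
        grid = random_grid(rng)
        return {"input": grid, "output": rule(grid)}

    return {
        "concept": concept,
        "train": [make_pair() for _ in range(num_train)],
        "test": [make_pair()],
    }


rng = random.Random(0)  # fixed seed so the corpus is reproducible
corpus = [make_synthetic_task(rng) for _ in range(100)]
print(len(corpus), corpus[0]["concept"])
```

Because all of this runs before the competition clock starts, the generator can be as slow and elaborate as needed; only the model trained on its output has to fit the evaluation budget.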

Using stepwise puzzle generation, concept decomposition, and stepwise enriched open-weight models, the team built a diverse synthetic corpus of ARC-style tasks. The final model only needed to recognize and adapt patterns, rather than execute full program-search logic. Test-time training then lets the model learn the specifics of each puzzle from its small set of examples. This technique is essential for leading ARC-AGI performance.

The result is a compact, cost-effective ensemble that outperforms much larger systems, setting a new standard for ARC-AGI-2 and demonstrating how synthetic data and adaptive learning can advance reasoning.

To build the solution, the team used the NVIDIA NeMo suite of tools, including NeMo RL for scalable reinforcement learning and NeMo Skills to streamline synthetic data generation (SDG) pipelines.

Learn more about the technical details in NVARC's article on Kaggle and read our interview with ARC.


