Introducing a “superintelligent agent” capable of endless self-improvement!
A recently published paper on “hyper agents” (superintelligent agents) from a Meta research team has quickly attracted attention.
The paper combines two ideas: the Gödel machine, proposed more than 20 years ago by Jürgen Schmidhuber, the father of LSTM, and the Darwin Gödel Machine, which uses open-ended algorithms to make continuous self-iteration possible.
Based on these ideas, agents can not only complete specific tasks better, but also continuously improve their own performance.
More importantly, they can continuously optimize the underlying logic of “improving yourself,” realizing a form of meta-learning.
This is the new generation of superintelligent agent defined in the paper: the hyper agent.
The paper further proposes that in the future, AI is expected to break through the boundaries of initial algorithms preset by humans through continuous self-iteration. Therefore, AI safety must be front and center.
Many netizens reacted along these lines:
What is both scary and exciting about meta-learning is that meta-level improvements can be transferred across domains. You don’t learn to get better at one thing, you learn to get better at everything.
This paper has now been accepted into ICLR 2026.
From Gödel Machine to Darwin Gödel Machine
To understand hyper agents, you first need to understand their foundation: the Gödel machine.
A Gödel machine is a hypothetical self-improving AI: whenever it can mathematically prove that a better strategy exists, it recursively rewrites its own code to adopt it.
This hypothesis was first proposed by Jürgen Schmidhuber over 20 years ago.
In conventional machine learning, the AI’s “learning method” is hardcoded: it is pre-configured by humans, and the AI can only adjust internal parameters to move closer to the goal.
Gödel machines break this restriction: they treat the algorithmic framework itself as editable code, evolving their own learning ability by autonomously rewriting their programs.
However, a problem arises: a Gödel machine requires the AI to prove that a change yields a net benefit before self-modifying.
In other words, will the compute spent on a code change be recouped through future performance improvements?
Unfortunately, such proofs are almost impossible to obtain for complex real-world tasks.
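The proof-gated loop described above can be sketched in a few lines. This is a minimal illustrative sketch, not the paper’s implementation; the function names are assumptions, and the proof step is stubbed out precisely because, as noted, no practical proof procedure exists for complex tasks.

```python
def expected_net_benefit(current_code, proposed_code):
    """A Gödel machine must *prove* that a proposed rewrite pays for its
    own search cost before adopting it. For realistic tasks no such proof
    procedure is available, so this stub returns None ("no proof found"),
    which is exactly the bottleneck described in the article."""
    return None


def godel_machine_step(current_code, proposed_code):
    """One step of the idealized loop: self-rewrite only when a formal
    proof of net benefit exists; otherwise keep running the old program."""
    benefit = expected_net_benefit(current_code, proposed_code)
    if benefit is not None and benefit > 0:
        return proposed_code  # proof found: accept the self-rewrite
    return current_code       # no proof: the machine never changes


# With the proof step intractable, the machine stays frozen at "v1".
result = godel_machine_step("v1", "v2")
```

The stubbed proof step is the whole point: the loop is well-defined, but in practice it never fires, which is what motivates the empirical alternative below.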
In response to this issue, the team proposed the Darwin Gödel Machine (DGM), which uses an open-ended algorithm to search among code-improvement proposals generated by a large model and keep those that empirically improve performance.
In other words, DGM uses a foundation model to propose code improvements and uses recent innovations in open-ended algorithms to search and build a growing, diverse library of high-quality AI agents.
On this basis, DGM can create various self-improvement schemes, such as adding a patch-validation step, optimizing file-viewing capabilities, enhancing editing tools, generating and screening multiple candidate solutions to select the best one, and automatically consulting historical trial records (including failure analyses) when making new changes.
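The search loop just described, with proofs replaced by empirical evaluation, can be sketched as follows. This is a hedged toy sketch under stated assumptions: `propose_patch` stands in for LLM-proposed code edits and `evaluate` stands in for running an agent on a benchmark such as SWE-bench; neither is the paper’s actual API.

```python
import random


def propose_patch(agent_code, rng):
    """Stand-in for a foundation model proposing a self-modification
    (e.g. a patch-validation step or better file viewing)."""
    return agent_code + f"+edit{rng.randint(0, 999)}"


def evaluate(agent_code, rng):
    """Stand-in for an empirical benchmark score in [0, 1]."""
    return rng.random()


def dgm_search(seed_agent, iterations=20, seed=0):
    """Open-ended DGM-style loop: sample a parent from a growing archive,
    let the model propose an edit, score it empirically, and archive it."""
    rng = random.Random(seed)
    archive = [(seed_agent, evaluate(seed_agent, rng))]
    for _ in range(iterations):
        # Sample any archived agent, not only the current best: weaker
        # "ancestors" keep alternative evolutionary paths alive.
        parent, _ = rng.choice(archive)
        child = propose_patch(parent, rng)
        archive.append((child, evaluate(child, rng)))
    # The best agent found so far; the archive itself is never pruned.
    return max(archive, key=lambda entry: entry[1])


best_agent, best_score = dgm_search("base_agent")
```

Keeping the full archive rather than greedily replacing the champion is what makes the search “open-ended” and, as the experiments section notes, is key to avoiding premature convergence.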
Experiments in the paper also show that the more computational power the DGM gains, the greater its self-improvement effect.
Superintelligent agents
DGM is powerful, but it has a critical limitation: it is mainly effective for programming tasks.
This is because DGM relies on an important assumption: the evaluation task and the self-modification task must be “aligned.”
In programming, this alignment is natural: as an agent’s coding ability improves, its ability to modify its own code improves with it.
That is, the skills used to solve external programming problems translate directly into the ability to modify the agent’s own underlying code.
Conversely, in non-programming fields (such as writing poetry), even if the agent’s poetry improves, that improvement does not carry over to the code-modification level.
In tasks that lack this “self-referentiality,” DGM’s chain of recursive evolution breaks and stagnates.
To address this, the paper proposes the superintelligent agent.
It can change not only its own task-execution behavior, but also the process that generates future improvement proposals.
This yields what the paper calls metacognitive self-modification: the agent not only learns how to do better, it also learns how to improve more effectively.
The paper further instantiates the superintelligent agent as the DGM Hyper Agent (DGM-H).
DGM-H extends DGM so that both task-solving behavior and the self-improvement program are editable and evolvable. Its framework consists of:
Self-referential architecture: unifies “task agents” and “meta agents” into a single editable program.
Meta-level evolution: hyper agents can also improve the “how to improve” itself. This removes the requirement that the evaluation task and the self-modification task be aligned, enabling cross-domain “metacognitive self-modification.”
By analogy: in a hyper agent, not only are the athletes training, the coach is also learning how to coach better, so both athletic performance and coaching quality keep rising.
Additionally, DGM-H improves the process of generating new agents itself, for example by introducing persistent memory and performance tracking. These meta-level improvements can transfer across domains and accumulate across runs.
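The self-referential idea above, that the improver is itself part of the editable program, can be sketched as follows. All class and method names here are illustrative assumptions, not the paper’s actual architecture; the point is only the mechanism of replacing the improvement procedure at the meta level.

```python
class HyperAgent:
    """Toy sketch of DGM-H's self-referential architecture: the task
    policy and the improvement procedure live in one editable program."""

    def __init__(self):
        self.task_policy = "solve-v1"     # how the agent acts on tasks
        self.improver = self._improve_v1  # how the agent improves itself

    def _improve_v1(self):
        # Object-level improvement: edit task behavior only.
        self.task_policy += "+patch"

    def _improve_v2(self):
        # A better improvement procedure, e.g. one that validates its
        # edits (cf. the persistent-memory and tracking improvements
        # mentioned above).
        self.task_policy += "+validated-patch"

    def meta_improve(self):
        # Meta-level evolution: rewrite the improver itself, so future
        # object-level improvements are generated differently.
        self.improver = self._improve_v2


agent = HyperAgent()
agent.improver()      # improves the task policy
agent.meta_improve()  # improves the improvement process itself
agent.improver()      # the new improver now produces different edits
```

Because `improver` is ordinary mutable state rather than fixed code, the “coach” can be retrained just like the “athlete,” which is the coaching analogy above made literal.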
Experimental verification: a leap from 20% to 50%
Experiments show that the Darwin Gödel Machine can continuously improve itself by modifying its own code base.
On SWE-bench, DGM automatically raised its performance from 20.0% to 50.0%.
On Polyglot, DGM’s performance jumped from an initial 14.2% to 30.7%, far outperforming Aider, a representative hand-designed agent.
These results prove that DGM can discover and implement effective self-improvement.
The key to achieving this lies in its open-ended evolutionary search strategy.
By sampling parents from the existing agent archive to generate new agents, DGM can explore multiple evolutionary paths in parallel.
“Ancestor” agents with slightly lower performance play a key role in discovering new methods and capabilities, helping the search avoid premature convergence.
Additionally, DGM’s improvements transfer broadly.
Agents optimized with Claude 3.5 Sonnet also improve performance when switched to o3-mini or Claude 3.7 Sonnet.
On the Polyglot benchmark, self-improvement on Python tasks also improved performance on tasks in other languages such as Rust, C++, and Go.
Author introduction
Finally, let me introduce the authors of this paper.
The lead author of this paper is Jenny Zhang of UBC, a student of Prof. Jeff Clune.
She completed her undergraduate degree at Imperial College London, and this paper was completed during her internship at Meta. Her research focuses on reinforcement learning, self-improving AI, and open-ended AI.
Qiao Binchen is a PhD student at the University of Edinburgh under Prof. Oisin Mac Aodha.
He completed his undergraduate degree at Tongji University. He was previously part of the Meta FAIR team, where he focused on building self-improving AI systems.
Yang Wannan is pursuing a PhD at New York University and is currently interning at Meta’s superintelligence lab. She completed her undergraduate degree at the University of Edinburgh.
Other authors of the paper include Jeff Clune, researcher Minqi Jiang, Sam Devlin, and Tatiana Shavrina.
