Large language models already read, write, and answer questions with impressive skill. They do this by training on a vast library of texts. However, once that training ends, the model's knowledge is mostly frozen. Teaching it new facts and skills can be difficult, especially when little task-specific data exists. This gap has led researchers to ask a fundamental question about artificial intelligence: can a model learn how to continue learning on its own?
A new framework called Self-Adapting Language Models (SEAL) offers one possible answer. Developed by researchers at MIT, the approach lets language models generate their own learning materials and decide how to train on them. The idea mirrors the way people prepare for exams. Instead of rereading the textbook, students rewrite their notes, summarize ideas, and test themselves. The facts stay the same, but the format changes to make learning stick.
“Like humans, complex AI systems cannot remain static throughout their lives. These LLMs are not deployed in static environments; they are constantly facing new inputs from users. We want to create models that are a little more human-like, models that can continue to self-improve,” says MIT graduate student and co-lead author Jyotish Pari.
SEAL applies this human habit to machines. Rather than handing models fixed training data and rigid instructions, the system lets models restructure what they study and how they study it. The goal is lasting internal change, not just better short-term answers.
Teaching the model to rewrite its own lessons
At the heart of SEAL is a concept called self-editing. Self-edits are short natural-language instructions the model writes for itself. They specify what new training data to use and can adjust training settings such as the learning rate and the number of training steps.
The learning process runs in two loops. In the inner loop, the model reads task-related text and generates a self-edit. This may include rewritten facts, logical implications, or short summaries. The system then uses this synthetic data to fine-tune the model, slightly changing its internal weights. The updated model is then tested on tasks such as answering questions or solving reasoning puzzles.
The outer loop decides which self-edits are worth keeping. This step uses reinforcement learning. If a self-edit improves performance, the model receives a reward; if not, the edit is discarded. Over time, the model learns which kinds of self-generated notes actually help it learn.
Standard reinforcement learning techniques are difficult to apply here because each reward depends on how the model itself changes after a self-edit. The researchers instead use a simpler approach called ReST-EM: the model generates several candidate self-edits, keeps only those that lead to better results, and fine-tunes itself on those successes.
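The filtering idea behind this outer loop can be illustrated with a minimal sketch. Everything here is hypothetical scaffolding, not SEAL's actual code: the "weights" are a single number, `evaluate` is a stand-in reward, and `generate_edit`/`apply_edit` stand in for the model writing a self-edit and briefly fine-tuning on it. The sketch only shows the ReST-EM pattern: sample candidates, keep those that beat the baseline, return them for reinforcement.

```python
import random

def rest_em_round(params, generate_edit, apply_edit, evaluate, n_candidates=4):
    """One ReST-EM-style round: sample candidate self-edits, keep only
    those that improve downstream performance, and return the keepers
    (which would then be used to fine-tune the model)."""
    baseline = evaluate(params)
    kept = []
    for _ in range(n_candidates):
        edit = generate_edit(params)                 # inner loop: model writes a self-edit
        updated = apply_edit(params, edit)           # brief fine-tune on that edit
        if evaluate(updated) > baseline:             # reward: did performance improve?
            kept.append(edit)                        # reinforce only successful edits
    return kept

# Toy setup: performance is how close the "weights" are to a hidden target.
random.seed(0)
target = 5.0
evaluate = lambda w: -abs(w - target)
generate_edit = lambda w: random.uniform(-1.0, 1.0)  # candidate self-edit (a nudge)
apply_edit = lambda w, e: w + e                      # stand-in for a fine-tuning step

kept = rest_em_round(3.0, generate_edit, apply_edit, evaluate)
# By construction, every kept edit moves the weights closer to the target.
assert all(evaluate(3.0 + e) > evaluate(3.0) for e in kept)
```

In the real system the "update" is an actual fine-tuning run and the reward is task accuracy, but the keep-only-what-helped structure is the same.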
Turning text into useful memories
One of the main tests for SEAL is adding new factual knowledge. In this setting, the model must read a passage and later answer questions without seeing the passage again. Rather than training directly on the original text, SEAL asks the model to generate implications of what it reads: short statements that restate or logically extend the information.
For example, a paragraph about a historic city might yield notes about its location, age, and cultural role. Those notes become the self-edit. The model then trains on them using lightweight updates called low-rank adapters, which allow many small learning steps.
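The passage-to-notes step can be sketched as follows. The prompt wording, the example passage, and the notes are all illustrative inventions, not SEAL's actual template or outputs; the point is only the shape of the pipeline: prompt the model for implications, then treat each implication as a tiny fine-tuning example.

```python
def implication_prompt(passage):
    """Hypothetical prompt asking a model to restate a passage as short,
    self-contained implications to train on (a sketch, not SEAL's
    actual template)."""
    return (
        "Read the passage and list implications: short statements that "
        "restate or logically extend its facts.\n\n"
        f"Passage: {passage}\nImplications:\n"
    )

passage = "Kyoto, founded in 794, served as Japan's capital for over a millennium."

# A model's self-edit might look like this list of notes; each line then
# becomes one small fine-tuning example (applied via a low-rank adapter).
notes = [
    "Kyoto was founded in 794.",
    "Kyoto was once the capital of Japan.",
    "Kyoto held the capital role for more than a thousand years.",
]
training_examples = [{"text": n} for n in notes]
```

Training on many short restatements like these, rather than one long passage, is what lets the model take many small gradient steps over the same facts.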
The researchers tested this idea using the Qwen2.5-7B model and a portion of the SQuAD question-answering dataset. Without any adaptation, the model answered roughly 33 percent of the questions correctly. Training directly on the original passages did little to improve that score. When the model trained on its own generated notes, accuracy jumped to nearly 40 percent. Notes generated by GPT-4.1 helped even more, lifting accuracy to about 46 percent.
After SEAL learned to create better self-edits through reinforcement learning, accuracy with its own notes rose to 47 percent. That result slightly exceeded the performance achieved with GPT-4.1's notes, even though the underlying model was smaller.
The team also tested learning from hundreds of passages at once. Even as the amount of text grew, SEAL remained competitive. This suggests the model learned general strategies for writing useful study notes, not just tricks for single passages.
Choosing a learning plan for problem solving
SEAL was also tested on few-shot reasoning tasks from a subset of the ARC-AGI benchmark. These puzzles ask the model to infer a visual pattern from small grids of colored squares. The researchers used the compact Llama-3.2-1B-Instruct model without any special training on these tasks.
Here, SEAL operated at test time. Before answering, the model adapted itself using the few examples provided. The self-edits took the form of recipes: each one selected transformations to apply to the examples, such as rotations and reflections, and training settings, such as the learning rate and the number of steps.
The model generated and tested multiple recipes for each task, and only those that led to a correct answer were reinforced. Simple in-context learning solved none of the selected puzzles. Test-time training without learned self-edits reached a 20 percent success rate. After SEAL training, the success rate rose above 70 percent. An ideal human-designed setup reached 100 percent, showing room for growth but also a clear benefit.
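A recipe-style self-edit can be pictured as a small configuration the model emits: which augmentations to apply to the few-shot grids, plus fine-tuning hyperparameters. The sketch below is hypothetical (the recipe format, function names, and values are inventions for illustration); it only shows how such a recipe could expand a handful of examples into more training data and carry the optimizer settings along.

```python
def rotate90(grid):
    """Rotate a small grid 90 degrees clockwise."""
    return [list(row) for row in zip(*grid[::-1])]

def reflect_h(grid):
    """Reflect a grid horizontally (flip each row)."""
    return [row[::-1] for row in grid]

AUGMENTATIONS = {"rotate90": rotate90, "reflect_h": reflect_h}

def expand_examples(examples, recipe):
    """Apply the transformations named in a self-edit 'recipe' to the
    few-shot examples, producing extra training data; the fine-tuning
    settings ride along for the subsequent update step."""
    augmented = list(examples)
    for name in recipe["augmentations"]:
        augmented += [AUGMENTATIONS[name](g) for g in examples]
    return augmented, recipe["optim"]

recipe = {  # hypothetical self-edit the model might emit
    "augmentations": ["rotate90", "reflect_h"],
    "optim": {"learning_rate": 1e-4, "steps": 50},
}
grids, optim = expand_examples([[[1, 0], [0, 2]]], recipe)
# one original grid plus one augmented copy per named transformation
assert len(grids) == 3
```

In the reinforcement loop, recipes whose resulting fine-tune solves the puzzle are kept, so the model gradually learns which augmentations and settings tend to work.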
Limitations and open questions
SEAL also faces challenges. One big problem is catastrophic forgetting: as the model keeps adapting to new information, its performance on earlier tasks gradually degrades. The system does not collapse, but old knowledge fades as new self-edits interfere with it.
Another concern is cost. Each self-edit must be tested through fine-tuning and evaluation, which can take 30 to 45 seconds. Scaling this approach to larger models and datasets will require more efficient methods.
The framework also relies on labeled evaluation tasks. Future research could let models create their own exercises and tests, reducing the reliance on human labels.
Practical implications of the research
SEAL points toward language models that do not stay fixed after deployment. Systems that can rewrite what they learn and tailor their own training may better absorb new research, adapt to users, and operate in changing environments.
This capability could support long-running AI agents, scientific assistants, and educational tools that improve through experience. Although challenges remain, self-adaptive models offer a path toward artificial intelligence that learns more like humans do.
“While this research does not solve continuous learning or eliminate catastrophic forgetting, it does provide a concrete path towards language models that continue to learn in a data-constrained world, rather than just being trained once and frozen,” Jyotish Pari told News' Brightside.
The research results are available online as a preprint on arXiv.
