The framework is called SEAL (Self-Adapting Language Models). Unlike traditional approaches such as fine-tuning or in-context learning on pre-collected datasets, SEAL allows a model to generate its own training examples and the update steps for its internal parameters. In other words, the model does not merely adapt to new tasks; it updates its own internal structures to retain new knowledge.
How SEAL works
At the center of SEAL is reinforcement learning. The model learns to generate self-edits: textual instructions that lead to changes in its internal parameters. The process resembles a model writing its own textbook. Rather than simply reading the data, it reformats the information into a version optimized for learning.
Training takes place in two phases. First, the model makes a small weight update based on the self-generated instructions (the inner loop). The system then checks whether performance on the task has improved (the outer loop). If the update turns out to be effective, it is retained; otherwise, it is discarded. Over time, the model becomes more effective at teaching itself.
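The two loops above can be sketched in a few lines of toy Python. This is not the authors' implementation (the real inner loop is a fine-tuning pass on a language model, and self-edits are sampled from the model itself); it only illustrates the control flow: candidate self-edits are proposed, each triggers a small update, and only updates that improve downstream performance are kept.

```python
def candidate_self_edits(passage):
    """Propose candidate self-edits. Toy stand-in: a few fixed reformattings
    of the passage; real SEAL samples these from the model itself."""
    sentences = [s.strip() for s in passage.split(".") if s.strip()]
    return [[passage], sentences, sentences + [passage]]

def inner_loop_update(weights, self_edit):
    """Inner loop: a small 'weight' update driven by the self-edit.
    Toy stand-in for a gradient step: bump a score per memorized statement."""
    updated = dict(weights)
    for statement in self_edit:
        updated[statement] = updated.get(statement, 0.0) + 0.1
    return updated

def evaluate(weights, questions):
    """Downstream check: fraction of questions the updated 'model' can answer."""
    return sum(weights.get(q, 0.0) > 0 for q in questions) / len(questions)

def outer_loop(weights, passage, questions):
    """Outer loop: keep an update only if it improves task performance."""
    best_weights, best_score = weights, evaluate(weights, questions)
    for self_edit in candidate_self_edits(passage):
        candidate = inner_loop_update(weights, self_edit)
        score = evaluate(candidate, questions)
        if score > best_score:
            best_weights, best_score = candidate, score
    return best_weights, best_score

passage = "SEAL lets a model write its own training data. Ineffective updates are discarded"
weights, score = outer_loop({}, passage,
                            questions=["SEAL lets a model write its own training data"])
print(score)  # 1.0: the self-edit that split the passage into statements was kept
```

The key point the sketch preserves is that the reward signal comes from task performance after the update, not from the text of the self-edit itself.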

Interestingly, the SEAL architecture can be split into two parts. One AI module acts as a “teacher”, generating self-edits, while the other acts as a “student”, updating itself based on those instructions. This setup may prove particularly valuable in enterprise applications that require highly specialized training workflows.
From theory to practice
The SEAL framework was tested in two areas: incorporating new knowledge and learning from a few examples.
In the first case, the model was tasked with memorizing facts from a text and answering questions without access to the original material. Traditional fine-tuning provided only minor improvements, but SEAL, by generating self-edits and synthetic training examples, raised answer accuracy to 47%. Notably, this result surpassed similar attempts that used synthetic data generated by the stronger GPT-4.1.

In the second case, the model tackled visual puzzles from the Abstraction and Reasoning Corpus (ARC), a benchmark designed to test AI's ability to reason abstractly and generalize from limited data. Here the model not only had to find the correct answer but also to devise its own learning strategy: which data to use, how to reformat it, and what learning rate to follow. With SEAL, the model reached 72.5% accuracy. Without reinforcement learning, performance was roughly four times lower, and standard in-context learning produced no meaningful results.
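A self-edit in this setting is essentially a small training recipe chosen by the model. The sketch below is hypothetical (the field names and augmentations are invented for illustration), but it captures the idea: the self-edit specifies which data augmentations to apply to the few demonstrations and which hyperparameters the subsequent weight update should use.

```python
# Hypothetical shape of a self-edit for an ARC-style task: the model picks
# augmentations and optimization settings rather than just an answer.
self_edit = {
    "augmentations": ["rotate_90", "flip_horizontal", "transpose"],
    "learning_rate": 1e-4,
    "epochs": 3,
}

def augment(grid, name):
    """Toy grid transformations over a list-of-lists grid."""
    if name == "rotate_90":
        return [list(row) for row in zip(*grid[::-1])]
    if name == "flip_horizontal":
        return [row[::-1] for row in grid]
    if name == "transpose":
        return [list(row) for row in zip(*grid)]
    return grid

def build_training_set(examples, self_edit):
    """Expand the few demonstrations into a larger fine-tuning set,
    following the augmentations named in the model's self-edit."""
    data = list(examples)
    for name in self_edit["augmentations"]:
        data += [(augment(x, name), augment(y, name)) for x, y in examples]
    return data

# One (input grid, output grid) demonstration, expanded four-fold.
examples = [([[1, 0], [0, 0]], [[0, 0], [0, 1]])]
train = build_training_set(examples, self_edit)
print(len(train))  # 4: the original pair plus three augmented copies
```

The learning rate and epoch count in the self-edit would then parameterize the inner-loop fine-tuning run on this expanded set.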
Outlook and restrictions
Researchers note that the shortage of high-quality training data is quickly becoming a major obstacle to progress in AI. SEAL offers a partial solution: it allows the model to generate its own useful training signals. For example, an AI system could read scientific papers and create hundreds of explanations and takeaways to deepen its understanding of the subject.
However, the method has limitations. Frequent updates can lead to what is known as catastrophic forgetting: the loss of previously acquired knowledge. To address this, the researchers propose a hybrid approach: factual or frequently changing information is stored in external memory, while core knowledge is integrated into the model's weights via SEAL.
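That division of labor could look something like the sketch below (class and method names are illustrative, not from the paper): volatile facts are simply overwritten in an external store, where no forgetting can occur, while only stable knowledge becomes a candidate for SEAL-style weight updates.

```python
class HybridKnowledge:
    """Illustrative hybrid store: external memory for fast-changing facts,
    a consolidated set standing in for knowledge baked into the weights."""

    def __init__(self):
        self.external = {}          # fast-changing facts: retrieve, don't train
        self.consolidated = set()   # stable knowledge consolidated via SEAL

    def observe(self, fact, stable):
        if stable:
            # candidate for a SEAL-style weight update (placeholder here)
            self.consolidated.add(fact)
        else:
            key = fact.split(":")[0]
            self.external[key] = fact   # overwrite; no catastrophic forgetting

    def answer(self, key):
        # Prefer fresh external memory, fall back to consolidated knowledge.
        if key in self.external:
            return self.external[key]
        return next((f for f in self.consolidated if f.startswith(key)), None)

kb = HybridKnowledge()
kb.observe("capital_of_france: Paris", stable=True)
kb.observe("stock_price: 101.2", stable=False)
kb.observe("stock_price: 99.8", stable=False)
print(kb.answer("stock_price"))        # latest external value
print(kb.answer("capital_of_france"))  # consolidated fact
```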
There are also practical constraints. Editing model parameters in real time is not yet feasible. Instead, the proposed solution is a delayed learning cycle: the model collects data throughout the day and updates itself at set intervals.
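Such a cycle is straightforward to sketch (the names here are illustrative): interactions accumulate in a buffer, and only once the configured interval has elapsed is the batch handed off for a SEAL-style update.

```python
from datetime import datetime, timedelta

class DelayedUpdater:
    """Buffer interactions during the day; release them for fine-tuning
    only at scheduled intervals, instead of editing weights in real time."""

    def __init__(self, interval=timedelta(hours=24)):
        self.buffer = []
        self.interval = interval
        self.last_update = datetime(2025, 1, 1)  # fixed start for the demo

    def record(self, interaction):
        self.buffer.append(interaction)

    def maybe_update(self, now):
        """Return a batch for a SEAL-style update if it is time, else None."""
        if now - self.last_update >= self.interval and self.buffer:
            batch, self.buffer = self.buffer, []
            self.last_update = now
            return batch
        return None

updater = DelayedUpdater()
updater.record("user asked about protein folding")
updater.record("user corrected a wrong date")
print(updater.maybe_update(datetime(2025, 1, 1, 12)))  # None: interval not elapsed
print(updater.maybe_update(datetime(2025, 1, 2, 1)))   # the two buffered items
```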
Previously, the Kazinform news agency reported on how ChatGPT weakens our minds.
