AI Advances: Learning to Read, From Position to Meaning



The language capabilities of today's artificial intelligence systems are astonishing. We can all hold natural conversations with systems such as ChatGPT and Gemini. Yet little is known about the internal processes in these networks that lead to such remarkable results.

A new study published in the Journal of Statistical Mechanics: Theory and Experiment (JSTAT) reveals a piece of this mystery. It shows that when small amounts of data are used for training, a neural network initially relies on the position of words in a sentence. Once the system is exposed to sufficient data, however, it switches to a new strategy based on the meaning of the words. The study finds that this transition occurs abruptly, like a phase transition in a physical system, once a critical data threshold is crossed. The findings offer valuable insights for understanding the behavior of these models.


Like children learning to read, neural networks begin by understanding sentences from the positions of words: depending on where a word sits in the sentence, the network can infer its role (subject, verb, or object). As training continues, however, the network "graduates": the meaning of the words becomes its main source of information.

This is what the new research demonstrates, in a simplified model of the self-attention mechanism, the core building block of transformer language models such as ChatGPT, Gemini, and Claude that we use every day. A transformer is a neural network architecture designed to process sequential data such as text, and it forms the backbone of many modern language models. Transformers specialize in capturing relationships within a sequence and use self-attention to assess the importance of each word relative to the others.
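The dot-product attention the article refers to can be sketched in a few lines. The matrix names, shapes, and random inputs below are illustrative assumptions, not details from the study; this is a minimal single-head sketch, not the model the authors analyze.

```python
import numpy as np

def dot_product_attention(X, Wq, Wk, Wv):
    """Minimal single-head scaled dot-product attention (illustrative).

    X: (seq_len, d_model) token representations.
    Wq, Wk, Wv: (d_model, d_head) learned projection matrices.
    Returns the attended outputs and the attention weights.
    """
    Q, K, V = X @ Wq, X @ Wk, X @ Wv
    # Score every token (query) against every other token (key).
    scores = Q @ K.T / np.sqrt(K.shape[-1])
    # Softmax over keys: each row becomes a distribution of "importance".
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    # Each output is a weighted mixture of the value vectors.
    return weights @ V, weights

rng = np.random.default_rng(0)
seq_len, d_model, d_head = 4, 8, 8
X = rng.normal(size=(seq_len, d_model))
Wq, Wk, Wv = (rng.normal(size=(d_model, d_head)) for _ in range(3))
out, weights = dot_product_attention(X, Wq, Wk, Wv)
print(out.shape)  # (4, 8)
```

Each row of `weights` sums to one, so every word's new representation is a convex mixture of the others, weighted by how relevant the network judges them to be.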

"To assess relationships between words, the network can use two strategies, the first of which is to exploit the positions of words," explains Hugo Cui, a postdoctoral researcher at Harvard University and first author of the study. In a language like English, for example, the subject usually precedes the verb, which in turn precedes the object. "Mary eats an apple" is a simple example of this sequence.

"This is the first strategy that spontaneously emerges when the network is trained," Cui explains. "However, in our study, if training continues and the network receives enough data, at a certain point, once a threshold is crossed, the strategy changes abruptly: the network starts relying on meaning instead."
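The two strategies Cui describes can be caricatured in code: attention scores computed from positional information alone produce the same pattern for any sentence of a given length, while scores computed from word embeddings alone follow the tokens themselves. Everything here (the tiny vocabulary, the random toy embeddings) is hypothetical and for illustration only.

```python
import numpy as np

rng = np.random.default_rng(1)
vocab = {"Mary": 0, "eats": 1, "an": 2, "apple": 3}
d = 6
E = rng.normal(size=(len(vocab), d))  # toy word embeddings (semantic signal)
P = rng.normal(size=(10, d))          # toy positional encodings (positional signal)

sentence = ["Mary", "eats", "an", "apple"]
sem = np.stack([E[vocab[w]] for w in sentence])  # depends only on WHICH words
pos = P[: len(sentence)]                          # depends only on WHERE they sit

def attn_weights(Z):
    """Softmax of dot-product scores built from a single signal Z."""
    s = Z @ Z.T / np.sqrt(Z.shape[-1])
    w = np.exp(s - s.max(axis=-1, keepdims=True))
    return w / w.sum(axis=-1, keepdims=True)

# Positional strategy: identical for every 4-word sentence.
w_pos = attn_weights(pos)
# Semantic strategy: determined by the words, not their order.
w_sem = attn_weights(sem)
```

In the study's setting, the trained network behaves like one of these caricatures depending on how much data it has seen, switching from the positional pattern to the semantic one at the critical threshold.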

"When we designed this work, we simply wanted to study which strategy, or mix of strategies, the network would adopt," Cui notes. "But what we found was somewhat surprising: below a certain threshold, the network relied exclusively on position, while above it, only on meaning."

Cui describes this shift as a phase transition, borrowing the concept from physics. Statistical physics studies systems composed of enormous numbers of particles (atoms, molecules, and so on) by describing their collective behavior statistically. Similarly, the neural networks underlying these AI systems are made up of large numbers of "nodes", or neurons (so named for their resemblance to those in the human brain), each connected to many others and performing simple operations. The system's intelligence emerges from the interactions of these neurons, a phenomenon that can be described with statistical methods.

This is why we can speak of the network's abrupt change of behavior as a phase transition, just as water changes from liquid to vapor under certain conditions of temperature and pressure.

"Understanding from a theoretical standpoint that the strategy change happens in this way is important," Cui emphasizes. "Our networks are simplified compared with the complex models people interact with daily, but they can offer hints toward understanding the conditions that cause a model to stabilize on one strategy or the other. We hope this theoretical knowledge can be used in the future to make neural networks more efficient and safer."

The study by Hugo Cui, Freya Behrens, Florent Krzakala, and Lenka Zdeborová, entitled "A Phase Transition between Positional and Semantic Learning in a Solvable Model of Dot-Product Attention," is published in JSTAT as part of the Machine Learning 2025 special issue and is included in the NeurIPS 2024 conference proceedings.

/Public release. This material from the originating organization/author(s) is point-in-time in nature and may be edited for clarity, style, and length. Mirage.news does not take institutional positions or sides, and all views, positions, and conclusions expressed herein are solely those of the author(s).



