An EPFL team has created a new large language model that is structured similarly to the human brain, giving users more control and moving away from “black box” AI.
When a standard large language model (LLM) is faced with a problem, it tries to solve it by matching it to similar information it has seen before, and bases its answer on those past patterns. But how it decides which information to use, and how much weight it gives to each piece, can be opaque to outsiders.
MiCRo (Mixture of Cognitive Reasoners) is an LLM that is architecturally divided into four specialized areas that function like different parts of the human brain, giving users more control over how it approaches questions and a better understanding of how it arrives at its answers. The model, presented at the International Conference on Learning Representations, comes from the NLP Lab, part of the School of Computer and Communication Sciences (IC), and the NeuroAI Lab, part of EPFL’s IC and Life Sciences Schools.
Four experts
To create MiCRo, the researchers identified four areas of the brain that specialize in different functions, which they call “experts”: language, logic, social reasoning, and world knowledge.
“The brain is organized into specialized regions, each tailored to handle a specific function. So far, this division of labor is not very clear in current language models,” says Badr AlKhamissi, a doctoral candidate leading the study. “We selected four brain regions that are well known to neuroscientists and gave the model its own specialized modules, each of which was trained to resemble one of those brain regions.”
LLMs typically work as a stack of layers through which a problem or question is processed. In MiCRo, each layer is divided into the four experts. Suppose, for example, that the model receives a sentence such as “The cat is sleeping” at layer 1. Within this layer, a router makes the model modular and adaptive: it can choose one expert for the first word, “the,” but a different expert for the second word, “cat.”
“Each word of the sentence can be routed to a different expert,” AlKhamissi explains. “So a single sentence is actually processed by multiple experts at each layer.”
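The per-word routing described above can be illustrated with a small toy. This is only a sketch under stated assumptions, not MiCRo's actual implementation: the router scores are hand-written stand-ins for a learned router, the names `route_tokens` and `toy_score_fn` are hypothetical, and picking a single highest-scoring expert per word (top-1 routing) is assumed from the description in the article.

```python
import math

# The four experts named in the article.
EXPERTS = ["language", "logic", "social", "world"]

def softmax(scores):
    """Turn raw router scores into probabilities."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def route_tokens(tokens, score_fn):
    """Pick one expert per token (top-1 routing) within a single layer."""
    assignments = []
    for tok in tokens:
        probs = softmax(score_fn(tok))
        best = max(range(len(EXPERTS)), key=lambda i: probs[i])
        assignments.append((tok, EXPERTS[best], probs[best]))
    return assignments

# Made-up scores for illustration only; a real router learns these.
TOY_SCORES = {
    "the": [2.0, 0.1, 0.1, 0.3],
    "cat": [0.5, 0.1, 0.2, 2.0],
}

def toy_score_fn(token):
    # Unknown tokens fall back to the language expert in this toy.
    return TOY_SCORES.get(token, [1.0, 0.0, 0.0, 0.0])

routes = route_tokens(["the", "cat", "is", "sleeping"], toy_score_fn)
for tok, expert, prob in routes:
    print(f"{tok!r} -> {expert} ({prob:.2f})")
```

In a full model this choice would be repeated at every layer, so the same word may pass through different experts as it moves up the stack.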
Consider a prompt like this: “Emma wants to split the cost of a 60 Swiss franc dinner among three friends, but she knows Jake lost his job last week and is too proud to say he is struggling.” A purely mathematical module handles the arithmetic: 60 Swiss francs divided by 3 equals 20 Swiss francs each. But the social reasoning module recognizes something more subtle: Emma’s awareness of Jake’s situation, his unspoken pride, and the implicit suggestion that she might quietly cover his share. Both types of reasoning are required to fully understand what is going on. MiCRo routes each aspect of a prompt to the expert best suited to handle it.
“If you look at how the model works, you’ll see that words related to social aspects are routed to the social experts, and when you do the mathematical part, those numbers are routed to the logic experts.”
This separation makes it easier to understand how the model is “thinking” and why it makes certain decisions. It also gives users control over the model’s behavior. For example, they can increase the influence of the social expert or suppress the logic expert, depending on the kind of model they want in a particular situation.
“In a traditional LLM, you can do this through prompts by telling the model whether to make the output more social or more emotionally relevant,” AlKhamissi says. “But here, this is done by intervening in the architecture itself without any prompting.”
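One plausible way to picture such an architectural intervention is to adjust the router's scores before an expert is chosen. The sketch below is an assumption for illustration only: the article does not say how MiCRo implements steering, and the `steer` and `pick_expert` helpers, the additive bias, and the example scores are all hypothetical.

```python
import math

EXPERTS = ["language", "logic", "social", "world"]

def steer(scores, boost=None, suppress=None):
    """Return adjusted router scores.

    boost:    dict mapping expert name -> additive bias on its score
    suppress: set of expert names that must never be selected
    """
    boost = boost or {}
    suppress = suppress or set()
    adjusted = []
    for name, score in zip(EXPERTS, scores):
        if name in suppress:
            adjusted.append(-math.inf)  # this expert can no longer win
        else:
            adjusted.append(score + boost.get(name, 0.0))
    return adjusted

def pick_expert(scores):
    """Top-1 choice over the (possibly adjusted) scores."""
    return EXPERTS[max(range(len(scores)), key=lambda i: scores[i])]

# Hypothetical router scores for one word; logic narrowly leads.
raw = [0.2, 1.5, 1.1, 0.4]

default_choice = pick_expert(raw)
boosted_choice = pick_expert(steer(raw, boost={"social": 1.0}))
suppressed_choice = pick_expert(steer(raw, suppress={"logic"}))
print(default_choice, boosted_choice, suppressed_choice)
```

The point of the toy is that the prompt never changes: only the routing decision inside the model does.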
“virtuous cycle”
To create MiCRo, the EPFL team worked with Greta Tuckute, a neuroscientist at Harvard University and the Massachusetts Institute of Technology, to understand which parts of the human brain are activated by different problems, and applied that knowledge to the model.
To identify the brain regions that correspond to the “logic” expert, neuroscientists gave people demanding tasks, such as difficult math problems, and less demanding tasks, such as easy math problems, and recorded brain activity to see which areas responded more strongly to the demanding tasks. AlKhamissi’s team then did the same with the model, feeding it demanding math problems to see which experts were most activated.
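The hard-versus-easy comparison described above amounts to a simple contrast: average each expert's activation on demanding inputs, subtract its average on easy inputs, and see which expert shows the largest difference. The sketch below assumes this reading; the `localize` helper and all activation values are invented for illustration and are not data from the study.

```python
from statistics import mean

def localize(hard_runs, easy_runs):
    """Contrast per-expert activation between hard and easy inputs.

    Each run is a dict mapping expert name -> activation level.
    Returns the expert with the largest hard-minus-easy difference,
    together with the full contrast table.
    """
    experts = list(hard_runs[0].keys())
    contrast = {
        e: mean(r[e] for r in hard_runs) - mean(r[e] for r in easy_runs)
        for e in experts
    }
    best = max(contrast, key=contrast.get)
    return best, contrast

# Made-up activation levels, for illustration only.
hard = [
    {"language": 0.30, "logic": 0.60, "social": 0.05, "world": 0.05},
    {"language": 0.25, "logic": 0.65, "social": 0.05, "world": 0.05},
]
easy = [
    {"language": 0.35, "logic": 0.30, "social": 0.15, "world": 0.20},
    {"language": 0.40, "logic": 0.25, "social": 0.15, "world": 0.20},
]

best, contrast = localize(hard, easy)
print(best)
for name, delta in contrast.items():
    print(f"{name}: {delta:+.3f}")
```

With these invented numbers, the logic expert is the only one that responds more strongly to the hard inputs, which is the pattern a localizer of this kind looks for.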
“What’s great is that we used exactly what we do in neuroscience in our model, and the model was able to uniquely identify those experts.”
Neuroscience informs the models, but the models can also inform our understanding of the brain, potentially allowing neuroscientists to discover how much different areas contribute to a particular problem or question. For example, a sentence might activate the language area by 20%, the mathematics area by 50%, and the social reasoning area by 40%.
“In my doctoral research, I was interested in this virtuous cycle between neuroscience and AI, where in one direction we take neuroscience discoveries and insights about the brain and integrate them into language models,” AlKhamissi says. “And now, with models like MiCRo, we can explore in another direction and consider how AI models can be used to better understand the brain.”
