The timing with which the brain understands speech is consistent with the layer-by-layer processing of modern large language models (LLMs), according to new research.
This evidence comes from direct brain recordings collected while people listened to a single 30-minute story.
Brain recordings were analyzed alongside model representations from systems such as GPT-2 and Llama 2.
Deeper model layers aligned with later peaks of activity in language regions, suggesting progressively more integrated processing at those stages.
Collaborators from Jerusalem, Princeton, and the Industrial Research Institute focused on specific brain regions, such as Broca's area and the superior temporal gyrus.
How meaning is constructed in the brain
The researchers used electrocorticography (ECoG): electrical recordings from thin electrode grids placed over the cortex during clinical monitoring. This technique captures fast activity associated with local neural firing.
There is long-standing evidence that the high-frequency power of these recordings tracks the activity of nearby neurons.
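The article does not include the analysis code, but extracting high-frequency power from an ECoG channel is a standard step. A minimal sketch, assuming a common high-gamma band (roughly 70-150 Hz) and the Hilbert envelope as the power estimate; the function name and band edges are illustrative, not taken from the study:

```python
import numpy as np
from scipy.signal import butter, filtfilt, hilbert

def high_gamma_power(ecog, fs, low=70.0, high=150.0):
    """Band-pass an ECoG trace in the high-gamma range and return the
    analytic amplitude envelope, a common proxy for local neural firing."""
    b, a = butter(4, [low / (fs / 2), high / (fs / 2)], btype="band")
    filtered = filtfilt(b, a, ecog, axis=-1)  # zero-phase filtering
    return np.abs(hilbert(filtered, axis=-1))  # instantaneous amplitude
```

On a synthetic trace containing a 100 Hz burst, the envelope rises during the burst and stays near zero elsewhere, which is the behavior the analysis relies on.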
In Broca's area, the peak match between the brain signal and the model shifted to later times as layer depth increased.
A correlation of 0.85 between layer depth and peak delay was reported. This pattern points to a temporal accumulation of information rather than a single-step jump.
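The depth-delay relationship can be summarized in a few lines. A minimal sketch, assuming brain-model correlations have already been computed for each layer at a grid of lags (the function name and toy values are illustrative, not from the study):

```python
import numpy as np

def depth_lag_correlation(lag_corrs, lags):
    """lag_corrs: (n_layers, n_lags) array of brain-model correlations.
    Returns the Pearson r between layer depth and the lag of each
    layer's peak correlation."""
    peak_lags = lags[np.argmax(lag_corrs, axis=1)]  # best lag per layer
    depths = np.arange(lag_corrs.shape[0])
    return np.corrcoef(depths, peak_lags)[0, 1]
```

If deeper layers peak at progressively later lags, this statistic approaches 1, which is the kind of pattern behind the reported 0.85.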
Link to deep language models
The study was led by Dr. Ariel Goldstein of the Hebrew University of Jerusalem. His research focuses on how the brain encodes natural language and its links with deep language models.
“What struck us most was that the brain's temporal unfolding of meaning corresponds so closely to a series of transformations within a large language model,” Dr. Goldstein said.
“These systems are built in very different ways, but both seem to converge on a similar gradual build toward understanding.”
Layers reveal signals in the brain
The researchers observed the clearest temporal progression in higher language areas rather than in early auditory cortex.
This is not surprising since these later regions integrate context accumulated over hundreds of milliseconds.
Words that the model predicted well had stronger and earlier matches than words that it didn't predict, suggesting a shared expectation about what would happen next.
Overall network processing time
At the temporal pole, the separation between the earliest and latest layer-aligned peaks exceeded 500 milliseconds, suggesting a longer integration window near the apex of the language pathway.
These results echo previous work on temporal receptive windows: the period during which past inputs shape responses in different parts of the cortex.
This study reveals a progressively longer processing window from the sensory cortex to the narrative hub, and this hierarchy is replicated in new data.
Across the ventral language stream, the anterior superior temporal gyrus and temporal pole showed steeper timing gradients than the middle superior temporal gyrus.
This pattern fits a hierarchy in which representations spread over longer spans as processing climbs the path.
Context and word meaning
Classical symbolic features did not predict time-locked brain activity well. These include phonemes, the perceptually distinct units of sound that distinguish words, and morphemes, the smallest units of meaning.
Contextual embeddings produced stronger matches. These are vector representations that encode the meaning of a word along with its surrounding context.
This does not make rules irrelevant, but it does suggest that distributed contextual representations may carry more of the load during natural listening.
Research limitations
Independent evidence from similar podcast-style stimuli has already linked brain signals to next-word prediction, surprise, and contextual embeddings.
This study helps explain why layered sequences appear in the new data, without suggesting that the cortical and transformer structures are identical.
Similarity is not identity. Transformers are designed to process long stretches in parallel during training, whereas cortical circuits operate with biological constraints and serial timing.
Caution is warranted before declaring equivalence, and the study's limits matter. The recordings came from nine epilepsy patients who had electrodes implanted for clinical reasons, and electrode coverage varies by individual.
Future tests that manipulate predictability and control for acoustic details will help separate true expectations from simple carryover of prior context.
Ideas that go beyond the layers of the brain
Alongside the paper, the authors published a public dataset from nine participants, containing direct recordings aligned to every word of a 30-minute story.
This dataset anchors claims in shareable evidence and facilitates direct comparisons between symbolic and learning-based theories.
Clear, accessible benchmarks drive progress. By combining natural speech, subsecond neural dynamics, and open modeling code, this resource turns theories into testable claims, allowing better ideas to be proven rather than merely proposed.
The study is published in the journal Nature Communications.
