Summary: New research has identified structural and evolutionary principles that govern how both children and artificial neural networks absorb language. The work bridges cognitive linguistics and deep learning to demonstrate the power of “iterative learning,” the process by which a language reshapes itself over multiple generations, becoming increasingly structured and easier to learn.
By building a deep linear neural network modeled after a child’s gradual learning stages, the researchers demonstrated that structural regularities emerge naturally from communication pressures and systematic errors in transmission.
Key facts
- Iterative evolution paradigm: Iterative learning posits that human language is not a static construct, but an evolving system that reshapes itself with each generation to maximize structural efficiency and reduce the cognitive burden of learning.
- Child brain simulation: Researchers built a deep linear neural network with learning properties similar to those of a child’s brain and exposed successive generations (versions) of this “computer brain” to data with properties that mimic human language.
- Error-driven structure: Children acquire language within a structured hierarchy, but they occasionally make non-arbitrary mistakes by overgeneralizing from the data (for example, assuming that all winged birds fly until they encounter a penguin). In transmission from generation to generation, these non-arbitrary mistakes filter the data, preserving highly structured and easily learnable linguistic patterns, while unstructured elements are systematically forgotten.
- Depth is essential: To study the neural basis of this evolutionary trajectory, the team used deep linear networks. They found that iterative learning succeeds only when the network has sufficient depth, meaning multiple processing layers. Shallow networks with fewer layers cannot fully capture the structured regularities that make languages learnable.
- Link to modern AI: The study suggests that the structural emergence seen in large-scale generative AI tools is rooted in the same cognitive principles found in child development. The architecture of a learning network and the complexity of its environment determine how effectively the network can absorb and transmit language.
- Intersection of disciplines: Lead author Dr. Devon Jarvis says that while deep linear networks and iterative learning existed as isolated concepts in separate literatures, combining them shows that languages can evolve to be learnable, especially by children, through step-by-step learning that favors reusing information over learning new things.
Source: University of the Witwatersrand
New research from South Africa’s University of the Witwatersrand has important implications for understanding both human language development and the behavior of large-scale artificial intelligence language models.
Culture matters here, and so does an understanding of “iterative learning.” Iterative learning assumes that language evolves over generations (in both humans and computers) and becomes more structured as it does so.
“We built a computer brain with characteristics similar to a child’s brain and compared its behavior to behaviors seen in children. We then fed it data with characteristics similar to those found in human language and observed how each generation (version) of computer brain learned.”
“We show that computer brains pick up on structure in data, in the same way that children prefer certain properties of language for learning. We also show that datasets (language) become more structured over generations because it’s easier to learn,” said lead author Dr Devon Jarvis, lecturer in the School of Computer Science and Applied Mathematics (CSAM) and fellow at the Wits Institute for Machine Intelligence and Neural Discovery (MIND).
Their findings were recently published in a paper titled “Compositionality and systematicity emerge from iterative learning in deep linear networks” in the prestigious journal Proceedings of the National Academy of Sciences (PNAS).
It all starts in childhood
Jarvis explains that children have an amazing ability to learn language rapidly during the early stages of development. They learn about the world in hierarchies, starting with basic concepts and gradually coming to understand more complex ones.
“First, they learn that plants and animals are different. Second, they learn that there are different types of animals. But at some point, they reach a deeper level of understanding of the world that they have not yet mastered,” Jarvis says.
Take penguins as an example. Children learn that birds have wings and can fly. They are then confused to find that penguins cannot fly. Here they overgeneralize and make a mistake, and the mistake helps them learn new information: penguins cannot fly, but they can swim. Slowly, they develop a structured understanding of the world with increasing precision.
“While this gradual acquisition of knowledge has its benefits, the study focused on the impact on generations of learners. Children learn some language from their parents and eventually pass it on to their children. Because language is complex, this transmission is error-prone.”
“Like the penguin example, these mistakes are not arbitrary but result from overgeneralization of knowledge. The end result is that the easy-to-learn parts of language are remembered and reused, while the less structured parts are forgotten. Individuals are inherently good at learning, but it is only under the pressure of communication that we discover the depth of their intelligence,” Jarvis explains.
Not all neural networks are equal
The researchers used deep linear neural networks, mathematical models that mimic the way the brain processes information, to study the neural basis of this process. They found that iterative learning only works well if the network has sufficient depth (multiple processing layers) and the language is sufficiently complex. Shallow networks with fewer layers were unable to capture the structured regularities that make languages learnable.
This suggests that the architecture of a learning system and the richness of its environment, whether biological or artificial, play an important role in how well linguistic structures are absorbed and transmitted. The point also applies to recent advances in generative AI models, whose emergent capabilities depend heavily on scale.
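For readers unfamiliar with the term, a deep linear network is simply a stack of weight matrices with no nonlinearity between them. The sketch below is a minimal NumPy illustration, not the authors' code: the layer sizes, learning rate, and helper names (`init_layers`, `forward`, `end_to_end_map`, `train`, `mse`) are all assumptions chosen for the example. It shows that a deep stack computes the same kind of input-output map as a single matrix, so what depth changes is how the network learns, not what it can represent.

```python
import numpy as np

def init_layers(sizes, scale=0.1, seed=0):
    """Small random weight matrices; sizes = [n_in, hidden..., n_out]."""
    rng = np.random.default_rng(seed)
    return [scale * rng.standard_normal((n_out, n_in))
            for n_in, n_out in zip(sizes[:-1], sizes[1:])]

def forward(layers, X):
    """Apply each layer in turn, with no nonlinearity. X is (n_in, n_samples)."""
    H = X
    for W in layers:
        H = W @ H
    return H

def end_to_end_map(layers):
    """Collapse the stack into the single matrix W_L ... W_2 W_1."""
    M = layers[0]
    for W in layers[1:]:
        M = W @ M
    return M

def mse(layers, X, Y):
    """Half the mean squared error over samples."""
    return float(np.sum((forward(layers, X) - Y) ** 2) / (2 * X.shape[1]))

def train(layers, X, Y, lr=0.05, steps=500):
    """Full-batch gradient descent on the loss computed by mse()."""
    n = X.shape[1]
    for _ in range(steps):
        acts = [X]                       # input to each layer, kept for backprop
        for W in layers:
            acts.append(W @ acts[-1])
        delta = (acts[-1] - Y) / n       # gradient of the loss w.r.t. the output
        for i in reversed(range(len(layers))):
            grad = delta @ acts[i].T
            delta = layers[i].T @ delta  # uses the weights before they are updated
            layers[i] = layers[i] - lr * grad
    return layers

rng = np.random.default_rng(1)
X = rng.standard_normal((4, 50))
Y = rng.standard_normal((3, 4)) @ X      # targets from a made-up linear map

shallow = init_layers([4, 3])            # one weight matrix
deep = init_layers([4, 8, 3])            # two weight matrices, same map family
same_map = np.allclose(forward(deep, X), end_to_end_map(deep) @ X)

loss_before = mse(deep, X, Y)
deep = train(deep, X, Y)
loss_after = mse(deep, X, Y)
```

Because the layers multiply together, the gradient for each matrix depends on the others, which is what gives deep linear networks their staged, structure-first learning behavior even though the overall map stays linear.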
Jarvis continued, “Parts of this research have existed in separate literatures for some time. Deep linear networks are a well-established model of child development, and iterative learning has been known to linguists for many years.”
“But combining these two perspectives seems to yield a useful point: that languages evolve to be learnable based on very specific properties: the way children learn in stages, preferring to reuse information rather than learning new things.”
“The fact that this was demonstrated in a very simple version of the technology that underpins the modern AI tools boom is also encouraging, and suggests that there are fundamental principles of cognition at the intersection of multiple disciplines.”
Answers to key questions:
A: Because those mistakes are not random. They are predictable signs that the brain is trying to find order. When children overgeneralize a rule, such as assuming that penguins fly because they have wings, they are using structured shortcuts. As parents transmit speech to their children over generations, the messy, unstructured, and difficult parts of the language are forgotten, and the easy, rule-based parts are retained and reused.
A: It depends on depth, meaning the number of processing layers. The Wits researchers found that shallow networks with very few layers fail to capture the hidden regularities of complex languages and fail to transmit structured information. Deep networks, which mirror children’s ability to learn the world in hierarchical structures, have the multiple layers needed to assimilate, organize, and pass on language structures.
A: It suggests that fundamental principles of human cognition are also at work in modern artificial intelligence. The current boom in generative AI tools relies heavily on computational scale and layered depth to achieve breakthrough capabilities. This study shows that even a very simple, deep linear version of this technology reproduces the way human language evolves to become learnable.
Editorial note:
- This article was edited by the editors of Neuroscience News.
- Journal articles were reviewed in full text.
- Additional context added by staff.
About this AI and language learning research news
Author: Shirona Patel
Source: University of the Witwatersrand
Contact: Shirona Patel – University of the Witwatersrand
Image: Image credited to Neuroscience News
Original research: Open access.
“Compositionality and systematicity emerge from iterative learning in deep linear networks” by Devon Jarvis, Richard Klein, Benjamin Rosman, and Andrew M. Saxe. PNAS
DOI:10.1073/pnas.2509739123
Abstract
Compositionality and systematicity emerge from iterative learning in deep linear networks
Humans have an amazing ability to systematically generalize, that is, to combine aspects of previous experience to make inferences about new situations. Languages provide one of the primary examples of this ability, and modern machine learning draws much of its inspiration from linguistics.
A recent example is iterative learning, a procedure in which each generation of networks learns from the output of previous learners. As a result, the “language” of the network, meaning its output labels for specific inputs, is refined towards a compositional structure.
Here, we theoretically study the emergence of compositional languages and the ability of simple neural networks to exploit this compositionality and generalize systematically.
Building on previous theoretical work on linear networks that mathematically defines systematic generalization, we apply the analysis of shallow and deep linear networks to the iterative learning procedure by (a) deriving the exact dynamics of learning over generations and (b) refining the definition of systematicity to understand the benefits and limitations of iterative learning.
We find that iterative learning improves systematic generalization relative to the standard training paradigm by revealing the compositional substructure of the output labels.
Our results confirm the long-held assumption that compositional structures require multiple generations of iterative learning to emerge, and show that iterative learning can outperform single-generation networks trained with optimal early stopping.
However, for the network to process the input systematically and ignore features that do not generalize, the network must be trained on a very large dataset. Therefore, we define “weak systematic generalization” to explain this emergent systematicity from the perspective of scale.
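The generational procedure described in the abstract can be illustrated with a toy loop. This is a rough sketch under stated assumptions, not the paper's experimental setup: the two-layer network, the one-hot inputs, the rank-1 “structured” signal plus small random noise, the fixed step budget standing in for early stopping, and the `tail_energy` measure are all invented for the example. Each generation starts from fresh small weights, trains briefly on the previous generation's outputs, and hands its own outputs on as the next generation's labels.

```python
import numpy as np

def train_two_layer(X, Y, hidden=16, steps=80, lr=0.05, scale=0.01, seed=0):
    """Fit y = W2 @ W1 @ x by gradient descent from small random weights.
    Stopping after a fixed number of steps acts as a crude early stopping."""
    rng = np.random.default_rng(seed)
    W1 = scale * rng.standard_normal((hidden, X.shape[0]))
    W2 = scale * rng.standard_normal((Y.shape[0], hidden))
    for _ in range(steps):
        H = W1 @ X
        err = W2 @ H - Y               # gradient of 0.5 * squared error w.r.t. output
        grad_W2 = err @ H.T
        grad_W1 = W2.T @ err @ X.T     # both gradients use the pre-update weights
        W1 -= lr * grad_W1
        W2 -= lr * grad_W2
    return W1, W2

def iterated_learning(X, Y0, generations=5):
    """Each generation learns from the labels produced by the previous one."""
    labels = Y0
    history = [Y0]
    for g in range(generations):
        W1, W2 = train_two_layer(X, labels, seed=g)  # a fresh learner per generation
        labels = W2 @ W1 @ X           # this learner's outputs become the next labels
        history.append(labels)
    return history

def tail_energy(Y):
    """Norm of everything a rank-1 approximation misses (a crude 'unstructured' score)."""
    s = np.linalg.svd(Y, compute_uv=False)
    return float(np.sqrt(np.sum(s[1:] ** 2)))

# Toy "language": 8 items (one-hot inputs) mapped to 6 output features.
rng = np.random.default_rng(42)
n_items, n_out = 8, 6
X = np.eye(n_items)
u = rng.standard_normal(n_out)
u /= np.linalg.norm(u)
v = rng.standard_normal(n_items)
v /= np.linalg.norm(v)
# Strong shared structure plus weak idiosyncratic noise (both made up).
Y0 = 4.0 * np.outer(u, v) + 0.05 * rng.standard_normal((n_out, n_items))

history = iterated_learning(X, Y0)
structure_before = np.linalg.svd(history[0], compute_uv=False)[0]
structure_after = np.linalg.svd(history[-1], compute_uv=False)[0]
```

Comparing `tail_energy(history[0])` with `tail_energy(history[-1])` shows the idea in miniature: networks that start from small weights pick up the dominant, shared structure first, so the weak unstructured part of the labels is largely lost in transmission while the dominant component (`structure_before` versus `structure_after`) is carried forward.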
