Cultural accumulation has driven vast and varied advances in human capabilities throughout history. By combining individual exploration with intergenerational transmission of information, cultural accumulation builds an ever-expanding body of knowledge and skills. Given its enormous success in nature, exploring possible applications of cultural accumulation in artificial learning systems represents a promising, yet underexplored, direction.
In a new paper, Generations of Artificial Intelligence: Cultural Accumulation in Reinforcement Learning, a research team from the University of Oxford and Google DeepMind presents a way to achieve cultural accumulation in reinforcement learning (RL) agents, opening up new avenues for modeling human culture through artificial systems.

The research team emphasizes that the potential of RL agents to accumulate culture is largely untapped. Traditional RL approaches typically focus on improvement within a single lifetime. Existing generational algorithms are unable to capture the open-ended, emergent nature of cultural accumulation that allows for a balance between innovation and imitation.
Building on the established social learning capabilities of RL agents, the researchers found that a training setup balancing social and independent learning promotes cultural accumulation. These accumulating agents outperform agents trained only within a single lifetime, given comparable cumulative experience.
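The generational setup described above can be sketched in miniature. The following toy is illustrative only: it uses a simple bandit task and hypothetical names (the paper's actual environments and architectures differ). Each generation trains for a short lifetime, sometimes copying the previous generation's learned behavior (social learning) and sometimes exploring on its own (independent learning), then passes its best policy on as a demonstration.

```python
import random

def train_generation(n_arms, true_best, steps, demo_arm=None, p_social=0.5):
    """Train one agent for a short lifetime on a toy bandit.

    demo_arm is the previous generation's learned choice; with probability
    p_social the agent imitates it, otherwise it learns independently.
    All names and the task itself are illustrative, not from the paper.
    """
    counts = [0] * n_arms
    values = [0.0] * n_arms
    for _ in range(steps):
        if demo_arm is not None and random.random() < p_social:
            arm = demo_arm                      # social learning: copy the demonstrator
        elif random.random() < 0.1:
            arm = random.randrange(n_arms)      # independent exploration
        else:
            arm = max(range(n_arms), key=lambda a: values[a])  # exploit own estimates
        reward = 1.0 if arm == true_best else 0.0
        counts[arm] += 1
        values[arm] += (reward - values[arm]) / counts[arm]    # incremental mean
    # the learned behavior handed down to the next generation
    return max(range(n_arms), key=lambda a: values[a])

random.seed(0)
n_arms, true_best = 20, 7
demo = None
for gen in range(5):
    demo = train_generation(n_arms, true_best, steps=50, demo_arm=demo)
print("final generation's learned arm:", demo)
```

Note that accumulation here emerges only from each agent maximizing its own reward; imitation is just another action-selection route, with no auxiliary imitation loss.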

The researchers present two formulations of cultural accumulation in RL: in-context accumulation, which supports rapid adaptation to new environments, and in-weight accumulation, which proceeds through slower updates to network weights. The in-context setting is analogous to short-term knowledge accumulation, while the in-weight setting corresponds to long-term, skill-based accumulation.
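The contrast between the two formulations can be caricatured as a question of where inherited knowledge lives. The sketch below is a deliberately minimal illustration with hypothetical class and method names, not the paper's architecture: one agent hands down updated weights, the other keeps weights frozen and hands down an episodic context.

```python
class InWeightAgent:
    """Knowledge accumulates slowly in the parameters themselves."""
    def __init__(self, weights=None):
        self.weights = dict(weights) if weights else {}

    def learn(self, skill, value):
        # gradient-like update: the lesson is baked into the weights
        self.weights[skill] = self.weights.get(skill, 0.0) + value

    def inherit(self):
        # the next generation starts from the accumulated weights
        return InWeightAgent(self.weights)


class InContextAgent:
    """Weights stay frozen; knowledge accumulates in the context."""
    def __init__(self, context=()):
        self.context = list(context)

    def observe(self, fact):
        # fast, within-lifetime adaptation: just remember what was seen
        self.context.append(fact)

    def inherit(self):
        # the next generation receives the context, not new weights
        return InContextAgent(self.context)


# In-weight: a skill learned by the parent persists in the child's parameters.
parent = InWeightAgent()
parent.learn("open_door", 1.0)
child = parent.inherit()

# In-context: an observation made by the parent persists in the child's context.
demo = InContextAgent()
demo.observe("goal_is_north")
next_gen = demo.inherit()
```

The design point is the timescale: weight updates are slow and durable (long-term skills), while context updates are fast and cheap (short-term knowledge), mirroring the two regimes the paper describes.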


The effectiveness of both formulations is demonstrated by sustained intergenerational performance improvements across a range of tasks requiring exploration under partial observability. On each task, accumulating agents outperform agents learning within a single lifetime, even under the same total experience budget. Notably, this cultural accumulation arises solely from individual agents maximizing their own rewards, with no additional losses.
To the researchers' knowledge, this is the first study to present a general model for realizing emergent cultural accumulation in reinforcement learning. This breakthrough opens new avenues for creating more open-ended learning systems and provides new opportunities for modeling human culture.
The paper Generations of Artificial Intelligence: Cultural Accumulation in Reinforcement Learning is available on arXiv.
Author: Hecate | Editor: Chan Chan
