How will we know if a bot actually achieves artificial general intelligence (AGI)? The folks at Google DeepMind have come up with an empirical, scientifically grounded framework for measuring progress toward AGI, and they’re looking for a few talented developers to put it into practice.
In the past, “artificial intelligence” referred to machines that could perform a variety of thinking-like tasks in ways essentially indistinguishable from humans. But as machine learning applications, starting with OpenAI’s ChatGPT, captured the public’s imagination and the tech industry’s growth hype spun into hyperdrive, the term “AI” was redefined to mean computer programs that use large-scale matrix multiplication to perform complex tasks with relatively little oversight. With the goalposts moved, what AI previously meant is now covered by the loosely defined term “AGI.”
DeepMind wants to be precise about that definition. A team at Google’s AI R&D shop reported this week that it has developed a “cognitive taxonomy” to measure the technology industry’s progress toward universally useful AGI, along with a three-step test to benchmark the performance of AI systems against human capabilities.
Apologies to those expecting groundbreaking psychological insights here, but what the researchers are proposing is simple. As the team explains in its paper, running AI models and humans on the same cognitive benchmarks should, the DeepMinders argue, give a good estimate of whether a single AI can match or exceed human performance in all 10 areas of the taxonomy. The taxonomy is divided into two main groups.
First come the eight fundamental components of human cognition (perception, generation, attention, learning, memory, reasoning, metacognition, and executive function), as previously defined by other researchers.
According to the DeepMind researchers, these eight components combine in different ways to form two equally important composite abilities: problem solving and social cognition, the latter defined in the paper as the ability to process and interpret social information and respond appropriately in social situations.
DeepMind’s concept for mapping AI capabilities to the 10 areas of its taxonomy and comparing them against human performance
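The comparison the paper describes, scoring an AI and humans on the same benchmark and checking every area, can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names, the 0-to-1 score scale, and all the numbers are hypothetical and do not come from DeepMind’s paper, and only three abilities are shown.

```python
# Hypothetical scores on a shared benchmark, one per ability (0.0 to 1.0).
# All numbers are invented for illustration only.
human_baseline = {"perception": 0.90, "learning": 0.85, "memory": 0.80}
ai_scores = {"perception": 0.95, "learning": 0.60, "memory": 0.88}


def ability_gaps(ai, human):
    """Return the AI-minus-human score difference for each ability."""
    missing = set(human) - set(ai)
    if missing:
        raise ValueError(f"AI was not tested on: {sorted(missing)}")
    return {name: ai[name] - human[name] for name in human}


def matches_human_everywhere(ai, human):
    """True only if the AI matches or exceeds the baseline in every area."""
    return all(gap >= 0 for gap in ability_gaps(ai, human).values())


gaps = ability_gaps(ai_scores, human_baseline)
shortfalls = sorted(name for name, gap in gaps.items() if gap < 0)
print(shortfalls)
print(matches_human_everywhere(ai_scores, human_baseline))
```

The point of the all-areas check is that a strong score in one ability cannot offset a shortfall in another, which is how the article describes the bar for matching human performance.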
Taxonomies are all well and good, of course, but without a way to test how an AI model performs against humans, they’re not very useful. So the Google team is proposing a hackathon to get the community to help.
“We are launching a new Kaggle hackathon to design assessments for the five cognitive abilities with the largest assessment gaps: learning, metacognition, attention, executive function, and social cognition,” the team explained.
The contest has a $200,000 prize pool, and several entries have already been published. Two teams in each of the five areas will each receive $10,000, and the four overall winners will each receive $25,000.
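As a quick sanity check, the prize figures above do add up to the stated pool. The snippet below uses only the numbers reported in this article; the variable names are ours.

```python
# Figures as reported in the article.
PRIZE_POOL = 200_000
AREAS = 5
TEAMS_PER_AREA = 2
AREA_PRIZE = 10_000
OVERALL_WINNERS = 4
OVERALL_PRIZE = 25_000

# Ten area prizes plus four overall prizes.
area_total = AREAS * TEAMS_PER_AREA * AREA_PRIZE
overall_total = OVERALL_WINNERS * OVERALL_PRIZE
total = area_total + overall_total

print(area_total, overall_total, total)
```

Half the pool goes to the ten area prizes and half to the four overall winners.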
At this point, it is widely believed that AGI is a long way off, with some experts declaring it a complete fantasy and a waste of time. There isn’t even a clearly agreed-upon definition of it, other than that it is an AI that can perform well in a variety of fields. The DeepMind team wasn’t too clear about what it thought AGI meant, beyond saying the term was “often used as an abbreviation to describe various types of advanced AI systems.” Definitions aside, the argument goes, it is time to actually start measuring progress.
The researchers hope that by doing something to help measure progress toward AGI, they can “move the conversation about AGI from subjective claims and speculation to grounded, measurable scientific endeavors.”
Hackathon winners will be announced in June. ®
