OpenAI reportedly nears breakthrough with “reasoning” AI, publishes framework to track progress

An illustration of a robot with multiple arms.

OpenAI recently unveiled a five-tiered system to measure its progress toward developing artificial general intelligence (AGI), an OpenAI spokesperson said in a statement to Bloomberg. The company shared the new classification system with employees at a company-wide meeting on Tuesday, aiming to provide a clear framework for understanding AI advancement. However, the system describes hypothetical technologies that don't yet exist and could best be interpreted as a marketing move to attract investment.

OpenAI has previously said that AGI — a vague term for the hypothetical concept of AI systems that can perform novel tasks like a human without specialized training — is its primary goal. The pursuit of technology that could replace humans in most intellectual work has driven much of the hype around the company over the years, even though such technology would likely be hugely disruptive to society.

OpenAI CEO Sam Altman has previously stated that he believes AGI could be achieved within the next decade, and much of his public messaging has been about how the company (and society at large) will deal with the disruptions that AGI will bring. In that sense, a ranking system that communicates the AI milestones the company has achieved on the road to AGI makes sense.

OpenAI's five levels, which it plans to share with investors, range from current AI capabilities to systems that could potentially run entire organizations. The company believes its technology (such as GPT-4o, which powers ChatGPT) currently sits at Level 1, which covers AI capable of conversational interaction. But OpenAI executives have reportedly told staff that the company is close to reaching Level 2, called “Reasoners.”

Bloomberg lists OpenAI's “Five Stages of Artificial Intelligence” as follows:

  • Level 1: Chatbots, conversational AI
  • Level 2: Reasoners, human-level problem solving
  • Level 3: Agents, systems that can take action
  • Level 4: Innovators, AI that can help invent
  • Level 5: Organizations, AI that can do the work of an entire organization

A Level 2 AI system is reportedly capable of basic problem-solving on par with a human who holds a PhD but has no access to external tools. During the all-hands meeting, OpenAI executives demonstrated a research project using the GPT-4 model that researchers believe shows signs of approaching human-like reasoning, according to a person familiar with the discussions who spoke to Bloomberg.

The higher levels of OpenAI's taxonomy describe hypothetical AI capabilities of increasing power: Level 3 “Agents” can perform tasks autonomously for days at a time. Level 4 systems generate new innovations. At the pinnacle, Level 5, AI is envisioned as managing entire organizations.

The classification system is still under development, and OpenAI plans to gather feedback from employees, investors and executives and refine the levels over time.

Ars Technica asked OpenAI about the ranking system and the accuracy of Bloomberg's reporting, but a company spokesperson said they had “nothing to add.”

The problem with ranking AI capabilities

OpenAI is not the only company trying to quantify levels of AI competence. As Bloomberg points out, OpenAI's system resembles the levels of driving automation laid out by automakers for self-driving cars. And in November 2023, Google DeepMind researchers proposed their own five-level framework for evaluating AI progress, showing that other AI research labs are also exploring ways to rank things that don't yet exist.

OpenAI's classification system is also somewhat similar to Anthropic's “AI Safety Levels” (ASL), first published by the maker of the Claude AI assistant in September 2023. Both systems aim to classify AI capabilities, but they focus on different aspects: Anthropic's ASLs deal explicitly with safety and catastrophic risks (for example, ASL-2 refers to “systems that show early signs of dangerous capabilities”), while OpenAI's levels track general capability.

But AI classification systems raise questions about whether AI progress can be meaningfully quantified, and what constitutes progress (or, as in the case of Anthropic, what constitutes a “dangerous” AI system). The tech industry has a history of overestimating AI capabilities, and linear progression models like OpenAI's risk fostering unrealistic expectations.

There is currently no consensus in the AI research community on how to measure progress toward AGI, or whether AGI is even a well-defined or achievable goal. As such, OpenAI's five-tier system is probably best viewed as a communications tool to attract investors — one that showcases the company's ambitious goals rather than offering a scientific or technical measure of progress.


