New “AI scientists” are making progress, but their fundamental limitations are becoming clear

Many of the most exciting discoveries in science involve highly specialized knowledge and connections between distant facts. Scientists need to combine deep analysis with wide-ranging reasoning strategies.

As with many information-rich tasks, researchers are turning to artificial intelligence (AI) systems to speed up their work. AI tools can support critical steps such as idea generation, reviewing existing work, and analyzing data.

Modern systems use large language models (LLMs) to let scientists interact naturally and directly with the vast body of knowledge captured in the language of the scientific literature.

But language alone has its limits when it comes to science, as two new systems described in a paper just published in Nature show.

What AI is bringing to science

Many organizations, such as Sakana AI, are trying to automate the entire scientific process. To date, these efforts have mostly focused on computer science, where “experimentation” largely means designing and writing code.

But the Agents4Science conference held at Stanford last October featured a wide range of AI-generated papers. They covered topics ranging from mechanical engineering and protein design to a system called BadScientist that intentionally produces “convincing but unsound” research.

I have previously raised concerns about the impact that AI scientists will have on the scientific ecosystem. Recent studies support these concerns: they show that the quantity of both articles and peer reviews is increasing while quality is decreasing, identify fabricated references in published works, and uncover fabricated and misleading images.

What scientists are doing with AI

Clearly, we cannot trust an AI system to perform the entire scientific process on its own. But what about leveraging AI to help scientists achieve more, faster?

This is the purpose of two new systems described in Nature: Robin, created by the nonprofit FutureHouse, and Google DeepMind’s Co-Scientist.

Both systems aim to collaborate with scientists to accelerate scientific discovery. Both are also “multi-agent” AI systems. That is, each is built as a collection of specialized agents, each targeting a specific step in the scientific discovery process and coordinated by a “supervisor” agent.

The agents that make up Co-Scientist are intended to mirror abstract cognitive tasks. A “reflection agent,” for example, acts as a critical scientific peer reviewer, evaluating the quality of a hypothesis. A “ranking agent” runs a “tournament” in which multiple interacting LLMs simulate debates about the relative merits of two hypotheses.

Robin’s agents, on the other hand, are more focused on specific tasks related to drug repurposing, with the aim of identifying new treatments for specific diseases. One agent focuses on selecting experimental tests, and another analyzes complex biomedical data.
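The supervisor-plus-specialists pattern described above can be sketched in a few lines of Python. This is only an illustration of the general idea, not how either Robin or Co-Scientist is actually built: the names `Hypothesis`, `propose`, `review` and `supervisor` are hypothetical, and the agents here are simple placeholder functions rather than LLM calls.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class Hypothesis:
    """A candidate idea plus any comments the agents attach to it."""
    text: str
    notes: list[str] = field(default_factory=list)


# An agent takes the current pool of hypotheses and returns an updated pool.
Agent = Callable[[list[Hypothesis]], list[Hypothesis]]


def propose(pool: list[Hypothesis]) -> list[Hypothesis]:
    # Placeholder for an idea-generating agent (a real one would call an LLM).
    return pool + [Hypothesis(f"candidate hypothesis {len(pool) + 1}")]


def review(pool: list[Hypothesis]) -> list[Hypothesis]:
    # Placeholder for a critical reviewer that annotates every hypothesis.
    for hypothesis in pool:
        hypothesis.notes.append("reviewed")
    return pool


def supervisor(agents: list[Agent], rounds: int = 1) -> list[Hypothesis]:
    """Run each specialized agent in turn, for a fixed number of rounds."""
    pool: list[Hypothesis] = []
    for _ in range(rounds):
        for agent in agents:
            pool = agent(pool)
    return pool


if __name__ == "__main__":
    for hypothesis in supervisor([propose, review], rounds=2):
        print(hypothesis.text, hypothesis.notes)
```

The point of the design is that each step can be swapped out or improved independently, while the supervisor decides the order in which the steps run.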

How do the results stack up?

Co-Scientist evaluates the quality of the proposals it generates using Elo ratings, a method best known for ranking chess players. Co-Scientist’s self-assessments of the novelty and impact of its output agree closely with human expert preferences and with judgments made by other LLM systems.
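For readers unfamiliar with Elo ratings, here is a minimal sketch of how pairwise comparisons can be turned into a ranking. It uses the standard chess-style Elo update; the starting rating of 1200, the K-factor of 32, the round-robin schedule and the `judge` function are all assumptions made for illustration, and in a real system the judge would be a simulated debate between LLMs rather than a lookup table.

```python
import itertools
from typing import Callable


def expected_score(rating_a: float, rating_b: float) -> float:
    """Probability that A beats B under the standard Elo model."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400.0))


def update(rating_a: float, rating_b: float, a_won: bool, k: float = 32.0):
    """Return the new (rating_a, rating_b) after one head-to-head comparison."""
    delta = k * ((1.0 if a_won else 0.0) - expected_score(rating_a, rating_b))
    return rating_a + delta, rating_b - delta


def run_tournament(
    hypotheses: list[str],
    judge: Callable[[str, str], bool],  # hypothetical: True if the first wins
    start: float = 1200.0,              # assumed starting rating
    k: float = 32.0,                    # assumed K-factor
) -> list[tuple[str, float]]:
    """Compare every pair once and return hypotheses sorted by final rating."""
    ratings = {h: start for h in hypotheses}
    for a, b in itertools.combinations(hypotheses, 2):
        ratings[a], ratings[b] = update(ratings[a], ratings[b], judge(a, b), k)
    return sorted(ratings.items(), key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    # Toy stand-in for a debate: a fixed "quality" score decides each match.
    quality = {"H1": 0.2, "H2": 0.9, "H3": 0.5}
    ranking = run_tournament(list(quality), lambda a, b: quality[a] > quality[b])
    for name, rating in ranking:
        print(name, round(rating, 1))
```

A winner gains more points for beating a highly rated opponent than a weak one, so the final order reflects who was beaten as well as how many comparisons were won.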

In a drug repurposing experiment, Co-Scientist selected 30 drug candidates as promising treatments for a type of cancer called acute myeloid leukemia. Expert (human) oncologists refined the list, and five drugs were tested in the lab. Of these, three showed some positive results, and one appeared particularly promising.

Other experiments showed the potential for Co-Scientist to explore multiple drug combinations.

Notably, Co-Scientist’s predictions have not been compared with the numerous targeted computational and machine learning methods for drug repurposing developed through decades of computational biology research. This means we don’t know whether the new general-purpose tool performs better than a more specialized AI approach.

Neither system can yet test its hypotheses directly: actual physical experiments are still required. Both also rely heavily on human input to define important scientific questions, sense-check predictions, and prioritize predictions for further investigation.

Co-Scientist primarily focuses on generating hypotheses through sophisticated reasoning agents, leaving validation and interpretation to subsequent steps. Robin also uses agents to analyze data generated from real-world experiments.

Robin was used to propose 30 drug candidates for a condition called dry age-related macular degeneration. The top five were selected for testing.

Robin also made suggestions for experiments, but some suggestions were overridden by human scientists. Through several rounds of brainstorming and analysis, two drugs were identified as promising.

When Robin’s individual agents were tested, the agents that delved into prior research performed better at this task than a general-purpose LLM. The analytical agents were less successful at questions related to statistics and bioinformatics and relied heavily on prompts provided by humans.

Language alone has its limits

AI helps scientists navigate the vast amount of documented knowledge that humans have acquired over thousands of years. The use of computation to find patterns in large datasets, integrate dispersed information, and drive new discoveries from existing literature has been contributing to scientific progress for decades.

New models such as Robin and Co-Scientist represent a shift towards working directly in the realm of language rather than the realm of raw scientific data. This enables more natural collaboration between scientists and machines through language-based “discussions.”

However, more natural does not necessarily mean more effective. Language-based communication can be imprecise and ambiguous, and science must be precise.

Models that combine these strengths are on the way. They aim to connect structured quantitative data to the concepts and relationships that explain the underlying core facts.

Such models ground scientific reasoning in the structure of knowledge. This allows them to connect scientific evidence ranging from genome sequences and protein structures to cell imaging.

Words are a means of communicating science. An AI tool that can easily make sense of the information hidden in all those words is certainly valuable. But the complexity of the natural world means that AI (co-)scientists can only be truly effective if they can go beyond stringing words together and model the full complexity of the systems those words represent.
