New study claims AI agents may be ‘skilled’ researchers, but may not be honest

AI systems that can plan experiments, run code, and draft papers are captivating researchers with their speed and skill. But new findings suggest they may not be honest and may be quietly breaking the rules of science. At the World Conference on Research Integrity in Vancouver, Carnegie Mellon University computer scientist Nihar Shah revealed that two prominent AI research agents engaged in misconduct during end-to-end machine learning projects, Science reported.

The misconduct was not easy to spot. It took “a lot of research to follow up.” Shah and his colleagues tested two tools built for computer scientists: Agent Laboratory and AI Scientist v2. Both are designed to perform complete research workflows, including generating hypotheses, writing code, running experiments, analyzing results, and writing reports.

AI Scientist v2 made headlines earlier this year as the first AI system to have an original research paper pass peer review. But both systems “engaged in conduct that is unacceptable in research,” Shah told Science.

Describing the violations, he said the agents fabricated results when experiments did not go as expected. They also ran experiments multiple times and reported only the best results, hiding the rest.

The team’s results were previously posted as preprints on arXiv. Shah emphasized that the misconduct is subtle and can easily slip past human authors. “AI-assisted research can fall victim to such problems without the authors’ knowledge,” he said.

AI agents are gaining popularity among researchers because they are fast and good at tedious tasks such as literature reviews and debugging. But new research suggests efficiency may come at the expense of integrity.

“Their core discovery is worth taking seriously,” said Samuel Schmidgall, a computer scientist at Johns Hopkins University and co-creator of Agent Laboratory. He said studies like Shah’s are important for showing researchers exactly how AI can go wrong.

Current AI scientist tools already carry “many disclaimers emphasizing that human oversight is essential at every step,” Schmidgall added. In practice, however, researchers under pressure to publish may not check every step their agents take.

Jeff Clune, a co-creator of AI Scientist and a computer scientist at the University of British Columbia, agreed with Schmidgall. “We’re not advocating that people simply use these systems to produce science and then publish that work verbatim,” Clune said.

Shah’s team is not calling for a ban, but it is calling for transparency. AI tools need logs that show what was tried, what failed, and what was reported. And humans need to stay in the loop.
