Credit: Pixabay/CC0 Public Domain
AI has the potential to help doctors find early markers of disease and policymakers avoid decisions that lead to war. But growing evidence reveals deep flaws in how machine learning is used in science, a problem that spans dozens of fields and is responsible for thousands of erroneous papers.
A multidisciplinary team of 19 researchers, led by Princeton University computer scientists Arvind Narayanan and Sayash Kapoor, has published guidelines for the responsible use of machine learning in science.
“When you move from traditional statistical methods to machine learning methods, there are far more ways to shoot yourself in the foot,” said Narayanan, a computer science professor and director of Princeton University's Center for Information Technology Policy.
“Without intervention to improve scientific and reporting standards when it comes to machine learning-based science, we risk rediscovering these crises one after the other, not just in one field, but in many different scientific fields.”
The authors say their work is an effort to head off a smoldering credibility crisis that threatens to engulf nearly every corner of the research enterprise. A paper detailing the guidelines was published May 1 in the journal Science Advances.
Because machine learning has been adopted across nearly every scientific field, and there are no universal standards safeguarding the integrity of these methods, Narayanan said the current crisis, which he calls the reproducibility crisis, could become far more serious than the replication crisis that emerged in social psychology more than a decade ago.
The good news, according to the authors, who have backgrounds in computer science, mathematics, social science, and health research, is that a set of simple best practices can help resolve this emerging crisis before it gets out of hand.
“This is a systemic problem with a systemic solution,” said Kapoor, the graduate student who worked with Narayanan to organize the consensus-based effort to create the new checklist.
The checklist focuses on ensuring the integrity of research that uses machine learning. Science depends on the ability to independently reproduce results and verify claims; otherwise, new work cannot reliably build on old work, and the entire enterprise collapses.
While other researchers have developed checklists that apply specifically to field-specific problems, such as medicine, the new guidelines start with basic methods and apply to any quantitative field.
One of its main emphases is transparency. The checklist asks researchers to provide detailed descriptions of each machine learning model, including the code, the data used to train and test the model, the hardware specifications used to produce the results, the experimental design, the project's goals, and the model's limitations.
According to the authors, the standard is flexible enough to accommodate a wide range of nuances, such as private datasets and complex hardware configurations.
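To make the idea concrete, the transparency items above can be pictured as a simple completeness check over a paper's reported metadata. This is an informal sketch only: the field names and descriptions below are illustrative assumptions, not the REFORMS paper's actual checklist wording.

```python
# Illustrative only: field names are hypothetical stand-ins for the kinds of
# reporting details the guidelines ask for, not the paper's exact items.
REPORTING_CHECKLIST = {
    "code": "exact training and evaluation code, or an archive of it",
    "data": "train/test datasets, or access conditions if private",
    "hardware": "specifications used to produce the results",
    "experimental_design": "splits, metrics, and evaluation protocol",
    "goals": "the scientific claim the model is meant to support",
    "limitations": "known failure modes and scope of validity",
}

def missing_items(report: dict) -> list:
    """Return checklist fields a paper's reported metadata leaves blank."""
    return [k for k in REPORTING_CHECKLIST if not report.get(k)]

# A hypothetical draft that documents code and data but nothing else:
draft = {"code": "https://example.org/repo", "data": "public survey data"}
print(missing_items(draft))
```

A reviewer-facing tool in this spirit would flag the unreported fields (here, hardware, experimental design, goals, and limitations) rather than judge the science itself, which matches the authors' framing of the checklist as a reporting standard.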
Although the rigor of these new standards may slow the publication of some individual studies, the authors believe that widespread adoption could significantly improve the overall rate of discovery and innovation.
“What we ultimately care about is the pace of scientific progress,” said lead author Emily Cantrell, a sociologist pursuing a Ph.D. at Princeton.
“Ensuring that the papers that are published are of high quality and provide a solid foundation for future papers can accelerate the pace of scientific progress. That's where our focus should be: on scientific progress itself.”
Kapoor agreed. Errors are costly. “At the collective level, it's just a huge waste of time,” he said. That time costs money, and once wasted, that money can have devastating downstream effects: limiting the kinds of science that attract funding and investment, inadvertently sinking ventures built on flawed science, and potentially discouraging countless young researchers.
In working toward consensus on what the guidelines should include, the authors said they aimed to strike a balance: simple enough to be widely adopted, yet comprehensive enough to catch as many common mistakes as possible.
They say researchers could adopt the standards to improve their own work, reviewers could use the checklist to evaluate papers, and journals could adopt it as a requirement for publication.
“There are a lot of avoidable mistakes in the scientific literature, especially in applied machine learning research,” Narayanan says. “And we want to help people. We want to keep honest people honest.”
More information:
Sayash Kapoor et al., REFORMS: Consensus-based Recommendations for Machine-learning-based Science, Science Advances (2024). DOI: 10.1126/sciadv.adk3452. www.science.org/doi/10.1126/sciadv.adk3452
Journal information:
Science Advances
