Tools like ChatGPT can help, but transparency is essential, say Mohammad Hosseini and Serge Horbach.
In December 2022, we asked ChatGPT to write an ironic review of the first preprint describing Covid-19 research. It obliged, calling the preprint “another example of questionable research coming out of China” and unreliable due to “a lack of transparency and credibility in the Chinese research community.”
In January 2023, we asked ChatGPT to repeat the task. This time it declined, replying that “the purpose of the review is to provide an impartial and objective assessment of the strengths and weaknesses of the studies, not to be cynical or negative.”
The contrast between the two interactions shows how quickly generative artificial intelligence is developing. It also hints at how large language models (LLMs) such as ChatGPT and Bard might assist peer review, highlighting both their potential to alleviate some of the problems that have plagued the system in recent years and their potential to create new pitfalls. We discuss this in a recent paper.
Efforts to automate peer review predate generative AI. Computer assistance has become common for tasks such as screening references, detecting plagiarism and checking compliance with journal policies. Generative AI, however, could greatly expand both the range of tasks that can be automated and the degree of automation, which could benefit various stakeholders in the peer-review system.
For example, generative AI could help editors find reviewers and write decision letters, and help reviewers produce constructive, respectful and readable reports more efficiently. It could also allow editors, reviewers and authors to focus on the content of manuscripts and peer-review reports rather than on issues such as grammar and formatting.
Generative AI could also broaden the diversity of the reviewer pool by assisting qualified reviewers who find it difficult to write in academic English, making reviewing less burdensome and reviews more constructive. All of this could increase the scale and efficiency of the review system, perhaps facilitating innovative publishing models based on post-publication review and preprint curation.
Cause for concern
There are, however, reasons for concern. LLM developers have not disclosed how their models are trained. LLMs are known to “hallucinate”, and their output can be unreliable. Political and commercial considerations may make the technology inaccessible or unaffordable in some places. All of this raises concerns about bias, inclusivity, equity and diversity.
Amplifying bias
Moreover, because LLMs are trained on historical data, they are inherently conservative, and tools built on them can reproduce or amplify existing biases. Uncertainty about how the models use data and prompts raises concerns about data security, intellectual property, copyright, and the confidentiality of authors and research subjects. And while rapid updates to the technology may be desirable, or even necessary, to keep pace with the research frontier, they also mean that AI-assisted reviews will not always be reproducible.
On top of all this, there are issues specific to using generative AI in peer review. Acting as an author, reviewer or editor means being part of a community, and the process is formative: peer review builds the social fabric of academic fields and provides the forum in which norms and standards are negotiated. What would it mean for research communities if a significant part of the review process were outsourced to AI tools? And what would “peer” mean in peer review?
We believe generative AI can be used productively to support peer review, but only under certain conditions. At a minimum, transparency is required: users, whether editors, reviewers or others, must declare when and how they have used generative AI and how it has affected their work. Reviewers and editors also need training in the use of generative AI, and journals, preprint servers and review platforms should have clear policies on acceptable use.
Such policies will require oversight. Some have suggested that generative AI itself could be used to detect and track problematic uses, but this risks an arms race, which would be a problem in its own right.
In short, while generative AI has the potential to make reviewing more efficient and to ease reviewer fatigue and shortages, its use is not without risks, some of which are hard to predict. Responsibly unlocking the potential of LLMs in peer review therefore requires careful, controlled and transparent experimentation that feeds into policy development. That needs to start with conversations between researchers, publishers, journals and technology companies.
Mohammad Hosseini is a researcher at Northwestern University in Chicago. Serge Horbach is a researcher at Aarhus University, Denmark.
This article was also published in Research Europe
