Human reviewers and AI detectors can identify AI-generated academic papers


Specific artificial intelligence content detectors and experienced human peer reviewers can accurately identify AI-generated academic papers even after they have been paraphrased, thus upholding academic integrity in scientific publishing.

This was the main message of a study titled “The Great Detectives: Human and AI Detectors Catching Medical Text Generated by Large-Scale Language Models”, published on 20 May in the International Journal of Educational Integrity.

The study noted that the application of AI in academic writing raises concerns about accuracy, ethics and scientific rigor as some AI content detectors may not be able to accurately identify AI-generated text, especially paraphrased text.

“Effective approaches and guidelines to govern the use of AI in specific sectors are urgently needed,” the study noted.

Results

For the study, the researchers purposefully selected 50 rehabilitation-related articles from four peer-reviewed journals and used ChatGPT to generate another 50 articles. They then used Wordtune to paraphrase the ChatGPT-generated articles.

Six popular AI content detectors (Originality.ai, Turnitin, ZeroGPT, GPTZero, Content at Scale, GPT-2 Output Detector) were used to identify AI content in the original articles, ChatGPT-generated articles, and AI-paraphrased articles. Additionally, four human reviewers (two student reviewers and two professor reviewers) were employed to distinguish between the original articles and the AI-paraphrased articles.

The study found that Originality.ai correctly detected 100% of ChatGPT-generated and AI-paraphrased texts, while ZeroGPT accurately detected 96% of ChatGPT-generated articles and 88% of AI-paraphrased articles. Turnitin also demonstrated a 0% misclassification rate for human-written articles, but was only able to identify 30% of AI-paraphrased articles.
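The figures reported above are standard evaluation rates: the share of AI-generated texts a detector flags, and the share of human-written texts it wrongly flags. A minimal sketch of how such rates are computed (not the study's code; the labels and sample verdicts below are hypothetical):

```python
# Illustrative sketch: computing a detector's detection rate and
# misclassification (false-positive) rate from labelled verdicts.

def detection_rates(verdicts, truths):
    """verdicts/truths are parallel lists of 'ai' or 'human' labels per article."""
    ai_total = sum(1 for t in truths if t == "ai")
    human_total = sum(1 for t in truths if t == "human")
    # AI texts correctly flagged as AI
    detected = sum(1 for v, t in zip(verdicts, truths) if t == "ai" and v == "ai")
    # Human texts wrongly flagged as AI
    misclassified = sum(1 for v, t in zip(verdicts, truths) if t == "human" and v == "ai")
    return {
        "detection_rate": detected / ai_total,
        "misclassification_rate": misclassified / human_total,
    }

# Hypothetical example: four AI-paraphrased and four human-written articles.
truths = ["ai", "ai", "ai", "ai", "human", "human", "human", "human"]
verdicts = ["ai", "ai", "ai", "human", "human", "human", "human", "ai"]
print(detection_rates(verdicts, truths))
# detection_rate 0.75, misclassification_rate 0.25
```

On this metric, Turnitin's reported numbers mean a 0.30 detection rate on AI-paraphrased texts alongside a 0.0 misclassification rate.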

The study found that professor reviewers correctly identified at least 96% of AI-paraphrased articles, citing “inconsistent content” (34.36%) as the most common giveaway, followed by “grammatical errors” (20.26%) and “insufficient evidence” (16.15%), but they misclassified 12% of human-written papers as AI-generated.

Student reviewers, by contrast, identified on average only 76% of the AI-paraphrased articles.

Significance of the study

In a joint statement sent to University World News, four of the study's authors, Fadi Al Zoubi, Jay Liu, Kelvin Hui and Arnold Wong, all of The Hong Kong Polytechnic University, said: “This is the first study to compare the accuracy of a range of commonly used AI content detectors with human reviewers in distinguishing peer-reviewed articles from artificial intelligence-generated or AI-paraphrased articles in the rehabilitation field.”

“Our findings highlight that the accuracy and misclassification rates of the AI content detectors we investigated vary,” the researchers added.

“For example, one tool commonly used in academic settings showed perfect accuracy in recognizing human-written articles but struggled to identify content that was paraphrased by an AI,” the researchers explained.

“Similarly, experienced faculty reviewers detected at least 96% of AI-rephrased papers, while undergraduate and graduate students were less accurate and more likely to misclassify human-authored papers,” the authors wrote.

“These findings highlight the crucial importance of continued development and refinement of AI detection tools to balance high detection rates of AI-generated content with minimizing misclassification of human-authored text,” the researchers continued.

“Furthermore, the results suggest the importance of improving the ability of inexperienced human peer reviewers to distinguish between AI-generated and human-written content, thereby enhancing the integrity and reliability of scholarly research in the digital age.”

Actionable insights

“Our study provides academics, universities, publishers, and peer reviewers with actionable insights for harnessing the potential of AI content detectors while protecting the integrity of scholarly articles amid the growing use of generative AI techniques,” the researchers said.

With regards to academia and universities, the authors said it is crucial for universities to establish comprehensive guidelines on the ethical use of generative AI in assignment creation and educate students about the authenticity of their work when utilising AI-generated content.

Additionally, educational institutions should implement a dual-tier screening process that combines multiple AI content detectors and human evaluation to detect AI-generated content in student submissions to ensure a fair and integrity-driven academic environment.
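The dual-tier screening idea can be sketched as a simple triage rule: multiple detectors vote, unanimous verdicts are handled automatically, and split verdicts are escalated to a human evaluator. The detector names, scores, and threshold below are hypothetical illustrations, not part of the study:

```python
# Illustrative sketch of a dual-tier screening process: several AI content
# detectors vote, and borderline cases are escalated to a human reviewer.

def screen_submission(detector_scores, flag_threshold=0.5):
    """detector_scores maps a detector name to its estimated probability
    that the text is AI-generated (hypothetical values)."""
    votes = sum(1 for score in detector_scores.values() if score >= flag_threshold)
    if votes == 0:
        return "pass"          # tier 1: no detector flags the text
    if votes == len(detector_scores):
        return "flag_ai"       # tier 1: unanimous AI verdict
    return "human_review"      # tier 2: split verdict goes to a human

# Hypothetical scores from three detectors for one submission.
scores = {"detector_a": 0.92, "detector_b": 0.88, "detector_c": 0.35}
print(screen_submission(scores))  # split verdict -> "human_review"
```

Requiring unanimity before an automatic "AI" flag reflects the study's concern about misclassification: disagreement between detectors is treated as uncertainty and passed to a human rather than resolved by the tools alone.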

Regarding publishers and peer reviewers, the authors state: “Our study demonstrates the effectiveness of current peer review systems in distinguishing between AI-rephrased and human-written papers.”

“However, to further strengthen the credibility of this process, we encourage journals to incorporate at least one proven AI detection tool as a preliminary screening measure.

“This step will help identify potential plagiarism and AI-generated content before the peer-review stage, streamlining the peer-review process and preserving the academic value of published works,” they explained.

Remaining vigilant

“Given the rapid advances in generative AI technology and the associated evolution of content detectors, it is essential that scholars, universities, publishers, and peer reviewers remain vigilant and stay informed,” Al Zoubi, Liu, Hui, and Wong stressed.

“By staying abreast of technological advances and continually improving our strategies and policies for detecting and managing AI-generated content, the academic community can help protect the integrity and authenticity of scholarly works.

“This proactive stance is not only about mitigating risks, but also about leveraging the opportunities that generative AI offers to enhance research, learning, and knowledge dissemination.

“As we navigate this ever-changing landscape, our study reminds us of the importance of balancing innovation with ethical considerations and quality control to ensure that academic and scientific discourse remains trustworthy,” the authors conclude.

What the experts say

Dr Mike Perkins, director of the Research and Innovation Centre at British University Vietnam, told University World News: “This is an interesting study that adds another layer to the ongoing debate about the ability of so-called AI text detectors to determine whether text is human-generated or the output of GenAI tools.”

“However, given that the authors used ChatGPT 3.5 to generate the test content, caution should be used when extrapolating results.”

“The methods of text production also do not reflect a collaborative writing process between AI and humans, and this will be an important area of investigation,” explained Perkins, lead author of the study “Guidelines for Academic Publishers for the Use of AI: A Thematic Analysis Supported by ChatGPT,” published in 2024.

Dr Ahmed Elkatat, head of research planning in the Office of the Vice-President for Research and Graduate Studies at Qatar University, expanded further in comments to University World News: “The study highlights that these tools perform relatively well at identifying content generated by early AI models like GPT-3.5, but struggle with more advanced models like GPT-4.”

“This highlights the need for continuous improvement of AI detection techniques to keep up with advances in AI text generation and maintain academic integrity in educational settings,” explained Elkatat, who is lead author of a 2023 study titled “Assessing the Effectiveness of AI Content Detection Tools to Distinguish between Human-Generated and AI-Generated Text.”

“This study offers important insights into the evolving landscape of AI and its impact on academic integrity. The inconsistencies observed in detection tools, particularly false positives in human-authored content, highlight the need for a multi-pronged approach that combines AI detection tools with manual review processes,” Elkatat argued.

“This approach helps to reduce the risk of academic misconduct and increase the reliability of the assessments.”

“Further research and development is essential to improve these tools and adapt them to the advanced capabilities of new AI models. Moreover, the rapid development of AI-generated text means that future AI detectors will face ongoing challenges.”

“The rate of false positives and human text misidentification is likely to increase, requiring more advanced and nuanced detection methods,” Elkatat concluded.
