
Last May, Devrim Çavuşoğlu, an engineer at the Turkish software company OBSS, was looking at conference reviewer feedback on a paper he and his colleagues had submitted when one comment caught his eye: The reviewers had noticed similarities between Çavuşoğlu's work and another paper that had been accepted for a different conference on computational linguistics.
When Çavuşoğlu first scanned the other paper, he found ideas eerily similar to his own. “I thought, this is like something I'd written,” he recalled. “How could it be so similar? Were we thinking the same thing?”
Çavuşoğlu checked the accompanying source code and found that the other authors had directly copied and built upon his published code without any attribution, a violation of the license that accompanied the code. “I was shocked, to be honest,” Çavuşoğlu told Retraction Watch.
In July 2021, Çavuşoğlu and his team released their software, Jury, to the public on the code-hosting platform GitHub. Designed to assess the quality of natural language generation models, the software is licensed for others to use as long as they credit the source. The team decided to submit a paper describing their work to a conference in early 2023, and received comments from the conference's peer reviewers a few months later, in May.
The paper that plagiarized their work was published in October 2022, between the time Çavuşoğlu and his team made their code public and the time they submitted their own paper on the research. Çavuşoğlu said they were unaware of the other paper. “NLG-Metricverse: An End-to-End Library for Evaluating Natural Language Generation” was written by five authors from the University of Bologna in Italy and describes software they claim to have created. They mention Çavuşoğlu's project in the paper but did not disclose at the time of publication that they had used his existing code in their research.
Çavuşoğlu pointed out the similarities to the conference committee that accepted the paper, but it took eight months, and several follow-ups from Çavuşoğlu, for the committee to decide to retract it. The notice about the source code was published last week, on May 21, and noted that the retraction was “at the request of the publications chair.”
Çavuşoğlu sent an email in August last year to the committee of COLING 2022, the conference that accepted the other paper, and to the Association for Computational Linguistics (ACL), which runs the conference and publishes its proceedings. In his letter, he identified nine instances of suspected plagiarism, including identical sections of code and sections with only minor changes. Two months later, in October, he received a response from Leo Wanner, one of the COLING 2022 committee members, who wrote that the group was looking into the issue.
Çavuşoğlu told Retraction Watch that what irritated him most was that the authors mentioned Jury in the paper as one of the other available resources and even compared their own project to it in a table. “They copied our source code and system architecture,” he said, but they said nothing about having built their software on top of Jury.
In his letter, he wrote that the mention shows the authors were aware of Jury, which reveals that the plagiarism was intentional.
While building on existing code is common in open source software, failing to credit the original work still constitutes plagiarism, Çavuşoğlu said. If proper attribution had been given (usually by adding a license file to the repository and crediting the original author), “there would have been no problem with using the software,” Çavuşoğlu said. “But they didn't include that.”
The authors of the NLG-Metricverse paper did not respond to requests for comment.
In December 2023, a few months after Çavuşoğlu first noted the similarities, the GitHub file was updated to include a mention of Jury.
“Indeed, the paper you have brought to our attention lacks the necessary references to your work, which constitutes plagiarism,” Wanner wrote to Çavuşoğlu last month, adding that the conference's publication chairs had “long ago” asked the ACL Anthology, a library of research papers published at ACL conferences, to retract the paper.
“It wasn't easy to get in touch with the ACL authorities,” Çavuşoğlu said, noting that it took a long time for the plagiarism ruling to be made. “We've been working really hard to get to where we are now.” Wanner did not respond to a request for comment.
A PDF of the paper related to the code has been watermarked with a retraction, and a note was added to the GitHub page for the research last week, about eight months after Çavuşoğlu first emailed the committee.
No notice was given explaining the reason for the retraction. Matt Post, who manages the ACL's conference papers library, said updates are processed in batches each month, and the latest update is expected to be published after the end of May.
