Maintaining research integrity in the age of AI

Justin Zobel*

Artificial intelligence technologies undoubtedly pose significant challenges to our ability to ensure research integrity, especially in doctoral research.

The emergence of digital assistance tools such as ChatGPT and Stable Diffusion, often grouped under the umbrella of artificial intelligence (AI) technologies, has sparked widespread debate in the global academic community about how they can be used constructively and when their use is a problem. It has even called into question the meaning of the term ‘plagiarism’.

Much of the discussion of conduct, or rather misconduct, has been about the challenges that arise from students using these tools to complete coursework, but there are also serious challenges to research integrity.

For PhD and Research Masters students, known at the University of Melbourne as Postgraduate Researchers, research integrity needs to be considered in light of what these tools are used for and how they work.

The Appearance of Sophisticated Knowledge

A broad category of digital assistance tools (DATs), including AI-based tools, has been developed over decades.

Many of us use DATs routinely and without a second thought. DATs correct spelling in documents, suggest alternative queries to search engines, and predict the next word in text messages.

The DATs that have sparked the most recent debate are, in some ways, an integration and scaling up of technologies developed over many years rather than something new.

But they are truly remarkable and provide a mechanism for interacting with a vast collection of human discourse. This mechanism, at this early stage, looks as disruptive as the advent of the web, search engines and smartphones.

For years, students have used tools based on similar technologies when writing essays and summarising papers.

Other established applications include language translation, clarification of expression, paraphrasing, and restyling of document presentation.

In some ways, then, these modern tools do not pose new challenges.

But they are undoubtedly much richer than their predecessors.

Crucially, the text and images they generate are new, and the controlled use of randomness in making selections means that repeating a prompt typically yields different output.

This confounds current approaches to detecting breaches of research integrity, such as plagiarism checks that match a submission against existing text.

But in essence, these tools are just pattern matching machines.

Humans consider what knowledge they want to convey, create statements that describe that knowledge, and then revise, edit and cross-check to ensure that the content is accurate and consistent.

In contrast, DATs are concerned only with generating sequences of words suggested by the words already seen or generated.

They do not draw on knowledge bases or databases that capture meanings, facts and concepts, but only on relationships inferred from observed text.

That is why they can appear to be hallucinating or making things up: there is no external verification of what they produce.

Even if all the inputs are “true” or “correct”, the output can be erroneous. This is because the tools have no semantic awareness, so semantic consistency is not part of the production process.
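To make this concrete, here is a deliberately tiny sketch in Python. It is not how ChatGPT or any real DAT is built: it is a hypothetical "bigram" generator that records which word follows which in its training text, then produces new text by repeatedly sampling a next word from those observed followers. The names `train_bigrams`, `generate` and `CORPUS`, and the two-sentence training text, are invented for illustration. Real tools use large neural networks rather than word-pair lists, but the sketch shares the properties described above: each word is chosen from patterns in observed text, with controlled randomness, and nothing checks the result against facts.

```python
import random
from collections import defaultdict

# Hypothetical training text: in this toy world, both sentences are true.
CORPUS = "alice wrote the survey . bob wrote the review ."


def train_bigrams(text: str) -> dict[str, list[str]]:
    """Record, for each word, every word observed to follow it.

    Duplicates are kept, so frequent continuations are sampled more often.
    Nothing here stores facts or meanings: only which word followed which.
    """
    words = text.split()
    followers: defaultdict[str, list[str]] = defaultdict(list)
    for current, nxt in zip(words, words[1:]):
        followers[current].append(nxt)
    return dict(followers)


def generate(followers: dict[str, list[str]], start: str, length: int,
             rng: random.Random) -> str:
    """Extend `start` one word at a time by sampling an observed follower.

    The random choice is what makes repeated prompts give different output.
    """
    output = [start]
    for _ in range(length - 1):
        candidates = followers.get(output[-1])
        if not candidates:  # no observed continuation, so stop early
            break
        output.append(rng.choice(candidates))
    return " ".join(output)


if __name__ == "__main__":
    model = train_bigrams(CORPUS)
    for seed in range(4):
        print(generate(model, "alice", 4, random.Random(seed)))
```

Depending on the seed, `generate(model, "alice", 4, random.Random(seed))` returns either "alice wrote the survey" or "alice wrote the review". The second sentence is fluent but false, even though every sentence the model learned from is true.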

Yet the output they produce seems very human.

There is a disturbing dissonance here. On the one hand, they do not reason and make no use of factual information.

On the other hand, they appear confident and sophisticated in their knowledge.

This superficial plausibility, and the impression of interacting with a thinking being, are psychologically compelling but deeply misleading.

Serious policy issue

Academic publishers have responded to the advent of DATs.

For example, Nature and Elsevier have issued policy statements that restrict the use of AI-generated text in publications, prohibit listing AI as an author, and set guidelines for acknowledging the provenance of text.

The University of Melbourne has likewise issued a statement on the use of these tools in research writing, which aligns closely with those of the publishers mentioned above.

It explains how the use of DATs may breach research integrity policies.

Briefly, it requires that material produced or substantially modified by a DAT be acknowledged as such, and that AI not be listed as an author.

The lack of robust or agreed tools for detecting DAT use in thesis writing (and in other material produced by PhD candidates) does not mean that our position on what constitutes ethical practice needs to be weakened.

Some uses of DATs can be detected by other means, such as abrupt changes in style, semantically inconsistent text, and discrepancies between the knowledge a candidate shows in writing and in speaking. And one would expect an engaged thesis supervisor to eventually notice if a candidate were relying on deception.

But these technologies undoubtedly pose a major challenge to our ability to ensure that work is done properly.

Understanding and communication

The advent of these new technologies calls into question assumptions about how and why we teach, and about whether and when DAT use really matters.

In my view, the use of DATs by PhD candidates is indeed problematic for a variety of reasons.

The most obvious reason is the same as for coursework students: candidates are expected to produce their own original text, because thesis examination assesses the candidate’s understanding and communication skills.

If text from other sources is presented as the candidate’s own, that assessment is compromised.

It is not currently possible to reliably identify text generated by a DAT, and this difficulty is likely to grow as DATs become more sophisticated.

There are stylistic indicators, but these are just indicators, not the ironclad evidence needed to prove the tools were used.

That said, the unreliability of DAT-generated text means that it is very risky to include more than a small amount of it in a thesis.

However, there are various other concerns.

PhD candidates can be misled by DAT-generated essays and nonsensical summaries on topics of interest.

An inability to communicate clearly, or a lack of basic knowledge, can be masked not only in the thesis but also in emails, proposals, progress reports and so on.

Some candidates already use DATs to translate content written in other languages. Tidying the translated output with another DAT can lead to even more garbled content.

There are also legal issues.

One is ownership.

Unacknowledged use of DAT-generated text may contravene current copyright law.

Another is intellectual property (IP) disclosure.

If a prompt entered into a DAT describes an innovation, the DAT’s retention of that prompt may mean the IP is lost to its author.

Some speculate that these AIs herald a future where human communication skills are no longer needed.

But that future has not arrived, and in my view it remains far away.

Until it does, we will continue to expect PhD candidates to communicate clearly, and we will be concerned if they are unable to do so without digital or other assistance.

Banner: Generated by MidJourney in response to the prompt ‘Robot typing at the desk in the university library’.

* Justin Zobel is Pro Vice-Chancellor (Graduate and International Studies) and Redmond Barry Distinguished Professor in the Department of Computing and Information Systems, Faculty of Engineering and Information Technology, University of Melbourne.

This article was originally posted on tracking.unimelb.edu.au.
