Misuse of AI forces higher education institutions to rethink assessment

Many college students are now using artificial intelligence to complete or cheat on assignments, suggesting that universities need to change the way they evaluate students, a new study from Cornell University has found.

An analysis of survey responses from more than 95,000 students at 20 public research universities in the United States found that nearly one-third regularly used generative AI (GenAI), such as ChatGPT and other models that generate text, video, and code, to complete assignments, and 9% used it to cheat.

“Assessment reform is necessary and urgent,” said study co-author Rene Kizilcec, associate professor of information science in the Cornell Ann S. Bowers College of Computing and Information Sciences and director of the Future of Learning Lab. “The fact that students are abusing GenAI is an issue for the validity of the assessment, and it is an issue for the reliability of university credentials.”

The new study, “The Uses and Misuses of Generative AI to Call for Assessment Reform in Higher Education,” was published in the journal Science on May 21.

Kizilcec partnered with Igor Chirikov, a senior research fellow at the Center for Higher Education Research and director of the Student Experience in the Research University (SERU) consortium at the University of California, Berkeley, to investigate the use and abuse of AI among college students. SERU sends out a survey to undergraduate students each year, asking for their opinions on engagement, belonging, affordability, and other topics.

The questions about GenAI use, collected during the 2023-24 school year, made up the largest survey of its kind at the time, allowing the researchers to break down responses by category.

“We wanted to provide a more evidence-based approach to how students are actually using and, more importantly, abusing AI,” Chirikov said. “Even this early evidence shows that we face very serious challenges, and universities need to address them.”

Overall, 37% of students reported using AI at least monthly, with higher adoption rates in fields that require large amounts of data analysis. The percentages vary by field: 62% of computer science students reported regular use, compared to 24% of liberal arts students.

The study also showed demographic differences in the use of GenAI. The researchers found that 33% of female students reported using GenAI regularly, compared to 45% of male students. Regular use was also lower among students from underrepresented racial minorities, at 29%, compared to 39% for white and Asian students.

These demographic differences may reflect equity gaps in the use of AI tools, the researchers said. Furthermore, they warn that these gaps may widen as GenAI tools become more specialized and expensive.

“These disparities can shape both the learning and the familiarity with tools that students have as they graduate from college and then enter the labor market,” Chirikov said.

To accurately estimate the rate of cheating (something students may be reluctant to admit), the researchers used a technique called a list-randomized experiment. They provided a short list of statements and asked students not which statements were true, but how many were true. By including an additional statement about cheating in some versions of the survey and not in others, the researchers were able to estimate the rate of AI misuse.
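The arithmetic behind a list experiment can be illustrated with a minimal Python sketch. It assumes the standard difference-in-means estimator for this kind of design; the article does not describe the exact estimator the researchers used. The response counts, the number of statements, and the function names below are hypothetical and chosen only for illustration.

```python
from math import sqrt
from statistics import mean, variance


def estimate_prevalence(control_counts, treatment_counts):
    """Difference-in-means estimate of how common the sensitive item is.

    control_counts: number of true statements reported by respondents
        who saw only the neutral statements.
    treatment_counts: the same, for respondents whose list also included
        the sensitive statement (here, cheating with AI).
    """
    return mean(treatment_counts) - mean(control_counts)


def standard_error(control_counts, treatment_counts):
    """Standard error of the difference between two independent means."""
    return sqrt(
        variance(treatment_counts) / len(treatment_counts)
        + variance(control_counts) / len(control_counts)
    )


# Hypothetical responses, NOT data from the study.
control = [1, 2, 0, 3, 2, 1, 2, 1]    # list of 4 neutral statements
treatment = [2, 2, 0, 3, 2, 1, 3, 1]  # same 4 statements plus the sensitive one

estimate = estimate_prevalence(control, treatment)
error = standard_error(control, treatment)
print(f"Estimated share answering yes to the sensitive item: {estimate:.2f}")
print(f"Standard error: {error:.2f}")
```

Because students are assigned to the two versions at random, the average count for the neutral statements should be the same in both groups, so the gap between the group averages can be attributed to the extra statement. No individual answer reveals whether that student cheated, which is why the design encourages honest responses.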

Overall, the share of students who used AI to cheat was lower than anecdotal reports suggested, the researchers said. Students who use GenAI daily have the highest cheating rate, at 26%, compared to 7% for monthly users.

“As we expect the use of GenAI among students to only increase, for better or worse, we also expect the misuse of GenAI to increase, and that is concerning,” Kizilcec said.

The study’s authors call for changes in the way universities evaluate students to promote academic integrity. They suggest three strategies. Professors can return to a highly controlled testing environment with just a pen, paper, and a proctor. They can set clearer guidelines on when the use of AI is allowed. Alternatively, they can redesign assessments to incorporate AI in a way that lets students demonstrate their professional skills.

Due to the differences between disciplines, the researchers suggest that professional societies play a role in determining how best to assess students’ disciplinary learning in the AI era. However, they caution that universities also need to be mindful of inequalities in AI literacy and access among students.

“If we rely on GenAI or are not careful in how we implement new assessments that integrate GenAI, we could inadvertently exacerbate long-standing educational disparities,” Kizilcec said.

Ivan Smirnov from the University of Technology Sydney is a co-author of the paper.

Patricia Waldron is a writer in the Cornell Ann S. Bowers College of Computing and Information Sciences.
