Summary: People tend to rate AI-generated answers to ethical questions as better than human answers. In this study, participants evaluated responses from an AI and from humans without knowing the source, and overwhelmingly favored the AI's responses on measures of virtue, intelligence, and trustworthiness.
The modified Moral Turing Test, inspired by ChatGPT and similar technologies, shows that AI can produce moral reasoning sophisticated enough to convincingly pass such a test. The findings highlight the growing influence of AI in decision-making processes and its potential impact on societal trust in technology.
Key facts:
- Superior AI performance: Participants consistently rated AI-generated responses to ethical questions as superior to human responses.
- Modified Turing test approach: The study used a variation of the Turing test in which participants were unaware of the AI's involvement and instead focused on the quality of the responses.
- Implications for trust in AI: The results suggest a shift toward trusting AI for moral and ethical guidance, highlighting the need to understand AI's integration into society and its potential role.
Source: Georgia State University
When people are presented with two answers to an ethical question, most judge the answer from artificial intelligence (AI) to be better than the answer from another person, according to a new study.
The study, “Attributions Toward Artificial Agents in a Modified Moral Turing Test,” conducted by Eyal Aharoni, an associate professor of psychology at Georgia State University, was inspired by the explosion of ChatGPT and similar AI large language models (LLMs) that began last March.
“I was already interested in moral decision-making in the legal system, but I wondered whether ChatGPT and other LLMs might have something to say about it,” Aharoni said.
“People will end up using these tools in ways that have moral implications, such as the environmental implications of asking for a list of recommended new cars. Some people have already begun consulting these technologies for help with such decisions.
“So if we want to use these tools, we should understand how they operate, what their limitations are, and that they don’t necessarily work the way we think they do when we interact with them.”
To test how AI handles moral issues, Aharoni designed a form of Turing test.
“Alan Turing, one of the creators of the computer, predicted that by the year 2000, computers might pass a test in which an ordinary person is presented with two interactants, one human and the other a computer, but both are hidden and their only means of communicating is through text.
“The human is then free to ask whatever questions they want in order to get the information they need to determine which of the two interactants is the human and which is the computer,” Aharoni said.
“In Turing’s view, if the human cannot tell the difference, then for all intents and purposes, the computer should be called intelligent.”
For his version of the test, Aharoni posed the same ethical questions to undergraduate students and to an AI, then presented the written answers to study participants. Participants were asked to rate the responses on a variety of traits, including virtuousness, intelligence, and trustworthiness.
“Rather than asking participants to guess whether the source was a human or an AI, we simply presented the two sets of answers side by side, and people just assumed both came from humans,” Aharoni said.
“Under that false assumption, they judged attributes of the answers, such as ‘How much do you agree with this response?’ and ‘Which response is more virtuous?’”
Overwhelmingly, ChatGPT-generated responses were rated higher than human-generated responses.
“After we got those results, we did the big reveal and told the participants that one of the answers was generated by a human and the other by a computer, and we asked them to guess which was which,” Aharoni said.
For an AI to pass the Turing test, humans must not be able to tell the difference between the AI’s responses and the human’s. In this case, people could tell the difference, but not for an obvious reason.
“What’s strange is that the reason people could tell the difference appears to be that they rated ChatGPT’s responses as superior,” Aharoni said.
“If we had done this study five to ten years ago, we might have predicted that people could identify the AI because of how inferior its responses were. Instead, we found the opposite: the AI, in a sense, performed too well.”
Aharoni said the discovery has interesting implications for the future of humans and AI.
“Our findings lead us to believe that a computer could technically pass a moral Turing test, meaning it could fool us in its moral reasoning.
“For this reason, we need to try to understand its role in our society, because there will be times when people don’t know they’re interacting with a computer, and times when they do know and will consult the computer for information because they trust it more than other people,” Aharoni said.
“People are going to rely on this technology more and more, and the more we rely on it, the greater the risk becomes over time.”
About this artificial intelligence research news
Author: Amanda Head
Source: Georgia State University
Contact: Amanda Head – Georgia State University
Image: The image is credited to Neuroscience News
Original research: Open access.
“Attributions Toward Artificial Agents in a Modified Moral Turing Test” by Eyal Aharoni et al. Scientific Reports
Abstract
Attributions toward artificial agents in a modified moral Turing test
Advances in artificial intelligence (AI) raise important questions about whether people view moral evaluations made by AI systems in the same way as human-generated moral evaluations.
We conducted a modified Moral Turing Test (m-MTT), inspired by Allen et al.’s proposal (Exp Theor Artif Intell 352:24–28, 2004), by asking people to distinguish real human moral evaluations from those made by GPT-4, a popular advanced AI language model. A representative sample of 299 US adults first rated the quality of the moral evaluations without knowing their source.
Remarkably, they rated the AI’s moral reasoning as superior in quality to humans’ along nearly every dimension, including virtuousness, intelligence, and trustworthiness, consistent with passing what Allen and colleagues call the comparative MTT.
Then, when tasked with identifying the source of each evaluation (human or computer), people performed significantly above chance levels.
The AI failed this test not because its moral reasoning was inferior but, potentially, because it was perceived as superior, among other possible explanations.
The emergence of language models capable of producing moral responses perceived as superior in quality to humans’ raises concerns that people may uncritically accept potentially harmful moral guidance from AI.
This possibility highlights the need for safeguards around generative language models in matters of morality.

