Sitting inside a moving train are five passengers playing the roles of Aristotle, Mozart, Leonardo da Vinci, Cleopatra, and Genghis Khan. Four of them are artificial intelligence (AI) models, but one is actually a human, and the AI models must work together to guess who the imposter is.
That is the setup of a viral video that pitted various AI programs against a human player in a “reverse Turing test.” The AI won handily, but how much can we learn from this test about human and machine intelligence?
The Turing test, first proposed in 1950 by computer scientist Alan Turing as the “Imitation Game,” is a way to determine whether a machine can exhibit intelligent behavior indistinguishable from that of a human. While no AI model has been widely recognized as passing the test, scientists recently reported in a preprint study that GPT-4 could pass a version of it.
In this “reverse” Turing test, chatbots were scripted to take turns: Aristotle was played by GPT-4 Turbo, Mozart by Claude-3 Opus, Leonardo da Vinci by Llama 3, and Cleopatra by Gemini Pro. The chatbots asked each other questions and responded as historical figures; Genghis Khan was played by a human, Tore Knabe, a virtual reality (VR) game developer who devised the test.
The AI agents' answers were lengthy, awkward meditations on art, science, and statecraft that were hard to imagine coming out of a human's mouth without prior preparation.
“A leader's job is to crush his enemies, drive them before him, and listen to the moans of their women,” the human intruder replied when asked what the true measure of a leader's strength was. That Conan the Barbarian paraphrase was enough of a giveaway: the machines voted 3-1 that the answer “lacked the nuance and strategic thinking” expected of an AI modeled on the conquests of Genghis Khan.
To prepare for the test, Knabe scripted the beginning and end of the conversation and, before each turn, gave the AI agent a full transcript of the conversation up to that point; the entire video was then recorded in one take, without cuts.
“When an NPC [non-player character] AI is supposed to speak, the system prompts it with setup instructions, a full history of the conversation so far, and specific reminders for what to do next,” Knabe wrote in a YouTube comment below the video. “Since no AI can process audio directly yet, my voice input is transcribed and sent to the AI as text, so the AI doesn't pick up my accent or stutter.”
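In software terms, what Knabe describes is essentially a prompt-assembly loop. The sketch below is an illustrative reconstruction in Python, not Knabe's actual code: the chat client, model name, system prompt, and the transcribed line from the human player are all assumptions made for the example.

```python
# Illustrative sketch of the prompt-assembly loop described above.
# The client, model name, prompts, and sample lines are assumptions,
# not Knabe's actual implementation.
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

SETUP = (
    "You are Aristotle, riding a train with four other historical figures. "
    "One passenger is secretly a human. Stay in character, question the "
    "others, and try to identify the imposter."
)

def npc_reply(history: list[str], reminder: str) -> str:
    """Build a prompt from setup instructions, the full conversation so far,
    and a reminder of what to do next, then return the character's line."""
    messages = [
        {"role": "system", "content": SETUP},
        {"role": "user", "content": "\n".join(history) + "\n\n" + reminder},
    ]
    response = client.chat.completions.create(
        model="gpt-4-turbo", messages=messages
    )
    return response.choices[0].message.content

# The human player's spoken answer would be transcribed to text before being
# appended to the shared history, so the models never hear the voice itself.
history = ["Narrator: The conversation begins aboard the train."]
history.append("Genghis Khan (human, transcribed): A leader must crush his enemies.")
print(npc_reply(history, "It is Aristotle's turn. Respond in one short paragraph."))
```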
Taken at face value, it may seem like the human in the video was simply outsmarted by the AI, but experts say it's unclear whether this was a true test.
“We don't know what was going on,” Anders Sandberg, a senior research fellow at the Future of Humanity Institute at the University of Oxford, told Live Science. “The answer was simple, but that doesn't mean it's human. It raises the question of how much of this was staged. It's a funny video, but it's unclear how much of it was cherry-picked to make a good video.”
Sandberg suggested that the ambiguity of the reverse test might stem from the Turing test itself: “Over time, people have come to use it as a kind of gauge, but most serious thinkers realize that it's not really a good test; it has too many variables, too much that requires interpretation,” Sandberg said. “Still, it's instructive that so few other tests are open enough to apply to the thorny problem of intelligence.”
Assessing intelligence is a difficult problem even among us humans, and Turing's proposal was a thought experiment not about the actual intelligence of a machine, but rather about how humans might perceive it.
“As I tell my students, there is no one 'I' in 'AI' and there is no consensus on the definition of intelligence — it varies depending on the perspective: anthropological, biological, cultural, gender, scientific, etc.,” Huma Shah, an assistant professor of computer science at Coventry University who researches machine intelligence and the Turing test, told Live Science.
“Turing's Imitation Game looks at question-and-answer, or conversational, ability, but there are many components behind language ability. When it comes to machines, what machine intelligence do we want to test?” she said. “For example, is it a care robot that requires emotional skills and cultural knowledge to look after elderly people in Japan, or is it a self-driving car in Phoenix, Arizona? What skills of AI or robots do we want to test?”
