Scarlett Johansson's complaint against OpenAI sets new benchmark for the development of machine intelligence



Sound Patterns

Credit: Pixabay/CC0 Public Domain

More than 2,000 years ago, the ancient Greek philosopher Aristotle developed a way of constructing arguments, which he called “rhetoric.” He explained how the logic of an argument or speech, the needs and understanding of the audience, and the authority of the speaker can be used as strategies to persuade others.

Politicians and actors have long realized that there is no more effective way to win the hearts and minds of their audiences than to use emotion, rather than relying solely on the logic of an argument or trust in the speaker.

With the launch of GPT-4o last week, we may have seen the perfect machine for the job. While most see this as a fantastic breakthrough that could benefit a huge number of people, some are more cautious.

Actress Scarlett Johansson said she was “shocked” and “outraged” when she heard the new GPT-4o speak, having previously declined a request from OpenAI to sample her voice.

Sky, one of five voices used by GPT-4o, bears an uncanny resemblance to the voice of the actress, who played the AI Samantha in the 2013 film Her, about a man who falls in love with his virtual assistant. Adding fuel to the debate, OpenAI co-founder and CEO Sam Altman tweeted “Her” on the day GPT-4o was released, seemingly highlighting the comparison between Sky and Samantha/Johansson.

OpenAI later posted on X that it was “in the process of pausing the use of Sky,” and on May 19 it created a webpage explaining that a different actress had been hired. The company also detailed how it selected its voice actors.

The reference to the film Her shortly after GPT-4o was announced has raised awareness of the technology among the general public, perhaps making its capabilities seem less frightening.

This is fortunate, as rumors of a partnership with Apple have raised privacy concerns ahead of the release of iOS 18 next month. Similarly, OpenAI is partnering with Microsoft on a new generation of AI-powered Windows systems called Copilot+ PCs.

Unlike other large language models (LLMs), GPT-4o (the “o” stands for “omni”) is built from the ground up to provide an integrated understanding of not only text but also vision and audio. It is truly multimodal, going far beyond the capabilities of “traditional” LLMs.

It can pick up on nuances in speech, such as emotions, breathing, background noises, and bird calls, and integrate them with visual information.

It's an integrated multimodal model (meaning it can handle photos and text) that responds at the speed of normal human conversation (320 milliseconds on average) and can be interrupted. The result is surprisingly natural, with tone and emotional intensity changing appropriately. It can even sing. Some have complained about how “frivolous” GPT-4o is. It's no wonder actors worry.

This is truly a new way of interacting with AI. It represents a subtle shift in our relationship with technology, offering an entirely new type of “natural” interface, sometimes called EAI (empathetic AI).

The speed of this advancement has many government agencies and police departments worried. It remains to be seen how best to respond if the technology is weaponized by rogue nation states or criminals. With the rise of audio deepfakes, it's becoming harder to tell what's real and what's not. Even Johansson's friends thought it was her.

In a year when elections involving more than 4 billion voters are due to take place, and when targeted fraud based on deepfake audio is on the rise, the dangers of weaponized AI should not be underestimated.

As Aristotle discovered, persuasion often depends not on what you say, but on how you say it. We are all plagued by unconscious bias. An interesting report from the UK on accent bias highlights this point: some accents are perceived as more authentic, authoritative, and even trustworthy than others. For this very reason, people working in call centers are now using AI to “westernize” their voices. In the case of GPT-4o, how it says something may be just as important as what it says.

If an AI can understand the needs of its audience and use logical reasoning, perhaps the final piece it needs is the manner in which it delivers its message, as Aristotle pointed out 2,000 years ago. This could give rise to an AI with the potential to become a superhuman master of rhetoric, with a seemingly limitless power to persuade audiences.

Courtesy of The Conversation

This article is republished from The Conversation under a Creative Commons license. Read the original article.

Citation: Scarlett Johansson's complaint against OpenAI is a new benchmark in machine intelligence development (May 23, 2024), retrieved May 23, 2024 from https://techxplore.com/news/2024-05-scarlett-johansson-complaint-openai-benchmark.html

This document is subject to copyright. It may not be reproduced without written permission, except for fair dealing for the purposes of personal study or research. The content is provided for informational purposes only.




