
On Monday, OpenAI debuted GPT-4o (the “o” stands for “omni”), a major new AI model that can ostensibly converse in real time using speech, read emotional cues, and respond to visual input. OpenAI says the model runs faster than its previous best model, GPT-4 Turbo, and will be freely available to ChatGPT users and as a service through its API, rolling out over the next few weeks.
OpenAI revealed the new voice conversation and visual comprehension capabilities in a YouTube livestream titled “OpenAI Spring Update,” presented by OpenAI CTO Mira Murati and employees Mark Chen and Barret Zoph. The livestream included a live demonstration of GPT-4o in action.
OpenAI claims that GPT-4o responds to audio input in about 320 milliseconds on average, which is similar to human response times in conversation, according to a 2009 study, and much shorter than the typical 2–3 second lag experienced with previous models. With GPT-4o, OpenAI says it trained a brand-new AI model end-to-end on text, vision, and audio so that all inputs and outputs are “processed by the same neural network.”
“GPT-4o is the first model to combine all of these modalities, so we are still just scratching the surface of exploring what the model can do and its limits,” OpenAI said.
During the livestream, OpenAI showed off GPT-4o’s real-time voice conversation capabilities, highlighting its ability to hold natural, responsive interactions. The AI assistant appeared to easily pick up on emotions, adapted its tone and style to match the user’s requests, and even incorporated sound effects, laughter, and singing into its responses.

Presenters also highlighted GPT-4o’s enhanced visual comprehension. By uploading screenshots, documents containing text and images, or charts, users can hold conversations about the visual content and receive data analysis from GPT-4o. In the live demo, the AI assistant analyzed selfies, detected emotions, and made lighthearted jokes about the images.
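For developers, the same image-understanding behavior is exposed through OpenAI’s Chat Completions API, where a single user message can mix text and image parts. Here is a minimal sketch of what such a call might look like, assuming the official openai Python package (v1.x), an OPENAI_API_KEY in the environment, and a placeholder image URL:

```python
# Minimal sketch: asking GPT-4o about an image via the Chat Completions API.
# Assumes the `openai` Python package (v1.x) and OPENAI_API_KEY in the
# environment; the chart URL below is a placeholder for illustration.
from openai import OpenAI

client = OpenAI()

response = client.chat.completions.create(
    model="gpt-4o",
    messages=[
        {
            "role": "user",
            # A single message can combine text and image parts.
            "content": [
                {"type": "text", "text": "What trend does this chart show?"},
                {
                    "type": "image_url",
                    "image_url": {"url": "https://example.com/chart.png"},
                },
            ],
        }
    ],
)

print(response.choices[0].message.content)
```

The list-style content field is how a “conversation about visual content” is expressed at the API level: each follow-up question simply becomes another message in the same thread.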
Additionally, GPT-4o shows speed and quality improvements in more than 50 languages, which OpenAI says cover 97 percent of the world’s population. The model also showcased real-time translation, facilitating near-instant conversations between speakers of different languages.
OpenAI first added conversational voice capabilities to ChatGPT in September 2023, utilizing Whisper, its AI speech-recognition model, for input and a custom voice-synthesis technology for output. Until now, OpenAI’s multimodal ChatGPT interface chained three processes: transcription (speech to text), intelligence (processing the text as tokens), and text-to-speech, with each step adding latency. GPT-4o reportedly performs all of these steps at once. It “reasons across voice, text, and vision,” according to Murati, and was labeled an “omnimodel” on a slide shown on screen behind her during the livestream.
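To make the difference concrete, the pre-GPT-4o voice flow behaved roughly like the following chain of separate API calls. This is a simplified sketch, not OpenAI’s internal implementation; it assumes the openai Python package (v1.x), and the model names and file names are illustrative:

```python
# Rough sketch of the older three-step voice pipeline: each hop is a separate
# model call, and each adds latency. Uses the `openai` Python package (v1.x);
# file names are placeholders.
from openai import OpenAI

client = OpenAI()

# Step 1: transcription -- Whisper converts the user's speech to text.
with open("question.wav", "rb") as audio_file:
    transcript = client.audio.transcriptions.create(
        model="whisper-1",
        file=audio_file,
    )

# Step 2: intelligence -- a text-only model processes the transcript as tokens.
completion = client.chat.completions.create(
    model="gpt-4-turbo",
    messages=[{"role": "user", "content": transcript.text}],
)
reply_text = completion.choices[0].message.content

# Step 3: text-to-speech -- a separate model synthesizes the spoken reply.
speech = client.audio.speech.create(
    model="tts-1",
    voice="alloy",
    input=reply_text,
)
speech.write_to_file("reply.mp3")
```

GPT-4o reportedly collapses those three hops into a single model, which is where the claimed drop from 2–3 seconds of latency to roughly 320 milliseconds would come from.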
OpenAI announced that all ChatGPT users will have access to GPT-4o, with paid subscribers getting rate limits five times higher than free users. Through the API, GPT-4o is reportedly twice as fast as GPT-4 Turbo, 50 percent cheaper, and offers five times higher rate limits.
The features demonstrated during the livestream, as well as numerous videos on OpenAI’s website, are reminiscent of the conversational AI agent in the 2013 sci-fi film Her, in which the protagonist develops a personal attachment to an AI personality. Given GPT-4o’s simulated emotional expressiveness (call it artificial emotional intelligence), it is not inconceivable that similar emotional attachments to OpenAI’s assistant could develop on the human side, as we have already seen in the past.
Murati acknowledged that GPT-4o’s real-time audio and video capabilities present new safety challenges, and she said the company will continue researching safety and soliciting feedback from test users during its iterative rollout over the coming weeks.
“GPT-4o has also undergone extensive external red teaming with more than 70 external experts in domains such as social psychology, bias and fairness, and misinformation to identify risks that are introduced or amplified by the newly added modalities,” OpenAI said. “We used these learnings [sic] to build out our safety interventions in order to improve the safety of interacting with GPT-4o. We will continue to mitigate new risks as they’re discovered.”
ChatGPT updates
Also on Monday, OpenAI announced several updates to ChatGPT, including a ChatGPT desktop app for macOS, which OpenAI says is available to ChatGPT Plus users starting today and will become “more broadly available” in the coming weeks. OpenAI is also streamlining the ChatGPT interface with a new home screen and message layout.
And, as briefly mentioned above, once the GPT-4o model becomes widely available, ChatGPT Free users will gain access to the web browsing, data analysis, GPT Store, and Memory features that were previously limited to ChatGPT Plus, Team, and Enterprise subscribers.
