GPT-4 offered similar functionality, giving users multiple ways to interact with OpenAI's AI products. However, those features were siloed in separate models, which could result in longer response times and likely higher computing costs. GPT-4o integrates these capabilities into a single model, which Murati called the "omni model." That means faster responses and smoother transitions between tasks, she said.
The result, the company's demonstration suggests, is a conversational assistant much like Siri or Alexa, but capable of responding to more complex prompts.
“We're looking at the future of interaction between ourselves and machines,” Murati said of the demo. “We think GPT-4o will really shift that paradigm to the future of collaboration and make this dialogue more natural.”
OpenAI researchers Barret Zoph and Mark Chen walked through a number of applications of the new model. Most impressive was its ability to hold live conversations: you can interrupt the model mid-response, and it will stop, listen, and adjust course.
OpenAI also showed off the model's ability to change its tone. Chen asked the model to read a bedtime story "about robots and love," then quickly jumped in to request a more dramatic voice. The model grew increasingly theatrical until Murati cut in and asked it to switch to a convincing robot voice (which it did, excellently). There were a few predictably brief pauses during the conversation while the model reasoned about what to say next, but overall it stood out as a surprisingly naturally paced AI conversation.