OpenAI has raised the bar in the competitive world of generative artificial intelligence by introducing a new model that it hopes will attract more users to its platform and fend off all challengers.
GPT-4o is an updated version of the large language model technology that underpins ChatGPT. Last week, OpenAI was rumored to be preparing a search engine to compete with Google, but Reuters reported that the company had postponed that launch.
OpenAI chief executive Sam Altman denied the search engine rumor, posting on X only that the company is “hard at work on some new things we think people will like.”
The “o” in the name stands for “omni,” and the California-based company is promoting GPT-4o as a model for everyone. That makes sense, since “omni” means “all” or “every.” Does OpenAI want to be everywhere in our lives?
What is GPT-4o?
The short answer: according to OpenAI, GPT-4o is a new flagship model that can “reason across audio, vision, and text in real time.”

The even shorter answer: it is OpenAI's fastest AI model yet.
In a blog post on Monday, OpenAI described GPT-4o as “a step toward much more natural human-computer interaction.”
It is also natively multimodal: it accepts any combination of text, audio, and images as input, and can generate any combination of text, audio, and image outputs.
How fast is GPT-4o?
OpenAI claims that GPT-4o can respond to voice input in as little as 232 milliseconds, with an average of 320 milliseconds, which is roughly the response time of humans in conversation, according to multiple studies.
GPT-4o also uses fewer tokens in many languages. Tokens are the basic units of text an AI model processes when measuring length; they can include punctuation and spaces, and their number varies by language.
Languages in which OpenAI highlighted GPT-4o's reduced token counts include Gujarati (from 145 tokens to 33), Hindi (90 to 31), Arabic (53 to 26), Korean (45 to 27), and Chinese (34 to 24).
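Those reductions can be expressed as percentage savings. A minimal sketch, using only the token counts cited in this article:

```python
# Token-count reductions cited by OpenAI for GPT-4o, per language:
# (token count before, token count after) for a sample text.
reductions = {
    "Gujarati": (145, 33),
    "Hindi": (90, 31),
    "Arabic": (53, 26),
    "Korean": (45, 27),
    "Chinese": (34, 24),
}

for language, (old, new) in reductions.items():
    saved = 100 * (old - new) / old
    print(f"{language}: {old} -> {new} tokens ({saved:.0f}% fewer)")
```

Fewer tokens per sentence means lower cost and lower latency for the same text, since models are billed and paced per token.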
For perspective, some comparisons can be drawn with Robert Miller's 1968 study, Response Time in Man-Computer Conversational Transactions, which details three thresholds of mainframe responsiveness.
That research found that a response time of 100 milliseconds is perceived as instantaneous, while anything under one second is fast enough for users to feel in control of the interaction. Beyond 10 seconds, the user's attention is lost entirely.
How does GPT-4o work?
The simplest answer is that OpenAI has simplified the process of converting input to output.
OpenAI's previous models communicated through Voice Mode with average delays of 2.8 seconds (GPT-3.5) and 5.4 seconds (GPT-4). Voice Mode chained together three separate models: a simple model transcribed audio to text, GPT-3.5 or GPT-4 took that text in and produced text out, and a third simple model converted the text back to audio.
“This process means that the main source of intelligence, GPT-4, loses a lot of information: it can't directly observe tone, multiple speakers, or background noises, and it can't output laughter, singing, or express emotion,” OpenAI said.
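The three-stage pipeline described above can be sketched as follows. Every function name here is a hypothetical placeholder for illustration, not a real OpenAI API; the point is simply that the stages run in sequence, their latencies add up, and the middle model only ever sees a transcript:

```python
# Hypothetical sketch of the pre-GPT-4o Voice Mode pipeline.
# None of these functions are real OpenAI APIs; they stand in for
# the three models the article describes running in sequence.

def transcribe_audio(audio: bytes) -> str:
    """Stage 1: a simple speech-to-text model."""
    return "user question as text"

def generate_reply(text: str) -> str:
    """Stage 2: GPT-3.5 or GPT-4, which takes text in and puts text out."""
    return "model answer as text"

def synthesize_speech(text: str) -> bytes:
    """Stage 3: a simple text-to-speech model."""
    return b"answer as audio"

def voice_mode(audio: bytes) -> bytes:
    # Each hop strips information: stage 2 never hears tone,
    # background noise, or multiple speakers, only the transcript.
    return synthesize_speech(generate_reply(transcribe_audio(audio)))
```

GPT-4o replaces all three stages with a single model, so no transcript bottleneck sits between hearing and answering.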
With GPT-4o, however, OpenAI has integrated all of these capabilities into a single model trained end-to-end across text, vision, and audio, significantly reducing both latency and the information lost along the way.
“All inputs and outputs are processed by the same neural network,” OpenAI said. A neural network is an AI technique that teaches computers to process data in a way inspired by the human brain.
Still, OpenAI said it has “just scratched the surface” of GPT-4o's capabilities and limitations, given that it is the first model to integrate all these modalities.
What can GPT-4o not do?
Speaking of limitations, OpenAI acknowledged “some” across the GPT-4o model, including the inconsistent responses it showed in a reel of failed attempts. The company also demonstrated how well GPT-4o handles sarcasm.
Additionally, OpenAI said it continues to refine the model's behavior after training, which is important for addressing safety concerns, a major failure point in modern AI.
The company said it tested the model with more than 70 external experts in fields including social psychology, bias and fairness, and misinformation to identify risks, and that it has built new safety systems to act as guardrails on audio outputs.
“We recognize that GPT-4o's audio modalities present a variety of novel risks,” OpenAI said, adding that it “will continue to mitigate new risks as they're discovered.”
How much does GPT-4o cost?
The good news: it is free for all users, while paid users get “up to five times the capacity limits” of free users, OpenAI chief technology officer Mira Murati said in the announcement presentation.
For developers using the API, GPT-4o costs $5 per million input tokens and $15 per million output tokens.
Allowing free use of GPT-4o is beneficial to OpenAI and complements the company's other paid services.
In August, OpenAI launched ChatGPT Enterprise, a monthly plan whose price depends on the customer's requirements. It is the third tier, after the basic free service and the $20-per-month Plus plan.
The company launched its online ChatGPT store in January, giving users access to more than 3 million custom versions of GPT developed by OpenAI's partners and community.
As competition intensifies in the world of generative AI, OpenAI hopes to attract more users. And it has no shortage of rivals.
How does OpenAI stack up against its biggest rivals at this point?
OpenAI's move to introduce a new, free, and fast large language model shows how much pressure the company faces from its competitors in generative AI.
Google's Gemini, perhaps OpenAI's biggest rival in this field, was the first AI model to outperform human experts on massive multitask language understanding (MMLU), one of the most widely used benchmarks for testing an AI's knowledge and problem-solving abilities.
Gemini can be accessed with the Google One AI Premium plan for $19.99 per month. This includes 2 TB of storage, 10 percent back on Google Store purchases, and other features in Gmail, Google Docs, Google Slides, and Google Meet.
The company followed with Gemma in February, which aims to help developers and researchers “build AI responsibly” and is intended for unglamorous tasks such as basic chatbots and summarization jobs.
Meanwhile, Anthropic announced Claude 3 in March. This is a direct challenge to OpenAI, the leader in generative AI.
The company, which is backed by Google and Amazon, offers three tiers, Haiku, Sonnet, and Opus, each with increasing capability to suit users' needs.
Haiku costs $0.25 per million tokens (MTok) for input and $1.25 for output, while Sonnet costs $3 and $15. Opus is the most expensive, at $15 and $75.
For comparison, OpenAI's GPT-4 Turbo costs $10 per million input tokens and $30 per million output tokens, with a smaller context window of 128,000 tokens.
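A rough cost comparison can be computed from the per-million-token prices quoted in this article. These figures may be out of date, so check each vendor's current pricing page before relying on them:

```python
# Per-million-token prices (USD) as quoted in this article; verify
# against each vendor's current pricing page before using.
PRICES = {
    "GPT-4o":        {"input": 5.0,  "output": 15.0},
    "GPT-4 Turbo":   {"input": 10.0, "output": 30.0},
    "Claude Haiku":  {"input": 0.25, "output": 1.25},
    "Claude Sonnet": {"input": 3.0,  "output": 15.0},
    "Claude Opus":   {"input": 15.0, "output": 75.0},
}

def cost_usd(model: str, input_tokens: int, output_tokens: int) -> float:
    """Total cost for a workload, given token counts in each direction."""
    p = PRICES[model]
    return (input_tokens * p["input"] + output_tokens * p["output"]) / 1_000_000

# Example: a workload of 2M input tokens and 1M output tokens on GPT-4o.
print(cost_usd("GPT-4o", 2_000_000, 1_000_000))  # -> 25.0
```

The same workload would cost $2 on Claude Haiku and $105 on Claude Opus, which is why vendors tier their models by capability and price.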
Microsoft, OpenAI's biggest backer, charges $20 a month for its Copilot Pro service, which promises faster performance and access to “everything” the service has to offer. There is a free Copilot tier for those who do not want to pay, but with limited functionality.
And then there's OpenAI's friend-turned-foe, Elon Musk's xAI Grok.
The current version of Grok, Grok-1.5, is available only to subscribers of X's Premium+ tier, which starts at $16 per month or $168 per year.
Regional players are also vying for a lead. On Monday, Abu Dhabi's Technology Innovation Institute introduced Falcon 2, the second version of its large language model, to compete with models developed by Meta, Google, and OpenAI.
Also on Monday, Core42, a unit of Abu Dhabi artificial intelligence and cloud company G42, launched Jais Chat, a bilingual Arabic-English chatbot developed in the UAE that is free to download and use on Apple's iPhone.
Updated: May 15, 2024, 10:34 a.m.
