OpenAI’s “smartest” new o3 and o4-mini models launched: what it means for AI models to “reason” | News explained



On April 16, OpenAI released two new artificial intelligence (AI) reasoning models, OpenAI o3 and o4-mini. The company says these are the latest in a “series of models trained to think longer before responding.” The company calls these the “smartest models” it has released and says they “represent a step change in ChatGPT’s capabilities for everyone from curious users to advanced researchers.”

The company said both models were trained using “reinforcement learning”, a technique previously used by other AI companies, including Chinese startup DeepSeek. OpenAI also claims that, compared with previous versions, the new models should “feel more natural and conversational, especially as it references memories and past conversations to make responses more personalized and relevant.”

What exactly are the processes underlying these improvements, and how are they different from what we’ve experienced with AI chatbots and programs to date? Let’s explain.

First, why is reasoning important in the world of AI?

When large language models (LLMs) such as ChatGPT and Google Gemini were first released, their appeal lay in their fast and fairly consistent responses, even if some of those responses were flawed.

Essentially, these tools recognize patterns in large amounts of data and generate responses to user prompts through a series of predictions and calculations. At a basic level, they predict the next most likely word in a sequence of words.

“Once a chatbot starts responding to you…a tremendous number of calculations are performed to determine what the first word of the response should be. After the chatbot has output (say, 100 words), it considers the prompt and the first 100 words it has generated so far to decide which word makes the most sense,” Princeton University researchers Arvind Narayanan and Sayash Kapoor write in their book, AI Snake Oil: What Artificial Intelligence Can Do, What It Can’t, and How to Tell the Difference.
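The idea of predicting the next most likely word can be sketched in a few lines of code. This is a deliberately tiny illustration, not how ChatGPT actually works: real LLMs use neural networks conditioned on the entire prompt, while this toy simply counts which word follows which in a sample sentence and greedily picks the most frequent successor.

```python
from collections import Counter, defaultdict

# Toy corpus: the "training data" for our miniature word predictor.
corpus = "the cat sat on the mat and the cat slept on the mat".split()

# Count, for each word, how often each other word follows it.
successors = defaultdict(Counter)
for word, nxt in zip(corpus, corpus[1:]):
    successors[word][nxt] += 1

def predict_next(word):
    """Return the word most often seen after `word` in the corpus."""
    return successors[word].most_common(1)[0][0]

def generate(start, n_words):
    """Extend `start` one predicted word at a time.

    Unlike a real LLM, this only looks at the previous word rather
    than the whole sequence generated so far.
    """
    seq = [start]
    for _ in range(n_words):
        seq.append(predict_next(seq[-1]))
    return " ".join(seq)

print(generate("the", 4))
```

Scaled up to trillions of words of training text and a far richer notion of context, this prediction loop is the basic mechanism the book's passage describes.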

Where do these LLMs get the data to make their calculations and predictions? Primarily from the internet, everything from Wikipedia articles to books. There has been an understanding among AI companies that one way to improve an LLM is to feed it more data. With more data, patterns can be better understood and translated into more sophisticated responses.


But by 2024, AI companies had tapped nearly all the text available on the internet.

Questions then arose about possible next steps to improve LLMs. In September 2024, OpenAI released its first reasoning model, o1, a model that “thinks before responding” and “can generate a long internal chain of thought before responding to the user.” It was trained through reinforcement learning.

What is reinforcement learning?

In Reinforcement Learning: An Introduction, computer scientists Andrew Barto and Richard S. Sutton, known for pioneering algorithms in reinforcement learning, wrote: “Whether we are learning to drive a car or having a conversation, we are acutely aware of how the environment responds to our actions, and we seek to influence what happens through our actions. Learning from interaction is a fundamental idea underlying nearly every theory of learning and intelligence.”

They explain that every action, even something as simple as making breakfast, involves evaluation and interaction with one’s surroundings to produce the desired effect. Sutton and Barto developed a reinforcement learning computational algorithm based on the concept of “reward” in the 1980s.


“The field of artificial intelligence (AI) is generally concerned with building agents, entities that know and act. A smarter agent is one that chooses a better course of action. The concept that some courses of action are better than others is therefore central to AI. Reward (a term borrowed from psychology and neuroscience) refers to signals provided to an agent related to the quality of its actions. Reinforcement Learning (RL) is the process of learning to behave better when given this signal,” the award citation reads. Sutton and Barto were awarded the 2024 Turing Award, considered the Nobel Prize of computer science.
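The reward signal the citation describes can be made concrete with a minimal sketch. The setup below (action names, reward probabilities, and the exploration rate are all invented for illustration; this is not OpenAI's or Sutton and Barto's actual code) shows an agent that tries actions, receives a numeric reward, and gradually shifts toward the actions that pay off:

```python
import random

random.seed(0)  # fixed seed so the run is reproducible

# Hidden environment: each hypothetical action pays a reward (1) with
# some probability the agent does not know in advance.
true_reward_prob = {"action_a": 0.2, "action_b": 0.8}

# The agent's running estimate of each action's value, and trial counts.
value = {"action_a": 0.0, "action_b": 0.0}
counts = {"action_a": 0, "action_b": 0}

for step in range(2000):
    # Explore a random action 10% of the time; otherwise exploit
    # whichever action currently looks best.
    if random.random() < 0.1:
        action = random.choice(list(value))
    else:
        action = max(value, key=value.get)
    # The environment responds: 1 is the "cookie", 0 the "bad dog".
    reward = 1 if random.random() < true_reward_prob[action] else 0
    # Incremental average: nudge the estimate toward the observed reward.
    counts[action] += 1
    value[action] += (reward - value[action]) / counts[action]

print(max(value, key=value.get))  # the agent ends up preferring action_b
```

After a few hundred trials the agent’s value estimates approach the true reward probabilities, so it learns to prefer the higher-paying action purely from the reward signal, which is reinforcement learning in its simplest form.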

“It’s a bit like training a dog,” OpenAI researcher Jerry Tworek told The New York Times of the approach. “If the system works, I give them a cookie. If it doesn’t work, I say, ‘Bad dog.’”

So how are reasoning models different?

Reasoning models arrive at answers to user queries in more complex, multi-step ways. “In previous models like ChatGPT, it starts responding as soon as you ask a question… This model can be slow. It can think through the problem in English, break it down and look for angles to provide the best answer,” Jakub Pachocki, chief scientist at OpenAI, previously told The New York Times.

Through “reasoning,” the model considers different approaches and solutions to a prompt, recognizing patterns to arrive at an answer. OpenAI claims that the o3 model is “best suited for complex queries” where “the answer is not immediately obvious.”
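The contrast between answering immediately and working through intermediate steps can be illustrated with a toy example. This is purely illustrative, not how o3 operates internally; the function names and the shopping-bill problem are invented for the sketch:

```python
def answer_fast(items, price, discount):
    """Answer in one shot, showing no intermediate work."""
    return items * price * (1 - discount)

def answer_with_reasoning(items, price, discount):
    """Break the problem into plain-English steps before answering,
    loosely mimicking a chain of thought."""
    steps = []
    subtotal = items * price
    steps.append(f"{items} items at {price} each -> subtotal {subtotal}")
    saved = subtotal * discount
    steps.append(f"{discount:.0%} discount -> save {saved}")
    total = subtotal - saved
    steps.append(f"final total -> {total}")
    return steps, total

steps, total = answer_with_reasoning(3, 10, 0.25)
print("\n".join(steps))
print(total)
```

Both paths reach the same number here, but the step-by-step version exposes intermediate results that can be checked, revised, or explored from different angles, which is the behavior reasoning models aim for on genuinely hard queries.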

Story continues below this ad

The jury is still out on whether this means these AI systems “reason” or “think” like humans, and whether this approach is the path to the next frontier of AI. But for now, it appears to be the route AI research companies are taking in pursuit of continued improvement.




