When OpenAI’s ChatGPT became publicly available in late 2022, it set off a huge craze around generative AI chatbots. Since then, ChatGPT has managed to maintain a large portion of the market share despite the presence of many strong competitors such as Gemini, Grok, Claude, Qwen, DeepSeek, and Mistral.
However, in a study by British company Prolific, ChatGPT ranks only eighth among the best AI models, behind several Gemini models, Grok models, DeepSeek models, and even a model from French company Mistral. The company has created a proprietary benchmark called Humaine, which it says is “built to understand AI performance through the lens of natural human interaction.”
“Current evaluations are heavily biased toward metrics that are meaningful to researchers but opaque to everyday users, such as accuracy on specialized datasets or performance on difficult inference tasks. This creates a disconnect between what is optimized and what people actually value,” the company said in a blog post.
The company also noted that even leaderboards tailored to human preferences can be inadequate if they are not designed with scientific rigor. It added that platforms that ask everyone to vote for their favorite model are susceptible to sampling bias and may overrepresent tech-savvy users.
The new leaderboard aims to address this issue with automated quality monitoring to ensure participants are approaching the task thoughtfully.
ChatGPT ranks below these AI models
According to Humaine research, the top 10 AI models are:
1. Gemini 2.5 Pro (Google)
2. DeepSeek v3 (DeepSeek)
3. Magistral Medium (Mistral)
6. Gemini 2.5 Flash (Google)
7. DeepSeek R1 (DeepSeek)
10. Gemini 2.0 Flash (Google)
Notably, this study was published in September, at a time when Google had not yet released its Gemini 3 Pro model and xAI had not yet rolled out its Grok 4.1 and Grok 4.1 Thinking models.
Considering that Gemini 2.5 Pro has consistently topped various leaderboards since its launch, its position at the top of this benchmark is not at all surprising. However, the fact that OpenAI’s models are not ranked in the top five, and even lag behind the likes of DeepSeek, Grok, and Mistral, is a surprising development if the results are to be believed.
The researchers did not say why ChatGPT was listed so low in the rankings, but they did note that Google’s Gemini 2.5 Pro consistently ranks as the top model in the “Overall Winner” metric.
