Which AI chatbot has excellent calculation accuracy?

Gemini, ChatGPT, Grok: Which AI chatbot has the best calculation accuracy?

Researchers analyzed the accuracy of five AI models using 500 everyday math prompts. A striking result: the AI had roughly a 40% chance of getting the answer wrong.

The Omni Research on Calculation in AI (ORCA) study shows that when AI chatbots are asked to perform routine math, their accuracy varies widely, both between AI companies and between different types of mathematical tasks.

The models tested include:

Gemini 2.5 Flash (Google)
ChatGPT-5 (OpenAI)
DeepSeek V3.2 (DeepSeek)
Grok-4 (xAI)

No AI model scored higher than 63% on everyday math. Even the leader, Gemini (63%), still gets nearly 4 out of 10 questions wrong.

Grok achieved a similar score at 62.8%, while DeepSeek came in third at 52%, followed by ChatGPT at 49.4%.
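To put these percentages in concrete terms, the sketch below converts each reported overall score into an approximate number of correct and incorrect answers out of the 500 prompts. The scores are the ones quoted above; the model labels and the helper name `summarize` are illustrative, and counts are rounded to whole prompts, so they are estimates rather than figures published by the study.

```python
# Overall accuracy scores as reported in the article (percent of 500 prompts).
TOTAL_PROMPTS = 500
scores = {
    "Gemini 2.5 Flash": 63.0,
    "Grok-4": 62.8,
    "DeepSeek V3.2": 52.0,
    "ChatGPT-5": 49.4,
}


def summarize(scores, total):
    """Return (model, accuracy, correct, wrong) rows sorted by accuracy, best first."""
    rows = []
    for model, acc in sorted(scores.items(), key=lambda kv: kv[1], reverse=True):
        correct = round(total * acc / 100)  # approximate number of correct answers
        rows.append((model, acc, correct, total - correct))
    return rows


for model, acc, correct, wrong in summarize(scores, TOTAL_PROMPTS):
    print(f"{model:<18} {acc:5.1f}%  correct ~{correct:3d}  wrong ~{wrong:3d}")
```

For Gemini, about 185 wrong answers out of 500 is 37%, which matches the "nearly 4 out of 10" figure.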

AI accuracy peaks in math and conversions, but hits record lows in physical tasks

Performance varies by category. In math and conversions (147 of the 500 prompts), Gemini leads with 83%, followed by Grok at 76.9% and DeepSeek at 74.1%.

According to Euronews, ChatGPT scored 66.7% in this category, and the five models together averaged 72.1%, the highest of the seven categories.
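The same arithmetic works for a single category. The sketch below estimates how many of this category's 147 prompts each model answered correctly, then shows that when every model answers the same prompts, the pooled accuracy (all correct answers over all attempts) equals the simple average of the per-model percentages, up to rounding. The article's 72.1% figure covers five models while only four category scores are reported here, so the four-model average computed below will not match it.

```python
# Category figures from the article: 147 prompts, four reported model scores (%).
CATEGORY_PROMPTS = 147
category_scores = {
    "Gemini 2.5 Flash": 83.0,
    "Grok-4": 76.9,
    "DeepSeek V3.2": 74.1,
    "ChatGPT-5": 66.7,
}

# Approximate number of prompts each model answered correctly in this category.
correct = {m: round(CATEGORY_PROMPTS * s / 100) for m, s in category_scores.items()}

# Pooled accuracy: all correct answers divided by all attempts across models.
pooled = 100 * sum(correct.values()) / (CATEGORY_PROMPTS * len(correct))

# Simple mean of the per-model percentages.
mean = sum(category_scores.values()) / len(category_scores)

print(correct)
print(f"pooled {pooled:.1f}%  mean {mean:.1f}%")
```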

To be safe, users are advised to verify results with a calculator or double-check them with another prompt.
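As a minimal sketch of the "use a calculator" advice, the Python below recomputes a hypothetical bill-splitting answer with exact decimal arithmetic and compares it with whatever number a chatbot returned. The functions `split_bill` and `check_answer`, the bill amount, and the claimed answers are all invented for illustration; none of them come from the ORCA study.

```python
from decimal import Decimal, ROUND_HALF_UP


def split_bill(total: str, tip_percent: str, people: int) -> Decimal:
    """Exact per-person share of a bill plus tip, rounded to cents (hypothetical helper)."""
    amount = Decimal(total) * (1 + Decimal(tip_percent) / 100)
    return (amount / people).quantize(Decimal("0.01"), rounding=ROUND_HALF_UP)


def check_answer(claimed: str, expected: Decimal, tolerance: str = "0.01") -> bool:
    """True if a chatbot's claimed result is within `tolerance` of the exact value."""
    return abs(Decimal(claimed) - expected) <= Decimal(tolerance)


expected = split_bill("86.40", "18", 3)
print(expected)
print(check_answer("33.98", expected))  # a claimed answer that matches
print(check_answer("35.05", expected))  # a claimed answer that does not
```

Using strings as inputs to `Decimal` avoids the binary floating-point representation errors that a plain `float` would introduce before the check even starts.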

4 big mistakes made by AI models

The researchers sorted the mistakes into four types and noted that the main challenge lies in translating real-world situations into correct formulas.

Calculation errors

Here the AI may understand the question and the formula but fail during the actual calculation. This category includes precision and rounding issues (35% of errors) and arithmetic mistakes (33%).
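Precision and rounding problems are easy to reproduce in ordinary software, which shows why they matter. The snippet below uses Python's built-in floats and its standard `decimal` module. It is an analogy only: the article does not say how the chatbots compute internally, and language models do not necessarily do arithmetic this way.

```python
from decimal import Decimal, ROUND_HALF_UP

# Binary floating point cannot represent 0.1, 0.2 or 2.675 exactly.
print(0.1 + 0.2 == 0.3)  # False
print(round(2.675, 2))   # 2.67, because 2.675 is stored as 2.67499999...

# Decimal arithmetic with an explicit rounding rule gives the textbook result.
exact = Decimal("0.1") + Decimal("0.2")
print(exact == Decimal("0.3"))  # True
print(Decimal("2.675").quantize(Decimal("0.01"), rounding=ROUND_HALF_UP))  # 2.68
```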

Broken logic errors

This is one of the most serious error types, because it indicates that the AI struggles to understand what the problem is actually asking. It covers method and formula errors, such as incomplete mathematical approaches (14% of errors), as well as incorrect assumptions (12%).
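A classic method error (a hypothetical example, not taken from the study) is averaging speeds. Taking the mean of the speeds looks plausible but answers the wrong question: average speed is total distance divided by total time. The function names below are illustrative.

```python
def naive_average_speed(speeds):
    # Wrong method: arithmetic mean of the speeds, ignoring time spent at each.
    return sum(speeds) / len(speeds)


def true_average_speed(legs):
    # Correct method: total distance divided by total time.
    # `legs` is a list of (distance_km, speed_kmh) pairs.
    total_distance = sum(d for d, _ in legs)
    total_time = sum(d / v for d, v in legs)
    return total_distance / total_time


legs = [(60, 30), (60, 60)]  # 60 km out at 30 km/h, 60 km back at 60 km/h
print(naive_average_speed([v for _, v in legs]))  # 45.0
print(true_average_speed(legs))                   # 40.0
```

Every arithmetic step in the naive version is correct; the error lies entirely in choosing the formula, which is what makes this kind of mistake hard to spot.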

Misunderstanding instructions

This type of error occurs when the AI misinterprets the question itself. Examples include using the wrong parameters, making logical slips, and giving incomplete answers.
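Using the wrong parameters might look like this hypothetical loan question: a 6% annual interest rate has to be converted to a monthly rate before it goes into the standard amortized-payment formula, payment = P · r / (1 − (1 + r)^−n). The function name `monthly_payment` and the loan figures are illustrative assumptions.

```python
def monthly_payment(principal: float, monthly_rate: float, months: int) -> float:
    """Standard amortized loan payment: P * r / (1 - (1 + r) ** -n)."""
    if monthly_rate == 0:
        return principal / months
    return principal * monthly_rate / (1 - (1 + monthly_rate) ** -months)


principal, annual_rate, months = 10_000, 0.06, 36

# Correct reading: a 6% annual rate means 0.5% per month.
right = monthly_payment(principal, annual_rate / 12, months)

# Misread parameter: the annual rate is plugged in as if it were monthly.
wrong = monthly_payment(principal, annual_rate, months)

print(f"correct: {right:.2f}  misread: {wrong:.2f}")
```

Plugging the annual rate in directly more than doubles the computed payment, even though the formula itself is applied correctly.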

Question deflection

In some cases the AI simply rejected or ignored the question instead of attempting a specific answer. Rounding is another weak point, especially in multi-step calculations: an error at any step carries through, so the final result can end up far from the correct value.
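The multi-step rounding problem can be shown with a hypothetical purchase: a discount, then sales tax, then a quantity. The prices and rates below are invented for illustration. Rounding to cents after every step gives a different total from rounding once at the end; here the gap is a single cent, but it can grow with more steps or coarser rounding.

```python
from decimal import Decimal, ROUND_HALF_UP

CENT = Decimal("0.01")


def to_cents(x: Decimal) -> Decimal:
    """Round a Decimal amount to whole cents, half up."""
    return x.quantize(CENT, rounding=ROUND_HALF_UP)


price, discount, tax, qty = Decimal("19.99"), Decimal("0.075"), Decimal("0.0825"), 3

# Rounding after every step: each intermediate result is cut to cents.
step = to_cents(price * (1 - discount))
step = to_cents(step * (1 + tax))
step = to_cents(step * qty)

# Rounding only once, at the end.
once = to_cents(price * (1 - discount) * (1 + tax) * qty)

print(step, once)  # 60.06 60.05
```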

It is worth noting that the study used state-of-the-art models that are freely available to the public.

The study concludes with the following practical advice: if you want accurate answers to difficult word problems, ChatGPT is great. If you want to take a photo of a receipt or get an immediate response, Gemini wins. Finally, if you need speed and concise answers, Grok is a solid choice.

These results show that significant improvements are still needed before chatbots can deliver reliable math and conversational reasoning.
