We now have a sobering picture of how often AI models get the facts right. This week, Google DeepMind introduced the FACTS benchmark suite, which measures how reliably AI models produce factually accurate answers.
The suite tests models in four areas: answering factoid questions from their internal knowledge, using web search effectively, grounding answers in long documents, and interpreting images. The best performer, Google's Gemini 3 Pro, scored 69% accuracy, while other leading models fell well below that.
For context, if any of the reporters I manage submitted a story that was 69% accurate, I would fire them.
Beyond journalism, this number should matter to companies betting on AI. Although models excel in speed and fluency, their factual reliability still lags far behind human expectations, especially for tasks involving niche knowledge, complex reasoning, or precise grounding in source material.
In fields such as finance, medicine, and law, even a small factual error can have a huge impact. This week, my talented colleague Melia Russell looks at how law firms are grappling with AI models being treated as sources of legal truth. It's an awkward fit. She details how one firm fired an employee for submitting a document full of fabricated case citations after it was drafted with ChatGPT.
The FACTS benchmark is both a warning and a roadmap. By quantifying where and how models fail, Google hopes to accelerate progress. For now, though, the takeaway is clear: AI is improving, but it still gets the facts wrong roughly one time in three.
Sign up for BI's Tech Memo newsletter here. Contact me by email at abarr@businessinsider.com.
