Companies want AI to be better than the average person. That's not easy

AI For Business


Hello, and welcome to Eye on AI… In this edition: Meta poaches Apple's top AI researchers… an energy executive warns that AI data centers could destabilize electrical grids.

We promised to bring you additional insights from the “Professional Future” roundtable we attended at Oxford's business school last week. One of the most interesting discussions concerned the performance standards companies use when deciding whether to deploy AI.

Most companies use existing human performance as the benchmark against which AI is judged. But beyond that, the decisions become complicated and subtle.

Simon Robinson, executive editor at the news agency Reuters, which has begun using AI in a variety of ways in its newsroom, said his company has pledged not to deploy an AI tool in news production unless its average error rate is better than that of the people who perform the same task. So, for example, Reuters can deploy AI translation software only if, on average, it makes fewer errors than human translators.

This “better than humans on average” standard is the one most companies use. In many cases, however, it may not be appropriate. Utham Ali, BP's global responsible AI executive, said the oil giant wanted to see whether a large language model (LLM) could act as a decision-support system, advising its human safety and reliability engineers. One experiment checked whether an LLM could pass the safety engineering exam that BP requires all of its safety engineers to take. Ali didn't say which AI model was used.

According to Ali, however, the 8% of questions the AI system got wrong gave the BP team pause. How often do humans miss those particular questions? And why did the AI system get them wrong? The fact that BP's experts had no way of knowing why the LLM missed those questions meant the team wasn't confident about deploying it.

BP's concerns apply to many other uses of AI. Take AI that reads medical scans. These systems are often evaluated by comparing their average performance to that of human radiologists, but the average may not tell us what we need to know. For example, we wouldn't want to deploy an AI that is better than human doctors on average at detecting abnormalities but more likely to miss the most aggressive cancers. Often, performance on a subset of the most consequential decisions matters more than average performance.
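To make the point concrete, here is a toy sketch with entirely hypothetical numbers (not from BP, Reuters, or any real study): an AI can beat humans on overall average error rate while still being worse on the small subset of cases that matter most.

```python
# Hypothetical evaluation set: 1,000 scans, 50 of which show aggressive cancers.
# All counts below are invented for illustration only.

def error_rate(errors: int, total: int) -> float:
    """Fraction of cases the reader got wrong."""
    return errors / total

ai_errors_overall, human_errors_overall = 30, 50      # errors out of 1,000 scans
ai_errors_critical, human_errors_critical = 10, 5     # errors out of 50 aggressive cases

ai_avg = error_rate(ai_errors_overall, 1000)          # 0.03 → 3% average error
human_avg = error_rate(human_errors_overall, 1000)    # 0.05 → 5% average error

ai_crit = error_rate(ai_errors_critical, 50)          # 0.20 → misses 20% of aggressive cancers
human_crit = error_rate(human_errors_critical, 50)    # 0.10 → misses 10% of aggressive cancers

# The AI wins on the average benchmark...
assert ai_avg < human_avg
# ...yet misses twice as many of the most aggressive cancers.
assert ai_crit > human_crit

print(f"AI avg error {ai_avg:.1%} vs human {human_avg:.1%}")
print(f"AI critical-subset error {ai_crit:.1%} vs human {human_crit:.1%}")
```

The design point is simply that an evaluation should report error rates per subset (here, the aggressive-cancer cases) alongside the overall average, since the average alone can hide exactly the failures you most care about.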
