Companies want AI to be better than the average person. That's not easy

AI For Business


Hello, and welcome to Eye on AI… In this edition: Meta poaches Apple's top AI researchers… an energy executive warns that AI data centers could destabilize electrical grids.

We promised to bring you additional insights from the “Professional Future” roundtable we attended at Oxford's business school last week. One of the most interesting discussions concerned the performance standards companies use when deciding whether to deploy AI.

Most companies use existing human performance as the benchmark against which AI is judged. But beyond that, the decisions become complicated and subtle.

Simon Robinson, executive editor at the news agency Reuters, which has begun using AI in a variety of ways in its newsroom, said his company has pledged not to deploy an AI tool in news production unless its average error rate is better than that of the people who perform the same task. So, for example, Reuters can deploy AI translation software only if, on average, it makes fewer errors than human translators.

This “better than humans on average” standard is the one most companies use. In many cases, however, it may not be appropriate. Utham Ali, BP's global responsible AI executive, said the oil giant wanted to see whether a large language model (LLM) could act as a decision-support system, advising its human safety and reliability engineers. One experiment checked whether an LLM could pass the safety engineering exam that BP requires all of its safety engineers to take. Ali didn't say which AI model was used.

According to Ali, however, the 8% of questions the AI system got wrong gave the BP team pause. How often do humans miss those particular questions? And why did the AI system get them wrong? The fact that BP's experts had no way of knowing why the LLM missed those questions meant the team wasn't confident about deploying it.

BP's concerns apply to many other uses of AI. Take AI that reads medical scans. These systems are often evaluated by comparing their average performance to that of human radiologists, but the average may not tell us what we need to know. For example, we wouldn't want to deploy an AI that is better than human doctors on average at detecting abnormalities but more likely to miss the most aggressive cancers. Often, performance on a subset of the most consequential decisions matters more than average performance.
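To make the point concrete, here is a toy sketch with entirely hypothetical numbers (not from BP, Reuters, or any real study): an AI can beat humans on overall average error rate while still being worse on the small subset of cases that matter most.

```python
# Hypothetical evaluation set: 1,000 scans, 50 of which show aggressive cancers.
# All counts below are invented for illustration only.

def error_rate(errors: int, total: int) -> float:
    """Fraction of cases the reader got wrong."""
    return errors / total

ai_errors_overall, human_errors_overall = 30, 50      # errors out of 1,000 scans
ai_errors_critical, human_errors_critical = 10, 5     # errors out of 50 aggressive cases

ai_avg = error_rate(ai_errors_overall, 1000)          # 0.03 → 3% average error
human_avg = error_rate(human_errors_overall, 1000)    # 0.05 → 5% average error

ai_crit = error_rate(ai_errors_critical, 50)          # 0.20 → misses 20% of aggressive cancers
human_crit = error_rate(human_errors_critical, 50)    # 0.10 → misses 10% of aggressive cancers

# The AI wins on the average benchmark...
assert ai_avg < human_avg
# ...yet misses twice as many of the most aggressive cancers.
assert ai_crit > human_crit

print(f"AI avg error {ai_avg:.1%} vs human {human_avg:.1%}")
print(f"AI critical-subset error {ai_crit:.1%} vs human {human_crit:.1%}")
```

The design point is simply that an evaluation should report error rates per subset (here, the aggressive-cancer cases) alongside the overall average, since the average alone can hide exactly the failures you most care about.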
