AI still doesn’t seem to work very well in business

Interview: Enterprise organizations are still struggling to understand how AI fits into their business, but that may be for the best, as it takes time to understand the problems caused by AI-generated code and content.

“Right now, no one knows what the right reference architecture or use case is for their institution,” Dorian Smiley, co-founder and chief technology officer of AI advisory service Codestrap, said in an interview with The Register. “A lot of people pretend to know, but they don’t have a strategy to help them.”

Smiley and his co-founder CEO Connor Deeks, who previously worked at global consulting firm PwC, launched their own shop to help organizations navigate their AI strategies.

They argue that companies chasing AI have gotten ahead of themselves.

“From a large language model perspective, people don’t really address the fallibility of the underlying text,” Deeks says.

Deeks argues that if AI systems were built from first principles, they would be vastly different from what is currently available. There’s a lot of talk about the demise of software engineering and office work, and “we don’t agree with that at all,” he said.

He also argues that companies don’t want to believe that either. “In most cases, they don’t want to believe that everyone will be laid off and there will be no one working for them, especially in the technology and intelligence departments within these agencies,” he said.

Missing metrics

Smiley argues that the first step for organizations considering AI is to experiment and iterate in a feedback loop. The reason for this, he said, is that AI is not working well yet.

“It’s not working very well in coding,” Smiley said. “Let me give you an example. Code may look correct and pass unit tests, but it may still be wrong. The way you measure that is typically through benchmark testing. So a lot of these companies aren’t working on proper feedback loops to see how their AI coding impacts the outcomes they care about. Lines of code, number of [pull requests], these are debts. These are not measures of engineering excellence.”

According to Smiley, measures of engineering excellence include metrics such as deployment frequency, lead time to production, change failure rate, mean time to recovery, and incident severity. And he argues that a new set of metrics is needed to measure how AI will impact engineering performance.

“We don’t know what they are yet,” he said.
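The established delivery metrics Smiley lists can be computed from ordinary deployment records. The following is a minimal sketch, not anything Codestrap described: the `Deployment` record, its field names, and the `delivery_metrics` function are all hypothetical, and incident severity is left out because it needs a separate incident log.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional


@dataclass
class Deployment:
    """Hypothetical record of one production deployment (illustrative fields)."""
    committed_at: datetime                   # when the change was committed
    deployed_at: datetime                    # when it reached production
    failed: bool = False                     # did it cause a production failure?
    recovered_at: Optional[datetime] = None  # when service was restored, if it failed


def mean_hours(deltas: list[timedelta]) -> float:
    """Average a list of durations, in hours (0.0 for an empty list)."""
    if not deltas:
        return 0.0
    return sum(deltas, timedelta()).total_seconds() / 3600 / len(deltas)


def delivery_metrics(deployments: list[Deployment], period_days: int) -> dict[str, float]:
    """Compute four delivery metrics over a reporting period."""
    if not deployments or period_days <= 0:
        raise ValueError("need at least one deployment and a positive period")

    lead_times = [d.deployed_at - d.committed_at for d in deployments]
    failures = [d for d in deployments if d.failed]
    recoveries = [
        d.recovered_at - d.deployed_at
        for d in failures
        if d.recovered_at is not None
    ]

    return {
        "deployments_per_day": len(deployments) / period_days,
        "mean_lead_time_hours": mean_hours(lead_times),
        "change_failure_rate": len(failures) / len(deployments),
        "mean_time_to_recovery_hours": mean_hours(recoveries),
    }
```

Tracking these numbers before and after a team adopts AI coding tools is one way to build the feedback loop Smiley describes, though it does not answer his open question about which new metrics are needed.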

3.7x more lines of code and 2,000x worse performance

One metric that might be helpful, he says, is measuring the tokens spent to reach approved pull requests, or officially approved software changes. Things like this need to be evaluated to determine if AI can help your organization’s engineering operations.
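As a rough illustration of that metric, here is a minimal sketch. The `PullRequest` record and its fields are hypothetical, and charging the tokens spent on rejected pull requests against the approved ones is an assumption about how the metric might be defined, not something Smiley specified.

```python
from dataclasses import dataclass


@dataclass
class PullRequest:
    """Hypothetical record of one AI-assisted pull request."""
    tokens_spent: int  # model tokens consumed while producing the change
    approved: bool     # whether a reviewer approved the change


def tokens_per_approved_pr(prs: list[PullRequest]) -> float:
    """Total tokens spent (rejected work included) per approved pull request."""
    approved_count = sum(1 for pr in prs if pr.approved)
    if approved_count == 0:
        raise ValueError("no approved pull requests in this sample")
    total_tokens = sum(pr.tokens_spent for pr in prs)
    return total_tokens / approved_count
```

Under this definition, a team whose AI output is frequently rejected sees the figure rise even if each individual request is cheap, which is the kind of signal a raw pull request count hides.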

To highlight the impact of not having that type of data, Smiley pointed to a recent effort to rewrite SQLite in Rust using AI.

“All unit tests pass and the code appears to be in the correct shape,” he said. “There are 3.7 times more lines of code, and the performance is 2,000 times worse than real SQLite. For a database, 2,000 times worse means the product is not viable. It’s a dumpster fire. Throw it away. All the money you spent on it is wasted.”

All the optimism about using AI for coding comes from measuring the wrong things, Smiley argues.

“Coding works if you measure lines of code and pull requests,” he said. “When it comes to measuring quality and team performance, coding doesn’t work. There’s no evidence that it’s moving in the right direction.”
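The gap between passing unit tests and being viable can be checked with a benchmark gate that compares a candidate implementation against a reference. This is a minimal sketch under stated assumptions: the function names and the 1.5x `max_slowdown` threshold are hypothetical, and a real benchmark would need representative workloads and a quiet machine.

```python
import time
from typing import Callable


def median_seconds(fn: Callable[[], object], repeats: int = 5) -> float:
    """Time fn several times and return the middle sample (upper middle if even)."""
    if repeats < 1:
        raise ValueError("repeats must be at least 1")
    samples = []
    for _ in range(repeats):
        start = time.perf_counter()
        fn()
        samples.append(time.perf_counter() - start)
    samples.sort()
    return samples[len(samples) // 2]


def slowdown(reference_seconds: float, candidate_seconds: float) -> float:
    """How many times slower the candidate is than the reference."""
    if reference_seconds <= 0:
        raise ValueError("reference time must be positive")
    return candidate_seconds / reference_seconds


def passes_benchmark_gate(
    reference_seconds: float,
    candidate_seconds: float,
    max_slowdown: float = 1.5,  # hypothetical threshold; pick one per workload
) -> bool:
    """True if the candidate is within the allowed slowdown of the reference."""
    return slowdown(reference_seconds, candidate_seconds) <= max_slowdown
```

A rewrite that passes every unit test but runs 2,000 times slower than the reference would fail this gate immediately, whatever its line count or pull request volume.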

No free lunch

Deeks pointed to recent outages at Amazon and AWS, which Amazon claims are unrelated to AI, as indicators of what’s to come.

“The other way to look at it is, there’s no free lunch here,” Smiley said. “We know what the limits of a model are. It’s hard to teach new facts. It’s hard to get facts reliably. The forward pass through a neural net is non-deterministic, especially if you have an inference model that engages in an internal monologue to increase the efficiency of predicting the next token. That means you’re going to get a different answer every time, right? That monologue will be different.”

“And models don’t have inductive reasoning capabilities. They can’t check their own behavior. They don’t know whether the answers they give are correct or not. These are fundamental problems that no one has solved in LLM technology. And are you saying it won’t show up as a code quality problem? Of course it will.”

Smiley argues that new metrics are essential because there are already millions of lines of AI-generated code that are never reviewed by humans.

In the context of business applications, Deeks noted that Deloitte had to refund the Australian government for consulting work because of errors in AI-generated reports.

“We know that large consulting firms are adopting this at scale to create PowerPoint materials,” says Deeks. “It’s going to show up in huge lawsuits and losses because quality isn’t really being tracked. Everyone has bought into this fairy tale that it’s already perfect.”

Smiley expects that applying AI to office operations will run into problems similar to those of applying AI to coding. But the lack of benchmark testing for hallucinated business advice will make identifying AI errors even more difficult.

“The other challenge here is that the incentives are misaligned,” Smiley said. At Big Four firms like PwC, he said, partners want more revenue and higher profit margins.

“You give them AI, but what are they going to do?” he asked. “Fewer humans, more work means more revenue, higher profits. It doesn’t make sense for every human on the team to use AI but review all of the AI’s output. These incentives don’t align. The director’s incentive is to stop talking to the employees, because the employees don’t know anything. [The director is going to] use AI to perform the employees’ jobs. The incentive for employees is to finish work early and go to the beach. All these incentives are not aligned in a way that allows AI to complement the business and deliver results.”

Smiley predicts that “issues related to code quality will surface within eight to nine months for heavy AI users.”

Deeks expects more lawsuits as bad advice causes problems.

When companies find out that a service company uses AI, they ask for discounts.

“People are going to continue to feel the pressure of, ‘We have to adopt this, we have to make AI decisions.’ They’re going to put this stuff into production, whether it’s business workflows or engineering groups. And as the disruption accelerates, a lot of people are going to lose their jobs.”

Another possible outcome, Smiley said, is price pressure. When companies learn that a service company is using AI tools, they will ask for discounts.

Deeks said extreme price pressures are starting to surface. “Even KPMG put pressure on another accounting firm to lower its prices because it claims to be using AI,” he said. “Customers are now saying things like, ‘Hey, I’m using AI to create my PowerPoint decks. I want to pay less.’”

Another pressing issue is that major insurance companies are becoming cautious about underwriting policies that cover companies against AI risks.

“Insurance underwriters are now working hard to remove coverage from insurance policies where AI is applied and there is no clear chain of liability,” Smiley said. “So let’s say you’re a Big Four, and you actually get sued, and there’s pricing pressure, and the pace of the market outpaces your ability to adapt, and now the underwriter says, ‘By the way, we’re not going to cover you.’”

“One of our friends is a senior vice president at one of the largest insurance companies in the country, and he told us straight up that this is a very real problem and he doesn’t understand why people aren’t talking about it more,” Deeks said.

He said insurers are already lobbying state-level insurance regulators to obtain commercial liability carve-outs so that they are not required to cover AI-related workflows. “It destroys the entire system,” Deeks says.

Smiley added: “The question here is, if this is such a great thing, why are underwriters going to so much effort to prohibit coverage for this stuff? They’re generally pretty good at risk profiling.”

Rather than citing these issues as signs of impending collapse, Deeks said he hopes people in the industry find motivation to seriously discuss the issues that need to be overcome.

“Can we actually talk about it?” he asks. “As opposed to everyone talking about AGI [artificial general intelligence] and how it will all take over in a utopian future?”

Deeks argues that we need more clarity on what AI means for finance, underwriting, and the actual operation of business and business systems. ®
