Doing your taxes with AI is likely to backfire spectacularly.

Tax season, the most dreaded time of the year, is upon us. But if you were hoping that the latest AI technology could help you with your tedious paperwork and perhaps find a way to save you a few bucks, think again.

After testing four of the major AI chatbots, the New York Times found that every one of them had trouble selecting and filling out the correct forms and fumbled important calculations. On average, the bots miscalculated the tax owed to the IRS by more than $2,000.

“The problem with taxes is that every tiny detail matters, and you can’t get every detail right,” Benedict Evans, an analyst who writes a technology newsletter, told the New York Times.

“These models improve dramatically every six months,” he continued. “But they still give you roughly the right answer, which is not what you want.”

AI can help process and summarize large amounts of information, but it struggles with accuracy in almost every area. Chatbots often fabricate facts even when asked to summarize a single document. AI programming assistants introduce bugs into code. Image generators produce strange visual artifacts and inconsistencies.

Arithmetic is no exception. Combine that with the byzantine tax code and all of its highly specialized forms, and you’ve got a recipe for an arduous and expensive run-in with the IRS, if not an outright disaster.

To test the AI models (OpenAI’s ChatGPT, Anthropic’s Claude, Google’s Gemini, and xAI’s Grok), the New York Times had them work through a series of tax scenarios drawn from training materials published by the tax service TaxSlayer. The models’ performance only began to improve after they were given very specific instructions, such as where to enter each piece of information on each IRS form.

Some might argue that this defeats the point of using automation tools in the first place. After all, the average person pays for overpriced tax software precisely because they don’t know the nitty-gritty of the process. Software like TurboTax and TaxAct is “procedural, following ‘if-then’ logic built for mathematical precision,” Erik Brynjolfsson, a senior research fellow at the Stanford Institute for Human-Centered AI, told the New York Times. Large language models, on the other hand, are predictive engines that “can perform superhumanly at many tasks, but fail at some tasks that seem easier to humans.”

Perhaps the best example of how a hallucinating LLM can ruin your tax homework? TurboTax’s own experiment with the technology. When the company deployed its “Intuit Assist” chatbot to answer tax questions, it spat out irrelevant answers. Even when its answers were on topic, they were often wrong.
