Tokenmaxxing is not an AI strategy • The Register

Machine Learning


How much does AI cost? This is a simple question, but an important one. The answers will determine the fate of companies and shape society. But this is also a question that cannot be answered in a meaningful way without additional context.

One possible response is “too much.” According to Stanford HAI’s 2026 AI Index Report, U.S. private AI investment reached $285.9 billion in 2025. That money brings economic benefits but puts stress on environmental resources, utilities, and communities.

“The power capacity of AI data centers will increase to 29.6 GW, comparable to the state of New York at peak demand, and annual GPT-4o inferred water usage alone could exceed the drinking water needs of 12 million people,” the report states.

A second cost is human: over-reliance on prompt slot machines takes a toll on performance as skills atrophy or never develop at all.

That toll, however, is difficult to measure in the short term. And given the current U.S. administration’s lack of regulatory restraint and indifference to public concerns, it’s probably easier to focus on the financial details until government and industry are forced to reckon with those concerns.

You can start with tokens, which are currently the basic unit for selling the inputs and outputs of AI models. Token pricing has become a major concern for developers with AI subscription plans, as plan providers like Anthropic and GitHub push customers away from token-subsidized subscriptions and towards pay-as-you-go consumption.

Devansh, a machine learning researcher, head of AI at legal startup Iqidis, and founder of an AI community group called Chocolate Milk Cult, did the math in a post published earlier this year. Under very specific assumptions, the answer is approximately $0.0038 per 1,000 tokens.

This is the base cost of inference on an Nvidia H100 GPU rented at $2.50 per hour and generating 185 tokens per second at 100 percent utilization.

But as Devansh observes, no one runs at 100 percent utilization. At 30 percent utilization, the cost rises to roughly $0.0125 per 1,000 tokens; at 10 percent, to roughly $0.038 per 1,000 tokens.
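The arithmetic behind those figures is simple enough to sketch. The following snippet uses the assumptions above (an H100 at $2.50 per hour producing 185 tokens per second when fully busy); the function name and figures are illustrative, not from Devansh's post:

```python
def cost_per_1k_tokens(gpu_hourly_rate: float,
                       tokens_per_second: float,
                       utilization: float) -> float:
    """Dollar cost to generate 1,000 tokens on a rented GPU.

    Utilization scales the effective throughput: a GPU that sits idle
    70 percent of the time still costs full rent, so each token costs more.
    """
    effective_tokens_per_hour = tokens_per_second * 3600 * utilization
    return gpu_hourly_rate / effective_tokens_per_hour * 1000

for u in (1.0, 0.3, 0.1):
    print(f"{u:>4.0%} utilization: ${cost_per_1k_tokens(2.50, 185, u):.4f} per 1K tokens")
```

At full utilization this yields about $0.0038 per 1,000 tokens, rising to about $0.0125 at 30 percent and $0.0375 at 10 percent, matching the figures above.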

Anthropic currently charges $5/M tokens (input) and $25/M tokens (output) for its latest model, Opus 4.7. For Google’s Gemma 4 26B A4B, the weighted average input price as of this writing is $0.096/M tokens, according to OpenRouter.

If you run the numbers on different hardware, different prices, different energy costs, different models, and different utilization rates, you will get different results.

“If you just look at what the labs are offering as a cost per API call, that is a very good signal for the Western labs of the cost of the token,” Devansh told The Register in a telephone interview.

“Some say Anthropic is going to get about a 50 percent gross margin. But in reality, the cost of a token is actually a lot of variables rolled into one. There’s the model, there’s the research behind the model, and there are continuous updates to the model that people don’t see. So you have to take all of that into account. Looking only at the cost of inference per call is actually not a very good way to look at the system.”

Devansh said organizations tend not to focus on the specific cost of the tokens because they are focused on providing services that customers value.

“In a lot of legal work, you can actually pass the cost on to the client, and the client won’t complain because they want transparency about what’s done and how it’s done,” he said. “From that perspective, as long as you can justify the cost, you don’t have to worry too much about how much it’s going to cost. … As long as you can consistently deliver value, I don’t think predicting costs is that much of a concern.”

Companies like Meta and Shopify made headlines for treating token usage as a key performance indicator, and employees answered the call by making heavy use of AI tools to demonstrate their value. This can quickly become costly and may not be very useful for more meaningful business metrics.

“Is token spending directly correlated to productivity?” Devansh said. “Absolutely not. I’ve done this research extensively. … There used to be lines of code, number of words typed, and other kinds of stupid productivity metrics. So this is just the latest in that stupid era. I think middle managers will always try to justify themselves and find ways they can rank people without using their wits.”

But one of the problems with LLMs, Devansh says, is that we don’t know the best ways to apply them. So there’s potential value in simply incentivizing people to spend tokens, in case they come up with new kinds of workflows that provide signals about what works and what doesn’t.

Bob Venero, CEO of IT consultancy Future Tech Enterprise, told The Register that his company tends to work with Fortune 100 clients, many of whom launch big-budget AI projects without really thinking about what they want to accomplish.

Venero said his company’s goal when engaging with customers is to understand desired business outcomes, which may or may not involve AI.

Future Tech’s recent work with Northrop Grumman involved AI: the consultancy helped the defense company implement the Nvidia Enterprise AI Factory to run AI workloads related to its projects.

Venero said businesses are struggling to assess the impact of AI in their environments, measure ROI, and discover how the technology can help them.

“So you have to do a lot of upfront work to identify where you want to spend your money and what the results will be, especially when costs are triple what they were six months ago,” he said, citing “RAMageddon,” the RAM shortage driven by the AI computing boom.

Venero points to OpenAI’s commitment to buy memory chips from Samsung and SK Hynix, as well as memory makers like Micron shifting capacity to high-bandwidth memory, as catalysts for the current RAM crunch. He said calculating the ROI for AI deployments is complicated because everything is becoming more expensive.

Cloud providers can help by offering pay-as-you-go pricing, but there are some concerns about that, he said.

“I’m not a big fan of off-premise AI,” he said. “From our perspective, it’s a little scary.”

Security concerns aside, Venero said the productivity risks of relying on the cloud are significant for large organizations. He pointed to Microsoft Office 365. “Has Office 365 ever gone down?” he said. “Many times. And those outages happen far too often.”

He said it’s probably acceptable for a company to lose $1,000 per minute of downtime due to a cloud outage. “If it costs $1 million a minute, you need to think about the controls you need to put in place, and that’s probably an on-premises solution,” he said.
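Venero's threshold can be framed as a simple break-even check. The outage estimate below is an illustrative assumption, not a figure from the interview; only the per-minute costs come from his example:

```python
def annual_outage_cost(cost_per_minute: float, outage_minutes_per_year: float) -> float:
    """Expected yearly cost of downtime at a given per-minute loss rate."""
    return cost_per_minute * outage_minutes_per_year

# Assume, for illustration, ~90 minutes of cloud outages per year.
small_shop = annual_outage_cost(1_000, 90)      # tolerable exposure
large_org = annual_outage_cost(1_000_000, 90)   # argues for on-prem controls

print(f"${small_shop:,.0f} vs ${large_org:,.0f} per year")
```

The asymmetry is the point: the same outage profile that costs one business $90,000 a year costs another $90 million, which is where Venero's case for on-premises controls kicks in.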

AI can worsen cloud stability through the introduction of poorly reviewed code and stress on infrastructure due to heavy use of AI. Venero said customers “are definitely aware of that. And if they’re not, we’re educating them.”

Considering the capacity challenges created by OpenClaw’s sudden popularity, Venero said: “People threw this into their environment and it did some crazy things. So we definitely need to have a conversation in the ecosystem about risk and the three different pillars of risk associated with it.”

And hyperscalers are contributing to the problem by emphasizing speed at the expense of quality, he said. “Right now, it’s a competition. Who’s going to win? Who’s going to take the most? And everyone’s putting their all into it. And it’s just creating this incredible mess.”

“What we want our customers to do is take a step back,” he said. “Think about what you want to achieve and why. Look at the investments involved and the appropriate schedule for doing it, and measure the results.”

Approaching AI thoughtfully and intentionally makes it more likely that AI projects will reach production.

Venero said that among the companies he has seen, about 15 percent of AI prototypes make it into production before clients are educated about AI. With guidance, that number rises to 45 or 50 percent, he said.

“This is very use case specific,” he said. “And if you get the outcomes you’re aiming for and measure those outcomes, you’ll be successful. If you’re not, if you’re doing AI for AI’s sake, you’re going to have a 5% success rate.”

Asking how much AI costs probably shouldn’t be your first question. Citing the pressure some employees feel to demonstrate their value by spending tokens, Venero said the questions to ask are: “Why? And what are we going to use them for?” ®


