“Inference Whale” crashes AI coding party



The AI coding sector has its problems.

Heavy users of AI coding services have racked up enormous costs, forcing some major startups to overhaul their pricing structures and products to avoid big losses.

These “inference whales,” as some people in the business call these customers, have forced industry insiders to question whether AI products that amount to “resold inference” can survive in the long term.

Inference refers to the process of running an AI model. Newer reasoning models break a user's request into multiple steps, which increases inference costs. Those costs can climb quickly in AI coding services, where developers set up automated agents for long-running tasks.

This is a problem for AI coding services because they are often sold through monthly subscription plans. Many plans allow unlimited use for a fixed monthly fee, which leaves providers exposed when a handful of users hit the service with huge projects.

These startups are squeezed between relatively fixed revenue streams and rapidly rising backend costs, because they have to pay for the underlying AI models.

“If you're purely reselling AI inference, the winds can change dramatically, so your business is very fragile and vulnerable,” said Eric Simons, CEO of StackBlitz, the startup behind a popular AI coding service called Bolt.new.

Claude Code Whale


Bowhead whale (Balaena mysticetus) from Jardine's “The Naturalist's Library.”

Reuters/Science Photo Library



Anthropic began offering its popular Claude Code service earlier this year through an unlimited plan costing $200 a month. Some subscribers went wild, using thousands of dollars' worth of AI inference over weeks or months.

Someone has built a website to rank these AI coding whales. The Claude Code Leaderboard lists one developer at the top who burned about 11 billion tokens.

Tokens are how AI models break queries into digestible chunks of data. Industry pricing is based on the number of tokens processed. According to the leaderboard, this top-ranked developer's token usage is worth almost $35,000.

Compare that with the $200 a month the developer was charged. Even over a full year, Anthropic would collect only about $2,400, far less than the inference costs it incurred.
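The gap is easy to work out by hand, but a short Python sketch makes the scale explicit. It uses only the figures quoted in this article; the variable names are illustrative, and the per-million-token figure is simply the leaderboard's dollar estimate divided by its token count, not a published Anthropic price. The leaderboard does not say over what period the tokens were used, so comparing against a full year of fees is a generous assumption.

```python
# Figures quoted in the article (all approximate).
TOKENS_USED = 11_000_000_000   # about 11 billion tokens
ESTIMATED_USAGE_USD = 35_000   # leaderboard estimate, "almost $35,000"
MONTHLY_FEE_USD = 200          # price of the unlimited plan

# Implied blended price per million tokens (derived, not a list price).
implied_usd_per_million = ESTIMATED_USAGE_USD / (TOKENS_USED / 1_000_000)

# Subscription revenue if the developer paid for a full year.
annual_revenue_usd = MONTHLY_FEE_USD * 12

# How far a year of fees falls short of the estimated usage.
shortfall_usd = ESTIMATED_USAGE_USD - annual_revenue_usd
usage_to_revenue = ESTIMATED_USAGE_USD / annual_revenue_usd

print(f"Implied price: ${implied_usd_per_million:.2f} per million tokens")
print(f"Annual subscription revenue: ${annual_revenue_usd:,}")
print(f"Shortfall: ${shortfall_usd:,}")
print(f"Usage is about {usage_to_revenue:.1f}x a year of fees")
```

On these numbers, a single heavy user consumes more than a dozen years' worth of subscription fees.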

Anthropic is changing pricing

Because that is clearly unsustainable, Anthropic plans to change its pricing. The $200-a-month plan will remain in place, but the startup will introduce weekly rate limits starting August 28.

If users blow through these new weekly rate limits, they will need to purchase additional capacity.

“We have identified extreme use by a small number of customers that is impacting capacity for the wider community,” an Anthropic spokesperson told Business Insider.

The startup said it has also seen “policy violations” such as account sharing and reselling access.

“We are committed to supporting advanced use cases in the long term, but in the meantime we need to ensure consistent performance for all developers,” the Anthropic spokesperson added.

Swedish whale

I tracked down one of the whales near the top of the Claude Code leaderboard.

Sweden-based developer Albert Örwall said he is using a $200-a-month Claude Code subscription to build his own vibe coding platform, along with open-source agent tools.

“I was probably constantly running three or four fairly long tasks while I was working, in parallel, and that was when I really took off,” he said of his Claude Code usage.

Even setting those big projects aside, Örwall said his regular workflow likely racks up about $500 a day in inference costs on a subscription of just $200 a month.

“So I speculate that my workflow may not be sustainable for Anthropic,” he added.
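A back-of-the-envelope sketch in Python shows why. It takes Örwall's rough $500-a-day estimate at face value and assumes, purely for illustration, that he works at that rate every day of a 30-day month; the variable names are hypothetical.

```python
DAILY_INFERENCE_USD = 500  # Örwall's rough estimate for his regular workflow
MONTHLY_FEE_USD = 200      # price of his subscription
DAYS_PER_MONTH = 30        # assumption: usage every day of a 30-day month

# Inference consumed in a month at that daily rate.
monthly_inference_usd = DAILY_INFERENCE_USD * DAYS_PER_MONTH

# How many times over the monthly fee that usage is worth.
cost_to_fee_ratio = monthly_inference_usd / MONTHLY_FEE_USD

# Days of usage the monthly fee would cover at that daily rate.
days_covered = MONTHLY_FEE_USD / DAILY_INFERENCE_USD

print(f"Monthly inference: ${monthly_inference_usd:,}")
print(f"Inference is {cost_to_fee_ratio:.0f}x the monthly fee")
print(f"The fee covers {days_covered} days of usage")
```

Under those assumptions, the subscription fee pays for less than half a day of his usage.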

Cursor also responded

With Anthropic's new pricing, Örwall said he would keep his $200-a-month subscription for a while to see how the weekly limits actually feel in practice.

“I'll avoid paying anything beyond my $200 subscription,” he said. He added that he could change the way he codes to avoid hitting the new rate limits.

“The reason I originally switched from Cursor to Claude Code is because the pricing in Cursor was too high,” Örwall added.

Cursor is another popular AI coding service. It often uses Anthropic's AI models as the underlying intelligence that powers its product.

Cursor recently moved its $20-a-month plan away from unlimited requests and toward usage-based pricing for “fast” requests. This means users are charged extra when they exceed certain limits.

This change, coupled with a lack of clear communication, has caused confusion and frustration among some users who expected unlimited use.

Cursor first announced the change in mid-June. It followed up with more details about two weeks later, and again in early July.

“The new models allow you to spend more tokens per request on longer-horizon tasks,” the startup wrote in a blog post, apologizing for surprising users with unexpected bills.

“The cost for most users was pretty constant, but the most difficult requests cost an order of magnitude more than the simplest ones.”

Inference costs have not been reduced

The industry-wide assumption is that inference costs will drop dramatically, making these AI coding services more economically viable.

But in reality, this hasn't happened so far. Instead, whenever a new top AI model is released, every AI coding service has to integrate it, and it comes at a higher price.

“This is the first faulty pillar of the ‘costs will drop’ strategy,” Ethan Ding, CEO of the startup TextQL, wrote in a recent blog post. “Demand exists for ‘the best language model,’ period. And the best model always costs about the same, because that's what the edge of inference costs today.”

Developers and other AI users usually want the best, not last month's leading intelligence.

“No one opens Claude and says, ‘You know what? Let me use the worse version to save my boss some money.’ We are cognitively greedy creatures,” Ding wrote. “We want the best brain we can get.”

Even if inference costs fall, the rise of agentic AI workflows means developers will set up longer automated projects that generate more tokens.

If a project uses 100 million tokens rather than 1 million, its total cost will remain high even if the price per token drops.
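The point can be illustrated with a small Python sketch. The token counts come from the example above; the prices are hypothetical, chosen only to show a tenfold price drop, and the function name is illustrative.

```python
def project_cost_usd(tokens: int, usd_per_million_tokens: float) -> float:
    """Total cost of a project at a given price per million tokens."""
    return tokens / 1_000_000 * usd_per_million_tokens

# Hypothetical prices: assume the price falls tenfold, from $10 to $1
# per million tokens, while the project grows from 1 million tokens
# to 100 million tokens.
old_cost = project_cost_usd(1_000_000, 10.0)
new_cost = project_cost_usd(100_000_000, 1.0)

print(f"Old project: ${old_cost:,.2f}")
print(f"New project: ${new_cost:,.2f}")
print(f"Cost grew {new_cost / old_cost:.0f}x despite the cheaper tokens")
```

In this sketch, a 100-fold increase in token volume swamps a tenfold price cut, so the bill still rises tenfold.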

“A $20-a-month subscription can't even support a user who makes a single $1 deep research run a day,” Ding wrote. “But that's exactly what we're racing toward. Every improvement in model capability is an improvement in how much compute it can meaningfully consume.”

“There is no way under the subscription model to offer unlimited use in this new world,” he added. “The math has fundamentally broken.”

Sign up for BI's Tech Memo newsletter. Reach out to me via email at abarr@businessinsider.com.




