While Anthropic is touting its unreleased Mythos AI model as dangerously powerful, Google is trying to change the conversation from a cost and speed perspective.
Google says its latest Gemini 3.5 Flash model rivals cutting-edge products while saving money for companies that are racking up huge bills by consuming billions of tokens, the core unit of AI usage.
Google CEO Sundar Pichai recently said, “Companies have already exhausted their annual token budgets, and we’re only in May.” “Companies could save a lot of money by using Flash in conjunction with other Frontier models.”
The timing of Google’s new model announcement is no coincidence. As companies adopt token-hungry AI agents, they are also paying close attention to their bills. Meanwhile, small AI companies under pressure to make more money are driving up the cost of their products, forcing customers to reconsider their AI spending.
This presents an opportunity to win on value rather than actual ability. It’s also where Google has been working for a quarter of a century, and where it has an advantage that is difficult for rivals to imitate.
flash sale
For the first three years or so, the generative AI wars were primarily about who had the biggest and smartest models. Now, as the performance gap between labs narrows, the advantage is shifting to infrastructure and inference, or how models are executed.
As OpenAI President Greg Brockman recently declared, “A model alone is no longer a product.”
A big reason for this change is that agents are becoming more convenient and more expensive to operate.
Google knows how much token burn is going up. Pichai recently pointed out that Monthly usage of AI products Since then, the token has increased seven times to 3.2 trillion tokens. last year. He also said that top Google Cloud customers could save more than $1 billion annually if they migrated 80% of their AI workloads to a combination of Gemini 3.5 Flash and other Frontier models.
Businesses are paying attention to how much AI is accomplishing. Uber’s chief operating officer recently said it was becoming harder to justify the company’s ballooning AI costs. Venture capitalist Chamath Palihapitiya said in March that his company 8090 was moving away from using Cursor because it was spending too much on tokens.
“As AI agents become more complex, long-running processes are becoming the norm,” Synovus Trust analyst Dan Morgan told Business Insider. “This has caused sticker shock for many organizations.”
It’s difficult to make a profit in this field, so cost and ROI go hand in hand, Morgan said. Some companies may no longer need access to models that are at the absolute frontier. Maybe it’s good enough.
This is where Google has an advantage. Because the company owns the full stack: chips, data centers, cloud, models, and many large applications on top of it, it has more control over the cost and speed of AI than most competitors.
Because Google uses its own TPU chips and sources components directly from manufacturers, its AI computing costs are about 50% cheaper (and in some cases as much as 75% cheaper) than its competitors, William Blair analysts estimated earlier this month.
Meanwhile, OpenAI pays Microsoft, Oracle, and other cloud giants a margin on each ChatGPT and Codex request, and those providers pay Nvidia for the GPUs that run it all. In fact, almost all non-hyperscalers now have to pay someone else for their infrastructure.
search playbook
If computing is destiny, as OpenAI CEO Sam Altman likes to say, then Google has spent more than 25 years determining that destiny.
In 2006, Google Search had over 40% of the market share and was in rapid decline. Not only because the search results were better, but also because Google was making its search engine faster and cheaper. Google liked to brag about this by displaying the exact number of milliseconds it took to return an answer.
Rather than investing in expensive servers, Google built a custom system using inexpensive off-the-shelf parts to maximize speed and keep costs down. Meanwhile, the data from all these searches, which increased as Google’s popularity grew, improved its engine and created a flywheel that slowly tightened its grip on rivals like Yahoo.
Google’s search results didn’t have to be the absolute best, just fast enough and cheap enough to keep users coming back.
Google is developing a flywheel similar to Gemini, except that the company has been wildly successful in the search advertising business and could subsidize its AI efforts while rivals like OpenAI and Anthropic compete for more money and computing.
The exploration race was actually an infrastructure race in disguise. Google expects the AI race to be similar.
Have something to share? To contact this reporter via email, please specify the following address: hlangley@businessinsider.com Or text us at 628-228-1836. Use a personal email address and non-work device. Here’s a guide to sharing your information securely.
