Tokenmaxxing is over. Now it’s all about modelmaxxing.

Twice a week, Morgan Linton tells 16 engineers which AI models to use and when.

Business Insider spoke with Linton, chief technology officer at Lake Tahoe-based AI startup Bold Metrics, 50 minutes before his engineering team’s standup. He planned to have one team use Claude Favre on low and another use GPT-5.5 on high. A third option, Cursor with Composer 2.5, gives “absolutely perfect results,” he said.

Being deliberate about which model handles which job means that Linton does not need to set a hard token cap.

“My team now uses the best of the best, but we use it much more efficiently,” he said.

The first half of 2026 was marked by the word “tokenmaxxing” in the AI community. The term refers to companies encouraging their employees to use as much AI as possible. But after reviewing the AI bills their employees were racking up, companies from Uber to Microsoft are taking a more cautious approach.

Founders, software engineers, UX designers, and even non-technical vibe-coding enthusiasts are turning to model switching as a cost-saving hack. They route the most difficult and intellectually demanding tasks to more expensive frontier models, and offload easier, more repetitive tasks to older, cheaper models.

Also, as companies reduce AI budgets and cap usage, this token hygiene strategy could help them reap more profits.

Goodbye, tokenmaxxing

Of course, there are good reasons to use the latest model. OpenAI’s Kaylin Voss wrote on LinkedIn that a better model would “reduce retries, monitoring, and wasted effort.”

However, some tasks may not be worth the cost of a top-tier model. Coinbase CEO Brian Armstrong was one of the first to make this point, in an X post on June 7.

“80% of workloads will be running on 99% cheaper models within 12-18 months,” he wrote, adding that the remaining 20% will continue to run on modern models where “maximizing IQ is key.”
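To see what Armstrong’s split could mean for a bill, here is a rough back-of-the-envelope sketch. It assumes, purely for illustration, that every workload costs the same on a frontier model and that “99% cheaper” means 1% of that price per workload. The function name `blended_cost` and its parameters are hypothetical and do not come from the post.

```python
def blended_cost(cheap_share: float, cheap_discount: float,
                 frontier_cost: float = 1.0) -> float:
    """Average cost per workload, relative to running everything on a frontier model.

    Assumption: all workloads cost the same on the frontier model.
    """
    # Price of one workload on the cheaper model.
    cheap_cost = frontier_cost * (1.0 - cheap_discount)
    # Weighted average of the cheap share and the remaining frontier share.
    return cheap_share * cheap_cost + (1.0 - cheap_share) * frontier_cost


# 80% of workloads on models that are 99% cheaper, 20% left on frontier models.
relative = blended_cost(cheap_share=0.80, cheap_discount=0.99)
print(f"Blended cost: {relative:.3f} of the all-frontier bill")
print(f"Savings: {1 - relative:.1%}")
```

Under those assumptions, the bill falls to roughly a fifth of its original size, and almost all of what remains comes from the 20% of work still running on frontier models.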

Chris Maconi was never a fan of tokenmaxxing. The co-founder of Huntsville-based AI startup Hetura said he runs his company with a “human-involved” attitude and has no intention of setting up bots to keep coding overnight. Model selection is part of this anti-tokenmaxxing outlook.

Maconi remembers the OpenClaw hype cycle. The AI agent, typically housed on a Mac Mini, was especially token-intensive given its 24/7 usage and extensive autonomy. When setting up OpenClaw, Maconi started with a cheaper Gemini model before switching to Anthropic’s Haiku.

“I wouldn’t hesitate to try some of these lower-end models and see if they can provide the intelligence that we need,” Maconi said.

Extend your tokens in creative ways

Tanvi Pisal, a 29-year-old user experience designer at a Big Tech company, said she learned the hard way to use models more efficiently.

Pisal uses tools like Figma, ChatGPT, and Claude to brainstorm and create product requirements documents. She has a corporate subscription to ChatGPT and pays $20 per month for the basic Claude Pro plan. Initially, she said, she used Claude to brainstorm the UX from scratch, but the task still wasn’t finished despite “wasting months of tokens” in the process.

“So what I do now is first design everything in Figma and then put those screenshots into Claude. I leave the UI alone and tell Claude to build out the overall functionality and flow,” Pisal added. “This design-first process saves you tokens.”

She also brainstorms ideas in ChatGPT, which costs her nothing because of her company’s enterprise plan, and brings her refined ideas to Claude to create a more polished document.

Alejandra Thomas, a New York City-based software engineer and technology content creator, said she runs tests on every new model released to see what’s great about each.

“I try not to use the most expensive or advanced model just because it’s available. For simple tasks, I always use a lighter model or don’t use it at all,” Thomas said.

Ed Stevens, CEO of AI sales company Scoot, said he likes to “pick a horse and ride it.” His engineers choose a model, try it out for a few months, and then decide whether it’s satisfactory. If there’s a shiny new model, or if they think they can do the same thing for less, they’ll change horses, Stevens said.

Squeezing the most out of each token reflects a scarcity mindset, said behavioral economist and Duke University professor Dan Ariely.

Ariely said token budgets are reminiscent of early cell phone plans, when talk time was limited. People would try to use up their minutes at the end of the month, he said, even if it meant calling someone they didn’t really want to talk to.

“Tokens create a model of scarcity where people can’t use them as much as they want. It creates a usage goal and a mentality that if people don’t reach their goals, it’s wasted,” he said. He added that users switch to other companies’ models to save cash when they reach their token limit, as they don’t want to pay extra fees for each use beyond the limit.

There’s a tool for that

Modelmaxxing may sound like a hassle, but luckily you don’t have to make these switching decisions yourself.

Model routing startups are all the rage. These companies offer software that directs tasks to specific models (sometimes including open-source ones) based on complexity. They are a hit with venture capitalists, and cash is raining down on startups like OpenRouter.

David Gilmore runs one of these companies, Leyline. His tools intercept requests and determine whether they can be moved to a cheaper, often open-source model. He says many of his company’s clients fall prey to “FOMO moments.” Then they get an API bill and realize they need to scale back.
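For readers curious what intercepting a request might look like, here is a minimal sketch of complexity-based routing. It does not describe how Leyline or any other router actually works. The model names, prices, keyword list, and threshold are all hypothetical, and a real router would likely use a far better complexity signal than word counts and keywords.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class Model:
    name: str
    cost_per_1k_tokens: float  # hypothetical price, for illustration only


CHEAP = Model("small-open-model", 0.0002)   # hypothetical cheaper model
FRONTIER = Model("frontier-model", 0.02)    # hypothetical expensive model

# Hypothetical keywords that hint at a harder task.
HARD_HINTS = {"architecture", "refactor", "debug", "prove", "security"}


def estimate_complexity(prompt: str) -> float:
    """Crude score in [0, 1]: long prompts and 'hard' keywords score higher."""
    words = re.findall(r"[a-z]+", prompt.lower())
    length_score = min(len(words) / 200, 1.0)
    hint_score = 1.0 if HARD_HINTS.intersection(words) else 0.0
    return max(length_score, hint_score)


def route(prompt: str, threshold: float = 0.5) -> Model:
    """Send the request to the frontier model only if it looks hard enough."""
    return FRONTIER if estimate_complexity(prompt) >= threshold else CHEAP


print(route("Rename this variable").name)
print(route("Debug the race condition in our payment service").name)
```

The design choice is to default to the cheaper model and make a request earn its way onto the expensive one, which mirrors the cost-first mindset described above.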

The number of companies using routing platforms is slowly increasing, Ramp’s chief economist Ara Kharazian told Business Insider. Last year, Kharazian found that about 1% of companies were using model routers. This year it is 5%.

San Francisco-based investment firm BlockSpaceForce uses OpenRouter, Fireworks, and Together AI. Spencer Yang, the firm’s managing partner, also suggested trying cheaper models first to see whether a task really requires a more expensive one.

“In fact, the models themselves are getting very good at assessing their own complexity,” Yang said.
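The cheaper-first approach Yang describes can be sketched as a simple escalation ladder. This is an illustrative assumption rather than BlockSpaceForce’s setup: the `cascade` function, the stub models, and the idea that each model returns a self-reported confidence score are all hypothetical stand-ins for real model calls.

```python
from typing import Callable, Sequence, Tuple

# A "model" here is any callable that takes a prompt and returns
# (answer, self-reported confidence in [0, 1]). This interface is hypothetical.
ModelFn = Callable[[str], Tuple[str, float]]


def cascade(prompt: str, models: Sequence[Tuple[str, ModelFn]],
            min_confidence: float = 0.7) -> Tuple[str, str]:
    """Try models cheapest-first; return (model_name, answer) from the first
    one that is confident enough. If none is, keep the last model's answer."""
    if not models:
        raise ValueError("at least one model is required")
    for name, model in models:
        answer, confidence = model(prompt)
        if confidence >= min_confidence:
            break
    return name, answer


def cheap_stub(prompt: str) -> Tuple[str, float]:
    # Stub: pretends to be confident only on very short prompts.
    confidence = 0.9 if len(prompt.split()) <= 5 else 0.3
    return f"cheap answer to: {prompt}", confidence


def frontier_stub(prompt: str) -> Tuple[str, float]:
    # Stub: pretends to be confident on everything.
    return f"frontier answer to: {prompt}", 0.95


ladder = [("cheap", cheap_stub), ("frontier", frontier_stub)]
print(cascade("Fix typo", ladder)[0])
print(cascade("Design a sharded multi-region database migration plan", ladder)[0])
```

The trade-off is that a hard task pays for two calls instead of one, so a ladder like this only saves money when most requests stop at the first rung.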

Some companies continue to use the latest and most expensive models by default. Maconi, co-founder of Hecura, blamed it on laziness.

“People don’t want the hard work of figuring out which models are better at which features,” he said. “They just want to ride on the hype.”