AI bills are out of control. Cloudflare can fix it now.

There isn’t a CIO on the planet today who isn’t concerned about spending on AI. CFOs are also becoming increasingly nervous.

Many companies have encouraged their employees to use AI as much as possible, for fear of falling behind. The edict was clear: “Hurry up, we’ll settle the bill later.” And most of the time, it worked. AI has been a real game changer for the teams we’ve worked with.

But the costs are real. We’ve heard countless horror stories about huge bills and painful overspending on tokens.

Today, we are announcing spending limits for Cloudflare AI Gateway, along with a closed beta of identity-driven budgets and routing that use Cloudflare Access and your existing identity provider.

As we talk to hundreds of companies about their AI strategies, we’ve begun to see a common story. A company gives all of its engineers access to frontier models through a shared API key. Usage takes off. At the end of the month, an invoice lands with the finance team, but no one can explain where the money went. Was it the machine learning team building a new pipeline? An intern running Claude Opus on email triage? A runaway continuous integration job that burned through 50 million tokens over a weekend? No one knows, because a shared API key doesn’t record who used it.

Without guidelines, people typically reach for the largest model available. And why wouldn’t they? Without budgets, visibility, or routing logic, it makes sense to use the most powerful model for everything. The problem is that most tasks don’t require a frontier model. A code review summary doesn’t need the same model as a complex architectural refactor. A log parser doesn’t need the same model as a customer-facing content generator. Choosing the right tool for the job, rather than defaulting to the most powerful and expensive one, should be easy. And you need to be able to see at a glance where your spending is going.

You can’t calculate the ROI of your AI spending without visibility into what you’re spending. And without control, you can’t protect that ROI. Every other line item in your business has a budget and per-team attribution, and AI spend should be no exception.

AI Gateway sits between your application and your AI providers. Rather than calling OpenAI, Anthropic, Google, or other providers directly, your application sends its requests through AI Gateway first.
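To make the “sits in between” idea concrete, here is a minimal sketch of pointing provider calls at a gateway instead of at the provider. It assumes the URL scheme `https://gateway.ai.cloudflare.com/v1/{account_id}/{gateway_id}/{provider}/...` described in the AI Gateway documentation; the account ID, the gateway name, and the `gatewayUrl` helper are placeholders, so check the current docs for your provider’s exact path.

```typescript
// Assumed AI Gateway URL scheme:
//   https://gateway.ai.cloudflare.com/v1/{account_id}/{gateway_id}/{provider}/{provider path}
const GATEWAY_BASE = "https://gateway.ai.cloudflare.com/v1";

// Hypothetical helper: build the gateway URL for a provider endpoint.
function gatewayUrl(accountId: string, gatewayId: string, provider: string, path: string): string {
  // Drop leading slashes so the joined URL never contains "//".
  const cleanPath = path.replace(/^\/+/, "");
  return `${GATEWAY_BASE}/${encodeURIComponent(accountId)}/${encodeURIComponent(gatewayId)}/${provider}/${cleanPath}`;
}

// Before: POST https://api.openai.com/v1/chat/completions
// After:  POST the same body, with the same provider credentials, to the gateway URL.
const openaiViaGateway = gatewayUrl("YOUR_ACCOUNT_ID", "my-gateway", "openai", "/chat/completions");

// Before: POST https://api.anthropic.com/v1/messages
const anthropicViaGateway = gatewayUrl("YOUR_ACCOUNT_ID", "my-gateway", "anthropic", "v1/messages");

console.log(openaiViaGateway);
console.log(anthropicViaGateway);
```

In practice you would pass such a URL to `fetch`, or set it as the base URL in your provider SDK; the request body itself typically stays the same.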

This gives you some useful tools right away.

But until now, AI Gateway had no easy way to answer who was spending what, or to set limits on that spending.

You could see total usage across your account. What you couldn’t see was that Jane from Engineering spent $2,000 on Claude this month, while the entire data science team spent only $400. Nor could you set a budget like “engineers get $5,000 a month on frontier models, interns get $200 a month on Kimi K2.6.”

That changes today.

Spending limits: budgets for AI usage

AI Gateway now supports spending limits as a core feature. These are true cost controls: budgets set in dollars rather than tokens, which track cumulative spending across all requests and operate independently of traditional rate limits.

You can apply limits to any combination of dimensions, including model, provider, or administrator-defined custom attributes such as user, team, or application. Windows can be daily, weekly, or monthly, and either fixed (resetting at midnight, on Monday, or on the first of the month) or rolling.
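The difference between fixed and rolling windows is easiest to see in code. This is a minimal sketch, not AI Gateway’s implementation: it assumes UTC boundaries and treats a rolling “monthly” window as the trailing 30 days, both illustrative choices.

```typescript
type Period = "daily" | "weekly" | "monthly";

// Fixed window: resets at UTC midnight, Monday 00:00 UTC, or the 1st of the month.
function fixedWindowStart(now: Date, period: Period): Date {
  const y = now.getUTCFullYear();
  const m = now.getUTCMonth();
  const d = now.getUTCDate();
  if (period === "daily") return new Date(Date.UTC(y, m, d));
  if (period === "weekly") {
    // getUTCDay() is 0 for Sunday; shift so Monday counts as day 0.
    const daysSinceMonday = (now.getUTCDay() + 6) % 7;
    // Date.UTC normalizes out-of-range days, so this crosses month and year boundaries.
    return new Date(Date.UTC(y, m, d - daysSinceMonday));
  }
  return new Date(Date.UTC(y, m, 1));
}

const DAY_MS = 24 * 60 * 60 * 1000;

// Rolling window: the trailing period ending now (30 days stands in for a "month").
const ROLLING_MS: Record<Period, number> = {
  daily: DAY_MS,
  weekly: 7 * DAY_MS,
  monthly: 30 * DAY_MS,
};

function rollingWindowStart(now: Date, period: Period): Date {
  return new Date(now.getTime() - ROLLING_MS[period]);
}

const now = new Date(Date.UTC(2025, 0, 15, 13, 30)); // Wednesday, 13:30 UTC
console.log(fixedWindowStart(now, "weekly").toISOString()); // 2025-01-13T00:00:00.000Z
console.log(rollingWindowStart(now, "weekly").toISOString()); // 2025-01-08T13:30:00.000Z
```

A fixed window gives everyone a fresh budget at the same moment; a rolling window avoids a burst of spending right after each reset.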

AI Gateway calculates the cost of each request from model pricing and tracks cumulative spending against your limits in real time. You can track model spend in the analytics dashboard and filter it by model, provider, or custom attribute.
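One way to picture dollar-based (rather than token-based) tracking is a sketch that prices each request from its input and output token counts and accumulates spend against a dollar limit. The model names and per-million-token prices are made up for illustration, and `SpendTracker` is a hypothetical class. It counts in integer micro-dollars so that many small charges don’t accumulate floating-point error.

```typescript
// Hypothetical prices in USD per million tokens (illustrative, not real list prices).
interface ModelPrice {
  inputPerMTok: number;
  outputPerMTok: number;
}

const PRICES: Record<string, ModelPrice> = {
  "example-frontier": { inputPerMTok: 15, outputPerMTok: 75 },
  "example-small": { inputPerMTok: 0.5, outputPerMTok: 1.5 },
};

// tokens x (USD per 1M tokens) = micro-dollars, so no division is needed.
function requestCostMicroUsd(model: string, inputTokens: number, outputTokens: number): number {
  const price = PRICES[model];
  if (price === undefined) throw new Error(`no pricing for model ${model}`);
  return Math.round(inputTokens * price.inputPerMTok + outputTokens * price.outputPerMTok);
}

// Cumulative spend against a dollar budget, independent of any request-rate limit.
class SpendTracker {
  private spentMicroUsd = 0;
  private readonly limitMicroUsd: number;

  constructor(limitUsd: number) {
    this.limitMicroUsd = Math.round(limitUsd * 1_000_000);
  }

  record(costMicroUsd: number): void {
    this.spentMicroUsd += costMicroUsd;
  }

  get spentUsd(): number {
    return this.spentMicroUsd / 1_000_000;
  }

  get exceeded(): boolean {
    return this.spentMicroUsd >= this.limitMicroUsd;
  }
}

const tracker = new SpendTracker(1); // $1 budget
const cost = requestCostMicroUsd("example-frontier", 10_000, 2_000); // 300,000 micro-dollars = $0.30
for (let i = 0; i < 3; i++) tracker.record(cost);
console.log(tracker.spentUsd, tracker.exceeded); // 0.9 false
```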

You have options for what happens when a budget limit is reached. By default, AI Gateway blocks further requests. Alternatively, you can configure dynamic routing rules that send requests to a fallback model once a spending limit is reached, so that hard limits don’t disrupt engineers’ workflows. We are also working on alerts that fire when limits are reached.
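The choice between blocking and falling back can be sketched as a small decision function. The `LimitPolicy` shape, the model names, and the 429 status are hypothetical; the post doesn’t specify how AI Gateway represents these rules or what a blocked request returns.

```typescript
// What to do once a budget is exhausted. AI Gateway's default is to block.
type OnExceeded = { kind: "block" } | { kind: "fallback"; model: string };

interface LimitPolicy {
  limitMicroUsd: number;
  onExceeded: OnExceeded;
}

type Decision =
  | { allow: true; model: string }
  | { allow: false; status: number; reason: string };

function decide(requestedModel: string, spentMicroUsd: number, policy: LimitPolicy): Decision {
  // Under budget: pass the request through unchanged.
  if (spentMicroUsd < policy.limitMicroUsd) {
    return { allow: true, model: requestedModel };
  }
  const action = policy.onExceeded;
  // Over budget with a fallback rule: keep the engineer working on a cheaper model.
  if (action.kind === "fallback") {
    return { allow: true, model: action.model };
  }
  // Over budget with no fallback: reject (429 is an illustrative status code).
  return { allow: false, status: 429, reason: "spending limit reached" };
}

const fallbackPolicy: LimitPolicy = {
  limitMicroUsd: 5_000_000_000, // $5,000
  onExceeded: { kind: "fallback", model: "example-small" },
};

// $5,200 spent against a $5,000 limit: the request is rerouted to example-small.
const overBudget = decide("example-frontier", 5_200_000_000, fallbackPolicy);
console.log(overBudget);
```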

Spending limits are available in open beta to all AI Gateway users on all plans. You can configure them in your gateway settings in the dashboard or through the API.

We already track token costs inside Cloudflare. Cloudflare employees use AI tools every day, routing millions of requests and billions of tokens through our AI gateways each month. At that scale, we faced the same questions any company faces: who is using what, and how do we budget for it?

We solved this by having AI Gateway attach an identity to every request. When an employee authenticates through Cloudflare Access, their identity is extracted from a JSON Web Token (JWT) and attached to the AI Gateway request as metadata. That gives you per-user token consumption, team-level usage breakdowns, and cost attribution across your organization, all in one place.
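To show what the gateway does on your behalf, here is a sketch of reading an identity out of a Cloudflare Access JWT, which Access forwards in the `Cf-Access-Jwt-Assertion` request header. It only decodes the payload and does not verify the signature, so it is unsafe as-is; real code must validate the token against your Access team’s public keys first. The `groups` claim and the use of `common_name` for service tokens are assumptions about the token’s contents, and the helper names are hypothetical.

```typescript
// Claims read from an Access JWT payload. `groups` and `common_name` usage are assumptions.
interface AccessClaims {
  email?: string;
  common_name?: string;
  groups?: string[];
}

interface RequestIdentity {
  kind: "user" | "service";
  id: string;
  groups: string[];
}

// Decode one base64url JWT segment to a UTF-8 string.
function base64UrlDecode(segment: string): string {
  const b64 = segment.replace(/-/g, "+").replace(/_/g, "/");
  const padded = b64 + "=".repeat((4 - (b64.length % 4)) % 4);
  const binary = atob(padded);
  const bytes = Uint8Array.from(binary, (c) => c.charCodeAt(0));
  return new TextDecoder().decode(bytes);
}

// WARNING: decodes WITHOUT verifying the signature. Verify the JWT before trusting it.
function identityFromJwt(jwt: string): RequestIdentity {
  const parts = jwt.split(".");
  if (parts.length !== 3) throw new Error("malformed JWT");
  const claims = JSON.parse(base64UrlDecode(parts[1])) as AccessClaims;
  if (claims.email) {
    // A human who signed in through the identity provider.
    return { kind: "user", id: claims.email, groups: claims.groups ?? [] };
  }
  if (claims.common_name) {
    // A service token, e.g. a CI job or agent.
    return { kind: "service", id: claims.common_name, groups: [] };
  }
  throw new Error("JWT has no identity claim");
}

// Flatten the identity into string metadata that logs and budgets can key on.
function identityMetadata(identity: RequestIdentity): Record<string, string> {
  return {
    identity_kind: identity.kind,
    identity: identity.id,
    groups: identity.groups.join(","),
  };
}
```

The key property is that the identity comes from a token the user or agent cannot forge once the signature is checked, rather than from a value the application chooses.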

Identity-driven budgets and policies (closed beta)

In addition to spending limits, today we are also announcing identity-driven budgets and policies in closed beta.

AI Gateway spending limits let you set budgets by model, provider, or custom attribute. However, your application has to pass that metadata itself, and AI Gateway trusts whatever it receives. Verified, automatic attribution requires identity.
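Here is what self-reported attribution looks like, and why it amounts to an honor system. The sketch assumes AI Gateway’s custom-metadata convention of a JSON object in a `cf-aig-metadata` request header (verify against the current docs); `selfReportedHeaders` and the values are hypothetical.

```typescript
type MetadataValue = string | number | boolean;

// Hypothetical helper: the application attaches its own attribution metadata.
// Assumption: AI Gateway reads custom metadata from a JSON `cf-aig-metadata` header.
function selfReportedHeaders(apiKey: string, metadata: Record<string, MetadataValue>): Record<string, string> {
  return {
    Authorization: `Bearer ${apiKey}`,
    "Content-Type": "application/json",
    "cf-aig-metadata": JSON.stringify(metadata),
  };
}

// An intern's request, honestly labeled.
const fromIntern = selfReportedHeaders("shared-key", { user: "intern@example.com", team: "interns" });

// The same intern could just as easily claim to be Jane on the ML team.
// The gateway sees identical credentials and can only take the caller's word for it.
const spoofed = selfReportedHeaders("shared-key", { user: "jane@example.com", team: "ml" });
```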

Combined with Cloudflare Access, AI Gateway can see who is making each request: not just which account, but which employee, which identity provider (IdP) group, which service, and so on.

Here’s how it works in practice:

You can set a budget for each user. For example, you might allow $500 per month for individual contributors and $2,000 per month for senior engineers. When a user reaches their limit, their requests can be downgraded to a cheaper model or blocked.
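Resolving a user’s budget from their IdP groups might look like the following sketch. The group names mirror the example above but are hypothetical, and giving a user in several groups the most generous matching budget is a design choice for this sketch, not documented AI Gateway behavior.

```typescript
// Hypothetical monthly budgets in USD, keyed by IdP group name.
const GROUP_BUDGETS_USD: Record<string, number> = {
  "senior-engineers": 2000,
  "individual-contributors": 500,
};

// Users in no budgeted group get nothing (deny by default).
const DEFAULT_BUDGET_USD = 0;

// A user in several groups gets the most generous matching budget.
function monthlyBudgetUsd(groups: string[]): number {
  let budget = DEFAULT_BUDGET_USD;
  for (const group of groups) {
    const groupBudget = GROUP_BUDGETS_USD[group];
    if (groupBudget !== undefined && groupBudget > budget) budget = groupBudget;
  }
  return budget;
}

console.log(monthlyBudgetUsd(["individual-contributors"])); // 500
console.log(monthlyBudgetUsd(["individual-contributors", "senior-engineers"])); // 2000
```

Picking the minimum instead would be the stricter alternative; either way, the groups come from the identity provider rather than from the request.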

You can set model policies for each team. For example, the ML team gets Claude Opus and GPT-4o, the brand design team gets image and video generation models, and interns use open-source models on Workers AI. These policies map directly to the identity provider groups you already manage.
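At its core, a team model policy is an allow-list check. This sketch uses hypothetical group names and model identifiers, with a trailing `*` meaning “any model with this prefix”; it illustrates the idea rather than AI Gateway’s policy format.

```typescript
// Hypothetical IdP group -> allowed model patterns. A trailing "*" is a prefix match.
const TEAM_MODEL_POLICY: Record<string, string[]> = {
  ml: ["anthropic/claude-opus*", "openai/gpt-4o"],
  "brand-design": ["image/*", "video/*"],
  interns: ["workers-ai/*"],
};

function matches(pattern: string, model: string): boolean {
  return pattern.endsWith("*") ? model.startsWith(pattern.slice(0, -1)) : model === pattern;
}

// A request is allowed if any of the caller's groups permits the requested model.
function modelAllowed(groups: string[], model: string): boolean {
  return groups.some((group) => (TEAM_MODEL_POLICY[group] ?? []).some((pattern) => matches(pattern, model)));
}

console.log(modelAllowed(["interns"], "workers-ai/example-open-model")); // true
console.log(modelAllowed(["interns"], "anthropic/claude-opus-example")); // false
```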

For CI/CD pipelines and autonomous agents, Access service tokens give each agent its own named identity. You can see, for example, that the code review bot used 5 million tokens this week while the documentation generator used 500,000. If one agent runs out of control, you can apply a budget policy to it without affecting the others.

Every AI Gateway log entry includes the authenticated identity (email, IdP group, or service token name). Export these logs to your analytics platform to get a cost breakdown by user and team without building anything custom.
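Once logs carry identity, a per-user or per-team breakdown is a simple group-and-sum. The `LogEntry` field names and the sample rows below are hypothetical, not AI Gateway’s export schema.

```typescript
// Hypothetical shape of an exported log entry (field names are assumptions).
interface LogEntry {
  identity: string; // email, or service token name for agents
  group: string; // IdP group, or "service" for service tokens
  model: string;
  costMicroUsd: number;
}

// Sum cost per key, where the key function picks the dimension to group by.
function sumBy(entries: LogEntry[], key: (e: LogEntry) => string): Map<string, number> {
  const totals = new Map<string, number>();
  for (const entry of entries) {
    const k = key(entry);
    totals.set(k, (totals.get(k) ?? 0) + entry.costMicroUsd);
  }
  return totals;
}

const logs: LogEntry[] = [
  { identity: "jane@example.com", group: "engineering", model: "example-frontier", costMicroUsd: 1_500_000 },
  { identity: "jane@example.com", group: "engineering", model: "example-frontier", costMicroUsd: 500_000 },
  { identity: "sam@example.com", group: "data-science", model: "example-small", costMicroUsd: 400_000 },
  { identity: "code-review-bot", group: "service", model: "example-small", costMicroUsd: 250_000 },
];

const byUser = sumBy(logs, (e) => e.identity);
const byTeam = sumBy(logs, (e) => e.group);
console.log(byUser.get("jane@example.com")); // 2000000 micro-dollars, i.e. $2.00
```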

Setup works like this: you create a Cloudflare Access application for your AI Gateway endpoint and configure policies based on IdP groups. When a developer or agent makes a request, they authenticate via OAuth using the standard CLI device code flow. AI Gateway validates the token and extracts the identity. You don’t have to write custom Workers, parse JWTs yourself, or rely on honor-system metadata headers.
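The CLI device code flow mentioned above is standardized as the OAuth 2.0 Device Authorization Grant (RFC 8628). While the user approves the login in a browser, the CLI polls the token endpoint. This sketch shows how a generic client reacts to each response the RFC defines. It is not Cloudflare’s CLI, and `handleTokenResponse` is a hypothetical name.

```typescript
// Token endpoint responses while polling (RFC 8628, section 3.5).
type TokenResponse = { access_token: string; token_type: string; expires_in?: number } | { error: string };

type PollStep =
  | { next: "done"; accessToken: string }
  | { next: "wait"; intervalSec: number }
  | { next: "fail"; reason: string };

function handleTokenResponse(res: TokenResponse, intervalSec: number): PollStep {
  if ("access_token" in res) {
    // The user approved the device; the CLI can now call the gateway with this token.
    return { next: "done", accessToken: res.access_token };
  }
  switch (res.error) {
    case "authorization_pending":
      // The user hasn't finished approving yet: keep polling at the same interval.
      return { next: "wait", intervalSec };
    case "slow_down":
      // The RFC requires increasing the polling interval by 5 seconds.
      return { next: "wait", intervalSec: intervalSec + 5 };
    default:
      // access_denied, expired_token, or any other error ends the flow.
      return { next: "fail", reason: res.error };
  }
}
```

A real client wraps this in a loop: sleep for `intervalSec`, request a token, and repeat until the step is `done` or `fail`.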

We recently wrote about how we built our in-house AI engineering stack. Today we are making that same capability available, so you can use it without having to build it yourself.

If you would like access to the closed beta, please sign up here.

What’s next: From cost management to cost optimization

Setting a budget is necessary. But once you have one, how do you get the most out of it?

In reality, not every request needs a frontier model. Summarization can run on smaller, cheaper models without significant loss of quality, while a large-scale code refactor may need a frontier model. But without controls, people will almost always choose the most capable model.

That’s what comes next: we’re building intelligent, task-based routing into AI Gateway. Each request can be analyzed and automatically routed to the model that delivers the best results at the lowest cost. This is still under development, so keep an eye on our developer documentation and changelog.
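Purely to illustrate the idea (this is not how the upcoming AI Gateway feature works), a naive router might pick the cheapest model whose capability tier meets the task’s needs. The tiers, prices, model names, and keyword classifier are all hypothetical stand-ins for real request analysis.

```typescript
// Hypothetical model catalog: higher tier = more capable, more expensive.
interface ModelOption {
  name: string;
  tier: number;
  costPerMTokUsd: number;
}

const MODELS: ModelOption[] = [
  { name: "small-open-model", tier: 1, costPerMTokUsd: 0.2 },
  { name: "mid-model", tier: 2, costPerMTokUsd: 3 },
  { name: "frontier-model", tier: 3, costPerMTokUsd: 15 },
];

// Naive keyword classifier standing in for real request analysis.
function requiredTier(task: string): number {
  const t = task.toLowerCase();
  if (/\b(refactor|architecture|migrate)\b/.test(t)) return 3;
  if (/\b(review|debug|explain)\b/.test(t)) return 2;
  return 1; // summaries, log parsing, classification, and similar tasks
}

// Cheapest model that is at least as capable as the task requires.
function routeModel(task: string, models: ModelOption[] = MODELS): string {
  const needed = requiredTier(task);
  const candidates = models
    .filter((m) => m.tier >= needed)
    .sort((a, b) => a.costPerMTokUsd - b.costPerMTokUsd);
  if (candidates.length === 0) throw new Error("no model meets the required tier");
  return candidates[0].name;
}

console.log(routeModel("Summarize this thread")); // small-open-model
console.log(routeModel("Refactor the billing service architecture")); // frontier-model
```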

You can start using AI Gateway for free. Spending limits are now available to all users.

If you don’t have one yet, create a gateway and point your application at it. Then set spending limits in the dashboard or through the API. Start with limits in monitor mode to understand your current usage patterns before you begin enforcing them.

Need per-user attribution and team-based policies? Sign up for the closed beta of identity-driven budgets to set up the Access integration.

We want to hear how you are managing your AI costs today. Join the conversation in the Cloudflare Community, or reach out to discuss your broader AI security strategy.
