Google AI Releases Gemini 3.1 Pro with a 1 Million Token Context and 77.1% on ARC-AGI-2 for AI Agents



Google has officially kicked the Gemini era into high gear with Gemini 3.1 Pro, the first version update to the Gemini 3 series. This release is more than a minor patch: it is a push into the “agentic” AI market, focusing on reasoning stability, software engineering, and reliable tool use.

For developers, this update signals a transition. We are moving from a model of simply “chatting” to a model of “working.” Gemini 3.1 Pro is designed to be the core engine for autonomous agents that can navigate file systems, execute code, and reason about scientific problems with success rates that rival, and in some cases exceed, the industry’s most elite frontier models.

Massive context, precise output

One of the most immediate upgrades is in handling scale. The Gemini 3.1 Pro preview keeps its massive 1 million token input context window. To put this into perspective for software engineers, that is enough to feed an entire medium-sized code repository into the model, giving it enough “memory” to understand dependencies between files without losing the plot.

But the real news is the 65,000-token output limit. This 65k window is a significant jump for developers building long-form generators. Whether you are generating a 100-page technical manual or a complex multi-module Python application, the model can now finish the job in a single turn without suddenly hitting a “max tokens” wall.
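To get a feel for these limits before sending a request, a quick budget check can help. The sketch below assumes a rough heuristic of about 4 characters per token (an assumption, not how Gemini actually tokenizes) and uses the limits as reported in this article; for real budgets, rely on the API's own token counting. The function names are hypothetical.

```python
# Limits as reported for the Gemini 3.1 Pro preview.
MAX_INPUT_TOKENS = 1_000_000
MAX_OUTPUT_TOKENS = 65_000

# Assumption: roughly 4 characters per token. This is only a heuristic.
CHARS_PER_TOKEN = 4


def estimate_tokens(text: str) -> int:
    """Rough token estimate from character count (ceiling division)."""
    return -(-len(text) // CHARS_PER_TOKEN)


def fits_in_one_turn(files: dict[str, str], expected_output_tokens: int) -> bool:
    """Return True if the file paths plus contents fit the input window
    and the expected answer fits the output limit."""
    input_tokens = sum(
        estimate_tokens(path) + estimate_tokens(body) for path, body in files.items()
    )
    return (
        input_tokens <= MAX_INPUT_TOKENS
        and expected_output_tokens <= MAX_OUTPUT_TOKENS
    )


# Usage: a small repository with a long-form answer of 50k tokens fits.
print(fits_in_one_turn({"app/main.py": "print('hi')\n" * 100}, 50_000))
```

Counting the paths as well as the bodies reflects the idea of feeding a whole repository, where file names carry part of the dependency structure the model needs.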

Doubling down on reasoning

If Gemini 3 was about introducing “deep thinking,” Gemini 3.1 is about refining that thinking. The performance gains on rigorous benchmarks are notable:

Benchmark                   | Score | What it measures
ARC-AGI-2                   | 77.1% | Solving entirely new logic patterns
GPQA Diamond                | 94.1% | Graduate-level scientific reasoning
SciCode                     | 58.9% | Python programming for scientific computing
Terminal-Bench Hard         | 53.8% | Agentic coding and terminal use
Humanity’s Last Exam (HLE)  | 44.7% | Reasoning near the limits of human expertise

The 77.1% ARC-AGI-2 score is the headline number here. The Google team claims it represents more than double the reasoning performance of the original Gemini 3 Pro. In practice, this suggests the model leans less on pattern matching from its training data and is better able to “figure it out” when faced with novel edge cases.

https://blog.google/innovation-and-ai/models-and-research/gemini-models/gemini-3-1-pro/

The agentic toolkit: custom tools and Antigravity

The Google team has a clear strategy for developer tooling. Alongside the main model, it launched a specialized endpoint: gemini-3.1-pro-preview-customtools.

This endpoint is optimized for developers who mix bash commands with custom functions. Previous versions sometimes struggled to decide which tool to prioritize and would occasionally hallucinate a search when simply reading a local file would have sufficed. The customtools variant is tuned specifically to favor tools such as view_file or search_code, making it a more reliable backbone for autonomous coding agents.
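The model only decides which tool to call; the agent still has to execute it. The sketch below shows hypothetical local implementations of view_file and search_code plus a small dispatcher. The call shape {"name": ..., "args": {...}} is an assumption for illustration, not the Gemini API's actual response object, so map it from whatever your client library returns.

```python
import os
import re
from pathlib import Path


def view_file(path: str, root: str) -> str:
    """Hypothetical tool: return a file's text, refusing paths outside root."""
    base = Path(root).resolve()
    full = (base / path).resolve()
    if not full.is_relative_to(base):  # Python 3.9+
        raise ValueError("path escapes the repository root")
    return full.read_text()


def search_code(pattern: str, root: str) -> list[str]:
    """Hypothetical tool: return 'relative/path:line' for each regex match."""
    regex = re.compile(pattern)
    hits = []
    for dirpath, dirnames, filenames in os.walk(root):
        dirnames.sort()  # deterministic traversal order
        for name in sorted(filenames):
            full = Path(dirpath) / name
            try:
                lines = full.read_text().splitlines()
            except (UnicodeDecodeError, OSError):
                continue  # skip binary or unreadable files
            for lineno, line in enumerate(lines, start=1):
                if regex.search(line):
                    hits.append(f"{full.relative_to(root).as_posix()}:{lineno}")
    return hits


# Registry of tools the agent exposes to the model.
TOOLS = {"view_file": view_file, "search_code": search_code}


def dispatch(call: dict, root: str):
    """Execute one tool call of the assumed shape {'name': ..., 'args': {...}}."""
    fn = TOOLS.get(call["name"])
    if fn is None:
        raise KeyError(f"unknown tool: {call['name']}")
    return fn(root=root, **call["args"])
```

Keeping file access behind a root check is a deliberate choice: an autonomous agent that reads local files should not be able to wander outside the repository it was given.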

The release also integrates deeply with Google Antigravity, the company’s new agentic development platform. Developers also get a new “medium” thinking level, which lets you adjust the reasoning budget: use high thinking for complex debugging, and drop to medium or low for routine API calls to save latency and cost.
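One way to use the three thinking levels is a small routing policy in your own code. The sketch below is hypothetical: the task categories are made up, and the string values simply mirror the level names in this article. Check the Gemini API documentation for the exact configuration field that accepts a thinking level before wiring this into a request.

```python
from enum import Enum


class ThinkingLevel(str, Enum):
    LOW = "low"
    MEDIUM = "medium"
    HIGH = "high"


# Hypothetical task categories; tune these for your own workload.
HIGH_EFFORT_TASKS = {"debugging", "refactor", "proof"}
LOW_EFFORT_TASKS = {"classification", "extraction", "formatting"}


def choose_thinking_level(task: str, latency_sensitive: bool = False) -> ThinkingLevel:
    """Spend more reasoning on hard tasks, less on routine calls."""
    if task in HIGH_EFFORT_TASKS and not latency_sensitive:
        return ThinkingLevel.HIGH
    if task in LOW_EFFORT_TASKS:
        return ThinkingLevel.LOW
    # Default middle ground, including hard tasks under a latency budget.
    return ThinkingLevel.MEDIUM


print(choose_thinking_level("debugging").value)
```

Falling back to medium rather than low for unknown tasks is a conservative default: it trades a little latency for fewer under-reasoned answers.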

Breaking API changes and new file methods

For those already building on the Gemini API, there is a small but important breaking change: in the v1beta Interactions API, the field total_reasoning_tokens has been renamed to total_thought_tokens. The rename aligns with the “thought signatures” introduced in the Gemini 3 family: encrypted representations of the model’s internal reasoning that must be passed back to the model to maintain context in multi-turn agentic workflows.
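During a migration it can be convenient to read the thinking-token count through one helper that accepts both field names. The sketch below assumes the usage metadata has already been converted to a plain dict; the helper name is hypothetical, and only the two field names come from the change described above.

```python
from typing import Any


def get_thought_tokens(usage: dict[str, Any]) -> int:
    """Return the thinking-token count from a usage payload.

    Prefers the new field (total_thought_tokens) and falls back to the
    old one (total_reasoning_tokens) so code works across the rename.
    Missing both fields is treated as zero thinking tokens.
    """
    if "total_thought_tokens" in usage:
        return int(usage["total_thought_tokens"])
    return int(usage.get("total_reasoning_tokens", 0))


print(get_thought_tokens({"total_thought_tokens": 120}))
```

Once every code path reads the new name, the fallback branch can be deleted.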

The models’ appetite for data is also growing. The main file-handling updates are:

  • 100MB file limit: The previous 20MB limit for API uploads has been raised fivefold, to 100MB.
  • Direct YouTube support: You can now pass a YouTube URL directly as a media source; the model “watches” the video via the URL instead of requiring a manual upload.
  • Cloud integration: Cloud Storage buckets and signed URLs for private databases can now be used directly as data sources.
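Before attaching media, an agent might route each source by type and reject local files that exceed the upload limit. The sketch below is hypothetical: the category names are for your own routing logic, any other https URL is assumed to be something like a signed URL, and since the article does not say whether 100MB means decimal megabytes or mebibytes, the check uses the stricter decimal value.

```python
import os
from urllib.parse import urlparse

# Assumption: 100MB read as 100 * 10**6 bytes (the stricter interpretation).
MAX_UPLOAD_BYTES = 100 * 10**6


def classify_source(source: str) -> str:
    """Return 'youtube', 'gcs', 'remote_url', or 'local' for a media source."""
    parsed = urlparse(source)
    host = (parsed.hostname or "").lower()
    if parsed.scheme in ("http", "https") and (
        host in ("youtube.com", "youtu.be") or host.endswith(".youtube.com")
    ):
        return "youtube"
    if parsed.scheme == "gs":
        return "gcs"  # Cloud Storage bucket URI
    if parsed.scheme == "https":
        return "remote_url"  # e.g. a signed URL to private data
    return "local"


def check_local_upload(path: str) -> None:
    """Raise ValueError if a local file exceeds the assumed upload limit."""
    size = os.path.getsize(path)
    if size > MAX_UPLOAD_BYTES:
        raise ValueError(f"{path} is {size} bytes; limit is {MAX_UPLOAD_BYTES}")


print(classify_source("https://www.youtube.com/watch?v=example"))
```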

The economics of intelligence

Pricing for the Gemini 3.1 Pro preview remains aggressive. For prompts under 200,000 tokens, input costs $2 per million tokens and output costs $12 per million. For prompts over 200k tokens, the price rises to $4 per million for input and $18 per million for output.
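The tiered pricing is easy to turn into a per-request estimate. The sketch below uses the rates quoted above and assumes that the prompt length alone selects the tier, that the selected tier applies to both input and output, and that a prompt of exactly 200,000 tokens is billed at the standard rate. Confirm these details against Google’s pricing page.

```python
# (input $ per 1M tokens, output $ per 1M tokens) as quoted for the preview.
PRICING = {
    "standard": (2.00, 12.00),
    "long_context": (4.00, 18.00),
}
# Assumption: prompts above this size switch the whole request to long-context rates.
LONG_CONTEXT_THRESHOLD = 200_000


def estimate_cost_usd(input_tokens: int, output_tokens: int) -> float:
    """Estimate the dollar cost of one request under the assumed tier rules."""
    tier = "long_context" if input_tokens > LONG_CONTEXT_THRESHOLD else "standard"
    in_rate, out_rate = PRICING[tier]
    return (input_tokens * in_rate + output_tokens * out_rate) / 1_000_000


# 100k-token prompt, 10k-token answer: 0.1 * $2 + 0.01 * $12 = $0.32
print(round(estimate_cost_usd(100_000, 10_000), 2))
```

The same 10k-token answer after a 300k-token prompt costs 0.3 × $4 + 0.02 × $18 = $1.56, which shows how quickly the long-context tier adds up for repository-scale prompts.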

Against competitors such as Claude Opus 4.6 and GPT-5.2, the Google team positions Gemini 3.1 Pro as an “efficiency leader.” According to data from Artificial Analysis, Gemini 3.1 Pro currently holds the top spot on its Intelligence Index while costing about half as much to run as its closest frontier competitor.

Key takeaways

  • Massive 1M/65K context window: The model keeps its 1 million token input window for large datasets and repositories, and significantly raises the output limit to 65,000 tokens for generating long-form code and documentation.
  • Leap in logic and reasoning: Performance on the ARC-AGI-2 benchmark reached 77.1%, which Google says is more than double the reasoning performance of the original Gemini 3 Pro. The model also scores 94.1% on GPQA Diamond for graduate-level science tasks.
  • Dedicated agent endpoint: The Google team introduced the specialized gemini-3.1-pro-preview-customtools endpoint, optimized to prioritize bash commands and system tools (such as view_file and search_code) for more reliable autonomous agents.
  • Breaking API changes: Developers must update their codebases, because the field total_reasoning_tokens has been renamed to total_thought_tokens in the v1beta Interactions API to better reflect the model’s internal “thinking” process.
  • Enhanced file and media handling: The API file size limit has increased from 20MB to 100MB. Developers can also pass YouTube URLs directly in prompts, letting the model analyze video content without downloading or re-uploading files.
