Anthropic’s new AI model dominates coding and reasoning benchmarks; takes on GPT-5.2



Anthropic has officially released Claude Opus 4.6, its most advanced AI model to date, with significant improvements in reasoning, coding, and long-context handling. The release intensifies competition with OpenAI’s GPT and Google’s Gemini by claiming state-of-the-art performance on key benchmarks for economically valuable work and agentic coding.

Technical specifications and main features

Claude Opus 4.6 represents a major step forward in capability, with a 1 million token context window in beta, a first for the Opus model line. This allows the model to process and retain information across very long documents, codebases, or analysis sessions while reducing “context rot.” The model supports output of up to 128,000 tokens and introduces new developer controls, such as adaptive thinking to control reasoning depth and context compression for longer-running agent workflows.
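To put the two limits above in practical terms, the sketch below estimates whether a prompt would fit in the window. It makes two assumptions that do not come from Anthropic: that output tokens count against the same 1 million token window, and that English text averages roughly four characters per token. The function names are hypothetical, and a real integration would use the provider's own token counting rather than this heuristic.

```python
# Limits quoted in the article for Claude Opus 4.6.
CONTEXT_WINDOW_TOKENS = 1_000_000  # beta context window
MAX_OUTPUT_TOKENS = 128_000        # maximum output length

# Assumption: ~4 characters per token is a rough rule of thumb for English
# text, not the model's actual tokenizer.
CHARS_PER_TOKEN = 4


def estimate_tokens(text: str) -> int:
    """Rough token estimate from character count (heuristic only)."""
    return -(-len(text) // CHARS_PER_TOKEN)  # ceiling division


def fits_in_context(prompt: str, reserved_output: int = MAX_OUTPUT_TOKENS) -> bool:
    """Return True if the estimated prompt plus reserved output fits the window.

    Assumption: output tokens share the same window as the input.
    """
    if not 0 <= reserved_output <= MAX_OUTPUT_TOKENS:
        raise ValueError("reserved_output must be between 0 and MAX_OUTPUT_TOKENS")
    return estimate_tokens(prompt) + reserved_output <= CONTEXT_WINDOW_TOKENS
```

Under these assumptions, reserving the full 128,000 output tokens leaves about 872,000 tokens for input, or roughly 3.5 million characters of text.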

Benchmark performance and features

Anthropic positions Opus 4.6 as a leader in complex, autonomous tasks. The model achieves the highest scores on several key evaluations:


  • Terminal-Bench 2.0: Leading agentic coding performance.
  • Humanity’s Last Exam: Top score on this interdisciplinary reasoning test.
  • GDPval-AA: Outperforms OpenAI’s GPT-5.2 by approximately 144 Elo points on financial and legal analysis tasks, according to Anthropic.
  • MRCR v2: Scored 76% on this “needle in a haystack” retrieval test within a 1M token context, a significant improvement over previous models.
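For a sense of what a 144-point Elo gap means, the short sketch below converts a rating difference into an expected head-to-head score. It assumes GDPval-AA ratings follow the standard Elo logistic curve with a scale of 400, which the article does not confirm; the function name is illustrative.

```python
def elo_expected_score(rating_gap: float, scale: float = 400.0) -> float:
    """Expected score of the higher-rated side under the standard Elo formula.

    Assumption: the benchmark uses the conventional 400-point logistic scale.
    """
    return 1.0 / (1.0 + 10.0 ** (-rating_gap / scale))


# Gap reported in the article between Opus 4.6 and GPT-5.2 on GDPval-AA.
gap = 144
print(f"Expected score at a {gap}-point gap: {elo_expected_score(gap):.2f}")
```

Under that assumption, a 144-point lead corresponds to an expected score of about 0.70, meaning the higher-rated model would be preferred in roughly seven of ten pairwise comparisons.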

The company also highlights improved code review and debugging performance, along with the ability to sustain long-running agent workflows with more accurate planning.

Enhanced safety and security

According to the system card released by Anthropic, the increased performance does not come at the expense of safety. Opus 4.6 shows a lower incidence of misaligned behaviors such as deception, and fewer unnecessary refusals, compared with previous Claude models. In response to the model’s improved capabilities, Anthropic introduced new cybersecurity probes to assess both its defensive and offensive security potential.

APIs, product integration, and availability

The model is available immediately on claude.ai, via the Anthropic API, and across major cloud platforms. Key product integrations include:

  • Claude Code: Adds “Agent Teams” to review large codebases in parallel.
  • Cowork: Combines skills such as analysis and documentation to carry out autonomous, multi-step tasks.
  • Office suite: Research preview of an Excel upgrade and a PowerPoint integration for Max, Team, and Enterprise users.

Prices remain unchanged at $5 per million input tokens and $25 per million output tokens.
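As a rough illustration of what these rates mean per request, the sketch below applies the two list prices to a token count. It is a simplified estimate that assumes flat per-token billing and ignores any discounts, surcharges, or pricing tiers not mentioned in the article; the function name is hypothetical.

```python
# List prices quoted in the article, in US dollars per million tokens.
INPUT_PRICE_PER_MTOK = 5.00
OUTPUT_PRICE_PER_MTOK = 25.00


def request_cost_usd(input_tokens: int, output_tokens: int) -> float:
    """Estimate the cost of one request at flat list prices (assumption)."""
    if input_tokens < 0 or output_tokens < 0:
        raise ValueError("token counts must be non-negative")
    total = input_tokens * INPUT_PRICE_PER_MTOK + output_tokens * OUTPUT_PRICE_PER_MTOK
    return total / 1_000_000


# Example: a 200,000-token prompt with a 10,000-token response.
print(f"${request_cost_usd(200_000, 10_000):.2f}")
```

In this example the input accounts for $1.00 and the output for $0.25, for a total of $1.25.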

Analysis: AI’s impact on the competitive landscape

The release of Opus 4.6 allows Anthropic to go head-to-head with its competitors at the cutting edge of AI, especially in areas that require deep reasoning across large datasets. Anthropic aims to serve high-value enterprises and developers by improving coding autonomy, financial analysis, and long-context accuracy. Strong benchmark results, especially on GDPval-AA, point to a clear strategy for gaining ground in professional and analytical applications.

FAQ:

Q: What is the context window in Claude Opus 4.6?

A: Claude Opus 4.6 introduces a 1 million token context window in beta, allowing it to process far more information in a single session.

Q: How does Opus 4.6 perform compared to GPT-5.2?

A: According to Anthropic, Opus 4.6 outperforms GPT-5.2 by approximately 144 Elo points on the GDPval-AA benchmark, which measures performance on financial and legal analysis tasks.

Q: Is Claude Opus 4.6 available now?

A: Yes, the model is currently available on claude.ai, via the Anthropic API, and on major cloud platforms.


