Claude Opus 4.5 has arrived! Accuracy far surpasses GPT-5.1 and Gemini 3; Rakuten: strong self-evolution capabilities

ChainNewsAbmedia

2025-11-25 08:54:29

Just one week after Google launched Gemini 3, Anthropic announced the release of its latest flagship model, Claude Opus 4.5, on 11/25. The company says this version brings significantly upgraded capabilities in programming, AI agent operation, and computer use, and that it can also handle longer conversations. Anthropic's head of developer relations, Alex Albert, went so far as to say in an interview: “This is the smartest model in the world.”

Claude Opus 4.5: key highlights at a glance

Highlight 1: Performance surpasses GPT-5.1 and Gemini 3, strengthening agent applications.

Anthropic officially positions Opus 4.5 as “one of the world's strongest models,” and it is now available in the app, through the API, and on three major cloud platforms (AWS, GCP, Azure). The AI model performance comparison chart provided by Anthropic shows:

“Opus 4.5 has an accuracy rate of up to 80.9%, surpassing Gemini 3 Pro and GPT-5.1.”

According to Anthropic, Opus 4.5 stands out particularly in programming, AI agents, multi-step reasoning, and computer tool operation, with noticeable improvements in general tasks such as in-depth research and working with PowerPoint, Excel, and other applications.

The new pricing is 5 USD per million input tokens and 25 USD per million output tokens, making it more affordable than the previous-generation Opus 4.1 and allowing more businesses and teams to adopt Opus-level capabilities.
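To make the pricing concrete, here is a minimal sketch of the arithmetic, using only the two per-million-token prices quoted above. The function name `estimate_cost_usd` and the sample workload are hypothetical, chosen for illustration; actual billing may differ (for example through discounts or caching, which this article does not cover).

```python
# Prices quoted in the article: USD per one million tokens.
INPUT_USD_PER_MILLION = 5.0
OUTPUT_USD_PER_MILLION = 25.0


def estimate_cost_usd(input_tokens: int, output_tokens: int) -> float:
    """Estimate the cost of a request from its token counts at the quoted prices."""
    total = (input_tokens * INPUT_USD_PER_MILLION
             + output_tokens * OUTPUT_USD_PER_MILLION)
    return total / 1_000_000


# Hypothetical workload: 200,000 input tokens and 50,000 output tokens.
print(f"{estimate_cost_usd(200_000, 50_000):.2f} USD")  # prints "2.25 USD"
```

In this example the output tokens account for more of the cost than the input tokens despite being a quarter of the volume, because output is priced five times higher.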

Highlight 2: Consistently positive feedback from internal testing; the model understands problems and solves them.

Anthropic revealed that after the beta version was released, team members gave consistent feedback. In particular:

“Opus 4.5 can handle some ambiguous issues and reasoning trade-offs, and will explore solutions on its own when encountering complex bugs in multi-system environments.”

Tasks that Sonnet 4.5 could hardly accomplish are now achievable with Opus 4.5. Testers generally report that Opus 4.5 understands “the user's intentions” very well, and Anthropic believes this makes a significant difference to the experience.

The CEOs of Windsurf, GitHub, and other companies have all endorsed Opus 4.5.

Highlight 3: New record in a programming test, with two-hour exam performance surpassing humans.

Anthropic pointed out that the company uses a highly challenging practical test when recruiting engineers. Within the same two-hour time limit, Claude Opus 4.5 outperformed every human job applicant who has ever taken the test, setting a new record.

Anthropic adds that this test primarily assesses technical ability and judgment under pressure, and does not cover soft skills such as cooperation and communication. Even so, the results show that AI is advancing very quickly in the purely technical aspects of engineering.

Highlight 4: Enhanced security, making it harder to fall victim to prompt injection attacks.

Anthropic emphasizes that Opus 4.5 is the “most aligned and also the safest” model version to date.

The focus of this security upgrade is the model's significantly improved resistance to prompt injection attacks: it is harder to plant malicious instructions in the model's input and harder to trick the system into taking improper actions. Compared with other frontier models, Opus 4.5 also achieved the best results in the relevant security tests. Anthropic's comparison chart shows:

“Opus 4.5 is the least susceptible to being deceived and the least likely to be successfully attacked by prompt injection under the same test conditions as other well-known models, demonstrating outstanding defensive performance.”

Highlight 5: Long conversations without interruption, with comprehensive experience improvements in Chrome and the app.

Anthropic has also updated several products. First, Plan Mode in Claude Code has been further upgraded: it now asks clarifying questions and automatically generates an editable plan.md before carrying out a task. The desktop version has also added support for multiple sessions, allowing multiple agents to work on different tasks simultaneously.

The Claude app used by general users has also been improved, so long conversations no longer get stuck because of lengthy context; the system automatically condenses earlier content to keep the conversation flowing. Claude for Chrome is now available to all Max users and can carry out complex operations across multiple tabs.

Claude for Excel was originally limited to beta users, but has now been expanded to Max, Team, and Enterprise users, with Opus 4.5 integrated to enhance spreadsheet and data processing capabilities. Finally, Anthropic has also raised overall usage limits and removed the Opus-specific caps, allowing Max and Team Premium users to use Opus 4.5 at a “normal workload” level. If a stronger model is released in the future, the related usage limits will be adjusted accordingly.

(Note: plan.md is not an external file, but a “task plan document” automatically generated by Claude Code before executing tasks, formatted in common Markdown.)

Highlight 6: Rakuten pointed out that Opus 4.5 has self-evolution capabilities.

One special highlight is that Japan's Rakuten pointed out that Claude Opus 4.5 has shown significant breakthroughs in self-evolving AI agents.

In practical office automation applications, the agents can refine their own capabilities, reaching peak performance in just four iterations, while other models cannot match the same quality even after ten iterations.

Rakuten emphasizes that this difference allows Opus 4.5 to demonstrate higher efficiency in enterprise-level applications.

This article, “Claude Opus 4.5 has arrived! Accuracy far surpasses GPT-5.1 and Gemini 3; Rakuten: strong self-evolution capabilities,” first appeared on Chain News ABMedia.
