TL;DR
Claude Opus 4.5 scores 80.9% on SWE-Bench Verified, outperforming OpenAI's GPT-5.1-Codex-Max (77.9%) and Google's Gemini 3 Pro (76.2%) on code generation tasks.
Key Points
- Claude Opus 4.5: 80.9% on SWE-Bench Verified coding benchmark
- OpenAI GPT-5.1-Codex-Max: 77.9%; Google Gemini 3 Pro: 76.2%
- New context-window handling: summarizes earlier parts of the conversation instead of truncating at the 200K-token limit
- Conversation memory feature available across all Claude models via web, mobile, desktop, and API
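The summarize-instead-of-truncate behavior above can be sketched as a rolling-summarization loop. This is a minimal illustration, not Anthropic's implementation: `count_tokens` is a crude word-count proxy for a real tokenizer, and `summarize` is a hypothetical stand-in for an actual model call.

```python
def count_tokens(text: str) -> int:
    # Crude proxy: a real system would use a model-specific tokenizer.
    return len(text.split())

def summarize(turns: list[str]) -> str:
    # Hypothetical summarizer; production code would call a model here.
    return "[summary of %d earlier turns]" % len(turns)

def compact_history(turns: list[str], budget: int, keep_recent: int = 4) -> list[str]:
    """Fold older turns into a summary once the token budget is exceeded,
    keeping the most recent turns verbatim."""
    total = sum(count_tokens(t) for t in turns)
    if total <= budget or len(turns) <= keep_recent:
        return turns  # still fits, or too few turns to fold away
    older, recent = turns[:-keep_recent], turns[-keep_recent:]
    return [summarize(older)] + recent
```

The key design choice is that recent turns survive verbatim while only older context is compressed, so the model keeps full fidelity where it matters most for multi-turn workflows.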
Why It Matters
For developers building with LLMs, Claude's improved coding performance and context handling directly impact code generation quality and longer multi-turn development workflows. The 3-point benchmark lead signals meaningful improvements that could influence enterprise AI infrastructure decisions.
Source: archive.techpresso.co