
Anthropic's Claude Opus 4.5 Tops Coding Benchmarks, Beats GPT-5.1

TL;DR

Claude Opus 4.5 scores 80.9% on SWE-Bench Verified, outperforming OpenAI's GPT-5.1-Codex-Max (77.9%) and Google's Gemini 3 Pro (76.2%) on code generation tasks.

Key Points

  • Claude Opus 4.5: 80.9% on SWE-Bench Verified coding benchmark
  • OpenAI GPT-5.1-Codex-Max: 77.9%; Google Gemini 3 Pro: 76.2%
  • New context window handling: summarizes earlier conversations instead of truncating at 200K tokens
  • Conversation memory feature available across all Claude models via web, mobile, desktop, and API
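The "summarize instead of truncate" behavior above can be sketched from the client side. This is an illustrative approximation only, not Anthropic's actual mechanism: the token heuristic, function names, and the single-stub summary are all assumptions. In practice the summary would come from a model call rather than a placeholder string.

```python
def approx_tokens(text: str) -> int:
    # Rough heuristic: ~4 characters per token (assumption, not a real tokenizer).
    return max(1, len(text) // 4)

def compact_history(messages, budget=200_000, keep_recent=4):
    """Keep the most recent turns verbatim; collapse older turns into a
    single summary stub once the history exceeds the token budget.
    A real system would generate the summary with a model call."""
    total = sum(approx_tokens(m["content"]) for m in messages)
    if total <= budget or len(messages) <= keep_recent:
        return messages  # under budget: nothing to compact
    older, recent = messages[:-keep_recent], messages[-keep_recent:]
    summary = f"Summary of {len(older)} earlier messages: ..."
    return [{"role": "user", "content": summary}] + recent
```

The payoff over hard truncation at the window limit is that earlier context survives in condensed form, so long multi-turn sessions keep continuity instead of silently losing their start.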

Why It Matters

For developers building with LLMs, Claude's improved coding performance and context handling directly impact code generation quality and longer multi-turn development workflows. The 3-point benchmark lead signals meaningful improvements that could influence enterprise AI infrastructure decisions.

Source: archive.techpresso.co