Simon Willison 11/24/2025

Claude Opus 4.5, and why evaluating new LLMs is increasingly difficult

The article reviews Anthropic's new Claude Opus 4.5 model, covering its technical specs, pricing, and coding capabilities. The author describes using the model for software development but concludes that it is getting harder to identify meaningful differences between successive LLM generations.
