Simon Willison 11/24/2025

Claude Opus 4.5, and why evaluating new LLMs is increasingly difficult

The article reviews Anthropic's new Claude Opus 4.5 model, covering its technical specs, pricing, and coding capabilities. The author describes his own experience using the model for software development, but concludes that identifying meaningful differences between successive LLM generations is becoming increasingly difficult.
