Simon Willison • 11/24/2025

Claude Opus 4.5, and why evaluating new LLMs is increasingly difficult

The article reviews Anthropic's new Claude Opus 4.5 model, covering its technical specs, pricing, and coding capabilities. The author describes using the model for software development, but concludes that identifying meaningful differences between successive LLM generations is becoming increasingly difficult.

#software development #Claude Opus #LLM Evaluation