Claude Opus 4.5, and why evaluating new LLMs is increasingly difficult
Analysis of the Claude Opus 4.5 LLM release and the growing difficulty of evaluating incremental improvements between AI models.
A guide to the four main methods for evaluating Large Language Models, including code examples and practical implementation details.
Explains why AIC comparisons between discrete and continuous statistical models are invalid, using examples with binomial and Normal distributions.
A hands-on review of Google's updated Gemini Deep Research tool with the 2.5 Pro model, covering its features, usability, and areas for improvement.
A detailed comparison of Anthropic's Claude 3 and the newer Claude 3.5 Sonnet AI models, covering performance, capabilities, and benchmarks.
A developer compares 8 LLMs on a custom retrieval task using medical transcripts, analyzing performance on simple to complex questions.