Mixture Of Experts articles

7/7/2026 • EN

tencent/Hy3

Tencent releases Hy3, a 295B-parameter MoE model outperforming larger open-source models, available for free on OpenRouter.

LLM Mixture Of Experts Model open source Tencent

Simon Willison

6/12/2026 • EN

North Mini Code and Agentic Coding Benchmarks

Analysis of Cohere's new North Mini Code model for agentic coding tasks, including architecture and benchmark performance.

Agentic Coding Cohere Mixture Of Experts North Mini Code SWE-bench

Sebastian Raschka

6/3/2026 • EN

From Mixture of Experts to Mixture of Agents: Sparse Routing Is Escaping the Model

Explores Mixture of Experts (MoE) in AI models, its sparse routing principle, and how it enables large model capacity with low compute cost per token.

Mixture Of Experts Model Architecture Neural Networks Sparse Routing Transformers

Ruslan Magana Vsevolodovna

4/24/2026 • EN

DeepSeek V4 - almost on the frontier, a fraction of the price

DeepSeek releases V4 Pro and Flash AI models, offering frontier-level performance at significantly lower costs.

AI Model Comparison DeepSeek V4 LLM Pricing Mixture Of Experts Open Weights

Simon Willison

4/24/2026 • EN

DeepSeek V4 - almost on the frontier, a fraction of the price

DeepSeek V4 preview models offer frontier-level performance at a fraction of the cost, with up to 1M token context and open weights.

AI Inference DeepSeek V4 LLM Pricing Mixture Of Experts Open Weights

Simon Willison

3/19/2026 • EN

Autoresearching Apple's "LLM in a Flash" to run Qwen 397B locally

Explores using Apple's "LLM in a Flash" research to run a 397B-parameter Qwen model locally on a MacBook by streaming weights from SSD.

Apple MLX LLM Inference memory optimization Mixture Of Experts Quantization

Simon Willison

3/17/2026 • EN

Introducing Mistral Small 4

Mistral AI releases Mistral Small 4, a new 119B-parameter open model combining reasoning, multimodal, and coding capabilities.

API Lean 4 LLM Mistral AI Mixture Of Experts

Simon Willison

3/12/2026 • EN

Nemotron 3 Super Throughput Notes

Analysis of NVIDIA's Nemotron 3 Super 120B-A12B model focusing on its accuracy-throughput trade-off design and efficiency features.

GQA Mamba 2 Mixture Of Experts Speculative Decoding Throughput Optimization

Sebastian Raschka

9/6/2025 • EN

Understanding and Implementing Qwen3 From Scratch

A hands-on tutorial implementing the Qwen3 large language model architecture from scratch using pure PyTorch, explaining its core components.

LLM Mixture Of Experts PyTorch Qwen3 Transformer

Sebastian Raschka

8/9/2025 • EN

From GPT-2 to gpt-oss: Analyzing the Architectural Advances

Analysis of OpenAI's new gpt-oss models, comparing architectural improvements from GPT-2 and examining optimizations like MXFP4 and Mixture-of-Experts.

Grouped Query Attention LLM Optimization Mixture Of Experts RoPE Embeddings Transformer Architecture

Sebastian Raschka

7/19/2025 • EN

The Big LLM Architecture Comparison

A technical comparison of architectural changes in major Large Language Models (LLMs) from 2024-2025, focusing on structural innovations beyond benchmarks.

Attention Mechanisms LLM Architecture Mixture Of Experts Normalization Layers Transformer Models

Sebastian Raschka
