Model Architecture articles

6/30/2026 • EN

Build a Reasoning Model From Scratch Is Out

Announcement of the release of 'Build a Reasoning Model (From Scratch)', a book on implementing modern reasoning techniques for AI.

AI Agents, LLM, Model Architecture, Reasoning Model, Reinforcement Learning

Sebastian Raschka

6/3/2026 • EN

From Mixture of Experts to Mixture of Agents: Sparse Routing Is Escaping the Model

Explores Mixture of Experts (MoE) in AI models, its sparse routing principle, and how it enables large model capacity with low compute cost per token.

Mixture of Experts, Model Architecture, Neural Networks, Sparse Routing, Transformers

Ruslan Magana Vsevolodovna

12/3/2025 • EN

A Technical Tour of the DeepSeek Models from V3 to V3.2

A technical analysis of the DeepSeek model series, from V3 to the latest V3.2, covering architecture, performance, and release timeline.

DeepSeek, LLM, Model Architecture, Reinforcement Learning, Sparse Attention

Sebastian Raschka

12/3/2025 • EN

From DeepSeek V3 to V3.2: Architecture, Sparse Attention, and RL Updates

Analysis of DeepSeek V3.2's architecture, sparse attention mechanism, and RL updates compared to its predecessor and proprietary models.

DeepSeek, LLM, Model Architecture, Reinforcement Learning, Sparse Attention

Sebastian Raschka

8/9/2025 • EN

From GPT-2 to gpt-oss: Analyzing the Architectural Advances

Analyzes the architectural advancements in OpenAI's new open-weight gpt-oss models, comparing them to GPT-2 and other modern LLMs.

gpt-oss, LLM, Model Architecture, OpenAI, Transformer

Sebastian Raschka

3/16/2025 • EN

Improving Recommendation Systems & Search in the Age of LLMs

Explores how large language models (LLMs) are transforming industrial recommendation systems and search, covering hybrid architectures, data generation, and unified frameworks.

LLM, Model Architecture, Recommendation Systems, Search, Semantic IDs

Eugene Yan

12/31/2024 • EN

2024: A Year in AI Research

A Google researcher's curated review of key AI research papers from 2024, covering LLMs, new architectures, agents, and security.

AI Research, Large Language Models, LLM Survey, Model Architecture, Prompt Engineering

Xavier Amatriain

8/16/2023 • EN

Open challenges in LLM research

An overview of the top 10 open research challenges in Large Language Models (LLMs), focusing on reducing hallucinations and optimizing context learning.

Context Learning, Hallucinations, LLM Research, Model Architecture, Multimodality

Chip Huyen
