Transformer Models articles

7/19/2025 • EN

The Big LLM Architecture Comparison

A detailed comparison of architectural developments in major large language models (LLMs) released in 2024-2025, focusing on structural changes beyond benchmarks.

Attention Mechanisms, LLM Architecture, Mixture of Experts, Normalization Layers, Transformer Models

Sebastian Raschka

9/30/2024 • EN

Transformers Create Shapes of the Universe

Explores the philosophical argument that AI systems, particularly LLMs, possess a form of understanding and model reality, challenging the notion that they are mere token predictors.

Artificial Intelligence, LLM, Philosophy of AI, Transformer Models, Understanding

Daniel Miessler

6/22/2023 • EN

Takeaways from DeepMind's RoboCat Paper

A summary and analysis of DeepMind's RoboCat paper, a self-improving foundation agent for robotic manipulation using Transformer models.

DeepMind, Machine Learning, Robotics, Transfer Learning, Transformer Models

Eric Jang

1/10/2023 • EN

Large Transformer Model Inference Optimization

Explores techniques to optimize inference speed and memory usage for large transformer models, including distillation, pruning, and quantization.

Attention Mechanism, Inference Optimization, KV Cache, Model Compression, Transformer Models

Lilian Weng

9/13/2022 • EN

Accelerate GPT-J inference with DeepSpeed-Inference on GPUs

Learn to optimize GPT-J inference using DeepSpeed-Inference and Hugging Face Transformers for faster GPU performance.

DeepSpeed Inference, GPT-J, GPU Optimization, Large Language Models, Transformer Models

Philipp Schmid

1/19/2021 • EN

Finding the Words to Say: Hidden State Visualizations for Language Models

Explores visualizing hidden states in Transformer language models to understand their internal decision-making process during text generation.

Hidden States, Language Models, Model Visualization, Neural Networks, Transformer Models

Jay Alammar

10/29/2020 • EN

How to Build an Open-Domain Question Answering System?

A technical overview of approaches for building open-domain question answering systems using pretrained language models and neural networks.

AI Assistant, Language Models, Neural Networks, Open Domain Question Answering, Transformer Models

Lilian Weng
