AGI, ASI, A*I – Do we have all we need to get there?
Leading AI researchers debate whether current scaling and innovations are sufficient to achieve Artificial General Intelligence (AGI).
A visual essay explaining LLM internals like tokenization, embeddings, and transformer architecture in an accessible way.
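As a bare-bones illustration of the first two of those internals, the sketch below maps words to integer ids and looks up an embedding vector for each. It is a toy, not the essay's own example: real LLMs use subword tokenizers (BPE or similar) and large learned embedding tables, and every name here is invented for the illustration.

```python
import numpy as np

# Toy vocabulary and embedding table; real LLMs use subword tokenizers and learned tables.
vocab = {"<unk>": 0, "the": 1, "cat": 2, "sat": 3}
rng = np.random.default_rng(0)
embedding_table = rng.normal(size=(len(vocab), 8))   # (vocab_size, d_model)

def tokenize(text):
    """Whitespace 'tokenizer': map each word to an integer id, unknown words to <unk>."""
    return [vocab.get(w, vocab["<unk>"]) for w in text.lower().split()]

ids = tokenize("The cat sat")
vectors = embedding_table[ids]        # (3, 8): one vector per token, fed into the transformer
print(ids, vectors.shape)
```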
Analysis of OpenAI's new gpt-oss models, tracing architectural changes since GPT-2 and examining optimizations such as MXFP4 quantization and Mixture-of-Experts.
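For readers unfamiliar with Mixture-of-Experts routing, the plain-NumPy sketch below shows the core idea: a router scores every expert per token, and only the top-k experts are evaluated and mixed. This is an illustrative simplification, not the gpt-oss implementation; names like `moe_layer` and `gate_w` are made up for the example.

```python
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def moe_layer(x, gate_w, experts, top_k=2):
    """Route each token to its top_k experts and mix their outputs.

    x       : (tokens, d_model) token representations
    gate_w  : (d_model, n_experts) router weights
    experts : list of callables, each mapping (d_model,) -> (d_model,)
    """
    probs = softmax(x @ gate_w)                       # (tokens, n_experts) routing probabilities
    out = np.zeros_like(x)
    for t in range(x.shape[0]):
        top = np.argsort(probs[t])[::-1][:top_k]      # indices of the chosen experts
        weights = probs[t, top] / probs[t, top].sum() # renormalise over the chosen few
        for w, e in zip(weights, top):
            out[t] += w * experts[e](x[t])            # only top_k experts run per token
    return out

# Toy usage: 4 tokens, d_model=8, 4 random linear "experts".
rng = np.random.default_rng(0)
d, n_experts = 8, 4
experts = [lambda v, W=rng.normal(size=(d, d)) / np.sqrt(d): v @ W for _ in range(n_experts)]
x = rng.normal(size=(4, d))
gate_w = rng.normal(size=(d, n_experts))
print(moe_layer(x, gate_w, experts).shape)   # (4, 8)
```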
A technical deep dive into DeepSeek's Multi-Head Latent Attention mechanism, covering its mathematics and implementation in Julia.
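As a rough illustration of the idea behind Multi-Head Latent Attention (the article itself works in Julia), the Python sketch below compresses the hidden state into a small latent vector that stands in for the KV cache; keys and values are reconstructed from it by up-projections. It is a single-head simplification that omits the decoupled RoPE path and other details of DeepSeek's design; all names and shapes are illustrative.

```python
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def mla_single_head(x, W_q, W_dkv, W_uk, W_uv):
    """Simplified single-head latent attention.

    x     : (seq, d_model) hidden states
    W_q   : (d_model, d_head) query projection
    W_dkv : (d_model, d_latent) shared down-projection -- only this latent is cached
    W_uk  : (d_latent, d_head) key up-projection
    W_uv  : (d_latent, d_head) value up-projection
    """
    q = x @ W_q                        # (seq, d_head)
    c_kv = x @ W_dkv                   # (seq, d_latent): the compressed KV cache
    k = c_kv @ W_uk                    # keys reconstructed from the latent
    v = c_kv @ W_uv                    # values reconstructed from the latent
    scores = q @ k.T / np.sqrt(q.shape[-1])
    return softmax(scores) @ v         # (seq, d_head)

rng = np.random.default_rng(0)
seq, d_model, d_latent, d_head = 5, 16, 4, 8
out = mla_single_head(
    rng.normal(size=(seq, d_model)),
    rng.normal(size=(d_model, d_head)),
    rng.normal(size=(d_model, d_latent)),
    rng.normal(size=(d_latent, d_head)),
    rng.normal(size=(d_latent, d_head)),
)
print(out.shape)  # (5, 8); only the (seq, d_latent) latent needs to be cached
```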
A 3-hour coding workshop video covering the implementation, training, and use of Large Language Models (LLMs) from scratch.
Announcing the release of the 'transformer' R package on CRAN, implementing a full transformer architecture for AI/ML development.
An updated, comprehensive overview of the Transformer architecture and its many recent improvements, including detailed notation and attention mechanisms.
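To make the notation in such overviews concrete, here is a minimal single-head, pre-norm transformer block in NumPy: self-attention followed by a position-wise MLP, each wrapped in a residual connection. It is a deliberately stripped-down sketch of the textbook architecture, not any specific model; the ReLU MLP, single head, and parameter names are simplifying assumptions.

```python
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def layer_norm(x, eps=1e-5):
    mu = x.mean(axis=-1, keepdims=True)
    var = x.var(axis=-1, keepdims=True)
    return (x - mu) / np.sqrt(var + eps)

def transformer_block(x, Wq, Wk, Wv, Wo, W1, W2):
    """One pre-norm block: self-attention + MLP, each with a residual connection."""
    # Single-head scaled dot-product self-attention over the normalised input.
    h = layer_norm(x)
    q, k, v = h @ Wq, h @ Wk, h @ Wv
    attn = softmax(q @ k.T / np.sqrt(q.shape[-1])) @ v
    x = x + attn @ Wo
    # Position-wise feed-forward network.
    h = layer_norm(x)
    x = x + np.maximum(h @ W1, 0.0) @ W2   # ReLU MLP
    return x

rng = np.random.default_rng(0)
seq, d, d_ff = 6, 16, 32
params = [rng.normal(size=s) / np.sqrt(s[0]) for s in
          [(d, d), (d, d), (d, d), (d, d), (d, d_ff), (d_ff, d)]]
print(transformer_block(rng.normal(size=(seq, d)), *params).shape)  # (6, 16)
```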
Explains how retrieval-augmented language models like RETRO achieve GPT-3 performance with far fewer parameters by querying external knowledge.
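RETRO itself fuses retrieved chunks into the decoder through chunked cross-attention; as a much simpler illustration of why retrieval lets a smaller model punch above its parameter count, the sketch below finds nearest-neighbour passages by embedding similarity and prepends them to the prompt. The embedder is a toy stand-in and every name here is invented for the example, not RETRO's actual mechanism.

```python
import numpy as np

def embed(text, dim=64):
    """Toy stand-in for a sentence embedder: a deterministic pseudo-random unit vector per text."""
    rng = np.random.default_rng(abs(hash(text)) % 2**32)
    v = rng.normal(size=dim)
    return v / np.linalg.norm(v)

def retrieve(query, corpus, k=2):
    """Return the k corpus passages closest to the query in embedding space."""
    q = embed(query)
    return sorted(corpus, key=lambda doc: -float(embed(doc) @ q))[:k]

def build_augmented_prompt(query, corpus, k=2):
    """Prepend retrieved passages so the language model can lean on external knowledge."""
    context = "\n".join(retrieve(query, corpus, k))
    return f"Context:\n{context}\n\nQuestion: {query}\nAnswer:"

corpus = [
    "RETRO retrieves text chunks from a large external database at inference time.",
    "Retrieval lets a model store facts externally instead of in its weights.",
    "Tokenization splits text into subword units.",
]
print(build_augmented_prompt("How do retrieval-augmented models stay small?", corpus))
```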