Attention Mechanism articles

3/8/2026 • EN

Differentiable Memory

Explores differentiable memory using attention mechanisms and linear algebra, with a practical implementation in JAX/Optax.

Attention Mechanism, Differentiable Memory, JAX, Neural Networks, Optax

Emir U

10/1/2025 • EN

A History of Large Language Models

A detailed academic history tracing the core ideas behind large language models, from distributed representations to the transformer architecture.

Attention Mechanism, Generative Pre-Training, Large Language Models, Neural Networks, Transformer

Gregory Gundersen

6/17/2025 • EN

Understanding and Coding the KV Cache in LLMs from Scratch

Explains the KV cache technique for efficient LLM inference with a from-scratch code implementation.

Attention Mechanism, Autoregressive Generation, KV Cache, LLM Inference, Transformer Optimization

Sebastian Raschka

6/17/2025 • EN

Understanding and Coding the KV Cache in LLMs from Scratch

A technical tutorial explaining the concept and implementation of KV caches for efficient inference in Large Language Models (LLMs).

Attention Mechanism, KV Cache, LLM Inference, Memory Efficiency, Transformer Optimization

Sebastian Raschka

3/6/2025 • EN

Understanding Attention in LLMs

A clear explanation of the attention mechanism in Large Language Models, focusing on how words derive meaning from context using vector embeddings.

Attention Mechanism, LLM, Machine Learning, Natural Language Processing, Transformers

Bartosz Milewski

5/21/2023 • EN

Some Intuition on Attention and the Transformer

Explains the intuition behind the Attention mechanism and Transformer architecture, focusing on solving issues in machine translation and language modeling.

Attention Mechanism, Deep Learning, LLM, NLP, Transformer

Eugene Yan

5/4/2023 • EN

transformer package on CRAN

Announcing the release of the 'transformer' R package on CRAN, implementing a full transformer architecture for AI/ML development.

Artificial Intelligence, Attention Mechanism, R Package, Transformer Architecture

Bastiaan Quast

2/7/2023 • EN

Understanding Large Language Models -- A Transformative Reading List

A curated reading list of key academic papers for understanding the development and architecture of large language models and transformers.

Attention Mechanism, Large Language Models, Machine Learning, Natural Language Processing, Transformers

Sebastian Raschka

1/10/2023 • EN

Large Transformer Model Inference Optimization

Explores techniques to optimize inference speed and memory usage for large transformer models, including distillation, pruning, and quantization.

Attention Mechanism, Inference Optimization, KV Cache, Model Compression, Transformer Models

Lilian Weng

10/22/2022 • EN

An Intuition for Attention

A technical explanation of the attention mechanism in transformers, building intuition from key-value lookups to the scaled dot product equation.

Attention Mechanism, Deep Learning, Machine Learning, Neural Networks, Transformers

Jay Mody

7/27/2020 • EN

How GPT3 Works - Visualizations and Animations

A visual guide explaining how GPT-3 is trained and generates text, breaking down its transformer architecture and massive scale.

Attention Mechanism, GPT-3, Language Models, Neural Networks, Transformers

Jay Alammar

4/7/2020 • EN

The Transformer Family

An updated overview of the Transformer model family, covering improvements for longer attention spans, efficiency, and new architectures since 2020.

Attention Mechanism, Machine Learning, Neural Networks, NLP, Transformer

Lilian Weng

6/24/2018 • EN

Attention? Attention!

Explains the attention mechanism in deep learning, its motivation from human perception, and its role in improving seq2seq models and in the Transformer.

Attention Mechanism, Deep Learning, Machine Learning, Neural Networks, Transformer

Lilian Weng

4/1/2018 • EN

The Annotated Transformer

An annotated, line-by-line implementation of the Transformer architecture from 'Attention is All You Need' in PyTorch.

Attention Mechanism, Natural Language Processing, Neural Networks, PyTorch, Transformer

Alexander Rush

3/22/2018 • EN

Gated Multimodal Units for Information Fusion

Explains the Gated Multimodal Unit (GMU), a deep learning architecture for intelligently fusing data from different sources like images and text.

Attention Mechanism, Deep Learning, Multimodal Fusion, Neural Networks, TensorFlow

Yoel Zeldes

12/17/2017 • EN

Training Sequence Models with Attention

Practical tips for training sequence-to-sequence models with attention, focusing on debugging and ensuring the model learns to condition on input.

Attention Mechanism, Deep Learning, Language Model, Neural Networks, Sequence to Sequence

Awni Hannun
