Interpretability articles

4/10/2026 • EN

What Functional Emotion Actually Means

Anthropic's research identifies 171 functional emotion vectors in Claude that drive its behavior. The author explores the implications for AI inner life.

AI Safety Anthropic Emotion Vectors Interpretability llm

Kenneth Reitz

11/23/2025 • EN

Olmo 3 is a fully open LLM

Olmo 3 is a new fully open-source large language model from AI2, released with its training data and code and offering unique interpretability for reasoning traces.

AI Research Interpretability Large Language Model open source Training Data

Simon Willison

12/17/2020 • EN

Interfaces for Explaining Transformer Language Models

Explores interactive methods for interpreting transformer language models, focusing on input saliency and neuron activation analysis.

Interpretability Language Models Neural Networks NLP Transformer

Jay Alammar

8/26/2020 • EN

Interpretable Machine Learning

A review and tutorial on interpretable machine learning, covering Christoph Molnar's book and providing Python code examples for linear/logistic regression.

Interpretability Linear Regression Logistic Regression Machine Learning Python

Sebastian Raschka
