Deep Learning is Powerful Because It Makes Hard Things Easy - Reflections 10 Years On
A reflection on a decade-old blog post about deep learning, examining past predictions on architecture, scaling, and the field's evolution.
Ferenc Huszár is a Professor of Machine Learning at the University of Cambridge and founder of Reasonable, a deep tech startup building advanced programming LLMs. His research focuses on learning theory, reasoning, and inductive biases in deep learning.
15 articles from this blog
Explores continuous-time Markov chains as a foundation for understanding discrete diffusion models in machine learning.
Explores the potential and implications of using AI to automate mathematical theorem proving, framing it as a 'tame' problem solvable by machines.
Analyzes Geoffrey Hinton's technical argument comparing biological and digital intelligence, concluding digital AI will surpass human capabilities.
Explores autoregressive models, their relationship to joint distributions, and how they handle out-of-distribution prompts, with insights relevant to LLMs.
A reflection on past skepticism of deep learning and why similar dismissal of Large Language Models (LLMs) might be a mistake.
Explores how Large Language Models perform implicit Bayesian inference through in-context learning, connecting exchangeable sequence models to prompt-based learning.
Explores the relationship between causal and statistical models, focusing on causal diagrams, Markov factorization, and structural equation models.
Explores how mutual information and KL divergence can be used to derive information-theoretic generalization bounds for Stochastic Gradient Descent (SGD).
Explores how Stochastic Gradient Descent (SGD) inherently prefers certain minima, leading to better generalization in deep learning than classical theory alone can explain.
A technical exploration of the β-VAE objective from an information maximization perspective, discussing its role in learning disentangled representations.
Explains the Neural Tangent Kernel concept through simple 1D regression examples to illustrate how neural networks evolve during training.
Explains the concept of causally correct partial models for reinforcement learning in POMDPs, focusing on counterfactual policy evaluation.
Explores a meta-learning method using the Implicit Function Theorem to efficiently optimize millions of hyperparameters via implicit differentiation.
A data scientist's journey from dogmatic Bayesianism to a pragmatic, 'secular' use of Bayesian tools without requiring belief that the model is literally true.