DeepSeek’s Multi-Head Latent Attention
A technical deep dive into DeepSeek's Multi-Head Latent Attention mechanism, covering its mathematics and implementation in Julia.
Lior Sinai is a software developer and writer exploring coding, mathematics, and machine learning through hands-on experiments and clear explanations. His blog covers algorithms, Julia, Python, C++, and intuitive approaches to complex mathematical problems.
10 articles from this blog
A technical deep dive into DeepSeek's Multi-Head Latent Attention mechanism, covering its mathematics and implementation in Julia.
Explains modifications to the Martinez-Rueda polygon clipping algorithm for boolean operations, addressing ordering issues in complex scenarios.
Part 5 of a series on building an automatic differentiation package in Julia, demonstrating its use to create and train a multi-layer perceptron on the moons dataset.
Extends a Julia automatic differentiation library (MicroGrad.jl) to handle map, getfield, and anonymous functions, enabling gradient descent for polynomial fitting.
Explores using IRTools.jl for robust automatic differentiation in Julia, focusing on metaprogramming to generate forward and backward passes.
Explores automating automatic differentiation in Julia using metaprogramming and expression-based approaches to generate forward and backward passes.
An introduction to building a minimal automatic differentiation package in Julia, focusing on explicit chain rules and the Julia AD ecosystem.
Analyzes the probability that a group of people covers every birthday in the year, and the expected number of people needed, framed as the Coupon Collector's Problem.
A tutorial on building a generative transformer model from scratch in Julia, trained on Shakespeare to create GPT-like text.
A guide to implementing a radix tree (compressed trie) data structure in Julia using test-driven development (TDD).