John D. Cook • 4/18/2026

Gaussian distributed weights for LLMs

This article discusses NF4 and FP4, two 4-bit floating point formats used to quantize large language model (LLM) weights, particularly in bitsandbytes and Hugging Face models. It explains why NF4 places its values according to a Gaussian distribution, which better matches the distribution of LLM parameters, whereas FP4 uses evenly spaced values. The author critiques the QLoRA paper's description of NF4, pointing out ambiguities in how quantile-based indexing is defined and problems with representing zero exactly. The article also describes attempts to reproduce the NF4 values listed in the paper's appendix and notes where the computed values disagree with the published ones.
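To make the idea of Gaussian-distributed levels concrete, here is a minimal sketch that places 16 levels at evenly spaced quantiles of a standard normal distribution and rescales them to [-1, 1]. This is an illustration only, not the construction used by the QLoRA paper or by bitsandbytes: the midpoint probabilities `(i + 0.5) / n` and the function name `gaussian_quantile_levels` are assumptions made for the example, and the resulting values should not be expected to match the published NF4 table.

```python
from statistics import NormalDist


def gaussian_quantile_levels(bits: int = 4) -> list[float]:
    """Return 2**bits levels at evenly spaced standard normal quantiles.

    Hypothetical helper for illustration; not the NF4 definition.
    """
    n = 2 ** bits
    std_normal = NormalDist()  # mean 0, standard deviation 1

    # Assumption: use the midpoint of each of n equal-probability bins,
    # which avoids the infinite quantiles at p = 0 and p = 1.
    probs = [(i + 0.5) / n for i in range(n)]
    quantiles = [std_normal.inv_cdf(p) for p in probs]

    # Rescale so the largest magnitude is 1.
    scale = max(abs(q) for q in quantiles)
    return [q / scale for q in quantiles]


levels = gaussian_quantile_levels()
for i, level in enumerate(levels):
    print(f"{i:2d}  {level:+.6f}")
```

The levels come out denser near zero and sparser in the tails, which is the point of matching a Gaussian. Note that this symmetric scheme with an even number of levels does not contain zero at all, which hints at why representing zero needs special handling in a quantile-based format.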


#llm #Numpy #Quantization