Jay Mody • 2/8/2023

Speculative Sampling

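This article gives an overview, implementation, and time complexity analysis of DeepMind's speculative sampling method for accelerating LLM decoding. It contrasts standard autoregressive sampling, which calls the large model once per generated token, with the speculative approach, in which a small, fast draft model proposes several tokens and the slower target model verifies them in a single parallel pass. Accepted tokens cost far less than a full target-model step each, so generation speeds up without changing the target model's output distribution.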
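
To make the propose-and-verify idea concrete, here is a minimal sketch of one speculative step. It is a toy under stated assumptions, not the article's implementation: the function name `speculative_step` and its arguments are hypothetical, and the draft and target distributions are passed in as precomputed lists rather than produced by real models conditioned on earlier tokens.

```python
import random


def speculative_step(draft_probs, target_probs, rng=random):
    """One toy speculative sampling step (hypothetical helper).

    draft_probs:  K probability lists, the draft distribution q per position.
    target_probs: K + 1 probability lists, the target distribution p at the
                  same positions plus one extra position.
    Assumption: distributions are fixed inputs; a real decoder would
    recompute them from the tokens generated so far.
    Returns the tokens emitted in this step (between 1 and K + 1 of them).
    """
    K = len(draft_probs)
    vocab = range(len(draft_probs[0]))
    tokens = []
    for i in range(K):
        q, p = draft_probs[i], target_probs[i]
        # The draft model proposes a token x ~ q.
        x = rng.choices(vocab, weights=q)[0]
        # The target accepts x with probability min(1, p(x) / q(x)).
        if rng.random() < min(1.0, p[x] / q[x]):
            tokens.append(x)
            continue
        # On rejection, resample from the residual max(p - q, 0)
        # (random.choices normalizes the weights) and stop this step.
        residual = [max(pv - qv, 0.0) for pv, qv in zip(p, q)]
        tokens.append(rng.choices(vocab, weights=residual)[0])
        return tokens
    # Every draft token was accepted: take one bonus token from the target.
    tokens.append(rng.choices(vocab, weights=target_probs[K])[0])
    return tokens
```

When the draft and target agree at every position, each step emits K + 1 tokens for a single target pass; when they disagree on the first token, the step emits just one token, the same as ordinary autoregressive sampling.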

#time complexity #Language Models #Speculative Sampling