LLM Engineer Interview Questions: Inference Optimization, KV Cache, Speculative Decoding, Quantization

What is the difference between prefill and decode in LLM inference?

Prefill is the initial forward pass that processes all input tokens in parallel and builds the KV cache. Decode is the subsequent autoregressive generation, producing one token per forward pass. Prefill is compute-bound and benefits from parallelism, while decode is memory-bandwidth-bound and benefits from KV cache optimization. Production systems therefore schedule and optimize the two phases separately, for example via chunked prefill or disaggregated prefill/decode serving.
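The two phases can be sketched with a toy single-head attention layer (random weights, NumPy only; the names `prefill` and `decode_step` are illustrative, not a real framework API). Prefill projects keys and values for every prompt token in one parallel matrix multiply; each decode step computes K/V only for the newest token and appends them to the cache instead of recomputing history:

```python
import numpy as np

rng = np.random.default_rng(0)
D = 8  # toy model/head dimension

# Random projection weights for one attention head (illustration only)
Wq, Wk, Wv = (rng.standard_normal((D, D)) for _ in range(3))

def attend(q, K, V):
    """Scaled dot-product attention of one query against all cached keys/values."""
    scores = (K @ q) / np.sqrt(D)      # (T,)
    w = np.exp(scores - scores.max())  # softmax, shifted for stability
    w /= w.sum()
    return w @ V                       # (D,)

def prefill(prompt_embs):
    """Process the whole prompt in one pass and build the KV cache."""
    K = prompt_embs @ Wk               # keys for all tokens at once (parallel)
    V = prompt_embs @ Wv
    out = attend(prompt_embs[-1] @ Wq, K, V)  # output at the last position
    return out, (K, V)

def decode_step(new_emb, cache):
    """One autoregressive step: project only the new token, reuse the cache."""
    K, V = cache
    K = np.vstack([K, new_emb @ Wk])   # append; history is never recomputed
    V = np.vstack([V, new_emb @ Wv])
    out = attend(new_emb @ Wq, K, V)
    return out, (K, V)

prompt = rng.standard_normal((5, D))   # 5 prompt tokens
out, cache = prefill(prompt)           # one parallel pass
for _ in range(3):                     # 3 decode steps, one token each
    out, cache = decode_step(out, cache)

print(cache[0].shape)                  # cache grew from (5, D) to (8, D)
```

This also shows why decode is memory-bandwidth-bound: each step streams the entire growing K/V cache through `attend` to produce a single token, while prefill amortizes its weight reads over the whole prompt in one matmul.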
docs.vllm.ai