LLM Engineer Interview Questions: Inference Optimization, KV Cache, Speculative Decoding, Quantization

What is the difference between prefill and decode in LLM inference?

Prefill is the initial forward pass that processes all input tokens in parallel and builds the KV cache. Decode is the subsequent autoregressive generation, producing one token per forward pass. Prefill is compute-bound and benefits from parallelism, while decode is memory-bandwidth-bound and benefits from KV cache optimization. Production systems therefore schedule and optimize the two phases separately, for example via chunked prefill or disaggregated prefill/decode serving.
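The two phases can be sketched with a toy single-head attention layer (random weights, NumPy only; the names `prefill` and `decode_step` are illustrative, not a real framework API). Prefill projects keys and values for every prompt token in one parallel matrix multiply; each decode step computes K/V only for the newest token and appends them to the cache instead of recomputing history:

```python
import numpy as np

rng = np.random.default_rng(0)
D = 8  # toy model/head dimension

# Random projection weights for one attention head (illustration only)
Wq, Wk, Wv = (rng.standard_normal((D, D)) for _ in range(3))

def attend(q, K, V):
    """Scaled dot-product attention of one query against all cached keys/values."""
    scores = (K @ q) / np.sqrt(D)      # (T,)
    w = np.exp(scores - scores.max())  # softmax, shifted for stability
    w /= w.sum()
    return w @ V                       # (D,)

def prefill(prompt_embs):
    """Process the whole prompt in one pass and build the KV cache."""
    K = prompt_embs @ Wk               # keys for all tokens at once (parallel)
    V = prompt_embs @ Wv
    out = attend(prompt_embs[-1] @ Wq, K, V)  # output at the last position
    return out, (K, V)

def decode_step(new_emb, cache):
    """One autoregressive step: project only the new token, reuse the cache."""
    K, V = cache
    K = np.vstack([K, new_emb @ Wk])   # append; history is never recomputed
    V = np.vstack([V, new_emb @ Wv])
    out = attend(new_emb @ Wq, K, V)
    return out, (K, V)

prompt = rng.standard_normal((5, D))   # 5 prompt tokens
out, cache = prefill(prompt)           # one parallel pass
for _ in range(3):                     # 3 decode steps, one token each
    out, cache = decode_step(out, cache)

print(cache[0].shape)                  # cache grew from (5, D) to (8, D)
```

This also shows why decode is memory-bandwidth-bound: each step streams the entire growing K/V cache through `attend` to produce a single token, while prefill amortizes its weight reads over the whole prompt in one matmul.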
docs.vllm.ai