Question

What is the KV cache and why is it important?

Accepted Answer

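The KV cache stores the key and value tensors that each attention layer has already computed for previous tokens during autoregressive generation. Without it, the model would recompute the keys and values of every previous token at each decoding step, so the attention cost of a single step grows quadratically with the sequence length. With KV caching, each step computes the query, key, and value for the new token only and attends over the cached entries, so the attention cost per new token is approximately linear in the number of tokens generated so far. That difference is essential for production inference performance.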
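
To make the difference concrete, here is a minimal sketch of a single attention head in NumPy. It is not how any particular library implements caching: the names `KVCache` and `step_without_cache`, the random weights, and the random "embeddings" are all assumptions made up for illustration. It only shows that both approaches produce the same output while the cached version projects one token per step.

```python
import numpy as np

def softmax(x):
    # Numerically stable softmax over the last axis.
    x = x - x.max(axis=-1, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=-1, keepdims=True)

def attend(q, K, V):
    # q: (d,), K and V: (t, d). Scaled dot-product attention for one query.
    scores = K @ q / np.sqrt(q.shape[-1])
    return softmax(scores) @ V

rng = np.random.default_rng(0)
d = 8
# Hypothetical projection weights for a single head.
Wq, Wk, Wv = (rng.standard_normal((d, d)) for _ in range(3))
# Stand-in for the embeddings of 5 tokens.
tokens = rng.standard_normal((5, d))

def step_without_cache(xs):
    # Recompute K and V for every token seen so far (t projections per step).
    K = xs @ Wk
    V = xs @ Wv
    q = xs[-1] @ Wq
    return attend(q, K, V)

class KVCache:
    """Illustrative cache: keeps the keys and values of all earlier tokens."""

    def __init__(self):
        self.keys = []
        self.values = []

    def step(self, x):
        # Project only the new token (one projection per step), then reuse the rest.
        self.keys.append(x @ Wk)
        self.values.append(x @ Wv)
        q = x @ Wq
        return attend(q, np.stack(self.keys), np.stack(self.values))

cache = KVCache()
outputs_cached = [cache.step(x) for x in tokens]
outputs_naive = [step_without_cache(tokens[: t + 1]) for t in range(len(tokens))]
```

Comparing `outputs_cached` and `outputs_naive` with `np.allclose` should confirm that the two paths agree. The cached path still attends over all earlier tokens at every step, which is why its per-token cost stays linear in the sequence length instead of becoming constant.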