What is the KV cache and why is it important?
LLM Engineer Interview Questions: Transformer Architecture, Self-Attention, and Modern LLM Foundations
Audio flashcard · Nortren
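The KV cache stores the key and value tensors computed for previous tokens during autoregressive generation. Without it, every decoding step reruns the model over the entire prefix and recomputes the keys and values of all earlier tokens just to produce one new token. The per-token cost therefore grows with prefix length, and the redundant key/value recomputation alone grows quadratically with sequence length. With KV caching, each step computes keys and values only for the newest token, appends them to the cache, and attends over the cached entries, so the cost per new token grows only linearly with context length. This is essential for production inference performance.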
huggingface.co
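A minimal pure-Python sketch of the idea follows. It assumes a single attention head in a single layer, random toy weights (`Wq`, `Wk`, `Wv`), and a fixed list of input vectors standing in for token embeddings rather than sampled tokens. All function names are hypothetical and not from any library. Both decoders produce identical outputs. The uncached one re-projects K and V for the whole prefix at every step, while the cached one projects only the newest token and reuses the rest.

```python
import math
import random


def matvec(W, x):
    """Multiply matrix W (list of rows) by vector x."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attend(q, keys, values):
    """Scaled dot-product attention of one query over all given keys/values."""
    scale = math.sqrt(len(q))
    scores = [sum(qi * ki for qi, ki in zip(q, k)) / scale for k in keys]
    weights = softmax(scores)
    return [sum(w * v[j] for w, v in zip(weights, values)) for j in range(len(values[0]))]


def decode_without_cache(xs, Wq, Wk, Wv):
    """Hypothetical baseline: re-project K and V for the whole prefix at every step."""
    outputs, kv_projections = [], 0
    for t in range(1, len(xs) + 1):
        prefix = xs[:t]
        keys = [matvec(Wk, x) for x in prefix]    # recomputed every step
        values = [matvec(Wv, x) for x in prefix]  # recomputed every step
        kv_projections += 2 * t
        q = matvec(Wq, xs[t - 1])  # only the newest position's output is needed
        outputs.append(attend(q, keys, values))
    return outputs, kv_projections


def decode_with_cache(xs, Wq, Wk, Wv):
    """Hypothetical cached decoder: project only the newest token, reuse cached K/V."""
    k_cache, v_cache = [], []
    outputs, kv_projections = [], 0
    for x in xs:
        k_cache.append(matvec(Wk, x))  # one new key per step
        v_cache.append(matvec(Wv, x))  # one new value per step
        kv_projections += 2
        q = matvec(Wq, x)
        outputs.append(attend(q, k_cache, v_cache))  # still scans every cached key
    return outputs, kv_projections


def rand_matrix(rows, cols):
    """Toy random weights; a real model uses learned multi-head, multi-layer weights."""
    return [[random.uniform(-1.0, 1.0) for _ in range(cols)] for _ in range(rows)]


random.seed(0)
d, n = 4, 8
Wq, Wk, Wv = rand_matrix(d, d), rand_matrix(d, d), rand_matrix(d, d)
xs = rand_matrix(n, d)  # n toy "token embeddings" of width d

out_a, cost_a = decode_without_cache(xs, Wq, Wk, Wv)
out_b, cost_b = decode_with_cache(xs, Wq, Wk, Wv)
print(cost_a, cost_b)  # prints: 72 16
```

Increasing `n` shows the gap. The uncached K/V projection count is n(n+1), quadratic in sequence length, while the cached count is 2n. The attention step still scans every cached key, which is why the per-token cost with a cache stays linear in context length rather than constant.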