LLM Engineer Interview Questions: Transformer Architecture, Self-Attention, and Modern LLM Foundations
What is the difference between Multi-Head Attention, Grouped-Query Attention, and Multi-Query Attention?
Multi-Head Attention (MHA) gives each head its own query, key, and value projections. Multi-Query Attention (MQA) shares a single set of keys and values across all query heads, shrinking the KV cache and cutting memory bandwidth. Grouped-Query Attention (GQA) is the middle ground: query heads are divided into groups, and each group shares one key/value head. GQA is now standard in Llama 3, Mistral, and most modern LLMs because it dramatically reduces KV-cache memory during inference with little to no quality loss.
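The trade-off can be sketched in a few lines of NumPy. This is an illustrative toy (random tensors, arbitrary head counts and dimensions, no projections or masking, not a real model): the one knob, `num_kv_heads`, moves you from MHA (`num_kv_heads == num_q_heads`) through GQA to MQA (`num_kv_heads == 1`), and the KV-cache size scales with it.

```python
import numpy as np

def grouped_attention(num_q_heads, num_kv_heads, seq_len=8, head_dim=16):
    """Toy single-layer attention showing how Q heads map onto shared K/V heads.

    MHA: num_kv_heads == num_q_heads
    GQA: 1 < num_kv_heads < num_q_heads
    MQA: num_kv_heads == 1
    """
    assert num_q_heads % num_kv_heads == 0
    group_size = num_q_heads // num_kv_heads

    rng = np.random.default_rng(0)
    q = rng.standard_normal((num_q_heads, seq_len, head_dim))
    k = rng.standard_normal((num_kv_heads, seq_len, head_dim))
    v = rng.standard_normal((num_kv_heads, seq_len, head_dim))

    # Each group of `group_size` query heads reads the same K/V head,
    # so only num_kv_heads K/V tensors ever need to be cached.
    k_exp = np.repeat(k, group_size, axis=0)
    v_exp = np.repeat(v, group_size, axis=0)

    scores = q @ k_exp.transpose(0, 2, 1) / np.sqrt(head_dim)
    weights = np.exp(scores - scores.max(-1, keepdims=True))
    weights /= weights.sum(-1, keepdims=True)
    out = weights @ v_exp

    # KV-cache elements per layer: 2 tensors (K and V) of shape
    # (num_kv_heads, seq_len, head_dim).
    kv_cache_elems = 2 * num_kv_heads * seq_len * head_dim
    return out.shape, kv_cache_elems
```

With 8 query heads, the output shape is identical in all three regimes, but the per-layer KV cache shrinks 4x going from MHA to 2-group GQA and 8x going to MQA, which is exactly the bandwidth saving the answer above describes.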