LLM Engineer Interview Questions: Fine-Tuning, LoRA, QLoRA, PEFT, and Instruction Tuning
How does LoRA reduce memory requirements?
LoRA reduces memory by freezing the base model, so gradients and optimizer states are never allocated for its weights, and training only a pair of small low-rank adapter matrices injected alongside selected layers. For a 7-billion-parameter model, full fine-tuning with an Adam-style optimizer requires roughly 80 gigabytes of GPU memory, while LoRA can fit on a 16-gigabyte consumer GPU. The savings come mainly from not storing gradients and the two per-parameter Adam moment tensors for the frozen weights, which together dwarf the weights themselves.
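The arithmetic behind those figures can be sketched as follows. This is an illustrative estimate, not a profiler reading: it assumes fp16 weights and gradients, fp32 Adam moments (8 bytes per trainable parameter), and a LoRA setup where roughly 0.5% of parameters are trainable; activation memory is ignored.

```python
# Rough GPU-memory arithmetic for fine-tuning a 7B-parameter model.
# Assumptions (illustrative): fp16 weights (2 B), fp16 gradients (2 B),
# two fp32 Adam moment tensors (8 B) per *trainable* parameter.

def full_finetune_gb(n_params: float) -> float:
    weights = n_params * 2    # fp16 model weights
    grads = n_params * 2      # fp16 gradients for every weight
    adam = n_params * 8       # fp32 first + second Adam moments
    return (weights + grads + adam) / 1e9

def lora_finetune_gb(n_params: float, trainable_frac: float = 0.005) -> float:
    weights = n_params * 2            # frozen fp16 base weights (no grads/optim)
    t = n_params * trainable_frac     # small low-rank adapter parameters
    adapters = t * 2                  # fp16 adapter weights
    grads = t * 2                     # gradients only for adapters
    adam = t * 8                      # Adam moments only for adapters
    return (weights + adapters + grads + adam) / 1e9

print(f"full fine-tune: ~{full_finetune_gb(7e9):.0f} GB")  # ~84 GB
print(f"LoRA fine-tune: ~{lora_finetune_gb(7e9):.0f} GB")  # ~14 GB
```

The frozen base weights still occupy ~14 GB here, which is why pushing below that (e.g. onto smaller GPUs) requires quantizing the base model as well, as QLoRA does.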