LLM Engineer Interview Questions: Fine-Tuning, LoRA, QLoRA, PEFT, and Instruction Tuning

How does LoRA reduce memory requirements?


LoRA reduces memory by freezing the base model, so no gradients or optimizer states are stored for its weights, and training only small low-rank adapter matrices injected into selected layers. For a 7-billion-parameter model, full fine-tuning with an Adam-style optimizer requires on the order of 80 gigabytes of GPU memory, while LoRA fine-tuning can fit on a 16-gigabyte consumer GPU. The savings come mainly from not keeping gradients and optimizer states for the frozen weights, since the adapters add well under one percent as many trainable parameters.
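The parameter arithmetic behind this can be sketched in a few lines. The snippet below is a minimal illustration (plain NumPy, not the `peft` library API): the base weight `W` is frozen, only the low-rank factors `A` and `B` are trainable, and initialising `B` to zero makes the adapter an exact no-op at the start of training. The dimensions, rank, and `alpha` value are illustrative choices, not values from the text.

```python
import numpy as np

# Illustrative LoRA layer: frozen base weight plus trainable low-rank update.
d_in, d_out, r = 4096, 4096, 8   # assumed layer size and rank
alpha = 16                        # assumed LoRA scaling hyperparameter
rng = np.random.default_rng(0)

W = rng.standard_normal((d_out, d_in))     # frozen: no grads/optimizer states
A = rng.standard_normal((r, d_in)) * 0.01  # trainable low-rank factor
B = np.zeros((d_out, r))                   # trainable, zero-init => no-op start

def lora_forward(x):
    # Base path plus low-rank update (B @ A), scaled by alpha / r.
    return x @ W.T + (x @ A.T @ B.T) * (alpha / r)

x = rng.standard_normal((1, d_in))
# With B initialised to zero, the adapted layer matches the base layer exactly.
assert np.allclose(lora_forward(x), x @ W.T)

full_params = d_in * d_out        # parameters updated in full fine-tuning
lora_params = r * (d_in + d_out)  # parameters updated by the adapter
print(f"trainable: {lora_params} vs {full_params} "
      f"({100 * lora_params / full_params:.2f}%)")
```

At rank 8 on a 4096x4096 layer, the adapter trains 65,536 parameters instead of roughly 16.8 million, about 0.39 percent, which is why the gradient and optimizer-state memory collapses even though the frozen weights themselves still occupy GPU memory.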