LLM Engineer Interview Questions: Fine-Tuning, LoRA, QLoRA, PEFT, and Instruction Tuning

How does LoRA reduce memory requirements?


LoRA reduces memory by freezing the base model, so no gradients or optimizer states are stored for its weights, and training only small low-rank adapter matrices injected into selected layers. For a 7-billion-parameter model, full fine-tuning with an Adam-style optimizer requires on the order of 80 gigabytes of GPU memory, while LoRA fine-tuning can fit on a 16-gigabyte consumer GPU. The savings come mainly from not keeping gradients and optimizer states for the frozen weights, since the adapters add well under one percent as many trainable parameters.
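The parameter arithmetic behind this can be sketched in a few lines. The snippet below is a minimal illustration (plain NumPy, not the `peft` library API): the base weight `W` is frozen, only the low-rank factors `A` and `B` are trainable, and initialising `B` to zero makes the adapter an exact no-op at the start of training. The dimensions, rank, and `alpha` value are illustrative choices, not values from the text.

```python
import numpy as np

# Illustrative LoRA layer: frozen base weight plus trainable low-rank update.
d_in, d_out, r = 4096, 4096, 8   # assumed layer size and rank
alpha = 16                        # assumed LoRA scaling hyperparameter
rng = np.random.default_rng(0)

W = rng.standard_normal((d_out, d_in))     # frozen: no grads/optimizer states
A = rng.standard_normal((r, d_in)) * 0.01  # trainable low-rank factor
B = np.zeros((d_out, r))                   # trainable, zero-init => no-op start

def lora_forward(x):
    # Base path plus low-rank update (B @ A), scaled by alpha / r.
    return x @ W.T + (x @ A.T @ B.T) * (alpha / r)

x = rng.standard_normal((1, d_in))
# With B initialised to zero, the adapted layer matches the base layer exactly.
assert np.allclose(lora_forward(x), x @ W.T)

full_params = d_in * d_out        # parameters updated in full fine-tuning
lora_params = r * (d_in + d_out)  # parameters updated by the adapter
print(f"trainable: {lora_params} vs {full_params} "
      f"({100 * lora_params / full_params:.2f}%)")
```

At rank 8 on a 4096x4096 layer, the adapter trains 65,536 parameters instead of roughly 16.8 million, about 0.39 percent, which is why the gradient and optimizer-state memory collapses even though the frozen weights themselves still occupy GPU memory.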