Question

What is QLoRA?

Accepted Answer

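QLoRA (Quantized LoRA) combines quantization with LoRA. The pretrained base model is frozen and stored in 4-bit precision instead of 16-bit, which cuts the memory needed for its weights to roughly a quarter, while small LoRA adapter matrices are trained in higher precision (typically 16-bit). During the forward and backward passes the 4-bit weights are dequantized on the fly to the compute dtype, and gradients pass through them only to update the adapters. This makes it possible to fine-tune models with tens of billions of parameters on a single GPU, in the smaller cases a consumer one; the original paper reports fine-tuning a 65B-parameter model on a single 48 GB GPU. QLoRA introduced NF4 (4-bit NormalFloat), a data type designed for normally distributed weights that preserves quality at 4 bits, and double quantization, which quantizes the quantization constants themselves to save additional memory, alongside paged optimizers to absorb memory spikes.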
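
To make the memory argument concrete, here is a rough back-of-the-envelope sketch in Python. The function names are hypothetical, and the 33B model size is only an illustrative input. The block sizes (64 weights per block, 256 constants per second-level block) and the FP32/8-bit constant widths are the values described in the QLoRA paper. The estimate covers the base weights only; it ignores activations, adapter weights, optimizer state, and framework overhead, so real usage will be higher.

```python
def quant_overhead_bits(block_size: int = 64,
                        double_quant: bool = True,
                        dq_block_size: int = 256) -> float:
    """Extra bits per parameter spent on quantization constants.

    Assumes blockwise quantization with one constant per block of weights.
    """
    if not double_quant:
        # One FP32 constant per block of `block_size` weights.
        return 32 / block_size
    # With double quantization: one 8-bit constant per block, plus one
    # FP32 constant shared by every `dq_block_size` first-level constants.
    return 8 / block_size + 32 / (block_size * dq_block_size)


def weight_memory_gib(n_params: float, bits_per_param: float) -> float:
    """Memory for the weights alone, in GiB."""
    return n_params * bits_per_param / 8 / 2**30


def lora_params(d_in: int, d_out: int, rank: int) -> int:
    """Trainable parameters in one LoRA adapter (A: rank x d_in, B: d_out x rank)."""
    return rank * (d_in + d_out)


n = 33e9  # illustrative model size, not a measurement
print(f"16-bit weights:       {weight_memory_gib(n, 16):.1f} GiB")
print(f"4-bit, no double q.:  {weight_memory_gib(n, 4 + quant_overhead_bits(double_quant=False)):.1f} GiB")
print(f"4-bit, double quant.: {weight_memory_gib(n, 4 + quant_overhead_bits()):.1f} GiB")
print(f"LoRA params for one 4096x4096 layer at rank 16: {lora_params(4096, 4096, 16)}")
```

The point of the sketch is the ratio rather than the exact figures: the frozen weights shrink by about 4x, double quantization trims the per-parameter overhead of the constants from 0.5 bits to about 0.127 bits, and the trainable adapter parameters are tiny compared with the layer they attach to.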