What is the difference between weight-only and weight-and-activation quantization?
LLM Engineer Interview Questions: Inference Optimization, KV Cache, Speculative Decoding, Quantization
Audio flashcard · 0:19 · Nortren
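Weight-only quantization compresses only the model weights (for example to INT8 or INT4) and keeps activations in higher precision such as FP16 or BF16. The weights are dequantized on the fly, so the matrix multiplications still run in floating point. The gain comes mainly from a smaller memory footprint and lower memory bandwidth, which speeds up memory-bound workloads such as small-batch decoding. It is simpler to apply and usually preserves quality better. Weight-and-activation quantization (for example W8A8) compresses both, so the matrix multiplications themselves can run in low-precision integer or FP8 arithmetic. This enables larger speedups in compute-bound workloads such as prefill or large-batch serving. However, LLM activations contain large outliers that are hard to represent in low precision, so this approach requires careful calibration to avoid quality loss. Most production deployments start with weight-only quantization and add activation quantization if needed.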
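To make the difference concrete, here is a minimal pure-Python sketch of both schemes on one matrix-vector product. It rests on a few simplifying assumptions that are not from the original card. Quantization is symmetric INT8 with one scale per weight row. The activation scale is a single per-tensor value computed from the input itself; many deployments instead fix it ahead of time from calibration data. The helper names (`quantize`, `matvec_weight_only`, `matvec_w8a8`) are hypothetical. Real kernels also store the integer codes in packed form rather than Python lists.

```python
# Sketch only: symmetric INT8, per-row weight scales, one per-tensor activation scale
# derived from the input (an assumption; real systems often use calibrated scales).

QMAX = 127  # largest magnitude in the symmetric signed INT8 range


def quantize(values):
    """Map floats to integer codes in [-QMAX, QMAX]; return (codes, scale)."""
    max_abs = max(abs(v) for v in values)
    scale = max_abs / QMAX if max_abs > 0 else 1.0
    codes = [max(-QMAX, min(QMAX, round(v / scale))) for v in values]
    return codes, scale


def matvec_fp(weights, x):
    """Full-precision reference: y = W @ x."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in weights]


def matvec_weight_only(weights, x):
    """Weights kept as INT8 codes + scale, dequantized before a float matmul."""
    out = []
    for row in weights:
        codes, scale = quantize(row)          # what would be stored in memory
        dequant = [c * scale for c in codes]  # back to float at compute time
        out.append(sum(w * xi for w, xi in zip(dequant, x)))  # x stays float
    return out


def matvec_w8a8(weights, x):
    """Weights and activations both quantized; integer multiply-accumulate."""
    x_codes, x_scale = quantize(x)
    out = []
    for row in weights:
        w_codes, w_scale = quantize(row)
        acc = sum(a * b for a, b in zip(w_codes, x_codes))  # integers only
        out.append(acc * w_scale * x_scale)                 # one rescale to float
    return out


W = [[1.27, -0.5, 0.25, 0.1],
     [0.3, 0.6, -1.27, 0.2]]
x = [0.55, -0.33, 0.13, 25.4]  # the last activation is an outlier

ref = matvec_fp(W, x)
err_wo = max(abs(a - b) for a, b in zip(matvec_weight_only(W, x), ref))
err_w8a8 = max(abs(a - b) for a, b in zip(matvec_w8a8(W, x), ref))
print(f"weight-only max error: {err_wo:.4f}")
print(f"W8A8 max error:        {err_w8a8:.4f}")
```

The weights here were chosen to sit almost exactly on the INT8 grid, which isolates the activation effect. As a result, the weight-only error is essentially zero, while the W8A8 error is about 0.12. The single outlier 25.4 sets the activation scale to 0.2, so small activations such as 0.13 get rounded to 0.2. Calibration techniques for activation quantization exist largely to limit this kind of loss.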
huggingface.co