Question

What is the difference between weight-only and weight-and-activation quantization?

Accepted Answer

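Weight-only quantization stores only the model weights in low precision (for example INT8 or INT4) and keeps activations in higher precision such as FP16. The weights are typically dequantized before or during the matrix multiply, so the main benefit is a smaller memory footprint and less memory traffic rather than faster arithmetic. It is simpler because the weights are fixed after training and can be quantized offline, and it usually preserves quality better. Weight-and-activation quantization (for example W8A8) quantizes both operands, so the matrix multiply itself can run in low-precision integer arithmetic. That enables more aggressive speedups, but activation ranges depend on the input, so the scales need careful calibration on representative data (or must be computed at runtime) to avoid quality loss. Most production deployments start with weight-only quantization and add activation quantization only if they need the extra speed.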
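To make the difference concrete, here is a minimal NumPy sketch of a single linear layer under both schemes. It assumes per-tensor symmetric INT8 quantization and random data standing in for real weights and calibration inputs; the function names (`quantize_symmetric`, `linear_weight_only`, `linear_weight_and_activation`) are made up for illustration and are not taken from any particular library.

```python
import numpy as np

def quantize_symmetric(x, num_bits=8):
    """Per-tensor symmetric quantization: returns int8 values and a float scale."""
    qmax = 2 ** (num_bits - 1) - 1  # 127 for 8 bits
    max_abs = float(np.max(np.abs(x)))
    scale = max_abs / qmax if max_abs > 0 else 1.0
    q = np.clip(np.round(x / scale), -qmax, qmax).astype(np.int8)
    return q, scale

def linear_weight_only(x, w_q, w_scale):
    # Weights are stored as int8 but dequantized to float before the matmul;
    # activations stay in float the whole time.
    w_deq = w_q.astype(np.float32) * w_scale
    return x @ w_deq

def linear_weight_and_activation(x, w_q, w_scale, x_scale, qmax=127):
    # Activations are quantized with a scale chosen ahead of time (calibration),
    # the matmul runs on integers with an int32 accumulator, and the result
    # is rescaled back to float once at the end.
    x_q = np.clip(np.round(x / x_scale), -qmax, qmax).astype(np.int8)
    acc = x_q.astype(np.int32) @ w_q.astype(np.int32)
    return acc.astype(np.float32) * (x_scale * w_scale)

# Random stand-ins for a real layer and real data (assumption for the demo).
rng = np.random.default_rng(0)
w = rng.standard_normal((64, 32)).astype(np.float32)       # layer weights
x = rng.standard_normal((4, 64)).astype(np.float32)        # inference batch
calib = rng.standard_normal((256, 64)).astype(np.float32)  # calibration batch

w_q, w_scale = quantize_symmetric(w)    # done once, offline
_, x_scale = quantize_symmetric(calib)  # activation scale from calibration data

ref = x @ w
y_wo = linear_weight_only(x, w_q, w_scale)
y_wa = linear_weight_and_activation(x, w_q, w_scale, x_scale)

err_wo = float(np.max(np.abs(y_wo - ref)))
err_wa = float(np.max(np.abs(y_wa - ref)))
print(f"weight-only max abs error:           {err_wo:.4f}")
print(f"weight-and-activation max abs error: {err_wa:.4f}")
```

Note that the weight-only path needs no information about the inputs, while the weight-and-activation path needs `x_scale`, which here comes from a calibration batch. If real activations have outliers that the calibration data did not cover, they get clipped, which is one common source of the quality loss mentioned above.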