Question

What is scalar quantization and how does it differ from product quantization?

Accepted Answer

Scalar quantization compresses each dimension of a vector independently, typically mapping each 32-bit float to an 8-bit integer, which gives a 4x memory reduction with minimal recall loss. Product quantization splits the vector into groups of dimensions (subvectors) and encodes each group jointly as an index into a learned codebook, typically achieving 16x to 64x compression but with higher recall loss and more complex training. Scalar quantization is simpler, needs no codebook training (at most a value range estimated from the data), and is a safe default in Qdrant and Milvus. Use product quantization only when memory pressure is extreme and some recall sacrifice is acceptable.
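To make the numbers concrete, here is a minimal NumPy sketch of the scalar case, plus the size arithmetic for the product case. It is an illustration under stated assumptions, not how Qdrant or Milvus implement it: the per-dimension min/max calibration, the random test data, and the helper names (`scalar_quantize`, `scalar_dequantize`, `pq_compression_ratio`) are all hypothetical choices made for this example.

```python
import numpy as np


def scalar_quantize(vectors):
    """Map each float32 dimension to uint8 using a per-dimension min/max range.

    Assumption: a simple min/max calibration. Real engines may use
    quantiles or a single global range instead.
    """
    lo = vectors.min(axis=0)
    hi = vectors.max(axis=0)
    # Use a scale of 1.0 for constant dimensions to avoid dividing by zero.
    scale = np.where(hi > lo, (hi - lo) / 255.0, 1.0).astype(np.float32)
    codes = np.clip(np.round((vectors - lo) / scale), 0, 255).astype(np.uint8)
    return codes, lo, scale


def scalar_dequantize(codes, lo, scale):
    """Reconstruct approximate float32 vectors from the uint8 codes."""
    return codes.astype(np.float32) * scale + lo


def pq_compression_ratio(dim, num_subvectors, bits_per_code=8):
    """Compression ratio of product quantization versus float32 storage.

    Assumption: one code of `bits_per_code` bits per subvector. The shared
    codebooks are ignored because their size does not grow with the
    number of stored vectors.
    """
    original_bytes = dim * 4
    code_bytes = num_subvectors * bits_per_code / 8
    return original_bytes / code_bytes


rng = np.random.default_rng(0)
vectors = rng.standard_normal((1000, 128)).astype(np.float32)

codes, lo, scale = scalar_quantize(vectors)
approx = scalar_dequantize(codes, lo, scale)

sq_ratio = vectors.nbytes / codes.nbytes   # 4.0: four bytes become one
pq_ratio_32 = pq_compression_ratio(128, 32)  # 16.0: 512 bytes become 32
pq_ratio_8 = pq_compression_ratio(128, 8)    # 64.0: 512 bytes become 8

print(sq_ratio, pq_ratio_32, pq_ratio_8)
```

With scalar quantization the reconstruction error in each dimension is bounded by roughly half a quantization step, which is why recall usually holds up well. Product quantization gets its larger ratios by replacing a whole subvector with a single codebook index, so its error depends on how well the learned centroids fit the data.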