Question

What is product quantization (PQ) in vector search?

Accepted Answer

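Product quantization (PQ) compresses vectors by splitting each one into M equal-length subvectors and replacing each subvector with the index of its nearest centroid in a codebook learned for that subspace, usually by k-means. With the typical 256 centroids per codebook, each index takes 8 bits, so a whole vector is stored in M bytes. For example, a 1024-dimensional float32 vector takes 4096 bytes; with M = 64 it shrinks to 64 bytes, a 64x reduction. At query time, the distances from each query subvector to every centroid in the matching codebook are precomputed into an M x 256 lookup table, so the approximate distance to any stored vector is the sum of M table lookups, with no decompression needed. PQ trades some recall for large memory savings: at 64 bytes per vector, a billion vectors need about 64 GB of codes, which can fit in a single machine's RAM.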
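To make the mechanics concrete, here is a minimal NumPy sketch of the three steps: learning the codebooks, encoding vectors, and computing distances through a lookup table. It is an illustration under stated assumptions, not how any particular library implements PQ: the function names (`train_codebooks`, `encode`, `adc_distances`) are made up for this answer, the k-means is a bare-bones Lloyd's loop, and the demo uses small sizes (32 dimensions, M = 8, 16 centroids) so that it runs quickly.

```python
import numpy as np

def train_codebooks(X, M, K=256, iters=10, seed=0):
    """Learn one K-centroid codebook per subspace with plain Lloyd's k-means."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    assert d % M == 0, "dimension must be divisible by M"
    assert K <= 256, "codes are stored as uint8 in this sketch"
    ds = d // M  # dimensions per subvector
    codebooks = np.empty((M, K, ds), dtype=np.float32)
    for m in range(M):
        sub = X[:, m * ds:(m + 1) * ds]
        # Initialise centroids from K distinct training points.
        cent = sub[rng.choice(n, K, replace=False)].copy()
        for _ in range(iters):
            d2 = ((sub[:, None, :] - cent[None, :, :]) ** 2).sum(-1)  # (n, K)
            assign = d2.argmin(1)
            for k in range(K):
                pts = sub[assign == k]
                if len(pts):  # leave empty clusters where they are
                    cent[k] = pts.mean(0)
        codebooks[m] = cent
    return codebooks

def encode(X, codebooks):
    """Replace each subvector with the index of its nearest centroid."""
    M, K, ds = codebooks.shape
    codes = np.empty((X.shape[0], M), dtype=np.uint8)
    for m in range(M):
        sub = X[:, m * ds:(m + 1) * ds]
        d2 = ((sub[:, None, :] - codebooks[m][None, :, :]) ** 2).sum(-1)
        codes[:, m] = d2.argmin(1)
    return codes

def adc_distances(q, codes, codebooks):
    """Approximate squared L2 distances from an uncompressed query to all codes."""
    M, K, ds = codebooks.shape
    # table[m, k] = squared distance from q's m-th subvector to centroid k
    table = ((q.reshape(M, 1, ds) - codebooks) ** 2).sum(-1)  # (M, K)
    # For each stored vector, sum the M table entries selected by its codes.
    return table[np.arange(M), codes].sum(1)

# Small demo on random data (sizes chosen only to keep the run fast).
rng = np.random.default_rng(0)
X = rng.standard_normal((2000, 32)).astype(np.float32)
codebooks = train_codebooks(X, M=8, K=16)
codes = encode(X, codebooks)          # 8 bytes per vector instead of 128
q = rng.standard_normal(32).astype(np.float32)
nearest = np.argsort(adc_distances(q, codes, codebooks))[:5]
print("approximate top-5 ids:", nearest)
```

The query stays uncompressed and only the database vectors are quantized, which is the asymmetric variant of the distance computation. The table costs M x K subvector distances per query, after which each candidate costs M lookups and additions regardless of the original dimensionality.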