RAG & Vector DB Interview: pgvector HNSW, IVFFlat, Index Tuning, Postgres RAG

This section covers pgvector in depth: the HNSW and IVFFlat index types, parameter tuning, distance operators, hybrid search, and scaling RAG on Postgres. Understanding these trade-offs is crucial for deciding when Postgres is enough and when a dedicated vector database is warranted.

What is pgvector and why is it popular?

0:33
pgvector is an open-source PostgreSQL extension that adds a vector data type and similarity search functions, turning any Postgres database into a vector database. It has exploded in popularity because it lets teams add vector search to existing Postgres deployments without operating a separate database, inheriting Postgres's mature tooling, replication, backups, transactions, and security. pgvector supports HNSW and IVFFlat indexes, cosine, Euclidean, and inner product distances, and integrates with major hosted Postgres services like Supabase, Neon, and AWS RDS.
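
As a minimal sketch, enabling the extension and running a first similarity query looks like this (the documents table and column names are illustrative):

```sql
-- Enable the extension (once per database)
CREATE EXTENSION IF NOT EXISTS vector;

-- A table with a 1536-dimensional embedding column
CREATE TABLE documents (
    id        bigserial PRIMARY KEY,
    content   text,
    embedding vector(1536)
);

-- Nearest neighbors by cosine distance; $1 is the query embedding
SELECT id, content
FROM documents
ORDER BY embedding <=> $1::vector
LIMIT 5;
```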

What is the difference between HNSW and IVFFlat in pgvector?

0:28
HNSW is a graph-based index that delivers higher recall and faster queries than IVFFlat at the cost of higher build time and memory. IVFFlat partitions vectors into lists and searches only the nearest lists, with faster build time but lower recall and slower queries at the same recall target. HNSW is the recommended default since pgvector 0.5.0 for almost all workloads. IVFFlat remains useful when build time matters more than query speed or when memory is extremely constrained, since it uses less memory than HNSW.
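
The two index types are created as follows (table and column names assumed from the examples above; lists = 100 is a typical starting point, not a universal recommendation):

```sql
-- HNSW: higher recall and faster queries, slower build, more memory
CREATE INDEX ON documents USING hnsw (embedding vector_cosine_ops);

-- IVFFlat: faster build, less memory; a common rule of thumb is
-- lists ≈ rows / 1000 for tables up to around a million rows
CREATE INDEX ON documents USING ivfflat (embedding vector_cosine_ops)
    WITH (lists = 100);

-- IVFFlat query-time knob: more probes = higher recall, slower queries
SET ivfflat.probes = 10;
```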

How do you tune HNSW index parameters in pgvector?

0:30
pgvector HNSW has two build parameters: m, the maximum number of connections per node (default 16), and ef_construction, the size of the candidate queue during build (default 64). At query time, hnsw.ef_search controls the search candidate queue (default 40), trading recall for latency. Higher m and ef_construction give better recall at higher build cost; higher ef_search gives better recall at higher query latency. Tune ef_search per session based on recall requirements; typical production values range from 40 for fast responses to 200 for high recall.
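
In SQL, with illustrative parameter values above the defaults:

```sql
-- Build-time parameters: higher m / ef_construction improve recall
-- at the cost of a slower, more memory-hungry build
CREATE INDEX ON documents USING hnsw (embedding vector_cosine_ops)
    WITH (m = 24, ef_construction = 128);

-- Query-time knob, per session (default 40)
SET hnsw.ef_search = 100;

-- Or scoped to a single transaction for a high-recall query
BEGIN;
SET LOCAL hnsw.ef_search = 200;
SELECT id FROM documents ORDER BY embedding <=> $1::vector LIMIT 10;
COMMIT;
```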

How does pgvector handle vector dimensions and types?

0:30
pgvector supports three vector types: vector for dense float32, halfvec for float16 with half the memory, and bit for binary vectors. The dimension is declared in the column type like vector(1536) and is fixed per column. pgvector supports dimensions up to 2000 for HNSW indexes on vector types and up to 4000 for halfvec, with higher limits possible without indexing. Choose halfvec to halve memory when the accuracy loss from float16 is acceptable, and bit for binary-quantized vectors with Hamming distance.
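
The three types as column declarations (sketch; the <~> Hamming-distance operator applies to bit columns):

```sql
CREATE TABLE embeddings (
    id    bigserial PRIMARY KEY,
    dense vector(1536),    -- float32, 4 bytes per dimension
    half  halfvec(1536),   -- float16, 2 bytes per dimension
    bits  bit(1536)        -- 1 bit per dimension, binary-quantized
);

-- Hamming distance over binary vectors
SELECT id FROM embeddings ORDER BY bits <~> $1::bit(1536) LIMIT 10;
```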

What distance operators does pgvector support?

0:30
pgvector provides four distance operators: <-> for Euclidean (L2) distance, <#> for inner product, <=> for cosine distance, and <+> for L1 (Manhattan) distance. Cosine distance equals one minus cosine similarity and is the standard for text embeddings. Inner product is faster when vectors are normalized, but <#> returns the negative inner product so that smaller means more similar, which requires flipping the sign in queries to recover the raw score. The operator must match the index's operator class, configured at index creation.
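
All four operators in query form, continuing the documents table sketch:

```sql
-- <->  Euclidean (L2) distance
SELECT id FROM documents ORDER BY embedding <-> $1::vector LIMIT 5;

-- <#>  negative inner product; flip the sign to get the raw score
SELECT id, -(embedding <#> $1::vector) AS inner_product
FROM documents ORDER BY embedding <#> $1::vector LIMIT 5;

-- <=>  cosine distance; 1 - distance recovers cosine similarity
SELECT id, 1 - (embedding <=> $1::vector) AS cosine_similarity
FROM documents ORDER BY embedding <=> $1::vector LIMIT 5;

-- <+>  L1 (Manhattan) distance
SELECT id FROM documents ORDER BY embedding <+> $1::vector LIMIT 5;
```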

How do you combine pgvector similarity search with SQL filters?

0:28
pgvector queries use standard SQL WHERE clauses alongside similarity operators, letting you combine vector search with any structured filter like tenant ID, date range, or category. The query planner decides whether to use the vector index or apply filters first, which depends on filter selectivity and available indexes. For highly selective filters, a partial HNSW index on the filtered subset can be faster. pgvector 0.8 and later add iterative index scans that handle selective filters more efficiently by continuing the scan until enough rows pass the filter.
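
A sketch of the three techniques, assuming tenant_id and created_at columns on the documents table:

```sql
-- Structured filters plus vector ordering in one query
SELECT id, content
FROM documents
WHERE tenant_id = 42
  AND created_at > now() - interval '30 days'
ORDER BY embedding <=> $1::vector
LIMIT 10;

-- Partial HNSW index for a hot, highly selective subset
CREATE INDEX ON documents USING hnsw (embedding vector_cosine_ops)
    WHERE tenant_id = 42;

-- Iterative index scans (pgvector 0.8+); relaxed_order trades strict
-- distance ordering for better filtered-query performance
SET hnsw.iterative_scan = relaxed_order;
```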

What is the pgvector limit on rows and performance at scale?

0:26
pgvector has no hard row limit, but practical scale depends on available memory and disk. HNSW indexes typically scale to 10 to 100 million vectors per Postgres instance with adequate RAM, beyond which dedicated vector databases often perform better due to specialized architecture. Performance degrades when the index exceeds memory and must spill to disk, so provisioning enough RAM for the index size is critical. Partitioning by tenant or time, plus scaling reads with replicas, extends pgvector's useful range.
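
Tenant partitioning can be sketched with declarative partitioning; each partition gets its own, smaller HNSW index, so queries that filter on tenant_id touch only one graph (names illustrative):

```sql
CREATE TABLE documents (
    id        bigserial,
    tenant_id int NOT NULL,
    embedding vector(1536),
    PRIMARY KEY (tenant_id, id)
) PARTITION BY LIST (tenant_id);

CREATE TABLE documents_t1 PARTITION OF documents FOR VALUES IN (1);
CREATE TABLE documents_t2 PARTITION OF documents FOR VALUES IN (2);

-- Per-partition HNSW indexes stay small enough to fit in RAM
CREATE INDEX ON documents_t1 USING hnsw (embedding vector_cosine_ops);
CREATE INDEX ON documents_t2 USING hnsw (embedding vector_cosine_ops);
```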

How do you use pgvector with Supabase, Neon, or AWS RDS?

0:31
Supabase, Neon, and AWS RDS all offer pgvector pre-installed or installable with a single SQL command. On managed services, you enable the extension with CREATE EXTENSION vector, then create tables and indexes as with self-hosted Postgres. Supabase provides additional tooling around pgvector including automatic embedding generation and higher-level client libraries. AWS RDS and Neon treat pgvector as a standard extension and rely on Postgres's normal management plane. Connection pooling and read replicas apply normally to vector workloads on these platforms.

How do you do hybrid search with pgvector?

0:27
Hybrid search in pgvector combines vector similarity with PostgreSQL full-text search using tsvector and ts_rank. Create a tsvector column and GIN index alongside the vector column, run both a vector query and a full-text query, then combine scores either with weighted sum in SQL or with Reciprocal Rank Fusion. Postgres's Common Table Expressions make the RRF query readable. This gives the same hybrid-search benefits as purpose-built databases but requires manual query construction rather than a single API call.
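 
A sketch of the RRF query, assuming a tsv tsvector column with a GIN index and the usual RRF constant k = 60:

```sql
-- Reciprocal Rank Fusion over a vector query ($1) and a text query ($2)
WITH vec AS (
    SELECT id, ROW_NUMBER() OVER (ORDER BY embedding <=> $1::vector) AS rank
    FROM documents
    ORDER BY embedding <=> $1::vector
    LIMIT 20
),
fts AS (
    SELECT id, ROW_NUMBER() OVER (ORDER BY ts_rank(tsv, query) DESC) AS rank
    FROM documents, plainto_tsquery('english', $2) AS query
    WHERE tsv @@ query
    ORDER BY ts_rank(tsv, query) DESC
    LIMIT 20
)
SELECT COALESCE(vec.id, fts.id) AS id,
       COALESCE(1.0 / (60 + vec.rank), 0) +
       COALESCE(1.0 / (60 + fts.rank), 0) AS rrf_score
FROM vec
FULL OUTER JOIN fts ON vec.id = fts.id
ORDER BY rrf_score DESC
LIMIT 10;
```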

What is the cost of HNSW index build in pgvector?

0:31
HNSW index build in pgvector can take minutes to hours depending on dataset size, dimensions, and the m and ef_construction parameters. Building an index on 10 million 1536-dimensional vectors with default parameters might take 30 to 120 minutes on a modern server. pgvector 0.6 added parallel index builds that use multiple cores, significantly reducing build time. Builds are a one-time cost per index, but schema changes or parameter tuning require rebuilds, so plan index changes during low-traffic windows.
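
Typical build-time settings look like this; the specific values are illustrative and depend on available hardware:

```sql
-- Keep the graph build in memory; builds slow down sharply when the
-- working set exceeds maintenance_work_mem
SET maintenance_work_mem = '8GB';

-- Parallel HNSW builds (pgvector 0.6+) use maintenance workers
SET max_parallel_maintenance_workers = 7;

-- CONCURRENTLY avoids blocking writes during a long build
CREATE INDEX CONCURRENTLY documents_embedding_idx
    ON documents USING hnsw (embedding vector_cosine_ops);
```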

When should you move from pgvector to a dedicated vector database?

0:30
Move to a dedicated vector database when dataset size exceeds what fits in Postgres memory comfortably, typically 50 to 100 million vectors, when query latency under filters becomes unacceptable, when you need features like learned sparse retrieval or GPU acceleration, or when vector workload consumes resources other Postgres queries need. Stay on pgvector when your data is smaller, when integration with relational data matters, or when operating one database matters more than the last 30 percent of performance available from specialized systems.

What is pgvectorscale and how does it extend pgvector?

0:30
pgvectorscale is an extension from Timescale that builds on top of pgvector to add the StreamingDiskANN index, a disk-optimized graph index based on Microsoft's DiskANN research. It enables pgvector to scale to hundreds of millions of vectors with low memory requirements by keeping the index on SSD, similar to dedicated vector databases. pgvectorscale also adds statistical binary quantization for further compression. It is designed for Postgres users who hit pgvector's scaling limits but want to stay in the Postgres ecosystem rather than adopt a separate vector database.
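
Usage mirrors pgvector's index creation; this sketch assumes the extension is installed and follows the pgvectorscale documentation's diskann index type:

```sql
-- pgvectorscale ships as the vectorscale extension (requires pgvector)
CREATE EXTENSION IF NOT EXISTS vectorscale CASCADE;

-- StreamingDiskANN index in place of hnsw/ivfflat
CREATE INDEX ON documents USING diskann (embedding vector_cosine_ops);
```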