RAG & Vector DB Interview: Qdrant Collections, Payload, Quantization, Filtering, Sharding

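This section covers Qdrant, one of the popular vector databases alongside Pinecone, Weaviate, and Milvus: its collection and payload model, filtering, quantization, storage, sharding, multi-tenancy, and operations. Understanding how these pieces work, and where they differ from other vector databases, is crucial for making informed design decisions.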

Nortren

What is Qdrant and what is it used for?

Qdrant is an open-source vector database and similarity search engine written in Rust, designed for high-performance production workloads with rich filtering and payload support. It stores vectors together with arbitrary JSON payloads and indexes both, so queries that combine vector similarity with payload filters stay fast. Qdrant supports HNSW indexing; scalar, product, and binary quantization; on-disk storage; sharding; and replication. It is available as self-hosted open source, as the managed Qdrant Cloud service, or as Qdrant Hybrid Cloud, where compute runs in your own infrastructure and Qdrant manages the control plane.

What is a Qdrant collection and how is it structured?

A Qdrant collection is a named set of points, where each point has a unique ID (an unsigned integer or a UUID), one or more vectors, and an optional JSON payload. The vector configuration, meaning the vector size and distance metric, is fixed when the collection is created and applies to every point; index and quantization settings are also defined per collection and can be tuned later. Collections are the main unit of sharding, replication, and access control. A single Qdrant cluster can hold many collections with different configurations, for example separate collections per tenant, per embedding model, or per data type such as text, images, or audio.
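
A minimal Python sketch of that structure, assuming a toy in-memory model: the Point and Collection classes and their upsert method are hypothetical illustrations, not the Qdrant client API. Only the distance names ("Cosine", "Dot", "Euclid", "Manhattan") mirror the ones Qdrant uses.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List, Union

# Qdrant point IDs are unsigned integers or UUIDs; a plain str stands in for a UUID here.
PointId = Union[int, str]


@dataclass
class Point:
    id: PointId
    vector: List[float]
    payload: Dict[str, Any] = field(default_factory=dict)


@dataclass
class Collection:
    """Hypothetical toy collection: size and distance are fixed when it is created."""
    name: str
    size: int
    distance: str  # e.g. "Cosine", "Dot", "Euclid", "Manhattan"
    points: Dict[PointId, Point] = field(default_factory=dict)

    def upsert(self, point: Point) -> None:
        # Every point must match the collection's vector configuration.
        if len(point.vector) != self.size:
            raise ValueError(f"expected {self.size} dimensions, got {len(point.vector)}")
        # Upsert semantics: writing an existing ID replaces that point.
        self.points[point.id] = point


docs = Collection(name="docs", size=3, distance="Cosine")
docs.upsert(Point(id=1, vector=[0.1, 0.2, 0.3], payload={"source": "faq"}))
docs.upsert(Point(id=1, vector=[0.3, 0.2, 0.1], payload={"source": "blog"}))
```

The second upsert overwrites point 1 rather than adding a new point, and a vector of the wrong length is rejected.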

What is a payload in Qdrant and how is it used?

A payload is arbitrary JSON data attached to each point in a Qdrant collection, used for metadata such as document source, timestamps, tags, user identifiers, or other domain-specific fields. Payloads can be filtered on during vector search, so queries can restrict results to subsets such as a date range or a single tenant. Individual payload fields can be indexed for faster filtering, and Qdrant supports nested fields, arrays, geo coordinates, and full-text search within payload strings. In many architectures, payloads remove the need for a separate metadata database.
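
The filtering semantics can be sketched in a few lines of Python. This is a toy evaluator loosely modeled on the shape of Qdrant's JSON filters (a must list of conditions with match and range clauses, dotted keys for nested fields); the function names are hypothetical, and this is not how Qdrant evaluates filters internally.

```python
import operator
from typing import Any, Dict, List

RANGE_OPS = {"gt": operator.gt, "gte": operator.ge, "lt": operator.lt, "lte": operator.le}


def get_path(payload: Dict[str, Any], key: str) -> Any:
    """Resolve a dotted key such as 'meta.lang' in a nested payload (None if missing)."""
    value: Any = payload
    for part in key.split("."):
        if not isinstance(value, dict) or part not in value:
            return None
        value = value[part]
    return value


def matches(payload: Dict[str, Any], condition: Dict[str, Any]) -> bool:
    value = get_path(payload, condition["key"])
    if "match" in condition:
        target = condition["match"]["value"]
        # An array field matches if any of its elements equals the target.
        return target in value if isinstance(value, list) else value == target
    if "range" in condition:
        if not isinstance(value, (int, float)):
            return False
        return all(RANGE_OPS[op](value, bound) for op, bound in condition["range"].items())
    raise ValueError(f"unsupported condition: {condition}")


def passes(payload: Dict[str, Any], must: List[Dict[str, Any]]) -> bool:
    # "must" semantics: every condition has to hold.
    return all(matches(payload, c) for c in must)


points = [
    {"id": 1, "payload": {"tenant": "acme", "year": 2023, "tags": ["faq"], "meta": {"lang": "en"}}},
    {"id": 2, "payload": {"tenant": "acme", "year": 2021, "tags": ["blog"], "meta": {"lang": "de"}}},
    {"id": 3, "payload": {"tenant": "globex", "year": 2024, "tags": ["faq"], "meta": {"lang": "en"}}},
]
must = [
    {"key": "tags", "match": {"value": "faq"}},
    {"key": "year", "range": {"gte": 2022}},
    {"key": "meta.lang", "match": {"value": "en"}},
]
hits = [p["id"] for p in points if passes(p["payload"], must)]  # [1, 3]
```

In a real engine these checks run against payload indexes rather than by scanning every payload, which is why indexing filtered fields matters.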

How does Qdrant handle filtering during vector search?

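Qdrant integrates payload filtering directly into HNSW traversal, an approach called filterable HNSW. Post-filtering the top results can return too few matches when the filter is selective, while pre-filtering and then scoring every matching point is slow when many points match; checking the filter while walking the graph avoids both problems and maintains recall even under selective filters. To keep the graph navigable under filters, Qdrant can add extra links between points that share indexed payload values, and its query planner estimates how many points match: for very selective filters it can skip the graph and score the matching points directly from the payload index. For this to work efficiently, payload fields used in filters should have payload indexes, which Qdrant uses to check filter matches without loading full payloads. This design makes Qdrant particularly strong when queries combine semantic similarity with structured constraints.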
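
The core idea, checking the filter while walking the graph instead of before or after the search, can be sketched on a toy single-layer proximity graph in Python. This is a simplification under stated assumptions: the graph is a brute-force k-nearest-neighbour graph rather than a multi-layer HNSW index, it omits Qdrant's extra payload-based links and query planner, and the function names and ef handling are illustrative only.

```python
import heapq
import math
from typing import Callable, Dict, List, Sequence, Set

Vector = Sequence[float]


def build_knn_graph(vectors: List[Vector], k: int) -> Dict[int, Set[int]]:
    """Toy graph: link every node to its k nearest neighbours, in both directions."""
    graph: Dict[int, Set[int]] = {i: set() for i in range(len(vectors))}
    for i, v in enumerate(vectors):
        others = sorted((j for j in range(len(vectors)) if j != i),
                        key=lambda j: math.dist(v, vectors[j]))
        for j in others[:k]:
            graph[i].add(j)
            graph[j].add(i)
    return graph


def filtered_search(query: Vector, vectors: List[Vector], graph: Dict[int, Set[int]],
                    entry: int, allowed: Callable[[int], bool], k: int, ef: int) -> List[int]:
    """Best-first graph search that keeps only filter-matching nodes as results."""
    d0 = math.dist(query, vectors[entry])
    visited = {entry}
    candidates = [(d0, entry)]  # min-heap of nodes still to expand
    results: List = []          # max-heap (negated distance) of matching nodes
    if allowed(entry):
        heapq.heappush(results, (-d0, entry))
    while candidates:
        d, node = heapq.heappop(candidates)
        # Stop once the closest unexpanded node is farther than the worst kept result.
        if len(results) >= ef and d > -results[0][0]:
            break
        for nb in graph[node]:
            if nb in visited:
                continue
            visited.add(nb)
            dn = math.dist(query, vectors[nb])
            # Non-matching nodes are still expanded, so the walk can pass through them.
            heapq.heappush(candidates, (dn, nb))
            if allowed(nb):
                heapq.heappush(results, (-dn, nb))
                if len(results) > ef:
                    heapq.heappop(results)
    return [i for _, i in sorted((-nd, i) for nd, i in results)][:k]


# 5x5 grid of 2-D points; the "payload filter" keeps only points with an even x coordinate.
vectors = [(float(x), float(y)) for x in range(5) for y in range(5)]
graph = build_knn_graph(vectors, k=4)
even_x = lambda i: vectors[i][0] % 2 == 0
top = filtered_search((4.2, 4.1), vectors, graph, entry=0, allowed=even_x, k=3, ef=16)  # [24, 23, 22]
```

Non-matching nodes act as stepping stones: they are traversed but never returned, which is what lets a graph search honor a filter without losing connectivity.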

What types of quantization does Qdrant support?

Qdrant supports three quantization methods: scalar, product, and binary. Scalar quantization compresses each float32 dimension to an 8-bit integer, giving 4x memory reduction with minimal recall loss, and is the recommended default. Product quantization splits vectors into groups of dimensions and encodes each group with a learned codebook, giving up to 64x reduction at a higher recall cost. Binary quantization stores one bit per dimension, giving 32x reduction and enabling fast Hamming-distance comparisons; it works best for high-dimensional normalized embeddings. All three support a rescoring step that recomputes scores for the top candidates using the original full-precision vectors.
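
A minimal Python sketch of scalar and binary quantization with rescoring, under simplifying assumptions: the scalar quantizer uses fixed lo/hi bounds (Qdrant derives its bounds from the data), bits are packed into a Python int, and the oversampling factor and function names are illustrative rather than Qdrant's implementation.

```python
import math
from typing import List, Sequence


def scalar_quantize(vec: Sequence[float], lo: float, hi: float) -> List[int]:
    """Map each float onto 256 levels between lo and hi: one byte per dimension."""
    scale = (hi - lo) / 255
    return [round((min(max(x, lo), hi) - lo) / scale) for x in vec]


def scalar_dequantize(codes: Sequence[int], lo: float, hi: float) -> List[float]:
    scale = (hi - lo) / 255
    return [lo + c * scale for c in codes]


def binary_quantize(vec: Sequence[float]) -> int:
    """One bit per dimension (1 if the component is positive), packed into an int."""
    bits = 0
    for i, x in enumerate(vec):
        if x > 0:
            bits |= 1 << i
    return bits


def hamming(a: int, b: int) -> int:
    # XOR marks differing bits; counting them gives the Hamming distance.
    return bin(a ^ b).count("1")


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


def search_with_rescoring(query, vectors, k: int, oversampling: float = 2.0) -> List[int]:
    codes = [binary_quantize(v) for v in vectors]
    q_code = binary_quantize(query)
    n_candidates = min(len(vectors), math.ceil(k * oversampling))
    # Cheap pass: rank every point by Hamming distance between 1-bit codes.
    candidates = sorted(range(len(vectors)), key=lambda i: hamming(q_code, codes[i]))[:n_candidates]
    # Rescoring: re-rank only the candidates with the full-precision dot product.
    return sorted(candidates, key=lambda i: dot(query, vectors[i]), reverse=True)[:k]


vectors = [
    [0.9, 0.1, -0.2, 0.4],
    [-0.5, 0.8, 0.3, -0.1],
    [0.7, -0.2, 0.1, 0.6],
    [-0.3, -0.7, 0.9, 0.2],
]
query = [0.8, 0.0, -0.1, 0.5]
top2 = search_with_rescoring(query, vectors, k=2)  # [0, 2]
```

The compression ratios follow from bytes per dimension: a 1536-dimensional embedding takes 6,144 bytes as float32, 1,536 bytes as 8-bit codes, and 192 bytes as binary codes.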

How does Qdrant store vectors and indexes on disk?

Qdrant can keep vectors, HNSW graphs, and payloads fully in RAM, fully on disk through memory mapping, or in hybrid configurations. Memory-mapped storage relies on the operating system page cache, which keeps frequently accessed data in memory while cold data stays on SSD. Each collection is configured independently, so when tenants live in separate collections, hot tenants can stay in RAM while cold tenants move to disk. A common middle ground keeps quantized vectors in RAM and the original vectors on disk for rescoring. This flexibility lets a single node serve far more vectors than its memory could otherwise hold, trading some query latency for a large cost reduction.
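
To show what memory mapping means in practice, here is a minimal Python sketch of a vector file read through mmap. It assumes a hypothetical layout of fixed-size little-endian float32 records; Qdrant's real on-disk formats are different and more involved.

```python
import mmap
import os
import struct
import tempfile
from typing import Sequence, Tuple

DIM = 4
RECORD = struct.Struct(f"<{DIM}f")  # one little-endian float32 vector per fixed-size record


def write_vectors(path: str, vectors: Sequence[Sequence[float]]) -> None:
    with open(path, "wb") as f:
        for v in vectors:
            f.write(RECORD.pack(*v))


class MmapVectors:
    """Read-only vector store backed by a memory-mapped file."""

    def __init__(self, path: str) -> None:
        self._file = open(path, "rb")
        self._mm = mmap.mmap(self._file.fileno(), 0, access=mmap.ACCESS_READ)

    def __len__(self) -> int:
        return len(self._mm) // RECORD.size

    def get(self, i: int) -> Tuple[float, ...]:
        # Touching a record faults in only its pages; the OS page cache decides what stays in RAM.
        return RECORD.unpack_from(self._mm, i * RECORD.size)

    def close(self) -> None:
        self._mm.close()
        self._file.close()


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "vectors.bin")
    write_vectors(path, [[1.0, 0.0, 0.0, 0.0], [0.5, 0.25, -1.0, 2.0], [0.0, 0.0, 0.0, 1.0]])
    store = MmapVectors(path)
    count, second = len(store), store.get(1)
    store.close()
```

Because records have a fixed size, vector i lives at byte offset i * RECORD.size, so a lookup reads only the pages it needs instead of loading the whole file.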

What is the difference between Qdrant and Qdrant Cloud?

The open-source Qdrant project is the core database, which you can self-host on any infrastructure for free under the Apache 2.0 license; it includes all database features, but not commercial support. Qdrant Cloud is the managed service run by Qdrant Solutions that handles deployment, scaling, backups, monitoring, and upgrades on AWS, GCP, or Azure, with pricing based on cluster size. Qdrant Hybrid Cloud lets you run the database in your own cloud account while Qdrant manages the control plane, which helps meet data-residency requirements while keeping managed operations. All three run the same database engine and expose the same API.

How does Qdrant handle sharding and replication?

Qdrant splits a collection into shards distributed across cluster nodes, and each shard can be replicated for fault tolerance. By default, points are assigned to shards automatically; with user-defined sharding, you route points to shards by a custom shard key. The shard count is set at collection creation and determines write parallelism and horizontal scale. Replicas provide read scaling and high availability, and a configurable write consistency factor sets how many replicas must acknowledge a write. Qdrant uses Raft for consensus on cluster membership and collection metadata, while point data replicates directly between shard replicas. Changing the shard count later (resharding) is possible but requires moving data between shards.
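
A minimal Python sketch of shard routing, replica placement, and write acknowledgement. The parameter names shard_number, replication_factor, and write_consistency_factor mirror Qdrant's collection settings, but the hashing, round-robin placement, and function names are hypothetical; Qdrant uses its own routing and placement logic.

```python
import hashlib
from typing import Dict, List, Union


def shard_for(point_id: Union[int, str], shard_number: int) -> int:
    """Hypothetical routing: a stable hash of the point ID modulo the shard count."""
    digest = hashlib.sha256(str(point_id).encode()).digest()
    return int.from_bytes(digest[:8], "big") % shard_number


def place_replicas(shard_number: int, replication_factor: int, nodes: List[str]) -> Dict[int, List[str]]:
    """Round-robin placement so the replicas of one shard land on distinct nodes."""
    if replication_factor > len(nodes):
        raise ValueError("need at least as many nodes as replicas")
    return {
        s: [nodes[(s + r) % len(nodes)] for r in range(replication_factor)]
        for s in range(shard_number)
    }


def write_acknowledged(acks: int, write_consistency_factor: int) -> bool:
    # A write succeeds once enough replicas of the target shard have applied it.
    return acks >= write_consistency_factor


placement = place_replicas(shard_number=3, replication_factor=2, nodes=["node-a", "node-b", "node-c"])
# Changing the shard count remaps many IDs, which is why resharding has to move data.
moved = sum(shard_for(i, 3) != shard_for(i, 4) for i in range(1000))
```

Losing any single node here still leaves one replica of every shard, and the moved count illustrates why the shard count is best chosen up front.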

How does Qdrant support multi-tenancy?

Qdrant supports multi-tenancy through two main patterns. One collection per tenant gives strong isolation at the cost of overhead when there are many small tenants. A shared collection stores a tenant identifier in each point's payload and applies a tenant filter on every query. For the shared pattern, Qdrant recommends a payload index on the tenant field, which can be marked as a tenant index so each tenant's data is stored together, and, for large tenants, user-defined sharding with a shard key per tenant to place their data on specific shards. Shard-key routing reduces query latency because a search touches only the shards that contain that tenant's data.
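
A minimal Python sketch of the shared-collection pattern, assuming a hypothetical SharedCollection class: a dictionary from tenant to row positions stands in for the payload index on the tenant field, and the search method applies the tenant filter itself so callers cannot forget it. This is an illustration, not the Qdrant client API.

```python
import math
from typing import Any, Dict, List, Sequence, Tuple


class SharedCollection:
    """Hypothetical shared collection: every point carries a tenant ID in its payload."""

    def __init__(self, tenant_field: str = "tenant_id") -> None:
        self.tenant_field = tenant_field
        self.points: List[Tuple[Any, Sequence[float], Dict[str, Any]]] = []
        # Stands in for a payload index on the tenant field: tenant -> row positions.
        self.by_tenant: Dict[str, List[int]] = {}

    def upsert(self, point_id: Any, vector: Sequence[float], payload: Dict[str, Any]) -> None:
        """Append a point (this sketch does not deduplicate IDs)."""
        tenant = payload[self.tenant_field]  # KeyError if a point has no tenant
        self.by_tenant.setdefault(tenant, []).append(len(self.points))
        self.points.append((point_id, vector, payload))

    def search(self, tenant: str, query: Sequence[float], k: int) -> List[Any]:
        # The tenant filter is applied inside every search, not left to callers.
        rows = self.by_tenant.get(tenant, [])
        rows = sorted(rows, key=lambda r: math.dist(query, self.points[r][1]))
        return [self.points[r][0] for r in rows[:k]]


shared = SharedCollection()
shared.upsert("a1", [0.0, 0.0], {"tenant_id": "acme"})
shared.upsert("a2", [1.0, 1.0], {"tenant_id": "acme"})
shared.upsert("g1", [0.1, 0.1], {"tenant_id": "globex"})
```

A search for acme near the origin returns a1 and a2 even though g1 is closer than a2, because g1 belongs to another tenant.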

What distance metrics does Qdrant support?

Qdrant supports four distance metrics: cosine similarity, dot product, Euclidean distance, and Manhattan distance. Cosine and dot product are the most common choices for text embeddings. Qdrant implements cosine as a dot product over vectors it normalizes at upload time, so for pre-normalized embeddings the two metrics give the same ranking at essentially the same query cost; plain dot product is the right choice when vector magnitude carries meaning. The metric is set per vector configuration at collection creation. Qdrant also supports named vectors, which let a single collection store multiple vectors per point with different dimensions and metrics, useful for multimodal data combining text and image embeddings.
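
The four metrics, and why normalizing once makes cosine a plain dot product, fit in a short Python sketch. The function names are illustrative. For unit vectors, the squared Euclidean distance equals 2 - 2 * cosine, so cosine, dot product, and Euclidean distance all rank normalized vectors identically.

```python
import math
from typing import List, Sequence


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


def normalize(v: Sequence[float]) -> List[float]:
    n = math.sqrt(dot(v, v))
    return [x / n for x in v]


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    return dot(a, b) / math.sqrt(dot(a, a) * dot(b, b))


def euclid(a: Sequence[float], b: Sequence[float]) -> float:
    return math.dist(a, b)


def manhattan(a: Sequence[float], b: Sequence[float]) -> float:
    return sum(abs(x - y) for x, y in zip(a, b))


a, b = [3.0, 4.0], [4.0, 3.0]
# Normalizing once at insert time turns cosine into a plain dot product at query time.
na, nb = normalize(a), normalize(b)
```

Here cosine(a, b) is 24 / 25 = 0.96, and dot(na, nb) gives the same value without computing any norms at query time.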

What is the Qdrant Web UI and what can you do with it?

The Qdrant Web UI is a built-in dashboard served by the Qdrant HTTP server at the /dashboard path (for example, http://localhost:6333/dashboard on a default local install), available in both self-hosted and cloud deployments. It lets you browse collections, inspect points and payloads, run search queries, visualize vectors in reduced dimensions, manage snapshots, and monitor cluster health. The UI is convenient for development and debugging because it requires no client code. For production operations, most teams prefer programmatic access through the REST or gRPC API, typically via the official Python, JavaScript, or Rust clients.

How does Qdrant handle snapshots and backups?

Qdrant can snapshot individual collections or the entire storage, producing consistent point-in-time copies that can be downloaded and restored into any Qdrant instance. Snapshots are created through the API, either on demand or from an external scheduler, and can be stored locally or in object storage such as S3. Restoring recovers a collection from a snapshot file, which supports migration between clusters and recovery from corruption. Qdrant Cloud performs automated backups as part of the managed service, with configurable schedules and retention.
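
A minimal Python sketch of the two snapshot calls as REST requests, built with the standard library but not sent: POST /collections/{collection_name}/snapshots creates a snapshot, and PUT /collections/{collection_name}/snapshots/recover restores one from a location. It assumes a local instance on the default REST port 6333; the collection name "docs" and the backup-host URL are hypothetical, and the endpoint details should be checked against the API reference for your Qdrant version.

```python
import json
import urllib.request
from urllib.parse import quote

BASE_URL = "http://localhost:6333"  # assumption: a local instance on the default REST port


def create_snapshot_request(collection: str) -> urllib.request.Request:
    # POST /collections/{name}/snapshots asks the server to write a new snapshot.
    return urllib.request.Request(
        f"{BASE_URL}/collections/{quote(collection)}/snapshots", method="POST"
    )


def recover_snapshot_request(collection: str, location: str) -> urllib.request.Request:
    # PUT /collections/{name}/snapshots/recover restores from a snapshot URL or file:// path.
    body = json.dumps({"location": location}).encode()
    return urllib.request.Request(
        f"{BASE_URL}/collections/{quote(collection)}/snapshots/recover",
        data=body,
        headers={"Content-Type": "application/json"},
        method="PUT",
    )


create = create_snapshot_request("docs")
# Hypothetical snapshot location on another host.
recover = recover_snapshot_request("docs", "http://backup-host/snapshots/docs.snapshot")
# Against a running instance, send with urllib.request.urlopen(create).
```

Scheduling these requests from cron or a job runner is one way to implement the "scheduled externally" pattern for self-hosted deployments.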