Question

How would you design a RAG system for 100 million documents?

Accepted Answer

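Use a distributed vector database such as Milvus, Qdrant Cloud, or Pinecone, configured with sharding (so that no single node holds the whole index) and replicas (for query throughput and failover). Partition by tenant, date, or document type to keep individual index sizes manageable, and so that queries filtered on the partition key only touch the relevant partitions. For the index itself, choose HNSW with scalar quantization when it has to fit in memory (int8 quantization stores each dimension in 1 byte instead of the 4 bytes of float32, roughly a 4x reduction in vector memory at some cost in recall), or DiskANN when the index has to live mostly on disk. Implement a three-stage retrieval pipeline: BM25 or another sparse retriever as the first stage for recall, dense vector retrieval as the second stage, and cross-encoder reranking of the small remaining candidate set for top precision. Add caching, streaming, and monitoring at every layer, for example caching query embeddings and frequent results and tracking latency and retrieval quality per stage.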
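To make the three-stage cascade concrete, here is a minimal sketch in plain Python. It is not a production design: the five-document corpus stands in for one shard, `trigram_vector` is a hypothetical stand-in for a real embedding model, and `rerank_scores` uses token overlap as a placeholder for a real cross-encoder. The names `retrieve`, `n_sparse`, `n_dense`, and `n_final` are assumptions made for illustration. Only the shape of the pipeline (wide and cheap first, narrow and expensive last) is the point.

```python
import math
from collections import Counter

# Toy corpus standing in for one shard of the full index (illustrative only).
DOCS = {
    "d1": "hnsw index with scalar quantization reduces memory",
    "d2": "diskann keeps the graph index on ssd for disk based scale",
    "d3": "bm25 sparse retrieval gives high recall for keyword queries",
    "d4": "cross encoder reranking improves precision of the top results",
    "d5": "partition the index by tenant date or document type",
}

def tokenize(text):
    return text.lower().split()

def top_k(scores, k):
    # Sort by score, highest first; Python's sort is stable, so ties keep corpus order.
    ranked = sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
    return [doc_id for doc_id, _ in ranked[:k]]

def bm25_scores(query, docs, k1=1.5, b=0.75):
    """Stage 1: BM25 over the whole corpus (cheap, recall-oriented)."""
    tokens = {d: tokenize(t) for d, t in docs.items()}
    n = len(docs)
    avgdl = sum(len(t) for t in tokens.values()) / n
    df = Counter(term for toks in tokens.values() for term in set(toks))
    scores = {}
    for d, toks in tokens.items():
        tf = Counter(toks)
        score = 0.0
        for term in tokenize(query):
            if term not in tf:
                continue
            idf = math.log(1 + (n - df[term] + 0.5) / (df[term] + 0.5))
            norm = tf[term] + k1 * (1 - b + b * len(toks) / avgdl)
            score += idf * tf[term] * (k1 + 1) / norm
        scores[d] = score
    return scores

def trigram_vector(text):
    # Hypothetical stand-in for an embedding model: character trigram counts.
    text = text.lower()
    return Counter(text[i:i + 3] for i in range(len(text) - 2))

def cosine(u, v):
    dot = sum(u[g] * v[g] for g in u if g in v)
    norm = math.sqrt(sum(x * x for x in u.values())) * math.sqrt(sum(x * x for x in v.values()))
    return dot / norm if norm else 0.0

def dense_scores(query, docs, candidates):
    """Stage 2: vector similarity, computed only for stage-1 candidates."""
    q = trigram_vector(query)
    return {d: cosine(q, trigram_vector(docs[d])) for d in candidates}

def rerank_scores(query, docs, candidates):
    """Stage 3: placeholder for a cross-encoder that scores (query, doc) jointly."""
    q = set(tokenize(query))
    scores = {}
    for d in candidates:
        t = set(tokenize(docs[d]))
        scores[d] = len(q & t) / len(q | t) if (q | t) else 0.0
    return scores

def retrieve(query, docs, n_sparse=4, n_dense=2, n_final=1):
    # Each stage shrinks the candidate set before the next, costlier stage runs.
    stage1 = top_k(bm25_scores(query, docs), n_sparse)
    stage2 = top_k(dense_scores(query, docs, stage1), n_dense)
    return top_k(rerank_scores(query, docs, stage2), n_final)

print(retrieve("hnsw scalar quantization memory", DOCS))
```

In a real deployment the candidate counts would be far larger (for instance thousands after the sparse stage and tens to hundreds going into the reranker), and the dense stage would query the ANN index in the vector database rather than score candidates one by one.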