Walk through the systematic process. First, measure current retrieval quality with recall at k and precision at k on a real evaluation set to establish a baseline. Second, inspect failures manually to diagnose whether the problem lies in chunking, the embedding model, a mismatch between how queries and documents are phrased, or the absence of a reranking stage. Third, experiment with specific interventions such as hybrid search, a better embedding model, a cross-encoder reranker, query rewriting, or a different chunking strategy. Fourth, re-measure on the same evaluation set to confirm that the change actually improved the metrics. Emphasize measurement over intuition at every step.
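The measure, inspect, and re-measure steps could be sketched roughly as follows. This is a minimal illustration, not a definitive harness: the names `EvalReport`, `evaluate`, `EVAL_SET`, `BASELINE_RUN`, and `CANDIDATE_RUN` are hypothetical, the document IDs are toy data, and the two "retrievers" are plain lookups standing in for a real retrieval pipeline before and after an intervention.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Set


def precision_at_k(retrieved: List[str], relevant: Set[str], k: int) -> float:
    """Share of the top-k slots that hold a relevant document."""
    if k <= 0:
        raise ValueError("k must be positive")
    hits = len(set(retrieved[:k]) & relevant)
    return hits / k


def recall_at_k(retrieved: List[str], relevant: Set[str], k: int) -> float:
    """Share of the relevant documents that appear in the top k."""
    if k <= 0:
        raise ValueError("k must be positive")
    if not relevant:
        return 0.0  # assumption: a query with no labeled answers scores 0
    hits = len(set(retrieved[:k]) & relevant)
    return hits / len(relevant)


@dataclass
class EvalReport:
    precision: float     # mean precision at k over all queries
    recall: float        # mean recall at k over all queries
    failures: List[str]  # queries with nothing relevant in the top k


def evaluate(
    retrieve: Callable[[str], List[str]],
    eval_set: Dict[str, Set[str]],
    k: int,
) -> EvalReport:
    """Run one retriever over a fixed evaluation set and average the metrics."""
    if not eval_set:
        raise ValueError("eval_set must not be empty")
    precisions: List[float] = []
    recalls: List[float] = []
    failures: List[str] = []
    for query, relevant in eval_set.items():
        retrieved = retrieve(query)
        precisions.append(precision_at_k(retrieved, relevant, k))
        recall = recall_at_k(retrieved, relevant, k)
        recalls.append(recall)
        if recall == 0.0:
            failures.append(query)  # candidates for manual inspection
    n = len(eval_set)
    return EvalReport(sum(precisions) / n, sum(recalls) / n, failures)


# Toy evaluation set (hypothetical): query -> IDs of the relevant documents.
EVAL_SET: Dict[str, Set[str]] = {
    "q1": {"d1", "d2"},
    "q2": {"d3"},
}

# Stand-ins for ranked results before and after an intervention (hypothetical).
BASELINE_RUN = {"q1": ["d1", "d9", "d8"], "q2": ["d7", "d6", "d5"]}
CANDIDATE_RUN = {"q1": ["d1", "d2", "d8"], "q2": ["d3", "d6", "d5"]}

# The same evaluation set and the same k are used for both runs.
baseline = evaluate(lambda q: BASELINE_RUN[q], EVAL_SET, k=3)
candidate = evaluate(lambda q: CANDIDATE_RUN[q], EVAL_SET, k=3)

print("baseline :", baseline)
print("candidate:", candidate)
```

Keeping the evaluation set and `k` fixed across both runs is what makes the comparison meaningful, and the `failures` list gives a starting point for the manual inspection step.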