When should you fine-tune an embedding model versus use an off-the-shelf one?
RAG & Vector DB Interview: Embeddings, Cosine Similarity, Dimensions, Models Compared
Fine-tune when off-the-shelf models miss domain terminology, when you have at least a few thousand query-document pairs for training, and when retrieval quality is the bottleneck of your system. Domains like legal, medical, code, and scientific literature often benefit because pretraining corpora underweight these vocabularies. Use off-the-shelf models when you lack training data, when your domain matches general web text, or when iteration speed matters more than the last few percent of recall. Fine-tuning adds operational cost, since you must retrain when corpus distribution shifts.
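A practical way to decide is to measure whether retrieval quality really is the bottleneck before committing to fine-tuning. A minimal sketch of a recall@k check over cosine similarity, using synthetic vectors in place of real embeddings (all names and the toy data here are illustrative, not from any particular library):

```python
import numpy as np

def recall_at_k(query_vecs, doc_vecs, relevant_idx, k=5):
    """Fraction of queries whose relevant document appears in the
    top-k cosine-similarity results. Vectors are L2-normalised
    first, so the dot product equals cosine similarity."""
    q = query_vecs / np.linalg.norm(query_vecs, axis=1, keepdims=True)
    d = doc_vecs / np.linalg.norm(doc_vecs, axis=1, keepdims=True)
    sims = q @ d.T                           # (n_queries, n_docs)
    topk = np.argsort(-sims, axis=1)[:, :k]  # indices of k best docs
    hits = [rel in row for rel, row in zip(relevant_idx, topk)]
    return float(np.mean(hits))

# Toy data: each query vector is a noisy copy of its relevant document.
rng = np.random.default_rng(0)
docs = rng.normal(size=(100, 64))
queries = docs[:20] + 0.1 * rng.normal(size=(20, 64))
print(recall_at_k(queries, docs, relevant_idx=list(range(20)), k=5))
```

Running the same metric on embeddings from an off-the-shelf model over your own query-document pairs gives a baseline; if recall@k is already high, fine-tuning is unlikely to pay for its operational cost.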
docs.cohere.com