LLM Engineer Interview Questions: Inference Optimization, KV Cache, Speculative Decoding, Quantization

What is the difference between training and inference for LLMs?

Training involves forward and backward passes, gradient computation, and weight updates, processing large batches at once. Inference only does forward passes, one or a few sequences at a time, with autoregressive token-by-token generation. Inference is much cheaper per token but happens vastly more often, making inference optimization the main cost lever in production.
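To make the contrast concrete, here is a minimal illustrative sketch. It uses a toy lookup-table "model" (a stand-in for a real transformer; the table, function names, and token strings are all invented for illustration) to show training processing a whole batch with per-position loss terms, versus inference generating autoregressively with forward passes only:

```python
def toy_model_forward(token):
    # Forward pass: toy next-token prediction (stand-in for a transformer forward).
    table = {"the": "cat", "cat": "sat", "sat": "<eos>"}
    return table.get(token, "<eos>")

def training_step(batch):
    # Training: forward pass over every position of every sequence in the batch,
    # then (conceptually) a backward pass and a weight update. Here we only
    # count the loss terms to show the amount of work per step.
    loss_terms = 0
    for sequence in batch:
        for token in sequence:        # whole sequence processed at once
            toy_model_forward(token)  # forward
            loss_terms += 1           # one gradient contribution per position
    return loss_terms                 # backward pass + optimizer step would follow

def generate(prompt_tokens, max_new_tokens=5):
    # Inference: forward passes only, emitting one token at a time; each new
    # token is fed back in as input (autoregressive decoding).
    tokens = list(prompt_tokens)
    for _ in range(max_new_tokens):
        next_token = toy_model_forward(tokens[-1])  # forward only, no gradients
        tokens.append(next_token)
        if next_token == "<eos>":
            break
    return tokens

print(generate(["the"]))  # forward-only, token by token
```

The asymmetry the answer describes is visible here: `training_step` touches every position of a batch in one pass (plus the implied backward pass), while `generate` must loop, paying one forward pass per emitted token.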