LLM Engineer Interview Questions: Inference Optimization, KV Cache, Speculative Decoding, Quantization

What is continuous batching?

Continuous batching, also called in-flight batching, is a serving technique where new requests can join the batch as soon as a slot opens, rather than waiting for the entire batch to finish. This dramatically improves throughput in production by keeping the GPU busy. vLLM, TensorRT-LLM, and most modern inference servers implement continuous batching.
Source: docs.vllm.ai
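The difference from static batching can be shown with a toy simulation (a sketch for intuition only, not vLLM's actual scheduler; the `Request` class and step-counting are illustrative assumptions). Each step generates one token for every active sequence; a finished request frees its slot immediately, so a queued request joins mid-flight instead of waiting for the whole batch to drain:

```python
from collections import deque
from dataclasses import dataclass

@dataclass
class Request:
    rid: int            # request id (hypothetical field for this sketch)
    tokens_needed: int  # tokens this request must generate before it finishes
    generated: int = 0

def continuous_batching(requests, max_batch_size):
    """Toy continuous-batching scheduler: one decode step generates one
    token per active request; finished requests are evicted at once,
    and queued requests are admitted as soon as a slot opens."""
    queue = deque(requests)
    active = []
    steps = 0
    completion_step = {}
    while queue or active:
        # Admit waiting requests whenever slots are free -- the key
        # difference from static batching, which would wait for the
        # entire batch to finish before admitting anyone new.
        while queue and len(active) < max_batch_size:
            active.append(queue.popleft())
        steps += 1
        for r in active:
            r.generated += 1
        # Evict finished requests, opening slots for the next step.
        remaining = []
        for r in active:
            if r.generated >= r.tokens_needed:
                completion_step[r.rid] = steps
            else:
                remaining.append(r)
        active = remaining
    return steps, completion_step

# With batch size 2 and output lengths [2, 5, 5], the short request
# finishes at step 2 and the third request takes its slot at step 3,
# so all three finish in 7 steps. A static batcher would run the first
# pair for 5 steps, then the third request alone for 5 more: 10 steps.
reqs = [Request(0, 2), Request(1, 5), Request(2, 5)]
steps, done = continuous_batching(reqs, max_batch_size=2)
print(steps, done)  # 7 {0: 2, 1: 5, 2: 7}
```

The throughput gain in the comment is exactly the effect described above: the GPU never idles on a drained slot while work is queued.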