What is TTFT and why does it matter?
LLM Engineer Interview Questions: Inference Optimization, KV Cache, Speculative Decoding, Quantization
Audio flashcard · Nortren
TTFT stands for Time To First Token: the latency from when a request is sent to when the first generated token arrives. It matters because users judge responsiveness mainly by TTFT rather than by total generation time, since streamed output can be read as it arrives. For chat applications, a TTFT under about one second generally feels responsive. Apart from network and queueing delay, TTFT is dominated by the prefill phase, where the model processes the entire prompt before generating any output, so it grows with prompt length.
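As a rough illustration of the definition above, the sketch below times one streamed response and reports TTFT separately from total generation time. The names `measure_ttft` and `fake_stream` are hypothetical helpers made up for this example, not part of vLLM or any other library, and `fake_stream` only simulates prefill and decode delays with `time.sleep`.

```python
import time
from typing import Callable, Iterable, Iterator, Optional, Tuple


def measure_ttft(
    start_stream: Callable[[], Iterable[str]],
    clock: Callable[[], float] = time.perf_counter,
) -> Tuple[Optional[float], float, int]:
    """Time one streamed request.

    Returns (ttft_seconds, total_seconds, token_count).
    ttft_seconds is None if the stream produced no tokens.
    """
    t_start = clock()  # the request is considered "sent" here
    ttft: Optional[float] = None
    count = 0
    for _token in start_stream():
        if ttft is None:
            # First token arrived: this interval is the TTFT.
            ttft = clock() - t_start
        count += 1
    total = clock() - t_start
    return ttft, total, count


def fake_stream(
    prefill_s: float = 0.05, per_token_s: float = 0.01, n_tokens: int = 5
) -> Iterator[str]:
    """Hypothetical stand-in for a streaming LLM response (assumption).

    Sleeps once to mimic prefill over the whole prompt, then yields
    tokens with a small delay between them to mimic decoding.
    """
    time.sleep(prefill_s)
    for i in range(n_tokens):
        if i > 0:
            time.sleep(per_token_s)
        yield f"tok{i}"


if __name__ == "__main__":
    ttft, total, n = measure_ttft(lambda: fake_stream())
    print(f"TTFT: {ttft:.3f}s, total: {total:.3f}s, tokens: {n}")
```

To time a real service, `start_stream` would instead be a function that opens a streaming request and yields tokens as they arrive. Measuring on the client side like this also includes network and queueing delay, which is usually what the user actually experiences.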
docs.vllm.ai