LLM Engineer Interview Questions: Inference Optimization, KV Cache, Speculative Decoding, Quantization

What is prompt caching?
Prompt caching reuses the KV cache computed for the prefix of a previous request when a new request shares that same prefix. This skips redundant prefill computation, dramatically reducing time-to-first-token latency and cost for repeated system prompts or long shared contexts. OpenAI, Anthropic, and Google all support prompt caching in their APIs as of 2026.
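The prefix-reuse idea can be sketched as a toy cache keyed by token prefixes. This is a simplified illustration with hypothetical names, not any provider's implementation; production servers (e.g. vLLM's prefix caching) typically hash fixed-size token blocks rather than scanning whole prefixes.

```python
from dataclasses import dataclass, field


@dataclass
class PromptCache:
    """Toy model of prefix-based prompt caching (illustrative only).

    Maps a tuple of prompt tokens to a stand-in for its KV cache.
    On a new request, we find the longest cached prefix and only
    'prefill' (compute attention keys/values for) the remainder.
    """
    store: dict = field(default_factory=dict)

    def longest_cached_prefix(self, tokens):
        # Linear scan from longest to shortest prefix; real systems
        # use block hashing to make this lookup cheap.
        for end in range(len(tokens), 0, -1):
            key = tuple(tokens[:end])
            if key in self.store:
                return key
        return ()

    def prefill(self, tokens):
        """Return how many tokens actually required prefill compute."""
        prefix = self.longest_cached_prefix(tokens)
        computed = len(tokens) - len(prefix)
        # Store the full prompt's KV cache for future requests.
        self.store[tuple(tokens)] = object()
        return computed


cache = PromptCache()
system_prompt = list(range(1000))                 # long shared system prompt
cold = cache.prefill(system_prompt + [1, 2, 3])   # no cache hit: all tokens computed
warm = cache.prefill(system_prompt + [1, 2, 3, 4])  # prefix hit: only the new token
```

Here the second request computes prefill for just 1 token instead of 1004, which is why prompt caching pays off most for long, stable prefixes followed by short varying suffixes.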
---
docs.anthropic.com