What is prompt caching and how does it save cost?
Prompt Engineering Patterns: Optimization, Versioning, A/B Testing, and Production Best Practices
Audio flashcard · Nortren
Prompt caching reuses the key-value (KV) cache computed for the prefix of an earlier request when a new request begins with the same prefix. This skips redundant prefill computation, which can substantially reduce latency and cost for repeated system prompts or long shared contexts. OpenAI, Anthropic, and Google all support prompt caching in 2026, often offering 50 to 90 percent discounts on cached input tokens.
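To make the saving concrete, here is a rough sketch of the arithmetic. The function name `input_cost`, the price of 3.00 dollars per million input tokens, and the 90 percent discount are illustrative assumptions, not any provider's actual rates. The sketch also ignores any surcharge a provider may apply when the cache is first written.

```python
def input_cost(prompt_tokens: int, cached_tokens: int,
               price_per_mtok: float, cached_discount: float) -> float:
    """Estimate input cost in dollars for one request.

    prompt_tokens   -- total input tokens in the request
    cached_tokens   -- how many of those tokens were served from the prefix cache
    price_per_mtok  -- hypothetical list price per million input tokens
    cached_discount -- fractional discount on cached tokens (0.9 means 90 percent off)
    """
    if not 0 <= cached_tokens <= prompt_tokens:
        raise ValueError("cached_tokens must be between 0 and prompt_tokens")
    if not 0.0 <= cached_discount <= 1.0:
        raise ValueError("cached_discount must be between 0 and 1")

    uncached_tokens = prompt_tokens - cached_tokens
    cached_price = price_per_mtok * (1.0 - cached_discount)
    total = uncached_tokens * price_per_mtok + cached_tokens * cached_price
    return total / 1_000_000


# Hypothetical workload: a 10,000-token system prompt plus a 200-token user message.
SYSTEM_TOKENS = 10_000
USER_TOKENS = 200
PRICE = 3.00      # assumed dollars per million input tokens
DISCOUNT = 0.90   # assumed discount on cached input tokens

cold = input_cost(SYSTEM_TOKENS + USER_TOKENS, 0, PRICE, DISCOUNT)
warm = input_cost(SYSTEM_TOKENS + USER_TOKENS, SYSTEM_TOKENS, PRICE, DISCOUNT)

print(f"cache miss: ${cold:.4f} per request")
print(f"cache hit:  ${warm:.4f} per request")
print(f"saving:     {100 * (1 - warm / cold):.0f}% of input cost")
```

Only the shared prefix is discounted. The user message differs on every request, so it is billed at the full rate, and the overall saving is a little below the headline discount.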
docs.anthropic.com