LLM Engineer Interview Questions: Choosing Between OpenAI, Anthropic, Open Source Models, and Self-Hosting
What is the difference between batch and real-time inference?
Real-time inference responds immediately to each request, prioritizing low latency. Batch inference processes many requests together later, prioritizing throughput and cost. OpenAI and Anthropic both offer batch APIs at roughly half the price of real-time, with results delivered within 24 hours. Use batch for bulk processing like data labeling, content generation, or evaluation.
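As a sketch of the workflow, each line of an OpenAI batch input file is one JSON request object (the `custom_id`, prompt text, and model name below are illustrative; check the batch API docs for current endpoints and limits):

```python
import json

def build_batch_line(custom_id: str, prompt: str, model: str = "gpt-4o-mini") -> str:
    """Build one line of a batch-input JSONL file in OpenAI's batch request format."""
    return json.dumps({
        "custom_id": custom_id,              # your own ID, used to match results back
        "method": "POST",
        "url": "/v1/chat/completions",       # the endpoint each request targets
        "body": {
            "model": model,
            "messages": [{"role": "user", "content": prompt}],
        },
    })

# Example: bulk data-labeling requests, one JSONL line per item.
prompts = ["Label the sentiment: 'great product'", "Label the sentiment: 'broke on day one'"]
lines = [build_batch_line(f"req-{i}", p) for i, p in enumerate(prompts)]

# Next steps (not run here): write `lines` to a .jsonl file, upload it via the
# files API, then create a batch with completion_window="24h"; the output file
# with results appears when the batch completes, within 24 hours.
for line in lines:
    print(line)
```

Because results are asynchronous and matched by `custom_id`, batch jobs suit workloads where no user is waiting, which is exactly why providers can discount them.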
platform.openai.com