Question

What is a content filter and how does it work?

Accepted Answer

A content filter is a classifier that checks text against unsafe categories such as violence, hate, sexual content, or self-harm, typically producing a flag or score for each category. It can run on inputs before they reach the LLM and on outputs before they reach users. Major LLM providers ship built-in content filters, but production systems often add custom filters for domain-specific concerns.
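To make the input/output placement concrete, here is a minimal sketch in Python. Everything in it is an assumption for illustration: `classify`, `guarded_completion`, `UNSAFE_TERMS`, and `REFUSAL` are hypothetical names, and the keyword lookup is only a toy stand-in for a trained classifier or a provider's moderation service. The `call_llm` parameter is whatever function you use to reach your model.

```python
from dataclasses import dataclass, field
from typing import Callable

# Hypothetical keyword lists standing in for a real trained classifier.
# Naive substring matching like this over-triggers (e.g. "kill" in "skill").
UNSAFE_TERMS = {
    "violence": {"kill", "stab"},
    "self_harm": {"hurt myself"},
}

REFUSAL = "Sorry, I can't help with that."


@dataclass
class FilterResult:
    flagged: bool
    categories: list[str] = field(default_factory=list)


def classify(text: str) -> FilterResult:
    """Return which unsafe categories (if any) the text matches."""
    lowered = text.lower()
    hits = sorted(
        category
        for category, terms in UNSAFE_TERMS.items()
        if any(term in lowered for term in terms)
    )
    return FilterResult(flagged=bool(hits), categories=hits)


def guarded_completion(prompt: str, call_llm: Callable[[str], str]) -> str:
    """Filter the prompt before the model sees it and the reply before the user does."""
    if classify(prompt).flagged:
        return REFUSAL  # input filter: the model is never called
    reply = call_llm(prompt)
    if classify(reply).flagged:
        return REFUSAL  # output filter: the unsafe reply is withheld
    return reply
```

Calling `guarded_completion(user_text, my_model_fn)` wraps a single model call with both checks. A custom, domain-specific filter would slot in the same way, either by extending the categories or by running a second `classify`-style function alongside the provider's built-in one.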