Prompt Engineering Patterns: Prompt Injection, Jailbreaks, and Defensive Prompting Techniques

Question 1

What is prompt injection?

Accepted Answer

Prompt injection is an attack where user input contains instructions that trick the model into ignoring its system prompt, revealing sensitive information, or performing unintended actions. It works because LLMs do not natively distinguish between developer instructions and user content; everything is just tokens. Prompt injection is the OWASP number one risk for LLM applications.

Question 2

What is the difference between direct and indirect prompt injection?

Accepted Answer

Direct prompt injection is when the attacker types malicious instructions directly into the input. Indirect prompt injection is when malicious instructions are hidden in content the LLM retrieves or processes, like a webpage, document, or email. Indirect injection is more dangerous because users may not even realize they triggered an attack.

Question 3

What is a jailbreak prompt?

Accepted Answer

A jailbreak prompt is a specially crafted input designed to make the model bypass its safety training and produce content it normally refuses. Common techniques include role-playing scenarios, hypothetical framing, encoded instructions, and persona adoption. Jailbreaks evolve as fast as defenses, and no LLM is fully resistant.

Question 4

How do you defend against prompt injection?

Accepted Answer

Defenses include separating system from user input clearly with delimiters, validating outputs against expected formats, using structured output, running content filters on inputs and responses, sandboxing tool execution, limiting model permissions, treating all retrieved content as untrusted, and monitoring for unusual behavior. There is no perfect defense; use defense in depth.

Question 5

What is the dual LLM pattern for security?

Accepted Answer

The dual LLM pattern uses two separate language models for security. A privileged LLM has access to tools and sensitive operations but never sees untrusted input. A quarantined LLM processes untrusted content but has no tool access. They communicate through structured handoffs. This prevents most prompt injection attacks because untrusted text never reaches the model that can take action.

Question 6

How do you sanitize user input for LLM prompts?

Accepted Answer

Sanitize by escaping or removing characters that could be interpreted as instructions, wrapping user content in clear delimiters, treating retrieved documents as untrusted data not instructions, and limiting input length. Unlike SQL injection, there is no perfect escape mechanism for natural language, so combine sanitization with other defenses.

Question 7

What is content filtering and how does it complement prompt engineering?

Accepted Answer

Content filters are classifiers that examine text for unsafe categories like violence, hate, sexual content, or self-harm. They run before prompts reach the LLM and after responses are generated. Filters catch attacks and policy violations that prompt engineering alone cannot prevent. Major LLM providers ship built-in filters, but production systems often add custom filters for domain-specific concerns.

Question 8

What are guardrails in prompt engineering?

Accepted Answer

Guardrails are mechanisms that constrain LLM behavior to prevent unsafe, off-topic, or policy-violating outputs. They include input filters that block malicious prompts, output filters that catch bad responses, classifiers that detect PII, structured output enforcement, and topic restrictions. Guardrails sit alongside prompt engineering as a separate defense layer.

---