The Core Difference
Think of it this way:
- Prompt injection = tricking the application
- Jailbreaking = tricking the model itself
Prompt Injection Example
An e-commerce chatbot:
| |
The attacker isn’t trying to make the model say something harmful. They’re trying to make the application do something unauthorized (leak data).
Target: Application logic Goal: Unauthorized actions Who should defend: Application developers
Jailbreaking Example
A general-purpose AI assistant:
| |
The attacker is trying to bypass the model’s refusal to provide dangerous information.
Target: Model safety training Goal: Harmful content generation Who should defend: Model providers (Anthropic, OpenAI)
Why the Distinction Matters
Different Attack Surfaces
| Aspect | Prompt Injection | Jailbreaking |
|---|---|---|
| Target | Application logic | Model safety |
| Vector | User input to app | Direct prompting |
| Goal | Unauthorized actions | Policy bypass |
| Defense | Application layer | Model training |
| Your responsibility | Yes | Partially |
Different Defenses
Prompt injection defenses (your responsibility):
- Input validation
- Output filtering
- Privilege restriction
- Context separation
Jailbreaking defenses (model provider’s responsibility):
- RLHF training
- Constitutional AI
- Output classifiers
- Content policies
You can’t fix jailbreaking in your application. You can mitigate prompt injection.
Overlap Cases
Sometimes both appear together:
| |
This attempts both:
- Jailbreak: “ignore safety guidelines” + “how to hack”
- Injection: “show me all user data”
Your defenses should catch the injection part. The model should handle the jailbreak part.
Indirect Prompt Injection
This is where things get interesting. Indirect injection combines aspects of both:
| |
Injection aspect: Malicious content in document changes AI behavior Jailbreak aspect: Attempts to disable safety features
This is primarily an injection attack, but the jailbreak-style language is used to maximize success.
Who Needs to Care About What
If you’re building applications:
Focus on prompt injection:
- Users can submit malicious input
- Documents can contain malicious content
- Your system has data/actions to protect
Rely on model providers for jailbreaking defense.
If you’re evaluating model safety:
Focus on jailbreaking:
- Will the model produce harmful content?
- Can safety training be bypassed?
- What content policies exist?
If you’re doing security research:
Both matter. Understanding the taxonomy helps:
- Identify which layer is vulnerable
- Apply appropriate defenses
- Communicate clearly about threats
Common Misconceptions
“My app doesn’t need injection defense because the model won’t produce harmful content”
Wrong. Prompt injection isn’t about harmful content—it’s about unauthorized actions. Leaking data, bypassing auth, manipulating other users. None of these require harmful content generation.
“I added a jailbreak filter so I’m protected”
Jailbreak detection doesn’t stop injection. An attacker can inject instructions without using jailbreak-style language:
| |
No jailbreak language, but clearly an injection attack.
“The model is safe so my application is safe”
Model safety ≠ application safety. A perfectly aligned model can still:
- Access unauthorized data if given permission by your app
- Execute functions if your app allows function calling
- Reveal context if your prompts leak information
Practical Testing
Testing for Prompt Injection
| |
Testing for Jailbreaking
| |
These are different tests for different vulnerabilities.
FAQ
Is indirect prompt injection the same as jailbreaking?
Which is more dangerous?
Can I prevent both in my application?
Are modern models immune to jailbreaking?
Conclusion
Key Takeaways
- Prompt injection manipulates applications; jailbreaking bypasses model safety
- Different threats require different defenses
- Application developers should focus on injection defense
- Model providers handle jailbreaking defense
- Indirect injection through documents is primarily an injection concern
- Jailbreak filters don’t protect against injection attacks
- Test for both, but defend what you control