Why It Matters for AI-Coded Apps
AI code generators often produce complex regular expressions for input validation, and some of those patterns are vulnerable to ReDoS (regular expression denial of service). LLMs tend to optimize for correctness (matching the right strings) without considering computational complexity. A regex that correctly validates email addresses can still freeze the server when given a specially crafted input.
Real-World Example
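An AI generates email validation: /^([a-zA-Z0-9]+\.?)*[a-zA-Z0-9]+@([a-zA-Z0-9]+\.)*[a-zA-Z]{2,}$/. The group ([a-zA-Z0-9]+\.?)* nests a + quantifier inside a * quantifier, and because the dot is optional, a run of letters can be divided among the group's iterations in exponentially many ways. The input aaaaaaaaaaaaaaaaaaaaaaaaaaa! contains no @, so it can never match, but a backtracking engine tries every one of those divisions before rejecting it, freezing the event loop in the meantime. A safe alternative: use a simple check or a dedicated validation library.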
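A minimal Node.js sketch of both halves of the example. It times how long V8's backtracking engine takes to reject the crafted input against the optional-dot pattern for a few input lengths; each extra character should roughly double the time, though the exact numbers depend on the machine. It then shows a simple structural check as the safe alternative. The names timeRejection and isPlausibleEmail are hypothetical, and isPlausibleEmail only checks rough shape (one @, a non-empty local part, no whitespace, a dot inside the domain), not full address syntax.

```javascript
// Sketch only: helper names are hypothetical, not from any library.
// Vulnerable: the optional dot lets a run of letters be split among
// iterations of ([a-zA-Z0-9]+\.?)* in exponentially many ways.
const VULNERABLE_EMAIL = /^([a-zA-Z0-9]+\.?)*[a-zA-Z0-9]+@([a-zA-Z0-9]+\.)*[a-zA-Z]{2,}$/;

// Milliseconds spent rejecting "a" * n + "!" (no "@", so it must fail).
function timeRejection(n) {
  const input = "a".repeat(n) + "!";
  const start = process.hrtime.bigint();
  VULNERABLE_EMAIL.test(input);
  return Number(process.hrtime.bigint() - start) / 1e6;
}

// Keep n small: every extra character roughly doubles the work.
for (const n of [16, 18, 20, 22]) {
  console.log(`n=${n}: ${timeRejection(n).toFixed(2)} ms`);
}

// Safe alternative: a simple structural check that runs in linear time.
// Requires exactly one "@", a non-empty local part, no whitespace, and a
// domain whose last "." is neither its first nor its last character.
function isPlausibleEmail(s) {
  const at = s.indexOf("@");
  if (at < 1 || at !== s.lastIndexOf("@")) return false;
  if (/\s/.test(s)) return false;
  const domain = s.slice(at + 1);
  const dot = domain.lastIndexOf(".");
  return dot > 0 && dot < domain.length - 1;
}

console.log(isPlausibleEmail("user@example.com"));   // true
console.log(isPlausibleEmail("a".repeat(27) + "!")); // false, returns immediately
```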
How to Detect and Prevent It
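Avoid nested quantifiers such as (a+)+, and avoid quantified alternations whose branches can match the same text, such as (a|a)*. Use atomic groups or possessive quantifiers where the engine supports them (Java, PCRE, and Python 3.11+ do; JavaScript does not). Set timeout limits on regex execution where the platform allows it (for example, .NET's Regex match timeout). Use a linear-time engine such as RE2 instead of a backtracking engine; note that RE2 does not support backreferences or lookaround. For common validations (email, URL), use dedicated validation libraries instead of regex.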
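In JavaScript, which has neither atomic groups nor possessive quantifiers, the practical fix is usually to rewrite the pattern so that each input can be matched in only one way. A minimal sketch, taking the optional-dot email pattern /^([a-zA-Z0-9]+\.?)*[a-zA-Z0-9]+@.../ as the starting point: making the dot mandatory, ([a-zA-Z0-9]+\.)*, forces every group iteration to end at a literal ".", so a run of letters can no longer be split among iterations. The rewritten pattern accepts exactly the same strings (dot-separated alphanumeric segments before the @). SAFE_EMAIL is a hypothetical name.

```javascript
// Sketch: same accepted strings as the optional-dot pattern, but unambiguous.
// Each ([a-zA-Z0-9]+\.) iteration must end at a literal ".", so there is
// exactly one way to split "john.doe" and no exponential backtracking.
const SAFE_EMAIL = /^([a-zA-Z0-9]+\.)*[a-zA-Z0-9]+@([a-zA-Z0-9]+\.)*[a-zA-Z]{2,}$/;

// The crafted input, made far longer than in the attack, is still rejected quickly.
const attack = "a".repeat(50_000) + "!";
const start = Date.now();
console.log(SAFE_EMAIL.test(attack)); // false
console.log(`rejected in ${Date.now() - start} ms`);

console.log(SAFE_EMAIL.test("john.doe@mail.example.com")); // true
console.log(SAFE_EMAIL.test("john..doe@example.com"));     // false
```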
Frequently Asked Questions
How do I identify vulnerable regex patterns?
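Look for patterns such as (a+)+, (a|aa)+, and (a+b*)+: each repeats a group whose contents can match the same string in more than one way. Use tools like recheck, redos-detector, or safe-regex to analyze your patterns. The general rule: if a regex has a quantifier inside a group that also has a quantifier, or a quantified group whose alternatives can match the same text, it may be vulnerable.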
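The nested-quantifier part of the general rule can be approximated in a few lines. This is a hypothetical helper, not the algorithm used by recheck, redos-detector, or safe-regex: it scans a pattern's source for a group followed by *, +, or { whose body also contains one of those quantifiers. It skips escapes and character classes but does not parse the full regex grammar. It ignores overlapping alternation, so (a|aa)+ slips through, and it flags some safe patterns, such as the mandatory-dot group ([a-zA-Z0-9]+\.)*, which is why a hit means "may be vulnerable" and calls for a real analyzer.

```javascript
// Hypothetical heuristic, not a real library API. Returns true when a group
// repeated by *, + or {n,m} contains another *, + or {n,m} (star height > 1).
// Limitations: ignores overlapping alternation like (a|aa)+, treats every
// {n,m} as a repeat, and does not handle every corner of regex syntax.
function hasNestedQuantifier(source) {
  const REPEAT = new Set(["*", "+", "{"]);
  const stack = []; // per open group: does its body contain a repeat?
  let inClass = false;
  for (let i = 0; i < source.length; i++) {
    const c = source[i];
    if (c === "\\") { i++; continue; } // skip the escaped character
    if (inClass) { if (c === "]") inClass = false; continue; }
    if (c === "[") { inClass = true; continue; }
    if (c === "(") { stack.push(false); continue; }
    if (c === ")") {
      const bodyRepeats = stack.pop() ?? false;
      const groupRepeated = REPEAT.has(source[i + 1]);
      if (bodyRepeats && groupRepeated) return true;
      // A repeat inside this group is also a repeat inside the enclosing group.
      if (bodyRepeats && stack.length > 0) stack[stack.length - 1] = true;
      continue;
    }
    // A quantifier (including one right after ")") marks the enclosing group.
    if (REPEAT.has(c) && stack.length > 0) stack[stack.length - 1] = true;
  }
  return false;
}

for (const re of [/(a+)+$/, /(a|aa)+$/, /(a+b*)+$/, /^[a-z]+@[a-z]+\.com$/]) {
  console.log(re.source, hasNestedQuantifier(re.source));
}
// (a+)+$ true, (a|aa)+$ false, (a+b*)+$ true, ^[a-z]+@[a-z]+\.com$ false
```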