How to Review AI-Generated Code for Security Issues

The Review Mindset Shift

Security Code Review: A systematic examination of source code to identify security vulnerabilities, with focus on authentication, authorization, input validation, and data handling.

When reviewing human code, I look for mistakes. When reviewing AI code, I look for omissions.

AI doesn’t forget to close a database connection by accident. It systematically omits entire security concerns because they weren’t in the prompt. The auth system works, but there’s no rate limiting. The API returns data, but there’s no authorization check.

The Review Hierarchy

Review in this order. Each level assumes the previous one is solid.

Level 1: Authentication

Start here. If auth is broken, everything else is irrelevant.

Questions:

  • How are sessions/tokens created?
  • How are they validated on each request?
  • Where is the validation middleware?
  • What happens when validation fails?

Red flags:

// jwt.decode without verify
const user = jwt.decode(token);

// No expiration
const token = jwt.sign({ userId }, SECRET);

// Weak secret
const SECRET = 'secret';

// Token stored in localStorage
localStorage.setItem('token', response.token);

What good looks like:

// Strong verification
const user = jwt.verify(token, process.env.JWT_SECRET);

// Expiration set
const token = jwt.sign({ userId }, SECRET, { expiresIn: '1h' });

// httpOnly cookie storage
res.cookie('token', token, {
  httpOnly: true,
  secure: true,
  sameSite: 'strict'
});

Level 2: Authorization

Auth tells us WHO. Authz tells us WHAT they can do.

Questions:

  • Can user A access user B’s resources?
  • Are admin endpoints protected?
  • Is authorization checked server-side?
  • What happens when authorization fails?

Red flags:

// Trusting client-provided userId
const documents = await db.documents.findMany({
  where: { userId: req.body.userId }  // Not req.user.id!
});

// Role check only on frontend
{user.isAdmin && <AdminPanel />}  // No backend check

// Missing ownership verification
app.delete('/api/posts/:id', async (req, res) => {
  await db.posts.delete({ where: { id: req.params.id } });
});

What good looks like:

app.delete('/api/posts/:id', async (req, res) => {
  const post = await db.posts.findUnique({
    where: { id: req.params.id }
  });

  if (!post) {
    return res.status(404).json({ error: 'Not found' });
  }

  if (post.authorId !== req.user.id && !req.user.isAdmin) {
    return res.status(403).json({ error: 'Forbidden' });
  }

  await db.posts.delete({ where: { id: req.params.id } });
  res.status(204).send();
});

Level 3: Input Validation

Every input path is an attack vector.

Questions:

  • Is there server-side validation?
  • Are validation errors informative without being exploitable?
  • Is validation consistent across endpoints?
  • What types are actually enforced?

Review checklist:

  • API request bodies validated
  • Query parameters validated
  • URL parameters validated
  • File uploads validated (type, size, content)
  • Headers validated where used
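A schema library like zod or Joi usually does this work; the hand-rolled validator below is an illustrative sketch of what the checklist means in practice, including the string-coercion item from the red flags:

```javascript
// Illustrative server-side validator: returns { ok, errors } or { ok, value }
function validateSignup(body) {
  const errors = [];
  const email = typeof body.email === 'string' ? body.email.trim() : '';
  const password = typeof body.password === 'string' ? body.password : '';

  if (!/^[^\s@]+@[^\s@]+\.[^\s@]+$/.test(email)) {
    errors.push('email must be a valid address');
  }
  if (password.length < 12) {
    errors.push('password must be at least 12 characters');
  }

  // Query/URL parameters arrive as strings: coerce and bound-check explicitly
  let age;
  if (body.age !== undefined) {
    age = Number.parseInt(body.age, 10);
    if (!Number.isInteger(age) || age < 0 || age > 150) {
      errors.push('age must be an integer between 0 and 150');
    }
  }

  return errors.length
    ? { ok: false, errors }
    : { ok: true, value: { email, password, age } };
}
```

Whatever tool you use, the invariants are the same: every field is type-checked, strings are bounded, numbers are coerced deliberately, and the handler only ever sees the validated value.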

Red flags:

// No validation
const { email, password } = req.body;
await createUser(email, password);

// Client-only validation
<input type="email" required />  // Server doesn't validate

// Type coercion issues
const id = req.params.id;  // String, not number
await db.users.findUnique({ where: { id } });  // May work, may not

Level 4: Data Handling

How sensitive data flows through the system.

Questions:

  • Are passwords hashed before storage?
  • Is sensitive data logged?
  • What’s returned in API responses?
  • How are files stored?

Red flags:

// Password in logs
console.log('Login attempt:', { email, password });

// Full user object returned
res.json({ user });  // Includes password hash?

// Sensitive data in error messages
catch (error) {
  res.status(500).json({ error: error.message });  // May leak internals
}

Level 5: Third-Party Integration

AI loves adding libraries. Each one is attack surface.

Questions:

  • Are API keys in environment variables?
  • Are webhooks verified?
  • Is data validated before passing to external services?
  • What happens when services fail?

Red flags:

// Hardcoded key
const stripe = new Stripe('sk_live_xxx');

// Unverified webhook
app.post('/webhook', (req, res) => {
  processEvent(req.body);  // Anyone can send fake events
});

The Review Process

Run Automated Scans First

Let tools catch the obvious stuff:

gitleaks detect --source .
semgrep --config auto .
npm audit

This surfaces hardcoded secrets, known vulnerable patterns, and dependency issues.

Trace Auth Flow

Start from login, follow the token through to a protected endpoint. Document every step. Verify each step actually checks what it claims to check.

Test Authorization Manually

Create two user accounts. Try to access user 1’s data as user 2. Test every endpoint that handles user-specific data. Document which ones fail open.
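The two-account check can be scripted. A minimal harness sketch that exercises a handler as two different users; the handler and its signature mirror the earlier delete-post example and are illustrative, not a testing framework:

```javascript
// Call a handler as a given user and report whether access succeeded
function canAccess(handler, resource, user) {
  let status = 200;
  const req = { params: { id: resource.id }, user };
  const res = {
    status(code) { status = code; return this; },
    json() { return this; },
    send() { return this; },
  };
  handler(req, res, resource);
  return status < 400;
}

// Handler under test: delete with a proper ownership check
function deletePost(req, res, post) {
  if (post.authorId !== req.user.id && !req.user.isAdmin) {
    return res.status(403).json({ error: 'Forbidden' });
  }
  res.status(204).send();
}
```

Run the same probe against every endpoint that handles user-specific data: owner should pass, the other account should get 403, and an admin should pass. Any endpoint where the second account succeeds has failed open.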

Review Input Boundaries

Find every place user input enters the system:

  • API endpoints
  • Form handlers
  • File uploads
  • URL parameters
  • Headers

Verify each has server-side validation.

Check Error Handling

Make things fail intentionally:

  • Invalid inputs
  • Missing resources
  • Service timeouts

Verify error responses don’t leak information.

Document and Report

For each finding:

  • Location (file, line)
  • Severity (Critical/High/Medium/Low)
  • Impact (what an attacker could do)
  • Remediation (how to fix)
  • Verification (how to confirm the fix)

Time Boxing

You can’t review everything. Prioritize:

| Time Available | Focus |
| --- | --- |
| 30 minutes | Auth + authz for main flows |
| 2 hours | All auth paths + input validation |
| Half day | Full review including dependencies |
| Full day | Full review + manual testing |

If you only have 30 minutes, spend it on authentication and authorization. Those are the vulnerabilities that lead to data breaches.

Review Notes Template

## Security Review: [Project Name]
Date: [Date]
Reviewer: [Name]
Commit: [Hash]

### Summary
[High-level findings]

### Critical Issues
1. [Issue]: [Location]
   - Impact: [What attacker can do]
   - Fix: [How to fix]

### High Issues
[...]

### Recommendations
- [Systemic improvements]

### Out of Scope
[What wasn't reviewed]

FAQ

How long should an AI code review take?

Depends on scope. Auth-focused review: 1-2 hours. Full security review: 4-8 hours. Plan for 2x the time of a human code review—AI generates more code and has more systematic issues.

Should I review all generated code?

No. Focus on code that handles: authentication, authorization, payments, sensitive data, file operations. Skip pure UI components unless they handle user input.

Can AI tools help with security review?

Somewhat. Use AI to explain unfamiliar code patterns. Don’t use AI to determine if something is secure—it has blind spots for the same patterns it creates.

How do I verify fixes without creating new vulnerabilities?

Test the specific vulnerability manually before and after the fix. Run automated scans after fixes to catch regressions. Have a second reviewer check significant security changes.

Conclusion

Key Takeaways

  • AI code reviews focus on omissions, not mistakes
  • Review in order: auth, authz, input validation, data handling, integrations
  • Run automated scans first to catch obvious issues
  • Manually test authorization with multiple user accounts
  • Time-box reviews and prioritize auth flows
  • Document findings with severity, impact, and remediation
  • Verify fixes don’t introduce new issues
