Building an Extensible Security Scanner with 200+ Checks (Without Collapsing Under Its Own Weight)

The Scale Problem Every Security Scanner Hits

You start with five checks: XSS, SQL injection, HTTPS, maybe CORS. Clean code, easy to maintain. Then you add CSRF. Then clickjacking. Then CSP validation. By check 30, your codebase is a mess of if-statements and copy-pasted browser instances.

Security Scanner Architecture: A structured system design that separates orchestration (managing browser lifecycle), discovery (finding and registering checks), and execution (running individual security tests) to enable scaling from dozens to hundreds of checks without code duplication or tight coupling.

I hit this wall building vibe_eval. We needed comprehensive coverage: security, performance, accessibility, privacy, SEO, PWA compliance. That’s not 10 checks—it’s 200+. The naive approach would have been thousands of lines of duplicated code and a maintenance nightmare.

Here’s the architecture that actually scales.

The Three-Layer Model That Works

Most security scanners are flat: one big script that instantiates checks and runs them. This breaks down around 20 checks because there’s no separation of concerns.

The three-layer model separates responsibilities:

┌─────────────────────────────────────┐
│  Scanner Orchestrator (Operator)   │  ← Single browser, coordinates execution
├─────────────────────────────────────┤
│  Check Registry (Metaclass)        │  ← Auto-discovery, state management
├─────────────────────────────────────┤
│  Individual Checks (VibeCheck)     │  ← 200+ isolated implementations
└─────────────────────────────────────┘

Each layer has exactly one job. The Operator manages browser lifecycle. The Registry discovers and organizes checks. Individual checks implement specific tests. Zero overlap.

Metaclass-Based Registration: A Python pattern that uses metaclass hooks to automatically register subclasses at import time, eliminating manual registration boilerplate and enabling zero-configuration plugin architectures.
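The article never shows VibeCheckMeta itself, so here is a minimal sketch of how such a registry could work. CheckMeta, BaseCheck, HeaderCheck and CookieCheck are hypothetical stand-ins, and this registry_init only mirrors the name used in the article, not its real implementation.

```python
class CheckMeta(type):
    """Metaclass that records every concrete subclass as it is defined."""

    registry: list[type] = []

    def __init__(cls, name, bases, namespace):
        super().__init__(name, bases, namespace)
        # Skip the base class itself: it has no parent created by this metaclass.
        if any(isinstance(base, CheckMeta) for base in bases):
            CheckMeta.registry.append(cls)


class BaseCheck(metaclass=CheckMeta):
    """Hypothetical stand-in for VibeCheck."""

    def __init__(self, url: str):
        self.url = url


class HeaderCheck(BaseCheck):
    pass


class CookieCheck(BaseCheck):
    pass


def registry_init(url: str) -> list[BaseCheck]:
    """Instantiate one object per registered check class."""
    return [check_cls(url) for check_cls in CheckMeta.registry]


checks = registry_init("https://example.com")
print([type(c).__name__ for c in checks])
```

Registration happens when the class body executes, so a check is only registered once its module has been imported. A real registry would presumably walk the checks package and import each module before instantiating anything.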

Layer 1: The Operator (Browser Orchestration)

The Operator manages what’s expensive: the browser instance.

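import logging
from typing import Any

from playwright.sync_api import sync_playwright

logger = logging.getLogger(__name__)


class Operator:
    def __init__(self, url: str):
        self.url = self._normalize_url(url)
        # vibe_check is the project's registry module (import not shown)
        self.checks = vibe_check.registry_init(url=self.url)  # 200+ checks

    def run_checks(self) -> dict[str, Any]:
        results: dict[str, Any] = {
            "url": self.url,
            "checks": {},
            "screenshot_base64": "",
        }

        with sync_playwright() as p:
            browser = p.chromium.launch(headless=True)
            try:
                page = browser.new_page()

                # Capture network traffic once; every check reads from this list
                collected_requests: list[Any] = []
                page.on("requestfinished", lambda r: collected_requests.append(r))

                page.goto(self.url, timeout=30000)
                page.wait_for_timeout(2000)  # Let late requests settle

                for check in self.checks:
                    name = check.__class__.__name__
                    try:
                        result = check.run(page, collected_requests)
                        results["checks"][name] = result._asdict()
                    except Exception as e:
                        # One crashing check must not abort the scan
                        logger.error("Check %s failed: %s", name, e)
                        results["checks"][name] = {
                            "info": f"Check failed with error: {e}",
                            "critical": False,
                            "error": True,
                        }
            finally:
                # Always release the browser, even if the page load fails
                browser.close()

        return results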

Why this matters:

Single browser instance: Starting a browser costs 2-3 seconds. Launching a separate browser for each of 200 checks would waste 400+ seconds (200 × 2-3s).
Request collection: Network traffic captured once, shared across all checks. No duplicate monitoring.
Unified error handling: If one check crashes, the scan continues.

The Operator doesn’t know what checks do. It just feeds them a page and collects results.

Layer 2: Check Categories (Organization at Scale)

At 200+ checks, filesystem organization becomes critical:

backend/labyrinth/checks/
├── security/              # 80+ checks
│   ├── csp_check.py
│   ├── api_key_exposure_check.py
│   └── hallucinated_dependencies_check.py
├── performance/           # 40+ checks
│   ├── brotli_compression_check.py
│   ├── cdn_check.py
│   └── js_minification_check.py
├── accessibility/         # 30+ checks
│   ├── aria_check.py
│   ├── mobile_friendly_check.py
│   └── color_contrast_check.py
├── privacy_compliance/    # 25+ checks
│   ├── cookie_consent_check.py
│   ├── data_anonymization_check.py
│   └── gdpr_compliance_check.py
├── seo/                   # 15+ checks
├── pwa/                   # 10+ checks
├── client_apis/           # Browser API checks
├── operations/            # DevOps checks
└── integrations/          # Third-party service checks

Why categories matter:

Discoverability: Developers know exactly where to add new checks
Filtering: Run only security checks, or only performance checks
Reporting: Group results by category in the UI
Team ownership: Different teams own different categories
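One way to get filtering and grouping without extra metadata is to derive the category from each check's module path. The sketch below assumes the directory layout shown above; category_of and group_by_category are hypothetical helpers, and whether vibe_eval derives categories this way is an assumption.

```python
from collections import defaultdict


def category_of(check_cls: type) -> str:
    """Return the directory directly under 'checks' in the class's module path."""
    parts = check_cls.__module__.split(".")
    if "checks" in parts:
        idx = parts.index("checks")
        # Need both a category package and a module name after 'checks'
        if idx + 2 < len(parts):
            return parts[idx + 1]
    return "uncategorized"


def group_by_category(check_classes) -> dict[str, list[str]]:
    """Map each category to the names of the check classes it contains."""
    grouped: dict[str, list[str]] = defaultdict(list)
    for check_cls in check_classes:
        grouped[category_of(check_cls)].append(check_cls.__name__)
    return dict(grouped)


# Stand-in classes whose __module__ mimics the directory layout above
CSPCheck = type("CSPCheck", (), {"__module__": "backend.labyrinth.checks.security.csp_check"})
CDNCheck = type("CDNCheck", (), {"__module__": "backend.labyrinth.checks.performance.cdn_check"})

# Filtering: keep only the security checks
security_only = [c for c in (CSPCheck, CDNCheck) if category_of(c) == "security"]
print(group_by_category([CSPCheck, CDNCheck]))
```

Tying the category to the directory means moving a file is the only step needed to recategorize a check.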

Layer 3: The Base Check Interface

Every check inherits from VibeCheck, which provides the contract:

from playwright.sync_api import Page


class VibeCheck(_StateMixin, metaclass=VibeCheckMeta):
    SEVERITY = 8  # Default severity (1-10 scale)

    def __init__(self, url: str):
        self.url = url

    def check_page(self, page: Page) -> Discoveries:
        """Implement this for checks that only need page content"""
        pass

    def check_page_and_ctx(self, page: Page, collected_requests) -> Discoveries:
        """Implement this for checks that need network requests"""
        pass

    def run(self, page: Page, collected_requests=None) -> Discoveries:
        """Unified entry point: call whichever hook the subclass overrides"""
        # The Operator always passes collected_requests, so dispatch on the
        # override rather than on whether the argument is None.
        wants_ctx = type(self).check_page_and_ctx is not VibeCheck.check_page_and_ctx
        if wants_ctx and collected_requests is not None:
            result = self.check_page_and_ctx(page, collected_requests)
        else:
            result = self.check_page(page)
        return self._ensure_discoveries(result)

The dual interface pattern:

Some checks only need page HTML:

class ARIACheck(VibeCheck):
    def check_page(self, page: Page) -> Discoveries:
        # Just analyze the DOM
        return Discoveries.ok("ARIA labels present")

Others need network traffic:

class BrotliCompressionCheck(VibeCheck):
    def check_page_and_ctx(self, page: Page, collected_requests) -> Discoveries:
        for req in collected_requests:
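            # Content-Encoding is a response header, so read it from the response
            response = req.response()
            if response and "br" in response.headers.get("content-encoding", ""):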
                return Discoveries.ok("Brotli compression enabled")
        return Discoveries.fail("No Brotli compression detected")

This pattern eliminates boilerplate. Checks declare what they need, and the framework provides it.

State Management: The Three Flags

Every check has three state flags:

class _StateMixin:
    is_active = True         # Should this check run?
    is_premium = False       # Premium feature?
    is_on_maintenance = False  # Temporarily disabled?

Real-world usage:

# In the scanner
checks = [c for c in checks if c.is_active and not c.is_on_maintenance]

# For free tier users
checks = [c for c in checks if not c.is_premium]

This lets you:

A/B test new checks by setting is_premium = True
Disable flaky checks without deleting code
Put checks with external dependencies into maintenance mode

At 200+ checks, you will have flaky checks. This system lets you disable them without blocking releases.
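The two list comprehensions above can be folded into one selection step. This is a minimal sketch: select_checks and the three example classes are hypothetical, and the mixin is restated only so the block runs on its own.

```python
class _StateMixin:
    is_active = True
    is_premium = False
    is_on_maintenance = False


class HTTPSCheck(_StateMixin):
    pass


class CSPCheck(_StateMixin):
    is_premium = True


class FlakyCheck(_StateMixin):
    is_on_maintenance = True


def select_checks(checks, premium_user: bool):
    """Keep runnable checks, hiding premium ones from free-tier users."""
    return [
        c for c in checks
        if c.is_active
        and not c.is_on_maintenance
        and (premium_user or not c.is_premium)
    ]


all_checks = [HTTPSCheck(), CSPCheck(), FlakyCheck()]
free_tier = [type(c).__name__ for c in select_checks(all_checks, premium_user=False)]
premium_tier = [type(c).__name__ for c in select_checks(all_checks, premium_user=True)]
print(free_tier, premium_tier)
```

Because the flags are class attributes, disabling a flaky check is a one-line change in its own file.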

Example: A Simple Security Check

Here’s a complete check that detects missing HTTPS:

from playwright.sync_api import Page
from backend.labyrinth.shape import VibeCheck, Discoveries

class HTTPSCheck(VibeCheck):
    """Verifies the site uses HTTPS"""

    SEVERITY = 9  # High severity - security critical
    is_premium = False  # Available to all users

    def check_page(self, page: Page) -> Discoveries:
        if page.url.startswith("https://"):
            return Discoveries.ok(
                "Site uses HTTPS",
                recommendations=[]
            )

        return Discoveries.fail(
            "Site does not use HTTPS - traffic is unencrypted",
            recommendations=[
                "Obtain an SSL/TLS certificate (free via Let's Encrypt)",
                "Configure your web server to use HTTPS",
                "Redirect all HTTP traffic to HTTPS",
                "Enable HSTS headers",
            ]
        )

That’s it. Drop this file in checks/security/ and it’s automatically:

Registered via metaclass
Available in registry_init()
Categorized as “Security”
Ready to run

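No imports in __init__.py. No manual registration. No configuration files. The metaclass registers the class as soon as its module is imported, which assumes that registry_init() (or a similar discovery step) imports every module under checks/.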

Example: A Complex Network-Aware Check

For checks that need network data:

class ContentSecurityPolicyCheck(VibeCheck):
    """Checks for proper CSP headers"""

    SEVERITY = 8
    is_premium = True

    def check_page_and_ctx(self, page: Page, collected_requests) -> Discoveries:
        # Find the main document request
        for req in collected_requests:
            if req.url == page.url:
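                # CSP is a response header, so read it from the response
                response = req.response()
                csp = response.headers.get("content-security-policy", "") if response else ""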

                if not csp:
                    return Discoveries.fail(
                        "No Content-Security-Policy header found",
                        recommendations=[
                            "Add CSP header to prevent XSS attacks",
                            "Start with a restrictive policy and relax as needed",
                            "Use CSP reporting to monitor violations",
                        ]
                    )

                # Check for unsafe directives
                unsafe_patterns = ["'unsafe-eval'", "'unsafe-inline'"]
                found_unsafe = [p for p in unsafe_patterns if p in csp]

                if found_unsafe:
                    return Discoveries.fail(
                        f"CSP contains unsafe directives: {', '.join(found_unsafe)}",
                        recommendations=[
                            "Remove 'unsafe-inline' and 'unsafe-eval'",
                            "Use nonces or hashes for inline scripts",
                            "Refactor to avoid eval() usage",
                        ]
                    )

                return Discoveries.ok("CSP header present and secure")

        return Discoveries.ok("No CSP check needed")

The check doesn’t manage browser lifecycle or request collection. It just implements the logic.

Shared Utilities in the Base Class

Common patterns get extracted into base class helpers.

Path Probing

Many checks need to probe for exposed files:

class AdminExposureCheck(VibeCheck):
    def check_page(self, page: Page) -> Discoveries:
        admin_paths = ["/admin", "/administrator", "/wp-admin"]

        return self._check_paths_for_presence(
            page,
            admin_paths,
            success_info="No admin panels exposed",
            failure_template="Admin panel found at {path}",
            recommendations=[
                "Restrict admin access by IP",
                "Use non-standard admin URLs",
                "Implement rate limiting",
            ]
        )

The _check_paths_for_presence() helper handles:

Multiple path probing
Timeout management
Heuristic 404 detection
Result formatting

Heuristic 404 Detection

SPAs often return 200 for everything:

def heuristic_404(self, response) -> bool:
    """Detect 404s even when status code is 200"""
    if response.status == 404:
        return True

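    # Fall back to the body text. These markers are loose: a real page that
    # merely mentions "404" will also match.
    body = response.text().lower()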
    markers = ["404", "not found", "page not found"]
    return any(marker in body for marker in markers)

This prevents false positives on SPAs that render 404 pages with HTTP 200.
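Keyword matching can misfire on pages that legitimately mention "404". A different technique, not the one shown above, is to fetch a path that cannot exist and treat any response that looks nearly identical to it as a soft 404. The sketch below assumes the response bodies are already available as strings; random_missing_path, looks_like_soft_404 and the 0.9 threshold are illustrative choices.

```python
import uuid
from difflib import SequenceMatcher


def random_missing_path() -> str:
    """Build a path that almost certainly does not exist on the target site."""
    return f"/{uuid.uuid4().hex}"


def looks_like_soft_404(body: str, baseline_body: str, threshold: float = 0.9) -> bool:
    """True when body is nearly identical to the known-missing baseline page."""
    similarity = SequenceMatcher(None, body, baseline_body).ratio()
    return similarity >= threshold


# Stand-in response bodies; in a real scan these would come from the site
spa_shell = "<html><body><div id='root'></div><script src='/app.js'></script></body></html>"
admin_page = (
    "<html><body><h1>Admin login</h1>"
    + "<p>Sign in to manage the site.</p>" * 10
    + "</body></html>"
)

# The SPA returns the same shell for /admin as for a random missing path
print(looks_like_soft_404(spa_shell, baseline_body=spa_shell))
# A real admin page differs substantially from the baseline
print(looks_like_soft_404(admin_page, baseline_body=spa_shell))
```

The baseline costs one extra request per scan, and it can be fetched once and reused by every path-probing check.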

Scaling Performance: From Sequential to Parallel

Current (simple): Sequential execution

for check in self.checks:
    result = check.run(page, collected_requests)

Future (parallel): Dependency graph execution

import asyncio

# Checks declare dependencies
class CSPCheck(VibeCheck):
    depends_on = []  # No dependencies

class JSMinificationCheck(VibeCheck):
    depends_on = [ScriptLoadingCheck]  # Needs scripts first

# Execute in parallel where possible (run_async is part of the planned design)
async def run_parallel(checks_batch, page, collected_requests):
    tasks = [check.run_async(page, collected_requests) for check in checks_batch]
    return await asyncio.gather(*tasks)

With parallelization, the projected time for 200 checks drops from about 5 minutes to about 40 seconds.
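To turn declared dependencies into batches, one option is the standard library's graphlib.TopologicalSorter (Python 3.9+), which yields every node whose dependencies are already done. The sketch below is an assumption about how the future design could be wired: run_in_batches and fake_check are hypothetical, and dependencies are expressed as a name-to-names mapping rather than the depends_on class attribute.

```python
import asyncio
from graphlib import TopologicalSorter


async def run_in_batches(dependencies, run_check):
    """Run checks batch by batch; each batch only contains checks whose
    dependencies have already finished.

    dependencies maps a check name to the set of names it depends on.
    """
    sorter = TopologicalSorter(dependencies)
    sorter.prepare()  # Raises graphlib.CycleError on circular dependencies
    results = {}
    while sorter.is_active():
        batch = list(sorter.get_ready())
        outcomes = await asyncio.gather(*(run_check(name) for name in batch))
        for name, outcome in zip(batch, outcomes):
            results[name] = outcome
            sorter.done(name)
    return results


execution_order = []


async def fake_check(name: str) -> str:
    """Stand-in for a real asynchronous check."""
    await asyncio.sleep(0)
    execution_order.append(name)
    return f"{name}: ok"


deps = {
    "CSPCheck": set(),
    "ScriptLoadingCheck": set(),
    "JSMinificationCheck": {"ScriptLoadingCheck"},
}
results = asyncio.run(run_in_batches(deps, fake_check))
print(execution_order)
```

Checks that drive the same Playwright page concurrently would still need care, since a page shared between tasks can be navigated or mutated underneath a running check. Checks that only read the collected requests are the easiest candidates for batching.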

Check Severity and Scoring

Each check has a SEVERITY (1-10):

SEVERITY = 10  # Critical (SQL injection, RCE)
SEVERITY = 9   # High (XSS, auth bypass)
SEVERITY = 8   # Medium-High (CSP missing)
SEVERITY = 5   # Medium (performance issues)
SEVERITY = 3   # Low (SEO recommendations)
SEVERITY = 1   # Info (best practices)

The scanner aggregates these into an overall security score:

def calculate_security_score(results):
    total_severity = sum(
        check_result['severity']
        for check_result in results['checks'].values()
        if check_result['critical']
    )

    max_possible = len(results['checks']) * 10
    if max_possible == 0:
        return 100.0  # No checks ran, so there is no measured risk
    risk_score = (total_severity / max_possible) * 100

    # Invert for "security score" (higher is better)
    return 100 - risk_score
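A worked example makes the arithmetic concrete. The sketch below restates the same formula as security_score so the block runs on its own; the four results are made-up sample data, and returning 100 for an empty scan is an assumption.

```python
def security_score(results: dict) -> float:
    """100 minus the share of the maximum possible severity that failed."""
    checks = results["checks"]
    if not checks:
        return 100.0  # Assumption: an empty scan reports no risk
    total_severity = sum(r["severity"] for r in checks.values() if r["critical"])
    max_possible = len(checks) * 10
    return 100 - (total_severity / max_possible) * 100


results = {
    "checks": {
        "HTTPSCheck": {"severity": 9, "critical": True},
        "ContentSecurityPolicyCheck": {"severity": 8, "critical": True},
        "ARIACheck": {"severity": 3, "critical": False},
        "CDNCheck": {"severity": 5, "critical": False},
    }
}

# total_severity = 9 + 8 = 17, max_possible = 4 * 10 = 40
# risk = 17 / 40 * 100 = 42.5, so the score is 100 - 42.5 = 57.5
print(f"{security_score(results):.1f}")
```

Every check adds 10 to the denominator regardless of its own severity, so a large number of passing low-severity checks dilutes the impact of a few critical failures. That may or may not be the behavior you want in a dashboard.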

Production Learnings

1. Check Isolation is Critical

Bad check:

class BadCheck(VibeCheck):
    shared_state = []  # WRONG: Shared across instances!

    def check_page(self, page):
        self.shared_state.append(page.url)  # Data leak!

Good check:

class GoodCheck(VibeCheck):
    def __init__(self, url):
        super().__init__(url)
        self.instance_state = []  # Isolated per instance

    def check_page(self, page):
        self.instance_state.append(page.url)

2. Timeouts Everywhere

Every network operation needs a timeout:

response = self._safe_request(page, "/admin", timeout_ms=5000)

Without this, a slow server can hang the entire scan.
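The _safe_request helper is not shown in the article. A minimal sketch of what such a helper might look like follows, written as a free function and assuming it goes through Playwright's page.request.get, which accepts a timeout in milliseconds. The name safe_request and its signature are hypothetical.

```python
from urllib.parse import urljoin


def safe_request(page, base_url: str, path: str, timeout_ms: int = 5000):
    """Fetch base_url + path, returning None instead of raising on failure."""
    try:
        # page.request is Playwright's APIRequestContext; timeout is in milliseconds
        return page.request.get(urljoin(base_url, path), timeout=timeout_ms)
    except Exception:
        # Treat any failure (timeout, DNS error, connection reset) as "no response"
        return None
```

Returning None keeps the calling check simple: it can treat a missing response as "path not exposed" and move on to the next path.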

3. Graceful Degradation

If one check crashes, the scan continues:

for check in self.checks:
    name = check.__class__.__name__
    try:
        result = check.run(page, collected_requests)
        results["checks"][name] = result._asdict()
    except Exception as e:
        logger.error(f"Check {name} failed: {e}")
        results["checks"][name] = {
            "info": f"Check failed with error: {str(e)}",
            "critical": False,
            "error": True,
        }

At scale, some checks will fail. The architecture must assume this and continue.

Add a New Security Check

Step-by-step guide to adding a new check to the scanner

Create the Check File

Create a new Python file in the appropriate category directory (e.g., backend/labyrinth/checks/security/new_check.py).

Choose the category based on what the check validates: security, performance, accessibility, privacy_compliance, seo, pwa, client_apis, operations, or integrations.

Inherit from VibeCheck

Define your check class inheriting from VibeCheck and set the SEVERITY level (1-10):

from playwright.sync_api import Page
from backend.labyrinth.shape import VibeCheck, Discoveries

class YourCheck(VibeCheck):
    SEVERITY = 8
    is_premium = False

Higher severity (8-10) for security-critical issues, lower (1-5) for best practices and optimizations.

Implement the Check Logic

Implement either check_page() if you only need DOM access, or check_page_and_ctx() if you need network requests:

def check_page(self, page: Page) -> Discoveries:
    # Your check logic here
    if condition_passes:
        return Discoveries.ok("Check passed")
    return Discoveries.fail(
        "Check failed: reason",
        recommendations=["Fix 1", "Fix 2"]
    )

The framework automatically registers and executes your check.

Test the Check

Write unit tests in tests/unit/test_your_check.py:

def test_your_check_passes(mock_page):
    check = YourCheck("https://example.com")
    result = check.check_page(mock_page)
    assert not result.critical

Test both passing and failing scenarios to ensure the check behaves correctly.

Run the Scanner

Run the full scanner to verify your check integrates correctly:

operator = Operator("https://test-site.com")
results = operator.run_checks()
assert "YourCheck" in results["checks"]

The check should appear in results automatically without any manual registration.

Scaling: Performance Evolution

Checks    Sequential Time    Parallel Time (Future)
10        15s                5s
50        75s                15s
100       150s               25s
200+      300s (5 min)       40s

Memory footprint stays roughly constant as checks are added because the browser dominates:

Each check instance: ~1KB (mostly references)
200 checks: ~200KB
Single browser: ~50MB (Playwright)
Total: ~50MB (browser dominates)

When Architecture Matters vs When It Doesn’t

You need this architecture if:

You’re building 20+ checks
Multiple developers add checks concurrently
You need check filtering (premium, maintenance, categories)
Scan performance matters (shared browser instance)

You don’t need this architecture if:

You have < 10 checks
Single developer owns all checks
No filtering or state management needed
One-off security audit tool

The three-layer model has overhead. Don’t pay for it unless you’re scaling.

FAQ

Why use metaclasses instead of manual registration?

Metaclasses eliminate boilerplate and make adding checks frictionless. Without metaclasses, every new check requires updating a central registry file, which creates merge conflicts and slows development. With metaclass auto-registration, dropping a file in the checks directory is enough—zero configuration, zero imports.

How do you handle flaky checks at scale?

Use the is_on_maintenance flag to disable problematic checks without deleting code. Add retry logic with exponential backoff for network-dependent checks. Implement check timeouts to prevent hangs. Most importantly, ensure graceful degradation—one failing check should never crash the entire scan.
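The retry advice can be sketched as a small wrapper. retry_with_backoff is a hypothetical helper, not part of the scanner described here, and the sleep function is injectable so the delays can be observed without waiting.

```python
import time


def retry_with_backoff(operation, attempts: int = 3, base_delay: float = 0.5, sleep=time.sleep):
    """Call operation(), retrying on any exception with doubling delays.

    Delays are base_delay, 2 * base_delay, 4 * base_delay, and so on.
    The last failure is re-raised.
    """
    for attempt in range(attempts):
        try:
            return operation()
        except Exception:
            if attempt == attempts - 1:
                raise
            sleep(base_delay * 2 ** attempt)


# Stand-in operation that fails twice, then succeeds
calls = {"count": 0}


def flaky_fetch() -> str:
    calls["count"] += 1
    if calls["count"] < 3:
        raise ConnectionError("temporary failure")
    return "ok"


delays = []
outcome = retry_with_backoff(flaky_fetch, sleep=delays.append)
print(outcome, delays)
```

Retries multiply the worst-case duration of a check, so they are best paired with the per-request timeouts described earlier.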

Why not use existing security scanners like OWASP ZAP?

Existing scanners are great for general-purpose scanning but difficult to extend with custom checks specific to your stack. Building your own scanner gives you complete control over check logic, filtering, and integration with your deployment pipeline. Use existing scanners for broad coverage, build custom scanners for stack-specific validation.

How do you prevent duplicate work across checks?

Collect network requests once and share them across all checks. Use the dual interface pattern (check_page vs check_page_and_ctx) so checks declare what they need. Cache expensive computations in the Operator layer. Future versions will implement check dependencies to eliminate redundant work.

What's the actual performance impact of running 200+ checks?

Sequential execution takes about 5 minutes for 200 checks. Browser startup is 2-3s, page load is 1-3s, and each check adds 1-2s. Parallelizing check execution (coming in future versions) is projected to drop this to about 40 seconds. The single browser instance saves 400+ seconds compared to per-check browser instantiation.

Conclusion

Key Takeaways

The three-layer model (Operator, Registry, Checks) separates orchestration, discovery, and execution for zero-duplication scaling
Metaclass-based auto-registration eliminates manual boilerplate—dropping a file in checks/ is enough to add new functionality
Single browser instance reuse saves 400+ seconds for 200 checks compared to per-check browser instantiation
Dual interface pattern (check_page vs check_page_and_ctx) lets checks declare minimal dependencies without framework overhead
State flags (is_active, is_premium, is_on_maintenance) enable A/B testing, tiering, and graceful degradation at scale
Category-based filesystem organization makes navigation and team ownership clear at 200+ checks
Shared utilities (path probing, heuristic 404 detection) prevent code duplication across similar checks
Graceful degradation is critical—one failing check must not crash the entire scan
Check isolation prevents data leaks—no shared state between check instances
Severity scoring (1-10 scale) aggregates into overall security scores for dashboards and reporting

The architecture you choose at 5 checks determines whether you can reach 500 checks. Invest in the three-layer model early, and adding check #200 is as easy as adding check #1.