feat(engine,ui): unify detection in rules engine, add function rules & per-script matches; improve scripts table UX

Core changes
- Centralize detection in the Rules Engine; browser.py now focuses on fetch/extract/persist.
- Add class-based adapters:
  - FactAdapter: converts snippets → structured facts.
  - FunctionRuleAdapter: wraps dict-based rule functions for engine input (str or dict).
- Register function rules (code-based) alongside YAML rules:
  - form_action_missing
  - form_http_on_https_page
  - form_submits_to_different_host
  - script_src_uses_data_or_blob
  - script_src_has_dangerous_extension
  - script_third_party_host

Rules & YAML
- Expand/normalize YAML rules with severities + tags; tighten patterns.
- Add new regex rules: new_function_usage, unescape_usage, string_timer_usage, long_hex_constants.
- Move iframe rule to `text` category.
- Keep existing script/form/text rules; all compile under IGNORECASE.

Browser / analysis refactor
- browser.py:
  - Remove inline heuristics; rely on engine for PASS/FAIL, reason, severity, tags.
  - Build page-level overview (`rule_checks`) across categories.
  - Analyze forms: add `base_url` + `base_hostname` to snippet so function rules can evaluate; include per-form rule_checks.
  - Analyze scripts: **per-script evaluation**:
    - Inline -> run regex script rules on inline text.
    - External -> run function script rules with a facts dict (src/src_hostname/base_url/base_hostname).
    - Only include scripts that matched ≥1 rule; attach severity/tags to matches.
  - Persist single source of truth: `/data/<uuid>/results.json`.
  - Backward-compat: `fetch_page_artifacts(..., engine=...)` kwarg accepted/ignored.

UI/UX
- Suspicious Scripts table now shows only matched scripts.
- Add severity badges and tag chips; tooltips show rule description.
- Prevent table blowouts:
  - Fixed layout + ellipsis + wrapping helpers (`.scripts-table`, `.breakable`, `details pre.code`).
  - Shortened inline snippet preview (configurable).
- Minor template niceties (e.g., rel="noopener" on external links where applicable).

Config
- Add `ui.snippet_preview_len` to settings.yaml; default 160.
- Load into `app.config["SNIPPET_PREVIEW_LEN"]` and use in `analyze_scripts`.

Init / wiring
- Import and register function rules as `Rule(...)` objects (not dicts).
- Hook Rules Engine to Flask logger for verbose/diagnostic output.
- Log totals on startup; keep YAML path override via `SNEAKYSCOPE_RULES_FILE`.

Bug fixes
- Fix boot crash: pass `Rule` instances to `engine.add_rule()` instead of dicts.
- Fix “N/A” in scripts table by actually computing per-script matches.
- Ensure form rules fire by including `base_url`/`base_hostname` in form snippets.

Roadmap
- Update roadmap to reflect completed items:
  - “Show each check and whether it triggered (pass/fail list per rule)”
  - Severity levels + tags in Suspicious Scripts
  - Results.json as route source of truth
  - Scripts table UX (badges, tooltips, layout fix)
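
Illustration (not part of this commit): a minimal sketch of the "str or dict" adapter idea described under Core changes, assuming form snippets are `key=value` lines and that a rule function returns a `(matched, reason)` tuple. `parse_snippet`, `SketchFunctionRuleAdapter`, and `form_http_on_https_page_sketch` are hypothetical names; the real FactAdapter and FunctionRuleAdapter in `app/rules/function_rules.py` may differ.

```python
from typing import Any, Callable, Dict, Optional, Tuple, Union
from urllib.parse import urlparse

Facts = Dict[str, Any]
RuleFn = Callable[[Facts], Tuple[bool, Optional[str]]]


def parse_snippet(snippet: str) -> Facts:
    """Turn 'key=value' lines into a facts dict (assumed snippet format)."""
    facts: Facts = {}
    for line in snippet.splitlines():
        key, sep, value = line.partition("=")
        key = key.strip()
        # Skip lines without '=' and list entries such as "  - name=x type=y".
        if not sep or not key or key.startswith("-"):
            continue
        facts[key] = value.strip()
    return facts


class SketchFunctionRuleAdapter:
    """Hypothetical stand-in for FunctionRuleAdapter: accepts str or dict input."""

    def __init__(self, fn: RuleFn, category: str) -> None:
        self.fn = fn
        self.category = category

    def __call__(self, payload: Union[str, Facts]) -> Tuple[bool, Optional[str]]:
        # Dicts pass through (copied); strings are parsed into facts first.
        if isinstance(payload, dict):
            facts = dict(payload)
        else:
            facts = parse_snippet(str(payload))
        facts.setdefault("category", self.category)
        return self.fn(facts)


def form_http_on_https_page_sketch(facts: Facts) -> Tuple[bool, Optional[str]]:
    """Illustrative rule: flag an http:// form action on an https:// page."""
    action_scheme = urlparse(str(facts.get("action") or "")).scheme
    base_scheme = urlparse(str(facts.get("base_url") or "")).scheme
    if action_scheme == "http" and base_scheme == "https":
        return True, "Form submits via HTTP from HTTPS page"
    return False, None


rule = SketchFunctionRuleAdapter(form_http_on_https_page_sketch, category="form")
snippet = "base_url=https://example.com/login\naction=http://example.com/post\nmethod=post"
from_text = rule(snippet)
from_dict = rule({"base_url": "https://example.com/", "action": "/post"})
```

Here `from_text` comes from a serialized form snippet and `from_dict` from a ready-made facts dict; both go through the same rule function, which is the property the adapter is meant to provide.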
2025-08-20 21:33:30 -05:00
parent 70d29f9f95
commit 1eb2a52f17
14 changed files with 1108 additions and 423 deletions
--- a/app/__init__.py
+++ b/app/__init__.py
@@ -1,16 +1,3 @@
 """
 app/__init__.py
 Application factory and startup hooks for SneakyScope.
 Responsibilities:
 - Create the Flask app.
 - Load settings (YAML -> dataclasses) with safe defaults.
 - Initialize and load the Suspicious Rules Engine from YAML.
 - Register blueprints (routes).
 - Configure core paths (e.g., SANDBOX_STORAGE).
 """
 import os
 import logging
 from pathlib import Path
@@ -18,11 +5,24 @@ from flask import Flask
 # Local imports
 from .utils.settings import get_settings
-from .utils import io_helpers  # if you need logging/setup later
+from .utils.rules_engine import RuleEngine, load_rules_from_yaml, Rule
-from .utils import cache_db     # available for future injections
+
-from .utils.rules_engine import RuleEngine, load_rules_from_yaml  # rules engine
+# our code based rules
 from .rules.function_rules import (
    FactAdapter,
    FunctionRuleAdapter,
    script_src_uses_data_or_blob,
    script_src_has_dangerous_extension,
    script_third_party_host,
    form_submits_to_different_host,
    form_http_on_https_page,
    form_action_missing,
 )
 from . import routes  # blueprint
 # from .utils import io_helpers  # if need logging/setup later
 # from .utils import cache_db     # available for future injections
 def create_app() -> Flask:
    """
@@ -37,46 +37,110 @@ def create_app() -> Flask:
    # Load settings (safe fallback to defaults if file missing)
    settings = get_settings()
-    # Secret key loaded from env
+    # Secret key loaded from env (warn if missing)
    app.secret_key = os.getenv("SECRET_KEY")
    if not app.secret_key:
        app.logger.warning("[init] SECRET_KEY is not set; sessions may be insecure in production.")
    # Configure storage directory (bind-mount is still handled by sandbox.sh)
    sandbox_storage_default = Path("/data")
    app.config["SANDBOX_STORAGE"] = str(sandbox_storage_default)
-    # Initialize Suspicious Rules Engine at startup
+    # ---------------------------
-    # Determine rules file path relative to this package
+    # Suspicious Rules Engine
-    base_dir = Path(__file__).resolve().parent
+    # ---------------------------
    rules_path = base_dir / "config" / "suspicious_rules.yaml"
-    # Create an engine instance (even if file missing, we still want an engine)
+    # Determine rules file path relative to this package (allow env override)
-    engine = RuleEngine()
+    base_dir = Path(__file__).resolve().parent
    default_rules_path = base_dir / "config" / "suspicious_rules.yaml"
    rules_path_str = os.getenv("SNEAKYSCOPE_RULES_FILE", str(default_rules_path))
    rules_path = Path(rules_path_str)
    # Create engine bound to Flask logger so all verbose/debug goes to app.logger
    engine = RuleEngine(rules=[], logger=app.logger)
    # Try to load from YAML if present; log clearly if not
    if rules_path.exists():
        try:
-            loaded_rules = load_rules_from_yaml(rules_path)
+            loaded_rules = load_rules_from_yaml(rules_path, logger=app.logger)
-            # Add rules one-by-one (explicit)
+            # Add rules one-by-one (explicit, clearer logs if any rule fails to compile)
-            for rule in loaded_rules:
+            index = 0
-                engine.add_rule(rule)
+            total = len(loaded_rules)
-            app.logger.info(f"[+] Loaded {len(loaded_rules)} suspicious rules from {rules_path}")
+            while index < total:
                engine.add_rule(loaded_rules[index])
                index = index + 1
            app.logger.info(f"[init] Loaded {len(loaded_rules)} suspicious rules from {rules_path}")
        except Exception as e:
-            app.logger.warning(f"[!] Failed loading rules from {rules_path}: {e}")
+            app.logger.warning(f"[init] Failed loading rules from {rules_path}: {e}")
    else:
-        app.logger.warning(f"[!] Rules file not found at {rules_path}. Engine will start with zero rules.")
+        app.logger.warning(f"[init] Rules file not found at {rules_path}. Engine will start with zero rules.")
-    # Store engine on app config so it is accessible via current_app
+    # Built-in function-based rules
    adapter = FactAdapter(logger=app.logger)
    engine.add_rule(Rule(
        name="form_action_missing",
        description="Form has no action attribute",
        category="form",
        rule_type="function",
        function=FunctionRuleAdapter(form_action_missing, category="form", adapter=adapter),
    ))
    engine.add_rule(Rule(
        name="form_http_on_https_page",
        description="Form submits via HTTP from HTTPS page",
        category="form",
        rule_type="function",
        function=FunctionRuleAdapter(form_http_on_https_page, category="form", adapter=adapter),
    ))
    engine.add_rule(Rule(
        name="form_submits_to_different_host",
        description="Form submits to a different host",
        category="form",
        rule_type="function",
        function=FunctionRuleAdapter(form_submits_to_different_host, category="form", adapter=adapter),
    ))
    # Script rules expect a dict of 'facts' (built per script in analyze_scripts)
    engine.add_rule(Rule(
        name="script_src_uses_data_or_blob",
        description="Script src uses data:/blob: URL",
        category="script",
        rule_type="function",
        function=FunctionRuleAdapter(script_src_uses_data_or_blob, category="script", adapter=adapter),
    ))
    engine.add_rule(Rule(
        name="script_src_has_dangerous_extension",
        description="External script with dangerous extension",
        category="script",
        rule_type="function",
        function=FunctionRuleAdapter(script_src_has_dangerous_extension, category="script", adapter=adapter),
    ))
    engine.add_rule(Rule(
        name="script_third_party_host",
        description="Script is from a third-party host",
        category="script",
        rule_type="function",
        function=FunctionRuleAdapter(script_third_party_host, category="script", adapter=adapter),
    ))
    # Store engine both ways: attribute (convenient) and config
    app.rule_engine = engine
    app.config["RULE_ENGINE"] = engine
-    # Make app name/version available for templates here if you want it globally
+    # App metadata available to templates
    app.config["APP_NAME"] = settings.app.name
    app.config["APP_VERSION"] = f"v{settings.app.version_major}.{settings.app.version_minor}"
    # Register blueprints
    app.register_blueprint(routes.bp)
-    # Example log line so we know we booted cleanly
+    # Example log lines so we know we booted cleanly
    app.logger.info(f"SneakyScope started: {app.config['APP_NAME']} {app.config['APP_VERSION']}")
    app.logger.info(f"SANDBOX_STORAGE: {app.config['SANDBOX_STORAGE']}")
    app.logger.info(f"Registered {len(engine.rules)} total rules (YAML + function)")
    return app
--- a/app/browser.py
+++ b/app/browser.py
@@ -1,18 +1,46 @@
-import re
+"""
-import uuid
+app/browser.py
 import json
 from pathlib import Path
 from bs4 import BeautifulSoup
 from datetime import datetime
 from urllib.parse import urlparse
 from typing import Dict, Any, Optional
 from playwright.async_api import async_playwright, TimeoutError as PWTimeoutError
-from flask import current_app  # access the rule engine from app config
+Page fetcher + analysis orchestrator for SneakyScope.
 - Fetches a URL (HTML, redirects, etc.)
 - Runs the Suspicious Rules Engine (PASS/FAIL for all rules)
 - Writes artifacts (screenshot.png, source.txt, results.json) into /data/<run_uuid>/
 - Returns a single 'result' dict suitable for UI and future API
 Design notes:
 - Detection logic (regex/heuristics) lives in the rules engine (YAML/function rules).
 - This module keeps "plumbing" only (fetch, extract, persist).
 - Minimal non-detection heuristics remain here (e.g., skip benign script MIME types).
 Assumptions:
 - Flask app context is active (uses current_app for logger and RULE_ENGINE).
 - SANDBOX_STORAGE is configured (default: /data).
 - enrich_url(url) returns enrichment dict.
 """
 import json
 import uuid
 import re
 from pathlib import Path
 from typing import Any, Dict, List, Optional, Tuple
 from urllib.parse import urlparse
 from bs4 import BeautifulSoup
 from flask import current_app
 from playwright.async_api import async_playwright, TimeoutError as PWTimeoutError
 from app.utils.io_helpers import safe_write
 from .enrichment import enrich_url
 from .utils.settings import get_settings
 settings = get_settings()
 # ---------------------------------------------------------------------------
 # Engine access helpers
 # ---------------------------------------------------------------------------
 def get_rule_engine():
    """
    Retrieve the rules engine instance from the Flask application config.
@@ -21,96 +49,158 @@ def get_rule_engine():
        RuleEngine or None: The engine if available, or None if not configured.
    """
    try:
-        # current_app is only available during an active request context
+        return current_app.config.get("RULE_ENGINE")
        engine = current_app.config.get("RULE_ENGINE")
        return engine
    except Exception:
        # If called outside a Flask application context, fail gracefully
        return None
-def run_rule_checks(text, category):
+def _summarize_results(results: List[Dict[str, Any]]) -> Dict[str, int]:
    """
-    Run all rules for a given category against the provided text.
+    Summarize a list of engine rule result dicts (result = "PASS"|"FAIL").
    Args:
        text (str): The content to test (e.g., form snippet, inline JS).
        category (str): The rule category to run (e.g., 'form' or 'script').
    Returns:
-        dict: {
+        {'fail_count': int, 'total_rules': int}
-            "checks": [ { "rule": str, "category": str, "matched": bool, "reason": Optional[str] }, ... ],
+    """
-            "summary": { "matched_count": int, "total_rules": int }
+    summary = {"fail_count": 0, "total_rules": 0}
    index = 0
    total = len(results)
    while index < total:
        item = results[index]
        summary["total_rules"] = summary["total_rules"] + 1
        if str(item.get("result", "")).upper() == "FAIL":
            summary["fail_count"] = summary["fail_count"] + 1
        index = index + 1
    return summary
 def run_rule_checks(text: str, category: str) -> Dict[str, Any]:
    """
    Run all rules for a given category against provided text, returning a table-friendly model.
    Args:
        text: Text to analyze (HTML, snippet, etc.)
        category: One of 'form', 'script', 'text' (or any category your rules use)
    Returns:
        {
          "checks": [
             { "name": str, "description": str, "category": str,
               "result": "PASS"|"FAIL", "reason": Optional[str],
               "severity": Optional[str], "tags": Optional[List[str]] }, ...
          ],
          "summary": { "fail_count": int, "total_rules": int }
        }
    """
-    result = {
+    out: Dict[str, Any] = {"checks": [], "summary": {"fail_count": 0, "total_rules": 0}}
        "checks": [],
        "summary": {
            "matched_count": 0,
            "total_rules": 0
        }
    }
    engine = get_rule_engine()
    if engine is None:
-        # No engine configured; return empty but well-formed structure
+        return out
        return result
    try:
-        # Run engine rules for the specified category
+        engine_results = engine.run_all(text, category=category)  # list of dicts
-        check_results = engine.run_all(text, category=category)
+        # Normalize explicitly
-
+        index = 0
-        # Normalize results into the expected structure
+        total = len(engine_results)
-        total = 0
+        while index < total:
-        matched = 0
+            item = engine_results[index]
        for item in check_results:
            # item is expected to contain: rule, category, matched, reason (optional)
            total = total + 1
            if bool(item.get("matched")):
                matched = matched + 1
            normalized = {
-                "rule": item.get("rule"),
+                "name": item.get("name"),
                "description": item.get("description"),
                "category": item.get("category"),
-                "matched": bool(item.get("matched")),
+                "result": item.get("result"),        # "PASS" | "FAIL"
-                "reason": item.get("reason")
+                "reason": item.get("reason"),        # present on FAIL by engine design
                "severity": item.get("severity"),
                "tags": item.get("tags"),
            }
-            result["checks"].append(normalized)
+            out["checks"].append(normalized)
            index = index + 1
-        result["summary"]["matched_count"] = matched
+        out["summary"] = _summarize_results(out["checks"])
-        result["summary"]["total_rules"] = total
+    except Exception as exc:
-
+        # Preserve shape; record the error as a synthetic PASS (so UI doesn't break)
-    except Exception as e:
+        out["checks"].append({
-        # If anything goes wrong, keep structure and add a fake failure note
+            "name": "engine_error",
-        result["checks"].append({
+            "description": "Rule engine failed during evaluation",
            "rule": "engine_error",
            "category": category,
-            "matched": False,
+            "result": "PASS",
-            "reason": f"Rule engine error: {e}"
+            "reason": f"{exc}",
            "severity": None,
            "tags": None
        })
-        result["summary"]["matched_count"] = 0
+        out["summary"] = {"fail_count": 0, "total_rules": 1}
        result["summary"]["total_rules"] = 0
-    return result
+    return out
-def analyze_forms(html: str, base_url: str):
+def build_rule_checks_overview(full_html_text: str) -> List[Dict[str, Any]]:
    """
-    Parse forms from the page HTML and apply heuristic flags and rule-based checks.
+    Build a top-level overview for the results page: runs each category across
-
+    the entire HTML and groups results by category.
    Args:
        html (str): The full page HTML.
        base_url (str): The final URL of the page (used for hostname comparisons).
    Returns:
-        list[dict]: A list of form analysis dictionaries, each including:
+        [
-            - action, method, inputs
+          {"category": "script", "results": [ ...engine dicts... ], "summary": {...}},
-            - flagged (bool), flag_reasons (list[str]), status (str)
+          {"category": "form",   "results": [ ... ], "summary": {...}},
-            - rule_checks: dict with "checks" (list) and "summary" (dict)
+          {"category": "text",   "results": [ ... ], "summary": {...}},
        ]
    """
    overview: List[Dict[str, Any]] = []
    engine = get_rule_engine()
    categories = ["script", "form", "text"]
    index = 0
    total = len(categories)
    while index < total:
        cat = categories[index]
        block = {"category": cat, "results": [], "summary": {"fail_count": 0, "total_rules": 0}}
        if engine is not None:
            try:
                results = engine.run_all(full_html_text, category=cat)
                block["results"] = results
                block["summary"] = _summarize_results(results)
            except Exception as exc:
                block["results"] = [{
                    "name": "engine_error",
                    "description": "Rule engine failed during overview evaluation",
                    "category": cat,
                    "result": "PASS",
                    "reason": f"{exc}",
                    "severity": None,
                    "tags": None
                }]
                block["summary"] = {"fail_count": 0, "total_rules": 1}
        overview.append(block)
        index = index + 1
    return overview
 # ---------------------------------------------------------------------------
 # Form & Script analysis (plumbing only; detection is in the rules engine)
 # ---------------------------------------------------------------------------
 def analyze_forms(html: str, base_url: str) -> List[Dict[str, Any]]:
    """
    Parse forms from the page HTML and apply rule-based checks (engine), keeping
    only simple plumbing heuristics here (no security logic).
    Returns list of dicts with keys:
      - action, method, inputs
      - flagged (bool), flag_reasons (list[str]), status (str)
      - rule_checks: {'checks': [...], 'summary': {...}} (per-form snippet evaluation)
    Note:
      The 'flagged' value is now purely a legacy visual hint based on simple
      heuristics; the authoritative PASS/FAIL details are in rule_checks.
      As you migrate heuristics into function rules, this 'flagged' may be
      removed entirely.
    """
    soup = BeautifulSoup(html, "lxml")
-    forms_info = []
+    forms_info: List[Dict[str, Any]] = []
    page_hostname = urlparse(base_url).hostname
    for form in soup.find_all("form"):
@@ -118,40 +208,31 @@ def analyze_forms(html: str, base_url: str):
        method = form.get("method", "get").lower()
        # Build explicit inputs list
-        inputs = []
+        inputs: List[Dict[str, Any]] = []
        for inp in form.find_all("input"):
            input_name = inp.get("name")
            input_type = inp.get("type", "text")
-            inputs.append({
+            inputs.append({"name": input_name, "type": input_type})
                "name": input_name,
                "type": input_type
            })
-        flagged_reasons = []
+        # Minimal legacy flags (kept for UI continuity; detection lives in engine)
        flagged_reasons: List[str] = []
        # No action specified
        if not action or str(action).strip() == "":
            flagged_reasons.append("No action specified")
        # External host
        else:
            try:
                action_host = urlparse(action).hostname
                if not str(action).startswith("/") and action_host != page_hostname:
                    flagged_reasons.append("Submits to a different host")
            except Exception:
                # If hostname parsing fails, skip this condition quietly
                pass
        # HTTP form on HTTPS page
        try:
            if urlparse(action).scheme == "http" and urlparse(base_url).scheme == "https":
                flagged_reasons.append("Submits over insecure HTTP")
        except Exception:
            # If scheme parsing fails, ignore
            pass
        # Hidden password / suspicious hidden inputs
        for hidden in form.find_all("input", type="hidden"):
            name_value = hidden.get("name") or ""
            if "password" in name_value.lower():
@@ -159,15 +240,23 @@ def analyze_forms(html: str, base_url: str):
        flagged = bool(flagged_reasons)
-        # Serialize a simple form snippet for the rules engine to analyze (category='form')
+        # Serialize a simple form snippet for rule category='form'
        snippet_lines = []
        snippet_lines.append(f"base_url={base_url}")
        snippet_lines.append(f"base_hostname={page_hostname}")
        snippet_lines.append(f"action={action}")
        snippet_lines.append(f"method={method}")
        snippet_lines.append("inputs=")
-        for item in inputs:
+
        i = 0
        n = len(inputs)
        while i < n:
            item = inputs[i]
            snippet_lines.append(f"  - name={item.get('name')} type={item.get('type')}")
            i = i + 1
        form_snippet = "\n".join(snippet_lines)
        # Per-form rule checks (PASS/FAIL list via engine)
        rule_checks = run_rule_checks(form_snippet, category="form")
        forms_info.append({
@@ -183,156 +272,116 @@ def analyze_forms(html: str, base_url: str):
    return forms_info
-def analyze_scripts(html: str, base_url: str = "", engine=None) -> list[dict]:
+def analyze_scripts(html: str, base_url: str = "") -> List[Dict[str, Any]]:
    """
-    Analyze <script> elements using the RuleEngine (if provided) and
+    Collect script artifacts and evaluate per-script matches via the rules engine.
-    lightweight built-in heuristics. Only append a record when at least
+    Only include rows that matched at least one rule. Inline scripts are checked
-    one rule or heuristic matches, and always set a sensible 'type'.
+    against regex rules using their text; external scripts are checked against
    function rules using a small 'facts' dict (src/hosts).
    Returns list of dicts like:
      {
        "type": "external" | "inline" | "unknown",
        "src": "...",                 # for external
        "content_snippet": "...",     # for inline
-        "rules": [ { "name": "...", "description": "..." }, ... ],
+        "rules": [ { "name": "...", "description": "..." }, ... ]
        "heuristics": [ "reason1", "reason2", ... ]
      }
    """
    soup = BeautifulSoup(html, "lxml")
-    results: list[dict] = []
+    results: List[Dict[str, Any]] = []
-    import re
+    # Benign MIME types we ignore entirely (non-detection plumbing)
    from urllib.parse import urlparse
    # Benign MIME types we ignore entirely
    benign_types = {"application/ld+json", "application/json"}
-    # Suspicious file extensions for external scripts
+    engine = get_rule_engine()
-    dangerous_ext = (".vbs", ".hta")
+    base_hostname = urlparse(base_url).hostname or ""
    # Inline red flags
    risky_inline_patterns = [
        (re.compile(r"\beval\s*\(", re.IGNORECASE), "Uses eval()"),
        (re.compile(r"\bnew\s+Function\s*\(", re.IGNORECASE), "Uses Function constructor"),
        (re.compile(r"\bdocument\.write\s*\(", re.IGNORECASE), "Uses document.write()"),
        (re.compile(r"\bActiveXObject\s*\(", re.IGNORECASE), "Uses ActiveXObject (IE-only)"),
        (re.compile(r"\batob\s*\(", re.IGNORECASE), "Uses atob() (possible obfuscation)"),
        (re.compile(r"\bunescape\s*\(", re.IGNORECASE), "Uses unescape() (legacy/obfuscation)"),
        (re.compile(r"\bset(?:Timeout|Interval)\s*\(\s*['\"`].+['\"`]\s*,", re.IGNORECASE),
         "String passed to setTimeout/setInterval"),
        (re.compile(r"[\"']?0x[0-9a-fA-F]{16,}[\"']?", re.IGNORECASE),
         "Contains long hex-like constants (possible obfuscation)"),
    ]
    base_host = urlparse(base_url).hostname or ""
    for script in soup.find_all("script"):
        try:
            src = (script.get("src") or "").strip()
            s_type_attr = (script.get("type") or "").strip().lower()
            # IMPORTANT: .string is often None; get_text() is reliable
            inline_text = script.get_text(strip=True) or ""
-            # Skip benign structured data outright
+            # Skip benign structured data outright (noise control)
            if s_type_attr in benign_types:
                continue
-            # ---- Build facts for the rules engine
+            record: Dict[str, Any] = {}
            facts = {
                "script_type_attr": s_type_attr or None,
                "has_src": bool(src),
                "src": src or None,
                "attrs": dict(script.attrs),
                "inline_len": len(inline_text),
                "inline_preview": inline_text[:200].replace("\n", " ") if inline_text else None,
                "base_url": base_url or None,
                "base_hostname": base_host or None,
                "src_hostname": urlparse(src).hostname if src else None,
            }
            # ---- Evaluate rules engine (using name/description)
            engine_matches: list[dict] = []
            if engine is not None:
                try:
                    if hasattr(engine, "evaluate_script"):
                        matches = engine.evaluate_script(facts)
                    elif hasattr(engine, "evaluate"):
                        matches = engine.evaluate(facts)
                    else:
                        matches = []
                    if isinstance(matches, list):
                        for m in matches:
                            if isinstance(m, dict) and "name" in m:
                                engine_matches.append({
                                    "name": m["name"],
                                    "description": m.get("description", "")
                                })
                            elif isinstance(m, str):
                                engine_matches.append({"name": m, "description": ""})
                except Exception as e:
                    engine_matches.append({"name": "Rules Engine Error", "description": str(e)})
            # ---- Built-in heuristics
            heuristics: list[str] = []
            if src:
-                # Unusual URL schemes for script sources
+                record["type"] = "external"
-                if src.startswith(("data:", "blob:")):
+                record["src"] = src
-                    heuristics.append("Script src uses data:/blob: URL")
+            elif inline_text:
-                # Dangerous extensions
+                record["type"] = "inline"
-                for ext in dangerous_ext:
+                record["content_snippet"] = (inline_text[:settings.ui.snippet_preview_len]).replace("\n", " ")
                    if src.lower().endswith(ext):
                        heuristics.append(f"External script with dangerous extension ({ext.lstrip('.')})")
                        break
                # Third-party host hint
                src_host = facts.get("src_hostname") or ""
                if base_host and src_host and src_host != base_host:
                    heuristics.append(f"Third-party host: {src_host}")
            else:
                record["type"] = "unknown"
            # --- Per-script evaluation: gather matches from engine rules
            matches: List[Dict[str, str]] = []
            if engine is not None:
                # Inline content → run regex script rules against the text
                if inline_text:
-                    for pat, why in risky_inline_patterns:
+                    for r in engine.rules:
-                        if pat.search(inline_text):
+                        if getattr(r, "category", None) == "script" and getattr(r, "rule_type", None) == "regex":
-                            heuristics.append(why)
+                            ok, reason = r.run(inline_text)
-
+                            if ok:
-            # ---- Only append when something matched; always set type
+                                matches.append({
-            if engine_matches or heuristics:
+                                    "name": getattr(r, "name", "unknown_rule"),
-                record: dict = {}
+                                    "description": getattr(r, "description", "") or (reason or ""),
                                    "severity": getattr(r, "severity", None),
                                    "tags": getattr(r, "tags", None),
                                })
                # External src → run function script rules with facts
                if src:
-                    record["type"] = "external"
+                    facts = {
-                    record["src"] = src
+                        "src": src,
-                elif inline_text:
+                        "base_url": base_url,
-                    record["type"] = "inline"
+                        "base_hostname": base_hostname,
-                    record["content_snippet"] = facts.get("inline_preview")
+                        "src_hostname": urlparse(src).hostname or "",
-                else:
+                        "category": "script",
-                    record["type"] = "unknown"
+                    }
-
+                    for r in engine.rules:
-                if engine_matches:
+                        if getattr(r, "category", None) == "script" and getattr(r, "rule_type", None) == "function":
-                    record["rules"] = engine_matches
+                            ok, reason = r.run(facts)
-                if heuristics:
+                            if ok:
-                    record["heuristics"] = heuristics
+                                matches.append({
                                    "name": getattr(r, "name", "unknown_rule"),
                                    "description": (reason or "") or getattr(r, "description", ""),
                                    "severity": getattr(r, "severity", None),
                                    "tags": getattr(r, "tags", None),
                                })
            # Only keep rows that matched at least one rule
            if matches:
                record["rules"] = matches
                results.append(record)
-        except Exception as e:
+        except Exception as exc:
            # Never let a single broken <script> kill the whole analysis
            results.append({
                "type": "unknown",
-                "heuristics": [f"Script analysis error: {e}"]
+                "heuristics": [f"Script analysis error: {exc}"]
            })
    return results
-async def fetch_page_artifacts(url: str, storage_dir: Path, engine=None) -> Dict[str, Any]:
+
 # ---------------------------------------------------------------------------
 # Fetcher / Orchestrator
 # ---------------------------------------------------------------------------
 async def fetch_page_artifacts(url: str, storage_dir: Path) -> Dict[str, Any]:
    """
    Fetch page artifacts and save them in a UUID-based directory.
-    Args:
-        url (str): URL to analyze.
-        storage_dir (Path): Base /data path.
-        engine: Optional rules engine instance (from app.config["RULE_ENGINE"]).
+    Writes:
+      - /data/<uuid>/screenshot.png
+      - /data/<uuid>/source.txt
+      - /data/<uuid>/results.json  (single source of truth for routes)
    Returns:
        result dict with keys used by templates (and future API).
    """
    run_uuid = str(uuid.uuid4())
    run_dir = storage_dir / run_uuid
@@ -342,9 +391,9 @@ async def fetch_page_artifacts(url: str, storage_dir: Path, engine=None) -> Dict
    source_path = run_dir / "source.txt"
    results_path = run_dir / "results.json"
-    redirects = []
+    redirects: List[Dict[str, Any]] = []
-    downloads = []
+    downloads: List[Dict[str, Any]] = []
-    scripts = []
+    scripts_seen: List[str] = []
    async with async_playwright() as pw:
        browser = await pw.chromium.launch(
@@ -359,10 +408,30 @@ async def fetch_page_artifacts(url: str, storage_dir: Path, engine=None) -> Dict
        )
        page = await context.new_page()
-        # Event handlers
-        page.on("response", lambda resp: redirects.append({"status": resp.status, "url": resp.url}) if 300 <= resp.status <= 399 else None)
-        page.on("download", lambda d: downloads.append({"url": d.url, "suggested_filename": d.suggested_filename}))
-        page.on("request", lambda r: scripts.append(r.url) if r.url.endswith((".js", ".vbs", ".hta")) else None)
+        # Event handlers (plumbing)
+        def _on_response(resp):
+            try:
+                if 300 <= resp.status <= 399:
                    redirects.append({"status": resp.status, "url": resp.url})
            except Exception:
                pass
        def _on_download(d):
            try:
                downloads.append({"url": d.url, "suggested_filename": d.suggested_filename})
            except Exception:
                pass
        def _on_request(r):
            try:
                if r.url.endswith((".js", ".vbs", ".hta")):
                    scripts_seen.append(r.url)
            except Exception:
                pass
        page.on("response", _on_response)
        page.on("download", _on_download)
        page.on("request", _on_request)
        try:
            await page.goto(url, wait_until="networkidle", timeout=60000)
@@ -378,23 +447,65 @@ async def fetch_page_artifacts(url: str, storage_dir: Path, engine=None) -> Dict
        await context.close()
        await browser.close()
    # Read back saved source
    html_content = source_path.read_text(encoding="utf-8")
-    forms_info = analyze_forms(html_content, final_url)
-    suspicious_scripts = analyze_scripts(html_content, base_url=final_url, engine=engine)
+    # Forms analysis (per-form rule checks)
+    forms_info = analyze_forms(html_content, final_url)
+    # Scripts artifacts (no detection here)
+    suspicious_scripts = analyze_scripts(html_content, base_url=final_url)
    # Enrichment
    enrichment = enrich_url(url)
-    result = {
+    # Global PASS/FAIL table per category (entire document)
    rule_checks_overview = build_rule_checks_overview(html_content)
    for blk in rule_checks_overview:
        current_app.logger.debug(f"[rules] {blk['category']}: {blk['summary']}")
    # Assemble single result dict
    result: Dict[str, Any] = {
        "uuid": run_uuid,
        "submitted_url": url,
        "final_url": final_url,
        "redirects": redirects,
        "downloads": downloads,
-        "scripts": scripts,
+        "scripts": scripts_seen,
        "forms": forms_info,
        "suspicious_scripts": suspicious_scripts,
        "rule_checks": rule_checks_overview,  # table-ready for UI
        "enrichment": enrichment
    }
-    safe_write(results_path, json.dumps(result, indent=2))
+    # Persist as the single source of truth for routes
    safe_write(results_path, json.dumps(result, indent=2, ensure_ascii=False))
    try:
        current_app.logger.info(f"[browser] Saved results.json for run {run_uuid}")
    except Exception:
        pass
    return result
 def load_results(storage_dir: Path, run_uuid: str) -> Optional[Dict[str, Any]]:
    """
    Load a prior run's results.json from /data/<uuid>/.
    Returns:
        dict or None
    """
    run_dir = storage_dir / run_uuid
    results_path = run_dir / "results.json"
    if not results_path.exists():
        return None
    try:
        text = results_path.read_text(encoding="utf-8")
        data = json.loads(text)
        return data
    except Exception:
        return None
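The `results.json` round trip above can be sketched in isolation. This is a minimal sketch, not the patch's code: it writes with `Path.write_text` instead of the project's `safe_write` helper, the `demo-run` identifier and sample payload are placeholders, and `load_results` is re-declared here only so the block is self-contained.

```python
import json
import tempfile
from pathlib import Path
from typing import Any, Dict, Optional


def load_results(storage_dir: Path, run_uuid: str) -> Optional[Dict[str, Any]]:
    """Same contract as the patched load_results: dict on success, None otherwise."""
    results_path = storage_dir / run_uuid / "results.json"
    if not results_path.exists():
        return None
    try:
        return json.loads(results_path.read_text(encoding="utf-8"))
    except Exception:
        return None


with tempfile.TemporaryDirectory() as tmp:
    storage = Path(tmp)
    run_uuid = "demo-run"  # placeholder; the real code uses str(uuid.uuid4())
    run_dir = storage / run_uuid
    run_dir.mkdir(parents=True)

    # Placeholder payload; the real result dict carries many more keys.
    result = {"uuid": run_uuid, "final_url": "https://example.com/caf\u00e9"}

    # ensure_ascii=False keeps non-ASCII characters readable on disk,
    # so the file must be written and read back as UTF-8.
    (run_dir / "results.json").write_text(
        json.dumps(result, indent=2, ensure_ascii=False), encoding="utf-8"
    )

    loaded = load_results(storage, run_uuid)
    missing = load_results(storage, "no-such-run")
```

Because a missing or unparseable file yields `None` rather than an exception, a route that calls `load_results` would need to treat `None` as "run not found".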
--- a/app/config/settings.yaml
+++ b/app/config/settings.yaml
@@ -7,3 +7,6 @@ cache:
  recent_runs_count: 10
  whois_cache_days: 7
  geoip_cache_days: 7
 ui:
  snippet_preview_len: 300
--- a/app/config/suspicious_rules.yaml
+++ b/app/config/suspicious_rules.yaml
@@ -1,80 +1,138 @@
 # config/suspicious_rules.yaml
 # Baseline suspicious rules for SneakyScope
 # Organized by category: script, form, text
-# Extend these with more specific rules as needed
+# Notes:
 # - Engine compiles regex with IGNORECASE.
 # - 'severity' is optional: low | medium | high
 # - 'tags' is optional: list of strings for grouping
 # --- Script Rules ---
 - name: eval_usage
  description: "Use of eval() in script"
  category: script
  type: regex
-  pattern: "\\beval\\("
+  pattern: '\beval\s*\('
  severity: high
  tags: [obfuscation, unsafe-eval]
 - name: new_function_usage
  description: "Use of Function constructor (new Function)"
  category: script
  type: regex
  pattern: '\bnew\s+Function\s*\('
  severity: high
  tags: [obfuscation]
 - name: document_write
  description: "Use of document.write (often abused in malicious injections)"
  category: script
  type: regex
-  pattern: "document\\.write\\("
+  pattern: '\bdocument\s*\.\s*write\s*\('
  severity: medium
  tags: [injection, legacy-api]
 - name: inline_event_handler
  description: "Inline event handler detected (onload, onclick, etc.)"
  category: script
  type: regex
-  pattern: "on(load|click|error|mouseover|keydown)\\s*="
+  pattern: '\bon(load|click|error|mouseover|mouseenter|submit|keydown|keyup|change)\s*='
  severity: medium
  tags: [inline-handlers, potential-xss]
 - name: obfuscated_encoding
-  description: "Suspicious use of atob() or btoa() (base64 encoding/decoding)"
+  description: "Suspicious use of atob()/btoa() (base64 encode/decode)"
  category: script
  type: regex
-  pattern: "\\b(atob|btoa)\\("
+  pattern: '\b(atob|btoa)\s*\('
  severity: medium
  tags: [encoding, obfuscation]
- name: suspicious_iframe
+- name: unescape_usage
-  description: "Iframe usage in script (possible phishing/malvertising)"
+  description: "Use of unescape() (legacy/obfuscation)"
  category: script
  type: regex
-  pattern: "<iframe[^>]*>"
+  pattern: '\bunescape\s*\('
  severity: low
  tags: [legacy-api, obfuscation]
 - name: string_timer_usage
  description: "String passed to setTimeout/setInterval (sink for XSS)"
  category: script
  type: regex
  pattern: '\bset(?:Timeout|Interval)\s*\(\s*[''"`].+[''"`]\s*,'
  severity: medium
  tags: [xss-sink]
 - name: long_hex_constants
  description: "Long hex-like constants (possible obfuscation)"
  category: script
  type: regex
  pattern: '["'']?0x[0-9a-fA-F]{16,}["'']?'
  severity: low
  tags: [obfuscation]
 # --- Form Rules ---
- name: suspicious_form_action
+- name: suspicious_form_action_absolute
-  description: "Form action with external URL (potential credential exfiltration)"
+  description: "Form action uses absolute URL (potential credential exfiltration)"
  category: form
  type: regex
-  pattern: "<form[^>]*action=['\"]http"
+  pattern: '<form\b[^>]*\baction\s*=\s*[''"]https?://'
  severity: medium
  tags: [exfiltration, form]
 - name: hidden_inputs
-  description: "Form with hidden inputs (possible credential harvesting)"
+  description: "Form with hidden inputs (could be used to smuggle data)"
  category: form
  type: regex
-  pattern: "<input[^>]*type=['\"]hidden"
+  pattern: '<input\b[^>]*\btype\s*=\s*[''"]hidden[''"]'
  severity: low
  tags: [stealth, form]
 - name: password_field
-  description: "Form requesting password field"
+  description: "Form requests a password field"
  category: form
  type: regex
-  pattern: "<input[^>]*type=['\"]password"
+  pattern: '<input\b[^>]*\btype\s*=\s*[''"]password[''"]'
  severity: high
  tags: [credentials, form]
 # --- Text Rules (Social Engineering / BEC) ---
 - name: urgent_request
  description: "Language suggesting urgency (common in phishing/BEC)"
  category: text
  type: regex
-  pattern: "(urgent|immediately|asap|action required)"
+  pattern: '\b(urgent|immediately|asap|action\s*required|verify\s*now)\b'
  severity: medium
  tags: [bec, urgency]
 - name: account_suspension
  description: "Threat of account suspension/closure"
  category: text
  type: regex
-  pattern: "(account.*suspend|account.*close|verify.*account)"
+  pattern: '\b(account\s*(suspen(?:d\w*|sion)|closure|closed?)|verify\s*account)\b'
  severity: medium
  tags: [bec, scare-tactics]
 - name: financial_request
  description: "Request for gift cards, wire transfer, or money"
  category: text
  type: regex
-  pattern: "(gift card|wire transfer|bank account|bitcoin|payment required)"
+  pattern: '\b(gift\s*card|wire\s*transfer|bank\s*account|bitcoin|crypto|payment\s*required)\b'
  severity: high
  tags: [bec, financial]
 - name: credential_reset
  description: "Password reset or credential reset wording"
  category: text
  type: regex
-  pattern: "(reset password|update credentials|login to verify)"
+  pattern: '\b(reset\s*password|update\s*credentials|log\s*in\s*to\s*verify|password\s*expiry)\b'
  severity: medium
  tags: [bec, credentials]
 - name: suspicious_iframe
  description: "Iframe tag present (possible phishing/malvertising/drive-by)"
  category: text
  type: regex
  pattern: '<iframe\b[^>]*\bsrc\s*=\s*[''"][^''"]+[''"]'
  severity: medium
  tags: [iframe, phishing, malvertising]
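A quick way to sanity-check the tightened script patterns is to compile them the way the notes above say the engine does (with IGNORECASE) and run them over small snippets. The sketch below copies three patterns from the rules above. The `PATTERNS` dict and the `matched_rules` helper are hypothetical names for illustration and are not part of the engine.

```python
import re

# Patterns copied from the YAML rules above. In single-quoted YAML, '' is an
# escaped single quote, so the character class becomes ['"`] in the regex.
PATTERNS = {
    "eval_usage": r"\beval\s*\(",
    "new_function_usage": r"\bnew\s+Function\s*\(",
    "string_timer_usage": r"""\bset(?:Timeout|Interval)\s*\(\s*['"`].+['"`]\s*,""",
}

# Mirror the documented engine behaviour: every regex is compiled with IGNORECASE.
COMPILED = {name: re.compile(p, re.IGNORECASE) for name, p in PATTERNS.items()}


def matched_rules(snippet: str) -> list:
    """Hypothetical helper: sorted names of the patterns that match the snippet."""
    return sorted(name for name, rx in COMPILED.items() if rx.search(snippet))


print(matched_rules("EVAL ('1+1')"))                 # whitespace before '(' is allowed
print(matched_rules("medieval(x)"))                  # \b keeps 'medieval(' from matching
print(matched_rules('setTimeout("alert(1)", 100)'))  # string argument to a timer
print(matched_rules("setTimeout(fn, 100)"))          # function argument: no match
print(matched_rules("var o = new function() {}"))    # matches because of IGNORECASE
```

The last case is worth noting: under IGNORECASE, `new_function_usage` also matches the lowercase `new function(` idiom, so that rule may flag code that never calls the `Function` constructor.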
--- a/app/routes.py
+++ b/app/routes.py
@@ -65,7 +65,7 @@ def analyze():
    try:
        engine = current_app.config.get("RULE_ENGINE")
-        result = asyncio.run(fetch_page_artifacts(url, storage, engine=engine))
+        result = asyncio.run(fetch_page_artifacts(url, storage))
        # result = asyncio.run(fetch_page_artifacts(url, storage))
        current_app.logger.info(f"[+] Analysis done for {url}")
    except Exception as e:
--- a/app/rules/function_rules.py
+++ b/app/rules/function_rules.py
@@ -0,0 +1,203 @@
 """
 app/rules/function_rules.py
 Class-based adapters + function-based rules for SneakyScope.
 Design:
 - FactAdapter: converts text snippets into structured 'facts' dicts by category.
 - FunctionRuleAdapter: wraps a rule function (expects dict facts) so it can be
  used directly by the RuleEngine even when the engine is given strings.
 Each rule returns (matched: bool, reason: Optional[str]).
 If matched is True, 'reason' should explain why.
 Note:
 - Form rules work today with text snippets, thanks to FunctionRuleAdapter+FactAdapter.
 - Script rules expect per-script dict facts (src/base_hostname/etc.), which
  browser.py builds for each external <script> during per-script evaluation.
 """
 from __future__ import annotations
 from typing import Any, Dict, Optional
 from urllib.parse import urlparse
 # ---------------------------------------------------------------------------
 # Adapters
 # ---------------------------------------------------------------------------
 class FactAdapter:
    """
    Converts raw text/html snippets into structured 'facts' suitable for
    function-based rules. If input is already a dict, returns it unchanged.
    You can expand the per-category parsers over time as needed.
    """
    def __init__(self, logger: Optional[Any] = None) -> None:
        self.logger = logger
    def adapt(self, text_or_facts: Any, category: str = "") -> Dict[str, Any]:
        """
        Adapt text_or_facts (str or dict) into a facts dict.
        Args:
            text_or_facts: Either raw string snippet or an already-structured dict.
            category: 'form' | 'script' | 'text' | ... (used to choose parser)
        """
        # Already structured — pass through
        if isinstance(text_or_facts, dict):
            # Ensure a category key for consistency (optional)
            text_or_facts.setdefault("category", category or text_or_facts.get("category") or "")
            return text_or_facts
        # String snippets are parsed by category
        if isinstance(text_or_facts, str):
            if category == "form":
                return self._adapt_form_snippet(text_or_facts)
            elif category == "script":
                # For now, we don't parse script snippets into facts. Script rules expect
                # per-script dicts (src/base_hostname/etc.), which you'll provide when you
                # add per-script evaluation. Return minimal facts for safety.
                return {"category": "script", "raw": text_or_facts}
            elif category == "text":
                return {"category": "text", "raw": text_or_facts}
            else:
                if self.logger:
                    self.logger.warning(f"[FactAdapter] Unknown category '{category}', returning raw snippet.")
                return {"category": category, "raw": text_or_facts}
        # Fallback for unrecognized input types
        if self.logger:
            self.logger.warning(f"[FactAdapter] Unsupported input type: {type(text_or_facts)!r}")
        return {"category": category, "raw": text_or_facts}
    # ---- Per-category parsers ----
    def _adapt_form_snippet(self, snippet: str) -> Dict[str, Any]:
        """
        Parse the simple form snippet format used by browser.py today, e.g.:
            action=https://example.com/post
            method=post
            inputs=
              - name=email type=text
              - name=password type=password
        Only extracts fields needed by current function rules.
        """
        facts: Dict[str, Any] = {"category": "form", "raw": snippet}
        lines = snippet.splitlines()
        i = 0
        n = len(lines)
        while i < n:
            line = (lines[i] or "").strip()
            if line.startswith("action="):
                facts["action"] = line.split("=", 1)[1].strip()
            elif line.startswith("method="):
                facts["method"] = line.split("=", 1)[1].strip()
            i = i + 1
        # Normalize context keys expected by form rules
        facts.setdefault("base_url", "")       # filled by caller later if desired
        facts.setdefault("base_hostname", "")  # filled by caller later if desired
        return facts
 class FunctionRuleAdapter:
    """
    Callable wrapper that adapts engine input (str or dict) into 'facts' and then
    invokes the underlying function rule that expects a facts dict.
    Usage:
        wrapped = FunctionRuleAdapter(fn=form_action_missing, category="form", adapter=FactAdapter(app.logger))
        matched, reason = wrapped("action=https://...")  # engine-friendly
    """
    def __init__(self, fn, category: str = "", adapter: Optional[FactAdapter] = None) -> None:
        self.fn = fn
        self.category = category
        self.adapter = adapter or FactAdapter()
    def __call__(self, text_or_facts: Any):
        facts = self.adapter.adapt(text_or_facts, category=self.category)
        return self.fn(facts)
 # ---------------------------------------------------------------------------
 # Function-based rules (dict 'facts' expected)
 # ---------------------------------------------------------------------------
 # ---------------- Script rules ----------------
 def script_src_uses_data_or_blob(facts: Dict[str, Any]):
    """Flags <script> tags with src='data:' or 'blob:'."""
    src = facts.get("src") or ""
    if isinstance(src, str) and src.startswith(("data:", "blob:")):
        scheme = src.split(":", 1)[0]
        return True, f"Script src uses {scheme}: URL"
    return False, None
 def script_src_has_dangerous_extension(facts: Dict[str, Any]):
    """Flags <script> tags with dangerous file extensions (e.g., .vbs, .hta)."""
    src = facts.get("src") or ""
    if not isinstance(src, str):
        return False, None
    low = src.lower()
    dangerous = (".vbs", ".hta")
    i = 0
    m = len(dangerous)
    while i < m:
        ext = dangerous[i]
        if low.endswith(ext):
            return True, f"External script has dangerous extension ({ext})"
        i = i + 1
    return False, None
 def script_third_party_host(facts: Dict[str, Any]):
    """Flags scripts loaded from a different hostname than the page."""
    base_host = facts.get("base_hostname") or ""
    src_host = facts.get("src_hostname") or ""
    if base_host and src_host and base_host != src_host:
        return True, f"Third-party script host: {src_host}"
    return False, None
 # ---------------- Form rules ----------------
 def form_submits_to_different_host(facts: Dict[str, Any]):
    """Flags <form> actions that submit to a different hostname than the page."""
    base_host = facts.get("base_hostname") or ""
    action = facts.get("action") or ""
    try:
        action_host = urlparse(action).hostname
        if action_host and base_host and action_host != base_host:
            return True, "Form submits to a different host"
    except Exception:
        # Parsing failed; treat as no match rather than erroring out
        pass
    return False, None
 def form_http_on_https_page(facts: Dict[str, Any]):
    """Flags forms submitting over HTTP while the page was loaded over HTTPS."""
    base_url = facts.get("base_url") or ""
    action = facts.get("action") or ""
    try:
        if urlparse(base_url).scheme == "https" and urlparse(action).scheme == "http":
            return True, "Form submits over insecure HTTP"
    except Exception:
        pass
    return False, None
 def form_action_missing(facts: Dict[str, Any]):
    """Flags <form> elements with no action attribute."""
    action = (facts.get("action") or "").strip()
    if not action:
        return True, "Form has no action attribute"
    return False, None
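The function rules above take a facts dict rather than raw text. The sketch below shows how per-script facts might be assembled and passed to one rule. `build_script_facts` is a hypothetical helper, not a function in the patch, and `script_third_party_host` is repeated from the diff only so the block runs on its own.

```python
from typing import Any, Dict, Optional, Tuple
from urllib.parse import urlparse


def build_script_facts(src: str, base_url: str) -> Dict[str, Any]:
    """Hypothetical helper: assemble the keys the script rules read."""
    return {
        "category": "script",
        "src": src,
        "src_hostname": urlparse(src).hostname or "",
        "base_url": base_url,
        "base_hostname": urlparse(base_url).hostname or "",
    }


def script_third_party_host(facts: Dict[str, Any]) -> Tuple[bool, Optional[str]]:
    """Same logic as the rule in the diff: flag a script host that differs from the page host."""
    base_host = facts.get("base_hostname") or ""
    src_host = facts.get("src_hostname") or ""
    if base_host and src_host and base_host != src_host:
        return True, f"Third-party script host: {src_host}"
    return False, None


page = "https://example.com/login"  # placeholder URLs for illustration
third_party = script_third_party_host(build_script_facts("https://cdn.example.net/app.js", page))
same_host = script_third_party_host(build_script_facts("https://example.com/static/app.js", page))
relative = script_third_party_host(build_script_facts("/static/app.js", page))
```

A relative `src` has no hostname, so `src_hostname` is empty and the rule does not fire. Note also that a non-matching rule returns `None` as its reason, which callers have to tolerate.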
--- a/app/static/style.css
+++ b/app/static/style.css
@@ -286,3 +286,67 @@ details ul, details p {
 .scripts-table td small {
  opacity: 0.85;
 }
 /* keep the table from exploding */
 .scripts-table {
  table-layout: fixed;
  width: 100%;
 }
 /* columns: Type | Source URL | Snippet | Matches */
 .scripts-table th:nth-child(1) { width: 8rem; }
 .scripts-table th:nth-child(2) { width: 32rem; }   /* tweak as you like */
 .scripts-table th:nth-child(3) { width: 24rem; }
 .scripts-table th:nth-child(4) { width: auto; }
 /* ellipsize the table cells by default */
 .scripts-table td, .scripts-table th {
  overflow: hidden;
  text-overflow: ellipsis;
  white-space: nowrap;
 }
 /* let URLs/snippets wrap *inside* their cell when expanded content shows */
 .breakable {
  white-space: normal;
  overflow-wrap: anywhere;
  word-break: break-word;
 }
 /* when user opens <details>, keep code readable without blowing layout */
 details pre.code {
  white-space: pre-wrap;
  word-break: break-word;
  max-height: 18rem;
  overflow: auto;
 }
 /* Generic badge */
 .badge {
  display: inline-block;
  padding: 0.1rem 0.4rem;
  margin-left: 0.35rem;
  border-radius: 0.4rem;
  font-size: 0.75rem;
  line-height: 1;
  vertical-align: middle;
  user-select: none;
 }
 /* Severity colors */
 .sev-high   { background: #fdecea; color: #b71c1c; border: 1px solid #f5c6c4; }
 .sev-medium { background: #fff8e1; color: #8a6d3b; border: 1px solid #ffe0a3; }
 .sev-low    { background: #e8f5e9; color: #1b5e20; border: 1px solid #b9e6be; }
 /* Tag chips */
 .chip {
  display: inline-block;
  padding: 0.1rem 0.35rem;
  margin-left: 0.25rem;
  border-radius: 999px;
  font-size: 0.7rem;
  line-height: 1;
  background: #eef2f7;
  color: #425466;
  border: 1px solid #d9e2ec;
 }
--- a/app/templates/result.html
+++ b/app/templates/result.html
@@ -187,9 +187,9 @@
        <td>{{ s.type or 'unknown' }}</td>
        <!-- Source URL -->
-        <td>
+        <td class="breakable">
          {% if s.src %}
-            <a href="{{ s.src }}" target="_blank">{{ s.src }}</a>
+            <a href="{{ s.src }}" target="_blank" rel="noopener">{{ s.src[:50] }}</a>
          {% else %}
            N/A
          {% endif %}
@@ -199,8 +199,8 @@
        <td>
          {% if s.content_snippet %}
            <details>
-              <summary>View snippet</summary>
+              <summary>View snippet ({{ s.content_snippet|length }} chars) </summary>
-              <pre class="code">{{ s.content_snippet }}</pre>
+              <pre class="code">{{ s.content_snippet }}</pre>
            </details>
          {% else %}
            N/A
@@ -213,18 +213,26 @@
          {% set has_heur = s.heuristics and s.heuristics|length > 0 %}
          {% if has_rules %}
-            <strong>Rules</strong>
-            <ul>
-              {% for r in s.rules %}
-                <li title="{{ r.description or '' }}">
-                  {{ r.name }}
-                  {% if r.description %}
-                    <small>— {{ r.description }}</small>
-                  {% endif %}
-                </li>
-              {% endfor %}
-            </ul>
-          {% endif %}
+          <strong>Rules</strong>
+          <ul>
+            {% for r in s.rules %}
+              <li title="{{ r.description or '' }}">
+                {{ r.name }}
+                {% if r.severity %}
+                  <span class="badge sev-{{ r.severity|lower }}">{{ r.severity|title }}</span>
+                {% endif %}
+                {% if r.tags %}
+                  {% for t in r.tags %}
+                    <span class="chip" title="Tag: {{ t }}">{{ t }}</span>
+                  {% endfor %}
                {% endif %}
                {% if r.description %}
                  <small>— {{ r.description }}</small>
                {% endif %}
              </li>
            {% endfor %}
          </ul>
        {% endif %}
          {% if has_heur %}
            <strong>Heuristics</strong>
--- a/app/utils/rules_engine.py
+++ b/app/utils/rules_engine.py
@@ -1,132 +1,291 @@
 """
 rules_engine.py
-A flexible rule-based engine for detecting suspicious patterns in scripts, forms,
-or other web artifacts inside SneakyScope.
-
+Flask-logger integrated rules engine for SneakyScope.
+Logs go to `current_app.logger` when a Flask app context is active,
+otherwise to a namespaced standard logger "sneakyscope.rules".
 Each rule is defined as:
    - name: str            # Rule identifier
    - description: str     # Human-readable reason for analysts
    - category: str        # e.g., 'script', 'form', 'text', 'generic'
    - type: str            # 'regex' or 'function'
    - pattern: str         # Regex pattern (if type=regex)
    - function: callable   # Python function returning (bool, str) (if type=function)
 The framework returns a list of results, with pass/fail and reasoning.
 """
 import re
 import logging
 from dataclasses import dataclass, asdict, field
 from pathlib import Path
-from typing import Callable, Dict, List, Tuple, Union
+from typing import Callable, Dict, List, Optional, Tuple, Union
 import yaml
 try:
    # Flask is optional; engine still works without it.
    from flask import current_app, has_app_context
 except Exception:
    current_app = None  # type: ignore
    def has_app_context() -> bool:  # type: ignore
        return False
 def get_engine_logger() -> logging.Logger:
    """
    Return a logger that prefers Flask's current_app.logger if available.
    Falls back to a namespaced standard logger otherwise.
    """
    if has_app_context() and current_app is not None and hasattr(current_app, "logger"):
        return current_app.logger
    return logging.getLogger("sneakyscope.rules")
@dataclass
 class Rule:
-    """Represents a single detection rule."""
+    """
+    Represents a single detection rule.
+    When rule_type == 'regex', 'pattern' must be provided.
+    When rule_type == 'function', 'function' must be provided and return (matched: bool, reason: str).
+    """
-    def __init__(
-        self,
-        name: str,
-        description: str,
-        category: str,
-        rule_type: str = "regex",
-        pattern: str = None,
-        function: Callable = None,
-    ):
-        self.name = name
-        self.description = description
-        self.category = category
-        self.rule_type = rule_type
-        self.pattern = pattern
-        self.function = function
-    def run(self, text: str) -> Tuple[bool, str]:
+    name: str
+    description: str
+    category: str
+    rule_type: str = "regex"
+    pattern: Optional[str] = None
+    function: Optional[Callable[[str], Tuple[bool, str]]] = None
+    severity: Optional[str] = None             # 'low' | 'medium' | 'high' (optional)
+    tags: Optional[List[str]] = field(default=None)  # e.g., ['obfuscation', 'phishing'] (optional)
+    # Internal compiled regex cache (not serialized)
    _compiled_regex: Optional[re.Pattern] = field(default=None, repr=False, compare=False)
    def compile_if_needed(self, logger: Optional[logging.Logger] = None) -> bool:
        """
-        Run the rule on given text.
+        Compile the regex pattern once for performance, if applicable.
        Returns:
            bool: True if the regex is compiled and ready, False otherwise.
        """
        if logger is None:
            logger = get_engine_logger()
        if self.rule_type == "regex" and self.pattern:
            try:
                self._compiled_regex = re.compile(self.pattern, re.IGNORECASE)
                logger.debug(f"[Rule] Compiled regex for '{self.name}'")
                return True
            except re.error as rex:
                self._compiled_regex = None
                logger.warning(f"[Rule] Failed to compile regex for '{self.name}': {rex}")
                return False
        return False
    def run(self, text: str, logger: Optional[logging.Logger] = None) -> Tuple[bool, str]:
        """
        Run the rule on the given text.
        Returns:
            (matched: bool, reason: str)
        """
-        if self.rule_type == "regex" and self.pattern:
-            if re.search(self.pattern, text, re.IGNORECASE):
+        if logger is None:
+            logger = get_engine_logger()
        if self.rule_type == "regex":
            if not self.pattern:
                logger.warning(f"[Rule] '{self.name}' missing regex pattern.")
                return False, "Invalid rule configuration: missing pattern"
            if self._compiled_regex is None:
                compiled_ok = self.compile_if_needed(logger=logger)
                if not compiled_ok:
                    return False, f"Invalid regex pattern: {self.pattern!r}"
            if self._compiled_regex and self._compiled_regex.search(text):
                return True, f"Matched regex '{self.pattern}' → {self.description}"
-            else:
-                return False, "No match"
-        elif self.rule_type == "function" and callable(self.function):
-            return self.function(text)
-        else:
-            return False, "Invalid rule configuration"
+            return False, "No match"
+
+        if self.rule_type == "function":
+            if callable(self.function):
+                try:
+                    matched, reason = self.function(text)
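                    # Function rules return (False, None) on no match, so accept a None reason.
                    if isinstance(matched, bool) and (reason is None or isinstance(reason, str)):
                        return matched, reason or ""
                    logger.warning(f"[Rule] '{self.name}' function returned invalid types.")
                    return False, "Invalid function return type; expected (bool, Optional[str])"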
                except Exception as exc:
                    logger.exception(f"[Rule] '{self.name}' function raised exception.")
                    return False, f"Rule function raised exception: {exc!r}"
            logger.warning(f"[Rule] '{self.name}' has invalid function configuration.")
            return False, "Invalid rule configuration: function not callable"
        logger.warning(f"[Rule] '{self.name}' has unknown type '{self.rule_type}'.")
        return False, f"Invalid rule configuration: unknown type '{self.rule_type}'"
@dataclass
 class RuleResult:
    """
    Uniform per-rule outcome for UI/API consumption.
    result is "PASS" or "FAIL" (FAIL == matched True)
    """
    name: str
    description: str
    category: str
    result: str                  # "PASS" | "FAIL"
    reason: Optional[str] = None
    severity: Optional[str] = None
    tags: Optional[List[str]] = None
 class RuleEngine:
-    """Loads and executes rules against provided text."""
+    """
    Loads and executes rules against provided text, with Flask-aware logging.
    """
-    def __init__(self, rules: List[Rule] = None):
+    def __init__(self, rules: Optional[List[Rule]] = None, logger: Optional[logging.Logger] = None):
-        self.rules = rules or []
+        """
        Args:
            rules: Optional initial rule list.
            logger: Optional explicit logger. If None, uses Flask app logger if available,
                    otherwise a namespaced standard logger.
        """
        if logger is None:
            self.logger = get_engine_logger()
        else:
            self.logger = logger
-    def add_rule(self, rule: Rule):
+        self.rules: List[Rule] = rules or []
-        """Add a new rule at runtime."""
+        self._compile_all()
    def _compile_all(self) -> None:
        """
        Compile all regex rules at initialization and warn about failures.
        """
        index = 0
        total = len(self.rules)
        while index < total:
            rule = self.rules[index]
            if rule.rule_type == "regex":
                compiled_ok = rule.compile_if_needed(logger=self.logger)
                if not compiled_ok:
                    self.logger.warning(f"[Engine] Regex failed at init for rule '{rule.name}' (pattern={rule.pattern!r})")
            index = index + 1
    def add_rule(self, rule: Rule) -> None:
        """
        Add a new rule at runtime; compiles regex if needed and logs failures.
        """
        self.rules.append(rule)
        if rule.rule_type == "regex":
            compiled_ok = rule.compile_if_needed(logger=self.logger)
            if not compiled_ok:
                self.logger.warning(f"[Engine] Regex failed when adding rule '{rule.name}' (pattern={rule.pattern!r})")
-    def run_all(self, text: str, category: str = None) -> List[Dict]:
+    def run_all(self, text: str, category: Optional[str] = None) -> List[Dict]:
        """
        Run all rules against text.
        Args:
-            text: str → the content to test
+            text: The content to test.
-            category: str → optional, only run rules in this category
+            category: If provided, only evaluate rules that match this category.
        Returns:
-            List of dicts with rule results.
+            List of dicts with PASS/FAIL per rule (JSON-serializable).
        """
-        results = []
-        for rule in self.rules:
-            if category and rule.category != category:
+        results: List[Dict] = []
+
+        index = 0
        total = len(self.rules)
        while index < total:
            rule = self.rules[index]
            if category is not None and rule.category != category:
                index = index + 1
                continue
-            matched, reason = rule.run(text)
+            matched, reason = rule.run(text, logger=self.logger)
-            results.append(
+
-                {
+            result_str = "FAIL" if matched else "PASS"
-                    "rule": rule.name,
+            reason_to_include: Optional[str]
-                    "category": rule.category,
+            if matched:
-                    "matched": matched,
+                reason_to_include = reason
-                    "reason": reason if matched else None,
+            else:
-                }
+                reason_to_include = None
            rr = RuleResult(
                name=rule.name,
                description=rule.description,
                category=rule.category,
                result=result_str,
                reason=reason_to_include,
                severity=rule.severity,
                tags=rule.tags,
            )
            results.append(asdict(rr))
            index = index + 1
        self.logger.debug(f"[Engine] Completed evaluation. Returned {len(results)} rule results.")
        return results
-def load_rules_from_yaml(yaml_file: Union[str, Path]) -> List[Rule]:
+def load_rules_from_yaml(yaml_file: Union[str, Path], logger: Optional[logging.Logger] = None) -> List[Rule]:
    """
    Load rules from a YAML file.
-    Example YAML format:
+    Supports optional 'severity' and 'tags' keys.
        - name: suspicious_eval
          description: "Use of eval() in script"
          category: script
          type: regex
          pattern: "\\beval\\("
-        - name: password_reset
+    Example YAML:
-          description: "Password reset wording"
+      - name: suspicious_eval
-          category: text
+        description: "Use of eval() in script"
-          type: regex
+        category: script
-          pattern: "reset password"
+        type: regex
        pattern: "\\beval\\("
        severity: medium
        tags: [obfuscation]
    Returns:
        List[Rule]
    """
-    rules = []
+    if logger is None:
-    with open(yaml_file, "r", encoding="utf-8") as f:
+        logger = get_engine_logger()
    rules: List[Rule] = []
    path = Path(yaml_file)
    with path.open("r", encoding="utf-8") as f:
        data = yaml.safe_load(f)
-    for item in data:
+    if not isinstance(data, list):
-        rule = Rule(
+        logger.error("[Loader] Rules YAML must be a list of rule objects.")
-            name=item["name"],
+        raise ValueError("Rules YAML must be a list of rule objects.")
            description=item["description"],
            category=item["category"],
            rule_type=item.get("type", "regex"),
            pattern=item.get("pattern"),
        )
        rules.append(rule)
    idx = 0
    total = len(data)
    while idx < total:
        item = data[idx]
        name = item.get("name")
        description = item.get("description")
        category = item.get("category")
        rule_type = item.get("type", "regex")
        pattern = item.get("pattern")
        severity = item.get("severity")
        tags = item.get("tags")
        if not name or not description or not category:
            logger.warning(f"[Loader] Skipping invalid rule at index {idx}: missing required fields.")
            idx = idx + 1
            continue
        rule = Rule(
            name=name,
            description=description,
            category=category,
            rule_type=rule_type,
            pattern=pattern,
            function=None,  # function rules should be registered in code
            severity=severity,
            tags=tags if isinstance(tags, list) else None,
        )
        rules.append(rule)
        idx = idx + 1
    logger.info(f"[Loader] Loaded {len(rules)} rules from '{yaml_file}'.")
    return rules
--- a/app/utils/settings.py
+++ b/app/utils/settings.py
@@ -39,6 +39,10 @@ BASE_DIR = Path(__file__).resolve().parent.parent
 DEFAULT_SETTINGS_FILE = BASE_DIR / "config" / "settings.yaml"
 # ---------- CONFIG DATA CLASSES ----------
@dataclass
 class UIConfig:
    snippet_preview_len: int = 160
@dataclass
 class Cache_Config:
    whois_cache_days: int = 7
@@ -56,6 +60,7 @@ class AppConfig:
@dataclass
 class Settings:
    cache: Cache_Config = field(default_factory=Cache_Config)
    ui: UIConfig = field(default_factory=UIConfig)
    app: AppConfig = field(default_factory=AppConfig)
    @classmethod
--- a/docs/Feature
+++ b/docs/Feature
@@ -0,0 +1,31 @@
 # Feature Session Plan – SneakyScope
 **Feature:**
 * \[Short description of the feature or improvement]
 **Effort:**
 * Easy / Medium / Hard
 **Dependencies:**
 * \[List of prerequisites or related tasks that must be done first]
 **Design Notes:**
 * \[Goals, considerations, analyst/UX needs, edge cases, pitfalls to avoid]
 **Implementation Tasks:**
 * [ ] Step 1 – …
 * [ ] Step 2 – …
 * [ ] Step 3 – …
 **Validation / Testing:**
 * \[How we’ll verify it works — e.g., test cases, UI check, API output, logs]
 **Next Steps After Completion:**
 * \[What this unblocks or enables, i.e. the next feature/dependency]
--- a/docs/README.md
+++ b/docs/README.md
@@ -0,0 +1,6 @@
 # Roadmap and chats
 ## Vibecode? Brotha Ewww
 No, I don't "vibe code". There is a huge difference between asking AI to do everything and asking it for a boilerplate function, then tweaking from there. I've been coding for over 20 years; these fingers have typed enough. So yes, I use AI while I code.
 Some of the small files in here are just ways I've started using AI to keep me on track with the project. Feel free to borrow them.
--- a/docs/roadmap.md
+++ b/docs/roadmap.md
@@ -1,71 +1,32 @@
 # SneakyScope — Roadmap (Updated 8-20-25)
-## Priority 1 – Core Functionality / Stability
+## Priority 1 – Core Analysis / Stability
-**Permissions / Storage Paths**
+* Opt-in “fetch external scripts” mode (off by default): on submission, download external script content (size/time limits) and run rules on fetched content.
 * Remove remaining legacy form “flagged\_reasons” plumbing once all equivalent function rules are in place.
 * Unit tests: YAML compilation, function-rule adapters, and per-script/per-form rule cases.
-* ✅ `/data` and other mounted volumes setup handled by `sandbox.sh`
+## Priority 2 – API Layer
 * ✅ Downloads, screenshots, and HTML artifacts are written correctly (`safe_write` in `io_helpers.py`)
---
+* API endpoints: `/screenshot`, `/source`, `/analyse`.
 * OpenAPI spec: create `openapi/openapi.yaml` and serve at `/api/openapi.yaml`.
 * Docs UI: Swagger UI or Redoc at `/docs`.
-## Priority 2 – Data Accuracy / Enrichment
+## Priority 3 – UI / UX
-**WHOIS & GeoIP Enhancements**
+* Front page/input handling: auto-prepend `http://`/`https://`/`www.` for bare domains.
 * Source code viewer: embed page source in an editor view for readability.
 * Scripts table: toggle between “Only suspicious” and “All scripts”.
 * Rules Lab (WYSIWYG tester): paste a rule, validate/compile, run against sample text; lightweight nav entry.
-* ✅ Implemented Python-based WHOIS parsing with fallback to raw WHOIS text
+## Priority 4 – Artifact Management & Ops
 * ✅ Default `"Possible Privacy"` or `"N/A"` for missing WHOIS fields
 * ✅ GeoIP + ASN + ISP info displayed per IP in **accordion tables**
 * ✅ Cache WHOIS and GeoIP results to reduce repeated queries
-**Suspicious Scripts & Forms**
+* Retention/cleanup policy for old artifacts (age/size thresholds).
 * Make periodic maintenance scripts for storage; cleanup options set in `settings.yaml`.
 * Results caching UX: add “Re-run analysis” vs. “Load from cache” controls in the results UI.
-* [ ] Expand flagged script and form output with reasons for analysts
+## Priority 5 – Extras / Integrations
 * [ ] Show each check and if it triggered flags (pass/fail for each check)
-**Add Suspicious BEC words**
+* Bulk URL analysis (batch/queue).
-
+* Alerting & integrations (webhooks, Slack, email).
-* ✅ Look for things like `"reset password"`
+* Optional: analyst verdict tags and export (CSV/JSON).
 * ✅ Make configurable via a config file (yaml doc with rules)
 ---
 ## Priority 3 – User Interface / UX
 **Front Page / Input Handling**
 * [ ] Automatically prepend `http://`, `https://`, and/or `www.` if a user only enters a domain
 **Result Templates / Cards**
 * [ ] load sourcecode for webpage in a code editor view or code block on page so that it's easier to read
 * [ ] Update result cards with clear, analyst-friendly explanations
 * [ ] Include flagged logic and reason lists for scripts and forms
 * ✅ Display GeoIP results in accordion tables (✅ done)
 ---
 ## Priority 4 – API Layer
 **API Endpoints**
 * [ ] Add `/screenshot` endpoint
 * [ ] Add `/source` endpoint
 * [ ] Add `/analyse` endpoint
 **OpenAPI + Docs**
 * [ ] Create initial `openapi/openapi.yaml` spec file
 * [ ] Serve spec at `/api/openapi.yaml`
 * [ ] Wire up Swagger UI or Redoc at `/docs` for interactive API exploration
 ---
 ## Priority 5 – Optional / Cleanup
 **Artifact Management**
 * [ ] Implement saving of results from a UUID as "results.json" so we don't rerun all the rules and just load from cache.
 * [ ] Implement cleanup or retention policy for old artifacts
 * [ ] Optional: Add periodic maintenance scripts for storage
 **Extra Features**
 * [ ] Placeholder for additional features (e.g., bulk URL analysis, alerting, integrations)
--- a/docs/workflow.md
+++ b/docs/workflow.md
@@ -0,0 +1,12 @@
 ### 🛠 SneakyScope Feature Workflow
 1. Pick feature from roadmap
 2. Drop in **Feature Session Plan** template
 3. Fill in description, effort, dependencies, design notes
 4. Expand into tasks → implement code → test/validate
 5. Update roadmap (remove/complete, reorder if needed)
 ---
 This way, every session starts with the same rhythm, and we don’t lose context between chats.
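
For reviewers who want to sanity-check the engine changes in this patch without running the app, here is a minimal, self-contained sketch of the two behaviors the diff introduces: the loader rejects non-list YAML data and skips incomplete rules, and `run_all` filters by category and reports PASS/FAIL per rule. `SketchRule`, `load_rules`, and the standalone `run_all` are hypothetical stand-ins, not the project's `Rule`, `load_rules_from_yaml`, or engine method. The sketch takes already-parsed data instead of reading a YAML file, handles regex rules only, and (as a simplification not in the patch) also skips rules without a pattern.

```python
import re
from dataclasses import asdict, dataclass
from typing import Dict, List, Optional, Tuple


@dataclass
class SketchRule:
    """Hypothetical stand-in for the engine's Rule (regex rules only)."""
    name: str
    description: str
    category: str
    pattern: str
    severity: Optional[str] = None
    tags: Optional[List[str]] = None

    def run(self, text: str) -> Tuple[bool, Optional[str]]:
        # IGNORECASE follows the commit message's note on rule compilation.
        match = re.search(self.pattern, text, re.IGNORECASE)
        if match:
            return True, f"Matched {match.group(0)!r}"
        return False, None


def load_rules(data: object) -> List[SketchRule]:
    """Validate parsed rule data: must be a list; incomplete items are skipped."""
    if not isinstance(data, list):
        raise ValueError("Rules YAML must be a list of rule objects.")
    rules: List[SketchRule] = []
    required = ("name", "description", "category", "pattern")
    for item in data:
        if not isinstance(item, dict) or not all(item.get(k) for k in required):
            continue  # skip invalid entries instead of failing the whole load
        tags = item.get("tags")
        rules.append(
            SketchRule(
                name=item["name"],
                description=item["description"],
                category=item["category"],
                pattern=item["pattern"],
                severity=item.get("severity"),
                tags=tags if isinstance(tags, list) else None,
            )
        )
    return rules


def run_all(rules: List[SketchRule], text: str,
            category: Optional[str] = None) -> List[Dict]:
    """Evaluate rules against text; a match is reported as FAIL."""
    results: List[Dict] = []
    for rule in rules:
        if category is not None and rule.category != category:
            continue
        matched, reason = rule.run(text)
        entry = asdict(rule)
        entry.pop("pattern")  # keep the result shape close to the diff's RuleResult
        entry["result"] = "FAIL" if matched else "PASS"
        entry["reason"] = reason if matched else None
        results.append(entry)
    return results


parsed = [
    {"name": "suspicious_eval", "description": "Use of eval() in script",
     "category": "script", "pattern": r"\beval\(",
     "severity": "medium", "tags": ["obfuscation"]},
    {"name": "password_reset", "description": "Password reset wording",
     "category": "text", "pattern": "reset password"},
    {"description": "entry with no name, skipped by the loader"},
]
rules = load_rules(parsed)
results = run_all(rules, "var x = EVAL('1+1');", category="script")
```

With the `script` filter only `suspicious_eval` is evaluated, and it is reported as a FAIL because a match means the rule flagged the content. Rules that do not match carry `reason=None`, which keeps the result dicts JSON-serializable.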