feat: on-demand external script analysis + code viewer; refactor form analysis to rule engine

- API: add `POST /api/analyze_script` (app/blueprints/api.py)
  - Fetch one external script to artifacts, run rules, return findings + snippet
  - Uses new ExternalScriptFetcher (results_path aware) and job UUID
  - Returns: { ok, final_url, status_code, bytes, truncated, sha256, artifact_path, findings[], snippet, snippet_len }
  - TODO: document in openapi/openapi.yaml
- Fetcher: update `app/utils/external_fetch.py`
  - Constructed with `results_path` (UUID dir); writes to `<results_path>/scripts/fetched/<index>.js`
  - Loads settings via `get_settings()`, logs via std logging
- UI (results.html):
  - Move “Analyze external script” action into **Content Snippet** column for external rows
  - Clicking replaces button with `<details>` snippet, shows rule matches, and adds “open in viewer” link
  - Robust fetch handler (checks JSON, shows errors); builds viewer URL from absolute artifact path
- Viewer:
  - New route: `GET /view/artifact/<run_uuid>/<path:filename>` (app/blueprints/ui.py)
  - New template: Monaco-based read-only code viewer (viewer.html)
  - Removes SRI on loader to avoid integrity block; loads file via `raw_url` and detects language by extension
- Forms:
  - Refactor `analyze_forms` to mirror scripts analysis:
    - Uses rule engine (`category == "form"`) across regex/function rules
    - Emits rows only when matches exist
    - Includes `content_snippet`, `action`, `method`, `inputs`, `rules`
  - Replace legacy plumbing (`flagged`, `flag_reasons`, `status`) in output
  - Normalize form function rules to canonical returns `(bool, Optional[str])`:
    - `form_action_missing`
    - `form_http_on_https_page`
    - `form_submits_to_different_host`
  - Add minor hardening (lowercasing hosts, no-op actions, clearer reasons)
- CSS: add `.forms-table` to mirror `.scripts-table` (5 columns)
  - Fixed table layout, widths per column, chip/snippet styling, responsive tweaks
- Misc:
  - Fix “working outside app context” issue by avoiding `current_app` at import time (left storage logic inside routes)
  - Add “View Source” link to open page source in viewer

Refs:
- Roadmap: mark “Source code viewer” done; keep TODO to add `/api/analyze_script` to OpenAPI
2025-08-21 15:32:24 -05:00
parent 05cf23ad67
commit 3a24b392f2
15 changed files with 1192 additions and 218 deletions
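
Until `/api/analyze_script` is documented in OpenAPI, a client call can be sketched roughly as below. This is a minimal illustration, not part of the commit: the helper names `build_analyze_request` and `summarize_response` and the base URL are hypothetical. Only the endpoint path, the request keys (`job_id`, `url`), and the response keys (`ok`, `error`, `bytes`, `truncated`, `findings`) come from the commit message and the blueprint docstring.

```python
import json
import urllib.request


def build_analyze_request(base_url: str, job_id: str, script_url: str) -> urllib.request.Request:
    """Build (but do not send) a POST to /api/analyze_script.

    base_url is an assumption about where the app is served.
    """
    body = json.dumps({"job_id": job_id, "url": script_url}).encode("utf-8")
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/api/analyze_script",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def summarize_response(payload: dict) -> str:
    """Turn the endpoint's JSON payload into a one-line summary."""
    if not payload.get("ok"):
        return "error: " + str(payload.get("error") or "unknown")
    # The endpoint only returns matched rules in "findings".
    names = [f.get("name") or "rule" for f in payload.get("findings") or []]
    suffix = " (truncated)" if payload.get("truncated") else ""
    listed = ", ".join(names) or "none"
    return f"{payload.get('bytes')} bytes{suffix}, {len(names)} finding(s): {listed}"


# Sending the request needs a running instance, for example:
#   req = build_analyze_request("http://localhost:8000", job_id, script_url)
#   with urllib.request.urlopen(req, timeout=10) as resp:
#       print(summarize_response(json.load(resp)))
```

Because the endpoint answers 400 or 502 on failure, `urllib.request.urlopen` raises `urllib.error.HTTPError` in those cases; the JSON error body can still be read from the exception object.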
--- a/app/init.py
+++ b/app/init.py
@@ -19,7 +19,8 @@ from .rules.function_rules import (
    form_action_missing,
 )

-from . import routes  # blueprint
+from app.blueprints import ui  # ui blueprint
+from app.blueprints import api  # api blueprint

 # from .utils import io_helpers  # if need logging/setup later
 # from .utils import cache_db     # available for future injections
@@ -136,7 +137,8 @@ def create_app() -> Flask:
    app.config["APP_VERSION"] = f"v{settings.app.version_major}.{settings.app.version_minor}"

    # Register blueprints
-    app.register_blueprint(routes.bp)
+    app.register_blueprint(ui.bp)
+    app.register_blueprint(api.api_bp)

    # Example log lines so we know we booted cleanly
    app.logger.info(f"SneakyScope started: {app.config['APP_NAME']} {app.config['APP_VERSION']}")
--- a/app/blueprints/api.py
+++ b/app/blueprints/api.py
@@ -0,0 +1,212 @@
+# app/blueprints/api.py
+"""
+API blueprint for JSON endpoints.
+
+Endpoints:
+  POST /api/analyze_script
+    Body:
+      {
+        "job_id": "<uuid>",           # or "uuid": "<uuid>"
+        "url": "https://cdn.example.com/app.js",
+        "category": "script"          # optional, defaults to "script"
+      }
+    Response:
+      {
+        "ok": true,
+        "final_url": "...",
+        "status_code": 200,
+        "bytes": 12345,
+        "truncated": false,
+        "sha256": "...",
+        "artifact_path": "/abs/path/to/<uuid>/scripts/fetched/<index>.js",
+        "findings": [ { "name": "...", "description": "...", "severity": "...", "tags": [...], "reason": "..." }, ... ],
+        "snippet": "<trimmed content>",
+        "snippet_len": 45678
+      }
+"""
+
+import os
+import time
+from flask import Blueprint, request, jsonify, current_app, send_file, abort
+from pathlib import Path
+
+from app.utils.settings import get_settings
+from app.utils.external_fetcher import ExternalScriptFetcher
+from werkzeug.exceptions import HTTPException
+
+api_bp = Blueprint("api", __name__, url_prefix="/api")
+
+
+def _resolve_results_path(job_id: str) -> str:
+    """
+    Compute the absolute results directory for a given job UUID.
+    Resolves to <BASE>/<uuid> and creates the directory if it does not exist yet.
+    """
+    base_dir = "/data"
+
+    candidate_with_artifacts = os.path.join(base_dir, job_id)
+    if os.path.isdir(candidate_with_artifacts):
+        return candidate_with_artifacts
+
+    fallback = os.path.join(base_dir, job_id)
+    os.makedirs(fallback, exist_ok=True)
+    return fallback
+
+
+def _make_snippet(text: str, max_chars: int = 1200) -> str:
+    """Produce a trimmed, safe-to-render snippet of the script contents."""
+    if not text:
+        return ""
+    snippet = text.strip()
+    return (snippet[:max_chars] + "…") if len(snippet) > max_chars else snippet
+
+@api_bp.errorhandler(400)
+@api_bp.errorhandler(403)
+@api_bp.errorhandler(404)
+@api_bp.errorhandler(405)
+def _api_err(err):
+    """
+    Return JSON for common client errors.
+    """
+    if isinstance(err, HTTPException):
+        code = err.code
+        name = (err.name or "error").lower()
+    else:
+        code = 400
+        name = "error"
+    return jsonify({"ok": False, "error": name}), code
+
+
+@api_bp.errorhandler(500)
+def _api_500(err):
+    """
+    Return JSON for server errors and log the exception.
+    """
+    try:
+        current_app.logger.exception("API 500")
+    except Exception:
+        pass
+    return jsonify({"ok": False, "error": "internal server error"}), 500
+
+
+@api_bp.post("/analyze_script")
+def analyze_script():
+    """
+    Analyze EXACTLY one external script URL for a given job UUID.
+
+    Expected JSON body:
+      { "job_id": "<uuid>", "url": "https://cdn.example.com/app.js", "category": "script" }
+    """
+    body = request.get_json(silent=True) or {}
+
+    job_id_raw = body.get("job_id") or body.get("uuid")
+    script_url_raw = body.get("url")
+    category = str(body.get("category") or "script").strip() or "script"  # default to "script"
+
+    job_id = (job_id_raw or "").strip() if isinstance(job_id_raw, str) else ""
+    script_url = (script_url_raw or "").strip() if isinstance(script_url_raw, str) else ""
+    
+    # log this request
+    current_app.logger.info(f"Got request to analyze {script_url} via API ")
+
+    if not job_id or not script_url:
+        return jsonify({"ok": False, "error": "Missing job_id (or uuid) or url"}), 400
+
+    settings = get_settings()
+    
+    if not settings.external_fetch.enabled:
+        return jsonify({"ok": False, "error": "Feature disabled"}), 400
+
+    # Resolve the UUID-backed results directory for this run.
+    results_path = _resolve_results_path(job_id)
+
+    # Initialize the fetcher; it reads its own settings internally.
+    fetcher = ExternalScriptFetcher(results_path=results_path)
+
+    # Unique index for the saved file name: <results_path>/scripts/fetched/<index>.js
+    unique_index = int(time.time() * 1000)
+
+    outcome = fetcher.fetch_one(script_url=script_url, index=unique_index)
+    if not outcome.ok or not outcome.saved_path:
+        return jsonify({
+            "ok": False,
+            "error": outcome.reason,
+            "status_code": outcome.status_code,
+            "final_url": outcome.final_url
+        }), 502
+
+    # Read bytes and decode to UTF-8 for rules and snippet
+    try:
+        with open(outcome.saved_path, "rb") as fh:
+            js_text = fh.read().decode("utf-8", errors="ignore")
+    except Exception:
+        js_text = ""
+
+    # Pull the rules engine from the app (prefer attribute, then config).
+    findings = []
+    try:
+        engine = getattr(current_app, "rule_engine", None)
+        if engine is None:
+            engine = current_app.config.get("RULE_ENGINE")
+    except Exception:
+        engine = None
+
+    if engine is not None and hasattr(engine, "run_all"):
+        try:
+            # run_all returns PASS/FAIL for each rule; we only surface FAIL (matched) to the UI
+            all_results = engine.run_all(js_text, category=category)
+            if isinstance(all_results, list):
+                matched = []
+                for r in all_results:
+                    try:
+                        if (r.get("result") == "FAIL"):
+                            matched.append({
+                                "name": r.get("name"),
+                                "description": r.get("description"),
+                                "severity": r.get("severity"),
+                                "tags": r.get("tags") or [],
+                                "reason": r.get("reason"),
+                                "category": r.get("category"),
+                            })
+                    except Exception:
+                        # Ignore malformed entries
+                        continue
+                findings = matched
+        except Exception as exc:
+            try:
+                current_app.logger.error("Rule engine error", extra={"error": str(exc)})
+            except Exception:
+                pass
+            findings = []
+
+    snippet = _make_snippet(js_text, max_chars=settings.ui.snippet_preview_len)
+
+    return jsonify({
+        "ok": True,
+        "final_url": outcome.final_url,
+        "status_code": outcome.status_code,
+        "bytes": outcome.bytes_fetched,
+        "truncated": outcome.truncated,
+        "sha256": outcome.sha256_hex,
+        "artifact_path": outcome.saved_path,
+        "findings": findings,         # only FAILed rules
+        "snippet": snippet,
+        "snippet_len": len(js_text)
+    })
+
+
+@api_bp.get("/artifacts/<run_uuid>/<filename>")
+def get_artifact_raw(run_uuid, filename):
+    # prevent path traversal
+    if "/" in filename or ".." in filename:
+        abort(400)
+    
+    run_dir = _resolve_results_path(run_uuid)
+    full_path = Path(run_dir) / filename
+
+    # if file is not there, give a 404
+    if not os.path.isfile(full_path):
+        abort(404)
+
+    # else return file
+    return send_file(full_path, as_attachment=False)
--- a/app/blueprints/ui.py
+++ b/app/blueprints/ui.py
@@ -1,3 +1,5 @@
+# app/blueprints/ui.py
+
 import os
 import json
 import asyncio
@@ -5,11 +7,10 @@ from pathlib import Path
 from datetime import datetime
 from flask import Blueprint, render_template, request, redirect, url_for, flash, current_app, send_file, abort

-# from .browser import fetch_page_artifacts
-from .utils.browser import get_browser
-from .utils.enrichment import enrich_url
-from .utils.settings import get_settings
-from .utils.io_helpers import get_recent_results
+from app.utils.browser import get_browser
+from app.utils.enrichment import enrich_url
+from app.utils.settings import get_settings
+from app.utils.io_helpers import get_recent_results

 bp = Blueprint("main", __name__)

@@ -34,9 +35,6 @@ def index():
    The number of recent runs is controlled via settings.cache.recent_runs_count (int).
    Falls back to 10 if not present or invalid.
    """
-    # Resolve SANDBOX_STORAGE from app config
-    storage = Path(current_app.config["SANDBOX_STORAGE"]).resolve()
-
    # Pull recent count from settings with a safe fallback
    try:
        # settings is already initialized at module import in your file
@@ -46,13 +44,15 @@ def index():
    except Exception:
        recent_count = 10

+    # Resolve SANDBOX_STORAGE from app config
+    storage = Path(current_app.config["SANDBOX_STORAGE"]).resolve()
+
    # Build the recent list (non-fatal if storage is empty or unreadable)
    recent_results = get_recent_results(storage, recent_count, current_app.logger)

    # Pass to template; your index.html will hide the card if list is empty
    return render_template("index.html", recent_results=recent_results)

-
@bp.route("/analyze", methods=["POST"])
 def analyze():
    url = request.form.get("url", "").strip()
@@ -87,6 +87,7 @@ def analyze():

@bp.route("/results/<run_uuid>", methods=["GET"])
 def view_result(run_uuid: str):
+    # Resolve SANDBOX_STORAGE from app config
    storage = Path(current_app.config["SANDBOX_STORAGE"]).resolve()
    run_dir = storage / run_uuid
    results_path = run_dir / "results.json"
@@ -105,6 +106,7 @@ def view_result(run_uuid: str):

@bp.route("/artifacts/<run_uuid>/<filename>", methods=["GET"])
 def artifacts(run_uuid: str, filename: str):
+    # Resolve SANDBOX_STORAGE from app config
    storage = Path(current_app.config["SANDBOX_STORAGE"]).resolve()
    run_dir = storage / run_uuid
    full_path = run_dir / filename
@@ -123,3 +125,11 @@ def artifacts(run_uuid: str, filename: str):
    return send_file(full_path)


+@bp.get("/view/artifact/<run_uuid>/<filename>")
+def view_artifact(run_uuid, filename):
+    # Build the raw URL that streams the file (served by api.get_artifact_raw)
+    raw_url = url_for('api.get_artifact_raw', run_uuid=run_uuid, filename=filename)
+    # Optional: derive language server-side if you prefer
+    language = None  # e.g., 'javascript'
+    return render_template('viewer.html', filename=filename, raw_url=raw_url, language=language)
+
--- a/app/config/settings.yaml
+++ b/app/config/settings.yaml
@@ -8,5 +8,11 @@ cache:
  whois_cache_days: 7
  geoip_cache_days: 7

+external_script_fetch: 
+  enabled: True
+  max_total_mb: 5 
+  max_time_ms: 3000 
+  max_redirects: 3
+
 ui:
  snippet_preview_len: 300
--- a/app/rules/function_rules.py
+++ b/app/rules/function_rules.py
@@ -22,6 +22,7 @@ from __future__ import annotations
 from typing import Any, Dict, Optional
 from urllib.parse import urlparse

+_NOOP_ACTIONS = {"", "#", "javascript:void(0)", "javascript:void(0);"}

 # ---------------------------------------------------------------------------
 # Adapters
@@ -169,35 +170,48 @@ def script_third_party_host(facts: Dict[str, Any]):

 # ---------------- Form rules ----------------

-def form_submits_to_different_host(facts: Dict[str, Any]):
-    """Flags <form> actions that submit to a different hostname than the page."""
-    base_host = facts.get("base_hostname") or ""
-    action = facts.get("action") or ""
-    try:
-        action_host = urlparse(action).hostname
-        if action_host and base_host and action_host != base_host:
-            return True, "Form submits to a different host"
-    except Exception:
-        # Parsing failed; treat as no match rather than erroring out
-        pass
+def form_action_missing(facts: Dict[str, Any]):
+    """Flags <form> elements with no meaningful action attribute."""
+    action = (facts.get("action") or "").strip()
+    if action in _NOOP_ACTIONS:
+        return True, "Form has no action attribute (or uses a no-op action)"
    return False, None


 def form_http_on_https_page(facts: Dict[str, Any]):
    """Flags forms submitting over HTTP while the page was loaded over HTTPS."""
-    base_url = facts.get("base_url") or ""
-    action = facts.get("action") or ""
+    base_url = (facts.get("base_url") or "").strip()
+    action   = (facts.get("action") or "").strip()
+
    try:
-        if urlparse(base_url).scheme == "https" and urlparse(action).scheme == "http":
-            return True, "Form submits over insecure HTTP"
+        base_scheme = (urlparse(base_url).scheme or "").lower()
+        parsed_act  = urlparse(action)
+        act_scheme  = (parsed_act.scheme or "").lower()
    except Exception:
-        pass
+        return False, None  # parsing trouble → don’t flag
+
+    # Only flag absolute http:// actions on https pages.
+    # Relative or schemeless ('//host/...') isn’t flagged here (it won’t be HTTP on an HTTPS page).
+    if base_scheme == "https" and act_scheme == "http":
+        return True, f"Submits over insecure HTTP (action={parsed_act.geturl()})"
    return False, None


-def form_action_missing(facts: Dict[str, Any]):
-    """Flags <form> elements with no action attribute."""
-    action = (facts.get("action") or "").strip()
-    if not action:
-        return True, "Form has no action attribute"
+def form_submits_to_different_host(facts: Dict[str, Any]):
+    """Flags <form> actions that submit to a different hostname than the page."""
+    base_host = (facts.get("base_hostname") or "").strip().lower()
+    action    = (facts.get("action") or "").strip()
+
+    if not action or action in _NOOP_ACTIONS:
+        return False, None
+
+    try:
+        parsed = urlparse(action)
+        act_host = (parsed.hostname or "").lower()
+    except Exception:
+        return False, None
+
+    # Only compare when the action specifies a host (absolute URL or schemeless //host/path).
+    if act_host and base_host and act_host != base_host:
+        return True, f"Submits to a different host ({act_host} vs {base_host})"
    return False, None
--- a/app/static/style.css
+++ b/app/static/style.css
@@ -279,6 +279,7 @@ details ul, details p {
  }
 }

+/* SCRIPTS TABLE */
 .scripts-table td ul {
  margin: 0.25rem 0 0.25rem 1rem;
  padding-left: 1rem;
@@ -305,6 +306,59 @@ details ul, details p {
  white-space: nowrap;
 }

+
+/* lists & small text inside cells */
+.forms-table td ul {
+  margin: 0.25rem 0 0.25rem 1rem;
+  padding-left: 1rem;
+}
+.forms-table td small {
+  opacity: 0.85;
+}
+
+/* keep the table from exploding */
+.forms-table {
+  table-layout: fixed;
+  width: 100%;
+}
+
+/* columns: Action | Method | Inputs | Matches | Form Snippet */
+.forms-table th:nth-child(1) { width: 15rem; } /* Action */
+.forms-table th:nth-child(2) { width: 5rem;  } /* Method */
+.forms-table th:nth-child(3) { width: 15rem; } /* Inputs */
+.forms-table th:nth-child(5) { width: 24rem; } /* Snippet */
+.forms-table th:nth-child(4) { width: auto;  } /* Matches grows */
+
+/* ellipsize cells by default */
+.forms-table td,
+.forms-table th {
+  overflow: hidden;
+  text-overflow: ellipsis;
+  white-space: nowrap;
+}
+
+/* nicer wrapping inside snippet/details & input chips */
+.forms-table details { white-space: normal; }
+.forms-table details > pre.code {
+  white-space: pre-wrap;     /* let long lines wrap */
+  max-height: 28rem;
+  overflow: auto;
+}
+.forms-table .chips {
+  display: flex;
+  gap: 0.25rem;
+  flex-wrap: wrap;
+  white-space: normal;       /* allow chip text to wrap if needed */
+}
+
+/* (optional) responsive tweaks */
+@media (max-width: 1200px) {
+  .forms-table th:nth-child(1) { width: 22rem; }
+  .forms-table th:nth-child(3) { width: 16rem; }
+  .forms-table th:nth-child(5) { width: 18rem; }
+}
+
+
 /* let URLs/snippets wrap *inside* their cell when expanded content shows */
 .breakable {
  white-space: normal;
--- a/app/templates/base.html
+++ b/app/templates/base.html
@@ -31,3 +31,6 @@
    </footer>
  </body>
 </html>
+
+{% block page_js %}
+{% endblock %}
--- a/app/templates/index.html
+++ b/app/templates/index.html
@@ -90,6 +90,9 @@
  100% { transform: rotate(360deg); }
 }
 </style>
+{% endblock %}
+
+{% block page_js %}

 <script>
 const form = document.getElementById('analyze-form');
--- a/app/templates/result.html
+++ b/app/templates/result.html
@@ -79,21 +79,6 @@
    {% endfor %}
    {% endif %}

-    <!-- BEC Words -->
-    {% if enrichment.bec_words %}
-    <h3>BEC Words Detected</h3>
-    <table class="enrichment-table">
-        <thead>
-            <tr><th>Word</th></tr>
-        </thead>
-        <tbody>
-            {% for word in enrichment.bec_words %}
-            <tr><td>{{ word }}</td></tr>
-            {% endfor %}
-        </tbody>
-    </table>
-    {% endif %}
-
    {% if not enrichment.whois and not enrichment.raw_whois and not enrichment.geoip and not enrichment.bec_words %}
    <p>No enrichment data available.</p>
    {% endif %}
@@ -129,90 +114,131 @@

 <!-- Forms -->
 <div class="card" id="forms">
-    <h2>Forms</h2>
-    {% if forms %}
-    {% for form in forms %}
-    <details class="card {% if form.flagged %}flagged{% endif %}" style="padding:0.5rem; margin-bottom:0.5rem;">
-        <summary>{{ form.status }} — Action: {{ form.action }} ({{ form.method | upper }})</summary>
-        <table class="enrichment-table">
-            <thead>
-                <tr>
-                    <th>Input Name</th>
-                    <th>Type</th>
-                </tr>
-            </thead>
-            <tbody>
-                {% for inp in form.inputs %}
-                <tr>
-                    <td>{{ inp.name }}</td>
-                    <td>{{ inp.type }}</td>
-                </tr>
+  <h2>Forms</h2>
+
+  {% if forms and forms|length > 0 %}
+    <table class="enrichment-table forms-table">
+      <thead>
+        <tr>
+          <th>Action</th>
+          <th>Method</th>
+          <th>Inputs</th>
+          <th>Matches (Rules)</th>
+          <th>Form Snippet</th>
+        </tr>
+      </thead>
+      <tbody>
+        {% for f in forms %}
+        <tr>
+          <!-- Action -->
+          <td class="breakable">
+            {% if f.action %}
+              {{ f.action[:25] }}{% if f.action|length > 25 %}…{% endif %}
+            {% else %}
+              <span class="text-dim">(no action)</span>
+            {% endif %}
+          </td>
+
+          <!-- Method -->
+          <td>{{ (f.method or 'get')|upper }}</td>
+
+          <!-- Inputs -->
+          <td>
+            {% if f.inputs and f.inputs|length > 0 %}
+              <div class="chips">
+                {% for inp in f.inputs %}
+                  <span class="chip" title="{{ (inp.name or '') ~ ' : ' ~ (inp.type or 'text') }}">
+                    {{ inp.name or '(unnamed)' }}<small> : {{ (inp.type or 'text') }}</small>
+                  </span>
                {% endfor %}
-            </tbody>
-        </table>
-        {% if form.flagged %}
-        <p><strong>Flag Reasons:</strong></p>
-        <ul>
-            {% for reason in form.flag_reasons %}
-            <li>{{ reason }}</li>
-            {% endfor %}
-        </ul>
-        {% endif %}
-    </details>
-    {% endfor %}
-    {% else %}
-    <p>No forms detected.</p>
-    {% endif %}
-    <p><a href="#top-jump-list">Back to top</a></p>
+              </div>
+            {% else %}
+              <span class="text-dim">None</span>
+            {% endif %}
+          </td>
+
+          <!-- Matches (Rules) -->
+          <td>
+            {% if f.rules and f.rules|length > 0 %}
+              <ul>
+                {% for r in f.rules %}
+                  <li title="{{ r.description or '' }}">
+                    {{ r.name }}
+                    {% if r.severity %}
+                      <span class="badge sev-{{ r.severity|lower }}">{{ r.severity|title }}</span>
+                    {% endif %}
+                    {% if r.tags %}
+                      {% for t in r.tags %}
+                        <span class="chip" title="Tag: {{ t }}">{{ t }}</span>
+                      {% endfor %}
+                    {% endif %}
+                    {% if r.description %}
+                      <small> — {{ r.description }}</small>
+                    {% endif %}
+                  </li>
+                {% endfor %}
+              </ul>
+            {% else %}
+              <span class="text-dim">N/A</span>
+            {% endif %}
+          </td>
+
+          <!-- Form Snippet -->
+          <td>
+            {% if f.content_snippet %}
+              <details>
+                <summary>View snippet ({{ f.content_snippet|length }} chars)</summary>
+                <pre class="code">{{ f.content_snippet }}</pre>
+              </details>
+            {% else %}
+              <span class="text-dim">N/A</span>
+            {% endif %}
+          </td>
+        </tr>
+        {% endfor %}
+      </tbody>
+    </table>
+  {% else %}
+    <p class="text-dim">No form issues detected.</p>
+  {% endif %}
+
+  <p><a href="#top-jump-list">Back to top</a></p>
 </div>

+
 <!-- Suspicious Scripts -->
 <div class="card" id="scripts">
  <h2>Suspicious Scripts</h2>

  {% if suspicious_scripts %}
  <table class="enrichment-table scripts-table">
-    <thead>
-      <tr>
-        <th>Type</th>
-        <th>Source URL</th>
-        <th>Content Snippet</th>
-        <th>Matches (Rules & Heuristics)</th>
-      </tr>
-    </thead>
-    <tbody>
-      {% for s in suspicious_scripts %}
-      <tr>
-        <!-- Type -->
-        <td>{{ s.type or 'unknown' }}</td>
+  <thead>
+    <tr>
+      <th>Type</th>
+      <th>Source URL</th>
+      <th>Matches (Rules & Heuristics)</th>
+      <th>Content Snippet</th>
+    </tr>
+  </thead>
+  <tbody>
+    {% for s in suspicious_scripts %}
+    <tr>
+      <!-- Type -->
+      <td>{{ s.type or 'unknown' }}</td>

-        <!-- Source URL -->
-        <td class="breakable">
-          {% if s.src %}
-            <a href="{{ s.src }}" target="_blank">{{ s.src[:50] }}</a>
-          {% else %}
-            N/A
-          {% endif %}
-        </td>
+      <!-- Source URL -->
+      <td class="breakable">
+        {% if s.src %}
+          <a href="{{ s.src }}" target="_blank" rel="noopener">{{ s.src[:50] }}</a>
+        {% else %} N/A {% endif %}
+      </td>

-        <!-- Inline content snippet (collapsible) -->
-        <td>
-          {% if s.content_snippet %}
-            <details>
-              <summary>View snippet ({{ s.content_snippet|length }} chars) </summary>
-              <pre class="code">({{ s.content_snippet}}</pre>
-            </details>
-          {% else %}
-            N/A
-          {% endif %}
-        </td>
+      <!-- Matches (Rules & Heuristics) -->
+      <td data-role="matches-cell">
+        {% set has_rules = s.rules and s.rules|length > 0 %}
+        {% set has_heur = s.heuristics and s.heuristics|length > 0 %}

-        <!-- Rules & Heuristics -->
-        <td>
-          {% set has_rules = s.rules and s.rules|length > 0 %}
-          {% set has_heur = s.heuristics and s.heuristics|length > 0 %}
-
-          {% if has_rules %}
+        {% if has_rules %}
          <strong>Rules</strong>
          <ul>
            {% for r in s.rules %}
@@ -234,23 +260,45 @@
          </ul>
        {% endif %}

-          {% if has_heur %}
-            <strong>Heuristics</strong>
-            <ul>
-              {% for h in s.heuristics %}
-                <li>{{ h }}</li>
-              {% endfor %}
-            </ul>
-          {% endif %}
+        {% if has_heur %}
+          <strong>Heuristics</strong>
+          <ul>
+            {% for h in s.heuristics %}
+              <li>{{ h }}</li>
+            {% endfor %}
+          </ul>
+        {% endif %}

-          {% if not has_rules and not has_heur %}
-            N/A
+        {% if not has_rules and not has_heur %}
+          <span class="text-dim">N/A</span>
+        {% endif %}
+      </td>
+
+      <!-- Content Snippet (reused for Analyze button / dynamic snippet) -->
+      <td data-role="snippet-cell">
+        {% if s.content_snippet %}
+          <details>
+            <summary>View snippet ({{ s.content_snippet|length }} chars)</summary>
+            <pre class="code">{{ s.content_snippet }}</pre>
+          </details>
+        {% else %}
+          {% if s.type == 'external' and s.src %}
+            <button
+              type="button"
+              class="btn btn-sm btn-primary btn-analyze-snippet"
+              data-url="{{ s.src }}"
+              data-job="{{ uuid }}">Analyze external script</button>
+          {% else %}
+            <span class="text-dim">N/A</span>
          {% endif %}
-        </td>
-      </tr>
-      {% endfor %}
+        {% endif %}
+      </td>
+    </tr>
+    {% endfor %}
    </tbody>
  </table>
+
+
  {% else %}
    <p>No suspicious scripts detected.</p>
  {% endif %}
@@ -269,8 +317,154 @@
 <!-- Source -->
 <div class="card" id="source">
    <h2>Source</h2>
-    <p><a href="{{ url_for('main.artifacts', run_uuid=uuid, filename='source.txt') }}" target="_blank">View Source</a></p>
+    <p><a href="{{ url_for('main.view_artifact', run_uuid=uuid, filename='source.html') }}" target="_blank">View Source</a></p>
    <p><a href="#top-jump-list">Back to top</a></p>
 </div>

+{% endblock %}
+
+{% block page_js %}
+<script>
+/**
+ * From an absolute artifact path like:
+ *   /data/<uuid>/scripts/fetched/0.js
+ *   /data/<uuid>/1755803694244.js
+ *   C:\data\<uuid>\1755803694244.js
+ * return { uuid, rel } where rel is the path segment(s) after the uuid.
+ */
+function parseArtifactPath(artifactPath) {
+  if (!artifactPath) return { uuid: null, rel: null };
+  const norm = String(artifactPath).replace(/\\/g, '/'); // windows -> posix
+  const re = /\/([0-9a-fA-F]{8}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{12})\/(.+)$/;
+  const m = norm.match(re);
+  if (!m) return { uuid: null, rel: null };
+  return { uuid: m[1], rel: m[2] };
+}
+
+/** Build /view/artifact/<uuid>/<path:filename> */
+function buildViewerUrlFromAbsPath(artifactPath) {
+  const { uuid, rel } = parseArtifactPath(artifactPath);
+  if (!uuid || !rel) return '#';
+  const encodedRel = rel.split('/').map(encodeURIComponent).join('/');
+  return `/view/artifact/${encodeURIComponent(uuid)}/${encodedRel}`;
+}
+
+document.addEventListener('click', function (e) {
+  const btn = e.target.closest('.btn-analyze-snippet');
+  if (!btn) return;
+
+  const row = btn.closest('tr');
+  const snippetCell = btn.closest('[data-role="snippet-cell"]') || btn.parentElement;
+  const matchesCell = row ? row.querySelector('[data-role="matches-cell"]') : null;
+
+  const url = btn.dataset.url;
+  const job = btn.dataset.job;
+
+  // Replace button with a lightweight loading text
+  const loading = document.createElement('span');
+  loading.className = 'text-dim';
+  loading.textContent = 'Analyzing…';
+  btn.replaceWith(loading);
+
+  fetch('/api/analyze_script', {
+    method: 'POST',
+    headers: { 'Content-Type': 'application/json' }, // include CSRF header if applicable
+    body: JSON.stringify({ job_id: job, url: url})
+  })
+  .then(r => r.json())
+  .then(data => {
+    if (!data.ok) {
+      loading.textContent = 'Error: ' + (data.error || 'Unknown');
+      return;
+    }
+
+    // --- Build the snippet details element ---
+    const snippetText = data.snippet || '';              // backend should return a preview
+    const snippetLen  = data.snippet_len || snippetText.length;
+    
+    // --- File path / viewer things
+    const filepath = data.artifact_path || ''; // e.g., "/data/3ec90584-076e-457c-924b-861be7e11a34/1755803694244.js"
+    const viewerUrl = buildViewerUrlFromAbsPath(filepath);
+
+
+    const details = document.createElement('details');
+    const summary = document.createElement('summary');
+    summary.textContent = 'View snippet (' + data.snippet_len + ' chars' +  (data.truncated ? ', truncated' : '') + ', ' + data.bytes + ' bytes)';
+
+    const pre = document.createElement('pre');
+    pre.className = 'code';
+    pre.textContent = snippetText; // textContent preserves literal code safely
+
+    // put things in the DOM
+    details.appendChild(summary);
+    details.appendChild(pre);
+
+    const link = document.createElement('a');
+    link.href = viewerUrl;
+    link.target = '_blank';
+    link.rel = 'noopener';
+    link.textContent = 'open in viewer';
+
+    summary.appendChild(document.createElement('br')); // line break under the summary text
+    summary.appendChild(link);
+
+    loading.replaceWith(details);
+
+    // Replace "Analyzing…" with the new details block
+    loading.replaceWith(details);
+
+    // --- Update the Matches cell with rule findings ---
+    if (matchesCell) {
+      if (Array.isArray(data.findings) && data.findings.length) {
+        const frag = document.createDocumentFragment();
+        const strong = document.createElement('strong');
+        strong.textContent = 'Rules';
+        const ul = document.createElement('ul');
+
+        data.findings.forEach(function (f) {
+          const li = document.createElement('li');
+          const name = f.name || 'Rule';
+          const desc = f.description ? ' — ' + f.description : '';
+          li.textContent = name + desc;
+
+          // Optional badges for severity/tags if present
+          if (f.severity) {
+            const badge = document.createElement('span');
+            badge.className = 'badge sev-' + String(f.severity).toLowerCase();
+            badge.textContent = String(f.severity).charAt(0).toUpperCase() + String(f.severity).slice(1);
+            li.appendChild(document.createTextNode(' '));
+            li.appendChild(badge);
+          }
+          if (Array.isArray(f.tags)) {
+            f.tags.forEach(function (t) {
+              const chip = document.createElement('span');
+              chip.className = 'chip';
+              chip.title = 'Tag: ' + t;
+              chip.textContent = t;
+              li.appendChild(document.createTextNode(' '));
+              li.appendChild(chip);
+            });
+          }
+
+          ul.appendChild(li);
+        });
+
+        frag.appendChild(strong);
+        frag.appendChild(ul);
+
+        // Replace placeholder N/A or existing heuristics-only content
+        matchesCell.innerHTML = '';
+        matchesCell.appendChild(frag);
+      } else {
+        matchesCell.innerHTML = '<span class="text-dim">No rule matches.</span>';
+      }
+    }
+  })
+  .catch(function (err) {
+    loading.textContent = 'Request failed: ' + err;
+  });
+});
+</script>
+
+
 {% endblock %}
--- a/app/templates/viewer.html
+++ b/app/templates/viewer.html
@@ -0,0 +1,111 @@
+{% extends "base.html" %}
+{% block content %}
+<div style="max-width:1100px;margin:0 auto;padding:1rem 1.25rem;">
+  <header style="display:flex;align-items:center;justify-content:space-between;gap:1rem;flex-wrap:wrap;">
+    <div>
+      <h2 style="margin:0;font-size:1.1rem;">Code Viewer</h2>
+      <div class="text-dim" style="font-size:0.9rem;">
+        <strong>File:</strong> <span id="fileName">{{ filename }}</span>
+      </div>
+    </div>
+    <div style="display:flex;gap:.5rem;align-items:center;">
+      <button id="copyBtn" class="btn btn-sm">Copy</button>
+      <button id="wrapBtn" class="btn btn-sm">Toggle wrap</button>
+      <a id="openRaw" class="btn btn-sm" href="{{ raw_url }}" target="_blank" rel="noopener">Open raw</a>
+      <a id="downloadRaw" class="btn btn-sm" href="{{ raw_url }}" download>Download</a>
+    </div>
+  </header>
+
+  <div id="viewerStatus" class="text-dim" style="margin:.5rem 0 .75rem;"></div>
+  <div id="editor" style="height:72vh;border:1px solid #1f2a36;border-radius:8px;"></div>
+</div>
+
+<!-- Monaco AMD loader (no integrity to avoid mismatch) -->
+<script src="https://cdnjs.cloudflare.com/ajax/libs/monaco-editor/0.49.0/min/vs/loader.min.js"
+        crossorigin="anonymous" referrerpolicy="no-referrer"></script>
+
+<script>
+(function () {
+  const RAW_URL  = {{ raw_url|tojson }};
+  const FILENAME = {{ filename|tojson }};
+  const LANGUAGE = {{ language|default('', true)|tojson }};
+
+  const statusEl = document.getElementById('viewerStatus');
+
+  function extToLang(name) {
+    if (!name) return 'plaintext';
+    const m = name.toLowerCase().match(/\.([a-z0-9]+)$/);
+    const ext = m ? m[1] : '';
+    const map = {js:'javascript',mjs:'javascript',cjs:'javascript',ts:'typescript',json:'json',
+                 html:'html',htm:'html',css:'css',py:'python',sh:'shell',bash:'shell',
+                 yml:'yaml',yaml:'yaml',md:'markdown',txt:'plaintext',log:'plaintext'};
+    return map[ext] || 'plaintext';
+  }
+
+  // Wait until the AMD loader has defined window.require
+  function waitForRequire(msLeft = 5000) {
+    return new Promise((resolve, reject) => {
+      const t0 = performance.now();
+      (function poll() {
+        if (window.require && typeof window.require === 'function') return resolve();
+        if (performance.now() - t0 > msLeft) return reject(new Error('Monaco loader not available'));
+        setTimeout(poll, 25);
+      })();
+    });
+  }
+
+  function configureMonaco() {
+    // Point AMD loader at the CDN
+    require.config({ paths: { 'vs': 'https://cdnjs.cloudflare.com/ajax/libs/monaco-editor/0.49.0/min/vs' } });
+    // Worker bootstrap
+    window.MonacoEnvironment = {
+      getWorkerUrl: function () {
+        const base = 'https://cdnjs.cloudflare.com/ajax/libs/monaco-editor/0.49.0/min/';
+        const code = "self.MonacoEnvironment={baseUrl:'" + base + "'};importScripts('" + base + "vs/base/worker/workerMain.js');";
+        return 'data:text/javascript;charset=utf-8,' + encodeURIComponent(code);
+      }
+    };
+  }
+
+  async function main() {
+    try {
+      statusEl.textContent = 'Loading file…';
+      await waitForRequire();
+      configureMonaco();
+
+      const resp = await fetch(RAW_URL, { cache: 'no-store' });
+      const text = await resp.text();
+
+      require(['vs/editor/editor.main'], function () {
+        const editor = monaco.editor.create(document.getElementById('editor'), {
+          value: text,
+          language: LANGUAGE || extToLang(FILENAME),
+          readOnly: true,
+          automaticLayout: true,
+          wordWrap: 'on',
+          minimap: { enabled: false },
+          scrollBeyondLastLine: false,
+          theme: 'vs-dark'
+        });
+
+        // Buttons
+        document.getElementById('copyBtn')?.addEventListener('click', async () => {
+          try { await navigator.clipboard.writeText(editor.getValue()); statusEl.textContent = 'Copied.'; }
+          catch (e) { statusEl.textContent = 'Copy failed: ' + e; }
+        });
+        document.getElementById('wrapBtn')?.addEventListener('click', () => {
+          const opts = editor.getRawOptions();
+          editor.updateOptions({ wordWrap: opts.wordWrap === 'on' ? 'off' : 'on' });
+        });
+
+        statusEl.textContent = (resp.ok ? '' : `Warning: HTTP ${resp.status}`) + (text.length ? '' : ' (empty file)');
+      });
+    } catch (err) {
+      statusEl.textContent = 'Viewer error: ' + err.message;
+    }
+  }
+
+  main();
+})();
+</script>
+{% endblock %}
--- a/app/utils/browser.py
+++ b/app/utils/browser.py
@@ -33,7 +33,7 @@ from flask import current_app
 from playwright.async_api import async_playwright, TimeoutError as PWTimeoutError

 from app.utils.io_helpers import safe_write
-from app.enrichment import enrich_url
+from app.utils.enrichment import enrich_url
 from app.utils.settings import get_settings

 # Load settings once for constants / defaults
@@ -202,85 +202,111 @@ class Browser:
    # -----------------------------------------------------------------------
    # Form & Script analysis (plumbing only; detection is in the rules engine)
    # -----------------------------------------------------------------------
-    def analyze_forms(self, html: str, base_url: str) -> List[Dict[str, Any]]:
+    def analyze_forms(self, html: str, base_url: str = "") -> List[Dict[str, Any]]:
        """
-        Parse forms from the page HTML and apply rule-based checks (engine), keeping
-        only simple plumbing heuristics here (no security logic).
+        Collect form artifacts and evaluate per-form matches via the rules engine.
+        Only forms that matched at least one rule are included; a form that fails analysis yields an error row with "heuristics".

-        Returns list of dicts with keys:
-          - action, method, inputs
-          - flagged (bool), flag_reasons (list[str]), status (str)
-          - rule_checks: {'checks': [...], 'summary': {...}} (per-form snippet evaluation)
+        Returns list of dicts with keys (per matched form):
+        - type: "form"
+        - action, method, inputs
+        - content_snippet: str
+        - rules: List[{name, description, severity?, tags?}]
        """
        soup = BeautifulSoup(html, "lxml")
-        forms_info: List[Dict[str, Any]] = []
-        page_hostname = urlparse(base_url).hostname
+        results: List[Dict[str, Any]] = []
+
+        engine = self._get_rule_engine()
+        base_hostname = urlparse(base_url).hostname or ""
+        # Match how analyze_scripts picks the preview length
+        try:
+            preview_len = getattr(settings.ui, "snippet_preview_len", 200)  # keep parity with scripts
+        except Exception:
+            preview_len = 200

        for form in soup.find_all("form"):
-            action = form.get("action")
-            method = form.get("method", "get").lower()
+            try:
+                action = (form.get("action") or "").strip()
+                method = (form.get("method") or "get").strip().lower()

-            inputs: List[Dict[str, Any]] = []
-            for inp in form.find_all("input"):
-                input_name = inp.get("name")
-                input_type = inp.get("type", "text")
-                inputs.append({"name": input_name, "type": input_type})
+                inputs: List[Dict[str, Any]] = []
+                for inp in form.find_all("input"):
+                    inputs.append({
+                        "name": inp.get("name"),
+                        "type": (inp.get("type") or "text").strip().lower(),
+                    })

-            flagged_reasons: List[str] = []
+                # Use the actual form markup for regex rules
+                form_markup = str(form)
+                # UI-friendly snippet
+                content_snippet = form_markup[:preview_len]

-            if not action or str(action).strip() == "":
-                flagged_reasons.append("No action specified")
-            else:
+                matches: List[Dict[str, Any]] = []
+                if engine is not None:
+                    for r in getattr(engine, "rules", []):
+                        if getattr(r, "category", None) != "form":
+                            continue
+                        rtype = getattr(r, "rule_type", None)
+
+                        try:
+                            ok = False
+                            reason = ""
+                            if rtype == "regex":
+                                # Run against the raw form HTML
+                                ok, reason = r.run(form_markup)
+                            elif rtype == "function":
+                                # Structured facts for function-style rules
+                                facts = {
+                                    "category": "form",
+                                    "base_url": base_url,
+                                    "base_hostname": base_hostname,
+                                    "action": action,
+                                    "action_hostname": urlparse(action).hostname or "",
+                                    "method": method,
+                                    "inputs": inputs,
+                                    "markup": form_markup,
+                                }
+                                ok, reason = r.run(facts)
+                            else:
+                                continue
+
+                            if ok:
+                                matches.append({
+                                    "name": getattr(r, "name", "unknown_rule"),
+                                    "description": reason or getattr(r, "description", ""),
+                                    "severity": getattr(r, "severity", None),
+                                    "tags": getattr(r, "tags", None),
+                                })
+                        except Exception as rule_exc:
+                            # Be defensive—bad rule shouldn't break the form pass
+                            try:
+                                self.logger.debug("Form rule error", extra={"rule": getattr(r, "name", "?"), "error": str(rule_exc)})
+                            except Exception:
+                                pass
+                            continue
+
+                if matches:
+                    results.append({
+                        "type": "form",
+                        "action": action,
+                        "method": method,
+                        "inputs": inputs,
+                        "content_snippet": content_snippet,
+                        "rules": matches,
+                    })
+
+            except Exception as exc:
+                # Keep analysis resilient
                try:
-                    action_host = urlparse(action).hostname
-                    if not str(action).startswith("/") and action_host != page_hostname:
-                        flagged_reasons.append("Submits to a different host")
+                    self.logger.error("Form analysis error", extra={"error": str(exc)})
                except Exception:
                    pass
+                results.append({
+                    "type": "form",
+                    "heuristics": [f"Form analysis error: {exc}"],
+                })

-            try:
-                if urlparse(action).scheme == "http" and urlparse(base_url).scheme == "https":
-                    flagged_reasons.append("Submits over insecure HTTP")
-            except Exception:
-                pass
-
-            for hidden in form.find_all("input", type="hidden"):
-                name_value = hidden.get("name") or ""
-                if "password" in name_value.lower():
-                    flagged_reasons.append("Hidden password field")
-
-            flagged = bool(flagged_reasons)
-
-            # Serialize a simple form snippet for rule category='form'
-            snippet_lines = []
-            snippet_lines.append(f"base_url={base_url}")
-            snippet_lines.append(f"base_hostname={page_hostname}")
-            snippet_lines.append(f"action={action}")
-            snippet_lines.append(f"method={method}")
-            snippet_lines.append("inputs=")
-
-            i = 0
-            n = len(inputs)
-            while i < n:
-                item = inputs[i]
-                snippet_lines.append(f"  - name={item.get('name')} type={item.get('type')}")
-                i = i + 1
-            form_snippet = "\n".join(snippet_lines)
-
-            # Per-form rule checks (PASS/FAIL list via engine)
-            rule_checks = self.run_rule_checks(form_snippet, category="form")
-
-            forms_info.append({
-                "action": action,
-                "method": method,
-                "inputs": inputs,
-                "flagged": flagged,
-                "flag_reasons": flagged_reasons,
-                "status": "flagged" if flagged else "possibly safe",
-                "rule_checks": rule_checks
-            })
-
-        return forms_info
+        return results

    def analyze_scripts(self, html: str, base_url: str = "") -> List[Dict[str, Any]]:
        """
@@ -370,7 +396,7 @@ class Browser:

        Writes:
          - /data/<uuid>/screenshot.png
-          - /data/<uuid>/source.txt
+          - /data/<uuid>/source.html
          - /data/<uuid>/results.json  (single source of truth for routes)

        Returns:
@@ -381,7 +407,7 @@ class Browser:
        run_dir.mkdir(parents=True, exist_ok=True)

        screenshot_path = run_dir / "screenshot.png"
-        source_path = run_dir / "source.txt"
+        source_path = run_dir / "source.html"
        results_path = run_dir / "results.json"

        redirects: List[Dict[str, Any]] = []
--- a/app/utils/enrichment.py
+++ b/app/utils/enrichment.py
@@ -9,8 +9,8 @@ from ipaddress import ip_address
 import socket

 # Local imports
-from .utils.cache_db import get_cache
-from .utils.settings import get_settings
+from app.utils.cache_db import get_cache
+from app.utils.settings import get_settings

 # Configure logging
 logging.basicConfig(level=logging.INFO, format="[%(levelname)s] %(message)s")
@@ -39,9 +39,6 @@ def enrich_url(url: str) -> dict:
    # --- GeoIP ---
    result["geoip"] = enrich_geoip(hostname)

-    # --- BEC Words ---
-    result["bec_words"] = [w for w in BEC_WORDS if w.lower() in url.lower()]
-
    return result


--- a/app/utils/external_fetcher.py
+++ b/app/utils/external_fetcher.py
@@ -0,0 +1,338 @@
+# sneakyscope/app/utils/external_fetcher.py
+import hashlib
+import os
+import logging
+from dataclasses import dataclass
+from typing import Optional, Tuple, List
+from urllib.parse import urljoin, urlparse
+
+import requests
+
+from app.utils.settings import get_settings
+
+settings = get_settings()
+
+_ALLOWED_SCHEMES = {"http", "https"}
+
+
+@dataclass
+class FetchResult:
+    """
+    Outcome for a single external script fetch.
+    """
+    ok: bool
+    reason: str
+    source_url: str
+    final_url: str
+    status_code: Optional[int]
+    content_type: Optional[str]
+    bytes_fetched: int
+    truncated: bool
+    sha256_hex: Optional[str]
+    saved_path: Optional[str]
+
+
+class ExternalScriptFetcher:
+    """
+    Minimal, safe-by-default fetcher for external JS files.
+
+    Notes / assumptions:
+      - All artifacts for this run live under the UUID-backed `results_path` you pass in.
+      - Saves bytes to: <results_path>/<index>.js
+      - Manual redirects up to `max_redirects`.
+      - Streaming with a hard byte cap derived from `max_total_mb`.
+      - Never raises network exceptions to callers; failures are encoded in FetchResult.
+      - Settings are read from get_settings().external_fetch, which carries sane defaults.
+    """
+
+    def __init__(self, results_path: str, session: Optional[requests.Session] = None):
+        """
+        Args:
+            results_path: Absolute path to the run's UUID directory (e.g., /data/<run_uuid>).
+            session: Optional requests.Session to reuse connections; a new one is created if not provided.
+        """
+        # Derived value: MiB -> bytes
+        self.max_total_bytes: int = settings.external_fetch.max_total_mb * 1024 * 1024
+
+        # Logger
+        self.logger = logging.getLogger(__name__)
+
+        # Where to write artifacts for this job/run (UUID directory)
+        self.results_path = results_path
+
+        # HTTP session with a predictable UA
+        self.session = session or requests.Session()
+        self.session.headers.update({"User-Agent": "SneakyScope/1.0"})
+
+    # -------------------------
+    # Internal helper methods
+    # -------------------------
+
+    def _timeout(self) -> Tuple[float, float]:
+        """
+        Compute (connect_timeout, read_timeout) in seconds from max_time_ms.
+        Keeps a conservative split so either phase gets a fair chance.
+        """
+        total = max(0.1, settings.external_fetch.max_time_ms / 1000.0)
+        connect = min(1.5, total * 0.5)  # cap connect timeout
+        read = max(0.5, total * 0.5)     # floor read timeout
+        return (connect, read)
+
+    def _scheme_allowed(self, url: str) -> bool:
+        """
+        Return True if URL uses an allowed scheme (http/https).
+        """
+        scheme = (urlparse(url).scheme or "").lower()
+        return scheme in _ALLOWED_SCHEMES
+
+    def _artifact_path(self, index: int) -> str:
+        """
+        Build an output path like:
+            <results_path>/<index>.js
+
+        Ensures the directory exists.
+        """
+        base_dir = self.results_path
+        # Make sure the directory exists (idempotent)
+        os.makedirs(base_dir, exist_ok=True)
+        filename = f"{index}.js"
+        return os.path.join(base_dir, filename)
+
+    # -------------------------
+    # Public API
+    # -------------------------
+
+    def fetch_one(self, script_url: str, index: int) -> FetchResult:
+        """
+        Fetch exactly one external script with manual redirect handling and a hard per-file byte cap.
+
+        Args:
+            script_url: The script URL to retrieve.
+            index:     Numeric index used solely for naming the artifact file (<index>.js).
+
+        Returns:
+            FetchResult with status, metadata, and saved path (if successful).
+        """
+        # Feature gate: allow callers to rely on a consistent failure when globally disabled.
+        if not settings.external_fetch.enabled:
+            return FetchResult(
+                ok=False,
+                reason="Feature disabled",
+                source_url=script_url,
+                final_url=script_url,
+                status_code=None,
+                content_type=None,
+                bytes_fetched=0,
+                truncated=False,
+                sha256_hex=None,
+                saved_path=None,
+            )
+
+        # Scheme guard: refuse anything not http/https in this v1.
+        if not self._scheme_allowed(script_url):
+            return FetchResult(
+                ok=False,
+                reason="Scheme not allowed",
+                source_url=script_url,
+                final_url=script_url,
+                status_code=None,
+                content_type=None,
+                bytes_fetched=0,
+                truncated=False,
+                sha256_hex=None,
+                saved_path=None,
+            )
+
+        current_url = script_url
+        status_code: Optional[int] = None
+        content_type: Optional[str] = None
+        redirects_followed = 0
+
+        # Manual redirect loop so we can enforce max_redirects precisely.
+        while True:
+            try:
+                resp = self.session.get(
+                    current_url,
+                    stream=True,
+                    allow_redirects=False,
+                    timeout=self._timeout(),
+                )
+            except requests.exceptions.Timeout:
+                return FetchResult(
+                    ok=False,
+                    reason="Timeout",
+                    source_url=script_url,
+                    final_url=current_url,
+                    status_code=status_code,
+                    content_type=content_type,
+                    bytes_fetched=0,
+                    truncated=False,
+                    sha256_hex=None,
+                    saved_path=None,
+                )
+            except requests.exceptions.RequestException as e:
+                return FetchResult(
+                    ok=False,
+                    reason=f"Network error: {e.__class__.__name__}",
+                    source_url=script_url,
+                    final_url=current_url,
+                    status_code=status_code,
+                    content_type=content_type,
+                    bytes_fetched=0,
+                    truncated=False,
+                    sha256_hex=None,
+                    saved_path=None,
+                )
+
+            status_code = resp.status_code
+            content_type = resp.headers.get("Content-Type")
+
+            # Handle redirects explicitly (3xx with Location)
+            if status_code in (301, 302, 303, 307, 308) and "Location" in resp.headers:
+                if redirects_followed >= settings.external_fetch.max_redirects:
+                    return FetchResult(
+                        ok=False,
+                        reason="Max redirects exceeded",
+                        source_url=script_url,
+                        final_url=current_url,
+                        status_code=status_code,
+                        content_type=content_type,
+                        bytes_fetched=0,
+                        truncated=False,
+                        sha256_hex=None,
+                        saved_path=None,
+                    )
+                next_url = urljoin(current_url, resp.headers["Location"])
+                if not self._scheme_allowed(next_url):
+                    return FetchResult(
+                        ok=False,
+                        reason="Redirect to disallowed scheme",
+                        source_url=script_url,
+                        final_url=next_url,
+                        status_code=status_code,
+                        content_type=content_type,
+                        bytes_fetched=0,
+                        truncated=False,
+                        sha256_hex=None,
+                        saved_path=None,
+                    )
+                resp.close()  # release the streamed connection before following the next hop
+                current_url = next_url
+                redirects_followed += 1
+                continue
+
+            # Not a redirect: stream response body with a hard byte cap.
+            cap = self.max_total_bytes
+            total = 0
+            truncated = False
+            chunks: List[bytes] = []
+
+            try:
+                for chunk in resp.iter_content(chunk_size=8192):
+                    if not chunk:
+                        # Skip keep-alive chunks
+                        continue
+                    new_total = total + len(chunk)
+                    if new_total > cap:
+                        # Only keep what fits and stop
+                        remaining = cap - total
+                        if remaining > 0:
+                            chunks.append(chunk[:remaining])
+                            total += remaining
+                        truncated = True
+                        break
+                    chunks.append(chunk)
+                    total = new_total
+            except requests.exceptions.Timeout:
+                return FetchResult(
+                    ok=False,
+                    reason="Timeout while reading",
+                    source_url=script_url,
+                    final_url=current_url,
+                    status_code=status_code,
+                    content_type=content_type,
+                    bytes_fetched=total,
+                    truncated=truncated,
+                    sha256_hex=None,
+                    saved_path=None,
+                )
+            except requests.exceptions.RequestException as e:
+                return FetchResult(
+                    ok=False,
+                    reason=f"Network error while reading: {e.__class__.__name__}",
+                    source_url=script_url,
+                    final_url=current_url,
+                    status_code=status_code,
+                    content_type=content_type,
+                    bytes_fetched=total,
+                    truncated=truncated,
+                    sha256_hex=None,
+                    saved_path=None,
+                )
+
+            data = b"".join(chunks)
+            if not data:
+                return FetchResult(
+                    ok=False,
+                    reason="Empty response",
+                    source_url=script_url,
+                    final_url=current_url,
+                    status_code=status_code,
+                    content_type=content_type,
+                    bytes_fetched=0,
+                    truncated=False,
+                    sha256_hex=None,
+                    saved_path=None,
+                )
+
+            # Persist to <results_path>/<index>.js
+            out_path = self._artifact_path(index)
+            try:
+                with open(out_path, "wb") as f:
+                    f.write(data)
+            except OSError as e:
+                return FetchResult(
+                    ok=False,
+                    reason=f"Write error: {e.__class__.__name__}",
+                    source_url=script_url,
+                    final_url=current_url,
+                    status_code=status_code,
+                    content_type=content_type,
+                    bytes_fetched=total,
+                    truncated=truncated,
+                    sha256_hex=None,
+                    saved_path=None,
+                )
+
+            sha256_hex = hashlib.sha256(data).hexdigest()
+
+            # Structured log line for visibility/metrics
+            try:
+                self.logger.info(
+                    "External script fetched",
+                    extra={
+                        "source_url": script_url,
+                        "final_url": current_url,
+                        "status": status_code,
+                        "bytes": total,
+                        "truncated": truncated,
+                        "sha256": sha256_hex,
+                        "saved_path": out_path,
+                    },
+                )
+            except Exception:
+                # Logging should never break the pipeline
+                pass
+
+            return FetchResult(
+                ok=True,
+                reason="OK",
+                source_url=script_url,
+                final_url=current_url,
+                status_code=status_code,
+                content_type=content_type,
+                bytes_fetched=total,
+                truncated=truncated,
+                sha256_hex=sha256_hex,
+                saved_path=out_path,
+            )
--- a/app/utils/settings.py
+++ b/app/utils/settings.py
@@ -39,6 +39,14 @@ BASE_DIR = Path(__file__).resolve().parent.parent
 DEFAULT_SETTINGS_FILE = BASE_DIR / "config" / "settings.yaml"

 # ---------- CONFIG DATA CLASSES ----------
+@dataclass
+class External_FetchConfig:
+    enabled: bool = True
+    max_total_mb: int = 5
+    max_time_ms: int = 3000
+    max_redirects: int = 3
+    concurrency: int = 3
+
 @dataclass
 class UIConfig:
    snippet_preview_len: int = 160
@@ -61,6 +69,7 @@ class AppConfig:
 class Settings:
    cache: Cache_Config = field(default_factory=Cache_Config)
    ui: UIConfig = field(default_factory=UIConfig)
+    external_fetch: External_FetchConfig = field(default_factory=External_FetchConfig)
    app: AppConfig = field(default_factory=AppConfig)

    @classmethod
--- a/docs/roadmap.md
+++ b/docs/roadmap.md
@@ -1,23 +1,18 @@
 # SneakyScope — Roadmap (Updated 8-21-25)

 ## Priority 1 – Core Analysis / Stability
-
-* Opt-in “fetch external scripts” mode (off by default): on submission, download external script content (size/time limits) and run rules on fetched content.
-* Remove remaining legacy form “flagged\_reasons” plumbing once all equivalent function rules are in place.
-* Unit tests: YAML compilation, function-rule adapters, and per-script/per-form rule cases.
 * SSL/TLS intelligence: for HTTPS targets, pull certificate details from crt.sh (filtering expired); if a subdomain, also resolve the root domain to capture any wildcard certificates; probe the endpoint to enumerate supported TLS versions/ciphers and flag weak/legacy protocols.

 ## Priority 2 – API Layer

 * API endpoints: `/screenshot`, `/source`, `/analyse`.
-* OpenAPI spec: create `openapi/openapi.yaml` and serve at `/api/openapi.yaml`.
+* **OpenAPI**: add `POST /api/analyze_script` (request/response schemas, examples) to `openapi/openapi.yaml`; serve at `/api/openapi.yaml`.
 * Docs UI: Swagger UI or Redoc at `/docs`.
+* (Nice-to-have) API JSON error consistency: handlers for 400/403/404/405/500 that always return JSON.

 ## Priority 3 – UI / UX

 * Front page/input handling: auto-prepend `http://`/`https://`/`www.` for bare domains.
-* Source code viewer: embed page source in an editor view for readability.
-* Scripts table: toggle between “Only suspicious” and “All scripts”.
 * Rules Lab (WYSIWYG tester): paste a rule, validate/compile, run against sample text; lightweight nav entry.

 ## Priority 4 – Artifact Management & Ops
@@ -33,6 +28,6 @@
 * Domain reputation (local feeds): build and refresh a consolidated domain/URL reputation store from URLHaus database dump and OpenPhish community dataset (scheduled pulls with dedup/normalize).
 * Threat intel connectors (settings-driven): add `settings.yaml` entries for VirusTotal and ThreatFox API keys (plus future providers); when present, enrich lookups and merge results into the unified reputation checks during analysis.

-## Backlog / Far‑Off Plans
+## Backlog / Far-Off Plans

 * Server profile scan: run a lightweight nmap service/banner scan on common web/alt ports (80, 443, 8000, 8080, 8443, etc.) and SSH; combine with server headers to infer stack (e.g., IIS vs. Linux/\*nix).