feat: on-demand external script analysis + code viewer; refactor form analysis to rule engine

- API: add `POST /api/analyze_script` (app/blueprints/api.py)
  - Fetch one external script to artifacts, run rules, return findings + snippet
  - Uses new ExternalScriptFetcher (results_path aware) and job UUID
  - Returns: { ok, final_url, status_code, bytes, truncated, sha256, artifact_path, findings[], snippet, snippet_len }
  - TODO: document in openapi/openapi.yaml
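
A minimal shape-check for the request/response contract above (no server call is made; the UUID and response values below are illustrative, and `looks_like_analyze_reply` is a hypothetical helper, not part of the codebase):

```python
# Sketch: build an /api/analyze_script request body and validate that a
# parsed JSON reply carries every field the endpoint promises.
request_body = {
    "job_id": "3ec90584-076e-457c-924b-861be7e11a34",  # illustrative UUID
    "url": "https://cdn.example.com/app.js",
    "category": "script",  # optional; the endpoint defaults to "script"
}

def looks_like_analyze_reply(resp: dict) -> bool:
    """True when a reply contains every documented response field."""
    required = {"ok", "final_url", "status_code", "bytes", "truncated",
                "sha256", "artifact_path", "findings", "snippet", "snippet_len"}
    return required.issubset(resp)

resp = {"ok": True, "final_url": "https://cdn.example.com/app.js",
        "status_code": 200, "bytes": 12345, "truncated": False,
        "sha256": "ab" * 32, "artifact_path": "/data/.../scripts/fetched/0.js",
        "findings": [], "snippet": "", "snippet_len": 0}
print(looks_like_analyze_reply(resp))  # → True
```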

- Fetcher: update `app/utils/external_fetcher.py`
  - Constructed with `results_path` (UUID dir); writes to `<results_path>/scripts/fetched/<index>.js`
  - Loads settings via `get_settings()`, logs via std logging
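
The on-disk layout the fetcher writes to can be restated as a one-liner; `fetched_script_path` is an illustrative helper (not in the codebase), assuming the `<results_path>/scripts/fetched/<index>.js` layout from the bullet above:

```python
from pathlib import Path

# Illustrative only: where a fetched script lands for a given run directory
# and index, per this commit's ExternalScriptFetcher.
def fetched_script_path(results_path: str, index: int) -> Path:
    return Path(results_path) / "scripts" / "fetched" / f"{index}.js"

print(fetched_script_path("/data/3ec90584", 0).as_posix())
# → /data/3ec90584/scripts/fetched/0.js
```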

- UI (results.html):
  - Move “Analyze external script” action into **Content Snippet** column for external rows
  - Clicking replaces button with `<details>` snippet, shows rule matches, and adds “open in viewer” link
  - Robust fetch handler (checks JSON, shows errors); builds viewer URL from absolute artifact path

- Viewer:
  - New route: `GET /view/artifact/<run_uuid>/<path:filename>` (app/blueprints/ui.py)
  - New template: Monaco-based read-only code viewer (viewer.html)
  - Removes SRI on loader to avoid integrity block; loads file via `raw_url` and detects language by extension

- Forms:
  - Refactor `analyze_forms` to mirror scripts analysis:
    - Uses rule engine (`category == "form"`) across regex/function rules
    - Emits rows only when matches exist
    - Includes `content_snippet`, `action`, `method`, `inputs`, `rules`
  - Replace legacy plumbing (`flagged`, `flag_reasons`, `status`) in output
  - Normalize form function rules to canonical returns `(bool, Optional[str])`:
    - `form_action_missing`
    - `form_http_on_https_page`
    - `form_submits_to_different_host`
    - Add minor hardening (lowercasing hosts, no-op actions, clearer reasons)
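
The canonical `(bool, Optional[str])` contract can be sketched with one of the rules named above (this mirrors the diff below; the facts dict keys `base_hostname` and `action` come from the commit, everything else is illustrative):

```python
from typing import Any, Dict, Optional, Tuple
from urllib.parse import urlparse

# Sketch of the canonical function-rule shape: take a facts dict,
# return (matched, reason). Non-matches return (False, None).
def form_submits_to_different_host(facts: Dict[str, Any]) -> Tuple[bool, Optional[str]]:
    base_host = (facts.get("base_hostname") or "").strip().lower()
    action = (facts.get("action") or "").strip()
    if not action:
        return False, None
    act_host = (urlparse(action).hostname or "").lower()
    if act_host and base_host and act_host != base_host:
        return True, f"Submits to a different host ({act_host} vs {base_host})"
    return False, None

print(form_submits_to_different_host(
    {"base_hostname": "example.com", "action": "https://evil.test/submit"}))
# → (True, 'Submits to a different host (evil.test vs example.com)')
```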

- CSS: add `.forms-table` to mirror `.scripts-table` (5 columns)
  - Fixed table layout, widths per column, chip/snippet styling, responsive tweaks

- Misc:
  - Fix “working outside app context” issue by avoiding `current_app` at import time (left storage logic inside routes)
  - Add “View Source” link to open page source in viewer

Refs:
- Roadmap: mark “Source code viewer” done; keep TODO to add `/api/analyze_script` to OpenAPI
2025-08-21 15:32:24 -05:00
parent 05cf23ad67
commit 3a24b392f2
15 changed files with 1192 additions and 218 deletions


@@ -19,7 +19,8 @@ from .rules.function_rules import (
form_action_missing,
)
from . import routes # blueprint
from app.blueprints import ui # ui blueprint
from app.blueprints import api # api blueprint
# from .utils import io_helpers # if need logging/setup later
# from .utils import cache_db # available for future injections
@@ -136,7 +137,8 @@ def create_app() -> Flask:
app.config["APP_VERSION"] = f"v{settings.app.version_major}.{settings.app.version_minor}"
# Register blueprints
app.register_blueprint(routes.bp)
app.register_blueprint(ui.bp)
app.register_blueprint(api.api_bp)
# Example log lines so we know we booted cleanly
app.logger.info(f"SneakyScope started: {app.config['APP_NAME']} {app.config['APP_VERSION']}")

app/blueprints/api.py (new file)

@@ -0,0 +1,212 @@
# app/blueprints/api.py
"""
API blueprint for JSON endpoints.
Endpoints:
POST /api/analyze_script
Body:
{
"job_id": "<uuid>", # or "uuid": "<uuid>"
"url": "https://cdn.example.com/app.js",
"category": "script" # optional, defaults to "script"
}
Response:
{
"ok": true,
"final_url": "...",
"status_code": 200,
"bytes": 12345,
"truncated": false,
"sha256": "...",
"artifact_path": "/abs/path/to/<uuid>/scripts/fetched/<index>.js",
"findings": [ { "name": "...", "description": "...", "severity": "...", "tags": [...], "reason": "..." }, ... ],
"snippet": "<trimmed content>",
"snippet_len": 45678
}
"""
import os
import time
from flask import Blueprint, request, jsonify, current_app, send_file, abort
from pathlib import Path
from app.utils.settings import get_settings
from app.utils.external_fetcher import ExternalScriptFetcher
from werkzeug.exceptions import HTTPException
api_bp = Blueprint("api", __name__, url_prefix="/api")
def _resolve_results_path(job_id: str) -> str:
"""
Compute the absolute results directory for a given job UUID.
Prefers <BASE>/artifacts/<uuid>, falls back to <BASE>/<uuid>.
"""
base_dir = "/data"
candidate_with_artifacts = os.path.join(base_dir, "artifacts", job_id)
if os.path.isdir(candidate_with_artifacts):
return candidate_with_artifacts
fallback = os.path.join(base_dir, job_id)
os.makedirs(fallback, exist_ok=True)
return fallback
def _make_snippet(text: str, max_chars: int = 1200) -> str:
"""Produce a trimmed, safe-to-render snippet of the script contents."""
if not text:
return ""
snippet = text.strip()
return (snippet[:max_chars] + "…") if len(snippet) > max_chars else snippet
@api_bp.errorhandler(400)
@api_bp.errorhandler(403)
@api_bp.errorhandler(404)
@api_bp.errorhandler(405)
def _api_err(err):
"""
Return JSON for common client errors.
"""
if isinstance(err, HTTPException):
code = err.code
name = (err.name or "error").lower()
else:
code = 400
name = "error"
return jsonify({"ok": False, "error": name}), code
@api_bp.errorhandler(500)
def _api_500(err):
"""
Return JSON for server errors and log the exception.
"""
try:
current_app.logger.exception("API 500")
except Exception:
pass
return jsonify({"ok": False, "error": "internal server error"}), 500
@api_bp.post("/analyze_script")
def analyze_script():
"""
Analyze EXACTLY one external script URL for a given job UUID.
Expected JSON body:
{ "job_id": "<uuid>", "url": "https://cdn.example.com/app.js", "category": "script" }
"""
body = request.get_json(silent=True) or {}
job_id_raw = body.get("job_id") or body.get("uuid")
script_url_raw = body.get("url")
category = (body.get("category") or "script").strip() or "script" # default to "script"
job_id = (job_id_raw or "").strip() if isinstance(job_id_raw, str) else ""
script_url = (script_url_raw or "").strip() if isinstance(script_url_raw, str) else ""
# log this request
current_app.logger.info(f"Got request to analyze {script_url} via API")
if not job_id or not script_url:
return jsonify({"ok": False, "error": "Missing job_id (or uuid) or url"}), 400
settings = get_settings()
if not settings.external_fetch.enabled:
return jsonify({"ok": False, "error": "Feature disabled"}), 400
# Resolve the UUID-backed results directory for this run.
results_path = _resolve_results_path(job_id)
# Initialize the fetcher; it reads its own settings internally.
fetcher = ExternalScriptFetcher(results_path=results_path)
# Unique index for the saved file name: <results_path>/scripts/fetched/<index>.js
unique_index = int(time.time() * 1000)
outcome = fetcher.fetch_one(script_url=script_url, index=unique_index)
if not outcome.ok or not outcome.saved_path:
return jsonify({
"ok": False,
"error": outcome.reason,
"status_code": outcome.status_code,
"final_url": outcome.final_url
}), 502
# Read bytes and decode to UTF-8 for rules and snippet
try:
with open(outcome.saved_path, "rb") as fh:
js_text = fh.read().decode("utf-8", errors="ignore")
except Exception:
js_text = ""
# Pull the rules engine from the app (prefer attribute, then config).
findings = []
try:
engine = getattr(current_app, "rule_engine", None)
if engine is None:
engine = current_app.config.get("RULE_ENGINE")
except Exception:
engine = None
if engine is not None and hasattr(engine, "run_all"):
try:
# run_all returns PASS/FAIL for each rule; we only surface FAIL (matched) to the UI
all_results = engine.run_all(js_text, category=category)
if isinstance(all_results, list):
matched = []
for r in all_results:
try:
if (r.get("result") == "FAIL"):
matched.append({
"name": r.get("name"),
"description": r.get("description"),
"severity": r.get("severity"),
"tags": r.get("tags") or [],
"reason": r.get("reason"),
"category": r.get("category"),
})
except Exception:
# Ignore malformed entries
continue
findings = matched
except Exception as exc:
try:
current_app.logger.error("Rule engine error", extra={"error": str(exc)})
except Exception:
pass
findings = []
snippet = _make_snippet(js_text, max_chars=settings.ui.snippet_preview_len)
return jsonify({
"ok": True,
"final_url": outcome.final_url,
"status_code": outcome.status_code,
"bytes": outcome.bytes_fetched,
"truncated": outcome.truncated,
"sha256": outcome.sha256_hex,
"artifact_path": outcome.saved_path,
"findings": findings, # only FAILed rules
"snippet": snippet,
"snippet_len": len(js_text)
})
@api_bp.get("/artifacts/<run_uuid>/<filename>")
def get_artifact_raw(run_uuid, filename):
# prevent path traversal
if "/" in filename or ".." in filename:
abort(400)
run_dir = _resolve_results_path(run_uuid)
full_path = Path(run_dir) / filename
# if file is not there, give a 404
if not os.path.isfile(full_path):
abort(404)
# else return file
return send_file(full_path, as_attachment=False)


@@ -1,3 +1,5 @@
# app/blueprints/ui.py
import os
import json
import asyncio
@@ -5,11 +7,10 @@ from pathlib import Path
from datetime import datetime
from flask import Blueprint, render_template, request, redirect, url_for, flash, current_app, send_file, abort
# from .browser import fetch_page_artifacts
from .utils.browser import get_browser
from .utils.enrichment import enrich_url
from .utils.settings import get_settings
from .utils.io_helpers import get_recent_results
from app.utils.browser import get_browser
from app.utils.enrichment import enrich_url
from app.utils.settings import get_settings
from app.utils.io_helpers import get_recent_results
bp = Blueprint("main", __name__)
@@ -34,9 +35,6 @@ def index():
The number of recent runs is controlled via settings.cache.recent_runs_count (int).
Falls back to 10 if not present or invalid.
"""
# Resolve SANDBOX_STORAGE from app config
storage = Path(current_app.config["SANDBOX_STORAGE"]).resolve()
# Pull recent count from settings with a safe fallback
try:
# settings is already initialized at module import in your file
@@ -46,13 +44,15 @@ def index():
except Exception:
recent_count = 10
# Resolve SANDBOX_STORAGE from app config
storage = Path(current_app.config["SANDBOX_STORAGE"]).resolve()
# Build the recent list (non-fatal if storage is empty or unreadable)
recent_results = get_recent_results(storage, recent_count, current_app.logger)
# Pass to template; your index.html will hide the card if list is empty
return render_template("index.html", recent_results=recent_results)
@bp.route("/analyze", methods=["POST"])
def analyze():
url = request.form.get("url", "").strip()
@@ -87,6 +87,7 @@ def analyze():
@bp.route("/results/<run_uuid>", methods=["GET"])
def view_result(run_uuid: str):
# Resolve SANDBOX_STORAGE from app config
storage = Path(current_app.config["SANDBOX_STORAGE"]).resolve()
run_dir = storage / run_uuid
results_path = run_dir / "results.json"
@@ -105,6 +106,7 @@ def view_result(run_uuid: str):
@bp.route("/artifacts/<run_uuid>/<filename>", methods=["GET"])
def artifacts(run_uuid: str, filename: str):
# Resolve SANDBOX_STORAGE from app config
storage = Path(current_app.config["SANDBOX_STORAGE"]).resolve()
run_dir = storage / run_uuid
full_path = run_dir / filename
@@ -123,3 +125,11 @@ def artifacts(run_uuid: str, filename: str):
return send_file(full_path)
@bp.get("/view/artifact/<run_uuid>/<filename>")
def view_artifact(run_uuid, filename):
# Build a safe raw URL that streams the file via the API blueprint
raw_url = url_for('api.get_artifact_raw', run_uuid=run_uuid, filename=filename)
# Optional: derive language server-side if you prefer
language = None # e.g., 'javascript'
return render_template('viewer.html', filename=filename, raw_url=raw_url, language=language)


@@ -8,5 +8,11 @@ cache:
whois_cache_days: 7
geoip_cache_days: 7
external_script_fetch:
enabled: True
max_total_mb: 5
max_time_ms: 3000
max_redirects: 3
ui:
snippet_preview_len: 300


@@ -22,6 +22,7 @@ from __future__ import annotations
from typing import Any, Dict, Optional
from urllib.parse import urlparse
_NOOP_ACTIONS = {"", "#", "javascript:void(0)", "javascript:void(0);"}
# ---------------------------------------------------------------------------
# Adapters
@@ -169,35 +170,48 @@ def script_third_party_host(facts: Dict[str, Any]):
# ---------------- Form rules ----------------
def form_submits_to_different_host(facts: Dict[str, Any]):
"""Flags <form> actions that submit to a different hostname than the page."""
base_host = facts.get("base_hostname") or ""
action = facts.get("action") or ""
try:
action_host = urlparse(action).hostname
if action_host and base_host and action_host != base_host:
return True, "Form submits to a different host"
except Exception:
# Parsing failed; treat as no match rather than erroring out
pass
return False, None
def form_action_missing(facts: Dict[str, Any]):
"""Flags <form> elements with no meaningful action attribute."""
action = (facts.get("action") or "").strip()
if action in _NOOP_ACTIONS:
return True, "Form has no action attribute (or uses a no-op action)"
return False, None
def form_http_on_https_page(facts: Dict[str, Any]):
"""Flags forms submitting over HTTP while the page was loaded over HTTPS."""
base_url = facts.get("base_url") or ""
action = facts.get("action") or ""
try:
if urlparse(base_url).scheme == "https" and urlparse(action).scheme == "http":
return True, "Form submits over insecure HTTP"
except Exception:
pass
return False, None
def form_http_on_https_page(facts: Dict[str, Any]):
"""Flags forms submitting over HTTP while the page was loaded over HTTPS."""
base_url = (facts.get("base_url") or "").strip()
action = (facts.get("action") or "").strip()
if not action:
return False, None
try:
base_scheme = (urlparse(base_url).scheme or "").lower()
parsed_act = urlparse(action)
act_scheme = (parsed_act.scheme or "").lower()
except Exception:
return False, None # parsing trouble → don't flag
# Only flag absolute http:// actions on https pages.
# Relative or schemeless ('//host/...') isn't flagged here (it won't be HTTP on an HTTPS page).
if base_scheme == "https" and act_scheme == "http":
return True, f"Submits over insecure HTTP (action={parsed_act.geturl()})"
return False, None
def form_submits_to_different_host(facts: Dict[str, Any]):
"""Flags <form> actions that submit to a different hostname than the page."""
base_host = (facts.get("base_hostname") or "").strip().lower()
action = (facts.get("action") or "").strip()
if not action or action in _NOOP_ACTIONS:
return False, None
try:
parsed = urlparse(action)
act_host = (parsed.hostname or "").lower()
except Exception:
return False, None
# Only compare when the action specifies a host (absolute URL or schemeless //host/path).
if act_host and base_host and act_host != base_host:
return True, f"Submits to a different host ({act_host} vs {base_host})"
return False, None


@@ -279,6 +279,7 @@ details ul, details p {
}
}
/* SCRIPTS TABLE */
.scripts-table td ul {
margin: 0.25rem 0 0.25rem 1rem;
padding-left: 1rem;
@@ -305,6 +306,59 @@ details ul, details p {
white-space: nowrap;
}
/* lists & small text inside cells */
.forms-table td ul {
margin: 0.25rem 0 0.25rem 1rem;
padding-left: 1rem;
}
.forms-table td small {
opacity: 0.85;
}
/* keep the table from exploding */
.forms-table {
table-layout: fixed;
width: 100%;
}
/* columns: Action | Method | Inputs | Matches | Form Snippet */
.forms-table th:nth-child(1) { width: 15rem; } /* Action */
.forms-table th:nth-child(2) { width: 5rem; } /* Method */
.forms-table th:nth-child(3) { width: 15rem; } /* Inputs */
.forms-table th:nth-child(5) { width: 24rem; } /* Snippet */
.forms-table th:nth-child(4) { width: auto; } /* Matches grows */
/* ellipsize cells by default */
.forms-table td,
.forms-table th {
overflow: hidden;
text-overflow: ellipsis;
white-space: nowrap;
}
/* nicer wrapping inside snippet/details & input chips */
.forms-table details { white-space: normal; }
.forms-table details > pre.code {
white-space: pre-wrap; /* let long lines wrap */
max-height: 28rem;
overflow: auto;
}
.forms-table .chips {
display: flex;
gap: 0.25rem;
flex-wrap: wrap;
white-space: normal; /* allow chip text to wrap if needed */
}
/* (optional) responsive tweaks */
@media (max-width: 1200px) {
.forms-table th:nth-child(1) { width: 22rem; }
.forms-table th:nth-child(3) { width: 16rem; }
.forms-table th:nth-child(5) { width: 18rem; }
}
/* let URLs/snippets wrap *inside* their cell when expanded content shows */
.breakable {
white-space: normal;


@@ -31,3 +31,6 @@
</footer>
</body>
</html>
{% block page_js %}
{% endblock %}


@@ -90,6 +90,9 @@
100% { transform: rotate(360deg); }
}
</style>
{% endblock %}
{% block page_js %}
<script>
const form = document.getElementById('analyze-form');


@@ -79,21 +79,6 @@
{% endfor %}
{% endif %}
<!-- BEC Words -->
{% if enrichment.bec_words %}
<h3>BEC Words Detected</h3>
<table class="enrichment-table">
<thead>
<tr><th>Word</th></tr>
</thead>
<tbody>
{% for word in enrichment.bec_words %}
<tr><td>{{ word }}</td></tr>
{% endfor %}
</tbody>
</table>
{% endif %}
{% if not enrichment.whois and not enrichment.raw_whois and not enrichment.geoip and not enrichment.bec_words %}
<p>No enrichment data available.</p>
{% endif %}
@@ -130,42 +115,97 @@
<!-- Forms -->
<div class="card" id="forms">
<h2>Forms</h2>
{% if forms %}
{% for form in forms %}
<details class="card {% if form.flagged %}flagged{% endif %}" style="padding:0.5rem; margin-bottom:0.5rem;">
<summary>{{ form.status }} — Action: {{ form.action }} ({{ form.method | upper }})</summary>
<table class="enrichment-table">
{% if forms and forms|length > 0 %}
<table class="enrichment-table forms-table">
<thead>
<tr>
<th>Input Name</th>
<th>Type</th>
<th>Action</th>
<th>Method</th>
<th>Inputs</th>
<th>Matches (Rules)</th>
<th>Form Snippet</th>
</tr>
</thead>
<tbody>
{% for inp in form.inputs %}
{% for f in forms %}
<tr>
<td>{{ inp.name }}</td>
<td>{{ inp.type }}</td>
<!-- Action -->
<td class="breakable">
{% if f.action %}
{{ f.action[:25] }}{% if f.action|length > 25 %}…{% endif %}
{% else %}
<span class="text-dim">(no action)</span>
{% endif %}
</td>
<!-- Method -->
<td>{{ (f.method or 'get')|upper }}</td>
<!-- Inputs -->
<td>
{% if f.inputs and f.inputs|length > 0 %}
<div class="chips">
{% for inp in f.inputs %}
<span class="chip" title="{{ (inp.name or '') ~ ' : ' ~ (inp.type or 'text') }}">
{{ inp.name or '(unnamed)' }}<small> : {{ (inp.type or 'text') }}</small>
</span>
{% endfor %}
</div>
{% else %}
<span class="text-dim">None</span>
{% endif %}
</td>
<!-- Matches (Rules) -->
<td>
{% if f.rules and f.rules|length > 0 %}
<ul>
{% for r in f.rules %}
<li title="{{ r.description or '' }}">
{{ r.name }}
{% if r.severity %}
<span class="badge sev-{{ r.severity|lower }}">{{ r.severity|title }}</span>
{% endif %}
{% if r.tags %}
{% for t in r.tags %}
<span class="chip" title="Tag: {{ t }}">{{ t }}</span>
{% endfor %}
{% endif %}
{% if r.description %}
<small> — {{ r.description }}</small>
{% endif %}
</li>
{% endfor %}
</ul>
{% else %}
<span class="text-dim">N/A</span>
{% endif %}
</td>
<!-- Form Snippet -->
<td>
{% if f.content_snippet %}
<details>
<summary>View snippet ({{ f.content_snippet|length }} chars)</summary>
<pre class="code">{{ f.content_snippet }}</pre>
</details>
{% else %}
<span class="text-dim">N/A</span>
{% endif %}
</td>
</tr>
{% endfor %}
</tbody>
</table>
{% if form.flagged %}
<p><strong>Flag Reasons:</strong></p>
<ul>
{% for reason in form.flag_reasons %}
<li>{{ reason }}</li>
{% endfor %}
</ul>
{% endif %}
</details>
{% endfor %}
{% else %}
<p>No forms detected.</p>
<p class="text-dim">No form issues detected.</p>
{% endif %}
<p><a href="#top-jump-list">Back to top</a></p>
</div>
<!-- Suspicious Scripts -->
<div class="card" id="scripts">
<h2>Suspicious Scripts</h2>
@@ -176,8 +216,8 @@
<tr>
<th>Type</th>
<th>Source URL</th>
<th>Content Snippet</th>
<th>Matches (Rules & Heuristics)</th>
<th>Content Snippet</th>
</tr>
</thead>
<tbody>
@@ -189,26 +229,12 @@
<!-- Source URL -->
<td class="breakable">
{% if s.src %}
<a href="{{ s.src }}" target="_blank">{{ s.src[:50] }}</a>
{% else %}
N/A
{% endif %}
<a href="{{ s.src }}" target="_blank" rel="noopener">{{ s.src[:50] }}</a>
{% else %} N/A {% endif %}
</td>
<!-- Inline content snippet (collapsible) -->
<td>
{% if s.content_snippet %}
<details>
<summary>View snippet ({{ s.content_snippet|length }} chars) </summary>
<pre class="code">({{ s.content_snippet}}</pre>
</details>
{% else %}
N/A
{% endif %}
</td>
<!-- Rules & Heuristics -->
<td>
<!-- Matches (Rules & Heuristics) -->
<td data-role="matches-cell">
{% set has_rules = s.rules and s.rules|length > 0 %}
{% set has_heur = s.heuristics and s.heuristics|length > 0 %}
@@ -244,13 +270,35 @@
{% endif %}
{% if not has_rules and not has_heur %}
N/A
<span class="text-dim">N/A</span>
{% endif %}
</td>
<!-- Content Snippet (reused for Analyze button / dynamic snippet) -->
<td data-role="snippet-cell">
{% if s.content_snippet %}
<details>
<summary>View snippet ({{ s.content_snippet|length }} chars)</summary>
<pre class="code">{{ s.content_snippet }}</pre>
</details>
{% else %}
{% if s.type == 'external' and s.src %}
<button
type="button"
class="btn btn-sm btn-primary btn-analyze-snippet"
data-url="{{ s.src }}"
data-job="{{ uuid }}">Analyze external script</button>
{% else %}
<span class="text-dim">N/A</span>
{% endif %}
{% endif %}
</td>
</tr>
{% endfor %}
</tbody>
</table>
{% else %}
<p>No suspicious scripts detected.</p>
{% endif %}
@@ -269,8 +317,154 @@
<!-- Source -->
<div class="card" id="source">
<h2>Source</h2>
<p><a href="{{ url_for('main.artifacts', run_uuid=uuid, filename='source.txt') }}" target="_blank">View Source</a></p>
<p><a href="{{ url_for('main.view_artifact', run_uuid=uuid, filename='source.html') }}" target="_blank">View Source</a></p>
<p><a href="#top-jump-list">Back to top</a></p>
</div>
{% endblock %}
{% block page_js %}
<script>
/**
* From an absolute artifact path like:
* /data/<uuid>/scripts/fetched/0.js
* /data/<uuid>/1755803694244.js
* C:\data\<uuid>\1755803694244.js
* return { uuid, rel } where rel is the path segment(s) after the uuid.
*/
function parseArtifactPath(artifactPath) {
if (!artifactPath) return { uuid: null, rel: null };
const norm = String(artifactPath).replace(/\\/g, '/'); // windows -> posix
const re = /\/([0-9a-fA-F]{8}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{12})\/(.+)$/;
const m = norm.match(re);
if (!m) return { uuid: null, rel: null };
return { uuid: m[1], rel: m[2] };
}
/** Build /view/artifact/<uuid>/<path:filename> */
function buildViewerUrlFromAbsPath(artifactPath) {
const { uuid, rel } = parseArtifactPath(artifactPath);
if (!uuid || !rel) return '#';
const encodedRel = rel.split('/').map(encodeURIComponent).join('/');
return `/view/artifact/${encodeURIComponent(uuid)}/${encodedRel}`;
}
document.addEventListener('click', function (e) {
const btn = e.target.closest('.btn-analyze-snippet');
if (!btn) return;
const row = btn.closest('tr');
const snippetCell = btn.closest('[data-role="snippet-cell"]') || btn.parentElement;
const matchesCell = row ? row.querySelector('[data-role="matches-cell"]') : null;
const url = btn.dataset.url;
const job = btn.dataset.job;
// Replace button with a lightweight loading text
const loading = document.createElement('span');
loading.className = 'text-dim';
loading.textContent = 'Analyzing…';
btn.replaceWith(loading);
fetch('/api/analyze_script', {
method: 'POST',
headers: { 'Content-Type': 'application/json' }, // include CSRF header if applicable
body: JSON.stringify({ job_id: job, url: url})
})
.then(r => r.json())
.then(data => {
if (!data.ok) {
loading.textContent = 'Error: ' + (data.error || 'Unknown');
return;
}
// --- Build the snippet details element ---
const snippetText = data.snippet || ''; // backend should return a preview
const snippetLen = data.snippet_len || snippetText.length;
// --- File path / viewer things
const filepath = data.artifact_path || ''; // e.g., "/data/3ec90584-076e-457c-924b-861be7e11a34/1755803694244.js"
const viewerUrl = buildViewerUrlFromAbsPath(filepath);
const details = document.createElement('details');
const summary = document.createElement('summary');
summary.textContent = 'View snippet (' + snippetLen + ' chars' + (data.truncated ? ', truncated' : '') + ', ' + data.bytes + ' bytes)';
const pre = document.createElement('pre');
pre.className = 'code';
pre.textContent = snippetText; // textContent preserves literal code safely
// put things in the DOM
details.appendChild(summary);
details.appendChild(pre);
const link = document.createElement('a');
link.href = viewerUrl;
link.target = '_blank';
link.rel = 'noopener';
link.textContent = 'open in viewer';
summary.appendChild(document.createElement('br')); // line break under the summary text
summary.appendChild(link);
// Replace "Analyzing…" with the new details block
loading.replaceWith(details);
// --- Update the Matches cell with rule findings ---
if (matchesCell) {
if (Array.isArray(data.findings) && data.findings.length) {
const frag = document.createDocumentFragment();
const strong = document.createElement('strong');
strong.textContent = 'Rules';
const ul = document.createElement('ul');
data.findings.forEach(function (f) {
const li = document.createElement('li');
const name = f.name || 'Rule';
const desc = f.description ? ' — ' + f.description : '';
li.textContent = name + desc;
// Optional badges for severity/tags if present
if (f.severity) {
const badge = document.createElement('span');
badge.className = 'badge sev-' + String(f.severity).toLowerCase();
badge.textContent = String(f.severity).charAt(0).toUpperCase() + String(f.severity).slice(1);
li.appendChild(document.createTextNode(' '));
li.appendChild(badge);
}
if (Array.isArray(f.tags)) {
f.tags.forEach(function (t) {
const chip = document.createElement('span');
chip.className = 'chip';
chip.title = 'Tag: ' + t;
chip.textContent = t;
li.appendChild(document.createTextNode(' '));
li.appendChild(chip);
});
}
ul.appendChild(li);
});
frag.appendChild(strong);
frag.appendChild(ul);
// Replace placeholder N/A or existing heuristics-only content
matchesCell.innerHTML = '';
matchesCell.appendChild(frag);
} else {
matchesCell.innerHTML = '<span class="text-dim">No rule matches.</span>';
}
}
})
.catch(function (err) {
loading.textContent = 'Request failed: ' + err;
});
});
</script>
{% endblock %}

app/templates/viewer.html (new file)

@@ -0,0 +1,111 @@
{% extends "base.html" %}
{% block content %}
<div style="max-width:1100px;margin:0 auto;padding:1rem 1.25rem;">
<header style="display:flex;align-items:center;justify-content:space-between;gap:1rem;flex-wrap:wrap;">
<div>
<h2 style="margin:0;font-size:1.1rem;">Code Viewer</h2>
<div class="text-dim" style="font-size:0.9rem;">
<strong>File:</strong> <span id="fileName">{{ filename }}</span>
</div>
</div>
<div style="display:flex;gap:.5rem;align-items:center;">
<button id="copyBtn" class="btn btn-sm">Copy</button>
<button id="wrapBtn" class="btn btn-sm">Toggle wrap</button>
<a id="openRaw" class="btn btn-sm" href="{{ raw_url }}" target="_blank" rel="noopener">Open raw</a>
<a id="downloadRaw" class="btn btn-sm" href="{{ raw_url }}" download>Download</a>
</div>
</header>
<div id="viewerStatus" class="text-dim" style="margin:.5rem 0 .75rem;"></div>
<div id="editor" style="height:72vh;border:1px solid #1f2a36;border-radius:8px;"></div>
</div>
<!-- Monaco AMD loader (no integrity to avoid mismatch) -->
<script src="https://cdnjs.cloudflare.com/ajax/libs/monaco-editor/0.49.0/min/vs/loader.min.js"
crossorigin="anonymous" referrerpolicy="no-referrer"></script>
<script>
(function () {
const RAW_URL = "{{ raw_url }}";
const FILENAME = "{{ filename }}";
const LANGUAGE = "{{ language|default('', true) }}";
const statusEl = document.getElementById('viewerStatus');
function extToLang(name) {
if (!name) return 'plaintext';
const m = name.toLowerCase().match(/\.([a-z0-9]+)$/);
const ext = m ? m[1] : '';
const map = {js:'javascript',mjs:'javascript',cjs:'javascript',ts:'typescript',json:'json',
html:'html',htm:'html',css:'css',py:'python',sh:'shell',bash:'shell',
yml:'yaml',yaml:'yaml',md:'markdown',txt:'plaintext',log:'plaintext'};
return map[ext] || 'plaintext';
}
// Wait until the AMD loader has defined window.require
function waitForRequire(msLeft = 5000) {
return new Promise((resolve, reject) => {
const t0 = performance.now();
(function poll() {
if (window.require && typeof window.require === 'function') return resolve();
if (performance.now() - t0 > msLeft) return reject(new Error('Monaco loader not available'));
setTimeout(poll, 25);
})();
});
}
function configureMonaco() {
// Point AMD loader at the CDN
require.config({ paths: { 'vs': 'https://cdnjs.cloudflare.com/ajax/libs/monaco-editor/0.49.0/min/vs' } });
// Worker bootstrap
window.MonacoEnvironment = {
getWorkerUrl: function () {
const base = 'https://cdnjs.cloudflare.com/ajax/libs/monaco-editor/0.49.0/min/';
const code = "self.MonacoEnvironment={baseUrl:'" + base + "'};importScripts('" + base + "vs/base/worker/workerMain.js');";
return 'data:text/javascript;charset=utf-8,' + encodeURIComponent(code);
}
};
}
async function main() {
try {
statusEl.textContent = 'Loading file…';
await waitForRequire();
configureMonaco();
const resp = await fetch(RAW_URL, { cache: 'no-store' });
const text = await resp.text();
require(['vs/editor/editor.main'], function () {
const editor = monaco.editor.create(document.getElementById('editor'), {
value: text,
language: LANGUAGE || extToLang(FILENAME),
readOnly: true,
automaticLayout: true,
wordWrap: 'on',
minimap: { enabled: false },
scrollBeyondLastLine: false,
theme: 'vs-dark'
});
// Buttons
document.getElementById('copyBtn')?.addEventListener('click', async () => {
try { await navigator.clipboard.writeText(editor.getValue()); statusEl.textContent = 'Copied.'; }
catch (e) { statusEl.textContent = 'Copy failed: ' + e; }
});
document.getElementById('wrapBtn')?.addEventListener('click', () => {
const opts = editor.getRawOptions();
editor.updateOptions({ wordWrap: opts.wordWrap === 'on' ? 'off' : 'on' });
});
statusEl.textContent = (resp.ok ? '' : `Warning: HTTP ${resp.status}`) + (text.length ? '' : ' (empty file)');
});
} catch (err) {
statusEl.textContent = 'Viewer error: ' + err.message;
}
}
main();
})();
</script>
{% endblock %}


@@ -33,7 +33,7 @@ from flask import current_app
from playwright.async_api import async_playwright, TimeoutError as PWTimeoutError
from app.utils.io_helpers import safe_write
from app.enrichment import enrich_url
from app.utils.enrichment import enrich_url
from app.utils.settings import get_settings
# Load settings once for constants / defaults
@@ -202,85 +202,111 @@ class Browser:
# -----------------------------------------------------------------------
# Form & Script analysis (plumbing only; detection is in the rules engine)
# -----------------------------------------------------------------------
def analyze_forms(self, html: str, base_url: str) -> List[Dict[str, Any]]:
def analyze_forms(self, html: str, base_url: str = "") -> List[Dict[str, Any]]:
"""
Parse forms from the page HTML and apply rule-based checks (engine), keeping
only simple plumbing heuristics here (no security logic).
Collect form artifacts and evaluate per-form matches via the rules engine.
Only include rows that matched at least one rule.
Returns list of dicts with keys:
Returns list of dicts with keys (per matched form):
- type: "form"
- action, method, inputs
- flagged (bool), flag_reasons (list[str]), status (str)
- rule_checks: {'checks': [...], 'summary': {...}} (per-form snippet evaluation)
- content_snippet: str
- rules: List[{name, description, severity?, tags?}]
"""
soup = BeautifulSoup(html, "lxml")
forms_info: List[Dict[str, Any]] = []
page_hostname = urlparse(base_url).hostname
results: List[Dict[str, Any]] = []
engine = self._get_rule_engine()
base_hostname = urlparse(base_url).hostname or ""
# Match how scripts picks preview len
try:
preview_len = getattr(settings.ui, "snippet_preview_len", 200) # keep parity with scripts
except Exception:
preview_len = 200
for form in soup.find_all("form"):
action = form.get("action")
method = form.get("method", "get").lower()
try:
action = (form.get("action") or "").strip()
method = (form.get("method") or "get").strip().lower()
inputs: List[Dict[str, Any]] = []
for inp in form.find_all("input"):
input_name = inp.get("name")
input_type = inp.get("type", "text")
inputs.append({"name": input_name, "type": input_type})
inputs.append({
"name": inp.get("name"),
"type": (inp.get("type") or "text").strip().lower(),
})
flagged_reasons: List[str] = []
# Use the actual form markup for regex rules
form_markup = str(form)
# UI-friendly snippet
content_snippet = form_markup[:preview_len]
if not action or str(action).strip() == "":
flagged_reasons.append("No action specified")
matches: List[Dict[str, Any]] = []
if engine is not None:
for r in getattr(engine, "rules", []):
if getattr(r, "category", None) != "form":
continue
rtype = getattr(r, "rule_type", None)
try:
ok = False
reason = ""
if rtype == "regex":
# Run against the raw form HTML
ok, reason = r.run(form_markup)
elif rtype == "function":
# Structured facts for function-style rules
facts = {
"category": "form",
"base_url": base_url,
"base_hostname": base_hostname,
"action": action,
"action_hostname": urlparse(action).hostname or "",
"method": method,
"inputs": inputs,
"markup": form_markup,
}
ok, reason = r.run(facts)
else:
continue
if ok:
matches.append({
"name": getattr(r, "name", "unknown_rule"),
"description": (reason or "") or getattr(r, "description", ""),
"severity": getattr(r, "severity", None),
"tags": getattr(r, "tags", None),
})
except Exception as rule_exc:
# Be defensive—bad rule shouldn't break the form pass
try:
action_host = urlparse(action).hostname
if not str(action).startswith("/") and action_host != page_hostname:
flagged_reasons.append("Submits to a different host")
self.logger.debug("Form rule error", extra={"rule": getattr(r, "name", "?"), "error": str(rule_exc)})
except Exception:
pass
continue
try:
if urlparse(action).scheme == "http" and urlparse(base_url).scheme == "https":
flagged_reasons.append("Submits over insecure HTTP")
except Exception:
pass
for hidden in form.find_all("input", type="hidden"):
name_value = hidden.get("name") or ""
if "password" in name_value.lower():
flagged_reasons.append("Hidden password field")
flagged = bool(flagged_reasons)
# Serialize a simple form snippet for rule category='form'
snippet_lines = []
snippet_lines.append(f"base_url={base_url}")
snippet_lines.append(f"base_hostname={page_hostname}")
snippet_lines.append(f"action={action}")
snippet_lines.append(f"method={method}")
snippet_lines.append("inputs=")
i = 0
n = len(inputs)
while i < n:
item = inputs[i]
snippet_lines.append(f" - name={item.get('name')} type={item.get('type')}")
i = i + 1
form_snippet = "\n".join(snippet_lines)
# Per-form rule checks (PASS/FAIL list via engine)
rule_checks = self.run_rule_checks(form_snippet, category="form")
forms_info.append({
if matches:
results.append({
"type": "form",
"action": action,
"method": method,
"inputs": inputs,
"flagged": flagged,
"flag_reasons": flagged_reasons,
"status": "flagged" if flagged else "possibly safe",
"rule_checks": rule_checks
"content_snippet": content_snippet,
"rules": matches,
})
return forms_info
except Exception as exc:
# Keep analysis resilient
try:
self.logger.error("Form analysis error", extra={"error": str(exc)})
except Exception:
pass
results.append({
"type": "form",
"heuristics": [f"Form analysis error: {exc}"],
})
return results
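
The commit summary normalizes form function rules to a canonical `(bool, Optional[str])` return. A standalone sketch of one of them, `form_submits_to_different_host`, assuming the facts dict assembled in `analyze_forms` (field names taken from that dict; the real rule lives in the rules engine):

```python
from typing import Any, Dict, Optional, Tuple
from urllib.parse import urlparse

def form_submits_to_different_host(facts: Dict[str, Any]) -> Tuple[bool, Optional[str]]:
    """Fire when a form posts to a host other than the page's own.

    Sketch only: mirrors the canonical (matched, reason) contract.
    """
    action = (facts.get("action") or "").strip()
    # Relative or empty actions submit to the same origin -> no match.
    if not action or action.startswith("/"):
        return (False, None)
    base_host = (facts.get("base_hostname") or "").lower()
    action_host = (urlparse(action).hostname or "").lower()
    if action_host and base_host and action_host != base_host:
        return (True, f"Form submits to {action_host}, page is {base_host}")
    return (False, None)
```

Lowercasing both hosts is the "minor hardening" called out in the head: it avoids false positives on case-differing but equal hostnames.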
def analyze_scripts(self, html: str, base_url: str = "") -> List[Dict[str, Any]]:
"""
@@ -370,7 +396,7 @@ class Browser:
Writes:
- /data/<uuid>/screenshot.png
- /data/<uuid>/source.txt
- /data/<uuid>/source.html
- /data/<uuid>/results.json (single source of truth for routes)
Returns:
@@ -381,7 +407,7 @@ class Browser:
run_dir.mkdir(parents=True, exist_ok=True)
screenshot_path = run_dir / "screenshot.png"
source_path = run_dir / "source.txt"
source_path = run_dir / "source.html"
results_path = run_dir / "results.json"
redirects: List[Dict[str, Any]] = []

View File

@@ -9,8 +9,8 @@ from ipaddress import ip_address
import socket
# Local imports
from .utils.cache_db import get_cache
from .utils.settings import get_settings
from app.utils.cache_db import get_cache
from app.utils.settings import get_settings
# Configure logging
logging.basicConfig(level=logging.INFO, format="[%(levelname)s] %(message)s")
@@ -39,9 +39,6 @@ def enrich_url(url: str) -> dict:
# --- GeoIP ---
result["geoip"] = enrich_geoip(hostname)
# --- BEC Words ---
result["bec_words"] = [w for w in BEC_WORDS if w.lower() in url.lower()]
return result
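
The `bec_words` enrichment above is a plain case-insensitive substring check over the URL. A minimal standalone sketch (`BEC_WORDS` here is a hypothetical sample, not the project's real word list):

```python
from typing import List

# Hypothetical sample; the real list lives in the enrichment module.
BEC_WORDS = ["invoice", "payment", "wire", "urgent"]

def match_bec_words(url: str, words: List[str] = BEC_WORDS) -> List[str]:
    """Return every BEC keyword that appears (case-insensitively) in the URL."""
    lowered = url.lower()
    return [w for w in words if w.lower() in lowered]
```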

View File

@@ -0,0 +1,338 @@
# sneakyscope/app/utils/external_fetch.py
import hashlib
import os
import logging
from dataclasses import dataclass
from typing import Optional, Tuple, List
from urllib.parse import urljoin, urlparse
import requests
from app.utils.settings import get_settings
settings = get_settings()
_ALLOWED_SCHEMES = {"http", "https"}
@dataclass
class FetchResult:
"""
Outcome for a single external script fetch.
"""
ok: bool
reason: str
source_url: str
final_url: str
status_code: Optional[int]
content_type: Optional[str]
bytes_fetched: int
truncated: bool
sha256_hex: Optional[str]
saved_path: Optional[str]
class ExternalScriptFetcher:
"""
Minimal, safe-by-default fetcher for external JS files.
Notes / assumptions:
- All artifacts for this run live under the UUID-backed `results_path` you pass in.
- Saves bytes to: <results_path>/<index>.js
- Manual redirects up to `max_redirects`.
- Streaming with a hard byte cap derived from `max_total_mb`.
- Never raises network exceptions to callers; failures are encoded in FetchResult.
- Settings are read from get_settings().external_fetch with sane defaults.
"""
def __init__(self, results_path: str, session: Optional[requests.Session] = None):
"""
Args:
results_path: Absolute path to the run's UUID directory (e.g., /data/<run_uuid>).
session: Optional requests.Session to reuse connections; a new one is created if not provided.
"""
# Derived value: MiB -> bytes
self.max_total_bytes: int = settings.external_fetch.max_total_mb * 1024 * 1024
# Logger
self.logger = logging.getLogger(__name__)
# Where to write artifacts for this job/run (UUID directory)
self.results_path = results_path
# HTTP session with a predictable UA
self.session = session or requests.Session()
self.session.headers.update({"User-Agent": "SneakyScope/1.0"})
# -------------------------
# Internal helper methods
# -------------------------
def _timeout(self) -> Tuple[float, float]:
"""
Compute (connect_timeout, read_timeout) in seconds from max_time_ms.
Keeps a conservative split so either phase gets a fair chance.
"""
total = max(0.1, settings.external_fetch.max_time_ms / 1000.0)
connect = min(1.5, total * 0.5) # cap connect timeout
read = max(0.5, total * 0.5) # floor read timeout
return (connect, read)
def _scheme_allowed(self, url: str) -> bool:
"""
Return True if URL uses an allowed scheme (http/https).
"""
scheme = (urlparse(url).scheme or "").lower()
return scheme in _ALLOWED_SCHEMES
def _artifact_path(self, index: int) -> str:
"""
Build an output path like:
<results_path>/<index>.js
Ensures the directory exists.
"""
base_dir = self.results_path
# Make sure parent directories exist (idempotent)
os.makedirs(base_dir, exist_ok=True)
filename = f"{index}.js"
return os.path.join(base_dir, filename)
# -------------------------
# Public API
# -------------------------
def fetch_one(self, script_url: str, index: int) -> FetchResult:
"""
Fetch exactly one external script with manual redirect handling and a hard per-file byte cap.
Args:
script_url: The script URL to retrieve.
index: Numeric index used solely for naming the artifact file (<index>.js).
Returns:
FetchResult with status, metadata, and saved path (if successful).
"""
# Feature gate: allow callers to rely on a consistent failure when globally disabled.
if not settings.external_fetch.enabled:
return FetchResult(
ok=False,
reason="Feature disabled",
source_url=script_url,
final_url=script_url,
status_code=None,
content_type=None,
bytes_fetched=0,
truncated=False,
sha256_hex=None,
saved_path=None,
)
# Scheme guard: refuse anything not http/https in this v1.
if not self._scheme_allowed(script_url):
return FetchResult(
ok=False,
reason="Scheme not allowed",
source_url=script_url,
final_url=script_url,
status_code=None,
content_type=None,
bytes_fetched=0,
truncated=False,
sha256_hex=None,
saved_path=None,
)
current_url = script_url
status_code: Optional[int] = None
content_type: Optional[str] = None
redirects_followed = 0
# Manual redirect loop so we can enforce max_redirects precisely.
while True:
try:
resp = self.session.get(
current_url,
stream=True,
allow_redirects=False,
timeout=self._timeout(),
)
except requests.exceptions.Timeout:
return FetchResult(
ok=False,
reason="Timeout",
source_url=script_url,
final_url=current_url,
status_code=status_code,
content_type=content_type,
bytes_fetched=0,
truncated=False,
sha256_hex=None,
saved_path=None,
)
except requests.exceptions.RequestException as e:
return FetchResult(
ok=False,
reason=f"Network error: {e.__class__.__name__}",
source_url=script_url,
final_url=current_url,
status_code=status_code,
content_type=content_type,
bytes_fetched=0,
truncated=False,
sha256_hex=None,
saved_path=None,
)
status_code = resp.status_code
content_type = resp.headers.get("Content-Type")
# Handle redirects explicitly (3xx with Location)
if status_code in (301, 302, 303, 307, 308) and "Location" in resp.headers:
if redirects_followed >= settings.external_fetch.max_redirects:
return FetchResult(
ok=False,
reason="Max redirects exceeded",
source_url=script_url,
final_url=current_url,
status_code=status_code,
content_type=content_type,
bytes_fetched=0,
truncated=False,
sha256_hex=None,
saved_path=None,
)
next_url = urljoin(current_url, resp.headers["Location"])
if not self._scheme_allowed(next_url):
return FetchResult(
ok=False,
reason="Redirect to disallowed scheme",
source_url=script_url,
final_url=next_url,
status_code=status_code,
content_type=content_type,
bytes_fetched=0,
truncated=False,
sha256_hex=None,
saved_path=None,
)
current_url = next_url
redirects_followed += 1
# Loop to follow next hop
continue
# Not a redirect: stream response body with a hard byte cap.
cap = self.max_total_bytes
total = 0
truncated = False
chunks: List[bytes] = []
try:
for chunk in resp.iter_content(chunk_size=8192):
if not chunk:
# Skip keep-alive chunks
continue
new_total = total + len(chunk)
if new_total > cap:
# Only keep what fits and stop
remaining = cap - total
if remaining > 0:
chunks.append(chunk[:remaining])
total += remaining
truncated = True
break
chunks.append(chunk)
total = new_total
except requests.exceptions.Timeout:
return FetchResult(
ok=False,
reason="Timeout while reading",
source_url=script_url,
final_url=current_url,
status_code=status_code,
content_type=content_type,
bytes_fetched=total,
truncated=truncated,
sha256_hex=None,
saved_path=None,
)
except requests.exceptions.RequestException as e:
return FetchResult(
ok=False,
reason=f"Network error while reading: {e.__class__.__name__}",
source_url=script_url,
final_url=current_url,
status_code=status_code,
content_type=content_type,
bytes_fetched=total,
truncated=truncated,
sha256_hex=None,
saved_path=None,
)
data = b"".join(chunks)
if not data:
return FetchResult(
ok=False,
reason="Empty response",
source_url=script_url,
final_url=current_url,
status_code=status_code,
content_type=content_type,
bytes_fetched=0,
truncated=False,
sha256_hex=None,
saved_path=None,
)
# Persist to <results_path>/<index>.js
out_path = self._artifact_path(index)
try:
with open(out_path, "wb") as f:
f.write(data)
except OSError as e:
return FetchResult(
ok=False,
reason=f"Write error: {e.__class__.__name__}",
source_url=script_url,
final_url=current_url,
status_code=status_code,
content_type=content_type,
bytes_fetched=total,
truncated=truncated,
sha256_hex=None,
saved_path=None,
)
sha256_hex = hashlib.sha256(data).hexdigest()
# Structured log line for visibility/metrics
try:
self.logger.info(
"External script fetched",
extra={
"source_url": script_url,
"final_url": current_url,
"status": status_code,
"bytes": total,
"truncated": truncated,
"sha256": sha256_hex,
"saved_path": out_path,
},
)
except Exception:
# Logging should never break the pipeline
pass
return FetchResult(
ok=True,
reason="OK",
source_url=script_url,
final_url=current_url,
status_code=status_code,
content_type=content_type,
bytes_fetched=total,
truncated=truncated,
sha256_hex=sha256_hex,
saved_path=out_path,
)
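
The subtle part of `fetch_one` is the streaming byte cap: it keeps only the bytes that fit and reports `truncated=True` instead of failing. Its core can be isolated and checked like this (a sketch, independent of `requests`):

```python
from typing import Iterable, Tuple

def cap_stream(chunks: Iterable[bytes], cap: int) -> Tuple[bytes, bool]:
    """Accumulate chunks up to `cap` bytes; report whether input was truncated.

    Mirrors the iter_content loop in ExternalScriptFetcher.fetch_one.
    """
    kept = []
    total = 0
    truncated = False
    for chunk in chunks:
        if not chunk:  # skip keep-alive chunks
            continue
        new_total = total + len(chunk)
        if new_total > cap:
            remaining = cap - total
            if remaining > 0:  # keep only the prefix that fits
                kept.append(chunk[:remaining])
                total += remaining
            truncated = True
            break
        kept.append(chunk)
        total = new_total
    return b"".join(kept), truncated
```

This is why the API response can report both `bytes` and `truncated`: a capped fetch is still a usable artifact for rule matching.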

View File

@@ -39,6 +39,14 @@ BASE_DIR = Path(__file__).resolve().parent.parent
DEFAULT_SETTINGS_FILE = BASE_DIR / "config" / "settings.yaml"
# ---------- CONFIG DATA CLASSES ----------
@dataclass
class External_FetchConfig:
enabled: bool = True
max_total_mb: int = 5
max_time_ms: int = 3000
max_redirects: int = 3
concurrency: int = 3
@dataclass
class UIConfig:
snippet_preview_len: int = 160
@@ -61,6 +69,7 @@ class AppConfig:
class Settings:
cache: Cache_Config = field(default_factory=Cache_Config)
ui: UIConfig = field(default_factory=UIConfig)
external_fetch: External_FetchConfig = field(default_factory=External_FetchConfig)
app: AppConfig = field(default_factory=AppConfig)
@classmethod

View File

@@ -1,23 +1,18 @@
# SneakyScope — Roadmap (Updated 8-21-25)
## Priority 1: Core Analysis / Stability
* Opt-in “fetch external scripts” mode (off by default): on submission, download external script content (size/time limits) and run rules on fetched content.
* Remove remaining legacy form “flagged\_reasons” plumbing once all equivalent function rules are in place.
* Unit tests: YAML compilation, function-rule adapters, and per-script/per-form rule cases.
* SSL/TLS intelligence: for HTTPS targets, pull certificate details from crt.sh (filtering expired); if a subdomain, also resolve the root domain to capture any wildcard certificates; probe the endpoint to enumerate supported TLS versions/ciphers and flag weak/legacy protocols.
## Priority 2: API Layer
* API endpoints: `/screenshot`, `/source`, `/analyse`.
* OpenAPI spec: create `openapi/openapi.yaml` and serve at `/api/openapi.yaml`.
* **OpenAPI**: add `POST /api/analyze_script` (request/response schemas, examples) to `openapi/openapi.yaml`; serve at `/api/openapi.yaml`.
* Docs UI: Swagger UI or Redoc at `/docs`.
* (Nice-to-have) API JSON error consistency: handlers for 400/403/404/405/500 that always return JSON.
## Priority 3: UI / UX
* Front page/input handling: auto-prepend `http://`/`https://`/`www.` for bare domains.
* Source code viewer: embed page source in an editor view for readability.
* Scripts table: toggle between “Only suspicious” and “All scripts”.
* Rules Lab (WYSIWYG tester): paste a rule, validate/compile, run against sample text; lightweight nav entry.
## Priority 4: Artifact Management & Ops
@@ -33,6 +28,6 @@
* Domain reputation (local feeds): build and refresh a consolidated domain/URL reputation store from URLHaus database dump and OpenPhish community dataset (scheduled pulls with dedup/normalize).
* Threat intel connectors (settings-driven): add `settings.yaml` entries for VirusTotal and ThreatFox API keys (plus future providers); when present, enrich lookups and merge results into the unified reputation checks during analysis.
## Backlog / FarOff Plans
## Backlog / Far-Off Plans
* Server profile scan: run a lightweight nmap service/banner scan on common web/alt ports (80, 443, 8000, 8080, 8443, etc.) and SSH; combine with server headers to infer stack (e.g., IIS vs. Linux/\*nix).
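
A stdlib-only stand-in for that backlog item (a plain TCP connect probe, not a substitute for nmap's service/banner detection; the port list is an example):

```python
import socket
from typing import Dict, Iterable

def probe_ports(host: str, ports: Iterable[int], timeout: float = 1.0) -> Dict[int, bool]:
    """Return {port: is_open} using plain TCP connect attempts."""
    results: Dict[int, bool] = {}
    for port in ports:
        with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
            s.settimeout(timeout)
            # connect_ex returns 0 on success instead of raising
            results[port] = s.connect_ex((host, port)) == 0
    return results

# Example port set from the roadmap bullet:
WEB_PORTS = [80, 443, 8000, 8080, 8443, 22]
```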