feat: on-demand external script analysis + code viewer; refactor form analysis to rule engine

- API: add `POST /api/analyze_script` (app/blueprints/api.py)
  - Fetch one external script to artifacts, run rules, return findings + snippet
  - Uses new ExternalScriptFetcher (results_path aware) and job UUID
  - Returns: { ok, final_url, status_code, bytes, truncated, sha256, artifact_path, findings[], snippet, snippet_len }
  - TODO: document in openapi/openapi.yaml
- Fetcher: update `app/utils/external_fetch.py`
  - Constructed with `results_path` (UUID dir); writes to `<results_path>/scripts/fetched/<index>.js`
  - Loads settings via `get_settings()`, logs via std logging
- UI (results.html):
  - Move “Analyze external script” action into **Content Snippet** column for external rows
  - Clicking replaces button with `<details>` snippet, shows rule matches, and adds “open in viewer” link
  - Robust fetch handler (checks JSON, shows errors); builds viewer URL from absolute artifact path
- Viewer:
  - New route: `GET /view/artifact/<run_uuid>/<path:filename>` (app/blueprints/ui.py)
  - New template: Monaco-based read-only code viewer (viewer.html)
  - Removes SRI on loader to avoid integrity block; loads file via `raw_url` and detects language by extension
- Forms:
  - Refactor `analyze_forms` to mirror scripts analysis:
    - Uses rule engine (`category == "form"`) across regex/function rules
    - Emits rows only when matches exist
    - Includes `content_snippet`, `action`, `method`, `inputs`, `rules`
  - Replace legacy plumbing (`flagged`, `flag_reasons`, `status`) in output
  - Normalize form function rules to canonical returns `(bool, Optional[str])`:
    - `form_action_missing`
    - `form_http_on_https_page`
    - `form_submits_to_different_host`
  - Add minor hardening (lowercasing hosts, no-op actions, clearer reasons)
- CSS: add `.forms-table` to mirror `.scripts-table` (5 columns)
  - Fixed table layout, widths per column, chip/snippet styling, responsive tweaks
- Misc:
  - Fix “working outside app context” issue by avoiding `current_app` at import time (left storage logic inside routes)
  - Add “View Source” link to open page source in viewer

Refs:
- Roadmap: mark “Source code viewer” done; keep TODO to add `/api/analyze_script` to OpenAPI
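
The response fields listed in this message can be sketched roughly as follows. This is only an illustration under stated assumptions, not the code in app/blueprints/api.py: `filter_matched` and `build_response` are hypothetical helper names, and the sketch assumes the rule engine reports each rule as a dict whose `result` field is `"PASS"` or `"FAIL"`.

```python
from typing import Any, Dict, List

# Hypothetical helpers for illustration; these names are not part of the commit.
FINDING_KEYS = ("name", "description", "severity", "tags", "reason")


def filter_matched(results: List[Any]) -> List[Dict[str, Any]]:
    """Keep only rules that matched (assumed to be reported as result == "FAIL")."""
    matched = []
    for r in results:
        # Skip malformed entries and rules that did not match.
        if not isinstance(r, dict) or r.get("result") != "FAIL":
            continue
        finding = {key: r.get(key) for key in FINDING_KEYS}
        finding["tags"] = finding["tags"] or []
        matched.append(finding)
    return matched


def build_response(text: str, findings: List[Dict[str, Any]], preview_len: int = 300) -> Dict[str, Any]:
    """Assemble a subset of the documented response: findings plus a trimmed snippet."""
    snippet = text.strip()
    if len(snippet) > preview_len:
        snippet = snippet[:preview_len] + "…"
    # snippet_len reports the length of the full text, not of the preview.
    return {"ok": True, "findings": findings, "snippet": snippet, "snippet_len": len(text)}
```

A caller would pass the decoded script text and the raw rule results; the fetch metadata (`final_url`, `sha256`, and so on) is left out of the sketch.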
@@ -19,7 +19,8 @@ from .rules.function_rules import (
|
||||
form_action_missing,
|
||||
)
|
||||
|
||||
from . import routes # blueprint
|
||||
from app.blueprints import ui # ui blueprint
|
||||
from app.blueprints import api # api blueprint
|
||||
|
||||
# from .utils import io_helpers # if need logging/setup later
|
||||
# from .utils import cache_db # available for future injections
|
||||
@@ -136,7 +137,8 @@ def create_app() -> Flask:
|
||||
app.config["APP_VERSION"] = f"v{settings.app.version_major}.{settings.app.version_minor}"
|
||||
|
||||
# Register blueprints
|
||||
app.register_blueprint(routes.bp)
|
||||
app.register_blueprint(ui.bp)
|
||||
app.register_blueprint(api.api_bp)
|
||||
|
||||
# Example log lines so we know we booted cleanly
|
||||
app.logger.info(f"SneakyScope started: {app.config['APP_NAME']} {app.config['APP_VERSION']}")
|
||||
|
||||
212
app/blueprints/api.py
Normal file
@@ -0,0 +1,212 @@
# app/blueprints/api.py
"""
API blueprint for JSON endpoints.

Endpoints:
  POST /api/analyze_script
    Body:
      {
        "job_id": "<uuid>",                     # or "uuid": "<uuid>"
        "url": "https://cdn.example.com/app.js",
        "category": "script"                    # optional, defaults to "script"
      }
    Response:
      {
        "ok": true,
        "final_url": "...",
        "status_code": 200,
        "bytes": 12345,
        "truncated": false,
        "sha256": "...",
        "artifact_path": "/abs/path/to/<uuid>/scripts/fetched/<index>.js",
        "findings": [ { "name": "...", "description": "...", "severity": "...", "tags": [...], "reason": "..." }, ... ],
        "snippet": "<trimmed content>",
        "snippet_len": 45678
      }
"""

import os
import time
from flask import Blueprint, request, jsonify, current_app, send_file, abort
from pathlib import Path

from app.utils.settings import get_settings
from app.utils.external_fetcher import ExternalScriptFetcher
from werkzeug.exceptions import HTTPException

api_bp = Blueprint("api", __name__, url_prefix="/api")


def _resolve_results_path(job_id: str) -> str:
    """
    Compute the absolute results directory for a given job UUID.
    Returns <BASE>/<uuid>, creating the directory if it does not exist yet.
    """
    base_dir = "/data"

    results_dir = os.path.join(base_dir, job_id)
    os.makedirs(results_dir, exist_ok=True)
    return results_dir


def _make_snippet(text: str, max_chars: int = 1200) -> str:
    """Produce a trimmed, safe-to-render snippet of the script contents."""
    if not text:
        return ""
    snippet = text.strip()
    return (snippet[:max_chars] + "…") if len(snippet) > max_chars else snippet

@api_bp.errorhandler(400)
@api_bp.errorhandler(403)
@api_bp.errorhandler(404)
@api_bp.errorhandler(405)
def _api_err(err):
    """
    Return JSON for common client errors.
    """
    if isinstance(err, HTTPException):
        code = err.code
        name = (err.name or "error").lower()
    else:
        code = 400
        name = "error"
    return jsonify({"ok": False, "error": name}), code


@api_bp.errorhandler(500)
def _api_500(err):
    """
    Return JSON for server errors and log the exception.
    """
    try:
        current_app.logger.exception("API 500")
    except Exception:
        pass
    return jsonify({"ok": False, "error": "internal server error"}), 500


@api_bp.post("/analyze_script")
def analyze_script():
    """
    Analyze EXACTLY one external script URL for a given job UUID.

    Expected JSON body:
      { "job_id": "<uuid>", "url": "https://cdn.example.com/app.js", "category": "script" }
    """
    body = request.get_json(silent=True) or {}

    job_id_raw = body.get("job_id") or body.get("uuid")
    script_url_raw = body.get("url")
    category_raw = body.get("category")

    job_id = (job_id_raw or "").strip() if isinstance(job_id_raw, str) else ""
    script_url = (script_url_raw or "").strip() if isinstance(script_url_raw, str) else ""
    # Default to "script" when the category is missing, blank, or not a string.
    category = (category_raw.strip() if isinstance(category_raw, str) else "") or "script"

    # Log this request
    current_app.logger.info(f"Got request to analyze {script_url} via API")

    if not job_id or not script_url:
        return jsonify({"ok": False, "error": "Missing job_id (or uuid) or url"}), 400

    settings = get_settings()

    if not settings.external_fetch.enabled:
        return jsonify({"ok": False, "error": "Feature disabled"}), 400

    # Resolve the UUID-backed results directory for this run.
    results_path = _resolve_results_path(job_id)

    # Initialize the fetcher; it reads its own settings internally.
    fetcher = ExternalScriptFetcher(results_path=results_path)

    # Unique index for the saved file name: <results_path>/scripts/fetched/<index>.js
    unique_index = int(time.time() * 1000)

    outcome = fetcher.fetch_one(script_url=script_url, index=unique_index)
    if not outcome.ok or not outcome.saved_path:
        return jsonify({
            "ok": False,
            "error": outcome.reason,
            "status_code": outcome.status_code,
            "final_url": outcome.final_url
        }), 502

    # Read bytes and decode to UTF-8 for rules and snippet
    try:
        with open(outcome.saved_path, "rb") as fh:
            js_text = fh.read().decode("utf-8", errors="ignore")
    except Exception:
        js_text = ""

    # Pull the rules engine from the app (prefer attribute, then config).
    findings = []
    try:
        engine = getattr(current_app, "rule_engine", None)
        if engine is None:
            engine = current_app.config.get("RULE_ENGINE")
    except Exception:
        engine = None

    if engine is not None and hasattr(engine, "run_all"):
        try:
            # run_all returns PASS/FAIL for each rule; we only surface FAIL (matched) to the UI
            all_results = engine.run_all(js_text, category=category)
            if isinstance(all_results, list):
                matched = []
                for r in all_results:
                    try:
                        if r.get("result") == "FAIL":
                            matched.append({
                                "name": r.get("name"),
                                "description": r.get("description"),
                                "severity": r.get("severity"),
                                "tags": r.get("tags") or [],
                                "reason": r.get("reason"),
                                "category": r.get("category"),
                            })
                    except Exception:
                        # Ignore malformed entries
                        continue
                findings = matched
        except Exception as exc:
            try:
                current_app.logger.error("Rule engine error", extra={"error": str(exc)})
            except Exception:
                pass
            findings = []

    snippet = _make_snippet(js_text, max_chars=settings.ui.snippet_preview_len)

    return jsonify({
        "ok": True,
        "final_url": outcome.final_url,
        "status_code": outcome.status_code,
        "bytes": outcome.bytes_fetched,
        "truncated": outcome.truncated,
        "sha256": outcome.sha256_hex,
        "artifact_path": outcome.saved_path,
        "findings": findings,  # only FAILed rules
        "snippet": snippet,
        "snippet_len": len(js_text)  # length of the full decoded text, not of the preview
    })


@api_bp.get("/artifacts/<run_uuid>/<filename>")
def get_artifact_raw(run_uuid, filename):
    # prevent path traversal
    if "/" in filename or ".." in filename:
        abort(400)

    run_dir = _resolve_results_path(run_uuid)
    full_path = Path(run_dir) / filename

    # if file is not there, give a 404
    if not os.path.isfile(full_path):
        abort(404)

    # else return file
    return send_file(full_path, as_attachment=False)
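
The raw-artifact route above rejects any filename containing `/`, while fetched scripts are saved under `scripts/fetched/<index>.js`. One way to allow nested relative paths while still blocking traversal is to resolve the candidate and check that it stays inside the run directory. The following is a minimal sketch of that idea, not the route's actual implementation; `safe_artifact_path` is a hypothetical helper name.

```python
from pathlib import Path
from typing import Optional


def safe_artifact_path(run_dir: str, relative_name: str) -> Optional[Path]:
    """Return the resolved path if it stays inside run_dir, otherwise None."""
    base = Path(run_dir).resolve()
    # Joining with an absolute name discards base, so the check below rejects it too.
    candidate = (base / relative_name).resolve()
    try:
        candidate.relative_to(base)
    except ValueError:
        # The resolved path escaped the run directory (e.g. via "..").
        return None
    return candidate
```

A route using this would return 400 when the helper yields `None` and 404 when the resolved path is not a file.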
@@ -1,3 +1,5 @@
|
||||
# app/blueprints/ui.py
|
||||
|
||||
import os
|
||||
import json
|
||||
import asyncio
|
||||
@@ -5,11 +7,10 @@ from pathlib import Path
|
||||
from datetime import datetime
|
||||
from flask import Blueprint, render_template, request, redirect, url_for, flash, current_app, send_file, abort
|
||||
|
||||
# from .browser import fetch_page_artifacts
|
||||
from .utils.browser import get_browser
|
||||
from .utils.enrichment import enrich_url
|
||||
from .utils.settings import get_settings
|
||||
from .utils.io_helpers import get_recent_results
|
||||
from app.utils.browser import get_browser
|
||||
from app.utils.enrichment import enrich_url
|
||||
from app.utils.settings import get_settings
|
||||
from app.utils.io_helpers import get_recent_results
|
||||
|
||||
bp = Blueprint("main", __name__)
|
||||
|
||||
@@ -34,9 +35,6 @@ def index():
|
||||
The number of recent runs is controlled via settings.cache.recent_runs_count (int).
|
||||
Falls back to 10 if not present or invalid.
|
||||
"""
|
||||
# Resolve SANDBOX_STORAGE from app config
|
||||
storage = Path(current_app.config["SANDBOX_STORAGE"]).resolve()
|
||||
|
||||
# Pull recent count from settings with a safe fallback
|
||||
try:
|
||||
# settings is already initialized at module import in your file
|
||||
@@ -46,13 +44,15 @@ def index():
|
||||
except Exception:
|
||||
recent_count = 10
|
||||
|
||||
# Resolve SANDBOX_STORAGE from app config
|
||||
storage = Path(current_app.config["SANDBOX_STORAGE"]).resolve()
|
||||
|
||||
# Build the recent list (non-fatal if storage is empty or unreadable)
|
||||
recent_results = get_recent_results(storage, recent_count, current_app.logger)
|
||||
|
||||
# Pass to template; your index.html will hide the card if list is empty
|
||||
return render_template("index.html", recent_results=recent_results)
|
||||
|
||||
|
||||
@bp.route("/analyze", methods=["POST"])
|
||||
def analyze():
|
||||
url = request.form.get("url", "").strip()
|
||||
@@ -87,6 +87,7 @@ def analyze():
|
||||
|
||||
@bp.route("/results/<run_uuid>", methods=["GET"])
|
||||
def view_result(run_uuid: str):
|
||||
# Resolve SANDBOX_STORAGE from app config
|
||||
storage = Path(current_app.config["SANDBOX_STORAGE"]).resolve()
|
||||
run_dir = storage / run_uuid
|
||||
results_path = run_dir / "results.json"
|
||||
@@ -105,6 +106,7 @@ def view_result(run_uuid: str):
|
||||
|
||||
@bp.route("/artifacts/<run_uuid>/<filename>", methods=["GET"])
|
||||
def artifacts(run_uuid: str, filename: str):
|
||||
# Resolve SANDBOX_STORAGE from app config
|
||||
storage = Path(current_app.config["SANDBOX_STORAGE"]).resolve()
|
||||
run_dir = storage / run_uuid
|
||||
full_path = run_dir / filename
|
||||
@@ -123,3 +125,11 @@ def artifacts(run_uuid: str, filename: str):
|
||||
return send_file(full_path)


@bp.get("/view/artifact/<run_uuid>/<filename>")
def view_artifact(run_uuid, filename):
    # Build the raw URL that streams the file (served by api.get_artifact_raw)
    raw_url = url_for('api.get_artifact_raw', run_uuid=run_uuid, filename=filename)
    # Optional: derive the language server-side instead of in the template
    language = None  # e.g., 'javascript'
    return render_template('viewer.html', filename=filename, raw_url=raw_url, language=language)

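
The viewer route leaves `language` as `None` and lets the template detect it from the file extension. If the detection were done server-side instead, it could look roughly like the sketch below. The mapping and the `guess_language` helper are assumptions made for illustration; the values are common Monaco language ids.

```python
from pathlib import PurePosixPath
from typing import Optional

# Assumed mapping from file extension to Monaco language id; extend as needed.
_EXT_TO_LANGUAGE = {
    ".js": "javascript",
    ".mjs": "javascript",
    ".ts": "typescript",
    ".html": "html",
    ".htm": "html",
    ".css": "css",
    ".json": "json",
    ".xml": "xml",
    ".txt": "plaintext",
}


def guess_language(filename: str) -> Optional[str]:
    """Return a language id for the file's extension, or None if it is unknown."""
    suffix = PurePosixPath(filename).suffix.lower()
    return _EXT_TO_LANGUAGE.get(suffix)
```

Returning `None` for unknown extensions keeps the current behaviour, where the template falls back to its own detection.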
@@ -8,5 +8,11 @@ cache:
  whois_cache_days: 7
  geoip_cache_days: 7

external_script_fetch:
  enabled: True
  max_total_mb: 5
  max_time_ms: 3000
  max_redirects: 3

ui:
  snippet_preview_len: 300

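
How `ExternalScriptFetcher` applies these limits is not shown in this diff. As a rough sketch of what a size cap like `max_total_mb` might mean in practice, the function below reads a stream up to a byte limit and reports whether it was truncated, along with a SHA-256 of what was kept. `read_capped` is a hypothetical name and the chunked-read approach is an assumption.

```python
import hashlib
import io
from typing import BinaryIO, Tuple


def read_capped(stream: BinaryIO, max_total_mb: float, chunk_size: int = 8192) -> Tuple[bytes, bool, str]:
    """Read at most max_total_mb megabytes; return (data, truncated, sha256_hex)."""
    limit = int(max_total_mb * 1024 * 1024)
    buf = bytearray()
    truncated = False
    while True:
        chunk = stream.read(chunk_size)
        if not chunk:
            break
        remaining = limit - len(buf)
        if len(chunk) > remaining:
            # Keep only what fits under the cap and stop reading.
            buf.extend(chunk[:remaining])
            truncated = True
            break
        buf.extend(chunk)
    data = bytes(buf)
    return data, truncated, hashlib.sha256(data).hexdigest()


if __name__ == "__main__":
    body, was_truncated, digest = read_capped(io.BytesIO(b"console.log(1);"), max_total_mb=5)
    print(len(body), was_truncated, digest[:12])
```

The time and redirect limits would need to be enforced by the HTTP client itself and are outside this sketch.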
@@ -22,6 +22,7 @@ from __future__ import annotations
|
||||
from typing import Any, Dict, Optional
|
||||
from urllib.parse import urlparse
|
||||
|
||||
_NOOP_ACTIONS = {"", "#", "javascript:void(0)", "javascript:void(0);"}
|
||||
|
||||
# ---------------------------------------------------------------------------
|
||||
# Adapters
|
||||
@@ -169,35 +170,48 @@ def script_third_party_host(facts: Dict[str, Any]):

 # ---------------- Form rules ----------------

-def form_submits_to_different_host(facts: Dict[str, Any]):
-    """Flags <form> actions that submit to a different hostname than the page."""
-    base_host = facts.get("base_hostname") or ""
-    action = facts.get("action") or ""
-    try:
-        action_host = urlparse(action).hostname
-        if action_host and base_host and action_host != base_host:
-            return True, "Form submits to a different host"
-    except Exception:
-        # Parsing failed; treat as no match rather than erroring out
-        pass
+def form_action_missing(facts: Dict[str, Any]):
+    """Flags <form> elements with no meaningful action attribute."""
+    action = (facts.get("action") or "").strip()
+    if action in _NOOP_ACTIONS:
+        return True, "Form has no action attribute (or uses a no-op action)"
     return False, None


 def form_http_on_https_page(facts: Dict[str, Any]):
     """Flags forms submitting over HTTP while the page was loaded over HTTPS."""
-    base_url = facts.get("base_url") or ""
-    action = facts.get("action") or ""
+    base_url = (facts.get("base_url") or "").strip()
+    action = (facts.get("action") or "").strip()
+
     try:
-        if urlparse(base_url).scheme == "https" and urlparse(action).scheme == "http":
-            return True, "Form submits over insecure HTTP"
+        base_scheme = (urlparse(base_url).scheme or "").lower()
+        parsed_act = urlparse(action)
+        act_scheme = (parsed_act.scheme or "").lower()
     except Exception:
-        pass
+        return False, None  # parsing trouble → don’t flag
+
+    # Only flag absolute http:// actions on https pages.
+    # Relative or schemeless ('//host/...') isn’t flagged here (it won’t be HTTP on an HTTPS page).
+    if base_scheme == "https" and act_scheme == "http":
+        return True, f"Submits over insecure HTTP (action={parsed_act.geturl()})"
     return False, None


-def form_action_missing(facts: Dict[str, Any]):
-    """Flags <form> elements with no action attribute."""
-    action = (facts.get("action") or "").strip()
-    if not action:
-        return True, "Form has no action attribute"
+def form_submits_to_different_host(facts: Dict[str, Any]):
+    """Flags <form> actions that submit to a different hostname than the page."""
+    base_host = (facts.get("base_hostname") or "").strip().lower()
+    action = (facts.get("action") or "").strip()
+
+    if not action or action in _NOOP_ACTIONS:
+        return False, None
+
+    try:
+        parsed = urlparse(action)
+        act_host = (parsed.hostname or "").lower()
+    except Exception:
+        return False, None
+
+    # Only compare when the action specifies a host (absolute URL or schemeless //host/path).
+    if act_host and base_host and act_host != base_host:
+        return True, f"Submits to a different host ({act_host} vs {base_host})"
     return False, None
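
With every form rule returning `(bool, Optional[str])`, a caller can treat them uniformly. The sketch below shows one way such rules might be run over a single form's facts; `run_form_rules` and `example_http_on_https` are hypothetical names for illustration, and the real rule engine's interface may differ.

```python
from typing import Any, Callable, Dict, List, Optional, Tuple
from urllib.parse import urlparse

RuleFn = Callable[[Dict[str, Any]], Tuple[bool, Optional[str]]]


def example_http_on_https(facts: Dict[str, Any]) -> Tuple[bool, Optional[str]]:
    """Simplified stand-in for a form rule with the canonical return shape."""
    base_scheme = (urlparse(facts.get("base_url") or "").scheme or "").lower()
    act_scheme = (urlparse(facts.get("action") or "").scheme or "").lower()
    if base_scheme == "https" and act_scheme == "http":
        return True, "Submits over insecure HTTP"
    return False, None


def run_form_rules(facts: Dict[str, Any], rules: Dict[str, RuleFn]) -> List[Dict[str, Any]]:
    """Run each rule and collect only the ones that matched."""
    matches = []
    for name, fn in rules.items():
        try:
            matched, reason = fn(facts)
        except Exception:
            # A rule that raises is treated as "no match" in this sketch.
            continue
        if matched:
            matches.append({"name": name, "reason": reason})
    return matches
```

An empty result list corresponds to the "emits rows only when matches exist" behaviour described in the commit message.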
@@ -279,6 +279,7 @@ details ul, details p {
|
||||
}
|
||||
}
|
||||
|
||||
/* SCRIPTS TABLE */
|
||||
.scripts-table td ul {
|
||||
margin: 0.25rem 0 0.25rem 1rem;
|
||||
padding-left: 1rem;
|
||||
@@ -305,6 +306,59 @@ details ul, details p {
|
||||
white-space: nowrap;
|
||||
}
|
||||
|
||||
|
||||
/* lists & small text inside cells */
|
||||
.forms-table td ul {
|
||||
margin: 0.25rem 0 0.25rem 1rem;
|
||||
padding-left: 1rem;
|
||||
}
|
||||
.forms-table td small {
|
||||
opacity: 0.85;
|
||||
}
|
||||
|
||||
/* keep the table from exploding */
|
||||
.forms-table {
|
||||
table-layout: fixed;
|
||||
width: 100%;
|
||||
}
|
||||
|
||||
/* columns: Action | Method | Inputs | Matches | Form Snippet */
|
||||
.forms-table th:nth-child(1) { width: 15rem; } /* Action */
|
||||
.forms-table th:nth-child(2) { width: 5rem; } /* Method */
|
||||
.forms-table th:nth-child(3) { width: 15rem; } /* Inputs */
|
||||
.forms-table th:nth-child(5) { width: 24rem; } /* Snippet */
|
||||
.forms-table th:nth-child(4) { width: auto; } /* Matches grows */
|
||||
|
||||
/* ellipsize cells by default */
|
||||
.forms-table td,
|
||||
.forms-table th {
|
||||
overflow: hidden;
|
||||
text-overflow: ellipsis;
|
||||
white-space: nowrap;
|
||||
}
|
||||
|
||||
/* nicer wrapping inside snippet/details & input chips */
|
||||
.forms-table details { white-space: normal; }
|
||||
.forms-table details > pre.code {
|
||||
white-space: pre-wrap; /* let long lines wrap */
|
||||
max-height: 28rem;
|
||||
overflow: auto;
|
||||
}
|
||||
.forms-table .chips {
|
||||
display: flex;
|
||||
gap: 0.25rem;
|
||||
flex-wrap: wrap;
|
||||
white-space: normal; /* allow chip text to wrap if needed */
|
||||
}
|
||||
|
||||
/* (optional) responsive tweaks */
|
||||
@media (max-width: 1200px) {
|
||||
.forms-table th:nth-child(1) { width: 22rem; }
|
||||
.forms-table th:nth-child(3) { width: 16rem; }
|
||||
.forms-table th:nth-child(5) { width: 18rem; }
|
||||
}
|
||||
|
||||
|
||||
/* let URLs/snippets wrap *inside* their cell when expanded content shows */
|
||||
.breakable {
|
||||
white-space: normal;
|
||||
|
||||
@@ -31,3 +31,6 @@
|
||||
</footer>
|
||||
</body>
|
||||
</html>
|
||||
|
||||
{% block page_js %}
|
||||
{% endblock %}
|
||||
@@ -90,6 +90,9 @@
|
||||
100% { transform: rotate(360deg); }
|
||||
}
|
||||
</style>
|
||||
{% endblock %}
|
||||
|
||||
{% block page_js %}
|
||||
|
||||
<script>
|
||||
const form = document.getElementById('analyze-form');
|
||||
|
||||
@@ -79,21 +79,6 @@
|
||||
{% endfor %}
|
||||
{% endif %}
|
||||
|
||||
<!-- BEC Words -->
|
||||
{% if enrichment.bec_words %}
|
||||
<h3>BEC Words Detected</h3>
|
||||
<table class="enrichment-table">
|
||||
<thead>
|
||||
<tr><th>Word</th></tr>
|
||||
</thead>
|
||||
<tbody>
|
||||
{% for word in enrichment.bec_words %}
|
||||
<tr><td>{{ word }}</td></tr>
|
||||
{% endfor %}
|
||||
</tbody>
|
||||
</table>
|
||||
{% endif %}
|
||||
|
||||
{% if not enrichment.whois and not enrichment.raw_whois and not enrichment.geoip and not enrichment.bec_words %}
|
||||
<p>No enrichment data available.</p>
|
||||
{% endif %}
|
||||
@@ -129,90 +114,131 @@
|
||||
|
||||
<!-- Forms -->
|
||||
<div class="card" id="forms">
|
||||
<h2>Forms</h2>
|
||||
{% if forms %}
|
||||
{% for form in forms %}
|
||||
<details class="card {% if form.flagged %}flagged{% endif %}" style="padding:0.5rem; margin-bottom:0.5rem;">
|
||||
<summary>{{ form.status }} — Action: {{ form.action }} ({{ form.method | upper }})</summary>
|
||||
<table class="enrichment-table">
|
||||
<thead>
|
||||
<tr>
|
||||
<th>Input Name</th>
|
||||
<th>Type</th>
|
||||
</tr>
|
||||
</thead>
|
||||
<tbody>
|
||||
{% for inp in form.inputs %}
|
||||
<tr>
|
||||
<td>{{ inp.name }}</td>
|
||||
<td>{{ inp.type }}</td>
|
||||
</tr>
|
||||
<h2>Forms</h2>
|
||||
|
||||
{% if forms and forms|length > 0 %}
|
||||
<table class="enrichment-table forms-table">
|
||||
<thead>
|
||||
<tr>
|
||||
<th>Action</th>
|
||||
<th>Method</th>
|
||||
<th>Inputs</th>
|
||||
<th>Matches (Rules)</th>
|
||||
<th>Form Snippet</th>
|
||||
</tr>
|
||||
</thead>
|
||||
<tbody>
|
||||
{% for f in forms %}
|
||||
<tr>
|
||||
<!-- Action -->
|
||||
<td class="breakable">
|
||||
{% if f.action %}
|
||||
{{ f.action[:25] }}{% if f.action|length > 25 %}…{% endif %}
|
||||
{% else %}
|
||||
<span class="text-dim">(no action)</span>
|
||||
{% endif %}
|
||||
</td>
|
||||
|
||||
<!-- Method -->
|
||||
<td>{{ (f.method or 'get')|upper }}</td>
|
||||
|
||||
<!-- Inputs -->
|
||||
<td>
|
||||
{% if f.inputs and f.inputs|length > 0 %}
|
||||
<div class="chips">
|
||||
{% for inp in f.inputs %}
|
||||
<span class="chip" title="{{ (inp.name or '') ~ ' : ' ~ (inp.type or 'text') }}">
|
||||
{{ inp.name or '(unnamed)' }}<small> : {{ (inp.type or 'text') }}</small>
|
||||
</span>
|
||||
{% endfor %}
|
||||
</tbody>
|
||||
</table>
|
||||
{% if form.flagged %}
|
||||
<p><strong>Flag Reasons:</strong></p>
|
||||
<ul>
|
||||
{% for reason in form.flag_reasons %}
|
||||
<li>{{ reason }}</li>
|
||||
{% endfor %}
|
||||
</ul>
|
||||
{% endif %}
|
||||
</details>
|
||||
{% endfor %}
|
||||
{% else %}
|
||||
<p>No forms detected.</p>
|
||||
{% endif %}
|
||||
<p><a href="#top-jump-list">Back to top</a></p>
|
||||
</div>
|
||||
{% else %}
|
||||
<span class="text-dim">None</span>
|
||||
{% endif %}
|
||||
</td>
|
||||
|
||||
<!-- Matches (Rules) -->
|
||||
<td>
|
||||
{% if f.rules and f.rules|length > 0 %}
|
||||
<ul>
|
||||
{% for r in f.rules %}
|
||||
<li title="{{ r.description or '' }}">
|
||||
{{ r.name }}
|
||||
{% if r.severity %}
|
||||
<span class="badge sev-{{ r.severity|lower }}">{{ r.severity|title }}</span>
|
||||
{% endif %}
|
||||
{% if r.tags %}
|
||||
{% for t in r.tags %}
|
||||
<span class="chip" title="Tag: {{ t }}">{{ t }}</span>
|
||||
{% endfor %}
|
||||
{% endif %}
|
||||
{% if r.description %}
|
||||
<small> — {{ r.description }}</small>
|
||||
{% endif %}
|
||||
</li>
|
||||
{% endfor %}
|
||||
</ul>
|
||||
{% else %}
|
||||
<span class="text-dim">N/A</span>
|
||||
{% endif %}
|
||||
</td>
|
||||
|
||||
<!-- Form Snippet -->
|
||||
<td>
|
||||
{% if f.content_snippet %}
|
||||
<details>
|
||||
<summary>View snippet ({{ f.content_snippet|length }} chars)</summary>
|
||||
<pre class="code">{{ f.content_snippet }}</pre>
|
||||
</details>
|
||||
{% else %}
|
||||
<span class="text-dim">N/A</span>
|
||||
{% endif %}
|
||||
</td>
|
||||
</tr>
|
||||
{% endfor %}
|
||||
</tbody>
|
||||
</table>
|
||||
{% else %}
|
||||
<p class="text-dim">No form issues detected.</p>
|
||||
{% endif %}
|
||||
|
||||
<p><a href="#top-jump-list">Back to top</a></p>
|
||||
</div>
|
||||
|
||||
|
||||
<!-- Suspicious Scripts -->
|
||||
<div class="card" id="scripts">
|
||||
<h2>Suspicious Scripts</h2>
|
||||
|
||||
{% if suspicious_scripts %}
|
||||
<table class="enrichment-table scripts-table">
|
||||
<thead>
|
||||
<tr>
|
||||
<th>Type</th>
|
||||
<th>Source URL</th>
|
||||
<th>Content Snippet</th>
|
||||
<th>Matches (Rules & Heuristics)</th>
|
||||
</tr>
|
||||
</thead>
|
||||
<tbody>
|
||||
{% for s in suspicious_scripts %}
|
||||
<tr>
|
||||
<!-- Type -->
|
||||
<td>{{ s.type or 'unknown' }}</td>
|
||||
<thead>
|
||||
<tr>
|
||||
<th>Type</th>
|
||||
<th>Source URL</th>
|
||||
<th>Matches (Rules & Heuristics)</th>
|
||||
<th>Content Snippet</th>
|
||||
</tr>
|
||||
</thead>
|
||||
<tbody>
|
||||
{% for s in suspicious_scripts %}
|
||||
<tr>
|
||||
<!-- Type -->
|
||||
<td>{{ s.type or 'unknown' }}</td>
|
||||
|
||||
<!-- Source URL -->
|
||||
<td class="breakable">
|
||||
{% if s.src %}
|
||||
<a href="{{ s.src }}" target="_blank">{{ s.src[:50] }}</a>
|
||||
{% else %}
|
||||
N/A
|
||||
{% endif %}
|
||||
</td>
|
||||
<!-- Source URL -->
|
||||
<td class="breakable">
|
||||
{% if s.src %}
|
||||
<a href="{{ s.src }}" target="_blank" rel="noopener">{{ s.src[:50] }}</a>
|
||||
{% else %} N/A {% endif %}
|
||||
</td>
|
||||
|
||||
<!-- Inline content snippet (collapsible) -->
|
||||
<td>
|
||||
{% if s.content_snippet %}
|
||||
<details>
|
||||
<summary>View snippet ({{ s.content_snippet|length }} chars) </summary>
|
||||
<pre class="code">({{ s.content_snippet}}</pre>
|
||||
</details>
|
||||
{% else %}
|
||||
N/A
|
||||
{% endif %}
|
||||
</td>
|
||||
<!-- Matches (Rules & Heuristics) -->
|
||||
<td data-role="matches-cell">
|
||||
{% set has_rules = s.rules and s.rules|length > 0 %}
|
||||
{% set has_heur = s.heuristics and s.heuristics|length > 0 %}
|
||||
|
||||
<!-- Rules & Heuristics -->
|
||||
<td>
|
||||
{% set has_rules = s.rules and s.rules|length > 0 %}
|
||||
{% set has_heur = s.heuristics and s.heuristics|length > 0 %}
|
||||
|
||||
{% if has_rules %}
|
||||
{% if has_rules %}
|
||||
<strong>Rules</strong>
|
||||
<ul>
|
||||
{% for r in s.rules %}
|
||||
@@ -234,23 +260,45 @@
|
||||
</ul>
|
||||
{% endif %}
|
||||
|
||||
{% if has_heur %}
|
||||
<strong>Heuristics</strong>
|
||||
<ul>
|
||||
{% for h in s.heuristics %}
|
||||
<li>{{ h }}</li>
|
||||
{% endfor %}
|
||||
</ul>
|
||||
{% endif %}
|
||||
{% if has_heur %}
|
||||
<strong>Heuristics</strong>
|
||||
<ul>
|
||||
{% for h in s.heuristics %}
|
||||
<li>{{ h }}</li>
|
||||
{% endfor %}
|
||||
</ul>
|
||||
{% endif %}
|
||||
|
||||
{% if not has_rules and not has_heur %}
|
||||
N/A
|
||||
{% if not has_rules and not has_heur %}
|
||||
<span class="text-dim">N/A</span>
|
||||
{% endif %}
|
||||
</td>
|
||||
|
||||
<!-- Content Snippet (reused for Analyze button / dynamic snippet) -->
|
||||
<td data-role="snippet-cell">
|
||||
{% if s.content_snippet %}
|
||||
<details>
|
||||
<summary>View snippet ({{ s.content_snippet|length }} chars)</summary>
|
||||
<pre class="code">{{ s.content_snippet }}</pre>
|
||||
</details>
|
||||
{% else %}
|
||||
{% if s.type == 'external' and s.src %}
|
||||
<button
|
||||
type="button"
|
||||
class="btn btn-sm btn-primary btn-analyze-snippet"
|
||||
data-url="{{ s.src }}"
|
||||
data-job="{{ uuid }}">Analyze external script</button>
|
||||
{% else %}
|
||||
<span class="text-dim">N/A</span>
|
||||
{% endif %}
|
||||
</td>
|
||||
</tr>
|
||||
{% endfor %}
|
||||
{% endif %}
|
||||
</td>
|
||||
</tr>
|
||||
{% endfor %}
|
||||
</tbody>
|
||||
</table>
|
||||
|
||||
|
||||
{% else %}
|
||||
<p>No suspicious scripts detected.</p>
|
||||
{% endif %}
|
||||
@@ -269,8 +317,154 @@
|
||||
<!-- Source -->
|
||||
<div class="card" id="source">
|
||||
<h2>Source</h2>
|
||||
<p><a href="{{ url_for('main.artifacts', run_uuid=uuid, filename='source.txt') }}" target="_blank">View Source</a></p>
|
||||
<p><a href="{{ url_for('main.view_artifact', run_uuid=uuid, filename='source.html') }}" target="_blank">View Source</a></p>
|
||||
<p><a href="#top-jump-list">Back to top</a></p>
|
||||
</div>
|
||||
|
||||
{% endblock %}
|
||||
|
||||
{% block page_js %}
|
||||
<script>
|
||||
/**
|
||||
* From an absolute artifact path like:
|
||||
* /data/<uuid>/scripts/fetched/0.js
|
||||
* /data/<uuid>/1755803694244.js
|
||||
* C:\data\<uuid>\1755803694244.js
|
||||
* return { uuid, rel } where rel is the path segment(s) after the uuid.
|
||||
*/
|
||||
function parseArtifactPath(artifactPath) {
|
||||
if (!artifactPath) return { uuid: null, rel: null };
|
||||
const norm = String(artifactPath).replace(/\\/g, '/'); // windows -> posix
|
||||
const re = /\/([0-9a-fA-F]{8}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{12})\/(.+)$/;
|
||||
const m = norm.match(re);
|
||||
if (!m) return { uuid: null, rel: null };
|
||||
return { uuid: m[1], rel: m[2] };
|
||||
}
|
||||
|
||||
/** Build /view/artifact/<uuid>/<path:filename> */
|
||||
function buildViewerUrlFromAbsPath(artifactPath) {
|
||||
const { uuid, rel } = parseArtifactPath(artifactPath);
|
||||
if (!uuid || !rel) return '#';
|
||||
const encodedRel = rel.split('/').map(encodeURIComponent).join('/');
|
||||
return `/view/artifact/${encodeURIComponent(uuid)}/${encodedRel}`;
|
||||
}
|
||||
|
||||
document.addEventListener('click', function (e) {
|
||||
const btn = e.target.closest('.btn-analyze-snippet');
|
||||
if (!btn) return;
|
||||
|
||||
const row = btn.closest('tr');
|
||||
const snippetCell = btn.closest('[data-role="snippet-cell"]') || btn.parentElement;
|
||||
const matchesCell = row ? row.querySelector('[data-role="matches-cell"]') : null;
|
||||
|
||||
const url = btn.dataset.url;
|
||||
const job = btn.dataset.job;
|
||||
|
||||
// Replace button with a lightweight loading text
|
||||
const loading = document.createElement('span');
|
||||
loading.className = 'text-dim';
|
||||
loading.textContent = 'Analyzing…';
|
||||
btn.replaceWith(loading);
|
||||
|
||||
fetch('/api/analyze_script', {
|
||||
method: 'POST',
|
||||
headers: { 'Content-Type': 'application/json' }, // include CSRF header if applicable
|
||||
body: JSON.stringify({ job_id: job, url: url})
|
||||
})
|
||||
.then(r => r.json())
|
||||
.then(data => {
|
||||
if (!data.ok) {
|
||||
loading.textContent = 'Error: ' + (data.error || 'Unknown');
|
||||
return;
|
||||
}

    // --- Build the snippet details element ---
    const snippetText = data.snippet || '';   // backend should return a preview
    const snippetLen = data.snippet_len || snippetText.length;

    // --- File path / viewer URL ---
    const filepath = data.artifact_path || '';  // e.g., "/data/3ec90584-076e-457c-924b-861be7e11a34/1755803694244.js"
    const viewerUrl = buildViewerUrlFromAbsPath(filepath);

    const details = document.createElement('details');
    const summary = document.createElement('summary');
    summary.textContent = 'View snippet (' + snippetLen + ' chars' + (data.truncated ? ', truncated' : '') + ', ' + data.bytes + ' bytes)';

    const pre = document.createElement('pre');
    pre.className = 'code';
    pre.textContent = snippetText;  // textContent preserves literal code safely

    // Put things in the DOM
    details.appendChild(summary);
    details.appendChild(pre);

    const link = document.createElement('a');
    link.href = viewerUrl;
    link.target = '_blank';
    link.rel = 'noopener';
    link.textContent = 'open in viewer';

    summary.appendChild(document.createElement('br')); // line break under the summary text
    summary.appendChild(link);

    // Replace "Analyzing…" with the new details block
    loading.replaceWith(details);
|
||||
|
||||
// --- Update the Matches cell with rule findings ---
|
||||
if (matchesCell) {
|
||||
if (Array.isArray(data.findings) && data.findings.length) {
|
||||
const frag = document.createDocumentFragment();
|
||||
const strong = document.createElement('strong');
|
||||
strong.textContent = 'Rules';
|
||||
const ul = document.createElement('ul');
|
||||
|
||||
data.findings.forEach(function (f) {
|
||||
const li = document.createElement('li');
|
||||
const name = f.name || 'Rule';
|
||||
const desc = f.description ? ' — ' + f.description : '';
|
||||
li.textContent = name + desc;
|
||||
|
||||
// Optional badges for severity/tags if present
|
||||
if (f.severity) {
|
||||
const badge = document.createElement('span');
|
||||
badge.className = 'badge sev-' + String(f.severity).toLowerCase();
|
||||
badge.textContent = String(f.severity).charAt(0).toUpperCase() + String(f.severity).slice(1);
|
||||
li.appendChild(document.createTextNode(' '));
|
||||
li.appendChild(badge);
|
||||
}
|
||||
if (Array.isArray(f.tags)) {
|
||||
f.tags.forEach(function (t) {
|
||||
const chip = document.createElement('span');
|
||||
chip.className = 'chip';
|
||||
chip.title = 'Tag: ' + t;
|
||||
chip.textContent = t;
|
||||
li.appendChild(document.createTextNode(' '));
|
||||
li.appendChild(chip);
|
||||
});
|
||||
}
|
||||
|
||||
ul.appendChild(li);
|
||||
});
|
||||
|
||||
frag.appendChild(strong);
|
||||
frag.appendChild(ul);
|
||||
|
||||
// Replace placeholder N/A or existing heuristics-only content
|
||||
matchesCell.innerHTML = '';
|
||||
matchesCell.appendChild(frag);
|
||||
} else {
|
||||
matchesCell.innerHTML = '<span class="text-dim">No rule matches.</span>';
|
||||
}
|
||||
}
|
||||
})
|
||||
.catch(function (err) {
|
||||
loading.textContent = 'Request failed: ' + err;
|
||||
});
|
||||
});
|
||||
</script>
|
||||
|
||||
|
||||
{% endblock %}
|
||||
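The handler above calls `buildViewerUrlFromAbsPath`, whose body is not part of this hunk. As a rough illustration of the mapping it presumably performs, here is a minimal Python sketch that turns an absolute artifact path into the `GET /view/artifact/<run_uuid>/<path:filename>` route named in the commit message. The function name `build_viewer_url`, the UUID-segment heuristic, and the rejection of `.`/`..` segments are assumptions for illustration, not the project's actual helper.

```python
import re
from urllib.parse import quote

# Assumption: the run directory is the first path segment that looks like a UUID.
UUID_RE = re.compile(
    r"[0-9a-fA-F]{8}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{12}"
)


def build_viewer_url(abs_path: str) -> str:
    """Map '/data/<uuid>/a/b.js' to '/view/artifact/<uuid>/a/b.js'.

    Returns '' when no UUID segment is found, when nothing follows it,
    or when the remainder contains '.' or '..' segments.
    """
    parts = [p for p in abs_path.replace("\\", "/").split("/") if p]
    for i, part in enumerate(parts):
        if UUID_RE.fullmatch(part):
            rest = parts[i + 1:]
            if not rest or any(seg in (".", "..") for seg in rest):
                return ""
            # Percent-encode each segment so odd filenames stay valid in a URL.
            return "/view/artifact/" + part + "/" + "/".join(quote(s) for s in rest)
    return ""


if __name__ == "__main__":
    print(build_viewer_url(
        "/data/3ec90584-076e-457c-924b-861be7e11a34/1755803694244.js"
    ))
```

Returning an empty string for unexpected paths lets the caller decide whether to hide the "open in viewer" link rather than render a broken one.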
111
app/templates/viewer.html
Normal file
@@ -0,0 +1,111 @@
{% extends "base.html" %}
{% block content %}
<div style="max-width:1100px;margin:0 auto;padding:1rem 1.25rem;">
  <header style="display:flex;align-items:center;justify-content:space-between;gap:1rem;flex-wrap:wrap;">
    <div>
      <h2 style="margin:0;font-size:1.1rem;">Code Viewer</h2>
      <div class="text-dim" style="font-size:0.9rem;">
        <strong>File:</strong> <span id="fileName">{{ filename }}</span>
      </div>
    </div>
    <div style="display:flex;gap:.5rem;align-items:center;">
      <button id="copyBtn" class="btn btn-sm">Copy</button>
      <button id="wrapBtn" class="btn btn-sm">Toggle wrap</button>
      <a id="openRaw" class="btn btn-sm" href="{{ raw_url }}" target="_blank" rel="noopener">Open raw</a>
      <a id="downloadRaw" class="btn btn-sm" href="{{ raw_url }}" download>Download</a>
    </div>
  </header>

  <div id="viewerStatus" class="text-dim" style="margin:.5rem 0 .75rem;"></div>
  <div id="editor" style="height:72vh;border:1px solid #1f2a36;border-radius:8px;"></div>
</div>

<!-- Monaco AMD loader (no integrity to avoid mismatch) -->
<script src="https://cdnjs.cloudflare.com/ajax/libs/monaco-editor/0.49.0/min/vs/loader.min.js"
        crossorigin="anonymous" referrerpolicy="no-referrer"></script>

<script>
(function () {
  // tojson emits correctly escaped JS string literals (plain "{{ ... }}" would be HTML-escaped)
  const RAW_URL = {{ raw_url|tojson }};
  const FILENAME = {{ filename|tojson }};
  const LANGUAGE = {{ language|default('', true)|tojson }};

  const statusEl = document.getElementById('viewerStatus');

  function extToLang(name) {
    if (!name) return 'plaintext';
    const m = name.toLowerCase().match(/\.([a-z0-9]+)$/);
    const ext = m ? m[1] : '';
    const map = {js:'javascript',mjs:'javascript',cjs:'javascript',ts:'typescript',json:'json',
                 html:'html',htm:'html',css:'css',py:'python',sh:'shell',bash:'shell',
                 yml:'yaml',yaml:'yaml',md:'markdown',txt:'plaintext',log:'plaintext'};
    return map[ext] || 'plaintext';
  }

  // Wait until the AMD loader has defined window.require
  function waitForRequire(msLeft = 5000) {
    return new Promise((resolve, reject) => {
      const t0 = performance.now();
      (function poll() {
        if (window.require && typeof window.require === 'function') return resolve();
        if (performance.now() - t0 > msLeft) return reject(new Error('Monaco loader not available'));
        setTimeout(poll, 25);
      })();
    });
  }

  function configureMonaco() {
    // Point AMD loader at the CDN
    require.config({ paths: { 'vs': 'https://cdnjs.cloudflare.com/ajax/libs/monaco-editor/0.49.0/min/vs' } });
    // Worker bootstrap
    window.MonacoEnvironment = {
      getWorkerUrl: function () {
        const base = 'https://cdnjs.cloudflare.com/ajax/libs/monaco-editor/0.49.0/min/';
        const code = "self.MonacoEnvironment={baseUrl:'" + base + "'};importScripts('" + base + "vs/base/worker/workerMain.js');";
        return 'data:text/javascript;charset=utf-8,' + encodeURIComponent(code);
      }
    };
  }

  async function main() {
    try {
      statusEl.textContent = 'Loading file…';
      await waitForRequire();
      configureMonaco();

      const resp = await fetch(RAW_URL, { cache: 'no-store' });
      const text = await resp.text();

      require(['vs/editor/editor.main'], function () {
        const editor = monaco.editor.create(document.getElementById('editor'), {
          value: text,
          language: LANGUAGE || extToLang(FILENAME),
          readOnly: true,
          automaticLayout: true,
          wordWrap: 'on',
          minimap: { enabled: false },
          scrollBeyondLastLine: false,
          theme: 'vs-dark'
        });

        // Buttons
        document.getElementById('copyBtn')?.addEventListener('click', async () => {
          try { await navigator.clipboard.writeText(editor.getValue()); statusEl.textContent = 'Copied.'; }
          catch (e) { statusEl.textContent = 'Copy failed: ' + e; }
        });
        document.getElementById('wrapBtn')?.addEventListener('click', () => {
          const opts = editor.getRawOptions();
          editor.updateOptions({ wordWrap: opts.wordWrap === 'on' ? 'off' : 'on' });
        });

        statusEl.textContent = (resp.ok ? '' : `Warning: HTTP ${resp.status}`) + (text.length ? '' : ' (empty file)');
      });
    } catch (err) {
      statusEl.textContent = 'Viewer error: ' + err.message;
    }
  }

  main();
})();
</script>
{% endblock %}
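The template accepts an optional `language` value and only falls back to the client-side `extToLang` when it is empty. If the route wanted to supply that value itself, the same extension lookup could be done in Python. The sketch below mirrors the JavaScript map; the function name `ext_to_lang` and the idea of calling it from the view are assumptions, not something this commit shows.

```python
import os

# Mirrors the extension map in viewer.html; values are Monaco language ids.
_EXT_TO_LANG = {
    "js": "javascript", "mjs": "javascript", "cjs": "javascript",
    "ts": "typescript", "json": "json",
    "html": "html", "htm": "html", "css": "css",
    "py": "python", "sh": "shell", "bash": "shell",
    "yml": "yaml", "yaml": "yaml", "md": "markdown",
    "txt": "plaintext", "log": "plaintext",
}


def ext_to_lang(filename):
    """Return a Monaco language id for a filename, defaulting to 'plaintext'."""
    ext = os.path.splitext(filename or "")[1].lstrip(".").lower()
    return _EXT_TO_LANG.get(ext, "plaintext")


if __name__ == "__main__":
    for name in ("0.js", "README.MD", "notes", ""):
        print(repr(name), "->", ext_to_lang(name))
```

Keeping the lookup in one place on the server would avoid the two maps drifting apart, at the cost of the viewer no longer being self-sufficient when `language` is omitted.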
@@ -33,7 +33,7 @@ from flask import current_app
from playwright.async_api import async_playwright, TimeoutError as PWTimeoutError

from app.utils.io_helpers import safe_write
from app.enrichment import enrich_url
from app.utils.enrichment import enrich_url
from app.utils.settings import get_settings

# Load settings once for constants / defaults
@@ -202,85 +202,111 @@ class Browser:
    # -----------------------------------------------------------------------
    # Form & Script analysis (plumbing only; detection is in the rules engine)
    # -----------------------------------------------------------------------
    def analyze_forms(self, html: str, base_url: str) -> List[Dict[str, Any]]:
    def analyze_forms(self, html: str, base_url: str = "") -> List[Dict[str, Any]]:
        """
        Parse forms from the page HTML and apply rule-based checks (engine), keeping
        only simple plumbing heuristics here (no security logic).
        Collect form artifacts and evaluate per-form matches via the rules engine.
        Only include rows that matched at least one rule.

        Returns list of dicts with keys:
            - action, method, inputs
            - flagged (bool), flag_reasons (list[str]), status (str)
            - rule_checks: {'checks': [...], 'summary': {...}} (per-form snippet evaluation)
        Returns list of dicts with keys (per matched form):
            - type: "form"
            - action, method, inputs
            - content_snippet: str
            - rules: List[{name, description, severity?, tags?}]
        """
        soup = BeautifulSoup(html, "lxml")
        forms_info: List[Dict[str, Any]] = []
        page_hostname = urlparse(base_url).hostname
        results: List[Dict[str, Any]] = []

        engine = self._get_rule_engine()
        base_hostname = urlparse(base_url).hostname or ""
        # Match how scripts picks preview len
        try:
            preview_len = getattr(settings.ui, "snippet_preview_len", 200)  # keep parity with scripts
        except Exception:
            preview_len = 200

        for form in soup.find_all("form"):
            action = form.get("action")
            method = form.get("method", "get").lower()
            try:
                action = (form.get("action") or "").strip()
                method = (form.get("method") or "get").strip().lower()

            inputs: List[Dict[str, Any]] = []
            for inp in form.find_all("input"):
                input_name = inp.get("name")
                input_type = inp.get("type", "text")
                inputs.append({"name": input_name, "type": input_type})
                inputs: List[Dict[str, Any]] = []
                for inp in form.find_all("input"):
                    inputs.append({
                        "name": inp.get("name"),
                        "type": (inp.get("type") or "text").strip().lower(),
                    })

            flagged_reasons: List[str] = []
                # Use the actual form markup for regex rules
                form_markup = str(form)
                # UI-friendly snippet
                content_snippet = form_markup[:preview_len]

            if not action or str(action).strip() == "":
                flagged_reasons.append("No action specified")
            else:
                matches: List[Dict[str, Any]] = []
                if engine is not None:
                    for r in getattr(engine, "rules", []):
                        if getattr(r, "category", None) != "form":
                            continue
                        rtype = getattr(r, "rule_type", None)

                        try:
                            ok = False
                            reason = ""
                            if rtype == "regex":
                                # Run against the raw form HTML
                                ok, reason = r.run(form_markup)
                            elif rtype == "function":
                                # Structured facts for function-style rules
                                facts = {
                                    "category": "form",
                                    "base_url": base_url,
                                    "base_hostname": base_hostname,
                                    "action": action,
                                    "action_hostname": urlparse(action).hostname or "",
                                    "method": method,
                                    "inputs": inputs,
                                    "markup": form_markup,
                                }
                                ok, reason = r.run(facts)
                            else:
                                continue

                            if ok:
                                matches.append({
                                    "name": getattr(r, "name", "unknown_rule"),
                                    "description": (reason or "") or getattr(r, "description", ""),
                                    "severity": getattr(r, "severity", None),
                                    "tags": getattr(r, "tags", None),
                                })
                        except Exception as rule_exc:
                            # Be defensive—bad rule shouldn't break the form pass
                            try:
                                self.logger.debug("Form rule error", extra={"rule": getattr(r, "name", "?"), "error": str(rule_exc)})
                            except Exception:
                                pass
                            continue

                if matches:
                    results.append({
                        "type": "form",
                        "action": action,
                        "method": method,
                        "inputs": inputs,
                        "content_snippet": content_snippet,
                        "rules": matches,
                    })

            except Exception as exc:
                # Keep analysis resilient
                try:
                    action_host = urlparse(action).hostname
                    if not str(action).startswith("/") and action_host != page_hostname:
                        flagged_reasons.append("Submits to a different host")
                    self.logger.error("Form analysis error", extra={"error": str(exc)})
                except Exception:
                    pass
                results.append({
                    "type": "form",
                    "heuristics": [f"Form analysis error: {exc}"],
                })

            try:
                if urlparse(action).scheme == "http" and urlparse(base_url).scheme == "https":
                    flagged_reasons.append("Submits over insecure HTTP")
            except Exception:
                pass

            for hidden in form.find_all("input", type="hidden"):
                name_value = hidden.get("name") or ""
                if "password" in name_value.lower():
                    flagged_reasons.append("Hidden password field")

            flagged = bool(flagged_reasons)

            # Serialize a simple form snippet for rule category='form'
            snippet_lines = []
            snippet_lines.append(f"base_url={base_url}")
            snippet_lines.append(f"base_hostname={page_hostname}")
            snippet_lines.append(f"action={action}")
            snippet_lines.append(f"method={method}")
            snippet_lines.append("inputs=")

            i = 0
            n = len(inputs)
            while i < n:
                item = inputs[i]
                snippet_lines.append(f" - name={item.get('name')} type={item.get('type')}")
                i = i + 1
            form_snippet = "\n".join(snippet_lines)

            # Per-form rule checks (PASS/FAIL list via engine)
            rule_checks = self.run_rule_checks(form_snippet, category="form")

            forms_info.append({
                "action": action,
                "method": method,
                "inputs": inputs,
                "flagged": flagged,
                "flag_reasons": flagged_reasons,
                "status": "flagged" if flagged else "possibly safe",
                "rule_checks": rule_checks
            })

        return forms_info
        return results

    def analyze_scripts(self, html: str, base_url: str = "") -> List[Dict[str, Any]]:
        """
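The refactored `analyze_forms` hands function-style rules a `facts` dict and expects a `(bool, Optional[str])` pair back, the canonical return shape the commit message describes. The rule bodies are not included in this diff, so the following is only a sketch of what a rule such as `form_http_on_https_page` might look like under that contract; the logic and the reason text are assumptions.

```python
from typing import Any, Dict, Optional, Tuple
from urllib.parse import urlparse


def form_http_on_https_page(facts: Dict[str, Any]) -> Tuple[bool, Optional[str]]:
    """Illustrative function rule: (matched, reason) per the canonical contract.

    Reads only 'base_url' and 'action' from the facts dict that analyze_forms
    builds. The body is a guess at the rule's intent, not the project's code.
    """
    base_scheme = (urlparse(facts.get("base_url") or "").scheme or "").lower()
    action_scheme = (urlparse(facts.get("action") or "").scheme or "").lower()
    if base_scheme == "https" and action_scheme == "http":
        return True, "Form on an HTTPS page submits over insecure HTTP"
    return False, None


if __name__ == "__main__":
    facts = {
        "category": "form",
        "base_url": "https://example.com/login",
        "action": "http://collector.example.net/post",
    }
    ok, reason = form_http_on_https_page(facts)
    print(ok, reason)
```

Returning `None` as the reason on a non-match lets the caller fall back to the rule's static description, which is what the `(reason or "") or getattr(r, "description", "")` expression in the hunk relies on.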
@@ -370,7 +396,7 @@ class Browser:

        Writes:
            - /data/<uuid>/screenshot.png
            - /data/<uuid>/source.txt
            - /data/<uuid>/source.html
            - /data/<uuid>/results.json (single source of truth for routes)

        Returns:
@@ -381,7 +407,7 @@ class Browser:
        run_dir.mkdir(parents=True, exist_ok=True)

        screenshot_path = run_dir / "screenshot.png"
        source_path = run_dir / "source.txt"
        source_path = run_dir / "source.html"
        results_path = run_dir / "results.json"

        redirects: List[Dict[str, Any]] = []

@@ -9,8 +9,8 @@ from ipaddress import ip_address
import socket

# Local imports
from .utils.cache_db import get_cache
from .utils.settings import get_settings
from app.utils.cache_db import get_cache
from app.utils.settings import get_settings

# Configure logging
logging.basicConfig(level=logging.INFO, format="[%(levelname)s] %(message)s")
@@ -39,9 +39,6 @@ def enrich_url(url: str) -> dict:
    # --- GeoIP ---
    result["geoip"] = enrich_geoip(hostname)

    # --- BEC Words ---
    result["bec_words"] = [w for w in BEC_WORDS if w.lower() in url.lower()]

    return result


338
app/utils/external_fetcher.py
Normal file
@@ -0,0 +1,338 @@
# sneakyscope/app/utils/external_fetcher.py
import hashlib
import os
import logging
from dataclasses import dataclass
from typing import Optional, Tuple, List
from urllib.parse import urljoin, urlparse

import requests

from app.utils.settings import get_settings

settings = get_settings()

_ALLOWED_SCHEMES = {"http", "https"}


@dataclass
class FetchResult:
    """
    Outcome for a single external script fetch.
    """
    ok: bool
    reason: str
    source_url: str
    final_url: str
    status_code: Optional[int]
    content_type: Optional[str]
    bytes_fetched: int
    truncated: bool
    sha256_hex: Optional[str]
    saved_path: Optional[str]


class ExternalScriptFetcher:
    """
    Minimal, safe-by-default fetcher for external JS files.

    Notes / assumptions:
    - All artifacts for this run live under the UUID-backed `results_path` you pass in.
    - Saves bytes to: <results_path>/<index>.js
    - Manual redirects up to `max_redirects`.
    - Streaming with a hard byte cap derived from `max_total_mb`.
    - Never raises network exceptions to callers; failures are encoded in FetchResult.
    - Settings are read from get_settings().external_fetch (defaults come from External_FetchConfig).
    """

    def __init__(self, results_path: str, session: Optional[requests.Session] = None):
        """
        Args:
            results_path: Absolute path to the run's UUID directory (e.g., /data/<run_uuid>).
            session: Optional requests.Session to reuse connections; a new one is created if not provided.
        """
        # Derived value: MiB -> bytes
        self.max_total_bytes: int = settings.external_fetch.max_total_mb * 1024 * 1024

        # Logger (named after the module, not the file path)
        self.logger = logging.getLogger(__name__)

        # Where to write artifacts for this job/run (UUID directory)
        self.results_path = results_path

        # HTTP session with a predictable UA
        self.session = session or requests.Session()
        self.session.headers.update({"User-Agent": "SneakyScope/1.0"})

    # -------------------------
    # Internal helper methods
    # -------------------------

    def _timeout(self) -> Tuple[float, float]:
        """
        Compute (connect_timeout, read_timeout) in seconds from max_time_ms.
        Keeps a conservative split so either phase gets a fair chance.
        """
        total = max(0.1, settings.external_fetch.max_time_ms / 1000.0)
        connect = min(1.5, total * 0.5)  # cap connect timeout
        read = max(0.5, total * 0.5)  # floor read timeout
        return (connect, read)

    def _scheme_allowed(self, url: str) -> bool:
        """
        Return True if URL uses an allowed scheme (http/https).
        """
        scheme = (urlparse(url).scheme or "").lower()
        return scheme in _ALLOWED_SCHEMES

    def _artifact_path(self, index: int) -> str:
        """
        Build an output path like:
            <results_path>/<index>.js

        Ensures the directory exists.
        """
        base_dir = os.path.join(self.results_path)
        # Make sure parent directories exist (idempotent)
        os.makedirs(base_dir, exist_ok=True)
        filename = f"{index}.js"
        return os.path.join(base_dir, filename)

    # -------------------------
    # Public API
    # -------------------------

    def fetch_one(self, script_url: str, index: int) -> FetchResult:
        """
        Fetch exactly one external script with manual redirect handling and a hard per-file byte cap.

        Args:
            script_url: The script URL to retrieve.
            index: Numeric index used solely for naming the artifact file (<index>.js).

        Returns:
            FetchResult with status, metadata, and saved path (if successful).
        """
        # Feature gate: allow callers to rely on a consistent failure when globally disabled.
        if not settings.external_fetch.enabled:
            return FetchResult(
                ok=False,
                reason="Feature disabled",
                source_url=script_url,
                final_url=script_url,
                status_code=None,
                content_type=None,
                bytes_fetched=0,
                truncated=False,
                sha256_hex=None,
                saved_path=None,
            )

        # Scheme guard: refuse anything not http/https in this v1.
        if not self._scheme_allowed(script_url):
            return FetchResult(
                ok=False,
                reason="Scheme not allowed",
                source_url=script_url,
                final_url=script_url,
                status_code=None,
                content_type=None,
                bytes_fetched=0,
                truncated=False,
                sha256_hex=None,
                saved_path=None,
            )

        current_url = script_url
        status_code: Optional[int] = None
        content_type: Optional[str] = None
        redirects_followed = 0

        # Manual redirect loop so we can enforce max_redirects precisely.
        while True:
            try:
                resp = self.session.get(
                    current_url,
                    stream=True,
                    allow_redirects=False,
                    timeout=self._timeout(),
                )
            except requests.exceptions.Timeout:
                return FetchResult(
                    ok=False,
                    reason="Timeout",
                    source_url=script_url,
                    final_url=current_url,
                    status_code=status_code,
                    content_type=content_type,
                    bytes_fetched=0,
                    truncated=False,
                    sha256_hex=None,
                    saved_path=None,
                )
            except requests.exceptions.RequestException as e:
                return FetchResult(
                    ok=False,
                    reason=f"Network error: {e.__class__.__name__}",
                    source_url=script_url,
                    final_url=current_url,
                    status_code=status_code,
                    content_type=content_type,
                    bytes_fetched=0,
                    truncated=False,
                    sha256_hex=None,
                    saved_path=None,
                )

            status_code = resp.status_code
            content_type = resp.headers.get("Content-Type")

            # Handle redirects explicitly (3xx with Location)
            if status_code in (301, 302, 303, 307, 308) and "Location" in resp.headers:
                if redirects_followed >= settings.external_fetch.max_redirects:
                    return FetchResult(
                        ok=False,
                        reason="Max redirects exceeded",
                        source_url=script_url,
                        final_url=current_url,
                        status_code=status_code,
                        content_type=content_type,
                        bytes_fetched=0,
                        truncated=False,
                        sha256_hex=None,
                        saved_path=None,
                    )
                next_url = urljoin(current_url, resp.headers["Location"])
                if not self._scheme_allowed(next_url):
                    return FetchResult(
                        ok=False,
                        reason="Redirect to disallowed scheme",
                        source_url=script_url,
                        final_url=next_url,
                        status_code=status_code,
                        content_type=content_type,
                        bytes_fetched=0,
                        truncated=False,
                        sha256_hex=None,
                        saved_path=None,
                    )
                current_url = next_url
                redirects_followed += 1
                # Loop to follow next hop
                continue

            # Not a redirect: stream response body with a hard byte cap.
            cap = self.max_total_bytes
            total = 0
            truncated = False
            chunks: List[bytes] = []

            try:
                for chunk in resp.iter_content(chunk_size=8192):
                    if not chunk:
                        # Skip keep-alive chunks
                        continue
                    new_total = total + len(chunk)
                    if new_total > cap:
                        # Only keep what fits and stop
                        remaining = cap - total
                        if remaining > 0:
                            chunks.append(chunk[:remaining])
                            total += remaining
                        truncated = True
                        break
                    chunks.append(chunk)
                    total = new_total
            except requests.exceptions.Timeout:
                return FetchResult(
                    ok=False,
                    reason="Timeout while reading",
                    source_url=script_url,
                    final_url=current_url,
                    status_code=status_code,
                    content_type=content_type,
                    bytes_fetched=total,
                    truncated=truncated,
                    sha256_hex=None,
                    saved_path=None,
                )
            except requests.exceptions.RequestException as e:
                return FetchResult(
                    ok=False,
                    reason=f"Network error while reading: {e.__class__.__name__}",
                    source_url=script_url,
                    final_url=current_url,
                    status_code=status_code,
                    content_type=content_type,
                    bytes_fetched=total,
                    truncated=truncated,
                    sha256_hex=None,
                    saved_path=None,
                )

            data = b"".join(chunks)
            if not data:
                return FetchResult(
                    ok=False,
                    reason="Empty response",
                    source_url=script_url,
                    final_url=current_url,
                    status_code=status_code,
                    content_type=content_type,
                    bytes_fetched=0,
                    truncated=False,
                    sha256_hex=None,
                    saved_path=None,
                )

            # Persist to <results_path>/<index>.js
            out_path = self._artifact_path(index)
            try:
                with open(out_path, "wb") as f:
                    f.write(data)
            except OSError as e:
                return FetchResult(
                    ok=False,
                    reason=f"Write error: {e.__class__.__name__}",
                    source_url=script_url,
                    final_url=current_url,
                    status_code=status_code,
                    content_type=content_type,
                    bytes_fetched=total,
                    truncated=truncated,
                    sha256_hex=None,
                    saved_path=None,
                )

            sha256_hex = hashlib.sha256(data).hexdigest()

            # Structured log line for visibility/metrics
            try:
                self.logger.info(
                    "External script fetched",
                    extra={
                        "source_url": script_url,
                        "final_url": current_url,
                        "status": status_code,
                        "bytes": total,
                        "truncated": truncated,
                        "sha256": sha256_hex,
                        "saved_path": out_path,
                    },
                )
            except Exception:
                # Logging should never break the pipeline
                pass

            return FetchResult(
                ok=True,
                reason="OK",
                source_url=script_url,
                final_url=current_url,
                status_code=status_code,
                content_type=content_type,
                bytes_fetched=total,
                truncated=truncated,
                sha256_hex=sha256_hex,
                saved_path=out_path,
            )
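The byte-cap logic inside `fetch_one` is easier to reason about (and to unit test) when separated from the network call. The sketch below lifts the same loop into a standalone helper that works on any iterable of byte chunks; the name `read_capped` is hypothetical and the fetcher itself does not define it.

```python
from typing import Iterable, List, Tuple


def read_capped(chunks: Iterable[bytes], cap: int) -> Tuple[bytes, bool]:
    """Accumulate chunks up to `cap` bytes; return (data, truncated).

    Same rules as the loop in fetch_one: empty chunks are skipped, the chunk
    that crosses the cap is sliced to fit, and reading stops there.
    """
    total = 0
    truncated = False
    kept: List[bytes] = []
    for chunk in chunks:
        if not chunk:
            continue  # keep-alive / empty chunk
        new_total = total + len(chunk)
        if new_total > cap:
            remaining = cap - total
            if remaining > 0:
                kept.append(chunk[:remaining])
                total += remaining
            truncated = True
            break
        kept.append(chunk)
        total = new_total
    return b"".join(kept), truncated


if __name__ == "__main__":
    print(read_capped([b"abcd", b"efgh"], 6))
    print(read_capped([b"abc", b"def"], 6))
```

One property worth noting: a body whose length equals the cap exactly is not reported as truncated, because truncation is only detected when a further chunk would exceed the limit.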
@@ -39,6 +39,14 @@ BASE_DIR = Path(__file__).resolve().parent.parent
DEFAULT_SETTINGS_FILE = BASE_DIR / "config" / "settings.yaml"

# ---------- CONFIG DATA CLASSES ----------
@dataclass
class External_FetchConfig:
    enabled: bool = True
    max_total_mb: int = 5
    max_time_ms: int = 3000
    max_redirects: int = 3
    concurrency: int = 3

@dataclass
class UIConfig:
    snippet_preview_len: int = 160
@@ -61,6 +69,7 @@ class AppConfig:
class Settings:
    cache: Cache_Config = field(default_factory=Cache_Config)
    ui: UIConfig = field(default_factory=UIConfig)
    external_fetch: External_FetchConfig = field(default_factory=External_FetchConfig)
    app: AppConfig = field(default_factory=AppConfig)

    @classmethod

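To make the relationship between these settings and the fetcher's derived limits concrete, the following self-contained sketch reproduces the defaults above and the two calculations the fetcher performs on them (MiB to bytes, and the connect/read timeout split). The class names `ExternalFetchConfig` and `DemoSettings` and the helper names are stand-ins for illustration; only the field names, defaults, and arithmetic come from the diff.

```python
from dataclasses import dataclass, field
from typing import Tuple


@dataclass
class ExternalFetchConfig:  # stand-in for External_FetchConfig
    enabled: bool = True
    max_total_mb: int = 5
    max_time_ms: int = 3000
    max_redirects: int = 3
    concurrency: int = 3


@dataclass
class DemoSettings:  # stand-in for the real Settings container
    # default_factory gives every settings object its own config instance
    external_fetch: ExternalFetchConfig = field(default_factory=ExternalFetchConfig)


def byte_cap(cfg: ExternalFetchConfig) -> int:
    """MiB -> bytes, as in ExternalScriptFetcher.__init__."""
    return cfg.max_total_mb * 1024 * 1024


def split_timeout(cfg: ExternalFetchConfig) -> Tuple[float, float]:
    """(connect, read) seconds, same arithmetic as ExternalScriptFetcher._timeout."""
    total = max(0.1, cfg.max_time_ms / 1000.0)
    return (min(1.5, total * 0.5), max(0.5, total * 0.5))


if __name__ == "__main__":
    cfg = DemoSettings().external_fetch
    print(byte_cap(cfg), split_timeout(cfg))
```

Because the connect timeout is capped at 1.5 s and the read timeout is floored at 0.5 s, the two values only sum to `max_time_ms` for budgets between 1 and 3 seconds; larger budgets go mostly to the read phase.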
@@ -1,23 +1,18 @@
# SneakyScope — Roadmap (Updated 8-21-25)

## Priority 1 – Core Analysis / Stability

* Opt-in “fetch external scripts” mode (off by default): on submission, download external script content (size/time limits) and run rules on fetched content.
* Remove remaining legacy form “flagged\_reasons” plumbing once all equivalent function rules are in place.
* Unit tests: YAML compilation, function-rule adapters, and per-script/per-form rule cases.
* SSL/TLS intelligence: for HTTPS targets, pull certificate details from crt.sh (filtering expired); if a subdomain, also resolve the root domain to capture any wildcard certificates; probe the endpoint to enumerate supported TLS versions/ciphers and flag weak/legacy protocols.

## Priority 2 – API Layer

* API endpoints: `/screenshot`, `/source`, `/analyse`.
* OpenAPI spec: create `openapi/openapi.yaml` and serve at `/api/openapi.yaml`.
* **OpenAPI**: add `POST /api/analyze_script` (request/response schemas, examples) to `openapi/openapi.yaml`; serve at `/api/openapi.yaml`.
* Docs UI: Swagger UI or Redoc at `/docs`.
* (Nice-to-have) API JSON error consistency: handlers for 400/403/404/405/500 that always return JSON.

## Priority 3 – UI / UX

* Front page/input handling: auto-prepend `http://`/`https://`/`www.` for bare domains.
* Source code viewer: embed page source in an editor view for readability.
* Scripts table: toggle between “Only suspicious” and “All scripts”.
* Rules Lab (WYSIWYG tester): paste a rule, validate/compile, run against sample text; lightweight nav entry.

## Priority 4 – Artifact Management & Ops
@@ -33,6 +28,6 @@
* Domain reputation (local feeds): build and refresh a consolidated domain/URL reputation store from URLHaus database dump and OpenPhish community dataset (scheduled pulls with dedup/normalize).
* Threat intel connectors (settings-driven): add `settings.yaml` entries for VirusTotal and ThreatFox API keys (plus future providers); when present, enrich lookups and merge results into the unified reputation checks during analysis.

## Backlog / Far‑Off Plans
## Backlog / Far-Off Plans

* Server profile scan: run a lightweight nmap service/banner scan on common web/alt ports (80, 443, 8000, 8080, 8443, etc.) and SSH; combine with server headers to infer stack (e.g., IIS vs. Linux/\*nix).
