feat: on-demand external script analysis + code viewer; refactor form analysis to rule engine

- API: add `POST /api/analyze_script` (app/blueprints/api.py)
  - Fetch one external script to artifacts, run rules, return findings + snippet
  - Uses new ExternalScriptFetcher (results_path aware) and job UUID
  - Returns: { ok, final_url, status_code, bytes, truncated, sha256, artifact_path, findings[], snippet, snippet_len }
  - TODO: document in openapi/openapi.yaml
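
A minimal sketch of how the response payload listed above could be assembled, assuming the fetched body is capped at a byte limit and the snippet is a character-limited preview. `build_analyze_response`, `max_bytes`, and `snippet_chars` are hypothetical names used for illustration; they are not taken from app/blueprints/api.py:

```python
import hashlib
from typing import Any, Dict, List


def build_analyze_response(
    body: bytes,
    *,
    final_url: str,
    status_code: int,
    artifact_path: str,
    findings: List[Dict[str, Any]],
    max_bytes: int = 1_000_000,   # assumed cap, not from the commit
    snippet_chars: int = 1200,    # assumed preview size, not from the commit
) -> Dict[str, Any]:
    """Build a dict with the keys named in the commit message."""
    truncated = len(body) > max_bytes
    kept = body[:max_bytes]
    # Decode leniently so binary or mis-encoded scripts still yield a preview.
    snippet = kept.decode("utf-8", errors="replace")[:snippet_chars]
    return {
        "ok": True,
        "final_url": final_url,
        "status_code": status_code,
        "bytes": len(kept),
        "truncated": truncated,
        "sha256": hashlib.sha256(kept).hexdigest(),
        "artifact_path": artifact_path,
        "findings": findings,
        "snippet": snippet,
        "snippet_len": len(snippet),
    }
```

Returning `snippet_len` alongside `snippet` lets the UI label the preview without recomputing its length.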

- Fetcher: update `app/utils/external_fetch.py`
  - Constructed with `results_path` (UUID dir); writes to `<results_path>/scripts/fetched/<index>.js`
  - Loads settings via `get_settings()`, logs via std logging
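
The artifact layout above can be sketched with `pathlib`. This is an illustration only: `fetched_script_path` and `save_fetched_script` are hypothetical helpers, not the actual methods of `ExternalScriptFetcher`:

```python
from pathlib import Path
from typing import Union


def fetched_script_path(results_path: Union[str, Path], index: int) -> Path:
    """Return <results_path>/scripts/fetched/<index>.js."""
    if index < 0:
        raise ValueError("index must be non-negative")
    return Path(results_path) / "scripts" / "fetched" / f"{index}.js"


def save_fetched_script(results_path: Union[str, Path], index: int, body: bytes) -> Path:
    """Write the fetched bytes under the run's UUID directory and return the path."""
    target = fetched_script_path(results_path, index)
    # Create scripts/fetched/ on first use; a no-op if it already exists.
    target.parent.mkdir(parents=True, exist_ok=True)
    target.write_bytes(body)
    return target
```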

- UI (results.html):
  - Move “Analyze external script” action into **Content Snippet** column for external rows
  - Clicking replaces button with `<details>` snippet, shows rule matches, and adds “open in viewer” link
  - Robust fetch handler (checks JSON, shows errors); builds viewer URL from absolute artifact path

- Viewer:
  - New route: `GET /view/artifact/<run_uuid>/<path:filename>` (app/blueprints/ui.py)
  - New template: Monaco-based read-only code viewer (viewer.html)
  - Drops the SRI `integrity` attribute on the Monaco loader to avoid an integrity-mismatch block; loads the file via `raw_url` and detects the language by extension
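
The template also accepts an optional `language` value from the server. A sketch of a server-side counterpart to the extension lookup, assuming the same mapping as the viewer's client-side `extToLang`; `guess_language` is a hypothetical name and is not part of this commit:

```python
from pathlib import PurePosixPath

# Same extension table as the client-side lookup in viewer.html.
EXT_TO_LANG = {
    "js": "javascript", "mjs": "javascript", "cjs": "javascript",
    "ts": "typescript", "json": "json",
    "html": "html", "htm": "html", "css": "css",
    "py": "python", "sh": "shell", "bash": "shell",
    "yml": "yaml", "yaml": "yaml", "md": "markdown",
    "txt": "plaintext", "log": "plaintext",
}


def guess_language(filename: str) -> str:
    """Map a file name to a Monaco language id, defaulting to plaintext."""
    if not filename:
        return "plaintext"
    ext = PurePosixPath(filename.lower()).suffix.lstrip(".")
    return EXT_TO_LANG.get(ext, "plaintext")
```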

- Forms:
  - Refactor `analyze_forms` to mirror scripts analysis:
    - Uses rule engine (`category == "form"`) across regex/function rules
    - Emits rows only when matches exist
    - Includes `content_snippet`, `action`, `method`, `inputs`, `rules`
  - Drop legacy fields (`flagged`, `flag_reasons`, `status`) from the output
  - Normalize form function rules to the canonical return type `(bool, Optional[str])`:
    - `form_action_missing`
    - `form_http_on_https_page`
    - `form_submits_to_different_host`
    - Add minor hardening (lowercase hosts, handle no-op actions, clearer reasons)
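
A rough sketch of what function rules with the `(bool, Optional[str])` return shape might look like. The signatures (`action`, `page_url`), the `NOOP_ACTIONS` set, and the reason strings are assumptions made for illustration; the real rules may receive different inputs:

```python
from typing import Optional, Tuple
from urllib.parse import urljoin, urlparse

RuleResult = Tuple[bool, Optional[str]]

# Assumed set of "no-op" actions that should never trigger host/scheme rules.
NOOP_ACTIONS = {"", "#"}


def _is_noop(action: Optional[str]) -> bool:
    return action is None or action.strip() in NOOP_ACTIONS


def form_action_missing(action: Optional[str]) -> RuleResult:
    if action is None or not action.strip():
        return True, "Form has no action attribute"
    return False, None


def form_http_on_https_page(action: Optional[str], page_url: str) -> RuleResult:
    if _is_noop(action) or urlparse(page_url).scheme.lower() != "https":
        return False, None
    # Resolve relative actions against the page URL before checking the scheme.
    target = urlparse(urljoin(page_url, action.strip()))
    if target.scheme.lower() == "http":
        return True, f"Form on HTTPS page submits over HTTP to {target.hostname}"
    return False, None


def form_submits_to_different_host(action: Optional[str], page_url: str) -> RuleResult:
    if _is_noop(action):
        return False, None
    # urlparse(...).hostname is already lowercased; None for non-network URLs.
    page_host = urlparse(page_url).hostname or ""
    target_host = urlparse(urljoin(page_url, action.strip())).hostname or ""
    if page_host and target_host and page_host != target_host:
        return True, f"Form submits to {target_host}, page is on {page_host}"
    return False, None
```

Returning `None` as the reason on a non-match keeps the caller's check to a single truthiness test on the first element.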

- CSS: add `.forms-table` to mirror `.scripts-table` (5 columns)
  - Fixed table layout, widths per column, chip/snippet styling, responsive tweaks

- Misc:
  - Fix “working outside app context” issue by avoiding `current_app` at import time (left storage logic inside routes)
  - Add “View Source” link to open page source in viewer

Refs:
- Roadmap: mark “Source code viewer” done; keep TODO to add `/api/analyze_script` to OpenAPI
2025-08-21 15:32:24 -05:00
parent 05cf23ad67
commit 3a24b392f2
15 changed files with 1192 additions and 218 deletions


@@ -30,4 +30,7 @@
<small>{{ app_name }} - A self-hosted URL analysis sandbox - {{ app_version }}</small>
</footer>
</body>
</html>
</html>
{% block page_js %}
{% endblock %}


@@ -90,6 +90,9 @@
100% { transform: rotate(360deg); }
}
</style>
{% endblock %}
{% block page_js %}
<script>
const form = document.getElementById('analyze-form');


@@ -79,21 +79,6 @@
{% endfor %}
{% endif %}
<!-- BEC Words -->
{% if enrichment.bec_words %}
<h3>BEC Words Detected</h3>
<table class="enrichment-table">
<thead>
<tr><th>Word</th></tr>
</thead>
<tbody>
{% for word in enrichment.bec_words %}
<tr><td>{{ word }}</td></tr>
{% endfor %}
</tbody>
</table>
{% endif %}
{% if not enrichment.whois and not enrichment.raw_whois and not enrichment.geoip and not enrichment.bec_words %}
<p>No enrichment data available.</p>
{% endif %}
@@ -129,90 +114,131 @@
<!-- Forms -->
<div class="card" id="forms">
<h2>Forms</h2>
{% if forms %}
{% for form in forms %}
<details class="card {% if form.flagged %}flagged{% endif %}" style="padding:0.5rem; margin-bottom:0.5rem;">
<summary>{{ form.status }} — Action: {{ form.action }} ({{ form.method | upper }})</summary>
<table class="enrichment-table">
<thead>
<tr>
<th>Input Name</th>
<th>Type</th>
</tr>
</thead>
<tbody>
{% for inp in form.inputs %}
<tr>
<td>{{ inp.name }}</td>
<td>{{ inp.type }}</td>
</tr>
<h2>Forms</h2>
{% if forms and forms|length > 0 %}
<table class="enrichment-table forms-table">
<thead>
<tr>
<th>Action</th>
<th>Method</th>
<th>Inputs</th>
<th>Matches (Rules)</th>
<th>Form Snippet</th>
</tr>
</thead>
<tbody>
{% for f in forms %}
<tr>
<!-- Action -->
<td class="breakable">
{% if f.action %}
{{ f.action[:25] }}{% if f.action|length > 25 %}…{% endif %}
{% else %}
<span class="text-dim">(no action)</span>
{% endif %}
</td>
<!-- Method -->
<td>{{ (f.method or 'get')|upper }}</td>
<!-- Inputs -->
<td>
{% if f.inputs and f.inputs|length > 0 %}
<div class="chips">
{% for inp in f.inputs %}
<span class="chip" title="{{ (inp.name or '') ~ ' : ' ~ (inp.type or 'text') }}">
{{ inp.name or '(unnamed)' }}<small> : {{ (inp.type or 'text') }}</small>
</span>
{% endfor %}
</tbody>
</table>
{% if form.flagged %}
<p><strong>Flag Reasons:</strong></p>
<ul>
{% for reason in form.flag_reasons %}
<li>{{ reason }}</li>
{% endfor %}
</ul>
{% endif %}
</details>
{% endfor %}
{% else %}
<p>No forms detected.</p>
{% endif %}
<p><a href="#top-jump-list">Back to top</a></p>
</div>
{% else %}
<span class="text-dim">None</span>
{% endif %}
</td>
<!-- Matches (Rules) -->
<td>
{% if f.rules and f.rules|length > 0 %}
<ul>
{% for r in f.rules %}
<li title="{{ r.description or '' }}">
{{ r.name }}
{% if r.severity %}
<span class="badge sev-{{ r.severity|lower }}">{{ r.severity|title }}</span>
{% endif %}
{% if r.tags %}
{% for t in r.tags %}
<span class="chip" title="Tag: {{ t }}">{{ t }}</span>
{% endfor %}
{% endif %}
{% if r.description %}
<small> — {{ r.description }}</small>
{% endif %}
</li>
{% endfor %}
</ul>
{% else %}
<span class="text-dim">N/A</span>
{% endif %}
</td>
<!-- Form Snippet -->
<td>
{% if f.content_snippet %}
<details>
<summary>View snippet ({{ f.content_snippet|length }} chars)</summary>
<pre class="code">{{ f.content_snippet }}</pre>
</details>
{% else %}
<span class="text-dim">N/A</span>
{% endif %}
</td>
</tr>
{% endfor %}
</tbody>
</table>
{% else %}
<p class="text-dim">No form issues detected.</p>
{% endif %}
<p><a href="#top-jump-list">Back to top</a></p>
</div>
<!-- Suspicious Scripts -->
<div class="card" id="scripts">
<h2>Suspicious Scripts</h2>
{% if suspicious_scripts %}
<table class="enrichment-table scripts-table">
<thead>
<tr>
<th>Type</th>
<th>Source URL</th>
<th>Content Snippet</th>
<th>Matches (Rules & Heuristics)</th>
</tr>
</thead>
<tbody>
{% for s in suspicious_scripts %}
<tr>
<!-- Type -->
<td>{{ s.type or 'unknown' }}</td>
<thead>
<tr>
<th>Type</th>
<th>Source URL</th>
<th>Matches (Rules & Heuristics)</th>
<th>Content Snippet</th>
</tr>
</thead>
<tbody>
{% for s in suspicious_scripts %}
<tr>
<!-- Type -->
<td>{{ s.type or 'unknown' }}</td>
<!-- Source URL -->
<td class="breakable">
{% if s.src %}
<a href="{{ s.src }}" target="_blank">{{ s.src[:50] }}</a>
{% else %}
N/A
{% endif %}
</td>
<!-- Source URL -->
<td class="breakable">
{% if s.src %}
<a href="{{ s.src }}" target="_blank" rel="noopener">{{ s.src[:50] }}</a>
{% else %} N/A {% endif %}
</td>
<!-- Inline content snippet (collapsible) -->
<td>
{% if s.content_snippet %}
<details>
<summary>View snippet ({{ s.content_snippet|length }} chars) </summary>
<pre class="code">({{ s.content_snippet}}</pre>
</details>
{% else %}
N/A
{% endif %}
</td>
<!-- Matches (Rules & Heuristics) -->
<td data-role="matches-cell">
{% set has_rules = s.rules and s.rules|length > 0 %}
{% set has_heur = s.heuristics and s.heuristics|length > 0 %}
<!-- Rules & Heuristics -->
<td>
{% set has_rules = s.rules and s.rules|length > 0 %}
{% set has_heur = s.heuristics and s.heuristics|length > 0 %}
{% if has_rules %}
{% if has_rules %}
<strong>Rules</strong>
<ul>
{% for r in s.rules %}
@@ -234,23 +260,45 @@
</ul>
{% endif %}
{% if has_heur %}
<strong>Heuristics</strong>
<ul>
{% for h in s.heuristics %}
<li>{{ h }}</li>
{% endfor %}
</ul>
{% endif %}
{% if has_heur %}
<strong>Heuristics</strong>
<ul>
{% for h in s.heuristics %}
<li>{{ h }}</li>
{% endfor %}
</ul>
{% endif %}
{% if not has_rules and not has_heur %}
N/A
{% if not has_rules and not has_heur %}
<span class="text-dim">N/A</span>
{% endif %}
</td>
<!-- Content Snippet (reused for Analyze button / dynamic snippet) -->
<td data-role="snippet-cell">
{% if s.content_snippet %}
<details>
<summary>View snippet ({{ s.content_snippet|length }} chars)</summary>
<pre class="code">{{ s.content_snippet }}</pre>
</details>
{% else %}
{% if s.type == 'external' and s.src %}
<button
type="button"
class="btn btn-sm btn-primary btn-analyze-snippet"
data-url="{{ s.src }}"
data-job="{{ uuid }}">Analyze external script</button>
{% else %}
<span class="text-dim">N/A</span>
{% endif %}
</td>
</tr>
{% endfor %}
{% endif %}
</td>
</tr>
{% endfor %}
</tbody>
</table>
{% else %}
<p>No suspicious scripts detected.</p>
{% endif %}
@@ -269,8 +317,154 @@
<!-- Source -->
<div class="card" id="source">
<h2>Source</h2>
<p><a href="{{ url_for('main.artifacts', run_uuid=uuid, filename='source.txt') }}" target="_blank">View Source</a></p>
<p><a href="{{ url_for('main.view_artifact', run_uuid=uuid, filename='source.html') }}" target="_blank">View Source</a></p>
<p><a href="#top-jump-list">Back to top</a></p>
</div>
{% endblock %}
{% block page_js %}
<script>
/**
* From an absolute artifact path like:
* /data/<uuid>/scripts/fetched/0.js
* /data/<uuid>/1755803694244.js
* C:\data\<uuid>\1755803694244.js
* return { uuid, rel } where rel is the path segment(s) after the uuid.
*/
function parseArtifactPath(artifactPath) {
if (!artifactPath) return { uuid: null, rel: null };
const norm = String(artifactPath).replace(/\\/g, '/'); // windows -> posix
const re = /\/([0-9a-fA-F]{8}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{12})\/(.+)$/;
const m = norm.match(re);
if (!m) return { uuid: null, rel: null };
return { uuid: m[1], rel: m[2] };
}
/** Build /view/artifact/<uuid>/<path:filename> */
function buildViewerUrlFromAbsPath(artifactPath) {
const { uuid, rel } = parseArtifactPath(artifactPath);
if (!uuid || !rel) return '#';
const encodedRel = rel.split('/').map(encodeURIComponent).join('/');
return `/view/artifact/${encodeURIComponent(uuid)}/${encodedRel}`;
}
document.addEventListener('click', function (e) {
const btn = e.target.closest('.btn-analyze-snippet');
if (!btn) return;
const row = btn.closest('tr');
const snippetCell = btn.closest('[data-role="snippet-cell"]') || btn.parentElement;
const matchesCell = row ? row.querySelector('[data-role="matches-cell"]') : null;
const url = btn.dataset.url;
const job = btn.dataset.job;
// Replace button with a lightweight loading text
const loading = document.createElement('span');
loading.className = 'text-dim';
loading.textContent = 'Analyzing…';
btn.replaceWith(loading);
fetch('/api/analyze_script', {
method: 'POST',
headers: { 'Content-Type': 'application/json' }, // include CSRF header if applicable
body: JSON.stringify({ job_id: job, url: url})
})
.then(r => r.json())
.then(data => {
if (!data.ok) {
loading.textContent = 'Error: ' + (data.error || 'Unknown');
return;
}
// --- Build the snippet details element ---
const snippetText = data.snippet || ''; // backend should return a preview
const snippetLen = data.snippet_len || snippetText.length;
// --- File path / viewer things
const filepath = data.artifact_path || ''; // e.g., "/data/3ec90584-076e-457c-924b-861be7e11a34/1755803694244.js"
const viewerUrl = buildViewerUrlFromAbsPath(filepath);
const details = document.createElement('details');
const summary = document.createElement('summary');
summary.textContent = 'View snippet (' + snippetLen + ' chars' + (data.truncated ? ', truncated' : '') + ', ' + data.bytes + ' bytes)';
const pre = document.createElement('pre');
pre.className = 'code';
pre.textContent = snippetText; // textContent preserves literal code safely
// put things in the DOM
details.appendChild(summary);
details.appendChild(pre);
const link = document.createElement('a');
link.href = viewerUrl;
link.target = '_blank';
link.rel = 'noopener';
link.textContent = 'open in viewer';
summary.appendChild(document.createElement('br')); // line break under the summary text
summary.appendChild(link);
// Replace "Analyzing…" with the new details block
loading.replaceWith(details);
// --- Update the Matches cell with rule findings ---
if (matchesCell) {
if (Array.isArray(data.findings) && data.findings.length) {
const frag = document.createDocumentFragment();
const strong = document.createElement('strong');
strong.textContent = 'Rules';
const ul = document.createElement('ul');
data.findings.forEach(function (f) {
const li = document.createElement('li');
const name = f.name || 'Rule';
const desc = f.description ? ' — ' + f.description : '';
li.textContent = name + desc;
// Optional badges for severity/tags if present
if (f.severity) {
const badge = document.createElement('span');
badge.className = 'badge sev-' + String(f.severity).toLowerCase();
badge.textContent = String(f.severity).charAt(0).toUpperCase() + String(f.severity).slice(1);
li.appendChild(document.createTextNode(' '));
li.appendChild(badge);
}
if (Array.isArray(f.tags)) {
f.tags.forEach(function (t) {
const chip = document.createElement('span');
chip.className = 'chip';
chip.title = 'Tag: ' + t;
chip.textContent = t;
li.appendChild(document.createTextNode(' '));
li.appendChild(chip);
});
}
ul.appendChild(li);
});
frag.appendChild(strong);
frag.appendChild(ul);
// Replace placeholder N/A or existing heuristics-only content
matchesCell.innerHTML = '';
matchesCell.appendChild(frag);
} else {
matchesCell.innerHTML = '<span class="text-dim">No rule matches.</span>';
}
}
})
.catch(function (err) {
loading.textContent = 'Request failed: ' + err;
});
});
</script>
{% endblock %}

app/templates/viewer.html (new file, 111 lines)

@@ -0,0 +1,111 @@
{% extends "base.html" %}
{% block content %}
<div style="max-width:1100px;margin:0 auto;padding:1rem 1.25rem;">
<header style="display:flex;align-items:center;justify-content:space-between;gap:1rem;flex-wrap:wrap;">
<div>
<h2 style="margin:0;font-size:1.1rem;">Code Viewer</h2>
<div class="text-dim" style="font-size:0.9rem;">
<strong>File:</strong> <span id="fileName">{{ filename }}</span>
</div>
</div>
<div style="display:flex;gap:.5rem;align-items:center;">
<button id="copyBtn" class="btn btn-sm">Copy</button>
<button id="wrapBtn" class="btn btn-sm">Toggle wrap</button>
<a id="openRaw" class="btn btn-sm" href="{{ raw_url }}" target="_blank" rel="noopener">Open raw</a>
<a id="downloadRaw" class="btn btn-sm" href="{{ raw_url }}" download>Download</a>
</div>
</header>
<div id="viewerStatus" class="text-dim" style="margin:.5rem 0 .75rem;"></div>
<div id="editor" style="height:72vh;border:1px solid #1f2a36;border-radius:8px;"></div>
</div>
<!-- Monaco AMD loader (no integrity to avoid mismatch) -->
<script src="https://cdnjs.cloudflare.com/ajax/libs/monaco-editor/0.49.0/min/vs/loader.min.js"
crossorigin="anonymous" referrerpolicy="no-referrer"></script>
<script>
(function () {
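// tojson emits valid JS string literals; plain {{ ... }} would HTML-escape characters such as & inside the script
const RAW_URL = {{ raw_url|tojson }};
const FILENAME = {{ filename|tojson }};
const LANGUAGE = {{ language|default('', true)|tojson }};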
const statusEl = document.getElementById('viewerStatus');
function extToLang(name) {
if (!name) return 'plaintext';
const m = name.toLowerCase().match(/\.([a-z0-9]+)$/);
const ext = m ? m[1] : '';
const map = {js:'javascript',mjs:'javascript',cjs:'javascript',ts:'typescript',json:'json',
html:'html',htm:'html',css:'css',py:'python',sh:'shell',bash:'shell',
yml:'yaml',yaml:'yaml',md:'markdown',txt:'plaintext',log:'plaintext'};
return map[ext] || 'plaintext';
}
// Wait until the AMD loader has defined window.require
function waitForRequire(msLeft = 5000) {
return new Promise((resolve, reject) => {
const t0 = performance.now();
(function poll() {
if (window.require && typeof window.require === 'function') return resolve();
if (performance.now() - t0 > msLeft) return reject(new Error('Monaco loader not available'));
setTimeout(poll, 25);
})();
});
}
function configureMonaco() {
// Point AMD loader at the CDN
require.config({ paths: { 'vs': 'https://cdnjs.cloudflare.com/ajax/libs/monaco-editor/0.49.0/min/vs' } });
// Worker bootstrap
window.MonacoEnvironment = {
getWorkerUrl: function () {
const base = 'https://cdnjs.cloudflare.com/ajax/libs/monaco-editor/0.49.0/min/';
const code = "self.MonacoEnvironment={baseUrl:'" + base + "'};importScripts('" + base + "vs/base/worker/workerMain.js');";
return 'data:text/javascript;charset=utf-8,' + encodeURIComponent(code);
}
};
}
async function main() {
try {
statusEl.textContent = 'Loading file…';
await waitForRequire();
configureMonaco();
const resp = await fetch(RAW_URL, { cache: 'no-store' });
const text = await resp.text();
require(['vs/editor/editor.main'], function () {
const editor = monaco.editor.create(document.getElementById('editor'), {
value: text,
language: LANGUAGE || extToLang(FILENAME),
readOnly: true,
automaticLayout: true,
wordWrap: 'on',
minimap: { enabled: false },
scrollBeyondLastLine: false,
theme: 'vs-dark'
});
// Buttons
document.getElementById('copyBtn')?.addEventListener('click', async () => {
try { await navigator.clipboard.writeText(editor.getValue()); statusEl.textContent = 'Copied.'; }
catch (e) { statusEl.textContent = 'Copy failed: ' + e; }
});
document.getElementById('wrapBtn')?.addEventListener('click', () => {
const opts = editor.getRawOptions();
editor.updateOptions({ wordWrap: opts.wordWrap === 'on' ? 'off' : 'on' });
});
statusEl.textContent = (resp.ok ? '' : `Warning: HTTP ${resp.status}`) + (text.length ? '' : ' (empty file)');
});
} catch (err) {
statusEl.textContent = 'Viewer error: ' + err.message;
}
}
main();
})();
</script>
{% endblock %}