feat: on-demand external script analysis + code viewer; refactor form analysis to rule engine

- API: add `POST /api/analyze_script` (app/blueprints/api.py)
  - Fetch one external script to artifacts, run rules, return findings + snippet
  - Uses new ExternalScriptFetcher (results_path aware) and job UUID
  - Returns: { ok, final_url, status_code, bytes, truncated, sha256, artifact_path, findings[], snippet, snippet_len }
  - TODO: document in openapi/openapi.yaml
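
A minimal sketch of the request/response contract described above, for anyone scripting against the endpoint before it lands in OpenAPI. The `job_id` and `url` body fields follow the fetch call in results.html; the helper names `build_request_body` and `missing_keys` are hypothetical and not part of the codebase.

```python
import json
from typing import Any, Dict, List

# Response keys listed in this commit message for a successful call.
EXPECTED_KEYS = (
    "ok", "final_url", "status_code", "bytes", "truncated",
    "sha256", "artifact_path", "findings", "snippet", "snippet_len",
)


def build_request_body(job_id: str, url: str) -> str:
    """Serialize the JSON body for POST /api/analyze_script."""
    return json.dumps({"job_id": job_id, "url": url})


def missing_keys(payload: Dict[str, Any]) -> List[str]:
    """Return the expected keys that are absent from a decoded response."""
    return [key for key in EXPECTED_KEYS if key not in payload]
```

A caller could check `missing_keys(response_json)` before reading `findings`; the UI handler treats `ok == false` as an error and reads an `error` field in that case.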

- Fetcher: update `app/utils/external_fetch.py`
  - Constructed with `results_path` (UUID dir); writes to `<results_path>/scripts/fetched/<index>.js`
  - Loads settings via `get_settings()`, logs via std logging
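
The artifact layout above can be sketched as follows. This is an illustration of the path convention only, not the fetcher's actual code; `fetched_script_path` is a hypothetical helper name.

```python
from pathlib import Path


def fetched_script_path(results_path: Path, index: int) -> Path:
    """Build <results_path>/scripts/fetched/<index>.js for a fetched script."""
    if index < 0:
        raise ValueError("index must be non-negative")
    return Path(results_path) / "scripts" / "fetched" / f"{index}.js"
```

Keeping the path relative to the per-run UUID directory is what lets the viewer route address the file as `<run_uuid>/<path:filename>`.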

- UI (results.html):
  - Move “Analyze external script” action into **Content Snippet** column for external rows
  - Clicking replaces button with `<details>` snippet, shows rule matches, and adds “open in viewer” link
  - Robust fetch handler (checks JSON, shows errors); builds viewer URL from absolute artifact path

- Viewer:
  - New route: `GET /view/artifact/<run_uuid>/<path:filename>` (app/blueprints/ui.py)
  - New template: Monaco-based read-only code viewer (viewer.html)
  - Removes SRI on loader to avoid integrity block; loads file via `raw_url` and detects language by extension

- Forms:
  - Refactor `analyze_forms` to mirror scripts analysis:
    - Uses rule engine (`category == "form"`) across regex/function rules
    - Emits rows only when matches exist
    - Includes `content_snippet`, `action`, `method`, `inputs`, `rules`
  - Replace legacy plumbing (`flagged`, `flag_reasons`, `status`) in output with the fields above
  - Normalize form function rules to canonical returns `(bool, Optional[str])`:
    - `form_action_missing`
    - `form_http_on_https_page`
    - `form_submits_to_different_host`
    - Add minor hardening (lowercase hosts, handle no-op actions, clearer reasons)
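
To illustrate the canonical `(bool, Optional[str])` return shape named in the Forms notes above, here is a hedged sketch of a different-host check. The `facts` dict with `action` and `base_url` keys is an assumption about the rule input, and the set of no-op actions is illustrative; the real function in the repo may differ.

```python
from typing import Optional, Tuple
from urllib.parse import urljoin, urlparse

# Assumed examples of actions that do not submit anywhere meaningful.
NOOP_ACTIONS = {"", "#"}


def form_submits_to_different_host(facts: dict) -> Tuple[bool, Optional[str]]:
    """Return (matched, reason); reason is None when the rule does not match."""
    action = (facts.get("action") or "").strip()
    base_url = facts.get("base_url") or ""
    if action in NOOP_ACTIONS or action.lower().startswith("javascript:"):
        return False, None
    # urlparse(...).hostname is already lowercased, so hosts compare case-insensitively.
    page_host = urlparse(base_url).hostname or ""
    target_host = urlparse(urljoin(base_url, action)).hostname or ""
    if not page_host or not target_host:
        return False, None
    if page_host != target_host:
        return True, f"Form submits to a different host ({target_host} vs {page_host})"
    return False, None
```

Returning a reason string only on a match keeps the rule engine's handling uniform across regex and function rules.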

- CSS: add `.forms-table` to mirror `.scripts-table` (5 columns)
  - Fixed table layout, widths per column, chip/snippet styling, responsive tweaks

- Misc:
  - Fix “working outside app context” issue by avoiding `current_app` at import time (left storage logic inside routes)
  - Add “View Source” link to open page source in viewer

Refs:
- Roadmap: mark “Source code viewer” done; keep TODO to add `/api/analyze_script` to OpenAPI
2025-08-21 15:32:24 -05:00
parent 05cf23ad67
commit 3a24b392f2
15 changed files with 1192 additions and 218 deletions

@@ -79,21 +79,6 @@
{% endfor %}
{% endif %}
<!-- BEC Words -->
{% if enrichment.bec_words %}
<h3>BEC Words Detected</h3>
<table class="enrichment-table">
<thead>
<tr><th>Word</th></tr>
</thead>
<tbody>
{% for word in enrichment.bec_words %}
<tr><td>{{ word }}</td></tr>
{% endfor %}
</tbody>
</table>
{% endif %}
{% if not enrichment.whois and not enrichment.raw_whois and not enrichment.geoip and not enrichment.bec_words %}
<p>No enrichment data available.</p>
{% endif %}
@@ -129,90 +114,131 @@
<!-- Forms -->
<div class="card" id="forms">
<h2>Forms</h2>
{% if forms %}
{% for form in forms %}
<details class="card {% if form.flagged %}flagged{% endif %}" style="padding:0.5rem; margin-bottom:0.5rem;">
<summary>{{ form.status }} — Action: {{ form.action }} ({{ form.method | upper }})</summary>
<table class="enrichment-table">
<thead>
<tr>
<th>Input Name</th>
<th>Type</th>
</tr>
</thead>
<tbody>
{% for inp in form.inputs %}
<tr>
<td>{{ inp.name }}</td>
<td>{{ inp.type }}</td>
</tr>
<h2>Forms</h2>
{% if forms and forms|length > 0 %}
<table class="enrichment-table forms-table">
<thead>
<tr>
<th>Action</th>
<th>Method</th>
<th>Inputs</th>
<th>Matches (Rules)</th>
<th>Form Snippet</th>
</tr>
</thead>
<tbody>
{% for f in forms %}
<tr>
<!-- Action -->
<td class="breakable">
{% if f.action %}
{{ f.action[:25] }}{% if f.action|length > 25 %}…{% endif %}
{% else %}
<span class="text-dim">(no action)</span>
{% endif %}
</td>
<!-- Method -->
<td>{{ (f.method or 'get')|upper }}</td>
<!-- Inputs -->
<td>
{% if f.inputs and f.inputs|length > 0 %}
<div class="chips">
{% for inp in f.inputs %}
<span class="chip" title="{{ (inp.name or '') ~ ' : ' ~ (inp.type or 'text') }}">
{{ inp.name or '(unnamed)' }}<small> : {{ (inp.type or 'text') }}</small>
</span>
{% endfor %}
</tbody>
</table>
{% if form.flagged %}
<p><strong>Flag Reasons:</strong></p>
<ul>
{% for reason in form.flag_reasons %}
<li>{{ reason }}</li>
{% endfor %}
</ul>
{% endif %}
</details>
{% endfor %}
{% else %}
<p>No forms detected.</p>
{% endif %}
<p><a href="#top-jump-list">Back to top</a></p>
</div>
{% else %}
<span class="text-dim">None</span>
{% endif %}
</td>
<!-- Matches (Rules) -->
<td>
{% if f.rules and f.rules|length > 0 %}
<ul>
{% for r in f.rules %}
<li title="{{ r.description or '' }}">
{{ r.name }}
{% if r.severity %}
<span class="badge sev-{{ r.severity|lower }}">{{ r.severity|title }}</span>
{% endif %}
{% if r.tags %}
{% for t in r.tags %}
<span class="chip" title="Tag: {{ t }}">{{ t }}</span>
{% endfor %}
{% endif %}
{% if r.description %}
<small> — {{ r.description }}</small>
{% endif %}
</li>
{% endfor %}
</ul>
{% else %}
<span class="text-dim">N/A</span>
{% endif %}
</td>
<!-- Form Snippet -->
<td>
{% if f.content_snippet %}
<details>
<summary>View snippet ({{ f.content_snippet|length }} chars)</summary>
<pre class="code">{{ f.content_snippet }}</pre>
</details>
{% else %}
<span class="text-dim">N/A</span>
{% endif %}
</td>
</tr>
{% endfor %}
</tbody>
</table>
{% else %}
<p class="text-dim">No form issues detected.</p>
{% endif %}
<p><a href="#top-jump-list">Back to top</a></p>
</div>
<!-- Suspicious Scripts -->
<div class="card" id="scripts">
<h2>Suspicious Scripts</h2>
{% if suspicious_scripts %}
<table class="enrichment-table scripts-table">
<thead>
<tr>
<th>Type</th>
<th>Source URL</th>
<th>Content Snippet</th>
<th>Matches (Rules & Heuristics)</th>
</tr>
</thead>
<tbody>
{% for s in suspicious_scripts %}
<tr>
<!-- Type -->
<td>{{ s.type or 'unknown' }}</td>
<thead>
<tr>
<th>Type</th>
<th>Source URL</th>
<th>Matches (Rules & Heuristics)</th>
<th>Content Snippet</th>
</tr>
</thead>
<tbody>
{% for s in suspicious_scripts %}
<tr>
<!-- Type -->
<td>{{ s.type or 'unknown' }}</td>
<!-- Source URL -->
<td class="breakable">
{% if s.src %}
<a href="{{ s.src }}" target="_blank">{{ s.src[:50] }}</a>
{% else %}
N/A
{% endif %}
</td>
<!-- Source URL -->
<td class="breakable">
{% if s.src %}
<a href="{{ s.src }}" target="_blank" rel="noopener">{{ s.src[:50] }}</a>
{% else %} N/A {% endif %}
</td>
<!-- Inline content snippet (collapsible) -->
<td>
{% if s.content_snippet %}
<details>
<summary>View snippet ({{ s.content_snippet|length }} chars) </summary>
<pre class="code">({{ s.content_snippet}}</pre>
</details>
{% else %}
N/A
{% endif %}
</td>
<!-- Matches (Rules & Heuristics) -->
<td data-role="matches-cell">
{% set has_rules = s.rules and s.rules|length > 0 %}
{% set has_heur = s.heuristics and s.heuristics|length > 0 %}
<!-- Rules & Heuristics -->
<td>
{% set has_rules = s.rules and s.rules|length > 0 %}
{% set has_heur = s.heuristics and s.heuristics|length > 0 %}
{% if has_rules %}
{% if has_rules %}
<strong>Rules</strong>
<ul>
{% for r in s.rules %}
@@ -234,23 +260,45 @@
</ul>
{% endif %}
{% if has_heur %}
<strong>Heuristics</strong>
<ul>
{% for h in s.heuristics %}
<li>{{ h }}</li>
{% endfor %}
</ul>
{% endif %}
{% if has_heur %}
<strong>Heuristics</strong>
<ul>
{% for h in s.heuristics %}
<li>{{ h }}</li>
{% endfor %}
</ul>
{% endif %}
{% if not has_rules and not has_heur %}
N/A
{% if not has_rules and not has_heur %}
<span class="text-dim">N/A</span>
{% endif %}
</td>
<!-- Content Snippet (reused for Analyze button / dynamic snippet) -->
<td data-role="snippet-cell">
{% if s.content_snippet %}
<details>
<summary>View snippet ({{ s.content_snippet|length }} chars)</summary>
<pre class="code">{{ s.content_snippet }}</pre>
</details>
{% else %}
{% if s.type == 'external' and s.src %}
<button
type="button"
class="btn btn-sm btn-primary btn-analyze-snippet"
data-url="{{ s.src }}"
data-job="{{ uuid }}">Analyze external script</button>
{% else %}
<span class="text-dim">N/A</span>
{% endif %}
</td>
</tr>
{% endfor %}
{% endif %}
</td>
</tr>
{% endfor %}
</tbody>
</table>
{% else %}
<p>No suspicious scripts detected.</p>
{% endif %}
@@ -269,8 +317,154 @@
<!-- Source -->
<div class="card" id="source">
<h2>Source</h2>
<p><a href="{{ url_for('main.artifacts', run_uuid=uuid, filename='source.txt') }}" target="_blank">View Source</a></p>
<p><a href="{{ url_for('main.view_artifact', run_uuid=uuid, filename='source.html') }}" target="_blank">View Source</a></p>
<p><a href="#top-jump-list">Back to top</a></p>
</div>
{% endblock %}
{% block page_js %}
<script>
/**
* From an absolute artifact path like:
* /data/<uuid>/scripts/fetched/0.js
* /data/<uuid>/1755803694244.js
* C:\data\<uuid>\1755803694244.js
* return { uuid, rel } where rel is the path segment(s) after the uuid.
*/
function parseArtifactPath(artifactPath) {
if (!artifactPath) return { uuid: null, rel: null };
const norm = String(artifactPath).replace(/\\/g, '/'); // windows -> posix
const re = /\/([0-9a-fA-F]{8}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{12})\/(.+)$/;
const m = norm.match(re);
if (!m) return { uuid: null, rel: null };
return { uuid: m[1], rel: m[2] };
}
/** Build /view/artifact/<uuid>/<path:filename> */
function buildViewerUrlFromAbsPath(artifactPath) {
const { uuid, rel } = parseArtifactPath(artifactPath);
if (!uuid || !rel) return '#';
const encodedRel = rel.split('/').map(encodeURIComponent).join('/');
return `/view/artifact/${encodeURIComponent(uuid)}/${encodedRel}`;
}
document.addEventListener('click', function (e) {
const btn = e.target.closest('.btn-analyze-snippet');
if (!btn) return;
const row = btn.closest('tr');
const snippetCell = btn.closest('[data-role="snippet-cell"]') || btn.parentElement;
const matchesCell = row ? row.querySelector('[data-role="matches-cell"]') : null;
const url = btn.dataset.url;
const job = btn.dataset.job;
// Replace button with a lightweight loading text
const loading = document.createElement('span');
loading.className = 'text-dim';
loading.textContent = 'Analyzing…';
btn.replaceWith(loading);
fetch('/api/analyze_script', {
method: 'POST',
headers: { 'Content-Type': 'application/json' }, // include CSRF header if applicable
body: JSON.stringify({ job_id: job, url: url})
})
.then(r => r.json())
.then(data => {
if (!data.ok) {
loading.textContent = 'Error: ' + (data.error || 'Unknown');
return;
}
// --- Build the snippet details element ---
const snippetText = data.snippet || ''; // backend should return a preview
const snippetLen = data.snippet_len || snippetText.length;
// --- File path / viewer things
const filepath = data.artifact_path || ''; // e.g., "/data/3ec90584-076e-457c-924b-861be7e11a34/1755803694244.js"
const viewerUrl = buildViewerUrlFromAbsPath(filepath);
const details = document.createElement('details');
const summary = document.createElement('summary');
summary.textContent = 'View snippet (' + snippetLen + ' chars' + (data.truncated ? ', truncated' : '') + ', ' + data.bytes + ' bytes)';
const pre = document.createElement('pre');
pre.className = 'code';
pre.textContent = snippetText; // textContent preserves literal code safely
// put things in the DOM
details.appendChild(summary);
details.appendChild(pre);
const link = document.createElement('a');
link.href = viewerUrl;
link.target = '_blank';
link.rel = 'noopener';
link.textContent = 'open in viewer';
summary.appendChild(document.createElement('br')); // line break under the summary text
summary.appendChild(link);
// Replace "Analyzing…" with the new details block
loading.replaceWith(details);
// --- Update the Matches cell with rule findings ---
if (matchesCell) {
if (Array.isArray(data.findings) && data.findings.length) {
const frag = document.createDocumentFragment();
const strong = document.createElement('strong');
strong.textContent = 'Rules';
const ul = document.createElement('ul');
data.findings.forEach(function (f) {
const li = document.createElement('li');
const name = f.name || 'Rule';
const desc = f.description ? ' — ' + f.description : '';
li.textContent = name + desc;
// Optional badges for severity/tags if present
if (f.severity) {
const badge = document.createElement('span');
badge.className = 'badge sev-' + String(f.severity).toLowerCase();
badge.textContent = String(f.severity).charAt(0).toUpperCase() + String(f.severity).slice(1);
li.appendChild(document.createTextNode(' '));
li.appendChild(badge);
}
if (Array.isArray(f.tags)) {
f.tags.forEach(function (t) {
const chip = document.createElement('span');
chip.className = 'chip';
chip.title = 'Tag: ' + t;
chip.textContent = t;
li.appendChild(document.createTextNode(' '));
li.appendChild(chip);
});
}
ul.appendChild(li);
});
frag.appendChild(strong);
frag.appendChild(ul);
// Replace placeholder N/A or existing heuristics-only content
matchesCell.innerHTML = '';
matchesCell.appendChild(frag);
} else {
matchesCell.innerHTML = '<span class="text-dim">No rule matches.</span>';
}
}
})
.catch(function (err) {
loading.textContent = 'Request failed: ' + err;
});
});
</script>
{% endblock %}