SneakyScope

Author	SHA1	Message	Date
Phillip Tarrant	55cd81aec0	feat(text): add text analysis pipeline & surface results in UI - engine: add analyse_text() to extract visible page text and evaluate category="text" rules; collect matched phrases and expose as `content_snippet` (deduped, length-capped via settings.ui.snippet_preview_len). - engine: removed unused code - browser: removed double call for enrichment - engine: improve regex compilation — honor per-rule flags (string or list) and default IGNORECASE when category=="text". - engine: add dispatch logging "[engine] applying categories: …" gated by settings.app.print_rule_dispatch. - ui(templates): add `templates/partials/result_text.html` mirroring the forms table; renders page-level records and their matched rules. - ui(controller): wire `analyse_text()` into scan path and expose `payload["suspicious_text"]`. - rules(text): add `identity_verification_prompt`, `gated_document_access`, `email_collection_prompt`; broaden `credential_reset`. fix: text indicators were not displayed due to missing analyzer and mismatched result shape. Result shape: suspicious_text: [ { "type": "page", "content_snippet": "...matched phrases…", "rules": [ {"name": "...", "description": "...", "severity": "medium", "tags": ["..."]} ] } ]	2025-08-22 17:18:50 -05:00
Phillip Tarrant	693f7d67b9	feat: HTTPS auto-normalization; robust TLS intel UI; global rules state; clean logging; preload - Add SSL/TLS intelligence pipeline: - crt.sh lookup with expired-filtering and root-domain wildcard resolution - live TLS version/cipher probe with weak/legacy flags and probe notes - UI: card + matrix rendering, raw JSON toggle, and host/wildcard cert lists - Front page: checkbox to optionally fetch certificate/CT data - Introduce `URLNormalizer` with punycode support and typo repair - Auto-prepend `https://` for bare domains (e.g., `google.com`) - Optional quick HTTPS reachability + `http://` fallback - Provide singleton via function-cached `@singleton_loader`: - `get_url_normalizer()` reads defaults from Settings (if present) - Standardize function-rule return shape to `(bool, dict|None)` across `form_` and `script_` rules; include structured payloads (`note`, hosts, ext, etc.) - Harden `FunctionRuleAdapter`: - Coerce legacy returns `(bool)`, `(bool, str)` → normalized outputs - Adapt non-dict inputs to facts (category-aware and via provided adapter) - Return `(True, dict)` on match, `(False, None)` on miss - Bind-time logging with file:line + function id for diagnostics - `RuleEngine`: - Back rules by private `self._rules`; `rules` property returns copy - Idempotent `add_rule(replace=False)` with in-place replace and regex (re)compile - Fix AttributeError from property assignment during `__init__` - Replace hidden singleton factory with explicit builder + global state: - `app/rules/factory.py::build_rules_engine()` builds and logs totals - `app/state.py` exposes `set_rules_engine()` / `get_rules_engine()` as the SOF - `app/wsgi.py` builds once at preload and publishes via `set_rules_engine()` - Add lightweight debug hooks (`SS_DEBUG_RULES=1`) to trace engine id and rule counts - Unify logging wiring: - `wire_logging_once(app)` clears and attaches a single handler chain - Create two named loggers: `sneakyscope.app` and `sneakyscope.engine` - Disable propagation to prevent dupes; include pid/logger name in format - Remove stray/duplicate handlers and import-time logging - Optional dedup filter for bursty repeats (kept off by default) - Gunicorn: enable `--preload` in entrypoint to avoid thread races and double registration - Documented foreground vs background log “double consumer” caveat (attach vs `compose logs`) - Jinja: replace `{% return %}` with structured `if/elif/else` branches - Add toggle button to show raw JSON for TLS/CT section - Consumers should import the rules engine via: - `from app.state import get_rules_engine` - Use `build_rules_engine()` only during preload/init to construct the instance, then publish with `set_rules_engine()`. Do not call old singleton factories. - New/changed modules (high level): - `app/utils/urltools.py` (+): URLNormalizer + `get_url_normalizer()` - `app/rules/function_rules.py` (±): normalized payload returns - `engine/function_rule_adapter.py` (±): coercion, fact adaptation, bind logs - `app/utils/rules_engine.py` (±): `_rules`, idempotent `add_rule`, fixes - `app/rules/factory.py` (±): pure builder; totals logged post-registration - `app/state.py` (+): process-global rules engine - `app/logging_setup.py` (±): single chain, two named loggers - `app/wsgi.py` (±): preload build + `set_rules_engine()` - `entrypoint.sh` (±): add `--preload` - templates (±): TLS card, raw toggle; front-page checkbox Closes: flaky rule-type warnings, duplicate logs, and multi-worker race on rules init.	2025-08-21 22:05:16 -05:00
Phillip Tarrant	3a24b392f2	feat: on-demand external script analysis + code viewer; refactor form analysis to rule engine - API: add `POST /api/analyze_script` (app/blueprints/api.py) - Fetch one external script to artifacts, run rules, return findings + snippet - Uses new ExternalScriptFetcher (results_path aware) and job UUID - Returns: { ok, final_url, status_code, bytes, truncated, sha256, artifact_path, findings[], snippet, snippet_len } - TODO: document in openapi/openapi.yaml - Fetcher: update `app/utils/external_fetch.py` - Constructed with `results_path` (UUID dir); writes to `<results_path>/scripts/fetched/<index>.js` - Loads settings via `get_settings()`, logs via std logging - UI (results.html): - Move “Analyze external script” action into Content Snippet column for external rows - Clicking replaces button with `<details>` snippet, shows rule matches, and adds “open in viewer” link - Robust fetch handler (checks JSON, shows errors); builds viewer URL from absolute artifact path - Viewer: - New route: `GET /view/artifact/<run_uuid>/<path:filename>` (app/blueprints/ui.py) - New template: Monaco-based read-only code viewer (viewer.html) - Removes SRI on loader to avoid integrity block; loads file via `raw_url` and detects language by extension - Forms: - Refactor `analyze_forms` to mirror scripts analysis: - Uses rule engine (`category == "form"`) across regex/function rules - Emits rows only when matches exist - Includes `content_snippet`, `action`, `method`, `inputs`, `rules` - Replace legacy plumbing (`flagged`, `flag_reasons`, `status`) in output - Normalize form function rules to canonical returns `(bool, Optional[str])`: - `form_action_missing` - `form_http_on_https_page` - `form_submits_to_different_host` - Add minor hardening (lowercasing hosts, no-op actions, clearer reasons) - CSS: add `.forms-table` to mirror `.scripts-table` (5 columns) - Fixed table layout, widths per column, chip/snippet styling, responsive tweaks - Misc: - Fix “working outside app context” issue by avoiding `current_app` at import time (left storage logic inside routes) - Add “View Source” link to open page source in viewer Refs: - Roadmap: mark “Source code viewer” done; keep TODO to add `/api/analyze_script` to OpenAPI	2025-08-21 15:32:24 -05:00
Phillip Tarrant	1eb2a52f17	feat(engine,ui): unify detection in rules engine, add function rules & per-script matches; improve scripts table UX Core changes - Centralize detection in the Rules Engine; browser.py now focuses on fetch/extract/persist. - Add class-based adapters: - FactAdapter: converts snippets → structured facts. - FunctionRuleAdapter: wraps dict-based rule functions for engine input (str or dict). - Register function rules (code-based) alongside YAML rules: - form_action_missing - form_http_on_https_page - form_submits_to_different_host - script_src_uses_data_or_blob - script_src_has_dangerous_extension - script_third_party_host Rules & YAML - Expand/normalize YAML rules with severities + tags; tighten patterns. - Add new regex rules: new_function_usage, unescape_usage, string_timer_usage, long_hex_constants. - Move iframe rule to `text` category. - Keep existing script/form/text rules; all compile under IGNORECASE. Browser / analysis refactor - browser.py: - Remove inline heuristics; rely on engine for PASS/FAIL, reason, severity, tags. - Build page-level overview (`rule_checks`) across categories. - Analyze forms: add `base_url` + `base_hostname` to snippet so function rules can evaluate; include per-form rule_checks. - Analyze scripts: per-script evaluation: - Inline -> run regex script rules on inline text. - External -> run function script rules with a facts dict (src/src_hostname/base_url/base_hostname). - Only include scripts that matched ≥1 rule; attach severity/tags to matches. - Persist single source of truth: `/data/<uuid>/results.json`. - Backward-compat: `fetch_page_artifacts(..., engine=...)` kwarg accepted/ignored. UI/UX - Suspicious Scripts table now shows only matched scripts. - Add severity badges and tag chips; tooltips show rule description. - Prevent table blowouts: - Fixed layout + ellipsis + wrapping helpers (`.scripts-table`, `.breakable`, `details pre.code`). - Shortened inline snippet preview (configurable). - Minor template niceties (e.g., rel="noopener" on external links where applicable). Config - Add `ui.snippet_preview_len` to settings.yaml; default 160. - Load into `app.config["SNIPPET_PREVIEW_LEN"]` and use in `analyze_scripts`. Init / wiring - Import and register function rules as `Rule(...)` objects (not dicts). - Hook Rules Engine to Flask logger for verbose/diagnostic output. - Log totals on startup; keep YAML path override via `SNEAKYSCOPE_RULES_FILE`. Bug fixes - Fix boot crash: pass `Rule` instances to `engine.add_rule()` instead of dicts. - Fix “N/A” in scripts table by actually computing per-script matches. - Ensure form rules fire by including `base_url`/`base_hostname` in form snippets. Roadmap - Update roadmap to reflect completed items: - “Show each check and whether it triggered (pass/fail list per rule)” - Severity levels + tags in Suspicious Scripts - Results.json as route source of truth - Scripts table UX (badges, tooltips, layout fix)	2025-08-20 21:33:30 -05:00
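
Commit 693f7d67b9 describes `FunctionRuleAdapter` coercing legacy returns `(bool)` and `(bool, str)` into the shape `(True, dict)` on match and `(False, None)` on miss. The sketch below shows one way such a coercion could look. It is not the repository's code: the helper name `normalize_rule_result` is hypothetical, and storing a legacy string reason under the `note` key is an assumption based on the commit's mention of `note` payloads.

```python
from typing import Any, Optional, Tuple


def normalize_rule_result(raw: Any) -> Tuple[bool, Optional[dict]]:
    """Coerce a rule function's return value to (bool, dict | None).

    Hypothetical helper; the name and the "note" key are assumptions.
    """
    # Legacy bare value: its truthiness is the match flag.
    if not isinstance(raw, tuple):
        return (True, {}) if raw else (False, None)

    # An empty tuple, or a falsy first element, is a miss.
    if len(raw) == 0 or not raw[0]:
        return (False, None)

    payload = raw[1] if len(raw) > 1 else None
    if payload is None:
        return (True, {})
    if isinstance(payload, dict):
        return (True, payload)

    # Legacy (bool, str): wrap the reason text in a dict.
    return (True, {"note": str(payload)})


if __name__ == "__main__":
    print(normalize_rule_result(True))
    print(normalize_rule_result((True, "form posts over http")))
    print(normalize_rule_result((False, "ignored on miss")))
```

Dropping the payload on a miss keeps callers from having to distinguish between "no match" and "no match, but with a reason", which is consistent with the `(False, None)` shape the commit states.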

4 Commits
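
Commit 55cd81aec0 says the engine honors per-rule regex flags given as a string or a list and defaults to IGNORECASE when `category == "text"`. A minimal sketch of that resolution step follows. The function name `resolve_flags` is hypothetical, and it assumes the default applies only when a rule supplies no flags of its own, which the commit message does not spell out.

```python
import re
from typing import Iterable, Union


def resolve_flags(flags: Union[str, Iterable[str], None], category: str) -> int:
    """Turn per-rule flag names into a value usable by re.compile().

    Hypothetical helper; assumes flag names match attributes of the
    `re` module, e.g. "IGNORECASE" or "MULTILINE" (case-insensitive).
    """
    if flags is None:
        names = []
    elif isinstance(flags, str):
        names = [flags]
    else:
        names = list(flags)

    value = 0
    for name in names:
        flag = getattr(re, name.strip().upper(), None)
        # Reject anything that is not a real regex flag.
        if not isinstance(flag, re.RegexFlag):
            raise ValueError(f"unknown regex flag: {name!r}")
        value |= flag

    # Assumed behavior: text rules are case-insensitive unless they say otherwise.
    if not names and category == "text":
        value = re.IGNORECASE

    return int(value)


if __name__ == "__main__":
    pattern = re.compile(r"verify your identity", resolve_flags(None, "text"))
    print(bool(pattern.search("Please VERIFY your identity to continue")))
```

Raising on an unknown flag name, rather than silently ignoring it, surfaces typos in rule definitions at load time.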