feat(engine,ui): unify detection in rules engine, add function rules & per-script matches; improve scripts table UX

Core changes
- Centralize detection in the Rules Engine; browser.py now focuses on fetch/extract/persist.
- Add class-based adapters:
  - FactAdapter: converts snippets → structured facts.
  - FunctionRuleAdapter: wraps dict-based rule functions for engine input (str or dict).
- Register function rules (code-based) alongside YAML rules:
  - form_action_missing
  - form_http_on_https_page
  - form_submits_to_different_host
  - script_src_uses_data_or_blob
  - script_src_has_dangerous_extension
  - script_third_party_host

Rules & YAML
- Expand/normalize YAML rules with severities + tags; tighten patterns.
- Add new regex rules: new_function_usage, unescape_usage, string_timer_usage, long_hex_constants.
- Move iframe rule to `text` category.
- Keep existing script/form/text rules; all compile under IGNORECASE.

Browser / analysis refactor
- browser.py:
  - Remove inline heuristics; rely on engine for PASS/FAIL, reason, severity, tags.
  - Build page-level overview (`rule_checks`) across categories.
  - Analyze forms: add `base_url` + `base_hostname` to snippet so function rules can evaluate; include per-form rule_checks.
  - Analyze scripts: **per-script evaluation**:
    - Inline -> run regex script rules on inline text.
    - External -> run function script rules with a facts dict (src/src_hostname/base_url/base_hostname).
    - Only include scripts that matched ≥1 rule; attach severity/tags to matches.
- Persist single source of truth: `/data/<uuid>/results.json`.
- Backward-compat: `fetch_page_artifacts(..., engine=...)` kwarg accepted/ignored.

UI/UX
- Suspicious Scripts table now shows only matched scripts.
- Add severity badges and tag chips; tooltips show rule description.
- Prevent table blowouts:
  - Fixed layout + ellipsis + wrapping helpers (`.scripts-table`, `.breakable`, `details pre.code`).
  - Shortened inline snippet preview (configurable).
- Minor template niceties (e.g., rel="noopener" on external links where applicable).

Config
- Add `ui.snippet_preview_len` to settings.yaml; default 160.
- Load into `app.config["SNIPPET_PREVIEW_LEN"]` and use in `analyze_scripts`.

Init / wiring
- Import and register function rules as `Rule(...)` objects (not dicts).
- Hook Rules Engine to Flask logger for verbose/diagnostic output.
- Log totals on startup; keep YAML path override via `SNEAKYSCOPE_RULES_FILE`.

Bug fixes
- Fix boot crash: pass `Rule` instances to `engine.add_rule()` instead of dicts.
- Fix “N/A” in scripts table by actually computing per-script matches.
- Ensure form rules fire by including `base_url`/`base_hostname` in form snippets.

Roadmap
- Update roadmap to reflect completed items:
  - “Show each check and whether it triggered (pass/fail list per rule)”
  - Severity levels + tags in Suspicious Scripts
  - Results.json as route source of truth
  - Scripts table UX (badges, tooltips, layout fix)
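
Illustrative sketch (not the code in this commit): one way a dict-based function rule and its adapter could fit together. The rule names, the `src_hostname`/`base_hostname`/`base_url` keys, and the adapter class name come from the notes above. The `(matched, reason)` return shape, the `action` key, the reason strings, and the choice to treat plain-string input as "no match" are assumptions made for the example.

```python
from typing import Callable, Dict, Optional, Tuple, Union
from urllib.parse import urlparse

Facts = Dict[str, str]
RuleResult = Tuple[bool, Optional[str]]  # assumed shape: (matched, reason)


def script_third_party_host(facts: Facts) -> RuleResult:
    """Match when an external script is served from a host other than the page's."""
    src_host = (facts.get("src_hostname") or "").lower()
    base_host = (facts.get("base_hostname") or "").lower()
    if src_host and base_host and src_host != base_host:
        return True, f"Script loaded from third-party host: {src_host}"
    return False, None


def form_http_on_https_page(facts: Facts) -> RuleResult:
    """Match when an HTTPS page has a form whose action is plain HTTP.

    The "action" key is hypothetical; only base_url/base_hostname are
    named in the commit notes.
    """
    page_scheme = urlparse(facts.get("base_url", "")).scheme
    action_scheme = urlparse(facts.get("action", "")).scheme
    if page_scheme == "https" and action_scheme == "http":
        return True, "Form submits over HTTP from an HTTPS page"
    return False, None


class FunctionRuleAdapter:
    """Wrap a dict-based rule function so it tolerates str or dict input."""

    def __init__(self, fn: Callable[[Facts], RuleResult]) -> None:
        self.fn = fn

    def __call__(self, value: Union[str, Facts]) -> RuleResult:
        if isinstance(value, dict):
            return self.fn(value)
        # Assumption: a bare string carries no facts, so the rule cannot match.
        return False, None


if __name__ == "__main__":
    rule = FunctionRuleAdapter(script_third_party_host)
    print(rule({"src_hostname": "cdn.example.net", "base_hostname": "example.com"}))
    print(rule("alert(1)"))
```

Under these assumptions, one engine call path can serve both regex rules (fed inline script text) and function rules (fed a facts dict), which is the split the per-script evaluation above describes.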
2025-08-20 21:33:30 -05:00
parent 70d29f9f95
commit 1eb2a52f17
14 changed files with 1108 additions and 423 deletions
--- a/docs/Feature
+++ b/docs/Feature
@@ -0,0 +1,31 @@
+# Feature Session Plan – SneakyScope
+
+**Feature:**
+
+* \[Short description of the feature or improvement]
+
+**Effort:**
+
+* Easy / Medium / Hard
+
+**Dependencies:**
+
+* \[List of prerequisites or related tasks that must be done first]
+
+**Design Notes:**
+
+* \[Goals, considerations, analyst/UX needs, edge cases, pitfalls to avoid]
+
+**Implementation Tasks:**
+
+* [ ] Step 1 – …
+* [ ] Step 2 – …
+* [ ] Step 3 – …
+
+**Validation / Testing:**
+
+* \[How we’ll verify it works — e.g., test cases, UI check, API output, logs]
+
+**Next Steps After Completion:**
+
+* \[What this unblocks or enables, i.e. the next feature/dependency]
--- a/docs/README.md
+++ b/docs/README.md
@@ -0,0 +1,6 @@
+# Roadmap and chats
+
+## Vibecode? Brotha Ewww
+No, I don't "vibe code". There is a huge difference between asking AI to do everything and asking it to "give me a boilerplate function" and tweaking from there. I've been coding for over 20 years; these fingers have typed enough. So yes, I use AI while I code.
+
+Some of these little files in here are just helpful ways I've started using AI to help keep me on track with the project. Feel free to borrow.
--- a/docs/roadmap.md
+++ b/docs/roadmap.md
@@ -1,71 +1,32 @@
+# SneakyScope — Roadmap (Updated 8-20-25)

-## Priority 1 – Core Functionality / Stability
+## Priority 1 – Core Analysis / Stability

-**Permissions / Storage Paths**
+* Opt-in “fetch external scripts” mode (off by default): on submission, download external script content (size/time limits) and run rules on fetched content.
+* Remove remaining legacy form “flagged\_reasons” plumbing once all equivalent function rules are in place.
+* Unit tests: YAML compilation, function-rule adapters, and per-script/per-form rule cases.

-* ✅ `/data` and other mounted volumes setup handled by `sandbox.sh`
-* ✅ Downloads, screenshots, and HTML artifacts are written correctly (`safe_write` in `io_helpers.py`)
+## Priority 2 – API Layer

---
+* API endpoints: `/screenshot`, `/source`, `/analyse`.
+* OpenAPI spec: create `openapi/openapi.yaml` and serve at `/api/openapi.yaml`.
+* Docs UI: Swagger UI or Redoc at `/docs`.

-## Priority 2 – Data Accuracy / Enrichment
+## Priority 3 – UI / UX

-**WHOIS & GeoIP Enhancements**
+* Front page/input handling: auto-prepend `http://`/`https://`/`www.` for bare domains.
+* Source code viewer: embed page source in an editor view for readability.
+* Scripts table: toggle between “Only suspicious” and “All scripts”.
+* Rules Lab (WYSIWYG tester): paste a rule, validate/compile, run against sample text; lightweight nav entry.

-* ✅ Implemented Python-based WHOIS parsing with fallback to raw WHOIS text
-* ✅ Default `"Possible Privacy"` or `"N/A"` for missing WHOIS fields
-* ✅ GeoIP + ASN + ISP info displayed per IP in **accordion tables**
-* ✅ Cache WHOIS and GeoIP results to reduce repeated queries
+## Priority 4 – Artifact Management & Ops

-**Suspicious Scripts & Forms**
+* Retention/cleanup policy for old artifacts (age/size thresholds).
+* Add periodic maintenance scripts for storage, with cleanup options set in `settings.yaml`.
+* Results caching UX: add “Re-run analysis” vs. “Load from cache” controls in the results UI.

-* [ ] Expand flagged script and form output with reasons for analysts
-* [ ] Show each check and if it triggered flags (pass/fail for each check)
+## Priority 5 – Extras / Integrations

-**Add Suspicious BEC words**
-
-* ✅ Look for things like `"reset password"`
-* ✅ Make configurable via a config file (yaml doc with rules)
-
---
-
-## Priority 3 – User Interface / UX
-
-**Front Page / Input Handling**
-
-* [ ] Automatically prepend `http://`, `https://`, and/or `www.` if a user only enters a domain
-
-**Result Templates / Cards**
-* [ ] load sourcecode for webpage in a code editor view or code block on page so that it's easier to read
-* [ ] Update result cards with clear, analyst-friendly explanations
-* [ ] Include flagged logic and reason lists for scripts and forms
-* ✅ Display GeoIP results in accordion tables (✅ done)
-
---
-
-## Priority 4 – API Layer
-
-**API Endpoints**
-
-* [ ] Add `/screenshot` endpoint
-* [ ] Add `/source` endpoint
-* [ ] Add `/analyse` endpoint
-
-**OpenAPI + Docs**
-
-* [ ] Create initial `openapi/openapi.yaml` spec file
-* [ ] Serve spec at `/api/openapi.yaml`
-* [ ] Wire up Swagger UI or Redoc at `/docs` for interactive API exploration
-
---
-
-## Priority 5 – Optional / Cleanup
-
-**Artifact Management**
-* [ ] Implement saving of results from a UUID as "results.json" so we don't rerun all the rules and just load from cache.
-* [ ] Implement cleanup or retention policy for old artifacts
-* [ ] Optional: Add periodic maintenance scripts for storage
-
-**Extra Features**
-
-* [ ] Placeholder for additional features (e.g., bulk URL analysis, alerting, integrations)
+* Bulk URL analysis (batch/queue).
+* Alerting & integrations (webhooks, Slack, email).
+* Optional: analyst verdict tags and export (CSV/JSON).
--- a/docs/workflow.md
+++ b/docs/workflow.md
@@ -0,0 +1,12 @@
+### 🛠 SneakyScope Feature Workflow
+
+1. Pick feature from roadmap
+2. Drop in **Feature Session Plan** template
+3. Fill in description, effort, dependencies, design notes
+4. Expand into tasks → implement code → test/validate
+5. Update roadmap (remove/complete, reorder if needed)
+
+---
+
+This way, every session starts with the same rhythm, and we don’t lose context between chats.
+