feat(engine,ui): unify detection in rules engine, add function rules & per-script matches; improve scripts table UX
Core changes
- Centralize detection in the Rules Engine; browser.py now focuses on fetch/extract/persist.
- Add class-based adapters:
  - FactAdapter: converts snippets → structured facts.
  - FunctionRuleAdapter: wraps dict-based rule functions for engine input (str or dict).
- Register function rules (code-based) alongside YAML rules:
  - form_action_missing
  - form_http_on_https_page
  - form_submits_to_different_host
  - script_src_uses_data_or_blob
  - script_src_has_dangerous_extension
  - script_third_party_host
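A minimal sketch of how a form snippet could be turned into facts and handed to a dict-based rule function. `parse_form_snippet` is a hypothetical stand-in for what FactAdapter does, and the `(matched, reason)` return shape is an assumption based on how the engine's rules are called in `browser.py`; the real implementations in `app/rules/function_rules.py` may differ.

```python
from typing import Dict, Optional, Tuple
from urllib.parse import urlparse


def parse_form_snippet(snippet: str) -> Dict[str, str]:
    """Hypothetical stand-in for FactAdapter: 'key=value' lines -> facts dict."""
    facts: Dict[str, str] = {}
    for line in snippet.splitlines():
        if line.lstrip().startswith("-"):
            continue  # skip the per-input rows listed under 'inputs='
        key, sep, value = line.partition("=")
        if sep:
            facts[key.strip()] = value.strip()
    return facts


def form_http_on_https_page(facts: Dict[str, str]) -> Tuple[bool, Optional[str]]:
    """Assumed rule shape: return (matched, reason)."""
    action_scheme = urlparse(facts.get("action", "")).scheme
    base_scheme = urlparse(facts.get("base_url", "")).scheme
    if action_scheme == "http" and base_scheme == "https":
        return True, "Form submits over insecure HTTP"
    return False, None


# Snippet layout mirrors the one analyze_forms builds for category='form'.
snippet = "\n".join([
    "base_url=https://example.com/login",
    "base_hostname=example.com",
    "action=http://example.com/submit",
    "method=post",
    "inputs=",
    " - name=user type=text",
])
facts = parse_form_snippet(snippet)
matched, reason = form_http_on_https_page(facts)
```

Without `base_url` in the snippet the rule has nothing to compare the action scheme against, which is why the form snippets now carry `base_url` and `base_hostname`.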
Rules & YAML
- Expand/normalize YAML rules with severities + tags; tighten patterns.
- Add new regex rules: new_function_usage, unescape_usage, string_timer_usage, long_hex_constants.
- Move iframe rule to `text` category.
- Keep existing script/form/text rules; all compile under IGNORECASE.
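For illustration, the four new rule names can be paired with the inline patterns this commit removes from `browser.py`. Whether the YAML file uses exactly these expressions is an assumption; the sketch only shows what compiling everything under IGNORECASE means in practice.

```python
import re

# Patterns copied from the inline heuristics removed from browser.py;
# the mapping to YAML rule names is assumed, not confirmed by the diff.
PATTERNS = {
    "new_function_usage": r"\bnew\s+Function\s*\(",
    "unescape_usage": r"\bunescape\s*\(",
    "string_timer_usage": r"\bset(?:Timeout|Interval)\s*\(\s*['\"`].+['\"`]\s*,",
    "long_hex_constants": r"[\"']?0x[0-9a-fA-F]{16,}[\"']?",
}

# Every rule is compiled case-insensitively, as the commit message states.
COMPILED = {name: re.compile(p, re.IGNORECASE) for name, p in PATTERNS.items()}


def matching_rules(js_text: str) -> list:
    """Return the names of all rules whose pattern occurs in js_text."""
    return [name for name, rx in COMPILED.items() if rx.search(js_text)]


sample = "setTimeout('doThing()', 500); var f = NEW function('return 1');"
hits = matching_rules(sample)
```

One consequence of IGNORECASE is that `new function(` in lowercase also matches `new_function_usage`, even though JavaScript itself is case-sensitive, so this setting trades some precision for recall.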
Browser / analysis refactor
- browser.py:
  - Remove inline heuristics; rely on engine for PASS/FAIL, reason, severity, tags.
  - Build page-level overview (`rule_checks`) across categories.
  - Analyze forms: add `base_url` + `base_hostname` to snippet so function rules can evaluate; include per-form rule_checks.
  - Analyze scripts: **per-script evaluation**:
    - Inline -> run regex script rules on inline text.
    - External -> run function script rules with a facts dict (src/src_hostname/base_url/base_hostname).
    - Only include scripts that matched ≥1 rule; attach severity/tags to matches.
  - Persist single source of truth: `/data/<uuid>/results.json`.
  - Backward-compat: `fetch_page_artifacts(..., engine=...)` kwarg accepted/ignored.
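The per-script dispatch described above can be sketched roughly as follows. `SketchRule`, `third_party`, and the `eval_usage` rule name are hypothetical stand-ins for the engine's `Rule` class and its registered rules; only the split (regex rules on inline text, function rules on a facts dict) follows the commit.

```python
import re
from dataclasses import dataclass
from typing import Any, Callable, Dict, List, Optional, Tuple
from urllib.parse import urlparse


@dataclass
class SketchRule:
    """Hypothetical stand-in for the engine's Rule class."""
    name: str
    category: str
    rule_type: str  # "regex" or "function"
    pattern: Optional[str] = None
    function: Optional[Callable[[Dict[str, Any]], Tuple[bool, Optional[str]]]] = None

    def run(self, subject: Any) -> Tuple[bool, Optional[str]]:
        if self.rule_type == "regex":
            hit = re.search(self.pattern, subject, re.IGNORECASE)
            return (hit is not None, None)
        return self.function(subject)


def third_party(facts: Dict[str, Any]) -> Tuple[bool, Optional[str]]:
    src_host, base_host = facts["src_hostname"], facts["base_hostname"]
    if src_host and base_host and src_host != base_host:
        return True, f"Third-party host: {src_host}"
    return False, None


RULES = [
    SketchRule("eval_usage", "script", "regex", pattern=r"\beval\s*\("),
    SketchRule("script_third_party_host", "script", "function", function=third_party),
]


def match_script(src: str, inline_text: str, base_url: str) -> List[str]:
    """Inline text goes to regex rules; an external src goes to function rules."""
    names: List[str] = []
    if inline_text:
        names += [r.name for r in RULES
                  if r.rule_type == "regex" and r.run(inline_text)[0]]
    if src:
        facts = {
            "src": src,
            "base_url": base_url,
            "base_hostname": urlparse(base_url).hostname or "",
            "src_hostname": urlparse(src).hostname or "",
        }
        names += [r.name for r in RULES
                  if r.rule_type == "function" and r.run(facts)[0]]
    return names
```

A script that yields an empty list here would simply be left out of the Suspicious Scripts table.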
UI/UX
- Suspicious Scripts table now shows only matched scripts.
- Add severity badges and tag chips; tooltips show rule description.
- Prevent table blowouts:
  - Fixed layout + ellipsis + wrapping helpers (`.scripts-table`, `.breakable`, `details pre.code`).
  - Shortened inline snippet preview (configurable).
- Minor template niceties (e.g., rel="noopener" on external links where applicable).
Config
- Add `ui.snippet_preview_len` to settings.yaml; default 160.
- Load into `app.config["SNIPPET_PREVIEW_LEN"]` and use in `analyze_scripts`.
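A small sketch of how the preview length could be read with its default and applied. `UISettings`, `load_ui_settings`, and `preview` are hypothetical names; the real settings loader (YAML -> dataclasses) is not shown in this diff, so a plain dict stands in for the parsed `settings.yaml`.

```python
from dataclasses import dataclass
from typing import Any, Dict, Optional


@dataclass
class UISettings:
    # Default taken from the commit message: ui.snippet_preview_len = 160
    snippet_preview_len: int = 160


def load_ui_settings(raw: Optional[Dict[str, Any]]) -> UISettings:
    """Read ui.snippet_preview_len from a parsed settings dict, with a safe default."""
    ui = (raw or {}).get("ui") or {}
    return UISettings(snippet_preview_len=int(ui.get("snippet_preview_len", 160)))


def preview(inline_text: str, limit: int) -> str:
    """Truncate inline script text and flatten newlines, as analyze_scripts does."""
    return inline_text[:limit].replace("\n", " ")


settings_ui = load_ui_settings({"ui": {"snippet_preview_len": 40}})
short = preview("var a = 1;\nvar b = 2;\n" * 10, settings_ui.snippet_preview_len)
```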
Init / wiring
- Import and register function rules as `Rule(...)` objects (not dicts).
- Hook Rules Engine to Flask logger for verbose/diagnostic output.
- Log totals on startup; keep YAML path override via `SNEAKYSCOPE_RULES_FILE`.
Bug fixes
- Fix boot crash: pass `Rule` instances to `engine.add_rule()` instead of dicts.
- Fix “N/A” in scripts table by actually computing per-script matches.
- Ensure form rules fire by including `base_url`/`base_hostname` in form snippets.
Roadmap
- Update roadmap to reflect completed items:
  - “Show each check and whether it triggered (pass/fail list per rule)”
  - Severity levels + tags in Suspicious Scripts
  - Results.json as route source of truth
  - Scripts table UX (badges, tooltips, layout fix)
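The pass/fail list per rule reduces to a small summary over the engine's result dicts. The sketch below is an equivalent, more compact form of the `_summarize_results` helper added in `browser.py`; the sample `checks` data is made up for illustration.

```python
from typing import Any, Dict, List


def summarize_results(results: List[Dict[str, Any]]) -> Dict[str, int]:
    """Count FAIL results and total rules in a list of engine result dicts."""
    fail_count = sum(
        1 for item in results
        if str(item.get("result", "")).upper() == "FAIL"
    )
    return {"fail_count": fail_count, "total_rules": len(results)}


# Illustrative data only; real dicts also carry description, severity, and tags.
checks = [
    {"name": "form_action_missing", "result": "PASS"},
    {"name": "form_http_on_https_page", "result": "FAIL", "reason": "Submits over HTTP"},
]
summary = summarize_results(checks)
```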
app/__init__.py | 130
@@ -1,16 +1,3 @@
-"""
-app/__init__.py
-
-Application factory and startup hooks for SneakyScope.
-
-Responsibilities:
-- Create the Flask app.
-- Load settings (YAML -> dataclasses) with safe defaults.
-- Initialize and load the Suspicious Rules Engine from YAML.
-- Register blueprints (routes).
-- Configure core paths (e.g., SANDBOX_STORAGE).
-"""
-
 import os
 import logging
 from pathlib import Path
@@ -18,11 +5,24 @@ from flask import Flask
 
 # Local imports
 from .utils.settings import get_settings
-from .utils import io_helpers # if you need logging/setup later
-from .utils import cache_db # available for future injections
-from .utils.rules_engine import RuleEngine, load_rules_from_yaml # rules engine
+from .utils.rules_engine import RuleEngine, load_rules_from_yaml, Rule
 
+# our code based rules
+from .rules.function_rules import (
+    FactAdapter,
+    FunctionRuleAdapter,
+    script_src_uses_data_or_blob,
+    script_src_has_dangerous_extension,
+    script_third_party_host,
+    form_submits_to_different_host,
+    form_http_on_https_page,
+    form_action_missing,
+)
+
 from . import routes # blueprint
 
+# from .utils import io_helpers # if need logging/setup later
+# from .utils import cache_db # available for future injections
+
 def create_app() -> Flask:
     """
@@ -37,46 +37,110 @@ def create_app() -> Flask:
     # Load settings (safe fallback to defaults if file missing)
     settings = get_settings()
 
-    # Secret key loaded from env
+    # Secret key loaded from env (warn if missing)
     app.secret_key = os.getenv("SECRET_KEY")
+    if not app.secret_key:
+        app.logger.warning("[init] SECRET_KEY is not set; sessions may be insecure in production.")
 
     # Configure storage directory (bind-mount is still handled by sandbox.sh)
     sandbox_storage_default = Path("/data")
     app.config["SANDBOX_STORAGE"] = str(sandbox_storage_default)
 
-    # Initialize Suspicious Rules Engine at startup
-    # Determine rules file path relative to this package
-    base_dir = Path(__file__).resolve().parent
-    rules_path = base_dir / "config" / "suspicious_rules.yaml"
+    # ---------------------------
+    # Suspicious Rules Engine
+    # ---------------------------
 
-    # Create an engine instance (even if file missing, we still want an engine)
-    engine = RuleEngine()
+    # Determine rules file path relative to this package (allow env override)
+    base_dir = Path(__file__).resolve().parent
+    default_rules_path = base_dir / "config" / "suspicious_rules.yaml"
+    rules_path_str = os.getenv("SNEAKYSCOPE_RULES_FILE", str(default_rules_path))
+    rules_path = Path(rules_path_str)
+
+    # Create engine bound to Flask logger so all verbose/debug goes to app.logger
+    engine = RuleEngine(rules=[], logger=app.logger)
 
     # Try to load from YAML if present; log clearly if not
     if rules_path.exists():
         try:
-            loaded_rules = load_rules_from_yaml(rules_path)
-            # Add rules one-by-one (explicit)
-            for rule in loaded_rules:
-                engine.add_rule(rule)
-            app.logger.info(f"[+] Loaded {len(loaded_rules)} suspicious rules from {rules_path}")
+            loaded_rules = load_rules_from_yaml(rules_path, logger=app.logger)
+            # Add rules one-by-one (explicit, clearer logs if any rule fails to compile)
+            index = 0
+            total = len(loaded_rules)
+            while index < total:
+                engine.add_rule(loaded_rules[index])
+                index = index + 1
+            app.logger.info(f"[init] Loaded {len(loaded_rules)} suspicious rules from {rules_path}")
         except Exception as e:
-            app.logger.warning(f"[!] Failed loading rules from {rules_path}: {e}")
+            app.logger.warning(f"[init] Failed loading rules from {rules_path}: {e}")
     else:
-        app.logger.warning(f"[!] Rules file not found at {rules_path}. Engine will start with zero rules.")
+        app.logger.warning(f"[init] Rules file not found at {rules_path}. Engine will start with zero rules.")
 
-    # Store engine on app config so it is accessible via current_app
+    # Built-in function-based rules
+    adapter = FactAdapter(logger=app.logger)
+
+    engine.add_rule(Rule(
+        name="form_action_missing",
+        description="Form has no action attribute",
+        category="form",
+        rule_type="function",
+        function=FunctionRuleAdapter(form_action_missing, category="form", adapter=adapter),
+    ))
+
+    engine.add_rule(Rule(
+        name="form_http_on_https_page",
+        description="Form submits via HTTP from HTTPS page",
+        category="form",
+        rule_type="function",
+        function=FunctionRuleAdapter(form_http_on_https_page, category="form", adapter=adapter),
+    ))
+
+    engine.add_rule(Rule(
+        name="form_submits_to_different_host",
+        description="Form submits to a different host",
+        category="form",
+        rule_type="function",
+        function=FunctionRuleAdapter(form_submits_to_different_host, category="form", adapter=adapter),
+    ))
+
+    # Script rules expect dict 'facts' (you’ll wire per-script facts later)
+    engine.add_rule(Rule(
+        name="script_src_uses_data_or_blob",
+        description="Script src uses data:/blob: URL",
+        category="script",
+        rule_type="function",
+        function=FunctionRuleAdapter(script_src_uses_data_or_blob, category="script", adapter=adapter),
+    ))
+
+    engine.add_rule(Rule(
+        name="script_src_has_dangerous_extension",
+        description="External script with dangerous extension",
+        category="script",
+        rule_type="function",
+        function=FunctionRuleAdapter(script_src_has_dangerous_extension, category="script", adapter=adapter),
+    ))
+
+    engine.add_rule(Rule(
+        name="script_third_party_host",
+        description="Script is from a third-party host",
+        category="script",
+        rule_type="function",
+        function=FunctionRuleAdapter(script_third_party_host, category="script", adapter=adapter),
+    ))
+
+    # Store engine both ways: attribute (convenient) and config
+    app.rule_engine = engine
     app.config["RULE_ENGINE"] = engine
 
-    # Make app name/version available for templates here if you want it globally
+    # App metadata available to templates
     app.config["APP_NAME"] = settings.app.name
     app.config["APP_VERSION"] = f"v{settings.app.version_major}.{settings.app.version_minor}"
 
     # Register blueprints
     app.register_blueprint(routes.bp)
 
-    # Example log line so we know we booted cleanly
+    # Example log lines so we know we booted cleanly
     app.logger.info(f"SneakyScope started: {app.config['APP_NAME']} {app.config['APP_VERSION']}")
     app.logger.info(f"SANDBOX_STORAGE: {app.config['SANDBOX_STORAGE']}")
+    app.logger.info(f"Registered {len(engine.rules)} total rules (YAML + function)")
 
     return app
app/browser.py | 519
@@ -1,18 +1,46 @@
-import re
-import uuid
-import json
-from pathlib import Path
-from bs4 import BeautifulSoup
-from datetime import datetime
-from urllib.parse import urlparse
-from typing import Dict, Any, Optional
-from playwright.async_api import async_playwright, TimeoutError as PWTimeoutError
+"""
+app/browser.py
 
-from flask import current_app # access the rule engine from app config
+Page fetcher + analysis orchestrator for SneakyScope.
+- Fetches a URL (HTML, redirects, etc.)
+- Runs the Suspicious Rules Engine (PASS/FAIL for all rules)
+- Writes artifacts (screenshot.png, source.txt, results.json) into /data/<run_uuid>/
+- Returns a single 'result' dict suitable for UI and future API
+
+Design notes:
+- Detection logic (regex/heuristics) lives in the rules engine (YAML/function rules).
+- This module keeps "plumbing" only (fetch, extract, persist).
+- Minimal non-detection heuristics remain here (e.g., skip benign script MIME types).
+
+Assumptions:
+- Flask app context is active (uses current_app for logger and RULE_ENGINE).
+- SANDBOX_STORAGE is configured (default: /data).
+- enrich_url(url) returns enrichment dict.
+"""
+
+import json
+import uuid
+import re
+from pathlib import Path
+from typing import Any, Dict, List, Optional, Tuple
+from urllib.parse import urlparse
+
+from bs4 import BeautifulSoup
+from flask import current_app
+from playwright.async_api import async_playwright, TimeoutError as PWTimeoutError
+
 from app.utils.io_helpers import safe_write
 from .enrichment import enrich_url
 
+from .utils.settings import get_settings
+
+settings = get_settings()
+
+
+# ---------------------------------------------------------------------------
+# Engine access helpers
+# ---------------------------------------------------------------------------
 
 def get_rule_engine():
     """
     Retrieve the rules engine instance from the Flask application config.
@@ -21,96 +49,158 @@ def get_rule_engine():
         RuleEngine or None: The engine if available, or None if not configured.
     """
     try:
-        # current_app is only available during an active request context
-        engine = current_app.config.get("RULE_ENGINE")
-        return engine
+        return current_app.config.get("RULE_ENGINE")
     except Exception:
-        # If called outside a Flask request context, fail gracefully
         return None
 
 
-def run_rule_checks(text, category):
+def _summarize_results(results: List[Dict[str, Any]]) -> Dict[str, int]:
     """
-    Run all rules for a given category against the provided text.
-
-    Args:
-        text (str): The content to test (e.g., form snippet, inline JS).
-        category (str): The rule category to run (e.g., 'form' or 'script').
+    Summarize a list of engine rule result dicts (result = "PASS"|"FAIL").
 
     Returns:
-        dict: {
-            "checks": [ { "rule": str, "category": str, "matched": bool, "reason": Optional[str] }, ... ],
-            "summary": { "matched_count": int, "total_rules": int }
+        {'fail_count': int, 'total_rules': int}
+    """
+    summary = {"fail_count": 0, "total_rules": 0}
+    index = 0
+    total = len(results)
+    while index < total:
+        item = results[index]
+        summary["total_rules"] = summary["total_rules"] + 1
+        if str(item.get("result", "")).upper() == "FAIL":
+            summary["fail_count"] = summary["fail_count"] + 1
+        index = index + 1
+    return summary
+
+
+def run_rule_checks(text: str, category: str) -> Dict[str, Any]:
+    """
+    Run all rules for a given category against provided text, returning a table-friendly model.
+
+    Args:
+        text: Text to analyze (HTML, snippet, etc.)
+        category: One of 'form', 'script', 'text' (or any category your rules use)
+
+    Returns:
+        {
+            "checks": [
+                { "name": str, "description": str, "category": str,
+                  "result": "PASS"|"FAIL", "reason": Optional[str],
+                  "severity": Optional[str], "tags": Optional[List[str]] }, ...
+            ],
+            "summary": { "fail_count": int, "total_rules": int }
         }
     """
-    result = {
-        "checks": [],
-        "summary": {
-            "matched_count": 0,
-            "total_rules": 0
-        }
-    }
-
+    out: Dict[str, Any] = {"checks": [], "summary": {"fail_count": 0, "total_rules": 0}}
     engine = get_rule_engine()
-
     if engine is None:
-        # No engine configured; return empty but well-formed structure
-        return result
+        return out
 
     try:
-        # Run engine rules for the specified category
-        check_results = engine.run_all(text, category=category)
-
-        # Normalize results into the expected structure
-        total = 0
-        matched = 0
-
-        for item in check_results:
-            # item is expected to contain: rule, category, matched, reason (optional)
-            total = total + 1
-            if bool(item.get("matched")):
-                matched = matched + 1
-
+        engine_results = engine.run_all(text, category=category) # list of dicts
+        # Normalize explicitly
+        index = 0
+        total = len(engine_results)
+        while index < total:
+            item = engine_results[index]
             normalized = {
-                "rule": item.get("rule"),
+                "name": item.get("name"),
+                "description": item.get("description"),
                 "category": item.get("category"),
-                "matched": bool(item.get("matched")),
-                "reason": item.get("reason")
+                "result": item.get("result"), # "PASS" | "FAIL"
+                "reason": item.get("reason"), # present on FAIL by engine design
+                "severity": item.get("severity"),
+                "tags": item.get("tags"),
             }
-            result["checks"].append(normalized)
+            out["checks"].append(normalized)
+            index = index + 1
 
-        result["summary"]["matched_count"] = matched
-        result["summary"]["total_rules"] = total
-
-    except Exception as e:
-        # If anything goes wrong, keep structure and add a fake failure note
-        result["checks"].append({
-            "rule": "engine_error",
+        out["summary"] = _summarize_results(out["checks"])
+    except Exception as exc:
+        # Preserve shape; record the error as a synthetic PASS (so UI doesn't break)
+        out["checks"].append({
+            "name": "engine_error",
+            "description": "Rule engine failed during evaluation",
             "category": category,
-            "matched": False,
-            "reason": f"Rule engine error: {e}"
+            "result": "PASS",
+            "reason": f"{exc}",
+            "severity": None,
+            "tags": None
         })
-        result["summary"]["matched_count"] = 0
-        result["summary"]["total_rules"] = 0
+        out["summary"] = {"fail_count": 0, "total_rules": 1}
 
-    return result
+    return out
 
 
-def analyze_forms(html: str, base_url: str):
+def build_rule_checks_overview(full_html_text: str) -> List[Dict[str, Any]]:
     """
-    Parse forms from the page HTML and apply heuristic flags and rule-based checks.
-
-    Args:
-        html (str): The full page HTML.
-        base_url (str): The final URL of the page (used for hostname comparisons).
+    Build a top-level overview for the results page: runs each category across
+    the entire HTML and groups results by category.
 
     Returns:
-        list[dict]: A list of form analysis dictionaries, each including:
+        [
+            {"category": "script", "results": [ ...engine dicts... ], "summary": {...}},
+            {"category": "form", "results": [ ... ], "summary": {...}},
+            {"category": "text", "results": [ ... ], "summary": {...}},
+        ]
+    """
+    overview: List[Dict[str, Any]] = []
+    engine = get_rule_engine()
+
+    categories = ["script", "form", "text"]
+    index = 0
+    total = len(categories)
+
+    while index < total:
+        cat = categories[index]
+        block = {"category": cat, "results": [], "summary": {"fail_count": 0, "total_rules": 0}}
+
+        if engine is not None:
+            try:
+                results = engine.run_all(full_html_text, category=cat)
+                block["results"] = results
+                block["summary"] = _summarize_results(results)
+            except Exception as exc:
+                block["results"] = [{
+                    "name": "engine_error",
+                    "description": "Rule engine failed during overview evaluation",
+                    "category": cat,
+                    "result": "PASS",
+                    "reason": f"{exc}",
+                    "severity": None,
+                    "tags": None
+                }]
+                block["summary"] = {"fail_count": 0, "total_rules": 1}
+
+        overview.append(block)
+        index = index + 1
+
+    return overview
+
+
+# ---------------------------------------------------------------------------
+# Form & Script analysis (plumbing only; detection is in the rules engine)
+# ---------------------------------------------------------------------------
+
+def analyze_forms(html: str, base_url: str) -> List[Dict[str, Any]]:
+    """
+    Parse forms from the page HTML and apply rule-based checks (engine), keeping
+    only simple plumbing heuristics here (no security logic).
+
+    Returns list of dicts with keys:
         - action, method, inputs
         - flagged (bool), flag_reasons (list[str]), status (str)
-        - rule_checks: dict with "checks" (list) and "summary" (dict)
+        - rule_checks: {'checks': [...], 'summary': {...}} (per-form snippet evaluation)
+
+    Note:
+        The 'flagged' value is now purely a legacy visual hint based on simple
+        heuristics; the authoritative PASS/FAIL details are in rule_checks.
+        As you migrate heuristics into function rules, this 'flagged' may be
+        removed entirely.
     """
     soup = BeautifulSoup(html, "lxml")
-    forms_info = []
+    forms_info: List[Dict[str, Any]] = []
     page_hostname = urlparse(base_url).hostname
 
     for form in soup.find_all("form"):
@@ -118,40 +208,31 @@ def analyze_forms(html: str, base_url: str):
         method = form.get("method", "get").lower()
 
         # Build explicit inputs list
-        inputs = []
+        inputs: List[Dict[str, Any]] = []
         for inp in form.find_all("input"):
             input_name = inp.get("name")
             input_type = inp.get("type", "text")
-            inputs.append({
-                "name": input_name,
-                "type": input_type
-            })
+            inputs.append({"name": input_name, "type": input_type})
 
-        flagged_reasons = []
+        # Minimal legacy flags (kept for UI continuity; detection lives in engine)
+        flagged_reasons: List[str] = []
 
-        # No action specified
         if not action or str(action).strip() == "":
             flagged_reasons.append("No action specified")
-
-        # External host
         else:
             try:
                 action_host = urlparse(action).hostname
                 if not str(action).startswith("/") and action_host != page_hostname:
                     flagged_reasons.append("Submits to a different host")
             except Exception:
-                # If hostname parsing fails, skip this condition quietly
                 pass
 
-        # HTTP form on HTTPS page
         try:
             if urlparse(action).scheme == "http" and urlparse(base_url).scheme == "https":
                 flagged_reasons.append("Submits over insecure HTTP")
         except Exception:
-            # If scheme parsing fails, ignore
             pass
 
-        # Hidden password / suspicious hidden inputs
         for hidden in form.find_all("input", type="hidden"):
             name_value = hidden.get("name") or ""
             if "password" in name_value.lower():
@@ -159,15 +240,23 @@ def analyze_forms(html: str, base_url: str):
 
         flagged = bool(flagged_reasons)
 
-        # Serialize a simple form snippet for the rules engine to analyze (category='form')
+        # Serialize a simple form snippet for rule category='form'
         snippet_lines = []
+        snippet_lines.append(f"base_url={base_url}")
+        snippet_lines.append(f"base_hostname={page_hostname}")
         snippet_lines.append(f"action={action}")
         snippet_lines.append(f"method={method}")
         snippet_lines.append("inputs=")
-        for item in inputs:
+
+        i = 0
+        n = len(inputs)
+        while i < n:
+            item = inputs[i]
             snippet_lines.append(f" - name={item.get('name')} type={item.get('type')}")
+            i = i + 1
         form_snippet = "\n".join(snippet_lines)
 
+        # Per-form rule checks (PASS/FAIL list via engine)
         rule_checks = run_rule_checks(form_snippet, category="form")
 
         forms_info.append({
@@ -183,156 +272,116 @@ def analyze_forms(html: str, base_url: str):
     return forms_info
 
 
-def analyze_scripts(html: str, base_url: str = "", engine=None) -> list[dict]:
+def analyze_scripts(html: str, base_url: str = "") -> List[Dict[str, Any]]:
     """
-    Analyze <script> elements using the RuleEngine (if provided) and
-    lightweight built-in heuristics. Only append a record when at least
-    one rule or heuristic matches, and always set a sensible 'type'.
+    Collect script artifacts and evaluate per-script matches via the rules engine.
+    Only include rows that matched at least one rule. Inline scripts are checked
+    against regex rules using their text; external scripts are checked against
+    function rules using a small 'facts' dict (src/hosts).
 
     Returns list of dicts like:
       {
         "type": "external" | "inline" | "unknown",
         "src": "...", # for external
         "content_snippet": "...", # for inline
-        "rules": [ { "name": "...", "description": "..." }, ... ],
-        "heuristics": [ "reason1", "reason2", ... ]
+        "rules": [ { "name": "...", "description": "..." }, ... ]
       }
     """
     soup = BeautifulSoup(html, "lxml")
-    results: list[dict] = []
+    results: List[Dict[str, Any]] = []
 
-    import re
-    from urllib.parse import urlparse
-
-    # Benign MIME types we ignore entirely
+    # Benign MIME types we ignore entirely (non-detection plumbing)
     benign_types = {"application/ld+json", "application/json"}
 
-    # Suspicious file extensions for external scripts
-    dangerous_ext = (".vbs", ".hta")
-
-    # Inline red flags
-    risky_inline_patterns = [
-        (re.compile(r"\beval\s*\(", re.IGNORECASE), "Uses eval()"),
-        (re.compile(r"\bnew\s+Function\s*\(", re.IGNORECASE), "Uses Function constructor"),
-        (re.compile(r"\bdocument\.write\s*\(", re.IGNORECASE), "Uses document.write()"),
-        (re.compile(r"\bActiveXObject\s*\(", re.IGNORECASE), "Uses ActiveXObject (IE-only)"),
-        (re.compile(r"\batob\s*\(", re.IGNORECASE), "Uses atob() (possible obfuscation)"),
-        (re.compile(r"\bunescape\s*\(", re.IGNORECASE), "Uses unescape() (legacy/obfuscation)"),
-        (re.compile(r"\bset(?:Timeout|Interval)\s*\(\s*['\"`].+['\"`]\s*,", re.IGNORECASE),
-         "String passed to setTimeout/setInterval"),
-        (re.compile(r"[\"']?0x[0-9a-fA-F]{16,}[\"']?", re.IGNORECASE),
-         "Contains long hex-like constants (possible obfuscation)"),
-    ]
-
-    base_host = urlparse(base_url).hostname or ""
+    engine = get_rule_engine()
+    base_hostname = urlparse(base_url).hostname or ""
 
     for script in soup.find_all("script"):
         try:
             src = (script.get("src") or "").strip()
             s_type_attr = (script.get("type") or "").strip().lower()
 
-            # IMPORTANT: .string is often None; get_text() is reliable
             inline_text = script.get_text(strip=True) or ""
 
-            # Skip benign structured data outright
+            # Skip benign structured data outright (noise control)
             if s_type_attr in benign_types:
                 continue
 
-            # ---- Build facts for the rules engine
-            facts = {
-                "script_type_attr": s_type_attr or None,
-                "has_src": bool(src),
-                "src": src or None,
-                "attrs": dict(script.attrs),
-                "inline_len": len(inline_text),
-                "inline_preview": inline_text[:200].replace("\n", " ") if inline_text else None,
-                "base_url": base_url or None,
-                "base_hostname": base_host or None,
-                "src_hostname": urlparse(src).hostname if src else None,
-            }
-
-            # ---- Evaluate rules engine (using name/description)
-            engine_matches: list[dict] = []
-            if engine is not None:
-                try:
-                    if hasattr(engine, "evaluate_script"):
-                        matches = engine.evaluate_script(facts)
-                    elif hasattr(engine, "evaluate"):
-                        matches = engine.evaluate(facts)
-                    else:
-                        matches = []
-
-                    if isinstance(matches, list):
-                        for m in matches:
-                            if isinstance(m, dict) and "name" in m:
-                                engine_matches.append({
-                                    "name": m["name"],
-                                    "description": m.get("description", "")
-                                })
-                            elif isinstance(m, str):
-                                engine_matches.append({"name": m, "description": ""})
-                except Exception as e:
-                    engine_matches.append({"name": "Rules Engine Error", "description": str(e)})
-
-            # ---- Built-in heuristics
-            heuristics: list[str] = []
-            if src:
-                # Unusual URL schemes for script sources
-                if src.startswith(("data:", "blob:")):
-                    heuristics.append("Script src uses data:/blob: URL")
-                # Dangerous extensions
-                for ext in dangerous_ext:
-                    if src.lower().endswith(ext):
-                        heuristics.append(f"External script with dangerous extension ({ext.lstrip('.')})")
-                        break
-                # Third-party host hint
-                src_host = facts.get("src_hostname") or ""
-                if base_host and src_host and src_host != base_host:
-                    heuristics.append(f"Third-party host: {src_host}")
-            else:
-                if inline_text:
-                    for pat, why in risky_inline_patterns:
-                        if pat.search(inline_text):
-                            heuristics.append(why)
-
-            # ---- Only append when something matched; always set type
-            if engine_matches or heuristics:
-                record: dict = {}
-
+            record: Dict[str, Any] = {}
             if src:
                 record["type"] = "external"
                 record["src"] = src
             elif inline_text:
                 record["type"] = "inline"
-                record["content_snippet"] = facts.get("inline_preview")
+                record["content_snippet"] = (inline_text[:settings.ui.snippet_preview_len]).replace("\n", " ")
             else:
                 record["type"] = "unknown"
 
-                if engine_matches:
-                    record["rules"] = engine_matches
-                if heuristics:
-                    record["heuristics"] = heuristics
+            # --- Per-script evaluation: gather matches from engine rules
+            matches: List[Dict[str, str]] = []
+            if engine is not None:
+                # Inline content → run regex script rules against the text
+                if inline_text:
+                    for r in engine.rules:
+                        if getattr(r, "category", None) == "script" and getattr(r, "rule_type", None) == "regex":
+                            ok, reason = r.run(inline_text)
+                            if ok:
+                                matches.append({
+                                    "name": getattr(r, "name", "unknown_rule"),
+                                    "description": getattr(r, "description", "") or (reason or ""),
+                                    "severity": getattr(r, "severity", None),
+                                    "tags": getattr(r, "tags", None),
+                                })
+
+                # External src → run function script rules with facts
+                if src:
+                    facts = {
+                        "src": src,
+                        "base_url": base_url,
+                        "base_hostname": base_hostname,
+                        "src_hostname": urlparse(src).hostname or "",
+                        "category": "script",
+                    }
+                    for r in engine.rules:
+                        if getattr(r, "category", None) == "script" and getattr(r, "rule_type", None) == "function":
+                            ok, reason = r.run(facts)
+                            if ok:
+                                matches.append({
+                                    "name": getattr(r, "name", "unknown_rule"),
+                                    "description": (reason or "") or getattr(r, "description", ""),
+                                    "severity": getattr(r, "severity", None),
+                                    "tags": getattr(r, "tags", None),
+                                })
+
+            # Only keep rows that matched at least one rule
+            if matches:
+                record["rules"] = matches
                 results.append(record)
 
-        except Exception as e:
-            # Never let a single broken <script> kill the whole analysis
+        except Exception as exc:
             results.append({
                 "type": "unknown",
-                "heuristics": [f"Script analysis error: {e}"]
+                "heuristics": [f"Script analysis error: {exc}"]
             })
 
     return results
 
 
-async def fetch_page_artifacts(url: str, storage_dir: Path, engine=None) -> Dict[str, Any]:
+
+# ---------------------------------------------------------------------------
+# Fetcher / Orchestrator
+# ---------------------------------------------------------------------------
+
+async def fetch_page_artifacts(url: str, storage_dir: Path) -> Dict[str, Any]:
     """
     Fetch page artifacts and save them in a UUID-based directory.
 
-    Args:
-        url (str): URL to analyze.
-        storage_dir (Path): Base /data path.
-        engine: Optional rules engine instance (from app.config["RULE_ENGINE"]).
+    Writes:
+      - /data/<uuid>/screenshot.png
+      - /data/<uuid>/source.txt
+      - /data/<uuid>/results.json (single source of truth for routes)
+
+    Returns:
+      result dict with keys used by templates (and future API).
     """
     run_uuid = str(uuid.uuid4())
     run_dir = storage_dir / run_uuid
@@ -342,9 +391,9 @@ async def fetch_page_artifacts(url: str, storage_dir: Path, engine=None) -> Dict
     source_path = run_dir / "source.txt"
     results_path = run_dir / "results.json"
 
-    redirects = []
-    downloads = []
-    scripts = []
+    redirects: List[Dict[str, Any]] = []
+    downloads: List[Dict[str, Any]] = []
+    scripts_seen: List[str] = []
 
     async with async_playwright() as pw:
         browser = await pw.chromium.launch(
@@ -359,10 +408,30 @@ async def fetch_page_artifacts(url: str, storage_dir: Path, engine=None) -> Dict
|
||||
)
|
||||
page = await context.new_page()
|
||||
|
||||
# Event handlers
|
||||
page.on("response", lambda resp: redirects.append({"status": resp.status, "url": resp.url}) if 300 <= resp.status <= 399 else None)
|
||||
page.on("download", lambda d: downloads.append({"url": d.url, "suggested_filename": d.suggested_filename}))
|
||||
page.on("request", lambda r: scripts.append(r.url) if r.url.endswith((".js", ".vbs", ".hta")) else None)
|
||||
# Event handlers (plumbing)
|
||||
def _on_response(resp):
|
||||
try:
|
||||
if 300 <= resp.status <= 399:
|
||||
redirects.append({"status": resp.status, "url": resp.url})
|
||||
except Exception:
|
||||
pass
|
||||
|
||||
def _on_download(d):
|
||||
try:
|
||||
downloads.append({"url": d.url, "suggested_filename": d.suggested_filename})
|
||||
except Exception:
|
||||
pass
|
||||
|
||||
def _on_request(r):
|
||||
try:
|
||||
if r.url.endswith((".js", ".vbs", ".hta")):
|
||||
scripts_seen.append(r.url)
|
||||
except Exception:
|
||||
pass
|
||||
|
||||
page.on("response", _on_response)
|
||||
page.on("download", _on_download)
|
||||
page.on("request", _on_request)
|
||||
|
||||
try:
|
||||
await page.goto(url, wait_until="networkidle", timeout=60000)
|
||||
@@ -378,23 +447,65 @@ async def fetch_page_artifacts(url: str, storage_dir: Path, engine=None) -> Dict
|
||||
await context.close()
|
||||
await browser.close()
|
||||
|
||||
# Read back saved source
|
||||
html_content = source_path.read_text(encoding="utf-8")
|
||||
forms_info = analyze_forms(html_content, final_url)
|
||||
suspicious_scripts = analyze_scripts(html_content, base_url=final_url, engine=engine)
|
||||
|
||||
# Forms analysis (per-form rule checks)
|
||||
forms_info = analyze_forms(html_content, final_url)
|
||||
|
||||
# Scripts artifacts (no detection here)
|
||||
suspicious_scripts = analyze_scripts(html_content, base_url=final_url)
|
||||
|
||||
# Enrichment
|
||||
enrichment = enrich_url(url)
|
||||
|
||||
result = {
|
||||
# Global PASS/FAIL table per category (entire document)
|
||||
rule_checks_overview = build_rule_checks_overview(html_content)
|
||||
|
||||
for blk in rule_checks_overview:
|
||||
current_app.logger.debug(f"[rules] {blk['category']}: {blk['summary']}")
|
||||
|
||||
|
||||
# Assemble single result dict
|
||||
result: Dict[str, Any] = {
|
||||
"uuid": run_uuid,
|
||||
"submitted_url": url,
|
||||
"final_url": final_url,
|
||||
"redirects": redirects,
|
||||
"downloads": downloads,
|
||||
"scripts": scripts,
|
||||
"scripts": scripts_seen,
|
||||
"forms": forms_info,
|
||||
"suspicious_scripts": suspicious_scripts,
|
||||
"rule_checks": rule_checks_overview, # table-ready for UI
|
||||
"enrichment": enrichment
|
||||
}
|
||||
|
||||
safe_write(results_path, json.dumps(result, indent=2))
|
||||
# Persist as the single source of truth for routes
|
||||
safe_write(results_path, json.dumps(result, indent=2, ensure_ascii=False))
|
||||
|
||||
try:
|
||||
current_app.logger.info(f"[browser] Saved results.json for run {run_uuid}")
|
||||
except Exception:
|
||||
pass
|
||||
|
||||
return result
|
||||
|
||||
|
||||
def load_results(storage_dir: Path, run_uuid: str) -> Optional[Dict[str, Any]]:
|
||||
"""
|
||||
Load a prior run's results.json from /data/<uuid>/.
|
||||
|
||||
Returns:
|
||||
dict or None
|
||||
"""
|
||||
run_dir = storage_dir / run_uuid
|
||||
results_path = run_dir / "results.json"
|
||||
if not results_path.exists():
|
||||
return None
|
||||
|
||||
try:
|
||||
text = results_path.read_text(encoding="utf-8")
|
||||
data = json.loads(text)
|
||||
return data
|
||||
except Exception:
|
||||
return None
|
||||
|
||||
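The write-then-load contract around results.json can be exercised in isolation. The sketch below is an illustration under assumptions: `load_results_sketch` is a stand-in that mirrors `load_results`, and a plain `write_text` call replaces `safe_write`, whose implementation is not shown in this diff.

```python
import json
import tempfile
from pathlib import Path
from typing import Any, Dict, Optional


def load_results_sketch(storage_dir: Path, run_uuid: str) -> Optional[Dict[str, Any]]:
    """Stand-in for load_results: None on a missing or unreadable file."""
    results_path = storage_dir / run_uuid / "results.json"
    if not results_path.exists():
        return None
    try:
        return json.loads(results_path.read_text(encoding="utf-8"))
    except (OSError, ValueError):
        return None


with tempfile.TemporaryDirectory() as tmp:
    storage = Path(tmp)
    run_dir = storage / "demo-run"  # the real code uses a uuid4 string here
    run_dir.mkdir(parents=True)
    payload = {"uuid": "demo-run", "final_url": "https://example.com/caf\u00e9"}
    # ensure_ascii=False keeps non-ASCII text readable in the file
    (run_dir / "results.json").write_text(
        json.dumps(payload, indent=2, ensure_ascii=False), encoding="utf-8"
    )
    loaded = load_results_sketch(storage, "demo-run")
    missing = load_results_sketch(storage, "no-such-run")
```

Returning `None` for both a missing and a corrupt file keeps the route code to a single check, at the cost of not distinguishing the two cases.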
@@ -7,3 +7,6 @@ cache:
|
||||
recent_runs_count: 10
|
||||
whois_cache_days: 7
|
||||
geoip_cache_days: 7
|
||||
|
||||
ui:
|
||||
snippet_preview_len: 300
|
||||
|
||||
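How `ui.snippet_preview_len` is consumed is not shown in this hunk. One plausible use, sketched here with a hypothetical `preview_snippet` helper, is to cap the inline snippet before it reaches the template.

```python
DEFAULT_PREVIEW_LEN = 300  # mirrors ui.snippet_preview_len above


def preview_snippet(snippet: str, limit: int = DEFAULT_PREVIEW_LEN) -> str:
    """Hypothetical helper: cut a snippet to `limit` chars and mark the cut."""
    if limit <= 0 or len(snippet) <= limit:
        return snippet
    return snippet[:limit] + "..."


short = preview_snippet("alert(1)")
long_preview = preview_snippet("x" * 400)
```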
@@ -1,80 +1,138 @@
|
||||
# config/suspicious_rules.yaml
|
||||
# Baseline suspicious rules for SneakyScope
|
||||
# Organized by category: script, form, text
|
||||
# Extend these with more specific rules as needed
|
||||
# Notes:
|
||||
# - Engine compiles regex with IGNORECASE.
|
||||
# - 'severity' is optional: low | medium | high
|
||||
# - 'tags' is optional: list of strings for grouping
|
||||
|
||||
# --- Script Rules ---
|
||||
- name: eval_usage
|
||||
description: "Use of eval() in script"
|
||||
category: script
|
||||
type: regex
|
||||
pattern: "\\beval\\("
|
||||
pattern: '\beval\s*\('
|
||||
severity: high
|
||||
tags: [obfuscation, unsafe-eval]
|
||||
|
||||
- name: new_function_usage
|
||||
description: "Use of Function constructor (new Function)"
|
||||
category: script
|
||||
type: regex
|
||||
pattern: '\bnew\s+Function\s*\('
|
||||
severity: high
|
||||
tags: [obfuscation]
|
||||
|
||||
- name: document_write
|
||||
description: "Use of document.write (often abused in malicious injections)"
|
||||
category: script
|
||||
type: regex
|
||||
pattern: "document\\.write\\("
|
||||
pattern: '\bdocument\s*\.\s*write\s*\('
|
||||
severity: medium
|
||||
tags: [injection, legacy-api]
|
||||
|
||||
- name: inline_event_handler
|
||||
description: "Inline event handler detected (onload, onclick, etc.)"
|
||||
category: script
|
||||
type: regex
|
||||
pattern: "on(load|click|error|mouseover|keydown)\\s*="
|
||||
pattern: '\bon(load|click|error|mouseover|mouseenter|submit|keydown|keyup|change)\s*='
|
||||
severity: medium
|
||||
tags: [inline-handlers, potential-xss]
|
||||
|
||||
- name: obfuscated_encoding
|
||||
description: "Suspicious use of atob() or btoa() (base64 encoding/decoding)"
|
||||
description: "Suspicious use of atob()/btoa() (base64 encode/decode)"
|
||||
category: script
|
||||
type: regex
|
||||
pattern: "\\b(atob|btoa)\\("
|
||||
pattern: '\b(atob|btoa)\s*\('
|
||||
severity: medium
|
||||
tags: [encoding, obfuscation]
|
||||
|
||||
- name: suspicious_iframe
|
||||
description: "Iframe usage in script (possible phishing/malvertising)"
|
||||
- name: unescape_usage
|
||||
description: "Use of unescape() (legacy/obfuscation)"
|
||||
category: script
|
||||
type: regex
|
||||
pattern: "<iframe[^>]*>"
|
||||
pattern: '\bunescape\s*\('
|
||||
severity: low
|
||||
tags: [legacy-api, obfuscation]
|
||||
|
||||
- name: string_timer_usage
|
||||
description: "String passed to setTimeout/setInterval (sink for XSS)"
|
||||
category: script
|
||||
type: regex
|
||||
pattern: '\bset(?:Timeout|Interval)\s*\(\s*[''"`].+[''"`]\s*,'
|
||||
severity: medium
|
||||
tags: [xss-sink]
|
||||
|
||||
- name: long_hex_constants
|
||||
description: "Long hex-like constants (possible obfuscation)"
|
||||
category: script
|
||||
type: regex
|
||||
pattern: '["'']?0x[0-9a-fA-F]{16,}["'']?'
|
||||
severity: low
|
||||
tags: [obfuscation]
|
||||
|
||||
# --- Form Rules ---
|
||||
- name: suspicious_form_action
|
||||
description: "Form action with external URL (potential credential exfiltration)"
|
||||
- name: suspicious_form_action_absolute
|
||||
description: "Form action uses absolute URL (potential credential exfiltration)"
|
||||
category: form
|
||||
type: regex
|
||||
pattern: "<form[^>]*action=['\"]http"
|
||||
pattern: '<form\b[^>]*\baction\s*=\s*[''"]https?://'
|
||||
severity: medium
|
||||
tags: [exfiltration, form]
|
||||
|
||||
- name: hidden_inputs
|
||||
description: "Form with hidden inputs (possible credential harvesting)"
|
||||
description: "Form with hidden inputs (could be used to smuggle data)"
|
||||
category: form
|
||||
type: regex
|
||||
pattern: "<input[^>]*type=['\"]hidden"
|
||||
pattern: '<input\b[^>]*\btype\s*=\s*[''"]hidden[''"]'
|
||||
severity: low
|
||||
tags: [stealth, form]
|
||||
|
||||
- name: password_field
|
||||
description: "Form requesting password field"
|
||||
description: "Form requests a password field"
|
||||
category: form
|
||||
type: regex
|
||||
pattern: "<input[^>]*type=['\"]password"
|
||||
pattern: '<input\b[^>]*\btype\s*=\s*[''"]password[''"]'
|
||||
severity: high
|
||||
tags: [credentials, form]
|
||||
|
||||
# --- Text Rules (Social Engineering / BEC) ---
|
||||
- name: urgent_request
|
||||
description: "Language suggesting urgency (common in phishing/BEC)"
|
||||
category: text
|
||||
type: regex
|
||||
pattern: "(urgent|immediately|asap|action required)"
|
||||
pattern: '\b(urgent|immediately|asap|action\s*required|verify\s*now)\b'
|
||||
severity: medium
|
||||
tags: [bec, urgency]
|
||||
|
||||
- name: account_suspension
|
||||
description: "Threat of account suspension/closure"
|
||||
category: text
|
||||
type: regex
|
||||
pattern: "(account.*suspend|account.*close|verify.*account)"
|
||||
pattern: '\b(account\s*(suspend|closure|close)|verify\s*account)\b'
|
||||
severity: medium
|
||||
tags: [bec, scare-tactics]
|
||||
|
||||
- name: financial_request
|
||||
description: "Request for gift cards, wire transfer, or money"
|
||||
category: text
|
||||
type: regex
|
||||
pattern: "(gift card|wire transfer|bank account|bitcoin|payment required)"
|
||||
pattern: '\b(gift\s*card|wire\s*transfer|bank\s*account|bitcoin|crypto|payment\s*required)\b'
|
||||
severity: high
|
||||
tags: [bec, financial]
|
||||
|
||||
- name: credential_reset
|
||||
description: "Password reset or credential reset wording"
|
||||
category: text
|
||||
type: regex
|
||||
pattern: "(reset password|update credentials|login to verify)"
|
||||
pattern: '\b(reset\s*password|update\s*credentials|log\s*in\s*to\s*verify|password\s*expiry)\b'
|
||||
severity: medium
|
||||
tags: [bec, credentials]
|
||||
|
||||
- name: suspicious_iframe
|
||||
description: "Iframe tag present (possible phishing/malvertising/drive-by)"
|
||||
category: text
|
||||
type: regex
|
||||
pattern: '<iframe\b[^>]*\bsrc\s*=\s*[''"][^''"]+[''"]'
|
||||
severity: medium
|
||||
tags: [iframe, phishing, malvertising]
|
||||
|
||||
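A quick way to sanity-check the tightened patterns is to compile a few of them the way the engine does (with IGNORECASE) and probe them with sample strings. The sketch below copies three patterns from the YAML above, with YAML's doubled single quotes unescaped; the `matches` helper and the sample strings are illustrative only.

```python
import re

# Patterns copied from the YAML above ('' in YAML becomes a single quote).
PATTERNS = {
    "eval_usage": r"\beval\s*\(",
    "string_timer_usage": r"""\bset(?:Timeout|Interval)\s*\(\s*['"`].+['"`]\s*,""",
    "account_suspension": r"\b(account\s*(suspend|closure|close)|verify\s*account)\b",
}
COMPILED = {name: re.compile(p, re.IGNORECASE) for name, p in PATTERNS.items()}


def matches(rule_name: str, text: str) -> bool:
    """True when the named pattern is found anywhere in text."""
    return COMPILED[rule_name].search(text) is not None


checks = {
    "eval with space": matches("eval_usage", "EVAL (payload)"),
    "medieval": matches("eval_usage", "medieval(x)"),
    "string timer": matches("string_timer_usage", 'setTimeout("alert(1)", 100)'),
    "function timer": matches("string_timer_usage", "setTimeout(tick, 100)"),
    "closure notice": matches("account_suspension", "Account closure notice"),
    "words in between": matches("account_suspension", "your account has been suspended"),
}
```

Note that the new `account_suspension` pattern only allows whitespace between the two words, so a phrase like "your account has been suspended" no longer matches; whether that trade-off is intended is not stated in this change.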
@@ -65,7 +65,7 @@ def analyze():
|
||||
|
||||
try:
|
||||
engine = current_app.config.get("RULE_ENGINE")
|
||||
result = asyncio.run(fetch_page_artifacts(url, storage, engine=engine))
|
||||
result = asyncio.run(fetch_page_artifacts(url, storage))
|
||||
# result = asyncio.run(fetch_page_artifacts(url, storage))
|
||||
current_app.logger.info(f"[+] Analysis done for {url}")
|
||||
except Exception as e:
|
||||
|
||||
203
app/rules/function_rules.py
Normal file
@@ -0,0 +1,203 @@
"""
app/rules/function_rules.py

Class-based adapters + function-based rules for SneakyScope.

Design:
- FactAdapter: converts text snippets into structured 'facts' dicts by category.
- FunctionRuleAdapter: wraps a rule function (expects dict facts) so it can be
  used directly by the RuleEngine even when the engine is given strings.

Each rule returns (matched: bool, reason: Optional[str]).
If matched is True, 'reason' should explain why.

Note:
- Form rules work with text snippets, thanks to FunctionRuleAdapter+FactAdapter.
- Script rules expect per-script dict facts (src/src_hostname/base_url/base_hostname),
  which browser.py supplies when it evaluates external scripts.
"""

from __future__ import annotations

from typing import Any, Dict, Optional
from urllib.parse import urlparse


# ---------------------------------------------------------------------------
# Adapters
# ---------------------------------------------------------------------------

class FactAdapter:
    """
    Converts raw text/html snippets into structured 'facts' suitable for
    function-based rules. If input is already a dict, returns it unchanged.

    You can expand the per-category parsers over time as needed.
    """

    def __init__(self, logger: Optional[Any] = None) -> None:
        self.logger = logger

    def adapt(self, text_or_facts: Any, category: str = "") -> Dict[str, Any]:
        """
        Adapt text_or_facts (str or dict) into a facts dict.

        Args:
            text_or_facts: Either raw string snippet or an already-structured dict.
            category: 'form' | 'script' | 'text' | ... (used to choose parser)
        """
        # Already structured: pass through
        if isinstance(text_or_facts, dict):
            # Ensure a category key for consistency (optional)
            text_or_facts.setdefault("category", category or text_or_facts.get("category") or "")
            return text_or_facts

        # String snippets are parsed by category
        if isinstance(text_or_facts, str):
            if category == "form":
                return self._adapt_form_snippet(text_or_facts)
            elif category == "script":
                # Script snippets are not parsed into facts. Script function rules
                # expect per-script dicts (src/base_hostname/etc.), which the caller
                # builds for external scripts. Return minimal facts for safety.
                return {"category": "script", "raw": text_or_facts}
            elif category == "text":
                return {"category": "text", "raw": text_or_facts}
            else:
                if self.logger:
                    self.logger.warning(f"[FactAdapter] Unknown category '{category}', returning raw snippet.")
                return {"category": category, "raw": text_or_facts}

        # Fallback for unrecognized input types
        if self.logger:
            self.logger.warning(f"[FactAdapter] Unsupported input type: {type(text_or_facts)!r}")
        return {"category": category, "raw": text_or_facts}

    # ---- Per-category parsers ----

    def _adapt_form_snippet(self, snippet: str) -> Dict[str, Any]:
        """
        Parse the simple form snippet format used by browser.py today, e.g.:

            action=https://example.com/post
            method=post
            inputs=
              - name=email type=text
              - name=password type=password

        Only extracts fields needed by current function rules.
        """
        facts: Dict[str, Any] = {"category": "form", "raw": snippet}

        lines = snippet.splitlines()
        i = 0
        n = len(lines)
        while i < n:
            line = (lines[i] or "").strip()
            if line.startswith("action="):
                facts["action"] = line.split("=", 1)[1].strip()
            elif line.startswith("method="):
                facts["method"] = line.split("=", 1)[1].strip()
            i = i + 1

        # Normalize context keys expected by form rules
        facts.setdefault("base_url", "")       # filled by caller later if desired
        facts.setdefault("base_hostname", "")  # filled by caller later if desired
        return facts


class FunctionRuleAdapter:
    """
    Callable wrapper that adapts engine input (str or dict) into 'facts' and then
    invokes the underlying function rule that expects a facts dict.

    Usage:
        wrapped = FunctionRuleAdapter(fn=form_action_missing, category="form", adapter=FactAdapter(app.logger))
        matched, reason = wrapped("action=https://...")  # engine-friendly
    """

    def __init__(self, fn, category: str = "", adapter: Optional[FactAdapter] = None) -> None:
        self.fn = fn
        self.category = category
        self.adapter = adapter or FactAdapter()

    def __call__(self, text_or_facts: Any):
        facts = self.adapter.adapt(text_or_facts, category=self.category)
        return self.fn(facts)


# ---------------------------------------------------------------------------
# Function-based rules (dict 'facts' expected)
# ---------------------------------------------------------------------------

# ---------------- Script rules ----------------

def script_src_uses_data_or_blob(facts: Dict[str, Any]):
    """Flags <script> tags with src='data:' or 'blob:'."""
    src = facts.get("src") or ""
    if isinstance(src, str) and src.startswith(("data:", "blob:")):
        scheme = src.split(":", 1)[0]
        return True, f"Script src uses {scheme}: URL"
    return False, None


def script_src_has_dangerous_extension(facts: Dict[str, Any]):
    """Flags <script> tags with dangerous file extensions (e.g., .vbs, .hta)."""
    src = facts.get("src") or ""
    if not isinstance(src, str):
        return False, None
    low = src.lower()
    dangerous = (".vbs", ".hta")
    i = 0
    m = len(dangerous)
    while i < m:
        ext = dangerous[i]
        if low.endswith(ext):
            return True, f"External script has dangerous extension ({ext})"
        i = i + 1
    return False, None


def script_third_party_host(facts: Dict[str, Any]):
    """Flags scripts loaded from a different hostname than the page."""
    base_host = facts.get("base_hostname") or ""
    src_host = facts.get("src_hostname") or ""
    if base_host and src_host and base_host != src_host:
        return True, f"Third-party script host: {src_host}"
    return False, None


# ---------------- Form rules ----------------

def form_submits_to_different_host(facts: Dict[str, Any]):
    """Flags <form> actions that submit to a different hostname than the page."""
    base_host = facts.get("base_hostname") or ""
    action = facts.get("action") or ""
    try:
        action_host = urlparse(action).hostname
        if action_host and base_host and action_host != base_host:
            return True, "Form submits to a different host"
    except Exception:
        # Parsing failed; treat as no match rather than erroring out
        pass
    return False, None


def form_http_on_https_page(facts: Dict[str, Any]):
    """Flags forms submitting over HTTP while the page was loaded over HTTPS."""
    base_url = facts.get("base_url") or ""
    action = facts.get("action") or ""
    try:
        if urlparse(base_url).scheme == "https" and urlparse(action).scheme == "http":
            return True, "Form submits over insecure HTTP"
    except Exception:
        pass
    return False, None


def form_action_missing(facts: Dict[str, Any]):
    """Flags <form> elements with no action attribute."""
    action = (facts.get("action") or "").strip()
    if not action:
        return True, "Form has no action attribute"
    return False, None
|
||||
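To illustrate how a dict-based rule can be called with either a text snippet or a facts dict, here is a compact standalone sketch. `SnippetAdapter` is a simplified stand-in for FunctionRuleAdapter + FactAdapter, and its parsing of a `base_url=` line is an assumption of this sketch; the `_adapt_form_snippet` shown above only extracts `action` and `method`.

```python
from typing import Any, Callable, Dict, Optional, Tuple
from urllib.parse import urlparse

RuleOutcome = Tuple[bool, Optional[str]]


def form_http_on_https_page(facts: Dict[str, Any]) -> RuleOutcome:
    """Same check as the rule above, repeated so the sketch runs standalone."""
    base_url = facts.get("base_url") or ""
    action = facts.get("action") or ""
    if urlparse(base_url).scheme == "https" and urlparse(action).scheme == "http":
        return True, "Form submits over insecure HTTP"
    return False, None


class SnippetAdapter:
    """Simplified stand-in for FunctionRuleAdapter + FactAdapter (form only)."""

    def __init__(self, fn: Callable[[Dict[str, Any]], RuleOutcome]) -> None:
        self.fn = fn

    def __call__(self, text_or_facts: Any) -> RuleOutcome:
        if isinstance(text_or_facts, dict):
            return self.fn(text_or_facts)  # already structured: pass through
        facts: Dict[str, Any] = {"category": "form", "raw": text_or_facts}
        for line in str(text_or_facts).splitlines():
            key, sep, value = line.strip().partition("=")
            # base_url parsing is an assumption of this sketch
            if sep and key in ("action", "method", "base_url"):
                facts[key] = value.strip()
        return self.fn(facts)


wrapped = SnippetAdapter(form_http_on_https_page)
from_text = wrapped("action=http://collector.example/post\nbase_url=https://example.com/login")
from_dict = wrapped({"action": "/login", "base_url": "https://example.com/login"})
```

Keeping the rule functions dict-only and pushing all string handling into the adapter means a rule never needs to know which input form the engine was given.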
@@ -286,3 +286,67 @@ details ul, details p {
.scripts-table td small {
  opacity: 0.85;
}
/* keep the table from exploding */
.scripts-table {
  table-layout: fixed;
  width: 100%;
}

/* columns: Type | Source URL | Snippet | Matches */
.scripts-table th:nth-child(1) { width: 8rem; }
.scripts-table th:nth-child(2) { width: 32rem; } /* tweak as you like */
.scripts-table th:nth-child(3) { width: 24rem; }
.scripts-table th:nth-child(4) { width: auto; }

/* ellipsize the table cells by default */
.scripts-table td, .scripts-table th {
  overflow: hidden;
  text-overflow: ellipsis;
  white-space: nowrap;
}

/* let URLs/snippets wrap *inside* their cell when expanded content shows */
.breakable {
  white-space: normal;
  overflow-wrap: anywhere;
  word-break: break-word;
}

/* when user opens <details>, keep code readable without blowing layout */
details pre.code {
  white-space: pre-wrap;
  word-break: break-word;
  max-height: 18rem;
  overflow: auto;
}

/* Generic badge */
.badge {
  display: inline-block;
  padding: 0.1rem 0.4rem;
  margin-left: 0.35rem;
  border-radius: 0.4rem;
  font-size: 0.75rem;
  line-height: 1;
  vertical-align: middle;
  user-select: none;
}

/* Severity colors */
.sev-high { background: #fdecea; color: #b71c1c; border: 1px solid #f5c6c4; }
.sev-medium { background: #fff8e1; color: #8a6d3b; border: 1px solid #ffe0a3; }
.sev-low { background: #e8f5e9; color: #1b5e20; border: 1px solid #b9e6be; }

/* Tag chips */
.chip {
  display: inline-block;
  padding: 0.1rem 0.35rem;
  margin-left: 0.25rem;
  border-radius: 999px;
  font-size: 0.7rem;
  line-height: 1;
  background: #eef2f7;
  color: #425466;
  border: 1px solid #d9e2ec;
}
|
||||
|
||||
|
||||
@@ -187,9 +187,9 @@
|
||||
<td>{{ s.type or 'unknown' }}</td>
|
||||
|
||||
<!-- Source URL -->
|
||||
<td>
|
||||
<td class="breakable">
|
||||
{% if s.src %}
|
||||
<a href="{{ s.src }}" target="_blank">{{ s.src }}</a>
|
||||
<a href="{{ s.src }}" target="_blank" rel="noopener">{{ s.src[:50] }}</a>
|
||||
{% else %}
|
||||
N/A
|
||||
{% endif %}
|
||||
@@ -199,8 +199,8 @@
|
||||
<td>
|
||||
{% if s.content_snippet %}
|
||||
<details>
|
||||
<summary>View snippet</summary>
|
||||
<pre class="code">{{ s.content_snippet }}</pre>
|
||||
<summary>View snippet ({{ s.content_snippet|length }} chars) </summary>
|
||||
<pre class="code">{{ s.content_snippet }}</pre>
|
||||
</details>
|
||||
{% else %}
|
||||
N/A
|
||||
@@ -218,6 +218,14 @@
|
||||
{% for r in s.rules %}
|
||||
<li title="{{ r.description or '' }}">
|
||||
{{ r.name }}
|
||||
{% if r.severity %}
|
||||
<span class="badge sev-{{ r.severity|lower }}">{{ r.severity|title }}</span>
|
||||
{% endif %}
|
||||
{% if r.tags %}
|
||||
{% for t in r.tags %}
|
||||
<span class="chip" title="Tag: {{ t }}">{{ t }}</span>
|
||||
{% endfor %}
|
||||
{% endif %}
|
||||
{% if r.description %}
|
||||
<small>— {{ r.description }}</small>
|
||||
{% endif %}
|
||||
|
||||
@@ -1,132 +1,291 @@
|
||||
"""
|
||||
rules_engine.py
|
||||
|
||||
A flexible rule-based engine for detecting suspicious patterns in scripts, forms,
|
||||
or other web artifacts inside SneakyScope.
|
||||
|
||||
Each rule is defined as:
|
||||
- name: str # Rule identifier
|
||||
- description: str # Human-readable reason for analysts
|
||||
- category: str # e.g., 'script', 'form', 'text', 'generic'
|
||||
- type: str # 'regex' or 'function'
|
||||
- pattern: str # Regex pattern (if type=regex)
|
||||
- function: callable # Python function returning (bool, str) (if type=function)
|
||||
|
||||
The framework returns a list of results, with pass/fail and reasoning.
|
||||
Flask-logger integrated rules engine for SneakyScope.
|
||||
Logs go to `current_app.logger` when a Flask app context is active,
|
||||
otherwise to a namespaced standard logger "sneakyscope.rules".
|
||||
"""
|
||||
|
||||
import re
|
||||
import logging
|
||||
from dataclasses import dataclass, asdict, field
|
||||
from pathlib import Path
|
||||
from typing import Callable, Dict, List, Tuple, Union
|
||||
from typing import Callable, Dict, List, Optional, Tuple, Union
|
||||
|
||||
import yaml
|
||||
|
||||
try:
|
||||
# Flask is optional; engine still works without it.
|
||||
from flask import current_app, has_app_context
|
||||
except Exception:
|
||||
current_app = None # type: ignore
|
||||
def has_app_context() -> bool: # type: ignore
|
||||
return False
|
||||
|
||||
class Rule:
|
||||
"""Represents a single detection rule."""
|
||||
|
||||
def __init__(
|
||||
self,
|
||||
name: str,
|
||||
description: str,
|
||||
category: str,
|
||||
rule_type: str = "regex",
|
||||
pattern: str = None,
|
||||
function: Callable = None,
|
||||
):
|
||||
self.name = name
|
||||
self.description = description
|
||||
self.category = category
|
||||
self.rule_type = rule_type
|
||||
self.pattern = pattern
|
||||
self.function = function
|
||||
|
||||
def run(self, text: str) -> Tuple[bool, str]:
|
||||
def get_engine_logger() -> logging.Logger:
|
||||
"""
|
||||
Run the rule on given text.
|
||||
Return a logger that prefers Flask's current_app.logger if available.
|
||||
Falls back to a namespaced standard logger otherwise.
|
||||
"""
|
||||
if has_app_context() and current_app is not None and hasattr(current_app, "logger"):
|
||||
return current_app.logger
|
||||
return logging.getLogger("sneakyscope.rules")
|
||||
|
||||
|
||||
@dataclass
|
||||
class Rule:
|
||||
"""
|
||||
Represents a single detection rule.
|
||||
|
||||
When rule_type == 'regex', 'pattern' must be provided.
|
||||
When rule_type == 'function', 'function' must be provided and return (matched: bool, reason: str).
|
||||
"""
|
||||
name: str
|
||||
description: str
|
||||
category: str
|
||||
rule_type: str = "regex"
|
||||
pattern: Optional[str] = None
|
||||
function: Optional[Callable[[str], Tuple[bool, str]]] = None
|
||||
severity: Optional[str] = None # 'low' | 'medium' | 'high' (optional)
|
||||
tags: Optional[List[str]] = field(default=None) # e.g., ['obfuscation', 'phishing'] (optional)
|
||||
|
||||
# Internal compiled regex cache (not serialized)
|
||||
_compiled_regex: Optional[re.Pattern] = field(default=None, repr=False, compare=False)
|
||||
|
||||
def compile_if_needed(self, logger: Optional[logging.Logger] = None) -> bool:
|
||||
"""
|
||||
Compile the regex pattern once for performance, if applicable.
|
||||
|
||||
Returns:
|
||||
bool: True if the regex is compiled and ready, False otherwise.
|
||||
"""
|
||||
if logger is None:
|
||||
logger = get_engine_logger()
|
||||
|
||||
if self.rule_type == "regex" and self.pattern:
|
||||
try:
|
||||
self._compiled_regex = re.compile(self.pattern, re.IGNORECASE)
|
||||
logger.debug(f"[Rule] Compiled regex for '{self.name}'")
|
||||
return True
|
||||
except re.error as rex:
|
||||
self._compiled_regex = None
|
||||
logger.warning(f"[Rule] Failed to compile regex for '{self.name}': {rex}")
|
||||
return False
|
||||
return False
|
||||
|
||||
def run(self, text: str, logger: Optional[logging.Logger] = None) -> Tuple[bool, str]:
|
||||
"""
|
||||
Run the rule on the given text.
|
||||
|
||||
Returns:
|
||||
(matched: bool, reason: str)
|
||||
"""
|
||||
if self.rule_type == "regex" and self.pattern:
|
||||
if re.search(self.pattern, text, re.IGNORECASE):
|
||||
if logger is None:
|
||||
logger = get_engine_logger()
|
||||
|
||||
if self.rule_type == "regex":
|
||||
if not self.pattern:
|
||||
logger.warning(f"[Rule] '{self.name}' missing regex pattern.")
|
||||
return False, "Invalid rule configuration: missing pattern"
|
||||
|
||||
if self._compiled_regex is None:
|
||||
compiled_ok = self.compile_if_needed(logger=logger)
|
||||
if not compiled_ok:
|
||||
return False, f"Invalid regex pattern: {self.pattern!r}"
|
||||
|
||||
if self._compiled_regex and self._compiled_regex.search(text):
|
||||
return True, f"Matched regex '{self.pattern}' → {self.description}"
|
||||
else:
|
||||
return False, "No match"
|
||||
elif self.rule_type == "function" and callable(self.function):
|
||||
return self.function(text)
|
||||
else:
|
||||
return False, "Invalid rule configuration"
|
||||
|
||||
if self.rule_type == "function":
|
||||
if callable(self.function):
|
||||
try:
|
||||
matched, reason = self.function(text)
|
||||
if isinstance(matched, bool) and (reason is None or isinstance(reason, str)):
|
||||
return matched, reason
|
||||
logger.warning(f"[Rule] '{self.name}' function returned invalid types.")
|
||||
return False, "Invalid function return type; expected (bool, str)"
|
||||
except Exception as exc:
|
||||
logger.exception(f"[Rule] '{self.name}' function raised exception.")
|
||||
return False, f"Rule function raised exception: {exc!r}"
|
||||
logger.warning(f"[Rule] '{self.name}' has invalid function configuration.")
|
||||
return False, "Invalid rule configuration: function not callable"
|
||||
|
||||
logger.warning(f"[Rule] '{self.name}' has unknown type '{self.rule_type}'.")
|
||||
return False, f"Invalid rule configuration: unknown type '{self.rule_type}'"
|
||||
|
||||
|
||||
@dataclass
|
||||
class RuleResult:
|
||||
"""
|
||||
Uniform per-rule outcome for UI/API consumption.
|
||||
|
||||
result is "PASS" or "FAIL" (FAIL == matched True)
|
||||
"""
|
||||
name: str
|
||||
description: str
|
||||
category: str
|
||||
result: str # "PASS" | "FAIL"
|
||||
reason: Optional[str] = None
|
||||
severity: Optional[str] = None
|
||||
tags: Optional[List[str]] = None
|
||||
|
||||
|
||||
class RuleEngine:
|
||||
"""Loads and executes rules against provided text."""
|
||||
"""
|
||||
Loads and executes rules against provided text, with Flask-aware logging.
|
||||
"""
|
||||
|
||||
def __init__(self, rules: List[Rule] = None):
|
||||
self.rules = rules or []
|
||||
def __init__(self, rules: Optional[List[Rule]] = None, logger: Optional[logging.Logger] = None):
|
||||
"""
|
||||
Args:
|
||||
rules: Optional initial rule list.
|
||||
logger: Optional explicit logger. If None, uses Flask app logger if available,
|
||||
otherwise a namespaced standard logger.
|
||||
"""
|
||||
if logger is None:
|
||||
self.logger = get_engine_logger()
|
||||
else:
|
||||
self.logger = logger
|
||||
|
||||
def add_rule(self, rule: Rule):
|
||||
"""Add a new rule at runtime."""
|
||||
self.rules: List[Rule] = rules or []
|
||||
self._compile_all()
|
||||
|
||||
def _compile_all(self) -> None:
|
||||
"""
|
||||
Compile all regex rules at initialization and warn about failures.
|
||||
"""
|
||||
index = 0
|
||||
total = len(self.rules)
|
||||
while index < total:
|
||||
rule = self.rules[index]
|
||||
if rule.rule_type == "regex":
|
||||
compiled_ok = rule.compile_if_needed(logger=self.logger)
|
||||
if not compiled_ok:
|
||||
self.logger.warning(f"[Engine] Regex failed at init for rule '{rule.name}' (pattern={rule.pattern!r})")
|
||||
index = index + 1
|
||||
|
||||
def add_rule(self, rule: Rule) -> None:
|
||||
"""
|
||||
Add a new rule at runtime; compiles regex if needed and logs failures.
|
||||
"""
|
||||
self.rules.append(rule)
|
||||
if rule.rule_type == "regex":
|
||||
compiled_ok = rule.compile_if_needed(logger=self.logger)
|
||||
if not compiled_ok:
|
||||
self.logger.warning(f"[Engine] Regex failed when adding rule '{rule.name}' (pattern={rule.pattern!r})")
|
||||
|
||||
-    def run_all(self, text: str, category: str = None) -> List[Dict]:
+    def run_all(self, text: str, category: Optional[str] = None) -> List[Dict]:
         """
         Run all rules against text.

         Args:
-            text: str → the content to test
-            category: str → optional, only run rules in this category
+            text: The content to test.
+            category: If provided, only evaluate rules that match this category.

         Returns:
-            List of dicts with rule results.
+            List of dicts with PASS/FAIL per rule (JSON-serializable).
         """
-        results = []
-        for rule in self.rules:
-            if category and rule.category != category:
+        results: List[Dict] = []
+
+        index = 0
+        total = len(self.rules)
+        while index < total:
+            rule = self.rules[index]
+
+            if category is not None and rule.category != category:
+                index = index + 1
                 continue

-            matched, reason = rule.run(text)
-            results.append(
-                {
-                    "rule": rule.name,
-                    "category": rule.category,
-                    "matched": matched,
-                    "reason": reason if matched else None,
-                }
+            matched, reason = rule.run(text, logger=self.logger)
+
+            result_str = "FAIL" if matched else "PASS"
+            reason_to_include: Optional[str]
+            if matched:
+                reason_to_include = reason
+            else:
+                reason_to_include = None
+
+            rr = RuleResult(
+                name=rule.name,
+                description=rule.description,
+                category=rule.category,
+                result=result_str,
+                reason=reason_to_include,
+                severity=rule.severity,
+                tags=rule.tags,
             )
+
+            results.append(asdict(rr))
+            index = index + 1
+
+        self.logger.debug(f"[Engine] Completed evaluation. Returned {len(results)} rule results.")
         return results


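Reviewer note (illustrative, not part of this commit): the new `run_all` returns one dict per evaluated rule, with `result` set to "FAIL" when the rule matched and "PASS" otherwise. A minimal sketch of consuming that output follows. The `RuleResult` dataclass here is a hypothetical stand-in whose fields are inferred from the keyword arguments in the hunk above (its real definition is not shown in this part of the diff), and `failed_only` is an assumed helper name.

```python
import json
from dataclasses import asdict, dataclass
from typing import Dict, List, Optional


@dataclass
class RuleResult:
    # Hypothetical stand-in: fields mirror the keyword arguments used in run_all.
    name: str
    description: str
    category: str
    result: str  # "PASS" or "FAIL"
    reason: Optional[str] = None
    severity: Optional[str] = None
    tags: Optional[List[str]] = None


def failed_only(results: List[Dict]) -> List[Dict]:
    """Keep only entries whose result is FAIL, i.e. rules that matched."""
    return [r for r in results if r.get("result") == "FAIL"]


# Two hand-built results in the same shape run_all appends via asdict().
sample = [
    asdict(RuleResult("suspicious_eval", "Use of eval() in script", "script",
                      "FAIL", reason="matched", severity="medium",
                      tags=["obfuscation"])),
    asdict(RuleResult("password_reset", "Password reset wording", "text", "PASS")),
]

matches = failed_only(sample)
serialized = json.dumps(matches)  # plain dicts, so this serializes directly
```

Because every entry is a plain dict produced by `asdict`, a caller can filter and persist results without knowing anything about the engine's classes.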
-def load_rules_from_yaml(yaml_file: Union[str, Path]) -> List[Rule]:
+def load_rules_from_yaml(yaml_file: Union[str, Path], logger: Optional[logging.Logger] = None) -> List[Rule]:
     """
     Load rules from a YAML file.

-    Example YAML format:
+    Supports optional 'severity' and 'tags' keys.
+
+    Example YAML:
       - name: suspicious_eval
         description: "Use of eval() in script"
         category: script
         type: regex
         pattern: "\\beval\\("
+        severity: medium
+        tags: [obfuscation]

       - name: password_reset
         description: "Password reset wording"
         category: text
         type: regex
         pattern: "reset password"

     Returns:
         List[Rule]
     """
-    rules = []
-    with open(yaml_file, "r", encoding="utf-8") as f:
+    if logger is None:
+        logger = get_engine_logger()
+
+    rules: List[Rule] = []
+
+    path = Path(yaml_file)
+    with path.open("r", encoding="utf-8") as f:
         data = yaml.safe_load(f)

-    for item in data:
-        rule = Rule(
-            name=item["name"],
-            description=item["description"],
-            category=item["category"],
-            rule_type=item.get("type", "regex"),
-            pattern=item.get("pattern"),
-        )
-        rules.append(rule)
+    if not isinstance(data, list):
+        logger.error("[Loader] Rules YAML must be a list of rule objects.")
+        raise ValueError("Rules YAML must be a list of rule objects.")
+
+    idx = 0
+    total = len(data)
+    while idx < total:
+        item = data[idx]
+
+        name = item.get("name")
+        description = item.get("description")
+        category = item.get("category")
+        rule_type = item.get("type", "regex")
+        pattern = item.get("pattern")
+        severity = item.get("severity")
+        tags = item.get("tags")
+
+        if not name or not description or not category:
+            logger.warning(f"[Loader] Skipping invalid rule at index {idx}: missing required fields.")
+            idx = idx + 1
+            continue
+
+        rule = Rule(
+            name=name,
+            description=description,
+            category=category,
+            rule_type=rule_type,
+            pattern=pattern,
+            function=None,  # function rules should be registered in code
+            severity=severity,
+            tags=tags if isinstance(tags, list) else None,
+        )
+
+        rules.append(rule)
+        idx = idx + 1
+
+    logger.info(f"[Loader] Loaded {len(rules)} rules from '{yaml_file}'.")
     return rules

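Reviewer note (illustrative, not part of this commit): the loader's validation can be exercised without a YAML file by feeding it already-parsed data. The sketch below mirrors the checks in the hunk above (top level must be a list; `name`, `description`, and `category` are required; `tags` is kept only when it is a list). `split_valid_rules` and `REQUIRED_KEYS` are hypothetical names, and the `isinstance(item, dict)` guard is an assumption added here; the loader as shown calls `item.get` directly.

```python
from typing import Any, Dict, List, Tuple

REQUIRED_KEYS = ("name", "description", "category")


def split_valid_rules(data: Any) -> Tuple[List[Dict], List[int]]:
    """Return (normalized rule dicts, indexes of skipped items)."""
    if not isinstance(data, list):
        raise ValueError("Rules YAML must be a list of rule objects.")

    valid: List[Dict] = []
    skipped: List[int] = []
    for idx, item in enumerate(data):
        # Assumption: non-dict items are skipped rather than raising.
        if not isinstance(item, dict) or not all(item.get(k) for k in REQUIRED_KEYS):
            skipped.append(idx)
            continue
        tags = item.get("tags")
        valid.append({
            "name": item["name"],
            "description": item["description"],
            "category": item["category"],
            "rule_type": item.get("type", "regex"),  # same default as the loader
            "pattern": item.get("pattern"),
            "severity": item.get("severity"),
            "tags": tags if isinstance(tags, list) else None,
        })
    return valid, skipped


# What yaml.safe_load would yield for a small rules file, written as literals.
parsed = [
    {"name": "suspicious_eval", "description": "Use of eval() in script",
     "category": "script", "type": "regex", "pattern": r"\beval\(",
     "severity": "medium", "tags": ["obfuscation"]},
    {"name": "password_reset", "description": "Password reset wording",
     "category": "text", "pattern": "reset password", "tags": "not-a-list"},
    {"name": "missing_fields"},
]

valid, skipped = split_valid_rules(parsed)
```

Here the third item is skipped for missing fields, and the second keeps its default `rule_type` while its non-list `tags` value is dropped.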
@@ -39,6 +39,10 @@ BASE_DIR = Path(__file__).resolve().parent.parent
 DEFAULT_SETTINGS_FILE = BASE_DIR / "config" / "settings.yaml"

 # ---------- CONFIG DATA CLASSES ----------
+@dataclass
+class UIConfig:
+    snippet_preview_len: int = 160
+
 @dataclass
 class Cache_Config:
     whois_cache_days: int = 7
@@ -56,6 +60,7 @@ class AppConfig:
 @dataclass
 class Settings:
     cache: Cache_Config = field(default_factory=Cache_Config)
+    ui: UIConfig = field(default_factory=UIConfig)
     app: AppConfig = field(default_factory=AppConfig)

     @classmethod
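Reviewer note (illustrative, not part of this commit): the commit message says the inline snippet preview length is now configurable, and `UIConfig.snippet_preview_len` is the setting added above. A minimal sketch of how such a setting might be applied is shown below. The `preview` helper and its whitespace-collapsing behaviour are assumptions; the actual truncation code is not visible in this part of the diff.

```python
from dataclasses import dataclass


@dataclass
class UIConfig:
    # Same field and default as the dataclass added in this hunk.
    snippet_preview_len: int = 160


def preview(snippet: str, ui: UIConfig) -> str:
    """Hypothetical helper: collapse whitespace, then truncate with an ellipsis."""
    flat = " ".join(snippet.split())
    if len(flat) <= ui.snippet_preview_len:
        return flat
    return flat[: ui.snippet_preview_len] + "…"


short = preview("eval(payload)", UIConfig())
long_preview = preview("a" * 200, UIConfig())
custom = preview("x  y\nz", UIConfig(snippet_preview_len=3))
```

Keeping the limit on a config dataclass means the template layer never needs a hard-coded length.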
31
docs/Feature Session Plan Template.md
Normal file
@@ -0,0 +1,31 @@
+# Feature Session Plan – SneakyScope
+
+**Feature:**
+
+* \[Short description of the feature or improvement]
+
+**Effort:**
+
+* Easy / Medium / Hard
+
+**Dependencies:**
+
+* \[List of prerequisites or related tasks that must be done first]
+
+**Design Notes:**
+
+* \[Goals, considerations, analyst/UX needs, edge cases, pitfalls to avoid]
+
+**Implementation Tasks:**
+
+* [ ] Step 1 – …
+* [ ] Step 2 – …
+* [ ] Step 3 – …
+
+**Validation / Testing:**
+
+* \[How we’ll verify it works — e.g., test cases, UI check, API output, logs]
+
+**Next Steps After Completion:**
+
+* \[What this unblocks or enables, i.e. the next feature/dependency]
6
docs/README.md
Normal file
@@ -0,0 +1,6 @@
+# Roadmap and chats
+
+## Vibecode? Brotha Ewww
+No, I don't "vibe code". There is a huge difference between asking AI to do everything vs "give me a boiler plate function" and tweaking from there. I've been coding over 20 years, these fingers have typed enough. So, yes I use AI while I code.
+
+Some of these little files in here are just helpful ways I've started using AI to help keep me on track with the project. Feel free to borrow.
@@ -1,71 +1,32 @@
 # SneakyScope — Roadmap (Updated 8-20-25)

-## Priority 1 – Core Functionality / Stability
+## Priority 1 – Core Analysis / Stability

-**Permissions / Storage Paths**
+* Opt-in “fetch external scripts” mode (off by default): on submission, download external script content (size/time limits) and run rules on fetched content.
+* Remove remaining legacy form “flagged\_reasons” plumbing once all equivalent function rules are in place.
+* Unit tests: YAML compilation, function-rule adapters, and per-script/per-form rule cases.

-* ✅ `/data` and other mounted volumes setup handled by `sandbox.sh`
-* ✅ Downloads, screenshots, and HTML artifacts are written correctly (`safe_write` in `io_helpers.py`)
+## Priority 2 – API Layer

----
+* API endpoints: `/screenshot`, `/source`, `/analyse`.
+* OpenAPI spec: create `openapi/openapi.yaml` and serve at `/api/openapi.yaml`.
+* Docs UI: Swagger UI or Redoc at `/docs`.

-## Priority 2 – Data Accuracy / Enrichment
+## Priority 3 – UI / UX

-**WHOIS & GeoIP Enhancements**
+* Front page/input handling: auto-prepend `http://`/`https://`/`www.` for bare domains.
+* Source code viewer: embed page source in an editor view for readability.
+* Scripts table: toggle between “Only suspicious” and “All scripts”.
+* Rules Lab (WYSIWYG tester): paste a rule, validate/compile, run against sample text; lightweight nav entry.

-* ✅ Implemented Python-based WHOIS parsing with fallback to raw WHOIS text
-* ✅ Default `"Possible Privacy"` or `"N/A"` for missing WHOIS fields
-* ✅ GeoIP + ASN + ISP info displayed per IP in **accordion tables**
-* ✅ Cache WHOIS and GeoIP results to reduce repeated queries
+## Priority 4 – Artifact Management & Ops

-**Suspicious Scripts & Forms**
+* Retention/cleanup policy for old artifacts (age/size thresholds).
+* Make periodic maintenance scripts for storage; cleanup options set in `settings.yaml`.
+* Results caching UX: add “Re-run analysis” vs. “Load from cache” controls in the results UI.

-* [ ] Expand flagged script and form output with reasons for analysts
-* [ ] Show each check and if it triggered flags (pass/fail for each check)
+## Priority 5 – Extras / Integrations

-**Add Suspicious BEC words**
-
-* ✅ Look for things like `"reset password"`
-* ✅ Make configurable via a config file (yaml doc with rules)
-
----
-
-## Priority 3 – User Interface / UX
-
-**Front Page / Input Handling**
-
-* [ ] Automatically prepend `http://`, `https://`, and/or `www.` if a user only enters a domain
-
-**Result Templates / Cards**
-* [ ] load sourcecode for webpage in a code editor view or code block on page so that it's easier to read
-* [ ] Update result cards with clear, analyst-friendly explanations
-* [ ] Include flagged logic and reason lists for scripts and forms
-* ✅ Display GeoIP results in accordion tables (✅ done)
-
----
-
-## Priority 4 – API Layer
-
-**API Endpoints**
-
-* [ ] Add `/screenshot` endpoint
-* [ ] Add `/source` endpoint
-* [ ] Add `/analyse` endpoint
-
-**OpenAPI + Docs**
-
-* [ ] Create initial `openapi/openapi.yaml` spec file
-* [ ] Serve spec at `/api/openapi.yaml`
-* [ ] Wire up Swagger UI or Redoc at `/docs` for interactive API exploration
-
----
-
-## Priority 5 – Optional / Cleanup
-
-**Artifact Management**
-* [ ] Implement saving of results from a UUID as "results.json" so we don't rerun all the rules and just load from cache.
-* [ ] Implement cleanup or retention policy for old artifacts
-* [ ] Optional: Add periodic maintenance scripts for storage
-
-**Extra Features**
-
-* [ ] Placeholder for additional features (e.g., bulk URL analysis, alerting, integrations)
+* Bulk URL analysis (batch/queue).
+* Alerting & integrations (webhooks, Slack, email).
+* Optional: analyst verdict tags and export (CSV/JSON).
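Reviewer note (illustrative, not part of this commit): the roadmap's "auto-prepend `http://`/`https://`/`www.` for bare domains" item could start from something like the sketch below. `normalize_submitted_url` and its `default_scheme` parameter are hypothetical names. The sketch only adds a scheme and deliberately does not guess at `www.`, since that choice is left open by the roadmap.

```python
from urllib.parse import urlsplit


def normalize_submitted_url(raw: str, default_scheme: str = "http") -> str:
    """Add a scheme to a bare domain; reject anything that is not http(s)."""
    candidate = raw.strip()
    if not candidate:
        raise ValueError("empty URL")

    # A bare domain such as "example.com/login" has no scheme separator.
    if "://" not in candidate:
        candidate = f"{default_scheme}://{candidate}"

    parts = urlsplit(candidate)
    if parts.scheme not in ("http", "https") or not parts.netloc:
        raise ValueError(f"unsupported or malformed URL: {raw!r}")
    return candidate


bare = normalize_submitted_url("example.com/login")
already_https = normalize_submitted_url("  https://example.com/a  ")
```

Defaulting to `http` rather than `https` is one possible choice for a scanner that wants to observe redirects; the opposite default would be equally easy to configure.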
12
docs/workflow.md
Normal file
@@ -0,0 +1,12 @@
+### 🛠 SneakyScope Feature Workflow
+
+1. Pick feature from roadmap
+2. Drop in **Feature Session Plan** template
+3. Fill in description, effort, dependencies, design notes
+4. Expand into tasks → implement code → test/validate
+5. Update roadmap (remove/complete, reorder if needed)
+
+---
+
+This way, every session starts with the same rhythm, and we don’t lose context between chats.
+