# SneakyScope

A lightweight web-based sandbox for analyzing websites and domains.
SneakyScope fetches a page in a sandbox, enriches it with WHOIS/GeoIP data, and runs a unified **Rules Engine** (YAML + function rules) against scripts, forms, and text. Results are rendered with analyst-friendly tables, severity badges, and tags. Each run's results are saved at analysis time, so you keep a point-in-time record that doesn't change.

> Repo: [https://git.sneakygeek.net/ptarrant/SneakyScope](https://git.sneakygeek.net/ptarrant/SneakyScope)
> Status: **Private** (may become public later)

---

## 🚀 Features

### Unified Detection (Rules Engine)

* **Regex rules from YAML** + **function rules in code** for context-aware checks.
* PASS/FAIL per rule with **reason**, **severity** (`low|medium|high`), and **tags**.
* **Per-script matches**:

  * Inline scripts → run regex rules on the code.
  * External scripts → run function rules with structured facts (`src`, hostnames, etc.).
* **Page-level overview**: complete PASS/FAIL tables by category (`script`, `form`, `text`).

### Domain & IP Enrichment

* WHOIS with robust fallbacks (`N/A`, `Possible Privacy` when fields are missing).
* GeoIP, ASN, and ISP details.
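
The WHOIS fallback idea can be pictured roughly as follows. This is a minimal sketch, not SneakyScope's actual code: the function name `normalize_whois`, the field names, and the rule for choosing between `N/A` and `Possible Privacy` are all assumptions made for illustration.

```python
from typing import Any, Dict, Optional

# Hypothetical field lists; the real project may track different WHOIS fields.
GENERAL_FIELDS = ("registrar", "creation_date", "expiration_date")
PRIVACY_FIELDS = ("registrant_name", "registrant_email")


def normalize_whois(raw: Optional[Dict[str, Any]]) -> Dict[str, Any]:
    """Return a dict where every expected field has a displayable value."""
    raw = raw or {}
    result: Dict[str, Any] = {}
    for field in GENERAL_FIELDS:
        value = raw.get(field)
        result[field] = value if value not in (None, "", []) else "N/A"
    for field in PRIVACY_FIELDS:
        value = raw.get(field)
        # Assumption: a missing registrant field hints at privacy protection.
        result[field] = value if value not in (None, "", []) else "Possible Privacy"
    return result
```

Filling every expected key up front means the templates never have to guard against missing WHOIS data.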

### Results & UX

* **Per-run artifacts** under `/data/<uuid>/`:

  * `screenshot.png`, `source.txt`, `results.json` (single source of truth).
* **Suspicious Scripts** table shows only **matched** scripts with:

  * **Severity badges** and **tag chips** (tooltip shows rule reason).
  * Snippet preview length configurable via `settings.yaml`.

---

## 🧱 Architecture at a Glance

* **Flask** app (Gunicorn in Docker)
* **Playwright** for headless page fetch/render
* **BeautifulSoup4** for parsing
* **Rules Engine**

  * YAML regex rules (`config/suspicious_rules.yaml`)
  * Function rules (`app/rules/function_rules.py`) registered on startup
* **Artifacts**: persistent path mounted at `/data` (configurable)

---

## ⚙️ Setup

### 1) Clone

> Since this repo is private, you’ll need credentials (HTTPS with a personal access token) **or** SSH access.

**HTTPS (with token):**

```bash
git clone https://git.sneakygeek.net/ptarrant/SneakyScope.git
cd SneakyScope
```

**SSH:**

```bash
git clone git@git.sneakygeek.net:ptarrant/SneakyScope.git
cd SneakyScope
```

### 2) Configure Environment

Copy and edit env:

```bash
cp .env.example .env
```

Important vars:

* `SECRET_KEY` – Flask secret (set in production).
* `MAXMIND_LICENSE_KEY` – for GeoIP (optional if you disable GeoIP).
* `SNEAKYSCOPE_RULES_FILE` – override path to YAML rules (optional).
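
A rules-file override of this kind is usually resolved with a small lookup like the one below. This is a hedged sketch: the helper name `resolve_rules_file` is hypothetical, and the default path is simply the location this README gives for the YAML rules, so the real loader may differ.

```python
import os
from pathlib import Path

# Default taken from the location documented in this README (an assumption
# about what the application actually uses when no override is set).
DEFAULT_RULES_FILE = Path("config/suspicious_rules.yaml")


def resolve_rules_file(env=None) -> Path:
    """Return the rules path, preferring SNEAKYSCOPE_RULES_FILE when set."""
    env = os.environ if env is None else env
    override = env.get("SNEAKYSCOPE_RULES_FILE", "").strip()
    # An empty or whitespace-only value is treated the same as "not set".
    return Path(override) if override else DEFAULT_RULES_FILE
```

Passing a plain dict as `env` makes the lookup easy to exercise without touching the real process environment.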

### 3) Settings

`settings.yaml` controls UI/behavior. Example:

```yaml
app:
  name: "SneakyScope"
  version_major: 0
  version_minor: 1

ui:
  snippet_preview_len: 160  # controls inline script snippet length in UI
```
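
One way the `ui.snippet_preview_len` value could be applied when building a script preview is sketched below. The function name `preview_snippet`, the whitespace collapsing, and the trailing `...` marker are assumptions for illustration, not the project's confirmed behavior.

```python
def preview_snippet(code: str, max_len: int = 160) -> str:
    """Collapse whitespace, then keep the first max_len characters.

    A "..." marker is appended only when the text was actually cut.
    """
    flat = " ".join(code.split())
    if len(flat) <= max_len:
        return flat
    return flat[:max_len] + "..."
```

For example, `preview_snippet(script_text, max_len=160)` would use the value from the settings example above.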

### 4) Run with Docker Compose

```bash
docker-compose up --build
```

This builds the image and starts the web app. Run artifacts are written to the `/data` directory in the container; mount a host directory in Compose to persist them between restarts.

---

## 🧪 Using SneakyScope

1. Open the web UI and submit a URL.
2. On completion you’ll see:

   * **URL Overview** (with permalink to `/results/<uuid>`)
   * **Enrichment** (WHOIS/GeoIP)
   * **Redirects**
   * **Forms** (inputs + per-form rule checks)
   * **Suspicious Scripts** (only scripts that matched rules; badges/tags, snippet)
   * **Screenshot** and **Source**

Artifacts for each run live under `/data/<uuid>/`:

* `results.json` – complete structured result consumed by the UI.
* `source.txt`, `screenshot.png`, and other files as added.
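
Because `results.json` is plain JSON, a saved run can be inspected outside the UI. The sketch below assumes only the `/data/<uuid>/results.json` layout described above; the helper name `load_run` is hypothetical, and no particular key inside the file is assumed.

```python
import json
from pathlib import Path
from typing import Any, Dict


def load_run(run_id: str, data_root: str = "/data") -> Dict[str, Any]:
    """Load results.json for one run from <data_root>/<run_id>/."""
    results_path = Path(data_root) / run_id / "results.json"
    with results_path.open("r", encoding="utf-8") as handle:
        return json.load(handle)
```

Calling `sorted(load_run("<uuid>").keys())` is a quick way to see which top-level sections a given run recorded.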

---

## 📝 Rules

### YAML (regex) Rules

`config/suspicious_rules.yaml` contains regex rules, compiled with the `IGNORECASE` flag so matching is case-insensitive. Example:

```yaml
- name: eval_usage
  description: "Use of eval() in script"
  category: script
  type: regex
  pattern: '\beval\s*\('
  severity: high
  tags: [obfuscation, unsafe-eval]
```
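
To check what a pattern like this catches before adding it to the rules file, it can be tried directly with Python's `re` module. This is a standalone sketch that mirrors the YAML example as a dict; it does not use the project's engine, and the helper name `matches` is hypothetical.

```python
import re

# The same fields as the YAML example above, written as a Python dict.
rule = {
    "name": "eval_usage",
    "category": "script",
    "pattern": r"\beval\s*\(",
    "severity": "high",
    "tags": ["obfuscation", "unsafe-eval"],
}

# IGNORECASE mirrors how the README says YAML rules are compiled.
compiled = re.compile(rule["pattern"], re.IGNORECASE)


def matches(script_text: str) -> bool:
    """True when the rule's pattern occurs anywhere in the script text."""
    return compiled.search(script_text) is not None
```

The `\b` word boundary keeps the rule from firing on names such as `medieval(` while still catching `eval (` and `EVAL(`.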

### Function Rules (code)

Rules needing **context** (e.g., compare action host to page host) live in:

* `app/rules/function_rules.py`:

  * `FactAdapter` – converts snippets to structured facts.
  * `FunctionRuleAdapter` – lets dict-expecting rules run from engine inputs.
  * Implementations like:

    * `form_action_missing`
    * `form_http_on_https_page`
    * `form_submits_to_different_host`
    * `script_src_uses_data_or_blob`
    * `script_src_has_dangerous_extension`
    * `script_third_party_host`

They’re registered at startup in `app/__init__.py` alongside YAML rules.
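
To make the idea of a context-aware rule concrete, here is a minimal sketch of what a check like `form_submits_to_different_host` might look like. It is not the code from `app/rules/function_rules.py`: the facts keys (`action`, `base_url`) and the `(matched, reason)` return shape are assumptions for illustration.

```python
from typing import Dict, Tuple
from urllib.parse import urljoin, urlparse


def form_submits_to_different_host(facts: Dict[str, str]) -> Tuple[bool, str]:
    """Return (matched, reason) for a form described by a facts dict.

    Hypothetical fact keys: "action" (the form's action attribute) and
    "base_url" (the URL of the page that contains the form).
    """
    action = (facts.get("action") or "").strip()
    base_url = facts.get("base_url") or ""
    if not action:
        return False, "no action attribute to compare"
    # Resolve relative actions such as "/login" against the page URL.
    target_host = urlparse(urljoin(base_url, action)).hostname
    page_host = urlparse(base_url).hostname
    if target_host and page_host and target_host != page_host:
        return True, f"form posts to {target_host}, page is on {page_host}"
    return False, "no host mismatch detected"
```

A regex alone cannot express this check, because the verdict depends on two values (the action and the page URL) rather than on a single text blob.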

---

## 🔧 Configuration Tips

* **Snippet length**: tweak `ui.snippet_preview_len` in `settings.yaml` (default 160).
* **Rules file override**: set `SNEAKYSCOPE_RULES_FILE=/path/to/your.yaml`.
* **Artifacts path**: by default `/data` in the container (mount via Compose).

---

## 📂 Project Structure (high-level)

```
app/
  __init__.py                 # Flask app factory (loads YAML + function rules)
  browser.py                  # fetch + analysis orchestrator (writes results.json)
  routes.py                   # web views
  rules/
    function_rules.py         # FactAdapter, FunctionRuleAdapter, function rules
  utils/
    rules_engine.py           # engine + Rule class + YAML loader
    io_helpers.py             # safe_write, etc.
    settings.py               # get_settings()
  templates/                  # Jinja2 templates
  static/                     # CSS/JS
  config/
    suspicious_rules.yaml     # regex rules
docs/
  roadmap.md                  # ongoing plan and priorities
```

---

## 🧭 Roadmap (short version)

Full details: `docs/roadmap.md`

* **Core Analysis / Stability**

  * Opt-in **fetch external scripts** (size/time limits) and evaluate fetched content.
  * Remove remaining legacy form `flagged_reasons` once function rules cover them.
  * Unit tests: YAML compilation, adapters, per-artifact rule cases.

* **API Layer**

  * Endpoints: `/screenshot`, `/source`, `/analyse`
  * OpenAPI at `/api/openapi.yaml`; docs at `/docs` (Swagger/Redoc)

* **UI / UX**

  * Auto-prepend `http(s)://`/`www.` for bare domains
  * Source viewer (embedded editor)
  * Scripts table toggle: “Only suspicious” / “All scripts”
  * **Rules Lab** (WYSIWYG tester) for rapid rule validation

* **Artifact Management & Ops**

  * Retention/cleanup policy (age/size)
  * Periodic maintenance scripts (configurable in `settings.yaml`)
  * Results caching UX (re-run vs. load from cache)

* **Extras / Integrations**

  * Bulk URL analysis
  * Alerting/webhooks (Slack/email)
  * Analyst verdict tags + export (CSV/JSON)

---

## 🤝 Contributing

This repository is currently **private** on a self-hosted git server.

* Internal contributors: use feature branches and open merge requests on `https://git.sneakygeek.net/ptarrant/SneakyScope`.
* If/when the repo is made public, we’ll welcome issues and PRs from the community.

---

## ⚠️ Disclaimer

SneakyScope is intended for defensive security analysis and educational use.
Only analyze content you are authorized to test.

---