updating readme

2025-08-21 08:58:05 -05:00
parent 1eb2a52f17
commit ebd3d256d5

272
Readme.md

@@ -1,92 +1,248 @@
# URL Sandbox
# SneakyScope
A lightweight web-based sandbox for analyzing websites and domains.
It performs WHOIS lookups, GeoIP enrichment, script/form inspection, and provides analyst-friendly output.
SneakyScope fetches a page in a sandbox, enriches it with WHOIS/GeoIP data, and runs a unified **Rules Engine** (YAML + function rules) against scripts, forms, and text. Results are saved per run at the time of analysis, so each run is a point-in-time snapshot that does not change, and are rendered with analyst-friendly tables, severity badges, and tags.
> Repo: [https://git.sneakygeek.net/ptarrant/SneakyScope](https://git.sneakygeek.net/ptarrant/SneakyScope)
> Status: **Private** (may become public later)
---
## 🚀 Features
- **Domain & IP Enrichment**
- WHOIS lookups with fallback to raw text when fields are missing
- Explicit handling of privacy-protected WHOIS records (`N/A` or `Possible Privacy`)
- GeoIP (City, Region, Country, Latitude/Longitude)
- ASN, ISP, and network details
- **Flagged Content Analysis**
- Suspicious script detection
- Suspicious form detection
- Nested bullet-style reporting for clarity
- **Improved UX**
- Automatic addition of `http://`, `https://`, and `www.` if only a domain is provided
- Modal spinner to indicate background analysis (`Analyzing website…`)
- **Resilient GeoLite2 Database Management**
- Downloads the MaxMind GeoLite2-City database on first startup
- Checks file age and only re-downloads if older than **14 days** (configurable via environment variable)
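The age check described above can be sketched as follows. This is a minimal illustration, not the project's actual code: the function name `needs_refresh` and the environment variable `GEOIP_DB_MAX_AGE_DAYS` are assumptions, since the README only states that the 14-day threshold is configurable via an environment variable.

```python
import os
import time
from pathlib import Path

# Assumed variable name; the README does not name the actual variable.
MAX_AGE_DAYS = int(os.environ.get("GEOIP_DB_MAX_AGE_DAYS", "14"))


def needs_refresh(db_path, max_age_days=MAX_AGE_DAYS, now=None):
    """Return True if the database file is missing or older than max_age_days."""
    path = Path(db_path)
    if not path.is_file():
        return True  # first startup: nothing has been downloaded yet
    now = time.time() if now is None else now
    age_seconds = now - path.stat().st_mtime
    return age_seconds > max_age_days * 86400
```

A startup hook could call `needs_refresh(...)` and trigger the download only when it returns `True`.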
### Unified Detection (Rules Engine)
* **Regex rules from YAML** + **function rules in code** for context-aware checks.
* PASS/FAIL per rule with **reason**, **severity** (`low|medium|high`), and **tags**.
* **Per-script matches**:
* Inline scripts → run regex rules on the code.
* External scripts → run function rules with structured facts (`src`, hostnames, etc.).
* **Page-level overview**: complete PASS/FAIL tables by category (`script`, `form`, `text`).
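The split between inline and external scripts might look roughly like the sketch below. The rule tables, the `evaluate_script` function, the fact keys, and the result fields are all hypothetical and chosen for illustration; the real engine's data shapes may differ.

```python
import re
from urllib.parse import urlparse

# Hypothetical rule tables; the real engine loads these from YAML and code.
REGEX_RULES = [
    {"name": "eval_usage", "pattern": re.compile(r"\beval\s*\(", re.IGNORECASE),
     "severity": "high", "tags": ["obfuscation", "unsafe-eval"]},
]


def src_uses_data_or_blob(facts):
    """Illustrative function rule: returns (matched, reason)."""
    if facts["scheme"] in ("data", "blob"):
        return True, f"script src uses a {facts['scheme']}: URL"
    return False, "script src uses an ordinary URL"


FUNCTION_RULES = [
    {"name": "script_src_uses_data_or_blob", "check": src_uses_data_or_blob,
     "severity": "medium", "tags": ["inline-payload"]},
]


def evaluate_script(script):
    """Evaluate one script dict ({'src': ...} or {'code': ...}) against the rules."""
    results = []
    src = script.get("src")
    if src:
        # External script: build structured facts and run function rules.
        parsed = urlparse(src)
        facts = {"src": src, "scheme": parsed.scheme.lower(), "hostname": parsed.hostname}
        for rule in FUNCTION_RULES:
            matched, reason = rule["check"](facts)
            results.append(_result(rule, matched, reason))
    else:
        # Inline script: run regex rules directly on the code.
        code = script.get("code") or ""
        for rule in REGEX_RULES:
            matched = rule["pattern"].search(code) is not None
            reason = "pattern matched" if matched else "pattern not found"
            results.append(_result(rule, matched, reason))
    return results


def _result(rule, matched, reason):
    """Bundle the outcome with the rule's severity and tags."""
    return {"rule": rule["name"], "matched": matched, "reason": reason,
            "severity": rule["severity"], "tags": rule["tags"]}
```

Keeping one result record per rule, matched or not, is what would make a complete per-category overview table possible.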
### Domain & IP Enrichment
* WHOIS with robust fallbacks (`N/A`, `Possible Privacy` when fields are missing).
* GeoIP, ASN, and ISP details.
### Results & UX
* **Per-run artifacts** under `/data/<uuid>/`:
* `screenshot.png`, `source.txt`, `results.json` (single source of truth).
* **Suspicious Scripts** table shows only **matched** scripts with:
* **Severity badges** and **tag chips** (tooltip shows rule reason).
* Snippet preview length configurable via `settings.yaml`.
---
## ⚙️ Setup Instructions
## 🧱 Architecture at a Glance
* **Flask** app (Gunicorn in Docker)
* **Playwright** for headless page fetch/render
* **BeautifulSoup4** for parsing
* **Rules Engine**
* YAML regex rules (`config/suspicious_rules.yaml`)
* Function rules (`app/rules/function_rules.py`) registered on startup
* **Artifacts**: persistent path mounted at `/data` (configurable)
---
## ⚙️ Setup
### 1) Clone
> Since this repo is private, you'll need credentials (HTTPS with a personal access token) **or** SSH access.
**HTTPS (with token):**
### 1. Clone the Repository
```bash
git clone https://github.com/yourusername/url-sandbox.git
cd url-sandbox
git clone https://git.sneakygeek.net/ptarrant/SneakyScope.git
cd SneakyScope
```
### 2. Create a MaxMind Account & License Key
1. Go to [MaxMind GeoLite2](https://dev.maxmind.com/geoip/geolite2-free-geolocation-data)
2. Sign up for a free account
3. Navigate to **Account > Manage License Keys**
4. Generate a new license key
**SSH:**
### 3. Configure Environment Variables
All environment variables are loaded from a `.env` file.
1. Copy the sample file:
```bash
cp .env.example .env
```
```bash
git clone git@git.sneakygeek.net:ptarrant/SneakyScope.git
cd SneakyScope
```
2. Edit `.env` and set your values (see [`.env.example`](./.env.example) for available options).
### 2) Configure Environment
Make sure to add your **MaxMind License Key** under `MAXMIND_LICENSE_KEY`.
Copy and edit env:
```bash
cp .env.example .env
```
Important vars:
* `SECRET_KEY`: Flask secret (set in production).
* `MAXMIND_LICENSE_KEY`: license key for GeoIP (optional if you disable GeoIP).
* `SNEAKYSCOPE_RULES_FILE`: override path to the YAML rules file (optional).
### 3) Settings
`settings.yaml` controls UI/behavior. Example:
```yaml
app:
name: "SneakyScope"
version_major: 0
version_minor: 1
ui:
snippet_preview_len: 160 # controls inline script snippet length in UI
```
### 4) Run with Docker Compose
### 4. Run with Docker Compose
```bash
docker-compose up --build
```
This will:
- Build the app
- Download the GeoLite2 database if not present or too old
- Start the web interface
This builds the image and starts the web app. Run artifacts are written to the `/data` directory in the container. Mount a host directory there in Compose to persist them between restarts.
---
## 📝 Example Output
## 🧪 Using SneakyScope
**WHOIS Info**
- Registrar: MarkMonitor, Inc.
- Organization: Possible Privacy
- Creation: 1997-09-15
- Expiration: 2028-09-14
1. Open the web UI and submit a URL.
2. On completion, you'll see:
**GeoIP Info**
- IP: 172.66.159.20
- City: N/A
- Region: N/A
- Country: United States
- Coordinates: (37.751, -97.822)
- ASN: 13335
- ISP: Cloudflare, Inc.
* **URL Overview** (with permalink to `/results/<uuid>`)
* **Enrichment** (WHOIS/GeoIP)
* **Redirects**
* **Forms** (inputs + per-form rule checks)
* **Suspicious Scripts** (only scripts that matched rules; badges/tags, snippet)
* **Screenshot** and **Source**
Artifacts for each run live under `/data/<uuid>/`:
* `results.json`: complete structured result consumed by the UI.
* `source.txt`, `screenshot.png`, and other files as added.
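As a rough sketch of the per-run layout, the helpers below write and read `results.json` under a UUID-named directory. The function names `save_run` and `load_run` are hypothetical, and the result payload is a placeholder; only the `/data/<uuid>/results.json` layout comes from this README.

```python
import json
import uuid
from pathlib import Path


def save_run(data_root, result):
    """Write result to <data_root>/<uuid>/results.json and return the run id."""
    run_id = str(uuid.uuid4())
    run_dir = Path(data_root) / run_id
    run_dir.mkdir(parents=True, exist_ok=False)
    (run_dir / "results.json").write_text(json.dumps(result, indent=2), encoding="utf-8")
    return run_id


def load_run(data_root, run_id):
    """Load a saved run; rejects ids that are not valid UUIDs."""
    # Parsing as a UUID keeps a crafted id (e.g. '../x') from escaping data_root.
    run_id = str(uuid.UUID(run_id))
    path = Path(data_root) / run_id / "results.json"
    return json.loads(path.read_text(encoding="utf-8"))
```

Because the UI reads only from the saved file, reloading `/results/<uuid>` later shows the same snapshot rather than re-running the analysis.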
---
## 📌 Roadmap
See [Next Steps Checklist](docs/roadmap.md) for planned features:
- Improved UI templates
- Artifact cleanup
- Proxy support (optional)
## 📝 Rules
### YAML (regex) Rules
`config/suspicious_rules.yaml` contains regex rules (compiled `IGNORECASE`). Example:
```yaml
- name: eval_usage
description: "Use of eval() in script"
category: script
type: regex
pattern: '\beval\s*\('
severity: high
tags: [obfuscation, unsafe-eval]
```
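To show how an entry like the one above could be turned into a runnable check, here is a minimal sketch. The `RegexRule` class and `compile_rule` function are hypothetical names, not the engine's real API, and the input dict stands in for what a YAML parser would produce for that entry.

```python
import re
from dataclasses import dataclass, field

ALLOWED_SEVERITIES = {"low", "medium", "high"}


@dataclass
class RegexRule:
    name: str
    category: str
    pattern: re.Pattern
    severity: str
    description: str = ""
    tags: list = field(default_factory=list)

    def matches(self, text):
        """Return True if the compiled pattern is found anywhere in text."""
        return self.pattern.search(text) is not None


def compile_rule(raw):
    """Build a RegexRule from one parsed YAML entry (a plain dict)."""
    if raw.get("type") != "regex":
        raise ValueError(f"unsupported rule type: {raw.get('type')!r}")
    severity = raw.get("severity", "low")
    if severity not in ALLOWED_SEVERITIES:
        raise ValueError(f"unknown severity: {severity!r}")
    return RegexRule(
        name=raw["name"],
        category=raw["category"],
        pattern=re.compile(raw["pattern"], re.IGNORECASE),  # rules are case-insensitive
        severity=severity,
        description=raw.get("description", ""),
        tags=list(raw.get("tags", [])),
    )


rule = compile_rule({
    "name": "eval_usage",
    "description": "Use of eval() in script",
    "category": "script",
    "type": "regex",
    "pattern": r"\beval\s*\(",
    "severity": "high",
    "tags": ["obfuscation", "unsafe-eval"],
})
```

Validating `severity` at load time means a typo in the rules file fails at startup instead of producing an unstyled badge later.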
### Function Rules (code)
Rules needing **context** (e.g., compare action host to page host) live in:
* `app/rules/function_rules.py`:
* `FactAdapter` converts snippets to structured facts.
* `FunctionRuleAdapter` lets dict-expecting rules run from engine inputs.
* Implementations like:
* `form_action_missing`
* `form_http_on_https_page`
* `form_submits_to_different_host`
* `script_src_uses_data_or_blob`
* `script_src_has_dangerous_extension`
* `script_third_party_host`
They're registered at startup in `app/__init__.py` alongside YAML rules.
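A context-aware form rule could be sketched like this. The bodies, the `facts` keys (`page_url`, `action`), and the `(matched, reason)` return shape are assumptions for illustration; only the rule names come from the list above.

```python
from urllib.parse import urljoin, urlparse


def form_submits_to_different_host(facts):
    """Flag forms whose action resolves to a host other than the page's host."""
    page_url = facts["page_url"]
    # urljoin resolves relative or empty actions against the page URL.
    target = urljoin(page_url, facts.get("action") or "")
    page_host = urlparse(page_url).hostname
    target_host = urlparse(target).hostname
    if page_host and target_host and page_host != target_host:
        return True, f"form posts to {target_host}, page host is {page_host}"
    return False, "form posts to the page's own host"


def form_http_on_https_page(facts):
    """Flag forms on an HTTPS page that submit over plain HTTP."""
    page_url = facts["page_url"]
    target = urljoin(page_url, facts.get("action") or "")
    if urlparse(page_url).scheme == "https" and urlparse(target).scheme == "http":
        return True, "HTTPS page submits form over plain HTTP"
    return False, "form submission does not downgrade to HTTP"
```

Neither check can be expressed as a regex over the form markup alone, because both need the page URL as a second input.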
---
## 🔧 Configuration Tips
* **Snippet length**: tweak `ui.snippet_preview_len` in `settings.yaml` (default 160).
* **Rules file override**: set `SNEAKYSCOPE_RULES_FILE=/path/to/your.yaml`.
* **Artifacts path**: by default `/data` in the container (mount via Compose).
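The first two tips can be illustrated with a small sketch. The helper names are hypothetical, and treating `config/suspicious_rules.yaml` as the fallback path is an assumption based on where this README says the rules live.

```python
import os

DEFAULT_RULES_FILE = "config/suspicious_rules.yaml"  # assumed fallback path
DEFAULT_SNIPPET_LEN = 160


def resolve_rules_file(environ=None):
    """Return the rules file path, honoring SNEAKYSCOPE_RULES_FILE if set."""
    environ = os.environ if environ is None else environ
    return environ.get("SNEAKYSCOPE_RULES_FILE") or DEFAULT_RULES_FILE


def snippet_preview(code, settings):
    """Truncate code to ui.snippet_preview_len characters for display."""
    limit = settings.get("ui", {}).get("snippet_preview_len", DEFAULT_SNIPPET_LEN)
    if len(code) <= limit:
        return code
    return code[:limit] + "..."
```

Here `settings` is the dict a YAML loader would return for `settings.yaml`; a missing `ui` section falls back to the default of 160.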
---
## 📂 Project Structure (high-level)
```
app/
__init__.py # Flask app factory (loads YAML + function rules)
browser.py # fetch + analysis orchestrator (writes results.json)
routes.py # web views
rules/
function_rules.py # FactAdapter, FunctionRuleAdapter, function rules
utils/
rules_engine.py # engine + Rule class + YAML loader
io_helpers.py # safe_write, etc.
settings.py # get_settings()
templates/ # Jinja2 templates
static/ # CSS/JS
config/
suspicious_rules.yaml # regex rules
docs/
roadmap.md # ongoing plan and priorities
```
---
## 🧭 Roadmap (short version)
Full details: `docs/roadmap.md`
* **Core Analysis / Stability**
* Opt-in **fetch external scripts** (size/time limits) and evaluate fetched content.
* Remove remaining legacy form `flagged_reasons` once function rules cover them.
* Unit tests: YAML compilation, adapters, per-artifact rule cases.
* **API Layer**
* Endpoints: `/screenshot`, `/source`, `/analyse`
* OpenAPI at `/api/openapi.yaml`; docs at `/docs` (Swagger/Redoc)
* **UI / UX**
* Auto-prepend `http(s)://`/`www.` for bare domains
* Source viewer (embedded editor)
* Scripts table toggle: “Only suspicious” / “All scripts”
* **Rules Lab** (WYSIWYG tester) for rapid rule validation
* **Artifact Management & Ops**
* Retention/cleanup policy (age/size)
* Periodic maintenance scripts (configurable in `settings.yaml`)
* Results caching UX (re-run vs. load from cache)
* **Extras / Integrations**
* Bulk URL analysis
* Alerting/webhooks (Slack/email)
* Analyst verdict tags + export (CSV/JSON)
---
## 🤝 Contributing
This repository is currently **private** on a self-hosted git server.
* Internal contributors: use feature branches and open merge requests on `https://git.sneakygeek.net/ptarrant/SneakyScope`.
* If/when the repo is made public, we'll welcome issues and PRs from the community.
---
## ⚠️ Disclaimer
SneakyScope is intended for defensive security analysis and educational use.
Only analyze content you are authorized to test.
---