Add automatic multi-format report generation and ZIP archiving

Implements automatic generation of JSON, HTML, and ZIP outputs after every scan,
with all files sharing the same timestamp for easy correlation.

Features:
- Automatic HTML report generation after every scan
- ZIP archive creation containing JSON, HTML, and all screenshots
- Unified timestamp across all outputs (JSON, HTML, ZIP, screenshots)
- Graceful error handling (scan continues if HTML/ZIP generation fails)
- Email-ready ZIP archives for easy sharing

Technical changes:
- Fixed timestamp mismatch between scan() and save_report()
- Added generate_outputs() method to SneakyScanner class
- scan() now returns (report, timestamp) tuple
- save_report() accepts timestamp parameter instead of generating new one
- main() updated to call generate_outputs() for all output formats
- Added zipfile import and HTMLReportGenerator import
- Dockerfile updated to copy templates/ directory
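The timestamp fix above can be sketched as follows. This is an illustrative reconstruction, not the project's actual code: the method names match the commit, but the bodies are assumptions.

```python
import json
from datetime import datetime
from pathlib import Path

class SneakyScanner:
    """Sketch of the changed methods; scan results are stubbed out."""

    def __init__(self, output_dir="output"):
        self.output_dir = Path(output_dir)
        self.output_dir.mkdir(parents=True, exist_ok=True)

    def scan(self):
        # One timestamp is created up front and returned alongside the
        # report, so every output format can reuse it.
        timestamp = datetime.now().strftime("%Y%m%d_%H%M%S")
        report = {"results": []}  # placeholder for real scan results
        return report, timestamp

    def save_report(self, report, timestamp):
        # Accepts the caller's timestamp instead of generating a new one,
        # which was the source of the old filename mismatch.
        path = self.output_dir / f"scan_report_{timestamp}.json"
        path.write_text(json.dumps(report, indent=2))
        return path
```

Because `scan()` now hands its timestamp to `save_report()`, the JSON filename can no longer drift from the screenshot directory created during the scan.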

Output structure:
- scan_report_YYYYMMDD_HHMMSS.json (JSON report)
- scan_report_YYYYMMDD_HHMMSS.html (HTML report)
- scan_report_YYYYMMDD_HHMMSS.zip (archive with JSON, HTML, screenshots)
- scan_report_YYYYMMDD_HHMMSS_screenshots/ (screenshots directory)
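A minimal sketch of how the ZIP archive for this file family might be assembled with the standard-library `zipfile` module; the helper name and its exact layout are assumptions, not the project's implementation.

```python
import zipfile
from pathlib import Path

def create_archive(output_dir, timestamp):
    # Hypothetical helper: bundles the JSON report, HTML report, and
    # screenshot directory that share one timestamp into a single ZIP.
    output_dir = Path(output_dir)
    archive = output_dir / f"scan_report_{timestamp}.zip"
    with zipfile.ZipFile(archive, "w", zipfile.ZIP_DEFLATED) as zf:
        for suffix in (".json", ".html"):
            report = output_dir / f"scan_report_{timestamp}{suffix}"
            if report.exists():
                zf.write(report, report.name)
        shots = output_dir / f"scan_report_{timestamp}_screenshots"
        if shots.is_dir():
            for png in sorted(shots.glob("*.png")):
                # Keep the directory prefix so the archive unpacks
                # into the same structure as the output directory.
                zf.write(png, f"{shots.name}/{png.name}")
    return archive
```

Storing everything under one timestamped name is what makes the ZIP email-ready: a single file carries the report in both formats plus its screenshots.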

Documentation updated:
- README.md: Updated Output Format, Features, Quick Start sections
- CLAUDE.md: Updated Core Components, Scan Workflow, Key Design Decisions

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-11-14 02:10:31 +00:00
parent d390c4b491
commit 212596fa0a
4 changed files with 171 additions and 41 deletions


@@ -50,8 +50,9 @@ python3 -c "import yaml; yaml.safe_load(open('configs/example-site.yaml'))"
- `_detect_http_https()`: Detects HTTP vs HTTPS using socket connections
- `_analyze_ssl_tls()`: Analyzes SSL/TLS certificates and supported versions using sslyze
- `_run_http_analysis()`: Orchestrates HTTP/HTTPS and SSL/TLS analysis phase
- `scan()`: Main workflow - collects IPs, runs scans, performs service detection, HTTP/HTTPS analysis, compiles results
- `save_report()`: Writes JSON output with timestamp and scan duration
- `scan()`: Main workflow - collects IPs, runs scans, performs service detection and HTTP/HTTPS analysis, compiles results, and returns the report together with its timestamp
- `save_report()`: Writes JSON output using provided timestamp
- `generate_outputs()`: Generates all output formats (JSON, HTML, ZIP) with graceful error handling
2. **src/screenshot_capture.py** - Screenshot capture module
- `ScreenshotCapture` class: Handles webpage screenshot capture
@@ -75,26 +76,35 @@ python3 -c "import yaml; yaml.safe_load(open('configs/example-site.yaml'))"
- Define scan title, sites, IPs, and expected network behavior
- Each IP includes expected ping response and TCP/UDP ports
5. **output/** - JSON scan reports and screenshots
5. **output/** - Scan outputs (automatically generated)
- Timestamped JSON files: `scan_report_YYYYMMDD_HHMMSS.json`
- Timestamped HTML reports: `scan_report_YYYYMMDD_HHMMSS.html`
- Timestamped ZIP archives: `scan_report_YYYYMMDD_HHMMSS.zip`
- Screenshot directory: `scan_report_YYYYMMDD_HHMMSS_screenshots/`
- Contains actual vs. expected comparison for each IP
- All outputs share the same timestamp for easy correlation
- ZIP contains JSON, HTML, and all screenshots
### Scan Workflow
1. Parse YAML config and extract all unique IPs
2. Run ping scan on all IPs using `masscan --ping`
3. Run TCP scan on all IPs for ports 0-65535
4. Run UDP scan on all IPs for ports 0-65535
5. Run service detection on discovered TCP ports using `nmap -sV`
6. Run HTTP/HTTPS analysis on web services identified by nmap:
2. Create scan timestamp (shared across all outputs)
3. Run ping scan on all IPs using `masscan --ping`
4. Run TCP scan on all IPs for ports 0-65535
5. Run UDP scan on all IPs for ports 0-65535
6. Run service detection on discovered TCP ports using `nmap -sV`
7. Run HTTP/HTTPS analysis on web services identified by nmap:
- Detect HTTP vs HTTPS using socket connections
- Capture webpage screenshot using Playwright (viewport 1280x720, 15s timeout)
- For HTTPS: Extract certificate details (subject, issuer, expiry, SANs)
- Test TLS version support (TLS 1.0, 1.1, 1.2, 1.3)
- List accepted cipher suites for each TLS version
7. Aggregate results by IP and site
8. Generate JSON report with timestamp, scan duration, screenshot references, and complete service details
8. Aggregate results by IP and site
9. Return scan report and timestamp from `scan()` method
10. Automatically generate all output formats using `generate_outputs()`:
- Save JSON report with timestamp
   - Generate HTML report (graceful error handling: the scan continues if generation fails)
- Create ZIP archive containing JSON, HTML, and screenshots
- All outputs use the same timestamp for correlation
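Step 10 above can be sketched as a small orchestration function. This is a sketch under stated assumptions: `html_step` and `zip_step` stand in for the real HTML and ZIP helpers, whose names are not given by the source.

```python
import json
import logging
from pathlib import Path

def generate_outputs(report, timestamp, output_dir, html_step, zip_step):
    # JSON is saved unconditionally; HTML and ZIP failures are logged
    # but never abort the run (graceful degradation).
    output_dir = Path(output_dir)
    json_path = output_dir / f"scan_report_{timestamp}.json"
    json_path.write_text(json.dumps(report, indent=2))
    for label, step in (("HTML", html_step), ("ZIP", zip_step)):
        try:
            step()
        except Exception as exc:
            logging.warning("%s generation failed: %s", label, exc)
    return json_path
```

The ordering matters: writing the JSON first guarantees the machine-readable report survives even if every optional output fails.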
### Why Dockerized
@@ -213,12 +223,14 @@ sites: # List of sites (required)
1. **Five-phase scanning**: Masscan for fast port discovery (10,000 pps), nmap for service detection, then HTTP/HTTPS and SSL/TLS analysis for web services
2. **All-port scanning**: TCP and UDP scans cover entire port range (0-65535) to detect unexpected services
3. **Selective web analysis**: Only analyze services identified by nmap as web-related to optimize scan time
4. **Machine-readable output**: JSON format enables automated report generation and comparison
4. **Multi-format output**: Automatically generates JSON (machine-readable), HTML (human-readable), and ZIP (archival) for every scan
5. **Expected vs. Actual**: Config includes expected behavior to identify infrastructure drift
6. **Site grouping**: IPs organized by logical site for better reporting
7. **Temporary files**: Masscan and nmap output written to temp files to avoid conflicts in parallel scans
8. **Service details**: Extract product name, version, and additional info for each discovered service
9. **SSL/TLS security**: Comprehensive certificate analysis and TLS version testing with cipher suite enumeration
10. **Unified timestamp**: All outputs (JSON, HTML, ZIP, screenshots) share the same timestamp for easy correlation
11. **Graceful degradation**: If HTML or ZIP generation fails, the scan continues and the JSON report is still saved
## Testing Strategy
@@ -226,18 +238,27 @@ When testing changes:
1. Use a controlled test environment with known services (including HTTP/HTTPS)
2. Create a test config with 1-2 IPs
3. Verify JSON output structure matches schema
4. Check that ping, TCP, and UDP results are captured
5. Verify service detection results include service name, product, and version
6. For web services, verify http_info includes:
3. Verify all three outputs are generated automatically:
- JSON report (`scan_report_YYYYMMDD_HHMMSS.json`)
- HTML report (`scan_report_YYYYMMDD_HHMMSS.html`)
- ZIP archive (`scan_report_YYYYMMDD_HHMMSS.zip`)
4. Verify all outputs share the same timestamp
5. Check that ping, TCP, and UDP results are captured in JSON
6. Verify service detection results include service name, product, and version
7. For web services, verify http_info includes:
- Correct protocol detection (http vs https)
- Screenshot path reference (relative to output directory)
- Verify screenshot PNG file exists at the referenced path
- Certificate details for HTTPS (subject, issuer, expiry, SANs)
- TLS version support (1.0-1.3) with cipher suites
7. Ensure temp files are cleaned up (masscan JSON, nmap XML)
8. Verify screenshot directory created with correct naming convention
9. Test screenshot capture with HTTP, HTTPS, and self-signed certificate services
8. Verify the HTML report opens in a browser and displays correctly
9. Verify ZIP archive contains:
- JSON report file
- HTML report file
- Screenshot directory with all PNG files
10. Ensure temp files are cleaned up (masscan JSON, nmap XML)
11. Test screenshot capture with HTTP, HTTPS, and self-signed certificate services
12. Test graceful degradation: if HTML generation fails, the JSON report and ZIP archive should still be created
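Checks 3, 4, and 9 above lend themselves to a small smoke test. The helper below is a hypothetical sketch (its name and assertions are not from the project) showing one way to verify that all outputs exist, share a single timestamp, and that the ZIP contains both reports.

```python
import re
import zipfile
from pathlib import Path

def check_outputs(output_dir):
    # Collect every timestamp present among the scan_report_* outputs;
    # there should be exactly one per scan run.
    output_dir = Path(output_dir)
    stamps = {re.match(r"scan_report_(\d{8}_\d{6})", p.name).group(1)
              for p in output_dir.glob("scan_report_*")}
    assert len(stamps) == 1, "outputs should share a single timestamp"
    ts = stamps.pop()
    for ext in (".json", ".html", ".zip"):
        assert (output_dir / f"scan_report_{ts}{ext}").exists(), ext
    with zipfile.ZipFile(output_dir / f"scan_report_{ts}.zip") as zf:
        names = zf.namelist()
        assert f"scan_report_{ts}.json" in names
        assert f"scan_report_{ts}.html" in names
    return ts
```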
## Common Tasks
@@ -278,9 +299,11 @@ JSON structure defined in src/scanner.py:365+. To modify:
### Generating HTML Reports
**Note**: HTML reports are automatically generated after every scan. The commands below are for manual generation from existing JSON data only.
**Basic usage:**
```bash
# Generate HTML report from most recent JSON scan
# Manually generate HTML report from existing JSON scan
python3 src/report_generator.py output/scan_report_20251113_175235.json
```
@@ -386,11 +409,16 @@ Optimization strategies:
## HTML Report Generation (✅ Implemented)
SneakyScanner now includes comprehensive HTML report generation from JSON scan data.
SneakyScanner automatically generates comprehensive HTML reports after every scan, along with JSON reports and ZIP archives.
**Usage:**
**Automatic Generation:**
- HTML reports are created automatically by `generate_outputs()` method after scan completes
- All outputs (JSON, HTML, ZIP) share the same timestamp for correlation
- Graceful error handling: If HTML generation fails, scan continues with JSON output
**Manual Generation (Optional):**
```bash
# Generate HTML report from JSON scan output
# Manually generate HTML report from existing JSON scan output
python3 src/report_generator.py output/scan_report_20251113_175235.json
# Specify custom output path
@@ -471,7 +499,7 @@ Generate reports showing changes between scans over time.
- sslyze==6.0.0 (SSL/TLS analysis)
- playwright==1.40.0 (webpage screenshot capture)
- Jinja2==3.1.2 (HTML report template engine)
- Built-in: socket, ssl, subprocess, xml.etree.ElementTree, logging, json, pathlib, datetime
- Built-in: socket, ssl, subprocess, xml.etree.ElementTree, logging, json, pathlib, datetime, zipfile
- System: chromium, chromium-driver (installed via Dockerfile)
### For Future Enhancements, May Need: