CLAUDE.md

This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.

Project Overview

SneakyScanner is a dockerized network scanning tool that uses a five-phase approach: masscan for fast ping, TCP, and UDP port discovery, nmap for service detection, and sslyze plus Playwright for HTTP/HTTPS analysis (SSL/TLS details and webpage screenshots). It accepts YAML configuration files defining scan targets and expected network behavior, then produces comprehensive JSON reports with service information, SSL certificates, TLS versions, cipher suites, and webpage screenshots, comparing expected vs. actual results.

Essential Commands

Building and Running

# Build the Docker image
docker build -t sneakyscanner .

# Run with docker-compose (easiest method)
docker-compose build
docker-compose up

# Run directly with Docker
docker run --rm --privileged --network host \
  -v $(pwd)/configs:/app/configs:ro \
  -v $(pwd)/output:/app/output \
  sneakyscanner /app/configs/your-config.yaml

Development

# Test the Python script locally (requires masscan and nmap installed)
python3 src/scanner.py configs/example-site.yaml -o ./output

# Validate YAML config
python3 -c "import yaml; yaml.safe_load(open('configs/example-site.yaml'))"

Architecture

Core Components

  1. src/scanner.py - Main application

    • SneakyScanner class: Orchestrates scanning workflow
    • _load_config(): Parses and validates YAML config
    • _run_masscan(): Executes masscan for TCP/UDP scanning
    • _run_ping_scan(): Executes masscan ICMP ping scanning
    • _run_nmap_service_detection(): Executes nmap service detection on discovered TCP ports
    • _parse_nmap_xml(): Parses nmap XML output to extract service information
    • _is_likely_web_service(): Identifies web services based on nmap results
    • _detect_http_https(): Detects HTTP vs HTTPS using socket connections
    • _analyze_ssl_tls(): Analyzes SSL/TLS certificates and supported versions using sslyze
    • _run_http_analysis(): Orchestrates HTTP/HTTPS and SSL/TLS analysis phase
    • scan(): Main workflow - collects IPs, runs scans, performs service detection, HTTP/HTTPS analysis, compiles results
    • save_report(): Writes JSON output with timestamp and scan duration
  2. src/screenshot_capture.py - Screenshot capture module

    • ScreenshotCapture class: Handles webpage screenshot capture
    • capture(): Captures screenshot of a web service (HTTP/HTTPS)
    • _launch_browser(): Initializes Playwright with Chromium in headless mode
    • _close_browser(): Cleans up browser resources
    • _get_screenshot_dir(): Creates screenshots subdirectory
    • _generate_filename(): Generates filename for screenshot (IP_PORT.png)
  3. configs/ - YAML configuration files

    • Define scan title, sites, IPs, and expected network behavior
    • Each IP includes expected ping response and TCP/UDP ports
  4. output/ - JSON scan reports and screenshots

    • Timestamped JSON files: scan_report_YYYYMMDD_HHMMSS.json
    • Screenshot directory: scan_report_YYYYMMDD_HHMMSS_screenshots/
    • Contains actual vs. expected comparison for each IP

Scan Workflow

  1. Parse YAML config and extract all unique IPs
  2. Run ping scan on all IPs using masscan --ping
  3. Run TCP scan on all IPs for ports 0-65535
  4. Run UDP scan on all IPs for ports 0-65535
  5. Run service detection on discovered TCP ports using nmap -sV
  6. Run HTTP/HTTPS analysis on web services identified by nmap:
    • Detect HTTP vs HTTPS using socket connections
    • Capture webpage screenshot using Playwright (viewport 1280x720, 15s timeout)
    • For HTTPS: Extract certificate details (subject, issuer, expiry, SANs)
    • Test TLS version support (TLS 1.0, 1.1, 1.2, 1.3)
    • List accepted cipher suites for each TLS version
  7. Aggregate results by IP and site
  8. Generate JSON report with timestamp, scan duration, screenshot references, and complete service details

Why Dockerized

  • Masscan and nmap require raw socket access (root/CAP_NET_RAW)
  • Isolates privileged operations in container
  • Ensures consistent masscan and nmap versions and dependencies
  • Uses --privileged and --network host for network access

Masscan Integration

  • Masscan is built from source in Dockerfile
  • Writes output to temporary JSON files
  • Results parsed line-by-line (masscan uses comma-separated JSON lines); see the sketch after this list
  • Temporary files cleaned up after each scan
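
A minimal sketch of that line-by-line parsing (illustrative only; the real logic lives in _run_masscan() and may differ in detail):

import json

def parse_masscan_output(path):
    """Parse masscan -oJ output: one JSON object per line, wrapped in
    brackets and separated by trailing commas."""
    results = []
    with open(path) as f:
        for line in f:
            line = line.strip().rstrip(',')
            if not line or line in ('[', ']'):
                continue
            try:
                entry = json.loads(line)
            except json.JSONDecodeError:
                continue  # skip masscan's non-JSON status lines
            for port in entry.get('ports', []):
                results.append((entry['ip'], port['port'], port['proto']))
    return results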

Nmap Integration

  • Nmap installed via apt package in Dockerfile
  • Runs service detection (-sV) with intensity level 5 (balanced speed/accuracy)
  • Outputs XML format for structured parsing
  • XML parsed using Python's ElementTree library (xml.etree.ElementTree); see the sketch after this list
  • Extracts service name, product, version, extrainfo, and ostype
  • Runs sequentially per IP to avoid overwhelming the target
  • 5-minute nmap host timeout (--host-timeout) and a 10-minute overall subprocess timeout per host
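
A minimal sketch of the XML parsing step (field names follow nmap's XML output; the real _parse_nmap_xml() may differ in detail):

import xml.etree.ElementTree as ET

def parse_nmap_services(xml_path):
    """Extract per-port service details from nmap -sV -oX output."""
    services = []
    root = ET.parse(xml_path).getroot()
    for host in root.findall('host'):
        addr = host.find('address').get('addr')
        for port in host.findall('.//port'):
            state = port.find('state')
            if state is None or state.get('state') != 'open':
                continue
            svc = port.find('service')
            get = svc.get if svc is not None else (lambda key: None)
            services.append({
                'ip': addr,
                'port': int(port.get('portid')),
                'service': get('name'),
                'product': get('product'),
                'version': get('version'),
                'extrainfo': get('extrainfo'),
                'ostype': get('ostype'),
            })
    return services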

HTTP/HTTPS and SSL/TLS Analysis

  • Uses sslyze library for comprehensive SSL/TLS scanning
  • HTTP/HTTPS detection using Python's built-in socket and ssl modules (sketched after this list)
  • Analyzes services based on:
    • Nmap service identification (http, https, ssl, http-proxy, etc.)
    • Common web ports (80, 443, 8000, 8006, 8008, 8080, 8081, 8443, 8888, 9443)
    • This ensures non-standard ports (like Proxmox 8006) are analyzed even if nmap misidentifies them
  • For HTTPS services:
    • Extracts certificate information using cryptography library
    • Tests TLS versions: 1.0, 1.1, 1.2, 1.3
    • Lists all accepted cipher suites for each supported TLS version
    • Calculates days until certificate expiration
    • Extracts SANs (Subject Alternative Names) from certificate
  • Graceful error handling: if SSL analysis fails, still reports HTTP/HTTPS detection
  • 5-second timeout per HTTP/HTTPS detection
  • Results merged into service data structure under http_info key
  • Note: Uses sslyze 6.0 API which accesses scan results as attributes (e.g., certificate_info, tls_1_2_cipher_suites) rather than through .scan_commands_results.get()
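
A minimal sketch of the socket-based protocol detection (illustrative; the real _detect_http_https() may differ in detail):

import socket
import ssl

def detect_protocol(ip, port, timeout=5):
    """Return 'https' if a TLS handshake succeeds, 'http' if only a plain
    TCP connection succeeds, or None if the port is unreachable."""
    context = ssl.create_default_context()
    context.check_hostname = False
    context.verify_mode = ssl.CERT_NONE  # tolerate self-signed certificates
    try:
        with socket.create_connection((ip, port), timeout=timeout) as sock:
            with context.wrap_socket(sock, server_hostname=ip):
                return 'https'
    except ssl.SSLError:
        pass  # port is open but does not speak TLS
    except OSError:
        return None  # connection failed entirely
    try:
        with socket.create_connection((ip, port), timeout=timeout):
            return 'http'
    except OSError:
        return None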

Webpage Screenshot Capture

Implementation: src/screenshot_capture.py - Separate module for code organization

Technology Stack:

  • Playwright 1.40.0 with Chromium in headless mode
  • System Chromium and chromium-driver installed via apt (Dockerfile)
  • Python's pathlib for cross-platform file path handling

Screenshot Process:

  1. Screenshots captured for all successfully detected HTTP/HTTPS services
  2. Services identified by:
    • Nmap service names: http, https, ssl, http-proxy, http-alt, etc.
    • Common web ports: 80, 443, 8000, 8006, 8008, 8080, 8081, 8443, 8888, 9443
  3. Browser lifecycle managed via context manager pattern (__enter__, __exit__)

Configuration (default values):

  • Viewport size: 1280x720 pixels (viewport only, not full page)
  • Timeout: 15 seconds per screenshot (15000ms in Playwright)
  • Wait strategy: wait_until='networkidle' - waits for network activity to settle
  • SSL handling: ignore_https_errors=True - handles self-signed certs
  • User agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36
  • Browser args: --no-sandbox, --disable-setuid-sandbox, --disable-dev-shm-usage, --disable-gpu

Storage Architecture:

  • Screenshots saved as PNG files in subdirectory: scan_report_YYYYMMDD_HHMMSS_screenshots/
  • Filename format: {ip}_{port}.png (dots in IP replaced with underscores)
    • Example: 192_168_1_10_443.png for 192.168.1.10:443
  • Path stored in JSON as relative reference: http_info.screenshot field
  • Relative paths ensure portability of output directory
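
The filename convention amounts to a one-liner (a sketch of what _generate_filename() produces; the actual helper may differ):

def screenshot_filename(ip, port):
    """'192.168.1.10', 443 -> '192_168_1_10_443.png'"""
    return f"{ip.replace('.', '_')}_{port}.png"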

Error Handling (graceful degradation):

  • If screenshot fails (timeout, connection error, etc.), scan continues
  • Failed screenshots logged as warnings, not errors
  • Services without screenshots simply omit the screenshot field in JSON output
  • Browser launch failure disables all screenshots for the scan

Browser Lifecycle (optimized for performance):

  1. Browser launched once at scan start (in scan() method)
  2. Reused for all screenshots via single browser instance
  3. New context + page created per screenshot (isolated state)
  4. Context and page closed after each screenshot
  5. Browser closed at scan completion (cleanup in scan() method)
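
A condensed sketch of this lifecycle using Playwright's sync API (BrowserSession is an illustrative name; the real ScreenshotCapture class in src/screenshot_capture.py is more elaborate):

from playwright.sync_api import sync_playwright

class BrowserSession:
    """Launch Chromium once, reuse it for many screenshots."""

    def __enter__(self):
        self._pw = sync_playwright().start()
        self.browser = self._pw.chromium.launch(
            headless=True,
            args=['--no-sandbox', '--disable-dev-shm-usage'],
        )
        return self

    def capture(self, url, path, timeout_ms=15000):
        # Fresh context + page per screenshot keeps state isolated
        context = self.browser.new_context(
            viewport={'width': 1280, 'height': 720},
            ignore_https_errors=True,
        )
        page = context.new_page()
        try:
            page.goto(url, wait_until='networkidle', timeout=timeout_ms)
            page.screenshot(path=path, type='png')
        finally:
            page.close()
            context.close()

    def __exit__(self, exc_type, exc, tb):
        self.browser.close()
        self._pw.stop()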

Integration Points:

  • Initialized in scanner.py:scan() with scan timestamp
  • Called from scanner.py:_run_http_analysis() after protocol detection
  • Cleanup called in scanner.py:scan() after all analysis complete

Code Reference Locations:

  • src/screenshot_capture.py: Complete screenshot module (lines 1-202)
  • src/scanner.py:scan(): Browser initialization and cleanup
  • src/scanner.py:_run_http_analysis(): Screenshot capture invocation

Configuration Schema

title: string                    # Report title (required)
sites:                           # List of sites (required)
  - name: string                 # Site name
    ips:                         # List of IPs for this site
      - address: string          # IP address (IPv4)
        expected:                # Expected network behavior
          ping: boolean          # Should respond to ping
          tcp_ports: [int]       # Expected TCP ports
          udp_ports: [int]       # Expected UDP ports
          services: [string]     # Expected services (optional)
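
A config in this schema can be loaded and its unique target IPs collected with a few lines (illustrative; the real _load_config() adds validation):

import yaml

def load_unique_ips(config_path):
    """Return the parsed config plus the deduplicated list of target IPs."""
    with open(config_path) as f:
        config = yaml.safe_load(f)
    ips = []
    for site in config.get('sites', []):
        for entry in site.get('ips', []):
            if entry['address'] not in ips:
                ips.append(entry['address'])
    return config, ips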

Key Design Decisions

  1. Five-phase scanning: Masscan for fast ping, TCP, and UDP discovery (10,000 pps), nmap for service detection, then HTTP/HTTPS analysis (SSL/TLS details and screenshots) for web services
  2. All-port scanning: TCP and UDP scans cover entire port range (0-65535) to detect unexpected services
  3. Selective web analysis: Only analyze services identified by nmap as web-related to optimize scan time
  4. Machine-readable output: JSON format enables automated report generation and comparison
  5. Expected vs. Actual: Config includes expected behavior to identify infrastructure drift
  6. Site grouping: IPs organized by logical site for better reporting
  7. Temporary files: Masscan and nmap output written to temp files to avoid conflicts in parallel scans
  8. Service details: Extract product name, version, and additional info for each discovered service
  9. SSL/TLS security: Comprehensive certificate analysis and TLS version testing with cipher suite enumeration

Testing Strategy

When testing changes:

  1. Use a controlled test environment with known services (including HTTP/HTTPS)
  2. Create a test config with 1-2 IPs
  3. Verify JSON output structure matches schema
  4. Check that ping, TCP, and UDP results are captured
  5. Verify service detection results include service name, product, and version
  6. For web services, verify http_info includes:
    • Correct protocol detection (http vs https)
    • Screenshot path reference (relative to the output directory), with the PNG file present at that path
    • Certificate details for HTTPS (subject, issuer, expiry, SANs)
    • TLS version support (1.0-1.3) with cipher suites
  7. Ensure temp files are cleaned up (masscan JSON, nmap XML)
  8. Verify screenshot directory created with correct naming convention
  9. Test screenshot capture with HTTP, HTTPS, and self-signed certificate services
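
The screenshot checks in steps 6 and 8 can be partly scripted; a small helper along these lines only assumes the http_info.screenshot field described above (illustrative):

import json
from pathlib import Path

def missing_screenshots(report_path):
    """Walk a scan report and list screenshot references that do not
    resolve to an existing file (paths are relative to the report)."""
    report_path = Path(report_path)
    report = json.loads(report_path.read_text())
    missing = []

    def walk(node):
        if isinstance(node, dict):
            for key, value in node.items():
                if key == 'screenshot' and isinstance(value, str):
                    if not (report_path.parent / value).is_file():
                        missing.append(value)
                else:
                    walk(value)
        elif isinstance(node, list):
            for item in node:
                walk(item)

    walk(report)
    return missing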

Common Tasks

Modifying Scan Parameters

Masscan rate limiting:

  • --rate: Currently set to 10000 packets/second in src/scanner.py:80, 132
  • --wait: Set to 0 (don't wait for late responses)
  • Adjust these in _run_masscan() and _run_ping_scan() methods

Nmap service detection intensity:

  • --version-intensity: Currently set to 5 (balanced) in src/scanner.py:201
  • Range: 0-9 (0=light, 9=comprehensive)
  • Lower values are faster but less accurate
  • Adjust in _run_nmap_service_detection() method

Nmap timeouts:

  • --host-timeout: Currently 5 minutes in src/scanner.py:204
  • Overall subprocess timeout: 600 seconds (10 minutes) in src/scanner.py:208
  • Adjust based on network conditions and number of ports

Adding New Scan Types

To add additional scan functionality (e.g., OS detection, vulnerability scanning):

  1. Add new method to SneakyScanner class (follow pattern of _run_nmap_service_detection())
  2. Update scan() workflow to call new method
  3. Add results to actual section of output JSON
  4. Update YAML schema if expected values needed
  5. Update documentation (README.md, CLAUDE.md)
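
As a rough illustration of step 1, a new phase method mirrors the existing pattern: run the tool, write to a temp file, parse, clean up. The run_os_detection helper below is hypothetical and not part of the codebase:

import os
import subprocess
import tempfile
import xml.etree.ElementTree as ET

def run_os_detection(ip, timeout=600):
    """Hypothetical new phase: nmap OS detection for a single IP."""
    with tempfile.NamedTemporaryFile(suffix='.xml', delete=False) as tmp:
        xml_path = tmp.name
    subprocess.run(['nmap', '-O', '-oX', xml_path, ip],
                   capture_output=True, timeout=timeout, check=False)
    try:
        root = ET.parse(xml_path).getroot()
        return [m.get('name') for m in root.findall('.//osmatch')]
    finally:
        os.unlink(xml_path)  # mirror the temp-file cleanup used elsewhere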

Changing Output Format

JSON structure defined in src/scanner.py:365+. To modify:

  1. Update the report dictionary structure
  2. Ensure backward compatibility or version the schema
  3. Update README.md output format documentation
  4. Update example output in both README.md and CLAUDE.md

Customizing Screenshot Capture

Change viewport size (src/screenshot_capture.py:35):

self.viewport = viewport or {'width': 1920, 'height': 1080}  # Full HD

Change timeout (src/screenshot_capture.py:34):

self.timeout = timeout * 1000  # Default is 15 seconds
# Pass different value when initializing: ScreenshotCapture(..., timeout=30)

Capture full-page screenshots (src/screenshot_capture.py:173):

page.screenshot(path=str(screenshot_path), type='png', full_page=True)

Change wait strategy (src/screenshot_capture.py:170):

# Options: 'load', 'domcontentloaded', 'networkidle', 'commit'
page.goto(url, wait_until='load', timeout=self.timeout)

Add custom request headers (src/screenshot_capture.py:157-161):

context = self.browser.new_context(
    viewport=self.viewport,
    ignore_https_errors=True,
    user_agent='CustomUserAgent/1.0',
    extra_http_headers={'Authorization': 'Bearer token'}
)

Disable screenshot capture entirely: In src/scanner.py:scan(), comment out or skip initialization:

# self.screenshot_capture = ScreenshotCapture(...)
self.screenshot_capture = None  # This disables all screenshots

Add authentication (for services requiring login): In src/screenshot_capture.py:capture(), before taking screenshot:

# Navigate to login page first
page.goto(f"{protocol}://{ip}:{port}/login")
page.fill('#username', 'admin')
page.fill('#password', 'password')
page.click('#login-button')
page.wait_for_url(f"{protocol}://{ip}:{port}/dashboard")
# Then take screenshot
page.screenshot(path=str(screenshot_path), type='png')

Performance Optimization

Current bottlenecks:

  1. Port scanning: ~30 seconds for 2 IPs (ports 0-65535 each at 10k pps)
  2. Service detection: ~20-60 seconds per IP with open ports
  3. HTTP/HTTPS analysis: ~5-10 seconds per web service (includes SSL/TLS analysis)
  4. Screenshot capture: ~5-15 seconds per web service (depends on page load time)

Optimization strategies:

  • Parallelize nmap scans across IPs (currently sequential)
  • Parallelize HTTP/HTTPS analysis and screenshot capture across services using ThreadPoolExecutor (see the sketch after this list)
  • Reduce port range for faster scanning (if full range not needed)
  • Lower nmap intensity (trade accuracy for speed)
  • Skip service detection on high ports (>1024) if desired
  • Reduce SSL/TLS analysis scope (e.g., test only TLS 1.2+ if legacy support not needed)
  • Adjust HTTP/HTTPS detection timeout (currently 5 seconds in src/scanner.py:510)
  • Adjust screenshot timeout (currently 15 seconds in src/screenshot_capture.py:34)
  • Disable screenshot capture for faster scans (set screenshot_capture to None)
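
A sketch of the ThreadPoolExecutor idea (illustrative; analyze_service stands in for the per-service HTTP/HTTPS analysis, and screenshot capture may need to stay sequential or use one browser per worker, since the shared Playwright sync browser is not intended for cross-thread use):

from concurrent.futures import ThreadPoolExecutor, as_completed

def analyze_services_parallel(services, analyze_service, max_workers=4):
    """Run per-service analysis concurrently instead of sequentially."""
    results = {}
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = {pool.submit(analyze_service, svc): svc for svc in services}
        for future in as_completed(futures):
            svc = futures[future]
            try:
                results[(svc['ip'], svc['port'])] = future.result()
            except Exception as exc:
                # Mirror the scanner's graceful degradation: log and continue
                print(f"analysis failed for {svc['ip']}:{svc['port']}: {exc}")
    return results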

Planned Features (Future Development)

The following features are planned for future implementation:

1. HTML Report Generation

Build comprehensive HTML reports from JSON scan data with interactive visualizations.

Report Features:

  • Service details and SSL/TLS information tables
  • Visual comparison of expected vs. actual results (red/green highlighting)
  • Certificate expiration warnings with countdown timers
  • TLS version compliance reports (highlight weak configurations)
  • Embedded webpage screenshots
  • Sortable/filterable tables
  • Timeline view of scan history
  • Export to PDF capability

Implementation Considerations:

  • Template engine: Jinja2 or similar
  • CSS framework: Bootstrap or Tailwind for responsive design
  • Charts/graphs: Chart.js or Plotly for visualizations
  • Store templates in templates/ directory
  • Generate static HTML that can be opened without server

Architecture:

class HTMLReportGenerator:
    def __init__(self, json_report_path, template_dir='templates'):
        pass

    def generate_report(self, output_path):
        # Parse JSON
        # Render template with data
        # Include screenshots
        # Write HTML file
        pass

    def _compare_expected_actual(self, expected, actual):
        # Generate diff/comparison data
        pass

    def _generate_cert_warnings(self, services):
        # Identify expiring certs, weak TLS, etc.
        pass

2. Comparison Reports (Scan Diffs)

Generate reports showing changes between scans over time.

Features:

  • Compare two scan reports
  • Highlight new/removed services
  • Track certificate changes
  • Detect TLS configuration drift
  • Show port changes
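
Not implemented yet; one possible shape for the core diff, operating on sets of (ip, port, proto) tuples extracted from two reports (illustrative only):

def diff_open_ports(previous, current):
    """previous/current: sets of (ip, port, proto) tuples from two scans."""
    return {
        'added': sorted(current - previous),
        'removed': sorted(previous - current),
        'unchanged': sorted(previous & current),
    }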

3. Additional Enhancements

  • Email Notifications: Alert on unexpected changes or certificate expirations
  • Scheduled Scanning: Automated periodic scans with cron integration
  • Vulnerability Detection: Integration with CVE databases for known vulnerabilities
  • API Mode: REST API for triggering scans and retrieving results
  • Multi-threading: Parallel scanning of multiple IPs for better performance

Development Notes

Current Dependencies

  • PyYAML==6.0.1 (YAML parsing)
  • python-libnmap==0.7.3 (nmap XML parsing)
  • sslyze==6.0.0 (SSL/TLS analysis)
  • playwright==1.40.0 (webpage screenshot capture)
  • Built-in: socket, ssl, subprocess, xml.etree.ElementTree, logging
  • System: chromium, chromium-driver (installed via Dockerfile)

For HTML Reports, Will Need:

  • Jinja2 (template engine)
  • Optional: weasyprint or pdfkit for PDF export

Key Files to Modify for New Features:

  1. src/scanner.py - Core scanning logic (add new phases/methods)
  2. src/screenshot_capture.py - Webpage screenshot capture module (implemented)
  3. src/report_generator.py - New file for HTML report generation (planned)
  4. templates/ - New directory for HTML templates (planned)
  5. requirements.txt - Add new dependencies
  6. Dockerfile - Install additional system dependencies (browsers, etc.)

Testing Strategy for New Features:

Screenshot Capture Testing (Implemented):

  1. Test with HTTP services (port 80, 8080, etc.)
  2. Test with HTTPS services with valid certificates (port 443, 8443)
  3. Test with HTTPS services with self-signed certificates
  4. Test with non-standard web ports (e.g., Proxmox on 8006)
  5. Test with slow-loading pages (verify 15s timeout works)
  6. Test with services that return errors (404, 500, etc.)
  7. Verify screenshot files are created with correct naming
  8. Verify JSON references point to correct screenshot files
  9. Verify browser cleanup occurs properly (no zombie processes)
  10. Test with multiple IPs and services to ensure browser reuse works

HTML Report Testing (Planned):

  1. Validate HTML report rendering across browsers
  2. Ensure large scans don't cause memory issues with screenshots
  3. Test report generation with missing/incomplete data
  4. Verify all URLs and links work in generated reports
  5. Test embedded screenshots display correctly

Troubleshooting

Screenshot Capture Issues

Problem: Screenshots not being captured

  • Check: Verify Chromium installed: chromium --version in container
  • Check: Verify Playwright browsers installed: playwright install --dry-run chromium
  • Check: Look for browser launch errors in stderr output
  • Solution: Rebuild Docker image ensuring Dockerfile steps complete

Problem: "Failed to launch browser" error

  • Check: Ensure container has sufficient memory (Chromium needs ~200MB)
  • Check: Docker runs with --privileged or appropriate capabilities
  • Solution: Add --shm-size=2gb to docker run command if /dev/shm is too small

Problem: Screenshots timing out

  • Check: Network connectivity to target services
  • Check: Services actually serve webpages (not just open ports)
  • Solution: Increase timeout in src/screenshot_capture.py:34 if needed
  • Solution: Check service responds to HTTP requests: curl -I http://IP:PORT

Problem: Screenshots are blank/empty

  • Check: Service returns valid HTML (not just TCP banner)
  • Check: Page requires JavaScript (may need longer wait time)
  • Solution: Change wait_until strategy from 'networkidle' to 'load' or 'domcontentloaded'

Problem: HTTPS certificate errors despite ignore_https_errors=True

  • Check: System certificates up to date in container
  • Solution: This should not happen; file an issue if it does

Nmap/Masscan Issues

Problem: No ports discovered

  • Check: Firewall rules allow scanning
  • Check: Targets are actually online (ping test)
  • Solution: Run manual masscan: masscan -p80,443 192.168.1.10 --rate 1000

Problem: "Operation not permitted" error

  • Check: Container runs with --privileged or CAP_NET_RAW
  • Solution: Add --privileged flag to docker run command

Problem: Service detection not working

  • Check: Nmap can connect to ports: nmap -p 80 192.168.1.10
  • Check: Services actually respond to nmap probes (some firewall/IPS block)
  • Solution: Adjust nmap intensity or timeout values