From 9255233a748425b8bb326761bafb16c4c9e61b9d Mon Sep 17 00:00:00 2001 From: Phillip Tarrant Date: Fri, 14 Nov 2025 00:18:34 -0600 Subject: [PATCH] Add comprehensive Phase 2 implementation plan Create detailed 14-day implementation roadmap for Flask Web App Core. Plan includes: - 21 new files to create (services, auth, templates, tests) - 7 files to modify (app.py, scans.py, scanner.py, etc.) - 8-step implementation guide with daily breakdown - 7 key challenges with solutions - Complete testing strategy (unit, integration, e2e) - API documentation examples - Success criteria (14 checkpoints) - Design decisions documented Key features planned: - REST API for scan management (trigger, list, view, delete, status) - Background job queue with APScheduler - Authentication system with Flask-Login - Scanner integration with database - Basic UI templates (login, dashboard) - Production Docker deployment Ready to begin Phase 2 implementation. --- docs/ai/PHASE2.md | 1715 +++++++++++++++++++++++++++++++++++++++++++++ 1 file changed, 1715 insertions(+) create mode 100644 docs/ai/PHASE2.md diff --git a/docs/ai/PHASE2.md b/docs/ai/PHASE2.md new file mode 100644 index 0000000..96cc418 --- /dev/null +++ b/docs/ai/PHASE2.md @@ -0,0 +1,1715 @@ +# Phase 2 Implementation Plan: Flask Web App Core + +**Status:** Planning Complete - Ready for Implementation +**Estimated Duration:** 14 days (2 weeks) +**Dependencies:** Phase 1 Complete ✅ + +--- + +## Table of Contents + +1. [Overview](#overview) +2. [Current State Analysis](#current-state-analysis) +3. [Files to Create](#files-to-create) +4. [Files to Modify](#files-to-modify) +5. [Step-by-Step Implementation](#step-by-step-implementation) +6. [Dependencies & Prerequisites](#dependencies--prerequisites) +7. [Testing Approach](#testing-approach) +8. [Potential Challenges & Solutions](#potential-challenges--solutions) +9. [Success Criteria](#success-criteria) +10. [Migration Path](#migration-path) +11. 
[Estimated Timeline](#estimated-timeline) +12. [Key Design Decisions](#key-design-decisions) +13. [Documentation Deliverables](#documentation-deliverables) + +--- + +## Overview + +Phase 2 focuses on creating the core web application functionality by: + +1. **REST API for Scans** - Trigger scans and retrieve results via API +2. **Background Job Queue** - Execute scans asynchronously using APScheduler +3. **Authentication** - Simple Flask-Login session management +4. **Scanner Integration** - Save scan results to database automatically +5. **Basic UI** - Login page and dashboard placeholder + +### Goals + +- ✅ Working REST API for scan management (trigger, list, view, delete, status) +- ✅ Background scan execution with status tracking +- ✅ Basic authentication system with session management +- ✅ Scanner saves all results to database +- ✅ Simple login page and dashboard placeholder +- ✅ Production-ready Docker deployment + +--- + +## Current State Analysis + +### What's Already Done (Phase 1) + +- ✅ Database schema with 11 models (Scan, ScanSite, ScanIP, ScanPort, ScanService, ScanCertificate, ScanTLSVersion, Schedule, Alert, AlertRule, Setting) +- ✅ SQLAlchemy models with relationships +- ✅ Flask app factory pattern in `web/app.py` +- ✅ Settings management with encryption (SettingsManager, PasswordManager) +- ✅ API blueprint stubs for scans, schedules, alerts, settings +- ✅ Alembic migrations system +- ✅ Docker deployment infrastructure + +### Scanner Capabilities (src/scanner.py) + +The existing scanner has these key characteristics: + +- **scan() method** returns `(report_dict, timestamp)` tuple +- **generate_outputs()** creates JSON, HTML, ZIP files +- **Five-phase scanning:** ping, TCP ports, UDP ports, service detection, HTTP/HTTPS analysis +- **Screenshot capture** with Playwright +- **Results structured** by sites → IPs → ports → services + +**Key Methods:** +- `scan()` - Main workflow, returns report and timestamp +- `generate_outputs(report, timestamp)` - 
Creates JSON/HTML/ZIP files +- `save_report(report, timestamp)` - Saves JSON to disk +- `_run_masscan()` - Port scanning +- `_run_nmap_service_detection()` - Service detection +- `_run_http_analysis()` - HTTP/HTTPS and SSL/TLS analysis + +--- + +## Files to Create + +### Backend Services (Core Logic) + +#### 1. `web/services/__init__.py` +Services package initialization. + +#### 2. `web/services/scan_service.py` +Core service for scan orchestration and database integration. + +**Class: ScanService** + +Methods: +- `trigger_scan(config_file, triggered_by='manual', schedule_id=None)` → scan_id + - Validate config file exists + - Create Scan record with status='running' + - Queue background job + - Return scan_id + +- `get_scan(scan_id)` → scan dict with all related data + - Query Scan with all relationships + - Format for API response + - Include sites, IPs, ports, services, certificates, TLS versions + +- `list_scans(page=1, per_page=20, status_filter=None)` → paginated results + - Query with pagination + - Filter by status if provided + - Return total count and items + +- `delete_scan(scan_id)` → cleanup DB + files + - Delete database records (cascade handles relationships) + - Delete JSON, HTML, ZIP files + - Delete screenshot directory + - Handle missing files gracefully + +- `get_scan_status(scan_id)` → status dict + - Return current scan status + - Include progress percentage if available + - Return error message if failed + +- `_save_scan_to_db(report, scan_id, status='completed')` → persist results + - Update Scan record with duration, file paths + - Call _map_report_to_models() + - Commit transaction + +- `_map_report_to_models(report, scan_obj)` → create related records + - Map JSON structure to database models + - Create ScanSite, ScanIP, ScanPort, ScanService records + - Create ScanCertificate and ScanTLSVersion records + - Handle nested relationships properly + +#### 3. `web/services/scheduler_service.py` +APScheduler integration for scheduled scans. 
+ +**Class: SchedulerService** + +Methods: +- `init_scheduler(app)` → setup APScheduler + - Initialize BackgroundScheduler + - Load existing schedules from DB + - Start scheduler + +- `add_job(schedule_id, config_file, cron_expression)` → create scheduled job + - Parse cron expression + - Add job to scheduler + - Store job_id in database + +- `remove_job(schedule_id)` → cancel job + - Remove from scheduler + - Update database + +- `trigger_scheduled_scan(schedule_id)` → manual trigger + - Load schedule from DB + - Trigger scan via ScanService + - Update last_run timestamp + +- `update_schedule_times(schedule_id, last_run, next_run)` → DB update + - Update Schedule record + - Commit transaction + +### Authentication System + +#### 4. `web/auth/__init__.py` +Authentication package initialization. + +#### 5. `web/auth/routes.py` +Login/logout routes blueprint. + +**Routes:** +- `GET /login` - Render login form +- `POST /login` - Authenticate and create session + - Verify password via PasswordManager + - Create Flask-Login session + - Redirect to dashboard +- `GET /logout` - Destroy session + - Logout user + - Redirect to login page + +#### 6. `web/auth/decorators.py` +Custom authentication decorators. + +**Decorators:** +- `@login_required` - Wrapper for Flask-Login's login_required +- `@api_auth_required` - For API endpoints (session-based for Phase 2) + +#### 7. `web/auth/models.py` +User model for Flask-Login. + +**Class: User** +- Simple class representing the single application user +- Load from settings table (app_password) +- Implement Flask-Login required methods (get_id, is_authenticated, etc.) + +### Frontend Templates + +#### 8. `web/templates/base.html` +Base layout with Bootstrap 5 dark theme. + +**Features:** +- Navigation bar (Dashboard, Scans, Settings, Logout) +- Flash message display +- Jinja2 blocks: title, content +- Footer with version info +- Bootstrap 5 dark theme CSS +- Mobile responsive + +#### 9. `web/templates/login.html` +Login page. 
+
+**Features:**
+- Username/password form
+- Error message display
+- Remember me checkbox (optional)
+- Redirect to dashboard after login
+- Clean, minimal design
+
+#### 10. `web/templates/dashboard.html`
+Dashboard placeholder.
+
+**Features:**
+- Welcome message
+- "Run Scan Now" button (manual trigger)
+- Recent scans table (5 most recent)
+- Summary stats:
+  - Total scans
+  - Last scan time
+  - Scans running
+- Link to full scan history
+
+### Background Jobs
+
+#### 11. `web/jobs/__init__.py`
+Jobs package initialization.
+
+#### 12. `web/jobs/scan_job.py`
+Background scan execution.
+
+**Function: execute_scan(config_file, scan_id, db_url)**
+- Run scanner in subprocess
+- Update scan status in DB (running → completed/failed)
+- Handle exceptions and log errors
+- Store scan results in database
+- Generate JSON/HTML/ZIP files
+
+**Implementation:**
+```python
+from sqlalchemy import create_engine
+from sqlalchemy.orm import sessionmaker
+from web.models import Scan
+from src.scanner import SneakyScanner
+from web.services.scan_service import ScanService
+
+def execute_scan(config_file, scan_id, db_url):
+    """Execute scan in background and save to database."""
+    # Each background job gets its own engine-bound session
+    engine = create_engine(db_url)
+    Session = sessionmaker(bind=engine)
+    session = Session()
+
+    try:
+        # Update status to running
+        scan = session.get(Scan, scan_id)
+        scan.status = 'running'
+        session.commit()
+
+        # Run scanner
+        scanner = SneakyScanner(config_file)
+        report, timestamp = scanner.scan()
+
+        # Generate outputs (JSON, HTML, ZIP)
+        scanner.generate_outputs(report, timestamp)
+
+        # Save to database
+        scan_service = ScanService(session)
+        scan_service._save_scan_to_db(report, scan_id, status='completed')
+
+    except Exception:
+        # Discard any partial transaction before marking the scan as failed
+        session.rollback()
+        scan = session.get(Scan, scan_id)
+        if scan is not None:
+            scan.status = 'failed'
+            session.commit()
+        raise
+    finally:
+        session.close()
+```
+
+### Utilities
+
+#### 13.
`web/utils/pagination.py` +Pagination helper. + +**Function: paginate(query, page, per_page)** +- Apply offset and limit to SQLAlchemy query +- Return paginated results with metadata +- Handle edge cases (invalid page, empty results) + +#### 14. `web/utils/validators.py` +Input validation utilities. + +**Functions:** +- `validate_config_file(path)` → check file exists and is valid YAML +- `validate_scan_status(status)` → enum validation (running, completed, failed) +- `validate_page_params(page, per_page)` → sanitize pagination params + +### Web Routes + +#### 15. `web/routes/__init__.py` +Web routes package initialization. + +#### 16. `web/routes/main.py` +Main web routes (dashboard, etc.). + +**Routes:** +- `GET /` - Redirect to dashboard +- `GET /dashboard` - Dashboard page (@login_required) +- `GET /scans` - Scan list page (@login_required) +- `GET /scans/` - Scan details page (@login_required) + +### Testing + +#### 17. `tests/__init__.py` +Test package initialization. + +#### 18. `tests/conftest.py` +Pytest fixtures and configuration. + +**Fixtures:** +- `app` - Flask app instance with test config +- `client` - Flask test client +- `db` - Test database session +- `sample_scan` - Sample scan data for testing + +#### 19. `tests/test_scan_api.py` +API endpoint tests. + +**Tests:** +- `test_list_scans` - GET /api/scans +- `test_get_scan` - GET /api/scans/ +- `test_trigger_scan` - POST /api/scans +- `test_delete_scan` - DELETE /api/scans/ +- `test_scan_status` - GET /api/scans//status +- `test_pagination` - List scans with page/per_page params +- `test_authentication` - Verify auth required + +#### 20. `tests/test_scan_service.py` +ScanService unit tests. 
+ +**Tests:** +- `test_trigger_scan` - Scan creation and queuing +- `test_get_scan` - Retrieve scan with relationships +- `test_list_scans` - Pagination and filtering +- `test_delete_scan` - Cleanup files and DB records +- `test_save_scan_to_db` - Database persistence +- `test_map_report_to_models` - JSON to DB mapping + +#### 21. `tests/test_authentication.py` +Authentication tests. + +**Tests:** +- `test_login_success` - Valid credentials +- `test_login_failure` - Invalid credentials +- `test_logout` - Session destruction +- `test_protected_route` - Requires authentication + +--- + +## Files to Modify + +### Backend Updates + +#### 1. `web/app.py` +Flask application factory - add authentication and scheduler. + +**Changes:** +- Import Flask-Login and configure LoginManager +- Import APScheduler and initialize in app factory +- Register auth blueprint +- Register web routes blueprint +- Add user_loader callback for Flask-Login +- Add before_request handler for authentication +- Initialize scheduler service + +**New imports:** +```python +from flask_login import LoginManager +from web.auth.routes import bp as auth_bp +from web.routes.main import bp as main_bp +from web.services.scheduler_service import SchedulerService +``` + +**New code:** +```python +# Initialize Flask-Login +login_manager = LoginManager() +login_manager.login_view = 'auth.login' +login_manager.init_app(app) + +@login_manager.user_loader +def load_user(user_id): + from web.auth.models import User + return User.get(user_id) + +# Initialize APScheduler +scheduler_service = SchedulerService() +scheduler_service.init_scheduler(app) + +# Register blueprints +app.register_blueprint(auth_bp, url_prefix='/auth') +app.register_blueprint(main_bp, url_prefix='/') +``` + +#### 2. `web/api/scans.py` +Implement all scan endpoints (currently stubs). 
+ +**Changes:** +- Import ScanService +- Implement all endpoint logic +- Add authentication decorators +- Add proper error handling +- Add input validation +- Add logging + +**Endpoint implementations:** + +```python +from web.services.scan_service import ScanService +from web.auth.decorators import api_auth_required +from web.utils.validators import validate_config_file + +@bp.route('', methods=['POST']) +@api_auth_required +def trigger_scan(): + """Trigger a new scan.""" + data = request.get_json() or {} + config_file = data.get('config_file') + + # Validate config file + if not validate_config_file(config_file): + return jsonify({'error': 'Invalid config file'}), 400 + + # Trigger scan + scan_service = ScanService(current_app.db_session) + scan_id = scan_service.trigger_scan(config_file, triggered_by='api') + + return jsonify({ + 'scan_id': scan_id, + 'status': 'running', + 'message': 'Scan queued successfully' + }), 201 + +# Similar implementations for GET, DELETE, status endpoints... +``` + +#### 3. `src/scanner.py` +Minor modifications for progress callbacks (optional). + +**Changes (optional):** +- Add optional `progress_callback` parameter to scan() method +- Call callback at each phase (ping, TCP, UDP, services, HTTP) +- No breaking changes to existing functionality + +**Example:** +```python +def scan(self, progress_callback=None): + """Run scan with optional progress reporting.""" + if progress_callback: + progress_callback('phase', 'ping', 0) + + # Run ping scan + ping_results = self._run_ping_scan(all_ips) + + if progress_callback: + progress_callback('phase', 'tcp_scan', 20) + + # Continue with other phases... +``` + +#### 4. `requirements-web.txt` +Add missing dependencies. + +**Add:** +``` +Flask-APScheduler==1.13.1 +``` + +All other dependencies already present from Phase 1. + +#### 5. `docker-compose-web.yml` +Updates for production deployment. 
+ +**Changes:** +- Add environment variable for scheduler threads +- Ensure proper volume mounts for data persistence +- Add healthcheck for web service +- Configure restart policy + +**Example additions:** +```yaml +environment: + - SCHEDULER_EXECUTORS=2 # Number of concurrent scan jobs + - SCHEDULER_JOB_DEFAULTS_MAX_INSTANCES=3 +healthcheck: + test: ["CMD", "curl", "-f", "http://localhost:5000/api/settings/health"] + interval: 30s + timeout: 10s + retries: 3 +restart: unless-stopped +``` + +#### 6. Create New Alembic Migration + +**File:** `migrations/versions/002_add_scan_indexes.py` + +Add database indexes for better query performance: +- Index on scans.timestamp (for sorting recent scans) +- Index on scans.status (for filtering) +- Verify foreign key indexes exist (usually auto-created) + +**Migration code:** +```python +def upgrade(): + op.create_index('ix_scans_status', 'scans', ['status']) + # timestamp index already exists from 001 migration + +def downgrade(): + op.drop_index('ix_scans_status', 'scans') +``` + +#### 7. `docs/ai/ROADMAP.md` +Update with Phase 2 progress. + +**Changes:** +- Mark Phase 2 tasks as completed +- Update success metrics +- Add Phase 2 completion date + +--- + +## Step-by-Step Implementation + +### Step 1: Database & Service Layer ⏱️ Days 1-2 +**Priority: CRITICAL** - Foundation for everything else + +**Tasks:** +1. Create `web/services/` package +2. Implement `ScanService` class + - Start with `_save_scan_to_db()` method + - Implement `_map_report_to_models()` - most complex part + - Map JSON report structure to database models + - Handle nested relationships (sites → IPs → ports → services → certificates → TLS) +3. Implement pagination utility (`web/utils/pagination.py`) +4. Implement validators (`web/utils/validators.py`) +5. Write unit tests for ScanService +6. 
Create Alembic migration for indexes + +**Testing:** +- Mock `scanner.scan()` to return sample report +- Verify database records created correctly +- Test pagination logic +- Validate foreign key relationships +- Test with actual scan report JSON + +**Key Challenge:** Mapping complex JSON structure to normalized database schema + +**Solution:** Process in order, use SQLAlchemy relationships for FK handling + +### Step 2: Scan API Endpoints ⏱️ Days 3-4 +**Priority: HIGH** - Core functionality + +**Tasks:** +1. Update `web/api/scans.py`: + - Implement `POST /api/scans` (trigger scan) + - Implement `GET /api/scans` (list with pagination) + - Implement `GET /api/scans/` (get details) + - Implement `DELETE /api/scans/` (delete scan + files) + - Implement `GET /api/scans//status` (status polling) +2. Add error handling and validation +3. Add logging for all endpoints +4. Write integration tests + +**Testing:** +- Use pytest to test each endpoint +- Test with actual `scanner.scan()` execution +- Verify JSON/HTML/ZIP files created +- Test pagination edge cases +- Test 404 handling for invalid scan_id +- Test authentication required + +**Key Challenge:** Long-running scans causing HTTP timeouts + +**Solution:** Immediately return scan_id after queuing, client polls status + +### Step 3: Background Job Queue ⏱️ Days 5-6 +**Priority: HIGH** - Async scan execution + +**Tasks:** +1. Create `web/jobs/` package +2. Implement `scan_job.py`: + - `execute_scan()` function runs scanner + - Update scan status in DB (running → completed/failed) + - Handle exceptions and timeouts +3. Create `SchedulerService` class (basic version) + - Initialize APScheduler with BackgroundScheduler + - Add job management methods +4. Integrate APScheduler with Flask app + - Initialize in app factory + - Store scheduler instance in app context +5. Update `POST /api/scans` to queue job instead of blocking +6. 
Test background execution + +**Testing:** +- Trigger scan via API +- Verify scan runs in background +- Check status updates correctly +- Test scan failure scenarios +- Verify scanner subprocess isolation +- Test concurrent scans + +**Key Challenge:** Scanner requires privileged operations (masscan/nmap) + +**Solution:** Run in subprocess with proper privileges via Docker + +### Step 4: Authentication System ⏱️ Days 7-8 +**Priority: HIGH** - Security + +**Tasks:** +1. Create `web/auth/` package +2. Implement Flask-Login integration: + - Create User class (simple - single user) + - Configure LoginManager in app factory + - Implement user_loader callback +3. Create `auth/routes.py`: + - `GET /login` - render login form + - `POST /login` - authenticate and create session + - `GET /logout` - destroy session +4. Create `auth/decorators.py`: + - `@login_required` for web routes + - `@api_auth_required` for API endpoints +5. Apply decorators to all API endpoints +6. Test authentication flow + +**Testing:** +- Test login with correct/incorrect password +- Verify session persistence +- Test logout +- Verify protected routes require auth +- Test API authentication +- Test session timeout + +**Key Challenge:** Need both web UI and API authentication + +**Solution:** Use Flask-Login sessions for both (Phase 5 adds token auth) + +### Step 5: Basic UI Templates ⏱️ Days 9-10 +**Priority: MEDIUM** - User interface + +**Tasks:** +1. Create `base.html` template: + - Bootstrap 5 dark theme + - Navigation bar (Dashboard, Scans, Logout) + - Flash message display + - Footer with version info +2. Create `login.html`: + - Simple username/password form + - Error message display + - Redirect to dashboard after login +3. Create `dashboard.html`: + - Welcome message + - "Run Scan Now" button (manual trigger) + - Recent scans table (AJAX call to API) + - Summary stats (total scans, last scan time) +4. Create web routes blueprint (`web/routes/main.py`) +5. Style with Bootstrap 5 dark theme +6. 
Add minimal JavaScript for AJAX calls + +**Testing:** +- Verify templates render correctly +- Test responsive layout (desktop, tablet, mobile) +- Test login flow end-to-end +- Verify dashboard displays data from API +- Test manual scan trigger + +**Key Feature:** Dark theme matching existing HTML reports + +### Step 6: Docker & Deployment ⏱️ Day 11 +**Priority: MEDIUM** - Production readiness + +**Tasks:** +1. Update Dockerfile if needed (mostly done in Phase 1) +2. Update `docker-compose-web.yml`: + - Verify volume mounts + - Add environment variables for scheduler + - Set proper restart policy + - Add healthcheck +3. Create `.env.example` file with configuration template +4. Test deployment workflow +5. Create deployment documentation + +**Testing:** +- Build Docker image +- Run `docker-compose up` +- Test full workflow in Docker +- Verify volume persistence (database, scans) +- Test restart behavior +- Test healthcheck endpoint + +**Deliverable:** Production-ready Docker deployment + +### Step 7: Error Handling & Logging ⏱️ Day 12 +**Priority: MEDIUM** - Robustness + +**Tasks:** +1. Add comprehensive error handling: + - API error responses (JSON format) + - Web error pages (404, 500) + - Database transaction rollback on errors +2. Enhance logging: + - Structured logging for API calls + - Scan execution logging + - Error logging with stack traces +3. Add request/response logging middleware +4. Configure log rotation + +**Testing:** +- Test error scenarios (invalid input, DB errors, scanner failures) +- Verify error logging +- Check log file rotation +- Test error pages render correctly + +**Key Feature:** Helpful error messages for debugging + +### Step 8: Testing & Documentation ⏱️ Days 13-14 +**Priority: HIGH** - Quality assurance + +**Tasks:** +1. Write comprehensive tests: + - Unit tests for services (ScanService, SchedulerService) + - Integration tests for API endpoints + - End-to-end tests for workflows (login → scan → view → delete) +2. 
Create API documentation: + - Endpoint descriptions with request/response examples + - Authentication instructions + - Error code reference +3. Update README.md: + - Phase 2 features + - Installation instructions + - Configuration guide + - API usage examples +4. Create `PHASE2_COMPLETE.md` in `docs/ai/` +5. Update `docs/ai/ROADMAP.md` with completion status + +**Testing:** +- Run full test suite +- Achieve >80% code coverage +- Manual testing of all features +- Performance testing (multiple concurrent scans) + +**Deliverables:** +- Comprehensive test suite +- API documentation +- Updated user documentation +- Phase 2 completion summary + +--- + +## Dependencies & Prerequisites + +### Python Packages + +**Add to `requirements-web.txt`:** +``` +Flask-APScheduler==1.13.1 +``` + +**Already Present (from Phase 1):** +- Flask==3.0.0 +- Werkzeug==3.0.1 +- SQLAlchemy==2.0.23 +- alembic==1.13.0 +- Flask-Login==0.6.3 +- bcrypt==4.1.2 +- cryptography==41.0.7 +- Flask-CORS==4.0.0 +- marshmallow==3.20.1 +- marshmallow-sqlalchemy==0.29.0 +- APScheduler==3.10.4 +- Flask-Mail==0.9.1 +- python-dotenv==1.0.0 +- pytest==7.4.3 +- pytest-flask==1.3.0 + +### System Requirements + +- Python 3.12+ (development) +- Docker and Docker Compose (deployment) +- SQLite3 (database) +- Masscan, Nmap, Playwright (scanner dependencies - in Dockerfile) + +### Configuration Files + +**New file: `.env.example`** +```bash +# Flask configuration +FLASK_ENV=production +FLASK_DEBUG=false +FLASK_HOST=0.0.0.0 +FLASK_PORT=5000 + +# Database +DATABASE_URL=sqlite:////app/data/sneakyscanner.db + +# Security +SECRET_KEY=your-secret-key-here-change-in-production +SNEAKYSCANNER_ENCRYPTION_KEY=your-encryption-key-here + +# CORS (comma-separated origins) +CORS_ORIGINS=* + +# Logging +LOG_LEVEL=INFO + +# Scheduler +SCHEDULER_EXECUTORS=2 +SCHEDULER_JOB_DEFAULTS_MAX_INSTANCES=3 +``` + +--- + +## Testing Approach + +### Unit Tests + +**Framework:** pytest + +**Coverage Target:** >80% + +**Test Files:** +- 
`tests/test_scan_service.py` - ScanService methods +- `tests/test_scheduler_service.py` - SchedulerService methods +- `tests/test_validators.py` - Validation functions +- `tests/test_pagination.py` - Pagination helper + +**Approach:** +- Mock database calls with pytest fixtures +- Test each method independently +- Test edge cases and error conditions +- Use `@pytest.mark.parametrize` for multiple scenarios + +### Integration Tests + +**Test Files:** +- `tests/test_scan_api.py` - API endpoints with real database +- `tests/test_authentication.py` - Auth flow + +**Approach:** +- Use test database (separate from production) +- Test API endpoints end-to-end +- Verify database state after operations +- Test authentication required for protected routes + +### End-to-End Tests + +**Test Files:** +- `tests/test_workflows.py` - Complete user workflows + +**Scenarios:** +- Login → Trigger scan → View results → Delete scan → Logout +- API: Trigger scan → Poll status → Get results → Delete +- Background job execution → Database update → File creation + +**Approach:** +- Use pytest-flask for Flask testing +- Test both UI and API flows +- Verify files created/deleted +- Test concurrent operations + +### Manual Testing Checklist + +Phase 2 complete when all items checked: + +- [ ] Login with correct password succeeds +- [ ] Login with incorrect password fails +- [ ] Trigger scan via UI button works +- [ ] Trigger scan via API `POST /api/scans` works +- [ ] Scan runs in background (doesn't block) +- [ ] Scan status updates correctly (running → completed) +- [ ] View scan list with pagination +- [ ] View scan details page +- [ ] Delete scan removes DB records and files +- [ ] Logout destroys session +- [ ] Access protected route while logged out redirects to login +- [ ] Background scan completes successfully +- [ ] Background scan handles errors gracefully (e.g., invalid config) +- [ ] Multiple concurrent scans work +- [ ] Docker Compose deployment works +- [ ] Database persists 
across container restarts
+- [ ] Scan files persist in volume
+
+---
+
+## Potential Challenges & Solutions
+
+### Challenge 1: Scanner Integration with Background Jobs
+
+**Problem:** The scanner runs privileged operations (masscan/nmap require CAP_NET_RAW). How can it be executed from the web app?
+
+**Impact:** High - core functionality
+
+**Solution:**
+- Run scanner in subprocess with proper privileges
+- Use Docker's `--privileged` mode and `--network host`
+- Pass database URL to subprocess for status updates
+- Isolate scanner execution errors from web app
+- Use APScheduler's BackgroundScheduler (runs in threads)
+
+**Implementation:**
+```python
+# web/jobs/scan_job.py
+from sqlalchemy import create_engine
+from sqlalchemy.orm import sessionmaker
+from web.models import Scan
+from src.scanner import SneakyScanner
+from web.services.scan_service import ScanService
+
+def execute_scan(config_file, scan_id, db_url):
+    # Create new DB session for this thread
+    engine = create_engine(db_url)
+    Session = sessionmaker(bind=engine)
+    session = Session()
+
+    try:
+        # Update status
+        scan = session.get(Scan, scan_id)
+        scan.status = 'running'
+        session.commit()
+
+        # Run scanner (has privileged access via Docker)
+        scanner = SneakyScanner(config_file)
+        report, timestamp = scanner.scan()
+        scanner.generate_outputs(report, timestamp)
+
+        # Save to DB
+        scan_service = ScanService(session)
+        scan_service._save_scan_to_db(report, scan_id)
+    except Exception:
+        # Roll back first; `scan` may be unbound or stale if the failure came early
+        session.rollback()
+        scan = session.get(Scan, scan_id)
+        if scan is not None:
+            scan.status = 'failed'
+            session.commit()
+        raise
+    finally:
+        session.close()
+```
+
+### Challenge 2: Database Concurrency
+
+**Problem:** Background jobs and web requests accessing SQLite simultaneously can cause locking issues.
+
+**Impact:** Medium - affects reliability
+
+**Solution:**
+- Enable SQLite WAL (Write-Ahead Logging) mode for better concurrency
+- Use SQLAlchemy scoped sessions (already configured)
+- Add proper transaction handling and rollback
+- Use connection pool (already configured in app.py)
+- Consider timeout for busy database
+
+**Implementation:**
+```python
+# web/app.py - when creating engine
+from sqlalchemy import create_engine, event
+
+engine = create_engine(
+    app.config['SQLALCHEMY_DATABASE_URI'],
+    echo=app.debug,
+    pool_pre_ping=True,
+    pool_recycle=3600,
+    connect_args={'timeout': 15}  # 15 second timeout for locks (SQLite driver option)
+)
+
+# Enable WAL mode for SQLite
+if 'sqlite' in app.config['SQLALCHEMY_DATABASE_URI']:
+    @event.listens_for(engine, "connect")
+    def set_sqlite_pragma(dbapi_conn, connection_record):
+        # Runs once per new DBAPI connection
+        cursor = dbapi_conn.cursor()
+        cursor.execute("PRAGMA journal_mode=WAL")
+        cursor.close()
+```
+
+### Challenge 3: Scan Status Tracking
+
+**Problem:** Long-running scans (5-10 minutes) need to show progress while they run.
+
+**Impact:** Medium - affects UX
+
+**Solution:**
+- Store scan status in DB (running, completed, failed)
+- Implement polling endpoint `GET /api/scans//status`
+- Update status at each scan phase (ping, TCP, UDP, services, HTTP)
+- Dashboard polls every 5 seconds for running scans
+- Consider WebSocket for real-time updates (Phase 3 enhancement)
+
+**Implementation:**
+```javascript
+// Dashboard JavaScript - poll for status
+function pollScanStatus(scanId) {
+  const interval = setInterval(async () => {
+    const response = await fetch(`/api/scans/${scanId}/status`);
+    const data = await response.json();
+
+    if (data.status === 'completed' || data.status === 'failed') {
+      clearInterval(interval);
+      refreshScanList();
+    }
+
+    updateProgressBar(data.progress);
+  }, 5000); // Poll every 5 seconds
+}
+```
+
+### Challenge 4: File Cleanup on Scan Deletion
+
+**Problem:** The JSON, HTML, ZIP, and screenshot files must be deleted when a scan is deleted. Missing files shouldn't cause errors.
+
+**Impact:** Medium - affects storage and cleanup
+
+**Solution:**
+- Store all file paths in DB (already in schema)
+- Implement cleanup in `ScanService.delete_scan()`
+- Use pathlib for safe file operations
+- Handle missing files gracefully (log warning, continue)
+- Use database cascade deletion for related records
+- Delete screenshot directory recursively
+
+**Implementation:**
+```python
+# web/services/scan_service.py
+import logging
+import shutil
+from pathlib import Path
+
+from web.models import Scan
+
+logger = logging.getLogger(__name__)
+
+# Method of ScanService
+def delete_scan(self, scan_id):
+    scan = self.db.get(Scan, scan_id)
+    if not scan:
+        raise ValueError(f"Scan {scan_id} not found")
+
+    # Delete files (handle missing gracefully)
+    for file_path in [scan.json_path, scan.html_path, scan.zip_path]:
+        if file_path:
+            try:
+                Path(file_path).unlink()
+            except FileNotFoundError:
+                logger.warning(f"File not found: {file_path}")
+
+    # Delete screenshot directory
+    if scan.screenshot_dir:
+        try:
+            shutil.rmtree(scan.screenshot_dir)
+        except FileNotFoundError:
+            logger.warning(f"Directory not found: {scan.screenshot_dir}")
+
+    # Delete DB record (cascade handles relationships)
+    self.db.delete(scan)
+    self.db.commit()
+```
+
+### Challenge 5: Authentication for API
+
+**Problem:** API endpoints need protection, but programmatic access must still be possible. Session cookies don't work well for API clients.
+ +**Impact:** Medium - affects API usability + +**Solution:** +- Use Flask-Login sessions for both web UI and API in Phase 2 +- Require session cookie for API calls (works with curl -c/-b) +- Add `@api_auth_required` decorator to all endpoints +- Phase 5 will add token authentication for CLI client +- For now, document API usage with session cookies + +**Implementation:** +```python +# web/auth/decorators.py +from functools import wraps +from flask_login import current_user +from flask import jsonify + +def api_auth_required(f): + @wraps(f) + def decorated_function(*args, **kwargs): + if not current_user.is_authenticated: + return jsonify({'error': 'Authentication required'}), 401 + return f(*args, **kwargs) + return decorated_function + +# Usage in API +@bp.route('', methods=['POST']) +@api_auth_required +def trigger_scan(): + # Protected endpoint + pass +``` + +**API Usage Example:** +```bash +# Login first to get session cookie +curl -X POST http://localhost:5000/auth/login \ + -H "Content-Type: application/json" \ + -d '{"password":"yourpassword"}' \ + -c cookies.txt + +# Use cookie for API calls +curl -X POST http://localhost:5000/api/scans \ + -H "Content-Type: application/json" \ + -d '{"config_file":"/app/configs/example.yaml"}' \ + -b cookies.txt +``` + +### Challenge 6: Scanner Output Mapping to Database + +**Problem:** Complex nested JSON structure needs to map to normalized relational database schema. Many relationships to handle. 
+ +**Impact:** High - core functionality + +**Solution:** +- Create comprehensive `_map_report_to_models()` method +- Process in order: Scan → Sites → IPs → Ports → Services → Certificates → TLS Versions +- Use SQLAlchemy relationships for automatic FK handling +- Batch operations within single transaction +- Add detailed error logging for mapping issues +- Handle missing/optional fields gracefully + +**Implementation Strategy:** +```python +def _map_report_to_models(self, report, scan_obj): + """Map JSON report to database models.""" + + # 1. Process sites + for site_data in report['sites']: + site = ScanSite( + scan_id=scan_obj.id, + site_name=site_data['name'] + ) + self.db.add(site) + self.db.flush() # Get site.id + + # 2. Process IPs for this site + for ip_data in site_data['ips']: + ip = ScanIP( + scan_id=scan_obj.id, + site_id=site.id, + ip_address=ip_data['address'], + ping_expected=ip_data['expected']['ping'], + ping_actual=ip_data['actual']['ping'] + ) + self.db.add(ip) + self.db.flush() + + # 3. Process ports for this IP + for port_data in ip_data['actual']['tcp_ports']: + port = ScanPort( + scan_id=scan_obj.id, + ip_id=ip.id, + port=port_data['port'], + protocol='tcp', + expected=port_data.get('expected', False), + state='open' + ) + self.db.add(port) + self.db.flush() + + # 4. Process services for this port + service_data = port_data.get('service') + if service_data: + service = ScanService( + scan_id=scan_obj.id, + port_id=port.id, + service_name=service_data.get('name'), + product=service_data.get('product'), + version=service_data.get('version'), + # ... more fields + ) + self.db.add(service) + self.db.flush() + + # 5. Process certificate if HTTPS + cert_data = service_data.get('http_info', {}).get('certificate') + if cert_data: + cert = ScanCertificate( + scan_id=scan_obj.id, + service_id=service.id, + # ... cert fields + ) + self.db.add(cert) + self.db.flush() + + # 6. 
Process TLS versions + for tls_data in cert_data.get('tls_versions', []): + tls = ScanTLSVersion( + scan_id=scan_obj.id, + certificate_id=cert.id, + # ... tls fields + ) + self.db.add(tls) + + # Commit entire transaction + self.db.commit() +``` + +### Challenge 7: Long-Running Scans Timeout + +**Problem:** HTTP request might time out during long scans (5-10 minutes). Browser/client gives up waiting. + +**Impact:** High - affects UX + +**Solution:** +- Queue scan job immediately +- Return scan_id right away (within seconds) +- Client polls `GET /api/scans/<scan_id>/status` for progress +- Store scan status and progress in database +- Background job runs independently of HTTP connection +- Dashboard auto-refreshes scan list + +**Flow:** +1. User clicks "Run Scan" button +2. POST /api/scans → creates Scan record, queues job, returns scan_id +3. JavaScript starts polling status endpoint every 5 seconds +4. Background job runs scanner, updates status in DB +5. When status changes to 'completed', stop polling and refresh scan list + +--- + +## Success Criteria + +Phase 2 is **COMPLETE** when all criteria are met: + +### API Functionality +- [ ] `POST /api/scans` triggers background scan and returns scan_id +- [ ] `GET /api/scans` lists scans with pagination (page, per_page params) +- [ ] `GET /api/scans/<scan_id>` returns full scan details from database +- [ ] `DELETE /api/scans/<scan_id>` removes scan records and files +- [ ] `GET /api/scans/<scan_id>/status` shows current scan progress + +### Database Integration +- [ ] Scan results automatically saved to database after completion +- [ ] All relationships populated correctly (sites, IPs, ports, services, certs, TLS) +- [ ] Database queries work efficiently (indexes in place) +- [ ] Cascade deletion works for related records + +### Background Jobs +- [ ] Scans execute in background (don't block HTTP requests) +- [ ] Multiple scans can run concurrently +- [ ] Scan status updates correctly (running → completed/failed) +- [ ] Failed scans marked appropriately 
with error message + +### Authentication +- [ ] Login page renders and accepts password +- [ ] Successful login creates session and redirects to dashboard +- [ ] Invalid password shows error message +- [ ] Logout destroys session +- [ ] Protected routes require authentication +- [ ] API endpoints require authentication + +### User Interface +- [ ] Dashboard displays welcome message and stats +- [ ] Dashboard shows recent scans in table +- [ ] "Run Scan Now" button triggers scan +- [ ] Login page has clean design +- [ ] Templates use Bootstrap 5 dark theme +- [ ] Navigation works between pages + +### File Management +- [ ] JSON, HTML, ZIP files still generated (backward compatible) +- [ ] Screenshot directory created with images +- [ ] Files referenced correctly in database +- [ ] Delete scan removes all associated files + +### Deployment +- [ ] Docker Compose starts web app successfully +- [ ] Database persists across container restarts +- [ ] Scan files persist in mounted volume +- [ ] Healthcheck endpoint responds correctly +- [ ] Logs written to volume + +### Testing +- [ ] All unit tests pass +- [ ] All integration tests pass +- [ ] Test coverage >80% +- [ ] Manual testing checklist complete + +### Documentation +- [ ] API endpoints documented with examples +- [ ] README.md updated with Phase 2 features +- [ ] PHASE2_COMPLETE.md created +- [ ] ROADMAP.md updated + +--- + +## Migration Path + +### From Phase 1 to Phase 2 + +**No Breaking Changes:** +- Database schema already complete (Phase 1) +- Existing `scanner.py` code unchanged (backward compatible) +- YAML config format unchanged +- JSON/HTML/ZIP output format unchanged +- Docker deployment configuration compatible + +**Additions:** +- New API endpoint implementations (replace stubs) +- New service layer (ScanService, SchedulerService) +- New authentication system +- New UI templates +- Background job system + +**Migration Steps:** +1. Pull latest code +2. 
Install new dependency: `pip install Flask-APScheduler` +3. Run new Alembic migration: `alembic upgrade head` +4. Set application password if not set: `python3 init_db.py --password YOUR_PASSWORD` +5. Rebuild Docker image: `docker-compose -f docker-compose-web.yml build` +6. Start services: `docker-compose -f docker-compose-web.yml up -d` + +### Backward Compatibility + +**CLI Scanner:** +- Continues to work standalone: `python3 src/scanner.py configs/example.yaml` +- Still generates JSON/HTML/ZIP files +- No changes to command-line interface + +**Existing Scans:** +- Old scan JSON files not automatically imported to database +- Can be imported manually if needed (not in Phase 2 scope) +- New scans saved to both files and database + +**Configuration:** +- Existing YAML configs work without modification +- Settings from Phase 1 preserved +- No config changes required + +--- + +## Estimated Timeline + +**Total Duration:** 14 days (2 weeks) + +### Week 1: Backend Foundation +- **Days 1-2:** Database & Service Layer + - ScanService implementation + - Database mapping logic + - Unit tests + +- **Days 3-4:** Scan API Endpoints + - All 5 endpoints implemented + - Input validation + - Integration tests + +- **Days 5-6:** Background Job Queue + - APScheduler integration + - Job execution logic + - Concurrent scan testing + +- **Day 7:** Authentication System (Part 1) + - Flask-Login setup + - User model + - Login/logout routes + +### Week 2: Frontend & Polish +- **Day 8:** Authentication System (Part 2) + - Decorators + - Apply to all endpoints + - Authentication tests + +- **Days 9-10:** Basic UI Templates + - Base template + - Login page + - Dashboard + - Web routes + +- **Day 11:** Docker & Deployment + - Docker Compose updates + - Deployment testing + - Production configuration + +- **Day 12:** Error Handling & Logging + - Error pages + - Logging enhancements + - Error scenarios testing + +- **Days 13-14:** Testing & Documentation + - Complete test suite + - API 
documentation + - README updates + - PHASE2_COMPLETE.md + +### Critical Path +The critical path (tasks that must complete before others): +1. Service Layer (Days 1-2) → Everything depends on this +2. API Endpoints (Days 3-4) → Required for UI and background jobs +3. Background Jobs (Days 5-6) → Required for async scan execution +4. Authentication (Days 7-8) → Required for security + +UI templates and documentation can proceed in parallel after Day 8. + +--- + +## Key Design Decisions + +### Decision 1: Background Job Processing + +**Choice:** APScheduler BackgroundScheduler + +**Alternatives Considered:** +- Celery with Redis/RabbitMQ +- Python threading module +- Subprocess with cron + +**Rationale:** +- APScheduler is lightweight (no external dependencies) +- BackgroundScheduler runs in threads (simple, no message broker needed) +- Sufficient for single-user application +- Can handle concurrent scans +- Easy to integrate with Flask +- Meets all Phase 2 requirements + +**Trade-offs:** +- ✅ Simple deployment (no Redis needed) +- ✅ Low resource usage +- ✅ Built-in job scheduling +- ❌ Less scalable than Celery (but not needed for single-user) +- ❌ Jobs lost if app crashes (acceptable for this use case) + +### Decision 2: Authentication System + +**Choice:** Flask-Login with single-user password + +**Alternatives Considered:** +- Multi-user with SQLite user table +- JWT tokens +- Basic HTTP auth +- No authentication + +**Rationale:** +- Simple and meets requirements (single-user, self-hosted) +- Flask-Login is well-maintained and integrated with Flask +- Session-based auth works for both UI and API +- Sufficient security for local/internal deployment +- Easy to implement and test + +**Trade-offs:** +- ✅ Simple implementation +- ✅ Works for UI and API +- ✅ Secure (bcrypt password hashing) +- ❌ Not suitable for multi-user (not a requirement) +- ❌ Session cookies don't work well for CLI clients (Phase 5 adds tokens) + +### Decision 3: Database Storage Strategy + 
+**Choice:** Store complete scan results in normalized database schema + +**Alternatives Considered:** +- Store only metadata, keep JSON files for details +- Store JSON blob in database +- Hybrid approach (metadata in DB, details in files) + +**Rationale:** +- Enables powerful queries (find all scans with cert expiring in 30 days) +- Required for trending and comparison features (Phase 4) +- Normalized schema is more flexible for future features +- Small storage overhead acceptable (scans are small) +- Still generate JSON/HTML/ZIP for backward compatibility + +**Trade-offs:** +- ✅ Enables advanced queries +- ✅ Required for Phase 3-4 features +- ✅ Clean separation of concerns +- ❌ More complex mapping logic +- ❌ Slightly larger database size (minimal impact) + +### Decision 4: Scanner Execution Model + +**Choice:** Execute scanner in subprocess from web app + +**Alternatives Considered:** +- Refactor scanner into library (import directly) +- Separate scanner service (microservice) +- CLI wrapper + +**Rationale:** +- Maintains isolation (scanner errors don't crash web app) +- Reuses existing scanner code (no refactoring needed) +- Handles privileged operations via Docker +- Simple to implement +- Backward compatible with CLI usage + +**Trade-offs:** +- ✅ Isolation and stability +- ✅ Reuses existing code +- ✅ Backward compatible +- ❌ Slightly more overhead than library import (minimal) +- ❌ Inter-process communication needed (solved with DB) + +### Decision 5: API Authentication (Phase 2) + +**Choice:** Session-based authentication via Flask-Login + +**Alternatives Considered:** +- API tokens (Bearer authentication) +- OAuth2 +- No authentication for API, only UI + +**Rationale:** +- Consistent with web UI authentication +- Simple to implement (already using Flask-Login) +- Works for testing and initial API usage +- Phase 5 will add token authentication for CLI client +- Secure enough for single-user self-hosted deployment + +**Trade-offs:** +- ✅ Consistent with UI 
auth +- ✅ Simple implementation +- ✅ Secure for intended use case +- ❌ Not ideal for programmatic access (Phase 5 improvement) +- ❌ Requires cookie management in API clients + +**API Usage Pattern (Phase 2):** +```bash +# Login to get session cookie +curl -X POST http://localhost:5000/auth/login \ + -H "Content-Type: application/json" \ + -d '{"password":"yourpass"}' \ + -c cookies.txt + +# Use session cookie for API calls +curl -X GET http://localhost:5000/api/scans \ + -b cookies.txt +``` + +--- + +## Documentation Deliverables + +### 1. API Documentation (`docs/ai/API_REFERENCE.md`) + +**Contents:** +- Endpoint reference (all 5 scan endpoints) +- Request/response examples +- Authentication instructions +- Error codes and messages +- Pagination parameters +- Status codes + +**Example:** +````markdown +## POST /api/scans + +Trigger a new scan. + +**Authentication:** Required (session cookie) + +**Request Body:** +```json +{ + "config_file": "/app/configs/example.yaml" +} +``` + +**Response (201 Created):** +```json +{ + "scan_id": 42, + "status": "running", + "message": "Scan queued successfully" +} +``` +```` + +### 2. Deployment Guide (`docs/ai/DEPLOYMENT.md`) + +**Contents:** +- Docker Compose setup +- Environment variables +- Volume configuration +- Database initialization +- First-time setup +- Troubleshooting + +### 3. Developer Guide (`docs/ai/DEVELOPMENT.md`) + +**Contents:** +- Project structure +- Architecture overview +- Database schema +- Service layer design +- Adding new features +- Running tests +- Code style guide + +### 4. User Guide (`README.md` updates) + +**Contents:** +- Phase 2 features +- Web UI usage +- API usage +- Configuration +- Common tasks +- FAQ + +### 5. 
Phase 2 Completion Summary (`docs/ai/PHASE2_COMPLETE.md`) + +**Contents:** +- What was delivered +- Success criteria checklist +- Known limitations +- Next steps (Phase 3) +- Migration instructions +- Testing results + +--- + +## Appendix: Example API Calls + +### Authentication + +```bash +# Login +curl -X POST http://localhost:5000/auth/login \ + -H "Content-Type: application/json" \ + -d '{"password":"yourpassword"}' \ + -c cookies.txt + +# Logout +curl -X GET http://localhost:5000/auth/logout \ + -b cookies.txt +``` + +### Scan Management + +```bash +# Trigger scan +curl -X POST http://localhost:5000/api/scans \ + -H "Content-Type: application/json" \ + -d '{"config_file":"/app/configs/example.yaml"}' \ + -b cookies.txt + +# List scans (with pagination) +curl -X GET "http://localhost:5000/api/scans?page=1&per_page=20" \ + -b cookies.txt + +# Get scan details +curl -X GET http://localhost:5000/api/scans/42 \ + -b cookies.txt + +# Get scan status +curl -X GET http://localhost:5000/api/scans/42/status \ + -b cookies.txt + +# Delete scan +curl -X DELETE http://localhost:5000/api/scans/42 \ + -b cookies.txt +``` + +### Example Responses + +**GET /api/scans/42:** +```json +{ + "id": 42, + "timestamp": "2025-11-14T10:30:00Z", + "duration": 125.5, + "status": "completed", + "title": "Production Network Scan", + "config_file": "/app/configs/production.yaml", + "json_path": "/app/output/scan_report_20251114_103000.json", + "html_path": "/app/output/scan_report_20251114_103000.html", + "zip_path": "/app/output/scan_report_20251114_103000.zip", + "screenshot_dir": "/app/output/scan_report_20251114_103000_screenshots", + "triggered_by": "api", + "sites": [ + { + "id": 101, + "name": "Production DC", + "ips": [ + { + "id": 201, + "address": "192.168.1.10", + "ping_expected": true, + "ping_actual": true, + "ports": [ + { + "id": 301, + "port": 443, + "protocol": "tcp", + "state": "open", + "expected": true, + "services": [ + { + "id": 401, + "service_name": "https", + 
"product": "nginx", + "version": "1.24.0", + "http_protocol": "https", + "screenshot_path": "scan_report_20251114_103000_screenshots/192_168_1_10_443.png" + } + ] + } + ] + } + ] + } + ] +} +``` + +**GET /api/scans/42/status:** +```json +{ + "scan_id": 42, + "status": "running", + "progress": 60, + "current_phase": "service_detection", + "started_at": "2025-11-14T10:30:00Z" +} +``` + +--- + +**End of Phase 2 Plan** + +This plan will be followed during Phase 2 implementation. Upon completion, a new document `PHASE2_COMPLETE.md` will summarize actual implementation, challenges encountered, and lessons learned.