Phase 2 Step 3: Implement Background Job Queue

Implemented APScheduler integration for background scan execution, enabling async job processing without blocking HTTP requests. ## Changes ### Background Jobs (web/jobs/) - scan_job.py - Execute scans in background threads - execute_scan() with isolated database sessions - Comprehensive error handling and logging - Scan status lifecycle tracking - Timing and error message storage ### Scheduler Service (web/services/scheduler_service.py) - SchedulerService class for job management - APScheduler BackgroundScheduler integration - ThreadPoolExecutor for concurrent jobs (max 3 workers) - queue_scan() - Immediate job execution - Job monitoring: list_jobs(), get_job_status() - Graceful shutdown handling ### Flask Integration (web/app.py) - init_scheduler() function - Scheduler initialization in app factory - Stored scheduler in app context (app.scheduler) ### Database Schema (migration 003) - Added scan timing fields: - started_at - Scan execution start time - completed_at - Scan execution completion time - error_message - Error details for failed scans ### Service Layer Updates (web/services/scan_service.py) - trigger_scan() accepts scheduler parameter - Queues background jobs after creating scan record - get_scan_status() includes new timing and error fields - _save_scan_to_db() sets completed_at timestamp ### API Updates (web/api/scans.py) - POST /api/scans passes scheduler to trigger_scan() - Scans now execute in background automatically ### Model Updates (web/models.py) - Added started_at, completed_at, error_message to Scan model ### Testing (tests/test_background_jobs.py) - 13 unit tests for background job execution - Scheduler initialization and configuration tests - Job queuing and status tracking tests - Scan timing field tests - Error handling and storage tests - Integration test for full workflow (skipped by default) ## Features - Async scan execution without blocking HTTP requests - Concurrent scan support (configurable max workers) - Isolated database sessions per background thread - Scan lifecycle tracking: created → running → completed/failed - Error messages captured and stored in database - Job monitoring and management capabilities - Graceful shutdown waits for running jobs ## Implementation Notes - Scanner runs in subprocess from background thread - Docker provides necessary privileges (--privileged, --network host) - Each job gets isolated SQLAlchemy session (avoid locking) - Job IDs follow pattern: scan_{scan_id} - Background jobs survive across requests - Failed jobs store error messages in database ## Documentation (docs/ai/PHASE2.md) - Updated progress: 6/14 days complete (43%) - Marked Step 3 as complete - Added detailed implementation notes 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <noreply@anthropic.com>
2025-11-14 09:24:00 -06:00
parent 6c4905d6c1
commit ee0c5a2c3c
10 changed files with 810 additions and 33 deletions
--- a/migrations/versions/003_add_scan_timing_fields.py
+++ b/migrations/versions/003_add_scan_timing_fields.py
@@ -0,0 +1,39 @@
+"""Add timing and error fields to scans table
+
+Revision ID: 003
+Revises: 002
+Create Date: 2025-11-14
+
+"""
+from alembic import op
+import sqlalchemy as sa
+
+
+# revision identifiers, used by Alembic
+revision = '003'
+down_revision = '002'
+branch_labels = None
+depends_on = None
+
+
+def upgrade():
+    """
+    Add fields for tracking scan execution timing and errors.
+
+    New fields:
+    - started_at: When scan execution actually started
+    - completed_at: When scan execution finished (success or failure)
+    - error_message: Error message if scan failed
+    """
+    with op.batch_alter_table('scans') as batch_op:
+        batch_op.add_column(sa.Column('started_at', sa.DateTime(), nullable=True, comment='Scan execution start time'))
+        batch_op.add_column(sa.Column('completed_at', sa.DateTime(), nullable=True, comment='Scan execution completion time'))
+        batch_op.add_column(sa.Column('error_message', sa.Text(), nullable=True, comment='Error message if scan failed'))
+
+
+def downgrade():
+    """Remove the timing and error fields."""
+    with op.batch_alter_table('scans') as batch_op:
+        batch_op.drop_column('error_message')
+        batch_op.drop_column('completed_at')
+        batch_op.drop_column('started_at')