Phase 2 Step 3: Implement Background Job Queue

Implemented APScheduler integration for background scan execution,
enabling async job processing without blocking HTTP requests.

## Changes

### Background Jobs (web/jobs/)
- scan_job.py - Execute scans in background threads (sketched below)
  - execute_scan() with isolated database sessions
  - Comprehensive error handling and logging
  - Scan status lifecycle tracking
  - Timing and error message storage
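
A condensed sketch of the job function, assuming a `SessionLocal` session factory and a `run_scan()` wrapper around the scanner (both are placeholder names; only `execute_scan()` and the `Scan` fields are from this commit):

```python
# web/jobs/scan_job.py -- sketch; SessionLocal and run_scan are assumed names
import logging
from datetime import datetime

from web.database import SessionLocal  # hypothetical session factory
from web.jobs.runner import run_scan   # hypothetical scanner wrapper
from web.models import Scan

logger = logging.getLogger(__name__)


def execute_scan(scan_id: int, config_file: str) -> None:
    """Run one scan in a background thread with its own database session."""
    db = SessionLocal()  # isolated session -- never shared with request threads
    scan = db.query(Scan).get(scan_id)
    try:
        scan.status = 'running'
        scan.started_at = datetime.utcnow()
        db.commit()

        run_scan(config_file)  # the long-running scanner work

        scan.status = 'completed'
        scan.completed_at = datetime.utcnow()
        db.commit()
    except Exception as exc:
        logger.exception(f"Scan {scan_id} failed")
        scan.status = 'failed'
        scan.error_message = str(exc)
        scan.completed_at = datetime.utcnow()
        db.commit()
    finally:
        db.close()
```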

### Scheduler Service (web/services/scheduler_service.py)
- SchedulerService class for job management (sketched below)
- APScheduler BackgroundScheduler integration
- ThreadPoolExecutor for concurrent jobs (max 3 workers)
- queue_scan() - Immediate job execution
- Job monitoring: list_jobs(), get_job_status()
- Graceful shutdown handling
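
A sketch of the service built on APScheduler's public API; the method bodies are illustrative, not the exact implementation:

```python
# web/services/scheduler_service.py -- sketch using APScheduler's public API
from apscheduler.executors.pool import ThreadPoolExecutor
from apscheduler.schedulers.background import BackgroundScheduler

from web.jobs.scan_job import execute_scan


class SchedulerService:
    def __init__(self, max_workers: int = 3):
        # BackgroundScheduler runs jobs in a thread pool without blocking Flask
        self._scheduler = BackgroundScheduler(
            executors={'default': ThreadPoolExecutor(max_workers)}
        )
        self._scheduler.start()

    def queue_scan(self, scan_id: int, config_file: str) -> str:
        job_id = f"scan_{scan_id}"  # job ID pattern from the notes below
        # With no trigger, APScheduler runs the job once, immediately
        self._scheduler.add_job(
            execute_scan, args=[scan_id, config_file], id=job_id
        )
        return job_id

    def list_jobs(self):
        return [job.id for job in self._scheduler.get_jobs()]

    def get_job_status(self, job_id: str):
        # Pending jobs are in the job store; finished jobs have been removed
        return 'pending' if self._scheduler.get_job(job_id) else None

    def shutdown(self):
        self._scheduler.shutdown(wait=True)  # graceful: waits for running jobs
```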

### Flask Integration (web/app.py)
- init_scheduler() function (sketched below)
- Scheduler initialization in app factory
- Scheduler stored on the app object (app.scheduler)
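
In the app factory this wiring would look roughly like the following sketch; everything in `create_app()` besides the scheduler is elided:

```python
# web/app.py -- sketch of the scheduler wiring
import atexit

from flask import Flask

from web.services.scheduler_service import SchedulerService


def init_scheduler(app: Flask) -> None:
    """Create the scheduler and store it on the app for request handlers."""
    app.scheduler = SchedulerService(max_workers=3)
    # Shut down cleanly when the process exits (waits for running jobs)
    atexit.register(app.scheduler.shutdown)


def create_app() -> Flask:
    app = Flask(__name__)
    # ... config, database, blueprints ...
    init_scheduler(app)
    return app
```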

### Database Schema (migration 003)
- Added scan timing fields (migration sketch below):
  - started_at - Scan execution start time
  - completed_at - Scan execution completion time
  - error_message - Error details for failed scans
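
The migration framework isn't shown in this excerpt; assuming Alembic and a table named `scans`, migration 003 would look roughly like:

```python
# migrations/versions/003_scan_timing.py -- sketch; Alembic and the
# 'scans' table name are assumptions
import sqlalchemy as sa
from alembic import op


def upgrade():
    op.add_column('scans', sa.Column('started_at', sa.DateTime(), nullable=True))
    op.add_column('scans', sa.Column('completed_at', sa.DateTime(), nullable=True))
    op.add_column('scans', sa.Column('error_message', sa.Text(), nullable=True))


def downgrade():
    op.drop_column('scans', 'error_message')
    op.drop_column('scans', 'completed_at')
    op.drop_column('scans', 'started_at')
```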

### Service Layer Updates (web/services/scan_service.py)
- trigger_scan() accepts scheduler parameter
- Queues background jobs after creating scan record
- get_scan_status() includes new timing and error fields
- _save_scan_to_db() sets completed_at timestamp

### API Updates (web/api/scans.py)
- POST /api/scans passes scheduler to trigger_scan()
- Scans now execute in the background automatically (example below)
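
The handler change amounts to passing the app-level scheduler through; a sketch, where the blueprint name, request shape, and `get_db_session()` are assumptions:

```python
# web/api/scans.py -- sketch of the POST handler
from flask import Blueprint, current_app, jsonify, request

from web.database import get_db_session  # hypothetical session accessor
from web.services.scan_service import ScanService

scans_bp = Blueprint('scans', __name__)


@scans_bp.route('/api/scans', methods=['POST'])
def create_scan():
    data = request.get_json()
    service = ScanService(get_db_session())
    # Passing the scheduler makes trigger_scan() queue the background job
    scan_id = service.trigger_scan(
        data['config_file'],
        triggered_by='api',
        scheduler=current_app.scheduler,
    )
    return jsonify({'scan_id': scan_id}), 202
```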

### Model Updates (web/models.py)
- Added started_at, completed_at, error_message to Scan model (sketch below)
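
On the model the three fields map directly to columns; a sketch showing only the relevant pieces (the declarative base and table name are assumptions):

```python
# web/models.py -- sketch of the Scan model's new columns
from sqlalchemy import Column, DateTime, Integer, String, Text

from web.database import Base  # hypothetical declarative base


class Scan(Base):
    __tablename__ = 'scans'

    id = Column(Integer, primary_key=True)
    status = Column(String(20), default='created')
    # New in this step:
    started_at = Column(DateTime, nullable=True)    # set when the job starts
    completed_at = Column(DateTime, nullable=True)  # set on completion or failure
    error_message = Column(Text, nullable=True)     # populated for failed scans
```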

### Testing (tests/test_background_jobs.py)
- 13 unit tests for background job execution (one sketched below)
- Scheduler initialization and configuration tests
- Job queuing and status tracking tests
- Scan timing field tests
- Error handling and storage tests
- Integration test for full workflow (skipped by default)
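
As an example of the error-handling coverage, one such test might look like this sketch; the fixture names and patched target are assumptions, and the real suite differs:

```python
# tests/test_background_jobs.py -- sketch of one error-handling test
from unittest.mock import patch

from web.jobs.scan_job import execute_scan


def test_failed_scan_stores_error_message(db_session, scan_factory):
    # db_session / scan_factory are hypothetical fixtures wired to a test DB
    scan = scan_factory(status='created')

    with patch('web.jobs.scan_job.run_scan', side_effect=RuntimeError('boom')):
        execute_scan(scan.id, 'config.yaml')

    db_session.refresh(scan)
    assert scan.status == 'failed'
    assert 'boom' in scan.error_message
    assert scan.completed_at is not None
```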

## Features

- Async scan execution without blocking HTTP requests
- Concurrent scan support (configurable max workers)
- Isolated database sessions per background thread
- Scan lifecycle tracking: created → running → completed/failed
- Error messages captured and stored in database
- Job monitoring and management capabilities
- Graceful shutdown waits for running jobs

## Implementation Notes

- Scanner runs in a subprocess spawned from the background thread (see the sketch after this list)
- Docker provides the necessary privileges (--privileged, --network host)
- Each job gets an isolated SQLAlchemy session to avoid locking
- Job IDs follow pattern: scan_{scan_id}
- Background jobs outlive the HTTP request that triggered them
- Failed jobs store error messages in database
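
The subprocess invocation inside the job can be as simple as the following sketch; the module name and CLI flags are assumptions (the Docker flags above apply to the container, not this call):

```python
# Sketch of the scanner subprocess call made from the background thread
import subprocess


def run_scan(config_file: str) -> None:
    result = subprocess.run(
        ['python', '-m', 'scanner', '--config', config_file],  # assumed CLI
        capture_output=True,
        text=True,
    )
    if result.returncode != 0:
        # Raise so execute_scan() marks the scan failed and stores the message
        raise RuntimeError(result.stderr.strip() or 'scanner exited non-zero')
```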

## Documentation (docs/ai/PHASE2.md)
- Updated progress: 6/14 days complete (43%)
- Marked Step 3 as complete
- Added detailed implementation notes

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>

## Diff: web/services/scan_service.py

```diff
@@ -42,7 +42,7 @@ class ScanService:
         self.db = db_session
 
     def trigger_scan(self, config_file: str, triggered_by: str = 'manual',
-                     schedule_id: Optional[int] = None) -> int:
+                     schedule_id: Optional[int] = None, scheduler=None) -> int:
         """
         Trigger a new scan.
 
@@ -53,6 +53,7 @@ class ScanService:
             config_file: Path to YAML configuration file
             triggered_by: Source that triggered scan (manual, scheduled, api)
             schedule_id: Optional schedule ID if triggered by schedule
+            scheduler: Optional SchedulerService instance for queuing background jobs
 
         Returns:
             Scan ID of the created scan
@@ -87,8 +88,21 @@ class ScanService:
         logger.info(f"Scan {scan.id} triggered via {triggered_by}")
 
-        # NOTE: Background job queuing will be implemented in Step 3
-        # For now, just return the scan ID
+        # Queue background job if scheduler provided
+        if scheduler:
+            try:
+                job_id = scheduler.queue_scan(scan.id, config_file)
+                logger.info(f"Scan {scan.id} queued for background execution (job_id={job_id})")
+            except Exception as e:
+                logger.error(f"Failed to queue scan {scan.id}: {str(e)}")
+                # Mark scan as failed if job queuing fails
+                scan.status = 'failed'
+                scan.error_message = f"Failed to queue background job: {str(e)}"
+                self.db.commit()
+                raise
+        else:
+            logger.warning(f"Scan {scan.id} created but not queued (no scheduler provided)")
 
         return scan.id
 
     def get_scan(self, scan_id: int) -> Optional[Dict[str, Any]]:
@@ -230,7 +244,9 @@ class ScanService:
             'scan_id': scan.id,
             'status': scan.status,
             'title': scan.title,
-            'started_at': scan.timestamp.isoformat() if scan.timestamp else None,
+            'timestamp': scan.timestamp.isoformat() if scan.timestamp else None,
+            'started_at': scan.started_at.isoformat() if scan.started_at else None,
+            'completed_at': scan.completed_at.isoformat() if scan.completed_at else None,
             'duration': scan.duration,
             'triggered_by': scan.triggered_by
         }
@@ -242,6 +258,7 @@ class ScanService:
             status_info['progress'] = 'Complete'
         elif scan.status == 'failed':
             status_info['progress'] = 'Failed'
+            status_info['error_message'] = scan.error_message
 
         return status_info
@@ -265,6 +282,7 @@ class ScanService:
         # Update scan record
         scan.status = status
         scan.duration = report.get('scan_duration')
+        scan.completed_at = datetime.utcnow()
 
         # Map report data to database models
         self._map_report_to_models(report, scan)
```