Phase 2 Step 3: Implement Background Job Queue

Implemented APScheduler integration for background scan execution,
enabling async job processing without blocking HTTP requests.

## Changes

### Background Jobs (web/jobs/)
- scan_job.py - Execute scans in background threads (sketched below)
  - execute_scan() with isolated database sessions
  - Comprehensive error handling and logging
  - Scan status lifecycle tracking
  - Timing and error message storage
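
A condensed sketch of the job function, assuming a `SessionLocal` session factory and a `run_scan()` wrapper around the scanner (both are placeholder names; only `execute_scan()` and the `Scan` fields are from this commit):

```python
# web/jobs/scan_job.py -- sketch; SessionLocal and run_scan are assumed names
import logging
from datetime import datetime

from web.database import SessionLocal  # hypothetical session factory
from web.jobs.runner import run_scan   # hypothetical scanner wrapper
from web.models import Scan

logger = logging.getLogger(__name__)


def execute_scan(scan_id: int, config_file: str) -> None:
    """Run one scan in a background thread with its own database session."""
    db = SessionLocal()  # isolated session -- never shared with request threads
    scan = db.query(Scan).get(scan_id)
    try:
        scan.status = 'running'
        scan.started_at = datetime.utcnow()
        db.commit()

        run_scan(config_file)  # the long-running scanner work

        scan.status = 'completed'
        scan.completed_at = datetime.utcnow()
        db.commit()
    except Exception as exc:
        logger.exception(f"Scan {scan_id} failed")
        scan.status = 'failed'
        scan.error_message = str(exc)
        scan.completed_at = datetime.utcnow()
        db.commit()
    finally:
        db.close()
```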

### Scheduler Service (web/services/scheduler_service.py)
- SchedulerService class for job management (sketched below)
- APScheduler BackgroundScheduler integration
- ThreadPoolExecutor for concurrent jobs (max 3 workers)
- queue_scan() - Immediate job execution
- Job monitoring: list_jobs(), get_job_status()
- Graceful shutdown handling
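
A sketch of the service built on APScheduler's public API; the method bodies are illustrative, not the exact implementation:

```python
# web/services/scheduler_service.py -- sketch using APScheduler's public API
from apscheduler.executors.pool import ThreadPoolExecutor
from apscheduler.schedulers.background import BackgroundScheduler

from web.jobs.scan_job import execute_scan


class SchedulerService:
    def __init__(self, max_workers: int = 3):
        # BackgroundScheduler runs jobs in a thread pool without blocking Flask
        self._scheduler = BackgroundScheduler(
            executors={'default': ThreadPoolExecutor(max_workers)}
        )
        self._scheduler.start()

    def queue_scan(self, scan_id: int, config_file: str) -> str:
        job_id = f"scan_{scan_id}"  # job ID pattern from the notes below
        # With no trigger, APScheduler runs the job once, immediately
        self._scheduler.add_job(
            execute_scan, args=[scan_id, config_file], id=job_id
        )
        return job_id

    def list_jobs(self):
        return [job.id for job in self._scheduler.get_jobs()]

    def get_job_status(self, job_id: str):
        # Pending jobs are in the job store; finished jobs have been removed
        return 'pending' if self._scheduler.get_job(job_id) else None

    def shutdown(self):
        self._scheduler.shutdown(wait=True)  # graceful: waits for running jobs
```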

### Flask Integration (web/app.py)
- init_scheduler() function (sketched below)
- Scheduler initialization in app factory
- Scheduler stored on the app object (app.scheduler)
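
In the app factory this wiring would look roughly like the following sketch; everything in `create_app()` besides the scheduler is elided:

```python
# web/app.py -- sketch of the scheduler wiring
import atexit

from flask import Flask

from web.services.scheduler_service import SchedulerService


def init_scheduler(app: Flask) -> None:
    """Create the scheduler and store it on the app for request handlers."""
    app.scheduler = SchedulerService(max_workers=3)
    # Shut down cleanly when the process exits (waits for running jobs)
    atexit.register(app.scheduler.shutdown)


def create_app() -> Flask:
    app = Flask(__name__)
    # ... config, database, blueprints ...
    init_scheduler(app)
    return app
```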

### Database Schema (migration 003)
- Added scan timing fields (migration sketch below):
  - started_at - Scan execution start time
  - completed_at - Scan execution completion time
  - error_message - Error details for failed scans
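
The migration framework isn't shown in this excerpt; assuming Alembic and a table named `scans`, migration 003 would look roughly like:

```python
# migrations/versions/003_scan_timing.py -- sketch; Alembic and the
# 'scans' table name are assumptions
import sqlalchemy as sa
from alembic import op


def upgrade():
    op.add_column('scans', sa.Column('started_at', sa.DateTime(), nullable=True))
    op.add_column('scans', sa.Column('completed_at', sa.DateTime(), nullable=True))
    op.add_column('scans', sa.Column('error_message', sa.Text(), nullable=True))


def downgrade():
    op.drop_column('scans', 'error_message')
    op.drop_column('scans', 'completed_at')
    op.drop_column('scans', 'started_at')
```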

### Service Layer Updates (web/services/scan_service.py)
- trigger_scan() accepts scheduler parameter
- Queues background jobs after creating scan record
- get_scan_status() includes new timing and error fields
- _save_scan_to_db() sets completed_at timestamp

### API Updates (web/api/scans.py)
- POST /api/scans passes scheduler to trigger_scan()
- Scans now execute in the background automatically (example below)
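
The handler change amounts to passing the app-level scheduler through; a sketch, where the blueprint name, request shape, and `get_db_session()` are assumptions:

```python
# web/api/scans.py -- sketch of the POST handler
from flask import Blueprint, current_app, jsonify, request

from web.database import get_db_session  # hypothetical session accessor
from web.services.scan_service import ScanService

scans_bp = Blueprint('scans', __name__)


@scans_bp.route('/api/scans', methods=['POST'])
def create_scan():
    data = request.get_json()
    service = ScanService(get_db_session())
    # Passing the scheduler makes trigger_scan() queue the background job
    scan_id = service.trigger_scan(
        data['config_file'],
        triggered_by='api',
        scheduler=current_app.scheduler,
    )
    return jsonify({'scan_id': scan_id}), 202
```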

### Model Updates (web/models.py)
- Added started_at, completed_at, error_message to Scan model (sketch below)
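
On the model the three fields map directly to columns; a sketch showing only the relevant pieces (the declarative base and table name are assumptions):

```python
# web/models.py -- sketch of the Scan model's new columns
from sqlalchemy import Column, DateTime, Integer, String, Text

from web.database import Base  # hypothetical declarative base


class Scan(Base):
    __tablename__ = 'scans'

    id = Column(Integer, primary_key=True)
    status = Column(String(20), default='created')
    # New in this step:
    started_at = Column(DateTime, nullable=True)    # set when the job starts
    completed_at = Column(DateTime, nullable=True)  # set on completion or failure
    error_message = Column(Text, nullable=True)     # populated for failed scans
```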

### Testing (tests/test_background_jobs.py)
- 13 unit tests for background job execution (one sketched below)
- Scheduler initialization and configuration tests
- Job queuing and status tracking tests
- Scan timing field tests
- Error handling and storage tests
- Integration test for full workflow (skipped by default)
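
As an example of the error-handling coverage, one such test might look like this sketch; the fixture names and patched target are assumptions, and the real suite differs:

```python
# tests/test_background_jobs.py -- sketch of one error-handling test
from unittest.mock import patch

from web.jobs.scan_job import execute_scan


def test_failed_scan_stores_error_message(db_session, scan_factory):
    # db_session / scan_factory are hypothetical fixtures wired to a test DB
    scan = scan_factory(status='created')

    with patch('web.jobs.scan_job.run_scan', side_effect=RuntimeError('boom')):
        execute_scan(scan.id, 'config.yaml')

    db_session.refresh(scan)
    assert scan.status == 'failed'
    assert 'boom' in scan.error_message
    assert scan.completed_at is not None
```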

## Features

- Async scan execution without blocking HTTP requests
- Concurrent scan support (configurable max workers)
- Isolated database sessions per background thread
- Scan lifecycle tracking: created → running → completed/failed
- Error messages captured and stored in database
- Job monitoring and management capabilities
- Graceful shutdown waits for running jobs

## Implementation Notes

- Scanner runs in a subprocess spawned from the background thread (see the sketch after this list)
- Docker provides the necessary privileges (--privileged, --network host)
- Each job gets an isolated SQLAlchemy session to avoid locking
- Job IDs follow pattern: scan_{scan_id}
- Background jobs outlive the HTTP request that triggered them
- Failed jobs store error messages in database
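
The subprocess invocation inside the job can be as simple as the following sketch; the module name and CLI flags are assumptions (the Docker flags above apply to the container, not this call):

```python
# Sketch of the scanner subprocess call made from the background thread
import subprocess


def run_scan(config_file: str) -> None:
    result = subprocess.run(
        ['python', '-m', 'scanner', '--config', config_file],  # assumed CLI
        capture_output=True,
        text=True,
    )
    if result.returncode != 0:
        # Raise so execute_scan() marks the scan failed and stores the message
        raise RuntimeError(result.stderr.strip() or 'scanner exited non-zero')
```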

## Documentation (docs/ai/PHASE2.md)
- Updated progress: 6/14 days complete (43%)
- Marked Step 3 as complete
- Added detailed implementation notes

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>

## Diff: web/services/scan_service.py

```diff
@@ -42,7 +42,7 @@ class ScanService:
         self.db = db_session
 
     def trigger_scan(self, config_file: str, triggered_by: str = 'manual',
-                     schedule_id: Optional[int] = None) -> int:
+                     schedule_id: Optional[int] = None, scheduler=None) -> int:
         """
         Trigger a new scan.
 
@@ -53,6 +53,7 @@ class ScanService:
             config_file: Path to YAML configuration file
             triggered_by: Source that triggered scan (manual, scheduled, api)
             schedule_id: Optional schedule ID if triggered by schedule
+            scheduler: Optional SchedulerService instance for queuing background jobs
 
         Returns:
             Scan ID of the created scan
@@ -87,8 +88,21 @@ class ScanService:
         logger.info(f"Scan {scan.id} triggered via {triggered_by}")
 
-        # NOTE: Background job queuing will be implemented in Step 3
-        # For now, just return the scan ID
+        # Queue background job if scheduler provided
+        if scheduler:
+            try:
+                job_id = scheduler.queue_scan(scan.id, config_file)
+                logger.info(f"Scan {scan.id} queued for background execution (job_id={job_id})")
+            except Exception as e:
+                logger.error(f"Failed to queue scan {scan.id}: {str(e)}")
+                # Mark scan as failed if job queuing fails
+                scan.status = 'failed'
+                scan.error_message = f"Failed to queue background job: {str(e)}"
+                self.db.commit()
+                raise
+        else:
+            logger.warning(f"Scan {scan.id} created but not queued (no scheduler provided)")
 
         return scan.id
 
     def get_scan(self, scan_id: int) -> Optional[Dict[str, Any]]:
@@ -230,7 +244,9 @@ class ScanService:
             'scan_id': scan.id,
             'status': scan.status,
             'title': scan.title,
-            'started_at': scan.timestamp.isoformat() if scan.timestamp else None,
+            'timestamp': scan.timestamp.isoformat() if scan.timestamp else None,
+            'started_at': scan.started_at.isoformat() if scan.started_at else None,
+            'completed_at': scan.completed_at.isoformat() if scan.completed_at else None,
             'duration': scan.duration,
             'triggered_by': scan.triggered_by
         }
@@ -242,6 +258,7 @@ class ScanService:
             status_info['progress'] = 'Complete'
         elif scan.status == 'failed':
             status_info['progress'] = 'Failed'
+            status_info['error_message'] = scan.error_message
 
         return status_info
@@ -265,6 +282,7 @@ class ScanService:
         # Update scan record
         scan.status = status
         scan.duration = report.get('scan_duration')
+        scan.completed_at = datetime.utcnow()
 
         # Map report data to database models
         self._map_report_to_models(report, scan)
```