Implemented APScheduler integration for background scan execution,
enabling async job processing without blocking HTTP requests.
## Changes
### Background Jobs (web/jobs/)
- scan_job.py - Execute scans in background threads
- execute_scan() with isolated database sessions
- Comprehensive error handling and logging
- Scan status lifecycle tracking
- Timing and error message storage
### Scheduler Service (web/services/scheduler_service.py)
- SchedulerService class for job management
- APScheduler BackgroundScheduler integration
- ThreadPoolExecutor for concurrent jobs (max 3 workers)
- queue_scan() - Immediate job execution
- Job monitoring: list_jobs(), get_job_status()
- Graceful shutdown handling
### Flask Integration (web/app.py)
- init_scheduler() function
- Scheduler initialization in app factory
- Stored scheduler in app context (app.scheduler)
### Database Schema (migration 003)
- Added scan timing fields:
- started_at - Scan execution start time
- completed_at - Scan execution completion time
- error_message - Error details for failed scans
### Service Layer Updates (web/services/scan_service.py)
- trigger_scan() accepts scheduler parameter
- Queues background jobs after creating scan record
- get_scan_status() includes new timing and error fields
- _save_scan_to_db() sets completed_at timestamp
### API Updates (web/api/scans.py)
- POST /api/scans passes scheduler to trigger_scan()
- Scans now execute in background automatically
### Model Updates (web/models.py)
- Added started_at, completed_at, error_message to Scan model
### Testing (tests/test_background_jobs.py)
- 13 unit tests for background job execution
- Scheduler initialization and configuration tests
- Job queuing and status tracking tests
- Scan timing field tests
- Error handling and storage tests
- Integration test for full workflow (skipped by default)
## Features
- Async scan execution without blocking HTTP requests
- Concurrent scan support (configurable max workers)
- Isolated database sessions per background thread
- Scan lifecycle tracking: created → running → completed/failed
- Error messages captured and stored in database
- Job monitoring and management capabilities
- Graceful shutdown waits for running jobs
## Implementation Notes
- Scanner runs in subprocess from background thread
- Docker provides necessary privileges (--privileged, --network host)
- Each job gets isolated SQLAlchemy session (avoid locking)
- Job IDs follow pattern: scan_{scan_id}
- Background jobs survive across requests
- Failed jobs store error messages in database
## Documentation (docs/ai/PHASE2.md)
- Updated progress: 6/14 days complete (43%)
- Marked Step 3 as complete
- Added detailed implementation notes
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <noreply@anthropic.com>
40 lines
1.2 KiB
Python
40 lines
1.2 KiB
Python
"""Add timing and error fields to scans table
|
|
|
|
Revision ID: 003
|
|
Revises: 002
|
|
Create Date: 2025-11-14
|
|
|
|
"""
|
|
from alembic import op
|
|
import sqlalchemy as sa
|
|
|
|
|
|
# revision identifiers, used by Alembic
|
|
revision = '003'
|
|
down_revision = '002'
|
|
branch_labels = None
|
|
depends_on = None
|
|
|
|
|
|
def upgrade():
|
|
"""
|
|
Add fields for tracking scan execution timing and errors.
|
|
|
|
New fields:
|
|
- started_at: When scan execution actually started
|
|
- completed_at: When scan execution finished (success or failure)
|
|
- error_message: Error message if scan failed
|
|
"""
|
|
with op.batch_alter_table('scans') as batch_op:
|
|
batch_op.add_column(sa.Column('started_at', sa.DateTime(), nullable=True, comment='Scan execution start time'))
|
|
batch_op.add_column(sa.Column('completed_at', sa.DateTime(), nullable=True, comment='Scan execution completion time'))
|
|
batch_op.add_column(sa.Column('error_message', sa.Text(), nullable=True, comment='Error message if scan failed'))
|
|
|
|
|
|
def downgrade():
|
|
"""Remove the timing and error fields."""
|
|
with op.batch_alter_table('scans') as batch_op:
|
|
batch_op.drop_column('error_message')
|
|
batch_op.drop_column('completed_at')
|
|
batch_op.drop_column('started_at')
|