mass-scan2/punchlist.md
2025-10-21 23:05:43 -05:00

Phase 0 Baseline + Decisions (Quick)

Goals

  • Confirm we'll keep one Jinja template with palette includes.
  • Choose PDF engine flags and file layout.

Decisions

  • PDF engine: wkhtmltopdf via subprocess (wrapper optional). Decide whether we want pdfkit as that wrapper.
  • Page size: Letter (or A4 if you prefer).
  • Output layout: /data/output/<timestamp>/<config_slug>/report.pdf.
  • ZIP per run: /data/output/reports_<timestamp>.zip.
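
To make the path decision concrete, here is a minimal sketch of helpers that build both locations. The timestamp format (UTC, `YYYYMMDDTHHMMSSZ`), the slug rule, and the function names are assumptions for illustration; only the `/data/output/...` layout comes from the decisions above.

```python
import re
from datetime import datetime, timezone
from pathlib import Path
from typing import Optional

OUTPUT_ROOT = Path("/data/output")  # root taken from the layout above


def slugify(name: str) -> str:
    """Assumed slug rule: lowercase, non-alphanumeric runs become '-'."""
    return re.sub(r"[^a-z0-9]+", "-", name.lower()).strip("-")


def run_timestamp(now: Optional[datetime] = None) -> str:
    """Assumed timestamp format: UTC, e.g. 20251021T230543Z."""
    now = now or datetime.now(timezone.utc)
    return now.astimezone(timezone.utc).strftime("%Y%m%dT%H%M%SZ")


def report_pdf_path(timestamp: str, config_name: str,
                    root: Path = OUTPUT_ROOT) -> Path:
    """<root>/<timestamp>/<config_slug>/report.pdf"""
    return root / timestamp / slugify(config_name) / "report.pdf"


def bundle_zip_path(timestamp: str, root: Path = OUTPUT_ROOT) -> Path:
    """<root>/reports_<timestamp>.zip"""
    return root / f"reports_{timestamp}.zip"
```

A sortable UTC timestamp keeps run directories in chronological order under a plain `ls`; swap in local time if that reads better for you.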

Deliverables

  • Short DECISIONS.md capturing above.
  • “Current state” run that still only outputs HTML (for baseline timings).

Acceptance

  • We can state in one sentence where PDFs will land and how they're named.

Phase 1 HTML → PDF Renderer (Additive)

Goals

  • Convert your already-rendered HTML into a PDF file per scan config.

Tasks

  • Create report_renderer.py with:

    • render_html(context) -> str (calls existing Jinja template)
    • html_to_pdf(html, out_path, *, page="Letter", margin_mm=12, dpi=96) -> Path
  • Ensure local assets (e.g., logo) resolve without network.

  • Wire into run_repo_scan(): after the HTML render, write the PDF.
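
A rough sketch of `html_to_pdf` using the subprocess route, under a few assumptions: `wkhtmltopdf` is on `PATH`, the installed build accepts `--enable-local-file-access` (older builds may reject it), and the extra `timeout_s` keyword is hypothetical, not part of the signature above. The command builder is split out so it can be checked without the binary installed.

```python
import subprocess
from pathlib import Path
from typing import List


def build_wkhtmltopdf_cmd(out_path: Path, *, page: str = "Letter",
                          margin_mm: int = 12, dpi: int = 96) -> List[str]:
    """Build the argument list; '-' tells wkhtmltopdf to read HTML from stdin."""
    margin = f"{margin_mm}mm"
    return [
        "wkhtmltopdf",
        "--quiet",
        "--page-size", page,
        "--dpi", str(dpi),
        "--margin-top", margin,
        "--margin-bottom", margin,
        "--margin-left", margin,
        "--margin-right", margin,
        "--enable-local-file-access",  # assumption: supported by your build
        "-", str(out_path),
    ]


def html_to_pdf(html: str, out_path: Path, *, page: str = "Letter",
                margin_mm: int = 12, dpi: int = 96,
                timeout_s: int = 60) -> Path:
    """Render HTML to a PDF file; raises CalledProcessError on failure."""
    out_path = Path(out_path)
    out_path.parent.mkdir(parents=True, exist_ok=True)
    cmd = build_wkhtmltopdf_cmd(out_path, page=page,
                                margin_mm=margin_mm, dpi=dpi)
    subprocess.run(cmd, input=html.encode("utf-8"), check=True,
                   capture_output=True, timeout=timeout_s)
    return out_path
```

Because the HTML arrives on stdin, relative asset paths may not resolve; referencing the logo by an absolute file path (or inlining it) is the safer assumption until you have tested it on your host.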

Deliverables

  • One PDF per config alongside existing HTML.

Acceptance

  • PDFs open cleanly (no missing fonts/boxes).
  • Render time acceptable (<12s per average report on your host).

Phase 2 Bundle + Manifest

Goals

  • Zip all PDFs for a run and write an audit manifest.

Tasks

  • Create report_bundler.py with:

    • collect_pdfs(run_dir) -> List[Path]
    • write_manifest(run_info, per_config_stats, out_path) -> Path
    • make_zip(files, out_zip) -> Path
  • Manifest fields (per file and rollup):

    • run_id, generated_at, timezone
    • Per config: name, config_slug, total_hosts, ok_hosts, hosts_with_issues, pdf_path, pdf_sha256, duration_ms
    • Bundle: zip_path, zip_size_bytes, pdf_count
  • Add SHA-256 hashing utility.
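
The bundler tasks above could be sketched roughly as follows. The manifest shape (a top-level `configs` list, with run and bundle fields passed in through `run_info`) and the archive naming (`<config_slug>/report.pdf`, taken from each PDF's parent directory) are assumptions; adjust once the real layout is locked.

```python
import hashlib
import json
import zipfile
from pathlib import Path
from typing import Dict, List


def sha256_file(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so large PDFs never load fully into memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def collect_pdfs(run_dir: Path) -> List[Path]:
    """Find every PDF below the run directory, in a stable order."""
    return sorted(Path(run_dir).rglob("*.pdf"))


def make_zip(files: List[Path], out_zip: Path) -> Path:
    """Zip the PDFs; the parent dir name keeps same-named files apart."""
    with zipfile.ZipFile(out_zip, "w", zipfile.ZIP_DEFLATED) as zf:
        for f in files:
            zf.write(f, arcname=f"{f.parent.name}/{f.name}")
    return out_zip


def write_manifest(run_info: Dict, per_config_stats: List[Dict],
                   out_path: Path) -> Path:
    """Write run fields plus one entry per config, adding pdf_sha256."""
    manifest = dict(run_info)
    manifest["configs"] = [
        {**stats, "pdf_sha256": sha256_file(Path(stats["pdf_path"]))}
        for stats in per_config_stats
    ]
    Path(out_path).write_text(json.dumps(manifest, indent=2), encoding="utf-8")
    return Path(out_path)
```

Hashing at manifest-write time (rather than at render time) means the recorded digest always matches the bytes that actually went into the ZIP.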

Deliverables

  • reports_<timestamp>.zip
  • manifest_<timestamp>.json

Acceptance

  • ZIP opens; manifest accurately lists all PDFs with correct hashes.

Phase 3 Email Output

Goals

  • Email either the ZIP (preferred) or the top-N PDFs when the ZIP exceeds the size threshold.

Tasks

  • Create report_emailer.py:

    • send_reports(summary, files, to, cc, subject, size_limit_mb)
    • Compose a concise plain-text body (global counts + path to ZIP).
    • Size check: if ZIP > limit → attach top N “issue” PDFs, mention storage path for ZIP.
  • Configure SMTP from environment (no secrets in code).
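
One possible shape for the size-aware attachment logic, using only the standard library. The helper names, the `SMTP_*` environment variable names, the `top_n` default, and the assumption that `issue_pdfs` arrives already sorted worst-first are all hypothetical; `send_reports` could be composed from these pieces.

```python
import os
import smtplib
from email.message import EmailMessage
from pathlib import Path
from typing import List, Mapping, Sequence


def choose_attachments(zip_path: Path, issue_pdfs: Sequence[Path],
                       size_limit_mb: float, top_n: int = 3) -> List[Path]:
    """Attach the ZIP if it fits; otherwise the leading PDFs that fit."""
    limit_bytes = size_limit_mb * 1024 * 1024
    if zip_path.stat().st_size <= limit_bytes:
        return [zip_path]
    chosen: List[Path] = []
    total = 0
    for pdf in issue_pdfs[:top_n]:
        total += pdf.stat().st_size
        if total > limit_bytes:
            break
        chosen.append(pdf)
    return chosen


def build_message(summary: Mapping[str, int], attachments: Sequence[Path],
                  to: Sequence[str], cc: Sequence[str], subject: str,
                  zip_path: Path) -> EmailMessage:
    """Plain-text body with global counts and the ZIP's storage path."""
    msg = EmailMessage()
    msg["Subject"] = subject
    msg["From"] = os.environ.get("SMTP_FROM", "scanner@localhost")
    msg["To"] = ", ".join(to)
    if cc:
        msg["Cc"] = ", ".join(cc)
    msg.set_content(
        f"Hosts scanned: {summary['total_hosts']}\n"
        f"Hosts with issues: {summary['hosts_with_issues']}\n"
        f"Full bundle: {zip_path}\n"
    )
    for path in attachments:
        is_zip = path.suffix == ".zip"
        msg.add_attachment(
            path.read_bytes(),
            maintype="application",
            subtype="zip" if is_zip else "pdf",
            # PDFs share the name report.pdf, so name them by config dir.
            filename=path.name if is_zip else f"{path.parent.name}.pdf",
        )
    return msg


def send_via_env_smtp(msg: EmailMessage) -> None:
    """Send using SMTP settings from the environment (assumed var names)."""
    host = os.environ["SMTP_HOST"]
    port = int(os.environ.get("SMTP_PORT", "587"))
    with smtplib.SMTP(host, port, timeout=30) as smtp:
        smtp.starttls()
        user = os.environ.get("SMTP_USER")
        if user:
            smtp.login(user, os.environ["SMTP_PASSWORD"])
        smtp.send_message(msg)
```

The cumulative size check matters because N individually small PDFs can still exceed the limit together; the body always carries the ZIP path so nothing is lost when attachments are trimmed.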

Deliverables

  • Successful email with attachments on a sample run.

Acceptance

  • Emails received by email_to with intended attachments.
  • Logs include message id / delivery status (if available).

Phase 4 Unify Templates (Light/Dark via palette)

Goals

  • One structural template; palette includes for dark/light.

Tasks

  • Introduce _palette_dark.j2 / _palette_light.j2.
  • Replace hardcoded colors with palette tokens.
  • Keep inline styles for email/PDF reliability.
  • Ensure reporting.dark_mode drives palette include.

Deliverables

  • Single report.html.j2 + two small palette partials.

Acceptance

  • Dark/Light flips correctly; visual parity with previous separate templates.

Phase 5 Config Flags & CLI ergonomics

Goals

  • Make format and emailing behavior configurable per run.

Tasks

  • New config/env flags:

    • OUTPUT_HTML=true|false
    • OUTPUT_PDF=true|false
    • BUNDLE_ZIP=true|false
    • EMAIL_ENABLED=true|false
    • EMAIL_SIZE_LIMIT_MB=15
  • Optional CLI args override envs for ad-hoc runs.
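
A minimal sketch of how the flags could be resolved, with CLI arguments taking precedence over the environment. The defaults (everything on except email), the accepted truthy strings, and the CLI spellings such as `--output-pdf true` are assumptions; only the env names and the 15 MB limit come from the list above.

```python
import argparse
import os
from dataclasses import dataclass
from typing import Mapping, Optional, Sequence

_TRUTHY = {"1", "true", "yes", "on"}


@dataclass(frozen=True)
class OutputFlags:
    output_html: bool
    output_pdf: bool
    bundle_zip: bool
    email_enabled: bool
    email_size_limit_mb: float


def env_bool(env: Mapping[str, str], name: str, default: bool) -> bool:
    """Read a boolean env var; an unset variable falls back to the default."""
    raw = env.get(name)
    return default if raw is None else raw.strip().lower() in _TRUTHY


def load_flags(argv: Optional[Sequence[str]] = None,
               env: Optional[Mapping[str, str]] = None) -> OutputFlags:
    """Resolve flags: CLI value if given, else env var, else default."""
    env = os.environ if env is None else env
    parser = argparse.ArgumentParser()
    for opt in ("output-html", "output-pdf", "bundle-zip", "email-enabled"):
        parser.add_argument(f"--{opt}", choices=["true", "false"], default=None)
    parser.add_argument("--email-size-limit-mb", type=float, default=None)
    args = parser.parse_args(argv)

    def pick(cli_value: Optional[str], env_name: str, default: bool) -> bool:
        if cli_value is not None:
            return cli_value == "true"
        return env_bool(env, env_name, default)

    limit = args.email_size_limit_mb
    if limit is None:
        limit = float(env.get("EMAIL_SIZE_LIMIT_MB", "15"))
    return OutputFlags(
        output_html=pick(args.output_html, "OUTPUT_HTML", True),
        output_pdf=pick(args.output_pdf, "OUTPUT_PDF", True),
        bundle_zip=pick(args.bundle_zip, "BUNDLE_ZIP", True),
        email_enabled=pick(args.email_enabled, "EMAIL_ENABLED", False),
        email_size_limit_mb=limit,
    )
```

For an ad-hoc run, something like `load_flags(["--email-enabled", "false"])` would suppress email regardless of what the environment says.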

Deliverables

  • Documented flags in README or DECISIONS.md.

Acceptance

  • Toggling flags changes outputs without code edits.

Phase 6 Logging Overhaul (Structured, Useful)

Goals

  • Replace scattered prints with consistent, structured logs you can search.

Tasks

  • Adopt one logger (e.g., structlog or logging with JSON formatter).

  • Standard fields on every log line:

    • run_id, config_name, config_slug, phase, duration_ms
  • Log events:

    • Start/end per config, counts (ok/issues), output paths, sizes, hashes.
    • PDF render timings and failures (with exc_info=True).
    • Bundling: zip path/size; Email: recipients, attachment count, result.
  • Set sane defaults:

    • Level INFO in prod, DEBUG when DEBUG=1.
    • Rotate or daily-split logs; permissions via umask 0027.
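
If you go with stdlib logging rather than structlog, the JSON formatter could look roughly like this. The logger name `mass_scan`, the `ts`/`level`/`event`/`exc` key names, and the choice to emit missing standard fields as `null` are assumptions for illustration.

```python
import json
import logging
import os
import sys

STANDARD_FIELDS = ("run_id", "config_name", "config_slug", "phase", "duration_ms")


class JsonFormatter(logging.Formatter):
    """Emit one JSON object per log line with the standard fields."""

    def format(self, record: logging.LogRecord) -> str:
        payload = {
            "ts": self.formatTime(record, "%Y-%m-%dT%H:%M:%S%z"),
            "level": record.levelname,
            "event": record.getMessage(),
        }
        # Values passed via `extra=` become attributes on the record.
        for field in STANDARD_FIELDS:
            payload[field] = getattr(record, field, None)
        if record.exc_info:
            payload["exc"] = self.formatException(record.exc_info)
        return json.dumps(payload)


def configure_logging(stream=None) -> logging.Logger:
    """INFO by default, DEBUG when DEBUG=1, one JSON handler."""
    level = logging.DEBUG if os.environ.get("DEBUG") == "1" else logging.INFO
    handler = logging.StreamHandler(sys.stderr if stream is None else stream)
    handler.setFormatter(JsonFormatter())
    logger = logging.getLogger("mass_scan")  # hypothetical logger name
    logger.handlers[:] = [handler]
    logger.setLevel(level)
    logger.propagate = False
    return logger
```

Usage would be along the lines of `log.info("pdf_rendered", extra={"run_id": run_id, "config_slug": slug, "phase": "render", "duration_ms": 812})`. For daily-split files, `logging.handlers.TimedRotatingFileHandler` can replace the stream handler, and calling `os.umask(0o027)` before the file is first opened is one way to get the permissions noted above.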

Deliverables

  • Consistent logs; sample snippet documented in LOGGING.md.

Acceptance

  • You can answer “what happened to corp-wan at 03:05?” from logs alone.

Phase 7 Docker & Dependencies

Goals

  • Bake the toolchain into the image cleanly.

Tasks

  • apt-get install -y wkhtmltopdf fonts-dejavu-core
  • Keep pip deps minimal (Jinja2 already present).
  • Health check: small mock HTML → PDF during build or startup test (optional).

Deliverables

  • Rebuilt image; size acceptable.

Acceptance

  • Container can render PDFs in a clean runtime (no missing libraries).

Phase 8 Validation & Load Testing

Goals

  • Confidence under multiple configs and larger reports.

Tasks

  • Run with N configs (e.g., 10 to 20) and capture:

    • Total runtime, average PDF time, largest PDF size.
  • Edge tests:

    • No targets → still produce a PDF with “No hosts”.
    • All OK vs many Issues.
    • Long hostnames and port lists (layout wrapping).
  • Verify email size behavior.

Deliverables

  • Short QA_NOTES.md with timings and observations.

Acceptance

  • No crashes; PDFs legible; email strategy holds at size thresholds.

Phase 9 Docs & Ops

Goals

  • Make future you (or anyone) productive in 5 minutes.

Tasks

  • README updates: flow diagram, flags, outputs.
  • DECISIONS.md finalized.
  • Sample manifest + how to verify checksums.
  • Runbook snippet: “How to re-send last run's ZIP.”
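
For the "how to verify checksums" doc item, a small checker like the one below might be enough. It assumes the manifest keeps its per-config entries under a hypothetical `configs` key, each carrying the `pdf_path` and `pdf_sha256` fields; adapt the key names to whatever the final manifest uses.

```python
import hashlib
import json
from pathlib import Path
from typing import List


def sha256_file(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks and return the hex digest."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_manifest(manifest_path: Path) -> List[str]:
    """Return the pdf_path of every entry that is missing or mismatched."""
    manifest = json.loads(Path(manifest_path).read_text(encoding="utf-8"))
    failures: List[str] = []
    for entry in manifest["configs"]:  # assumed manifest layout
        pdf = Path(entry["pdf_path"])
        if not pdf.is_file() or sha256_file(pdf) != entry["pdf_sha256"]:
            failures.append(entry["pdf_path"])
    return failures
```

An empty list means every PDF on disk still matches what the manifest recorded at bundle time.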

Deliverables

  • Docs committed and discoverable.

Acceptance

  • You can hand this to a teammate and they'll ship a report same day.

Phase 10 Nice-to-Haves / Backlog

  • “Issues-only” PDF variant (short brief for exec/IT).
  • Per-config S3 upload (instead of email attachments).
  • HTML preview endpoint for a single report (if you add a tiny web UI).
  • Retention policy job: purge artifacts >30 days.
  • Watermark/status banner for partial/incomplete scans.
  • Per-run metrics export (Prometheus counters/gauges).

Global Punch List (copy/paste into your tracker)

  • Phase 0: Lock decisions (page size, paths, file naming)
  • Phase 1: Add renderer module and wire PDF creation in run_repo_scan()
  • Phase 2: Add bundler + manifest (with SHA-256)
  • Phase 3: Add email sender with size-aware attachment strategy
  • Phase 4: Merge templates; add palette partials; test light/dark
  • Phase 5: Add env/CLI flags for outputs & email
  • Phase 6: Overhaul logging (structured, consistent fields, timings)
  • Phase 7: Update Dockerfile (wkhtmltopdf, fonts), rebuild, smoke test
  • Phase 8: Load test with multiple configs; capture timings; fix edge cases
  • Phase 9: Update README/DECISIONS/LOGGING; add runbook snippet
  • Phase 10: (Optional) Implement backlog items

Quick win(s) to start next session

  • Pick Letter vs A4 and the exact output path format.
  • Decide whether to keep writing HTML to disk (handy for debugging) or only in memory.
  • Choose structlog vs stdlib logging + JSON (I can tailor either).

When you're ready, we can start with Phase 1, and I'll draft the exact renderer function signatures and logging fields so it slots right into your style.