mass-scan2/punchlist.md
2025-10-21 23:05:43 -05:00

# Phase 0: Baseline + Decisions (Quick)
**Goals**
* Confirm we'll keep one Jinja template with palette includes.
* Choose PDF engine flags and file layout.
**Decisions**
* PDF engine: `wkhtmltopdf` via `subprocess` (wrapper optional). Open question: whether to use the `pdfkit` wrapper instead.
* Page size: **Letter** (or A4 if you prefer).
* Output layout: `/data/output/<timestamp>/<config_slug>/report.pdf`.
* ZIP per run: `/data/output/reports_<timestamp>.zip`.
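
The path layout above could be computed along these lines. This is a minimal sketch: the `slugify` helper and the `YYYYMMDD_HHMMSS` timestamp format are assumptions for illustration, since the plan has not fixed either yet.

```python
import re
from datetime import datetime
from pathlib import Path

OUTPUT_ROOT = Path("/data/output")  # root taken from the layout decision above


def slugify(name: str) -> str:
    """Hypothetical helper: lowercase, runs of non-alphanumerics become '-'."""
    return re.sub(r"[^a-z0-9]+", "-", name.lower()).strip("-")


def run_timestamp(now: datetime) -> str:
    """Assumed timestamp format; the plan does not specify one."""
    return now.strftime("%Y%m%d_%H%M%S")


def pdf_path(timestamp: str, config_name: str, root: Path = OUTPUT_ROOT) -> Path:
    """<root>/<timestamp>/<config_slug>/report.pdf"""
    return root / timestamp / slugify(config_name) / "report.pdf"


def zip_path(timestamp: str, root: Path = OUTPUT_ROOT) -> Path:
    """<root>/reports_<timestamp>.zip"""
    return root / f"reports_{timestamp}.zip"
```

Keeping path construction in one place makes the "where do PDFs land" acceptance check a one-line answer.
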
**Deliverables**
* Short `DECISIONS.md` capturing above.
* “Current state” run that still only outputs HTML (for baseline timings).
**Acceptance**
* We can state in one sentence where PDFs will land and how they're named.
---
# Phase 1: HTML → PDF Renderer (Additive)
**Goals**
* Convert your already-rendered HTML into a PDF file per scan config.
**Tasks**
* Create `report_renderer.py` with:
  * `render_html(context) -> str` (calls existing Jinja template)
  * `html_to_pdf(html, out_path, *, page="Letter", margin_mm=12, dpi=96) -> Path`
* Ensure local assets (e.g., logo) resolve without network.
* Add `run_repo_scan()` call: after HTML render, write PDF.
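
As a rough sketch of the `html_to_pdf` task above, assuming `wkhtmltopdf` is on `PATH` and the HTML is fed on stdin: the `build_wkhtmltopdf_cmd` helper and the 60-second timeout are illustrative assumptions, not part of the plan. Local assets may still need absolute `file://` URLs in the HTML when input comes from stdin.

```python
import subprocess
from pathlib import Path


def build_wkhtmltopdf_cmd(out_path, *, page="Letter", margin_mm=12, dpi=96):
    """Hypothetical helper: build the argv list for one wkhtmltopdf call."""
    margin = f"{margin_mm}mm"
    return [
        "wkhtmltopdf",
        "--quiet",
        "--page-size", page,
        "--dpi", str(dpi),
        "--margin-top", margin,
        "--margin-right", margin,
        "--margin-bottom", margin,
        "--margin-left", margin,
        "--enable-local-file-access",  # allow file:// assets such as a logo
        "-",            # input: read HTML from stdin
        str(out_path),  # output: PDF file
    ]


def html_to_pdf(html, out_path, *, page="Letter", margin_mm=12, dpi=96) -> Path:
    """Render an HTML string to a PDF file and return the output path."""
    out_path = Path(out_path)
    out_path.parent.mkdir(parents=True, exist_ok=True)
    cmd = build_wkhtmltopdf_cmd(out_path, page=page, margin_mm=margin_mm, dpi=dpi)
    # check=True raises CalledProcessError on a non-zero exit code;
    # the timeout value is an assumption to keep a hung render from blocking a run.
    subprocess.run(cmd, input=html.encode("utf-8"), check=True, timeout=60)
    return out_path
```

Splitting command construction from execution keeps the flags testable without the binary installed.
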
**Deliverables**
* One PDF per config alongside existing HTML.
**Acceptance**
* PDFs open cleanly (no missing fonts/boxes).
* Render time acceptable (<12s per average report on your host).
---
# Phase 2: Bundle + Manifest
**Goals**
* Zip all PDFs for a run and write an audit manifest.
**Tasks**
* Create `report_bundler.py` with:
  * `collect_pdfs(run_dir) -> List[Path]`
  * `write_manifest(run_info, per_config_stats, out_path) -> Path`
  * `make_zip(files, out_zip) -> Path`
* Manifest fields (per file and rollup):
  * `run_id, generated_at, timezone`
  * Per config: `name, config_slug, total_hosts, ok_hosts, hosts_with_issues, pdf_path, pdf_sha256, duration_ms`
  * Bundle: `zip_path, zip_size_bytes, pdf_count`
* Add SHA-256 hashing utility.
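
The bundler tasks above might look roughly like this. It is a sketch under assumptions: PDFs are expected at `<run_dir>/<config_slug>/report.pdf`, the manifest nests per-config entries under a `configs` key, and `sha256_file` is a hypothetical name for the hashing utility.

```python
import hashlib
import json
import zipfile
from pathlib import Path
from typing import List


def sha256_file(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so large PDFs are never loaded whole."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def collect_pdfs(run_dir: Path) -> List[Path]:
    """Find <run_dir>/<config_slug>/report.pdf files, sorted for stable output."""
    return sorted(Path(run_dir).glob("*/report.pdf"))


def make_zip(files: List[Path], out_zip: Path) -> Path:
    """Zip the PDFs, storing each one as <config_slug>/report.pdf."""
    with zipfile.ZipFile(out_zip, "w", compression=zipfile.ZIP_DEFLATED) as zf:
        for f in files:
            zf.write(f, arcname=f"{f.parent.name}/{f.name}")
    return out_zip


def write_manifest(run_info: dict, per_config_stats: List[dict], out_path: Path) -> Path:
    """Write run-level fields plus per-config entries (assumed 'configs' key)."""
    manifest = {**run_info, "configs": per_config_stats}
    out_path.write_text(json.dumps(manifest, indent=2), encoding="utf-8")
    return out_path
```
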
**Deliverables**
* `reports_<timestamp>.zip`
* `manifest_<timestamp>.json`
**Acceptance**
* ZIP opens; manifest accurately lists all PDFs with correct hashes.
---
# Phase 3: Email Output
**Goals**
* Email the ZIP (preferred), or fall back to the top-N PDFs when the ZIP exceeds the size threshold.
**Tasks**
* Create `report_emailer.py`:
  * `send_reports(summary, files, to, cc, subject, size_limit_mb)`
* Compose a concise plain-text body (global counts + path to ZIP).
* Size check: if ZIP > limit → attach top N “issue” PDFs, mention storage path for ZIP.
* Configure SMTP from environment (no secrets in code).
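
One way `send_reports` could split its work is sketched below. The helper names (`choose_attachments`, `build_message`, `send`), the `top_n=3` default, and the `SMTP_HOST` / `SMTP_PORT` / `SMTP_USER` / `SMTP_PASSWORD` variable names are all assumptions for illustration; `issue_pdfs` is assumed to arrive already ordered by severity.

```python
import os
import smtplib
from email.message import EmailMessage
from pathlib import Path
from typing import List


def choose_attachments(zip_path: Path, issue_pdfs: List[Path],
                       size_limit_mb: float, top_n: int = 3) -> List[Path]:
    """Attach the ZIP if it fits; otherwise the first top_n issue PDFs that fit."""
    limit = size_limit_mb * 1024 * 1024
    if zip_path.stat().st_size <= limit:
        return [zip_path]
    chosen, total = [], 0
    for pdf in issue_pdfs[:top_n]:
        size = pdf.stat().st_size
        if total + size > limit:
            break
        chosen.append(pdf)
        total += size
    return chosen


def build_message(subject, body, sender, to, cc, attachments) -> EmailMessage:
    """Compose a plain-text message with the chosen files attached."""
    msg = EmailMessage()
    msg["Subject"] = subject
    msg["From"] = sender
    msg["To"] = ", ".join(to)
    if cc:
        msg["Cc"] = ", ".join(cc)
    msg.set_content(body)
    for path in attachments:
        subtype = "zip" if path.suffix == ".zip" else "pdf"
        msg.add_attachment(path.read_bytes(), maintype="application",
                           subtype=subtype, filename=path.name)
    return msg


def send(msg: EmailMessage) -> None:
    """Send via SMTP configured from the environment (variable names assumed)."""
    host = os.environ["SMTP_HOST"]
    port = int(os.environ.get("SMTP_PORT", "587"))
    with smtplib.SMTP(host, port) as smtp:
        smtp.starttls()
        user = os.environ.get("SMTP_USER")
        if user:
            smtp.login(user, os.environ["SMTP_PASSWORD"])
        smtp.send_message(msg)
```

When the fallback path is taken, the body text should still mention where the full ZIP is stored.
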
**Deliverables**
* Successful email with attachments on a sample run.
**Acceptance**
* Emails received by `email_to` with intended attachments.
* Logs include message id / delivery status (if available).
---
# Phase 4: Unify Templates (Light/Dark via palette)
**Goals**
* One structural template; palette includes for dark/light.
**Tasks**
* Introduce `_palette_dark.j2` / `_palette_light.j2`.
* Replace hardcoded colors with palette tokens.
* Keep **inline styles** for email/PDF reliability.
* Ensure `reporting.dark_mode` drives palette include.
**Deliverables**
* Single `report.html.j2` + two small palette partials.
**Acceptance**
* Dark/Light flips correctly; visual parity with previous separate templates.
---
# Phase 5: Config Flags & CLI ergonomics
**Goals**
* Make format and emailing behavior configurable per run.
**Tasks**
* New config/env flags:
  * `OUTPUT_HTML=true|false`
  * `OUTPUT_PDF=true|false`
  * `BUNDLE_ZIP=true|false`
  * `EMAIL_ENABLED=true|false`
  * `EMAIL_SIZE_LIMIT_MB=15`
* Optional CLI args override envs for ad-hoc runs.
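
A possible shape for the flag loading, with CLI arguments overriding environment values. The `RunFlags` dataclass, the `load_flags` and `env_bool` names, the CLI option spellings, and the defaults for the boolean flags are assumptions; only the env variable names and the `15` MB default come from the list above. `argparse.BooleanOptionalAction` needs Python 3.9 or newer.

```python
import argparse
from dataclasses import dataclass
from typing import Mapping, Optional, Sequence

_TRUE = {"1", "true", "yes", "on"}


def env_bool(env: Mapping[str, str], key: str, default: bool) -> bool:
    """Parse a true|false style variable; unset or empty falls back to default."""
    raw = env.get(key)
    if raw is None or raw.strip() == "":
        return default
    return raw.strip().lower() in _TRUE


@dataclass
class RunFlags:
    output_html: bool
    output_pdf: bool
    bundle_zip: bool
    email_enabled: bool
    email_size_limit_mb: int


def load_flags(env: Mapping[str, str],
               argv: Optional[Sequence[str]] = None) -> RunFlags:
    """Read flags from env, then let any CLI argument that was given win."""
    parser = argparse.ArgumentParser(description="Report output flags")
    for opt in ("output-html", "output-pdf", "bundle-zip", "email-enabled"):
        # Adds --opt and --no-opt; None means "not given on the CLI".
        parser.add_argument(f"--{opt}", action=argparse.BooleanOptionalAction,
                            default=None)
    parser.add_argument("--email-size-limit-mb", type=int, default=None)
    args = parser.parse_args(argv)

    def pick(cli_value, env_value):
        return env_value if cli_value is None else cli_value

    return RunFlags(
        output_html=pick(args.output_html, env_bool(env, "OUTPUT_HTML", True)),
        output_pdf=pick(args.output_pdf, env_bool(env, "OUTPUT_PDF", True)),
        bundle_zip=pick(args.bundle_zip, env_bool(env, "BUNDLE_ZIP", True)),
        email_enabled=pick(args.email_enabled, env_bool(env, "EMAIL_ENABLED", False)),
        email_size_limit_mb=pick(args.email_size_limit_mb,
                                 int(env.get("EMAIL_SIZE_LIMIT_MB", "15"))),
    )
```

In a real run this would be called as `load_flags(os.environ)`, which parses `sys.argv` by default.
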
**Deliverables**
* Documented flags in `README` or `DECISIONS.md`.
**Acceptance**
* Toggling flags changes outputs without code edits.
---
# Phase 6: Logging Overhaul (Structured, Useful)
**Goals**
* Replace scattered prints with consistent, structured logs you can search.
**Tasks**
* Adopt one logger (e.g., `structlog` or `logging` with JSON formatter).
* Standard fields on every log line:
  * `run_id`, `config_name`, `config_slug`, `phase`, `duration_ms`
* Log events:
  * Start/end per config, counts (ok/issues), output paths, sizes, hashes.
  * PDF render timings and failures (with `exc_info=True`).
  * Bundling: zip path/size; Email: recipients, attachment count, result.
* Set sane defaults:
  * Level INFO in prod, DEBUG when `DEBUG=1`.
  * Rotate or daily-split logs; permissions via umask `0027`.
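
If the stdlib route is chosen, the standard fields could be enforced with a small JSON formatter like the sketch below. The `JsonFormatter` and `get_logger` names, the `mass_scan` logger name, and the `ts` / `level` / `event` keys are assumptions; rotation and umask handling are left out.

```python
import json
import logging
import os
from datetime import datetime, timezone

STANDARD_FIELDS = ("run_id", "config_name", "config_slug", "phase", "duration_ms")


class JsonFormatter(logging.Formatter):
    """Emit one JSON object per log line, always carrying the standard fields."""

    def format(self, record: logging.LogRecord) -> str:
        payload = {
            "ts": datetime.fromtimestamp(record.created, tz=timezone.utc).isoformat(),
            "level": record.levelname,
            "event": record.getMessage(),
        }
        # Fields passed via extra={...} become record attributes; missing ones are null.
        for field in STANDARD_FIELDS:
            payload[field] = getattr(record, field, None)
        if record.exc_info:
            payload["exc"] = self.formatException(record.exc_info)
        return json.dumps(payload)


def get_logger(name: str = "mass_scan") -> logging.Logger:
    """INFO by default, DEBUG when DEBUG=1 is set in the environment."""
    logger = logging.getLogger(name)
    if not logger.handlers:
        handler = logging.StreamHandler()
        handler.setFormatter(JsonFormatter())
        logger.addHandler(handler)
    logger.setLevel(logging.DEBUG if os.environ.get("DEBUG") == "1" else logging.INFO)
    return logger
```

A call such as `get_logger().info("pdf_rendered", extra={"run_id": run_id, "config_slug": "corp-wan", "phase": "render", "duration_ms": 840})` would then produce a single searchable line.
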
**Deliverables**
* Consistent logs; sample snippet documented in `LOGGING.md`.
**Acceptance**
* You can answer “what happened to corp-wan at 03:05?” from logs alone.
---
# Phase 7: Docker & Dependencies
**Goals**
* Bake the toolchain into the image cleanly.
**Tasks**
* `apt-get install -y wkhtmltopdf fonts-dejavu-core`
* Keep pip deps minimal (Jinja2 already present).
* Health check: small mock HTML → PDF during build or startup test (optional).
**Deliverables**
* Rebuilt image; size acceptable.
**Acceptance**
* Container can render PDFs in a clean runtime (no missing libraries).
---
# Phase 8: Validation & Load Testing
**Goals**
* Confidence under multiple configs and larger reports.
**Tasks**
* Run with N configs (e.g., 10-20) and capture:
  * Total runtime, average PDF time, largest PDF size.
* Edge tests:
  * No targets → still produce a PDF with “No hosts”.
  * All OK vs many Issues.
  * Long hostnames and port lists (layout wrapping).
* Verify email size behavior.
**Deliverables**
* Short `QA_NOTES.md` with timings and observations.
**Acceptance**
* No crashes; PDFs legible; email strategy holds at size thresholds.
---
# Phase 9: Docs & Ops
**Goals**
* Make future you (or anyone) productive in 5 minutes.
**Tasks**
* `README` updates: flow diagram, flags, outputs.
* `DECISIONS.md` finalized.
* Sample manifest + how to verify checksums.
* Runbook snippet: “How to re-send last run's ZIP.”
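
For the "how to verify checksums" doc item, a short script along these lines could be included. It assumes the manifest stores per-config entries under a top-level `configs` key, each with the `pdf_path` and `pdf_sha256` fields; the `verify_manifest` name is hypothetical.

```python
import hashlib
import json
from pathlib import Path
from typing import List


def verify_manifest(manifest_path: Path) -> List[str]:
    """Return the pdf_path of every entry that is missing or fails its SHA-256."""
    manifest = json.loads(Path(manifest_path).read_text(encoding="utf-8"))
    failures = []
    for entry in manifest["configs"]:  # assumed manifest layout
        pdf = Path(entry["pdf_path"])
        if not pdf.is_file():
            failures.append(entry["pdf_path"])
            continue
        actual = hashlib.sha256(pdf.read_bytes()).hexdigest()
        if actual != entry["pdf_sha256"]:
            failures.append(entry["pdf_path"])
    return failures
```

An empty list means every PDF listed in the manifest is present and unmodified.
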
**Deliverables**
* Docs committed and discoverable.
**Acceptance**
* You can hand this to a teammate and they'll ship a report same day.
---
# Phase 10: Nice-to-Haves / Backlog
* “Issues-only” PDF variant (short brief for exec/IT).
* Per-config S3 upload (instead of email attachments).
* HTML preview endpoint for a single report (if you add a tiny web UI).
* Retention policy job: purge artifacts >30 days.
* Watermark/status banner for partial/incomplete scans.
* Per-run metrics export (Prometheus counters/gauges).
---
## Global Punch List (copy/paste into your tracker)
* [ ] Phase 0: Lock decisions (page size, paths, file naming)
* [ ] Phase 1: Add renderer module and wire PDF creation in `run_repo_scan()`
* [ ] Phase 2: Add bundler + manifest (with SHA-256)
* [ ] Phase 3: Add email sender with size-aware attachment strategy
* [ ] Phase 4: Merge templates; add palette partials; test light/dark
* [ ] Phase 5: Add env/CLI flags for outputs & email
* [ ] Phase 6: Overhaul logging (structured, consistent fields, timings)
* [ ] Phase 7: Update Dockerfile (wkhtmltopdf, fonts), rebuild, smoke test
* [ ] Phase 8: Load test with multiple configs; capture timings; fix edge cases
* [ ] Phase 9: Update README/DECISIONS/LOGGING; add runbook snippet
* [ ] Phase 10: (Optional) Implement backlog items
---
## Quick win(s) to start next session
* Pick **Letter vs A4** and the exact output path format.
* Decide whether to **keep writing HTML to disk** (handy for debugging) or only in memory.
* Choose **structlog vs stdlib logging + JSON** (I can tailor either).
When you're ready, we can start with Phase 1, and I'll draft the exact renderer function signatures and logging fields so it slots right into your style.