Got it. Here’s a structured outline you can keep as a “memory primer” for when you want to resume this work (whether continuing the current engine or pivoting to a SaaS build). Think of it as a **checkpoint map** of what we’ve covered and what directions you flagged.

---

# SneakyScope – Rules Engine Refactor & SaaS Considerations (Outline)

## 1. Current State

* Engine: `app/rules/rules_engine.py`, with mixed YAML + code rules and a `(bool, str)` return shape expected.
* Issue: function-based rules returning `(bool, dict)` caused invalid type warnings.
* Stopgap: `FunctionRuleAdapter` converted returns.
* Desire: Remove YAML entirely; rules defined in code only.

---

## 2. Agreed Direction

* **Rule definition approach**:
  * Option B chosen → decorator-based registration.
  * Every rule defined in `app/rules/` as Python functions.
  * Rules register with metadata (`name`, `category`, `severity`, etc.).
* **Return shape**:
  * Always return a **Result dict** (no adapter needed).
  * Engine enforces schema and fills in defaults.
* **Engine relocation**:
  * Move to `app/utils/rules_engine/`.
  * Responsibilities: load, validate, freeze registry, run rules, aggregate results, log/report.

---

## 3. Result Schema (concept)

* **Per RuleResult**
  * Required: `ok: bool`, `message: str`.
  * Identity: `name`, `category`, `severity`, `tags`, `rule_version`.
  * Detail: `data: object|null`.
  * Timing: `duration_ms`.
  * Errors: structured `error` object if exceptions occur.
  * Provenance: `source_module`, optional `policy` snapshot.
* **Per AnalysisResult (run-level envelope)**
  * Input scope: target URL, category, content hash, facts profile.
  * Provenance: `run_id`, `engine_version`, `ruleset_checksum`, timestamp, duration.
  * Results: array of RuleResults.
  * Summary: counts by severity, match count, errors, first match, top severity.
  * Artifacts: references (screenshot, DOM snapshot, etc.).
  * Policy snapshot: optional central policy/overrides.

---

## 4. Operational Standards

* **Determinism**: identical inputs + `ruleset_checksum` → identical results.
* **Message stability**: avoid wording churn; expand via `data`.
* **Size limits**: `message ≤ 256 chars`; `data ≤ 8–16 KB`.
* **Errors**: `ok=false` if error present; always emit `message`.
* **Severity**: rule sets default; policy may override.
* **Tags**: controlled vocabulary; additive.

---

## 5. Migration Plan

1. Create new `rules_engine` package in `app/utils/`.
2. Add decorator/registry for rules.
3. Port all rules from YAML → Python modules grouped by category.
4. Delete YAML loader + adapters.
5. Update call sites to build `facts` and call `engine.run(...)`.
6. Add CI tests:
   * Schema compliance.
   * No duplicates.
   * Ruleset checksum snapshot.
7. Integration tests with real fixtures.
8. Benchmark & harden (caps on input size, rule runtime).

---

## 6. SaaS Expansion (future)

* **Multi-tenancy**: separate org/user scopes for data and rule runs.
* **RBAC**: roles (admin, analyst, viewer).
* **Compliance**: logging, retention, export, audit trails.
* **Rules**: centrally maintained, not user-editable.
* **APIs**: authenticated endpoints, per-user quotas.
* **Observability**: per-tenant metrics, alerts.
* **Security**: sandboxing, strict module allowlists, compliance with SOC2/ISO.
* **Data controls**: PII redaction, encryption, retention policies.

---

## 7. Future-Proofing Hooks

* Versioning: ruleset checksum + per-rule versions.
* Extensibility: support `actions`, `links`, `evidence` in Result.
* Policy: central config for thresholds/overrides.
* Hot reload (optional, dev-only).
* Rule provenance tracking (`source_module`, commit SHA).

---

✅ This outline is enough to “re-hydrate” the context later: you won’t need to dig back into old logs to remember why `(bool, str)` didn’t fit, why YAML was removed, or what schema we were converging on.
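
To make the checkpoint concrete, here is a minimal sketch of how the decorator-based registration (section 2), the per-rule Result dict (section 3), and a few of the operational standards (section 4) could fit together. Everything named here is an assumption for illustration, not the existing engine: the `rule` decorator, the `_REGISTRY` dict, `ruleset_checksum()`, `run()`, the two sample rules, and the `password_inputs` fact are all hypothetical, and the checksum only hashes rule names and versions.

```python
import hashlib
import time
from typing import Any, Callable, Dict, List, Optional

Facts = Dict[str, Any]
RuleFn = Callable[[Facts], Any]

# Hypothetical in-memory registry: rule name -> metadata (including the function).
_REGISTRY: Dict[str, Dict[str, Any]] = {}


def rule(name: str, category: str, severity: str = "low", version: str = "1"):
    """Register a rule function together with its metadata."""
    def decorator(fn: RuleFn) -> RuleFn:
        if name in _REGISTRY:
            raise ValueError(f"duplicate rule name: {name}")
        _REGISTRY[name] = {
            "fn": fn, "name": name, "category": category,
            "severity": severity, "rule_version": version,
            "source_module": fn.__module__,
        }
        return fn
    return decorator


def _sorted_rules() -> List[Dict[str, Any]]:
    # Sort by name so execution order does not depend on import order.
    return sorted(_REGISTRY.values(), key=lambda m: m["name"])


def ruleset_checksum() -> str:
    """Hash rule names and versions (an assumed, simplified definition)."""
    parts = [f"{m['name']}@{m['rule_version']}" for m in _sorted_rules()]
    return hashlib.sha256("\n".join(parts).encode("utf-8")).hexdigest()


def _normalize(meta: Dict[str, Any], raw: Dict[str, Any],
               duration_ms: float, error: Optional[Dict[str, str]]) -> Dict[str, Any]:
    """Fill in defaults and identity fields so every result has one shape."""
    fallback = f"rule raised {error['type']}" if error else ""
    message = str(raw.get("message") or fallback)
    return {
        "ok": bool(raw.get("ok", False)) and error is None,  # ok=false on error
        "message": message[:256],                            # size limit
        "name": meta["name"], "category": meta["category"],
        "severity": meta["severity"], "rule_version": meta["rule_version"],
        "source_module": meta["source_module"],
        "data": raw.get("data"),
        "duration_ms": round(duration_ms, 3),
        "error": error,
    }


def run(facts: Facts) -> Dict[str, Any]:
    """Run every registered rule and wrap the results in a small envelope."""
    results = []
    for meta in _sorted_rules():
        start = time.perf_counter()
        error = None
        try:
            raw = meta["fn"](facts)
            if not isinstance(raw, dict) or not isinstance(raw.get("ok"), bool):
                raise TypeError("rule must return a dict with a boolean 'ok'")
        except Exception as exc:  # a failing rule must not abort the run
            raw = {}
            error = {"type": type(exc).__name__, "detail": str(exc)}
        duration_ms = (time.perf_counter() - start) * 1000.0
        results.append(_normalize(meta, raw, duration_ms, error))
    return {
        "ruleset_checksum": ruleset_checksum(),
        "results": results,
        "summary": {"total": len(results),
                    "errors": sum(1 for r in results if r["error"])},
    }


@rule(name="has_password_field", category="form", severity="medium")
def has_password_field(facts: Facts) -> Dict[str, Any]:
    count = facts.get("password_inputs", 0)
    return {"ok": count == 0, "message": "password field check",
            "data": {"count": count}}


@rule(name="legacy_tuple_rule", category="form")
def legacy_tuple_rule(facts: Facts):
    return (True, "old (bool, str) shape")  # rejected by the schema check


report = run({"password_inputs": 1})
```

In this sketch the legacy tuple return is not adapted; it surfaces as a structured `error` with `ok` set to false, which is one way to read the "no adapter needed" decision. Note that `duration_ms` varies between runs, so a determinism test would need to compare results with timing fields excluded.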
---

Do you want me to also save this in a **short “README-spec” style** (like `RESULTS.md`) so it can live in your repo as the contract doc for rules, or should I keep this as just your personal checkpoint outline?