first commit

2025-11-24 23:10:55 -06:00
commit 8315fa51c9
279 changed files with 74600 additions and 0 deletions

docs/DEPLOYMENT.md Normal file

@@ -0,0 +1,580 @@
# Deployment & Operations
## Local Development Setup
### Prerequisites
| Tool | Version | Purpose |
|------|---------|---------|
| Python | 3.11+ | Backend runtime |
| Docker | Latest | Local services |
| Redis | 7.0+ | Job queue & caching |
| Git | Latest | Version control |
### Setup Steps
```bash
# 1. Clone repository
git clone <repo-url>
cd code_of_conquest
# 2. Create virtual environment
python3 -m venv venv
source venv/bin/activate # Windows: venv\Scripts\activate
# 3. Install dependencies
pip install -r requirements.txt
# 4. Configure environment
cp .env.example .env
# Edit .env with your API keys and settings
# 5. Start local services
docker-compose up -d
# 6. Start RQ workers
rq worker ai_tasks combat_tasks marketplace_tasks &
# 7. Run Flask development server
flask run --debug
```
### Environment Variables
| Variable | Description | Required |
|----------|-------------|----------|
| `FLASK_ENV` | development/production (note: Flask 2.3+ no longer reads this variable itself) | Yes |
| `SECRET_KEY` | Flask secret key | Yes |
| `REPLICATE_API_KEY` | Replicate API key | Yes |
| `ANTHROPIC_API_KEY` | Anthropic API key | Yes |
| `APPWRITE_ENDPOINT` | Appwrite server URL | Yes |
| `APPWRITE_PROJECT_ID` | Appwrite project ID | Yes |
| `APPWRITE_API_KEY` | Appwrite API key | Yes |
| `REDIS_URL` | Redis connection URL | Yes |
| `LOG_LEVEL` | Logging level (DEBUG/INFO/WARNING/ERROR) | No |
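The table above can be enforced at startup so that a missing key fails fast instead of surfacing later as an API error. The following is a minimal sketch, not the project's actual configuration code: `missing_vars` and `load_config` are hypothetical helper names, and the `INFO` default for `LOG_LEVEL` is an assumption.

```python
import os

# Names taken from the table above; LOG_LEVEL is optional and gets a default.
REQUIRED_VARS = [
    "FLASK_ENV",
    "SECRET_KEY",
    "REPLICATE_API_KEY",
    "ANTHROPIC_API_KEY",
    "APPWRITE_ENDPOINT",
    "APPWRITE_PROJECT_ID",
    "APPWRITE_API_KEY",
    "REDIS_URL",
]


def missing_vars(environ=os.environ):
    """Return the required variables that are unset or empty."""
    return [name for name in REQUIRED_VARS if not environ.get(name)]


def load_config(environ=os.environ):
    """Hypothetical helper: raise early if anything required is missing."""
    missing = missing_vars(environ)
    if missing:
        raise RuntimeError("Missing environment variables: " + ", ".join(missing))
    config = {name: environ[name] for name in REQUIRED_VARS}
    config["LOG_LEVEL"] = environ.get("LOG_LEVEL", "INFO")  # assumed default
    return config
```

Calling `load_config()` once during application start-up would surface configuration problems before the first request is served.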
---
## Docker Compose (Local Development)
**docker-compose.yml:**
```yaml
version: '3.8'
services:
  redis:
    image: redis:alpine
    ports:
      - "6379:6379"
    volumes:
      - redis_data:/data
  rq-worker:
    build: .
    command: rq worker ai_tasks combat_tasks marketplace_tasks --url redis://redis:6379
    depends_on:
      - redis
    env_file:
      - .env
    environment:
      - REDIS_URL=redis://redis:6379
volumes:
  redis_data:
```
---
## Testing Strategy
### Manual Testing (Preferred)
**API Testing Document:** `docs/API_TESTING.md`
Contains:
- Endpoint examples
- Sample curl/httpie commands
- Expected responses
- Authentication setup
**Example API Test:**
```bash
# Login
curl -X POST http://localhost:5000/api/v1/auth/login \
  -H "Content-Type: application/json" \
  -d '{"email": "test@example.com", "password": "password123"}'

# Create character (with auth token)
curl -X POST http://localhost:5000/api/v1/characters \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer <token>" \
  -d '{"name": "Aragorn", "class_id": "vanguard"}'
```
### Unit Tests (Optional)
**Framework:** pytest
**Test Categories:**
| Category | Location | Focus |
|----------|----------|-------|
| Combat | `tests/test_combat.py` | Damage calculations, effect processing |
| Skills | `tests/test_skills.py` | Skill unlock logic, prerequisites |
| Marketplace | `tests/test_marketplace.py` | Bidding logic, auction processing |
| Character | `tests/test_character.py` | Character creation, stats |
**Run Tests:**
```bash
# All tests
pytest
# Specific test file
pytest tests/test_combat.py
# With coverage
pytest --cov=app tests/
```
### Load Testing
**Tool:** Locust or Apache Bench
**Test Scenarios:**
| Scenario | Target | Success Criteria |
|----------|--------|------------------|
| Concurrent AI requests | 50 concurrent users | < 5s response time |
| Marketplace browsing | 100 concurrent users | < 1s response time |
| Session realtime updates | 10 players per session | < 100ms update latency |
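Before reaching for Locust or Apache Bench, the scenarios above can be smoke-tested with a few lines of standard-library Python. This is only a rough stand-in under stated assumptions: `run_concurrent` and `meets_target` are hypothetical helpers, and treating the success criterion as "every request under the limit" is one interpretation, since the table does not say whether the target is a maximum, an average, or a percentile.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def run_concurrent(call, users):
    """Invoke `call` once per simulated user in parallel; return latencies in seconds."""
    def timed(_):
        start = time.perf_counter()
        call()
        return time.perf_counter() - start

    with ThreadPoolExecutor(max_workers=users) as pool:
        return list(pool.map(timed, range(users)))


def meets_target(latencies, limit_seconds):
    """True when every observed latency is under the limit (assumed criterion)."""
    return all(latency < limit_seconds for latency in latencies)
```

For the marketplace scenario, `call` could be a function that issues one GET request against a local instance, with `users=100` and `limit_seconds=1.0`.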
---
## Production Deployment
### Deployment Checklist
**Pre-Deployment:**
- [ ] All environment variables configured
- [ ] Appwrite collections created with proper permissions
- [ ] Redis configured and accessible
- [ ] RQ workers running
- [ ] SSL certificates installed
- [ ] Rate limiting configured
- [ ] Error logging/monitoring set up (Sentry recommended)
- [ ] Backup strategy for Appwrite data
**Production Configuration:**
- [ ] `DEBUG = False` in Flask
- [ ] Secure session keys (random, long)
- [ ] CORS restricted to production domain
- [ ] Rate limits appropriate for production
- [ ] AI cost alerts configured
- [ ] CDN for static assets (optional)
### Dockerfile
```dockerfile
FROM python:3.11-slim
WORKDIR /app
# Install dependencies
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt
# Copy application
COPY . .
# Create non-root user
RUN useradd -m appuser && chown -R appuser:appuser /app
USER appuser
# Expose port
EXPOSE 5000
# Run application
CMD ["gunicorn", "--bind", "0.0.0.0:5000", "--workers", "4", "wsgi:app"]
```
### Build & Push Script
**scripts/build_and_push.sh:**
```bash
#!/bin/bash
set -euo pipefail

# Get current git branch; Docker tags cannot contain "/", so replace it with "-"
BRANCH=$(git rev-parse --abbrev-ref HEAD)
TAG=${BRANCH//\//-}

# Ask for tag options
read -r -p "Tag as :latest? (y/n) " TAG_LATEST
read -r -p "Push to registry? (y/n) " PUSH_IMAGE

# Build image
docker build -t "ai-dungeon-master:$TAG" .

if [ "$TAG_LATEST" = "y" ]; then
    docker tag "ai-dungeon-master:$TAG" ai-dungeon-master:latest
fi

if [ "$PUSH_IMAGE" = "y" ]; then
    docker push "ai-dungeon-master:$TAG"
    if [ "$TAG_LATEST" = "y" ]; then
        docker push ai-dungeon-master:latest
    fi
fi
```
### Production Environment
**Recommended Stack:**
- **Web Server:** Nginx (reverse proxy)
- **WSGI Server:** Gunicorn (4+ workers)
- **Process Manager:** Supervisor or systemd
- **Redis:** Standalone or Redis Cluster
- **RQ Workers:** Separate instances for each queue
**Scaling Strategy:**
| Component | Scaling Method | Trigger |
|-----------|----------------|---------|
| Flask API | Horizontal (add workers) | CPU > 70% |
| RQ Workers | Horizontal (add workers) | Queue length > 100 |
| Redis | Vertical (upgrade instance) | Memory > 80% |
| Appwrite | Managed by Appwrite | N/A |
---
## Monitoring & Logging
### Application Logging
**Logging Configuration:**
| Level | Use Case | Examples |
|-------|----------|----------|
| DEBUG | Development only | Variable values, function calls |
| INFO | Normal operations | User actions, API calls |
| WARNING | Potential issues | Rate limit approaching, slow queries |
| ERROR | Errors (recoverable) | Failed AI calls, validation errors |
| CRITICAL | Critical failures | Database connection lost, service down |
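The `LOG_LEVEL` environment variable has to be translated into one of the levels above somewhere in the application. One possible approach, sketched here with the standard `logging` module, falls back to `INFO` for unset or unknown values; the helper names and the fallback are assumptions rather than the project's actual behaviour.

```python
import logging
import os

# Level names from the table above mapped to stdlib logging constants.
LEVELS = {
    "DEBUG": logging.DEBUG,
    "INFO": logging.INFO,
    "WARNING": logging.WARNING,
    "ERROR": logging.ERROR,
    "CRITICAL": logging.CRITICAL,
}


def resolve_log_level(environ=os.environ):
    """Return the logging constant for LOG_LEVEL, defaulting to INFO (assumed)."""
    name = environ.get("LOG_LEVEL", "INFO").upper()
    return LEVELS.get(name, logging.INFO)


def configure_logging(environ=os.environ):
    """Apply the resolved level to the root logger."""
    logging.basicConfig(level=resolve_log_level(environ))
```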
**Structured Logging with Structlog:**
```python
import structlog

logger = structlog.get_logger(__name__)

logger.info(
    "Combat action executed",
    session_id=session_id,
    character_id=character_id,
    action_type="attack",
    damage=15,
)
```
### Monitoring Tools
**Recommended Tools:**
| Tool | Purpose | Priority |
|------|---------|----------|
| **Sentry** | Error tracking and alerting | High |
| **Prometheus** | Metrics collection | Medium |
| **Grafana** | Metrics visualization | Medium |
| **Uptime Robot** | Uptime monitoring | High |
| **CloudWatch** | AWS logs/metrics (if using AWS) | Medium |
### Key Metrics to Monitor
| Metric | Alert Threshold | Action |
|--------|----------------|--------|
| API response time | > 3s average | Scale workers |
| Error rate | > 5% | Investigate logs |
| AI API errors | > 10% | Check API status |
| Queue length | > 500 | Add workers |
| Redis memory | > 80% | Upgrade instance |
| CPU usage | > 80% | Scale horizontally |
| AI cost per day | > budget × 1.2 | Investigate usage |
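The thresholds above translate directly into a lookup table that a periodic job could evaluate. The sketch below assumes ratios are expressed as fractions (5% as `0.05`) and uses illustrative metric names such as `queue_length` and `ai_cost_today`; none of these names come from the project itself.

```python
# Thresholds and actions copied from the table above; metric names are illustrative.
THRESHOLDS = {
    "api_response_time_s": (3.0, "Scale workers"),
    "error_rate": (0.05, "Investigate logs"),
    "ai_error_rate": (0.10, "Check API status"),
    "queue_length": (500, "Add workers"),
    "redis_memory": (0.80, "Upgrade instance"),
    "cpu_usage": (0.80, "Scale horizontally"),
}


def triggered_actions(metrics, daily_budget=None):
    """Return (metric, action) pairs for every metric above its threshold."""
    actions = []
    for name, (limit, action) in THRESHOLDS.items():
        value = metrics.get(name)
        if value is not None and value > limit:
            actions.append((name, action))
    # AI cost is compared against 120% of the configured daily budget.
    cost = metrics.get("ai_cost_today")
    if daily_budget is not None and cost is not None and cost > daily_budget * 1.2:
        actions.append(("ai_cost_today", "Investigate usage"))
    return actions
```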
### AI Cost Tracking
**Log Structure:**
| Field | Type | Purpose |
|-------|------|---------|
| `user_id` | str | Track per-user usage |
| `model` | str | Which model used |
| `tier` | str | FREE/BASIC/PREMIUM/ELITE |
| `tokens_used` | int | Token count |
| `cost_estimate` | float | Estimated cost |
| `timestamp` | datetime | When called |
| `context_type` | str | What prompted the call |
**Daily Report:**
- Total AI calls per tier
- Total tokens used
- Estimated cost
- Top users by usage
- Anomaly detection (unusual spikes)
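The log structure and the first four report items can be sketched as a dataclass plus a small aggregation function. This is an illustration only: `AICallRecord` and `daily_report` are hypothetical names, "top users" is ranked by tokens used (the source does not say which measure applies), and anomaly detection is left out.

```python
from collections import Counter, defaultdict
from dataclasses import dataclass
from datetime import datetime


@dataclass
class AICallRecord:
    """One log entry, with the fields listed in the table above."""
    user_id: str
    model: str
    tier: str
    tokens_used: int
    cost_estimate: float
    timestamp: datetime
    context_type: str


def daily_report(records, top_n=3):
    """Aggregate one day's records into the report items listed above."""
    tokens_per_user = defaultdict(int)
    for record in records:
        tokens_per_user[record.user_id] += record.tokens_used
    top_users = sorted(tokens_per_user.items(), key=lambda kv: kv[1], reverse=True)
    return {
        "calls_per_tier": dict(Counter(r.tier for r in records)),
        "total_tokens": sum(r.tokens_used for r in records),
        "estimated_cost": sum(r.cost_estimate for r in records),
        "top_users": top_users[:top_n],
    }
```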
---
## Security
### Authentication & Authorization
**Implementation:**
| Layer | Method | Details |
|-------|--------|---------|
| **User Auth** | Appwrite Auth | Email/password, OAuth providers |
| **API Auth** | JWT tokens | Bearer token in Authorization header |
| **Session Validation** | Every API call | Verify token, check expiry |
| **Resource Access** | User ID check | Users can only access their own data |
### Input Validation
**Validation Strategy:**
| Input Type | Validation | Tools |
|------------|------------|-------|
| JSON payloads | Schema validation | Marshmallow or Pydantic |
| Character names | Sanitize, length limits | Bleach library |
| Chat messages | Sanitize, profanity filter | Custom validators |
| AI prompts | Template-based only | Jinja2 (no direct user input) |
**Example Validation:**
| Field | Rules |
|-------|-------|
| Character name | 3-20 chars, alphanumeric + spaces only |
| Gold amount | Positive integer, max 999,999,999 |
| Action text | Max 500 chars, sanitized HTML |
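The three rules above could be expressed as small validators. The sketch below makes some assumptions: "alphanumeric" is read as ASCII letters and digits, and the standard library's `html.escape` stands in for the Bleach-based sanitisation named earlier, so it escapes markup rather than stripping it. The function names are hypothetical.

```python
import html
import re

NAME_PATTERN = re.compile(r"[A-Za-z0-9 ]{3,20}")  # ASCII-only is an assumption
MAX_GOLD = 999_999_999
MAX_ACTION_CHARS = 500


def is_valid_character_name(name):
    """3-20 characters, letters, digits and spaces only."""
    return isinstance(name, str) and NAME_PATTERN.fullmatch(name) is not None


def is_valid_gold_amount(amount):
    """Positive integer up to 999,999,999; bool is excluded because it subclasses int."""
    return (
        isinstance(amount, int)
        and not isinstance(amount, bool)
        and 0 < amount <= MAX_GOLD
    )


def clean_action_text(text):
    """Reject over-long text, then escape HTML special characters."""
    if len(text) > MAX_ACTION_CHARS:
        raise ValueError("action text exceeds 500 characters")
    return html.escape(text)
```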
### Rate Limiting
**Implementation:** Flask-Limiter with Redis backend
**Limits by Tier:**
| Tier | API Calls/Min | AI Calls/Day | Marketplace Actions/Day |
|------|---------------|--------------|------------------------|
| FREE | 30 | 50 | N/A |
| BASIC | 60 | 200 | N/A |
| PREMIUM | 120 | 1000 | 50 |
| ELITE | 300 | Unlimited | 100 |
**Rate Limit Bypass:**
- Admin accounts
- Health check endpoints
- Static assets
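One way to keep the tier limits in a single place is a lookup table that both the limiter configuration and the daily-quota checks read from. The sketch below is framework-free; the `"N per minute"` strings follow the notation Flask-Limiter accepts, but the helper names and the exempt path prefixes (`/health`, `/static/`) are assumptions for illustration.

```python
# Limits copied from the table above. None means the feature is not available;
# float("inf") stands in for "Unlimited".
TIER_LIMITS = {
    "FREE": {"api_per_min": 30, "ai_per_day": 50, "market_per_day": None},
    "BASIC": {"api_per_min": 60, "ai_per_day": 200, "market_per_day": None},
    "PREMIUM": {"api_per_min": 120, "ai_per_day": 1000, "market_per_day": 50},
    "ELITE": {"api_per_min": 300, "ai_per_day": float("inf"), "market_per_day": 100},
}

# Hypothetical paths for the health check and static assets.
EXEMPT_PREFIXES = ("/health", "/static/")


def api_limit_string(tier):
    """Build a per-minute limit string for the given tier."""
    return f"{TIER_LIMITS[tier]['api_per_min']} per minute"


def ai_call_allowed(tier, calls_today):
    """True while the user is still under the tier's daily AI quota."""
    return calls_today < TIER_LIMITS[tier]["ai_per_day"]


def is_exempt(is_admin, path):
    """Admins, health checks and static assets bypass rate limiting."""
    return is_admin or path.startswith(EXEMPT_PREFIXES)
```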
### API Security
**Configuration:**
| Setting | Value | Reason |
|---------|-------|--------|
| **CORS** | Production domain only | Prevent unauthorized access |
| **HTTPS** | Required | Encrypt data in transit |
| **API Keys** | Environment variables | Never in code |
| **Appwrite Permissions** | Least privilege | Collection-level security |
| **SQL Injection** | N/A | No raw SQL; all data access goes through the Appwrite API |
| **XSS** | Sanitize all inputs | Prevent script injection |
| **CSRF** | CSRF tokens | For form submissions |
### Data Protection
**Access Control Matrix:**
| Resource | Owner | Party Member | Public | System |
|----------|-------|--------------|--------|--------|
| Characters | RW | R | - | RW |
| Sessions | R | RW (turn) | - | RW |
| Marketplace Listings | RW (own) | - | R | RW |
| Transactions | R (own) | - | - | RW |
**RW = Read/Write, R = Read only, - = No access**
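The matrix above can be encoded as data and checked with a single function. In this sketch the role and resource keys are illustrative, and "RW (turn)" is interpreted as "a party member may write to a session only during their own turn", which is an assumption about the intended rule.

```python
# Permissions copied from the matrix above; "" means no access.
ACCESS = {
    "characters": {"owner": "RW", "party_member": "R", "public": "", "system": "RW"},
    "sessions": {"owner": "R", "party_member": "RW", "public": "", "system": "RW"},
    "marketplace_listings": {"owner": "RW", "party_member": "", "public": "R", "system": "RW"},
    "transactions": {"owner": "R", "party_member": "", "public": "", "system": "RW"},
}


def can_access(resource, role, action, is_players_turn=False):
    """Return True if `role` may perform `action` ("R" or "W") on `resource`."""
    if action not in ("R", "W"):
        raise ValueError("action must be 'R' or 'W'")
    if action not in ACCESS[resource][role]:
        return False
    # Assumed meaning of "RW (turn)": session writes only on the player's turn.
    if resource == "sessions" and role == "party_member" and action == "W":
        return is_players_turn
    return True
```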
### Secrets Management
**Never Commit:**
- API keys
- Database credentials
- Secret keys
- Tokens
**Best Practices:**
- Use `.env` for local development
- Use environment variables in production
- Use secrets manager (AWS Secrets Manager, HashiCorp Vault) in production
- Rotate keys regularly
- Different keys for dev/staging/prod
---
## Backup & Recovery
### Appwrite Data Backup
**Strategy:**
| Data Type | Backup Frequency | Retention | Method |
|-----------|------------------|-----------|--------|
| Characters | Daily | 30 days | Appwrite export |
| Sessions (active) | Hourly | 7 days | Appwrite export |
| Marketplace | Daily | 30 days | Appwrite export |
| Transactions | Daily | 90 days | Appwrite export |
**Backup Script:**
- Export collections to JSON
- Compress and encrypt
- Upload to S3 or object storage
- Verify backup integrity
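The compress and verify steps above can be sketched with the standard library alone. This example assumes the documents have already been exported from Appwrite as plain Python dictionaries; encryption and the upload to object storage are deliberately left out, and `pack_backup` and `verify_backup` are hypothetical helper names.

```python
import gzip
import hashlib
import json
import zlib


def pack_backup(documents):
    """Serialize documents to JSON, gzip them, and return (blob, sha256 hex digest)."""
    raw = json.dumps(documents, sort_keys=True).encode("utf-8")
    blob = gzip.compress(raw)
    return blob, hashlib.sha256(blob).hexdigest()


def verify_backup(blob, expected_digest):
    """Check the digest, then confirm the blob decompresses to valid JSON."""
    if hashlib.sha256(blob).hexdigest() != expected_digest:
        return False
    try:
        json.loads(gzip.decompress(blob).decode("utf-8"))
    except (OSError, EOFError, zlib.error, ValueError):
        return False
    return True
```

Storing the digest alongside the uploaded archive makes it possible to re-run `verify_backup` after download, before any restore is attempted.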
### Disaster Recovery Plan
| Scenario | RTO | RPO | Steps |
|----------|-----|-----|-------|
| **Database corruption** | 4 hours | 24 hours | Restore from latest backup |
| **API server down** | 15 minutes | 0 | Restart/failover to standby |
| **Redis failure** | 5 minutes | Session data loss | Restart, users re-login |
| **Complete infrastructure loss** | 24 hours | 24 hours | Restore from backups to new infrastructure |
**RTO = Recovery Time Objective, RPO = Recovery Point Objective**
---
## CI/CD Pipeline
### Recommended Workflow
| Stage | Actions | Tools |
|-------|---------|-------|
| **1. Commit** | Developer pushes to `dev` branch | Git |
| **2. Build** | Run tests, lint code | GitHub Actions, pytest, flake8 |
| **3. Test** | Unit tests, integration tests | pytest |
| **4. Build Image** | Create Docker image | Docker |
| **5. Deploy to Staging** | Deploy to staging environment | Docker, SSH |
| **6. Manual Test** | QA testing on staging | Manual |
| **7. Merge to Beta** | Promote to beta branch | Git |
| **8. Deploy to Beta** | Deploy to beta environment | Docker, SSH |
| **9. Merge to Master** | Production promotion | Git |
| **10. Deploy to Prod** | Deploy to production | Docker, SSH |
| **11. Tag Release** | Create version tag | Git |
### GitHub Actions Example
```yaml
name: CI/CD
on:
  push:
    branches: [ dev, beta, master ]
jobs:
  test:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - name: Set up Python
        uses: actions/setup-python@v5
        with:
          python-version: "3.11"
      - name: Install dependencies
        run: pip install -r requirements.txt
      - name: Run tests
        run: pytest
      - name: Lint
        run: flake8 app/
  build:
    needs: test
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - name: Build Docker image
        run: docker build -t ai-dungeon-master:${{ github.ref_name }} .
      - name: Push to registry
        run: docker push ai-dungeon-master:${{ github.ref_name }}
```
---
## Performance Optimization
### Caching Strategy
| Cache Type | What to Cache | TTL |
|------------|---------------|-----|
| **Redis Cache** | Session data | 30 minutes |
| | Character data (read-heavy) | 5 minutes |
| | Marketplace listings | 1 minute |
| | NPC shop items | 1 hour |
| **Browser Cache** | Static assets | 1 year |
| | API responses (GET) | 30 seconds |
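The Redis TTLs above can live in one table so that every cache write uses a consistent expiry. The sketch below uses an in-memory class as a stand-in for Redis so it runs without a server; `TTLCache`, the kind names, and the injectable clock are all illustrative assumptions. With a real Redis client the same table would supply the expiry passed when setting a key.

```python
import time

# TTLs in seconds, copied from the Redis rows of the table above.
TTL_SECONDS = {
    "session": 30 * 60,
    "character": 5 * 60,
    "marketplace_listings": 60,
    "npc_shop_items": 60 * 60,
}


class TTLCache:
    """In-memory stand-in for a key-value store with per-kind expiry."""

    def __init__(self, clock=time.monotonic):
        self._clock = clock
        self._items = {}

    def set(self, kind, key, value):
        """Store a value that expires after the TTL configured for its kind."""
        self._items[(kind, key)] = (value, self._clock() + TTL_SECONDS[kind])

    def get(self, kind, key):
        """Return the cached value, or None if it is missing or expired."""
        entry = self._items.get((kind, key))
        if entry is None:
            return None
        value, expires_at = entry
        if self._clock() >= expires_at:
            del self._items[(kind, key)]
            return None
        return value
```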
### Database Optimization
**Appwrite Indexing:**
- Index `userId` on characters collection
- Index `status` on game_sessions collection
- Index `listing_type` + `status` on marketplace_listings
- Index `created_at` for time-based queries
### AI Call Optimization
**Strategies:**
| Strategy | Impact | Implementation |
|----------|--------|----------------|
| **Batch requests** | Reduce API calls | Combine multiple actions |
| **Cache common responses** | Reduce cost | Cache item descriptions |
| **Prompt optimization** | Reduce tokens | Shorter, more efficient prompts |
| **Model selection** | Reduce cost | Use cheaper models when appropriate |
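The "cache common responses" strategy can be illustrated with a thin wrapper that calls the model only on a cache miss. This is a sketch under assumptions: `generate` is a hypothetical callable that maps an item ID to a description (in practice it would wrap the AI client), and the unbounded dictionary would need a size limit or TTL in a real deployment.

```python
class DescriptionCache:
    """Memoize item descriptions so repeated lookups do not trigger new AI calls."""

    def __init__(self, generate):
        self._generate = generate  # hypothetical callable: item_id -> description
        self._cache = {}
        self.ai_calls = 0  # counts cache misses, i.e. billable calls

    def describe(self, item_id):
        """Return the cached description, generating it on first request."""
        if item_id not in self._cache:
            self.ai_calls += 1
            self._cache[item_id] = self._generate(item_id)
        return self._cache[item_id]
```

Tracking `ai_calls` separately from total lookups gives a simple hit-rate figure that can feed the cost report.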
---
## Troubleshooting
### Common Issues
| Issue | Symptoms | Solution |
|-------|----------|----------|
| **RQ workers not processing** | Jobs stuck in queue | Check Redis connection, restart workers |
| **AI calls failing** | 401/403 errors | Verify API keys, check rate limits |
| **Appwrite connection errors** | Database errors | Check Appwrite status, verify credentials |
| **Session not updating** | Stale data in UI | Check Appwrite Realtime connection |
| **High latency** | Slow API responses | Check RQ queue length, scale workers |
### Debug Mode
**Enable Debug Logging:**
```bash
export LOG_LEVEL=DEBUG
flask run --debug
```
**Debug Endpoints (development only):**
- `GET /debug/health` - Health check
- `GET /debug/redis` - Redis connection status
- `GET /debug/queues` - RQ queue status
---
## Resources
| Resource | URL |
|----------|-----|
| **Appwrite Docs** | https://appwrite.io/docs |
| **RQ Docs** | https://python-rq.org/ |
| **Flask Docs** | https://flask.palletsprojects.com/ |
| **Structlog Docs** | https://www.structlog.org/ |
| **HTMX Docs** | https://htmx.org/docs/ |
| **Anthropic API** | https://docs.anthropic.com/ |
| **Replicate API** | https://replicate.com/docs |