
# ADR 007: Deployment Guide and Quick Reference

## Status: Proposed

## Summary of Multi-Tenant Architecture

### Core Components

| Component | Purpose | Responsibility |
|-----------|---------|-----------------|
| **Tenant** | Top-level isolation boundary | Grouping of knowledge bases |
| **Knowledge Base** | Domain-specific RAG system | Contains documents, entities, relationships |
| **TenantContext** | Request-scoped isolation | Passed through entire call stack |
| **RAGManager** | Instance caching | Creates/caches LightRAG per tenant/KB |
| **Storage Layer Filters** | Defense in depth | All queries scoped to tenant/KB |

### Key Design Decisions

```
┌──────────────────────────────────────┐
│   Composite Isolation Strategy       │
├──────────────────────────────────────┤
│ Tenant ID (UUID)                     │
│ └─ Knowledge Base ID (UUID)          │
│    └─ Composite Key: t:k:entity_id   │
│       └─ Storage filters all queries │
└──────────────────────────────────────┘
```
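
The composite key in the diagram can be sketched as follows. This is a minimal illustration, not the project's implementation: the `TenantContext` fields and the helper names `make_composite_key`, `split_composite_key`, and `belongs_to` are assumptions made for the example.

```python
from dataclasses import dataclass
from typing import Tuple
from uuid import UUID


@dataclass(frozen=True)
class TenantContext:
    """Request-scoped identifiers (hypothetical shape, not the project's class)."""
    tenant_id: UUID
    kb_id: UUID


def make_composite_key(ctx: TenantContext, entity_id: str) -> str:
    """Build '<tenant_id>:<kb_id>:<entity_id>' as shown in the diagram."""
    return f"{ctx.tenant_id}:{ctx.kb_id}:{entity_id}"


def split_composite_key(key: str) -> Tuple[UUID, UUID, str]:
    """Inverse of make_composite_key.

    UUID strings contain no ':', so splitting at most twice keeps any ':'
    that appears inside entity_id.
    """
    tenant, kb, entity_id = key.split(":", 2)
    return UUID(tenant), UUID(kb), entity_id


def belongs_to(key: str, ctx: TenantContext) -> bool:
    """Return True only if the key was issued for this tenant and KB."""
    tenant, kb, _ = split_composite_key(key)
    return (tenant, kb) == (ctx.tenant_id, ctx.kb_id)
```

A storage backend could call `belongs_to` as a last-line check on rows it returns, in addition to filtering in the query itself.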

### Files Modified/Created

**New Files (11 total)**:
1. `lightrag/models/tenant.py` - Tenant/KB models
2. `lightrag/services/tenant_service.py` - Tenant management
3. `lightrag/tenant_rag_manager.py` - Instance caching
4. `lightrag/api/dependencies.py` - DI for tenant context
5. `lightrag/api/models/requests.py` - API request models
6. `lightrag/api/routers/tenant_routes.py` - Tenant endpoints
7. `tests/test_tenant_isolation.py` - Unit tests
8. `tests/test_api_tenant_routes.py` - Integration tests
9. `scripts/migrate_workspace_to_tenant.py` - Migration script
10. `lightrag/kg/migrations/001_add_tenant_schema.sql` - DB schema
11. `lightrag/kg/migrations/mongo_001_add_tenant_collections.py` - MongoDB schema

**Modified Files (7 total)**:
1. `lightrag/base.py` - Add tenant/kb to StorageNameSpace
2. `lightrag/lightrag.py` - Add tenant context to query/insert
3. `lightrag/kg/postgres_impl.py` - Add tenant filtering to all queries
4. `lightrag/kg/json_kv_impl.py` - Add tenant/kb directories
5. `lightrag/api/lightrag_server.py` - Register new routes
6. `lightrag/api/auth.py` - Tenant-aware JWT validation
7. `lightrag/api/config.py` - Add tenant configuration

## Quick Start for Developers

### 1. Setting Up Development Environment

```bash
# Install dependencies
pip install -r requirements.txt

# Set up PostgreSQL for tenant metadata
docker run -d --name lightrag-postgres \
  -e POSTGRES_PASSWORD=password \
  -p 5432:5432 \
  postgres:15

# Run migrations
psql postgresql://postgres:password@localhost:5432/postgres < \
  lightrag/kg/migrations/001_add_tenant_schema.sql

# Set environment variables
export LIGHTRAG_KV_STORAGE=PGKVStorage
export TENANT_DB_HOST=localhost
export TENANT_DB_USER=postgres
export TENANT_DB_PASSWORD=password
```

### 2. Testing Locally

```bash
# Run unit tests
pytest tests/test_tenant_isolation.py -v

# Run integration tests
pytest tests/test_api_tenant_routes.py -v

# Run with coverage
pytest --cov=lightrag tests/ --cov-report=html

# Run only the cross-tenant isolation test (it fails if isolation is broken)
pytest tests/test_tenant_isolation.py::TestTenantIsolation::test_cross_tenant_data_isolation -v
```

### 3. Manual Testing via cURL

```bash
# 1. Create tenant (admin)
ADMIN_TOKEN="eyJhbGc..."  # From auth system
curl -X POST http://localhost:9621/api/v1/tenants \
  -H "Authorization: Bearer $ADMIN_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{"tenant_name": "Test Tenant"}'

# Response:
# {
#   "status": "success",
#   "data": {
#     "tenant_id": "550e8400-e29b-41d4-a716-446655440000",
#     "tenant_name": "Test Tenant",
#     "is_active": true,
#     "created_at": "2025-11-20T10:00:00Z"
#   }
# }

TENANT_ID="550e8400-e29b-41d4-a716-446655440000"

# 2. Create knowledge base
curl -X POST http://localhost:9621/api/v1/tenants/$TENANT_ID/knowledge-bases \
  -H "Authorization: Bearer $ADMIN_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{"kb_name": "Test KB"}'

KB_ID="660e8400-e29b-41d4-a716-446655440000"  # Use the KB ID returned by step 2

# 3. Create API key for tenant
curl -X POST http://localhost:9621/api/v1/tenants/$TENANT_ID/api-keys \
  -H "Authorization: Bearer $ADMIN_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "key_name": "test-key",
    "knowledge_base_ids": ["'$KB_ID'"],
    "permissions": ["query:run", "document:read"]
  }'

# Response includes: {"key": "sk-..."}
API_KEY="sk-..."

# 4. Add document with API key
curl -X POST http://localhost:9621/api/v1/tenants/$TENANT_ID/knowledge-bases/$KB_ID/documents/add \
  -H "X-API-Key: $API_KEY" \
  -F "file=@test_document.pdf"

# 5. Query knowledge base
curl -X POST http://localhost:9621/api/v1/tenants/$TENANT_ID/knowledge-bases/$KB_ID/query \
  -H "X-API-Key: $API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "query": "What is this document about?",
    "mode": "mix",
    "top_k": 10
  }'

# 6. Verify cross-tenant isolation (should fail)
TENANT_B_ID="770e8400-e29b-41d4-a716-446655440001"
curl -X GET http://localhost:9621/api/v1/tenants/$TENANT_B_ID \
  -H "X-API-Key: $API_KEY"

# Expected response: 403 Forbidden (the API key is scoped to Tenant A only)
```

## Backward Compatibility

### Migrating from Workspace to Tenant

```bash
# 1. Backup existing data
cp -r ./rag_storage ./rag_storage.backup

# 2. Run migration script
python scripts/migrate_workspace_to_tenant.py \
  --working-dir ./rag_storage

# 3. Verify migration
python -c "
from lightrag.services.tenant_service import TenantService
import asyncio

async def verify():
    service = TenantService(...)
    tenants = await service.list_all_tenants()
    for t in tenants:
        print(f'Tenant: {t.tenant_id} ({t.tenant_name})')
        kbs = await service.list_knowledge_bases(t.tenant_id)
        for kb in kbs:
            print(f'  KB: {kb.kb_id} ({kb.kb_name})')

asyncio.run(verify())
"

# 4. Check that the old workspace is still accessible via its tenant
# The legacy workspace 'myworkspace' becomes the tenant 'myworkspace'
```

## Configuration Examples

### Docker Compose

```yaml
version: '3.8'

services:
  postgres:
    image: postgres:15
    environment:
      POSTGRES_DB: lightrag
      POSTGRES_PASSWORD: secret
    ports:
      - "5432:5432"
    volumes:
      - ./lightrag/kg/migrations/001_add_tenant_schema.sql:/docker-entrypoint-initdb.d/01_schema.sql

  redis:
    image: redis:7
    ports:
      - "6379:6379"

  lightrag:
    build: .
    environment:
      # Tenant Configuration
      TENANT_ENABLED: "true"
      MAX_CACHED_INSTANCES: "100"

      # Storage Configuration
      LIGHTRAG_KV_STORAGE: "PGKVStorage"
      LIGHTRAG_VECTOR_STORAGE: "PGVectorStorage"
      LIGHTRAG_GRAPH_STORAGE: "PGGraphStorage"

      # Database
      PG_HOST: "postgres"
      PG_DATABASE: "lightrag"
      PG_USER: "postgres"
      PG_PASSWORD: "secret"

      # LLM Configuration
      LLM_BINDING: "openai"
      LLM_MODEL: "gpt-4o-mini"
      LLM_BINDING_API_KEY: "${OPENAI_API_KEY}"

      # Embedding Configuration
      EMBEDDING_BINDING: "openai"
      EMBEDDING_MODEL: "text-embedding-3-small"
      EMBEDDING_DIM: "1536"

      # Authentication
      JWT_ALGORITHM: "HS256"
      TOKEN_SECRET: "your-secret-key-change-in-production"
      TOKEN_EXPIRE_HOURS: "24"

      # API
      CORS_ORIGINS: "*"
      LOG_LEVEL: "INFO"

    ports:
      - "9621:9621"

    depends_on:
      - postgres
      - redis

    volumes:
      - ./rag_storage:/app/rag_storage
```

### Environment Variables

```bash
# Tenant Manager
TENANT_ENABLED=true
MAX_CACHED_INSTANCES=100
TENANT_CONFIG_SYNC_INTERVAL=300

# Database
LIGHTRAG_KV_STORAGE=PGKVStorage
LIGHTRAG_VECTOR_STORAGE=PGVectorStorage
LIGHTRAG_GRAPH_STORAGE=PGGraphStorage

# PostgreSQL Connection
PG_HOST=localhost
PG_PORT=5432
PG_DATABASE=lightrag
PG_USER=postgres
PG_PASSWORD=secret

# Authentication
JWT_ALGORITHM=HS256
TOKEN_SECRET=your-secret-key
TOKEN_EXPIRE_HOURS=24
GUEST_TOKEN_EXPIRE_HOURS=1

# LLM Configuration
LLM_BINDING=openai
LLM_MODEL=gpt-4o-mini
LLM_BINDING_API_KEY=${OPENAI_API_KEY}
EMBEDDING_BINDING=openai
EMBEDDING_MODEL=text-embedding-3-small

# Quotas
MAX_DOCUMENTS=10000
MAX_STORAGE_GB=100
MAX_KB_PER_TENANT=50

# Rate Limiting
RATE_LIMIT_QUERIES_PER_MINUTE=100
RATE_LIMIT_DOCUMENTS_PER_HOUR=50
RATE_LIMIT_API_CALLS_PER_MONTH=100000

# Monitoring
LOG_LEVEL=INFO
ENABLE_AUDIT_LOGGING=true
AUDIT_LOG_RETENTION_DAYS=90
```
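
One way an application might read a few of these variables at startup is sketched below. The `TenantSettings` class and `load_settings` function are hypothetical names, and the defaults are assumptions taken from the example values above rather than from the project's configuration code.

```python
import os
from dataclasses import dataclass
from typing import Mapping


def _as_bool(value: str) -> bool:
    """Interpret common truthy spellings; anything else is False."""
    return value.strip().lower() in {"1", "true", "yes", "on"}


@dataclass(frozen=True)
class TenantSettings:
    """Typed view over a subset of the environment variables listed above."""
    tenant_enabled: bool
    max_cached_instances: int
    pg_host: str
    pg_port: int
    rate_limit_queries_per_minute: int


def load_settings(env: Mapping[str, str] = os.environ) -> TenantSettings:
    """Read settings from env; the fallback values are assumptions."""
    return TenantSettings(
        tenant_enabled=_as_bool(env.get("TENANT_ENABLED", "false")),
        max_cached_instances=int(env.get("MAX_CACHED_INSTANCES", "100")),
        pg_host=env.get("PG_HOST", "localhost"),
        pg_port=int(env.get("PG_PORT", "5432")),
        rate_limit_queries_per_minute=int(
            env.get("RATE_LIMIT_QUERIES_PER_MINUTE", "100")
        ),
    )
```

Passing the environment as a mapping keeps the loader easy to test: `load_settings({"TENANT_ENABLED": "true"})` needs no real environment variables.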

## Monitoring and Observability

### Metrics to Track

```python
# Key metrics for multi-tenant system

METRICS = {
    "tenant_management": {
        "active_tenants": "Gauge",
        "total_kbs": "Gauge",
        "tenant_creation_time": "Histogram",
    },
    "isolation": {
        "cross_tenant_access_attempts": "Counter",  # Should be 0
        "cross_kb_access_attempts": "Counter",      # Should be 0
        "isolation_violations": "Counter",           # Should be 0
    },
    "performance": {
        "query_latency_per_tenant": "Histogram",
        "document_processing_time": "Histogram",
        "rag_instance_cache_hits": "Counter",
        "rag_instance_cache_misses": "Counter",
    },
    "security": {
        "failed_auth_attempts": "Counter",
        "permission_denials": "Counter",
        "api_key_usage": "Counter (per key)",
    },
    "quotas": {
        "storage_used_per_tenant": "Gauge",
        "documents_per_tenant": "Gauge",
        "api_calls_per_tenant": "Counter",
    }
}
```

### Example Prometheus Queries

```promql
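# 95th-percentile query latency per tenant
# (assumes the histogram is exported with the standard _bucket series)
histogram_quantile(0.95, sum by (le, tenant_id) (rate(query_latency_per_tenant_bucket[5m])))

# Cache hit rate over the last 5 minutes
sum(rate(rag_instance_cache_hits[5m]))
  / (sum(rate(rag_instance_cache_hits[5m])) + sum(rate(rag_instance_cache_misses[5m])))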

# Failed auth attempts
rate(failed_auth_attempts[5m])

# Cross-tenant access attempts (should be 0)
cross_tenant_access_attempts
```

### Logging

```python
# Structured logging for debugging

import structlog

logger = structlog.get_logger()

# Example log entry
logger.info(
    "query_executed",
    user_id="user-123",
    tenant_id="acme",
    kb_id="docs",
    query="What is...",
    mode="mix",
    latency_ms=145,
    result_count=5,
    request_id="req-abc-123"
)
```
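
To avoid passing `tenant_id` and `kb_id` by hand on every log call, the identifiers can be bound once per request. The sketch below swaps structlog for the standard library (`logging` plus `contextvars`) so that it runs without extra dependencies; the variable names, the `TenantJsonFormatter` class, and the `fields` attribute are illustrative assumptions.

```python
import contextvars
import json
import logging

# Request-scoped identifiers, set once when the request is authenticated.
tenant_id_var = contextvars.ContextVar("tenant_id", default=None)
kb_id_var = contextvars.ContextVar("kb_id", default=None)


class TenantJsonFormatter(logging.Formatter):
    """Render each record as one JSON object that always carries tenant/KB IDs."""

    def format(self, record: logging.LogRecord) -> str:
        payload = {
            "event": record.getMessage(),
            "level": record.levelname,
            "tenant_id": tenant_id_var.get(),
            "kb_id": kb_id_var.get(),
        }
        # Optional structured fields: logger.info("evt", extra={"fields": {...}})
        payload.update(getattr(record, "fields", {}))
        return json.dumps(payload)


def build_logger(name: str = "lightrag.tenant") -> logging.Logger:
    """Return a logger that writes JSON lines to stderr."""
    logger = logging.getLogger(name)
    if not logger.handlers:  # avoid stacking handlers on repeated calls
        handler = logging.StreamHandler()
        handler.setFormatter(TenantJsonFormatter())
        logger.addHandler(handler)
    logger.setLevel(logging.INFO)
    logger.propagate = False
    return logger


if __name__ == "__main__":
    tenant_id_var.set("acme")
    kb_id_var.set("docs")
    build_logger().info("query_executed", extra={"fields": {"latency_ms": 145}})
```

Because context variables are isolated per asyncio task, concurrent requests for different tenants do not see each other's identifiers.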

## Rollout Strategy

### Phase 1: Soft Launch (Week 1)
```
- Deploy with TENANT_ENABLED=false (features off)
- Run in parallel with existing system
- Test against staging data
- Monitor for issues (none expected)
```

### Phase 2: Closed Beta (Week 2)
```
- TENANT_ENABLED=true for 10% of traffic
- Small set of trusted customers
- Monitor metrics closely
- Rollback plan ready
```

### Phase 3: Gradual Rollout (Week 3)
```
- 25% → 50% → 100%
- Staggered by time of day
- Monitor isolation violations (should be 0)
- Customer education in progress
```

### Phase 4: Full Production (Week 4)
```
- 100% of traffic on multi-tenant system
- Legacy workspace mode deprecated (6-month timeline)
- Full monitoring and alerting active
- Support team trained
```
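
The percentage steps in Phases 2 and 3 need a rule for deciding which traffic uses tenant mode, and this guide does not specify one. A minimal sketch, assuming the split is made per tenant (so a tenant never flips between modes mid-rollout) using a stable hash; `rollout_bucket` and `tenant_mode_enabled` are hypothetical helpers:

```python
import hashlib


def rollout_bucket(tenant_id: str) -> int:
    """Map a tenant ID to a stable bucket in the range 0..99."""
    digest = hashlib.sha256(tenant_id.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % 100


def tenant_mode_enabled(tenant_id: str, rollout_percent: int) -> bool:
    """Enable tenant mode for tenants whose bucket falls below the percentage.

    Raising rollout_percent (10 -> 25 -> 50 -> 100) only ever adds tenants,
    so anyone enabled at an earlier step stays enabled.
    """
    return rollout_bucket(tenant_id) < rollout_percent
```

A hash is used rather than a random draw so that the decision is reproducible across processes and restarts. Rolling back is then a matter of lowering the percentage.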

## Troubleshooting Guide

### Issue: Cross-Tenant Data Visible

```
Symptom: User can see Tenant B data while using Tenant A credentials
Solution:
1. Check TokenPayload.tenant_id == request.path.tenant_id
2. Check storage filters include WHERE tenant_id = ? AND kb_id = ?
3. Review TenantContext creation in get_tenant_context()
4. Check RAGManager.get_rag_instance() is called with correct IDs
```
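
Check 1 above can be expressed as a small guard that runs before any storage access. This is a sketch under assumptions: the `TokenPayload` shape shown here (a tenant ID plus the KB IDs the credential covers) and the `assert_tenant_access` helper are illustrative, and the real classes may differ.

```python
from dataclasses import dataclass, field
from typing import FrozenSet, Optional


class TenantAccessError(PermissionError):
    """Credentials do not cover the requested tenant or KB (maps to HTTP 403)."""


@dataclass(frozen=True)
class TokenPayload:
    # Hypothetical shape; the project's TokenPayload may carry other fields.
    tenant_id: str
    knowledge_base_ids: FrozenSet[str] = field(default_factory=frozenset)


def assert_tenant_access(
    token: TokenPayload, path_tenant_id: str, path_kb_id: Optional[str] = None
) -> None:
    """Raise unless the token's tenant (and KB, if given) matches the URL path."""
    if token.tenant_id != path_tenant_id:
        raise TenantAccessError("token is not valid for this tenant")
    if path_kb_id is not None and path_kb_id not in token.knowledge_base_ids:
        raise TenantAccessError("token is not valid for this knowledge base")
```

Failing closed with an exception means a forgotten check shows up as a missing call in review, not as a silently ignored return value.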

### Issue: Slow Queries

```
Symptom: Queries taking >1 second
Solution:
1. Check indexes on (tenant_id, kb_id) columns
2. Verify RAG instance cache is working (check metrics)
3. Check whether the RAG instance is being recreated on every request
4. Profile with: EXPLAIN ANALYZE SELECT * FROM documents WHERE tenant_id=? AND kb_id=?
```
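
The effect of step 1 can be demonstrated locally. The sketch below uses SQLite from the Python standard library as a stand-in for PostgreSQL, so the plan output format differs from `EXPLAIN ANALYZE`. The `documents` table layout and the index name `idx_documents_tenant_kb` are assumptions for illustration.

```python
import sqlite3


def make_demo_db(with_index: bool) -> sqlite3.Connection:
    """Create an in-memory documents table, optionally with the composite index."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE documents ("
        "id INTEGER PRIMARY KEY, tenant_id TEXT NOT NULL, "
        "kb_id TEXT NOT NULL, content TEXT)"
    )
    if with_index:
        conn.execute(
            "CREATE INDEX idx_documents_tenant_kb ON documents (tenant_id, kb_id)"
        )
    return conn


def query_plan(conn: sqlite3.Connection, tenant_id: str, kb_id: str) -> str:
    """Return SQLite's plan for the tenant-scoped lookup as one string."""
    rows = conn.execute(
        "EXPLAIN QUERY PLAN "
        "SELECT * FROM documents WHERE tenant_id = ? AND kb_id = ?",
        (tenant_id, kb_id),
    ).fetchall()
    # The human-readable detail is the last column of each plan row.
    return " | ".join(str(row[-1]) for row in rows)


if __name__ == "__main__":
    print("with index:   ", query_plan(make_demo_db(True), "t1", "kb1"))
    print("without index:", query_plan(make_demo_db(False), "t1", "kb1"))
```

With the index in place the plan names `idx_documents_tenant_kb`. Without it, the database falls back to scanning the whole table, which is the pattern to look for in the PostgreSQL plan as well.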

### Issue: High Memory Usage

```
Symptom: Memory growing over time
Solution:
1. Check MAX_CACHED_INSTANCES setting (default 100)
2. Monitor rag_instance_cache_size metric
3. Verify finalize_storages() called on eviction
4. Check for memory leaks in embedding cache
```
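
The behaviour that steps 1 to 3 verify can be sketched as a bounded LRU cache with a cleanup hook. This is an illustration, not the project's `RAGManager`: the `InstanceCache` class is hypothetical, and the eviction hook is synchronous for brevity, whereas a real manager would probably need to await `finalize_storages()`.

```python
from collections import OrderedDict
from typing import Callable, Generic, Tuple, TypeVar

T = TypeVar("T")


class InstanceCache(Generic[T]):
    """LRU cache keyed by (tenant_id, kb_id) that cleans up evicted instances."""

    def __init__(
        self,
        max_size: int,
        create: Callable[[str, str], T],
        on_evict: Callable[[T], None],
    ) -> None:
        if max_size < 1:
            raise ValueError("max_size must be at least 1")
        self._max_size = max_size
        self._create = create
        self._on_evict = on_evict
        self._items: "OrderedDict[Tuple[str, str], T]" = OrderedDict()
        self.hits = 0
        self.misses = 0

    def get(self, tenant_id: str, kb_id: str) -> T:
        key = (tenant_id, kb_id)
        if key in self._items:
            self.hits += 1
            self._items.move_to_end(key)  # mark as most recently used
            return self._items[key]
        self.misses += 1
        instance = self._create(tenant_id, kb_id)
        self._items[key] = instance
        while len(self._items) > self._max_size:
            # Drop the least recently used entry and release its resources.
            _, evicted = self._items.popitem(last=False)
            self._on_evict(evicted)
        return instance

    def __len__(self) -> int:
        return len(self._items)

    def hit_rate(self) -> float:
        total = self.hits + self.misses
        return self.hits / total if total else 0.0
```

If memory keeps growing even though `len(cache)` stays at or below the configured maximum, the leak is more likely in what the eviction hook fails to release than in the cache itself.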

## Support and Resources

### Documentation
- Architecture Overview: `adr/001-multi-tenant-architecture-overview.md`
- Implementation Guide: `adr/002-implementation-strategy.md`
- Data Models: `adr/003-data-models-and-storage.md`
- API Design: `adr/004-api-design.md`
- Security: `adr/005-security-analysis.md`
- Diagrams & Alternatives: `adr/006-architecture-diagrams-alternatives.md`

### Code Examples
- See `examples/multi_tenant_demo.py` for complete usage example
- See `tests/test_api_tenant_routes.py` for API testing examples
- See `scripts/migrate_workspace_to_tenant.py` for migration examples

### Getting Help
- GitHub Issues: [LightRAG/issues](https://github.com/HKUDS/LightRAG/issues)
- Discussions: [LightRAG/discussions](https://github.com/HKUDS/LightRAG/discussions)
- Discord: [LightRAG Community](https://discord.gg/yF2MmDJyGJ)

## Success Criteria

Multi-tenant implementation is successful when:

✓ **Functional Requirements Met**
- [ ] All API endpoints working with tenant/KB routing
- [ ] Data isolation verified (cross-tenant access prevented)
- [ ] RBAC enforcement working correctly
- [ ] Audit logging capturing all operations
- [ ] Migration from workspace to tenant successful

✓ **Performance Targets Met**
- [ ] Query latency < 200ms p99 (including tenant filtering)
- [ ] Storage overhead < 3%
- [ ] Instance cache hit rate > 90%
- [ ] API response time < 150ms average

✓ **Security Requirements Met**
- [ ] Zero cross-tenant data access
- [ ] JWT validation on all requests
- [ ] Permission checking on every operation
- [ ] Rate limiting preventing abuse
- [ ] Audit logs tamper-proof and retained

✓ **Operational Readiness**
- [ ] Monitoring/alerting configured
- [ ] Runbooks for common issues
- [ ] Disaster recovery plan tested
- [ ] Support team trained
- [ ] Documentation complete

---

**Document Version**: 1.0
**Last Updated**: 2025-11-20
**Deployment Timeline**: 4 weeks
**Success Criteria**: All items checked off
**Status**: Ready for Implementation