* Remove outdated documentation files: Quick Start Guide, Apache AGE Analysis, and Scratchpad. * Add multi-tenant testing strategy and ADR index documentation - Introduced ADR 008 detailing the multi-tenant testing strategy for the ./starter environment, covering compatibility and multi-tenant modes, testing scenarios, and implementation details. - Created a comprehensive ADR index (README.md) summarizing all architecture decision records related to the multi-tenant implementation, including purpose, key sections, and reading paths for different roles. * feat(docs): Add comprehensive multi-tenancy guide and README for LightRAG Enterprise - Introduced `0008-multi-tenancy.md` detailing multi-tenancy architecture, key concepts, roles, permissions, configuration, and API endpoints. - Created `README.md` as the main documentation index, outlining features, quick start, system overview, and deployment options. - Documented the LightRAG architecture, storage backends, LLM integrations, and query modes. - Established a task log (`2025-01-21-lightrag-documentation-log.md`) summarizing documentation creation actions, decisions, and insights.
517 lines
14 KiB
Markdown
517 lines
14 KiB
Markdown
# ADR 007: Deployment Guide and Quick Reference
|
|
|
|
## Status: Proposed
|
|
|
|
## Summary of Multi-Tenant Architecture
|
|
|
|
### Core Components
|
|
|
|
| Component | Purpose | Responsibility |
|
|
|-----------|---------|-----------------|
|
|
| **Tenant** | Top-level isolation boundary | Grouping of knowledge bases |
|
|
| **Knowledge Base** | Domain-specific RAG system | Contains documents, entities, relationships |
|
|
| **TenantContext** | Request-scoped isolation | Passed through entire call stack |
|
|
| **RAGManager** | Instance caching | Creates/caches LightRAG per tenant/KB |
|
|
| **Storage Layer Filters** | Defense in depth | All queries scoped to tenant/KB |
|
|
|
|
### Key Design Decisions
|
|
|
|
```
|
|
┌──────────────────────────────────────┐
|
|
│ Composite Isolation Strategy │
|
|
├──────────────────────────────────────┤
|
|
│ Tenant ID (UUID) │
|
|
│ └─ Knowledge Base ID (UUID) │
|
|
│ └─ Composite Key: t:k:entity_id │
|
|
│ └─ Storage filters all queries │
|
|
└──────────────────────────────────────┘
|
|
```
|
|
|
|
### Files Modified/Created
|
|
|
|
**New Files (11 total)**:
|
|
1. `lightrag/models/tenant.py` - Tenant/KB models
|
|
2. `lightrag/services/tenant_service.py` - Tenant management
|
|
3. `lightrag/tenant_rag_manager.py` - Instance caching
|
|
4. `lightrag/api/dependencies.py` - DI for tenant context
|
|
5. `lightrag/api/models/requests.py` - API request models
|
|
6. `lightrag/api/routers/tenant_routes.py` - Tenant endpoints
|
|
7. `tests/test_tenant_isolation.py` - Unit tests
|
|
8. `tests/test_api_tenant_routes.py` - Integration tests
|
|
9. `scripts/migrate_workspace_to_tenant.py` - Migration script
|
|
10. `lightrag/kg/migrations/001_add_tenant_schema.sql` - DB schema
|
|
11. `lightrag/kg/migrations/mongo_001_add_tenant_collections.py` - MongoDB schema
|
|
|
|
**Modified Files (7 total)**:
|
|
1. `lightrag/base.py` - Add tenant/kb to StorageNameSpace
|
|
2. `lightrag/lightrag.py` - Add tenant context to query/insert
|
|
3. `lightrag/kg/postgres_impl.py` - Add tenant filtering to all queries
|
|
4. `lightrag/kg/json_kv_impl.py` - Add tenant/kb directories
|
|
5. `lightrag/api/lightrag_server.py` - Register new routes
|
|
6. `lightrag/api/auth.py` - Tenant-aware JWT validation
|
|
7. `lightrag/api/config.py` - Add tenant configuration
|
|
|
|
## Quick Start for Developers
|
|
|
|
### 1. Setting Up Development Environment
|
|
|
|
```bash
|
|
# Install dependencies
|
|
pip install -r requirements.txt
|
|
|
|
# Set up PostgreSQL for tenant metadata
|
|
docker run -d --name lightrag-postgres \
|
|
-e POSTGRES_PASSWORD=password \
|
|
-p 5432:5432 \
|
|
postgres:15
|
|
|
|
# Run migrations
|
|
psql postgresql://postgres:password@localhost:5432/postgres < \
|
|
lightrag/kg/migrations/001_add_tenant_schema.sql
|
|
|
|
# Set environment variables
|
|
export LIGHTRAG_KV_STORAGE=PGKVStorage
|
|
export TENANT_DB_HOST=localhost
|
|
export TENANT_DB_USER=postgres
|
|
export TENANT_DB_PASSWORD=password
|
|
```
|
|
|
|
### 2. Testing Locally
|
|
|
|
```bash
|
|
# Run unit tests
|
|
pytest tests/test_tenant_isolation.py -v
|
|
|
|
# Run integration tests
|
|
pytest tests/test_api_tenant_routes.py -v
|
|
|
|
# Run with coverage
|
|
pytest --cov=lightrag tests/ --cov-report=html
|
|
|
|
# Test tenant isolation (should fail if not working)
|
|
pytest tests/test_tenant_isolation.py::TestTenantIsolation::test_cross_tenant_data_isolation -v
|
|
```
|
|
|
|
### 3. Manual Testing via cURL
|
|
|
|
```bash
|
|
# 1. Create tenant (admin)
|
|
ADMIN_TOKEN="eyJhbGc..." # From auth system
|
|
curl -X POST http://localhost:9621/api/v1/tenants \
|
|
-H "Authorization: Bearer $ADMIN_TOKEN" \
|
|
-H "Content-Type: application/json" \
|
|
-d '{"tenant_name": "Test Tenant"}'
|
|
|
|
# Response:
|
|
# {
|
|
# "status": "success",
|
|
# "data": {
|
|
# "tenant_id": "550e8400-e29b-41d4-a716-446655440000",
|
|
# "tenant_name": "Test Tenant",
|
|
# "is_active": true,
|
|
# "created_at": "2025-11-20T10:00:00Z"
|
|
# }
|
|
# }
|
|
|
|
TENANT_ID="550e8400-e29b-41d4-a716-446655440000"
|
|
|
|
# 2. Create knowledge base
|
|
curl -X POST http://localhost:9621/api/v1/tenants/$TENANT_ID/knowledge-bases \
|
|
-H "Authorization: Bearer $ADMIN_TOKEN" \
|
|
-H "Content-Type: application/json" \
|
|
-d '{"kb_name": "Test KB"}'
|
|
|
|
KB_ID="660e8400-e29b-41d4-a716-446655440000"
|
|
|
|
# 3. Create API key for tenant
|
|
curl -X POST http://localhost:9621/api/v1/tenants/$TENANT_ID/api-keys \
|
|
-H "Authorization: Bearer $ADMIN_TOKEN" \
|
|
-H "Content-Type: application/json" \
|
|
-d '{
|
|
"key_name": "test-key",
|
|
"knowledge_base_ids": ["'$KB_ID'"],
|
|
"permissions": ["query:run", "document:read"]
|
|
}'
|
|
|
|
# Response includes: {"key": "sk-..."}
|
|
API_KEY="sk-..."
|
|
|
|
# 4. Add document with API key
|
|
curl -X POST http://localhost:9621/api/v1/tenants/$TENANT_ID/knowledge-bases/$KB_ID/documents/add \
|
|
-H "X-API-Key: $API_KEY" \
|
|
-F "file=@test_document.pdf"
|
|
|
|
# 5. Query knowledge base
|
|
curl -X POST http://localhost:9621/api/v1/tenants/$TENANT_ID/knowledge-bases/$KB_ID/query \
|
|
-H "X-API-Key: $API_KEY" \
|
|
-H "Content-Type: application/json" \
|
|
-d '{
|
|
"query": "What is this document about?",
|
|
"mode": "mix",
|
|
"top_k": 10
|
|
}'
|
|
|
|
# 6. Verify cross-tenant isolation (should fail)
|
|
TENANT_B_ID="770e8400-e29b-41d4-a716-446655440001"
|
|
curl -X GET http://localhost:9621/api/v1/tenants/$TENANT_B_ID \
|
|
-H "X-API-Key: $API_KEY"
|
|
|
|
# Response: 403 Forbidden (API key only for Tenant A)
|
|
```
|
|
|
|
## Backward Compatibility
|
|
|
|
### Migrating from Workspace to Tenant
|
|
|
|
```bash
|
|
# 1. Backup existing data
|
|
cp -r ./rag_storage ./rag_storage.backup
|
|
|
|
# 2. Run migration script
|
|
python scripts/migrate_workspace_to_tenant.py \
|
|
--working-dir ./rag_storage
|
|
|
|
# 3. Verify migration
|
|
python -c "
|
|
from lightrag.services.tenant_service import TenantService
|
|
import asyncio
|
|
|
|
async def verify():
|
|
service = TenantService(...)
|
|
tenants = await service.list_all_tenants()
|
|
for t in tenants:
|
|
print(f'Tenant: {t.tenant_id} ({t.tenant_name})')
|
|
kbs = await service.list_knowledge_bases(t.tenant_id)
|
|
for kb in kbs:
|
|
print(f' KB: {kb.kb_id} ({kb.kb_name})')
|
|
|
|
asyncio.run(verify())
|
|
"
|
|
|
|
# 4. Test that old workspace still accessible via tenant
|
|
# Legacy workspace 'myworkspace' becomes tenant 'myworkspace'
|
|
```
|
|
|
|
## Configuration Examples
|
|
|
|
### Docker Compose
|
|
|
|
```yaml
|
|
version: '3.8'
|
|
|
|
services:
|
|
postgres:
|
|
image: postgres:15
|
|
environment:
|
|
POSTGRES_DB: lightrag
|
|
POSTGRES_PASSWORD: secret
|
|
ports:
|
|
- "5432:5432"
|
|
volumes:
|
|
- ./lightrag/kg/migrations/001_add_tenant_schema.sql:/docker-entrypoint-initdb.d/01_schema.sql
|
|
|
|
redis:
|
|
image: redis:7
|
|
ports:
|
|
- "6379:6379"
|
|
|
|
lightrag:
|
|
build: .
|
|
environment:
|
|
# Tenant Configuration
|
|
TENANT_ENABLED: "true"
|
|
MAX_CACHED_INSTANCES: "100"
|
|
|
|
# Storage Configuration
|
|
LIGHTRAG_KV_STORAGE: "PGKVStorage"
|
|
LIGHTRAG_VECTOR_STORAGE: "PGVectorStorage"
|
|
LIGHTRAG_GRAPH_STORAGE: "PGGraphStorage"
|
|
|
|
# Database
|
|
PG_HOST: "postgres"
|
|
PG_DATABASE: "lightrag"
|
|
PG_USER: "postgres"
|
|
PG_PASSWORD: "secret"
|
|
|
|
# LLM Configuration
|
|
LLM_BINDING: "openai"
|
|
LLM_MODEL: "gpt-4o-mini"
|
|
LLM_BINDING_API_KEY: "${OPENAI_API_KEY}"
|
|
|
|
# Embedding Configuration
|
|
EMBEDDING_BINDING: "openai"
|
|
EMBEDDING_MODEL: "text-embedding-3-small"
|
|
EMBEDDING_DIM: "1536"
|
|
|
|
# Authentication
|
|
JWT_ALGORITHM: "HS256"
|
|
TOKEN_SECRET: "your-secret-key-change-in-production"
|
|
TOKEN_EXPIRE_HOURS: "24"
|
|
|
|
# API
|
|
CORS_ORIGINS: "*"
|
|
LOG_LEVEL: "INFO"
|
|
|
|
ports:
|
|
- "9621:9621"
|
|
|
|
depends_on:
|
|
- postgres
|
|
- redis
|
|
|
|
volumes:
|
|
- ./rag_storage:/app/rag_storage
|
|
```
|
|
|
|
### Environment Variables
|
|
|
|
```bash
|
|
# Tenant Manager
|
|
TENANT_ENABLED=true
|
|
MAX_CACHED_INSTANCES=100
|
|
TENANT_CONFIG_SYNC_INTERVAL=300
|
|
|
|
# Database
|
|
LIGHTRAG_KV_STORAGE=PGKVStorage
|
|
LIGHTRAG_VECTOR_STORAGE=PGVectorStorage
|
|
LIGHTRAG_GRAPH_STORAGE=PGGraphStorage
|
|
|
|
# PostgreSQL Connection
|
|
PG_HOST=localhost
|
|
PG_PORT=5432
|
|
PG_DATABASE=lightrag
|
|
PG_USER=postgres
|
|
PG_PASSWORD=secret
|
|
|
|
# Authentication
|
|
JWT_ALGORITHM=HS256
|
|
TOKEN_SECRET=your-secret-key
|
|
TOKEN_EXPIRE_HOURS=24
|
|
GUEST_TOKEN_EXPIRE_HOURS=1
|
|
|
|
# LLM Configuration
|
|
LLM_BINDING=openai
|
|
LLM_MODEL=gpt-4o-mini
|
|
LLM_BINDING_API_KEY=${OPENAI_API_KEY}
|
|
EMBEDDING_BINDING=openai
|
|
EMBEDDING_MODEL=text-embedding-3-small
|
|
|
|
# Quotas
|
|
MAX_DOCUMENTS=10000
|
|
MAX_STORAGE_GB=100
|
|
MAX_KB_PER_TENANT=50
|
|
|
|
# Rate Limiting
|
|
RATE_LIMIT_QUERIES_PER_MINUTE=100
|
|
RATE_LIMIT_DOCUMENTS_PER_HOUR=50
|
|
RATE_LIMIT_API_CALLS_PER_MONTH=100000
|
|
|
|
# Monitoring
|
|
LOG_LEVEL=INFO
|
|
ENABLE_AUDIT_LOGGING=true
|
|
AUDIT_LOG_RETENTION_DAYS=90
|
|
```
|
|
|
|
## Monitoring and Observability
|
|
|
|
### Metrics to Track
|
|
|
|
```python
|
|
# Key metrics for multi-tenant system
|
|
|
|
METRICS = {
|
|
"tenant_management": {
|
|
"active_tenants": "Gauge",
|
|
"total_kbs": "Gauge",
|
|
"tenant_creation_time": "Histogram",
|
|
},
|
|
"isolation": {
|
|
"cross_tenant_access_attempts": "Counter", # Should be 0
|
|
"cross_kb_access_attempts": "Counter", # Should be 0
|
|
"isolation_violations": "Counter", # Should be 0
|
|
},
|
|
"performance": {
|
|
"query_latency_per_tenant": "Histogram",
|
|
"document_processing_time": "Histogram",
|
|
"rag_instance_cache_hits": "Counter",
|
|
"rag_instance_cache_misses": "Counter",
|
|
},
|
|
"security": {
|
|
"failed_auth_attempts": "Counter",
|
|
"permission_denials": "Counter",
|
|
"api_key_usage": "Counter (per key)",
|
|
},
|
|
"quotas": {
|
|
"storage_used_per_tenant": "Gauge",
|
|
"documents_per_tenant": "Gauge",
|
|
"api_calls_per_tenant": "Counter",
|
|
}
|
|
}
|
|
```
|
|
|
|
### Example Prometheus Queries
|
|
|
|
```promql
|
|
# Average query latency per tenant
|
|
histogram_quantile(0.95, query_latency_per_tenant) by (tenant_id)
|
|
|
|
# Cache hit rate
|
|
rag_instance_cache_hits / (rag_instance_cache_hits + rag_instance_cache_misses)
|
|
|
|
# Failed auth attempts
|
|
rate(failed_auth_attempts[5m])
|
|
|
|
# Cross-tenant access attempts (should be 0)
|
|
cross_tenant_access_attempts
|
|
```
|
|
|
|
### Logging
|
|
|
|
```python
|
|
# Structured logging for debugging
|
|
|
|
import structlog
|
|
|
|
logger = structlog.get_logger()
|
|
|
|
# Example log entry
|
|
logger.info(
|
|
"query_executed",
|
|
user_id="user-123",
|
|
tenant_id="acme",
|
|
kb_id="docs",
|
|
query="What is...",
|
|
mode="mix",
|
|
latency_ms=145,
|
|
result_count=5,
|
|
request_id="req-abc-123"
|
|
)
|
|
```
|
|
|
|
## Rollout Strategy
|
|
|
|
### Phase 1: Soft Launch (Week 1)
|
|
```
|
|
- Deploy with TENANT_ENABLED=false (features off)
|
|
- Run in parallel with existing system
|
|
- Test against staging data
|
|
- Monitor for issues: 0 expected
|
|
```
|
|
|
|
### Phase 2: Closed Beta (Week 2)
|
|
```
|
|
- TENANT_ENABLED=true for 10% of traffic
|
|
- Small set of trusted customers
|
|
- Monitor metrics closely
|
|
- Rollback plan ready
|
|
```
|
|
|
|
### Phase 3: Gradual Rollout (Week 3)
|
|
```
|
|
- 25% → 50% → 100%
|
|
- Staggered by time of day
|
|
- Monitor isolation violations (should be 0)
|
|
- Customer education happening
|
|
```
|
|
|
|
### Phase 4: Full Production (Week 4)
|
|
```
|
|
- 100% of traffic on multi-tenant system
|
|
- Legacy workspace mode deprecated (6-month timeline)
|
|
- Full monitoring and alerting active
|
|
- Support team trained
|
|
```
|
|
|
|
## Troubleshooting Guide
|
|
|
|
### Issue: Cross-Tenant Data Visible
|
|
|
|
```
|
|
Symptom: User can see Tenant B data while using Tenant A credentials
|
|
Solution:
|
|
1. Check TokenPayload.tenant_id == request.path.tenant_id
|
|
2. Check storage filters include WHERE tenant_id = ? AND kb_id = ?
|
|
3. Review TenantContext creation in get_tenant_context()
|
|
4. Check RAGManager.get_rag_instance() is called with correct IDs
|
|
```
|
|
|
|
### Issue: Slow Queries
|
|
|
|
```
|
|
Symptom: Queries taking >1 second
|
|
Solution:
|
|
1. Check indexes on (tenant_id, kb_id) columns
|
|
2. Verify RAG instance cache is working (check metrics)
|
|
3. Check if instance is being recompiled every request
|
|
4. Profile with: SELECT * FROM documents WHERE tenant_id=? AND kb_id=?
|
|
```
|
|
|
|
### Issue: High Memory Usage
|
|
|
|
```
|
|
Symptom: Memory growing over time
|
|
Solution:
|
|
1. Check MAX_CACHED_INSTANCES setting (default 100)
|
|
2. Monitor rag_instance_cache_size metric
|
|
3. Verify finalize_storages() called on eviction
|
|
4. Check for memory leaks in embedding cache
|
|
```
|
|
|
|
## Support and Resources
|
|
|
|
### Documentation
|
|
- Architecture Overview: `adr/001-multi-tenant-architecture-overview.md`
|
|
- Implementation Guide: `adr/002-implementation-strategy.md`
|
|
- Data Models: `adr/003-data-models-and-storage.md`
|
|
- API Design: `adr/004-api-design.md`
|
|
- Security: `adr/005-security-analysis.md`
|
|
- Diagrams & Alternatives: `adr/006-architecture-diagrams-alternatives.md`
|
|
|
|
### Code Examples
|
|
- See `examples/multi_tenant_demo.py` for complete usage example
|
|
- See `tests/test_api_tenant_routes.py` for API testing examples
|
|
- See `scripts/migrate_workspace_to_tenant.py` for migration examples
|
|
|
|
### Getting Help
|
|
- GitHub Issues: [LightRAG/issues](https://github.com/HKUDS/LightRAG/issues)
|
|
- Discussions: [LightRAG/discussions](https://github.com/HKUDS/LightRAG/discussions)
|
|
- Discord: [LightRAG Community](https://discord.gg/yF2MmDJyGJ)
|
|
|
|
## Success Criteria
|
|
|
|
Multi-tenant implementation is successful when:
|
|
|
|
✓ **Functional Requirements Met**
|
|
- [ ] All API endpoints working with tenant/KB routing
|
|
- [ ] Data isolation verified (cross-tenant access prevents)
|
|
- [ ] RBAC enforcement working correctly
|
|
- [ ] Audit logging capturing all operations
|
|
- [ ] Migration from workspace to tenant successful
|
|
|
|
✓ **Performance Targets Met**
|
|
- [ ] Query latency < 200ms p99 (including tenant filtering)
|
|
- [ ] Storage overhead < 3%
|
|
- [ ] Instance cache hit rate > 90%
|
|
- [ ] API response time < 150ms average
|
|
|
|
✓ **Security Requirements Met**
|
|
- [ ] Zero cross-tenant data access
|
|
- [ ] JWT token validation in all requests
|
|
- [ ] Permission checking on every operation
|
|
- [ ] Rate limiting preventing abuse
|
|
- [ ] Audit logs tamper-proof and retained
|
|
|
|
✓ **Operational Readiness**
|
|
- [ ] Monitoring/alerting configured
|
|
- [ ] Runbooks for common issues
|
|
- [ ] Disaster recovery plan tested
|
|
- [ ] Support team trained
|
|
- [ ] Documentation complete
|
|
|
|
---
|
|
|
|
**Document Version**: 1.0
|
|
**Last Updated**: 2025-11-20
|
|
**Deployment Timeline**: 4 weeks
|
|
**Success Criteria**: All items checked off
|
|
**Status**: Ready for Implementation
|