LightRAG/docs/archives/adr/008-multi-tenant-testing-strategy.md
Raphael MANSUY 2b292d4924
docs: Enterprise Edition & Multi-tenancy attribution (#5)
* Remove outdated documentation files: Quick Start Guide, Apache AGE Analysis, and Scratchpad.

* Add multi-tenant testing strategy and ADR index documentation

- Introduced ADR 008 detailing the multi-tenant testing strategy for the ./starter environment, covering compatibility and multi-tenant modes, testing scenarios, and implementation details.
- Created a comprehensive ADR index (README.md) summarizing all architecture decision records related to the multi-tenant implementation, including purpose, key sections, and reading paths for different roles.

* feat(docs): Add comprehensive multi-tenancy guide and README for LightRAG Enterprise

- Introduced `0008-multi-tenancy.md` detailing multi-tenancy architecture, key concepts, roles, permissions, configuration, and API endpoints.
- Created `README.md` as the main documentation index, outlining features, quick start, system overview, and deployment options.
- Documented the LightRAG architecture, storage backends, LLM integrations, and query modes.
- Established a task log (`2025-01-21-lightrag-documentation-log.md`) summarizing documentation creation actions, decisions, and insights.
2025-12-04 18:09:15 +08:00

586 lines
21 KiB
Markdown

# ADR 008: Multi-Tenant Testing Strategy for ./starter Environment
## Status: Proposed
## Context
The `./starter` directory provides a local development and testing environment for the LightRAG multi-tenant implementation. This environment must support two distinct operational modes:
1. **No Multi-Tenant Mode (Compatibility Mode)**: Behaves like the original single-workspace LightRAG, maintaining backward compatibility with the `main` branch
2. **Multi-Tenant Mode (Production Mode)**: Demonstrates full multi-tenant isolation with multiple tenants and knowledge bases
The testing strategy must ensure both modes work correctly and that switching between them is seamless and reproducible.
## Problem Statement
Current state (as of November 2025):
- The multi-tenant architecture is implemented in `feat/multi-tenant` branch
- The `./starter` directory uses Docker Compose with PostgreSQL, Redis, LightRAG API, and WebUI
- The environment supports automatic tenant/KB resolution with "default" tenant and KB names
- No clear testing protocol exists for validating both single-tenant and multi-tenant scenarios
- Documentation doesn't reflect the actual implementation details
### Key Challenges
1. **Backward Compatibility**: Must verify that old single-tenant code paths still work
2. **Switching Modes**: Need clear procedures to enable/disable multi-tenancy for testing
3. **Data Isolation**: Must verify tenant/KB isolation at both API and database levels
4. **Reproducibility**: Tests must be idempotent and produce consistent results
5. **Environment Configuration**: Starter must clearly document all configuration options for each mode
## Decision
### Multi-Mode Testing Architecture
We implement a **configurable testing environment** in `./starter` that can operate in two modes via environment variables:
```
Mode Selection:
├─ MULTITENANT_MODE=off → Single-tenant compatibility mode (like main branch)
├─ MULTITENANT_MODE=on → Full multi-tenant mode (default)
└─ MULTITENANT_MODE=demo → Multi-tenant demo mode (2 pre-configured tenants)
```
### Scenario 1: No Multi-Tenant Mode (Compatibility Mode)
**Goal**: Verify that LightRAG works exactly as it did before multi-tenancy was added.
**Configuration**:
```env
MULTITENANT_MODE=off
DEFAULT_TENANT=default
DEFAULT_KB=default
WORKSPACE_ISOLATION_TYPE=legacy
```
**Behavior**:
- API endpoints work WITHOUT requiring X-Tenant-ID or X-KB-ID headers
- All operations use "default" tenant and "default" KB internally
- Storage uses legacy workspace namespace: `tenant_id_kb_id``default_default`
- No tenant context validation errors
- Fully backward compatible with main branch code
**Testing Scenarios**:
| Test Case | Description | Expected Result |
|-----------|-------------|-----------------|
| **T1.1** | Upload document without tenant headers | ✓ Document stored in default workspace |
| **T1.2** | Query without tenant headers | ✓ Results returned from default workspace |
| **T1.3** | Create knowledge graph without tenant headers | ✓ Graph entities stored in default workspace |
| **T1.4** | Access WebUI without specifying tenant | ✓ UI works with implicit default tenant |
| **T1.5** | Database contains no tenant_id/kb_id fields | ✓ Tables use legacy workspace field only |
| **T1.6** | Mix of old client and new client requests | ✓ Both work without conflicts |
| **T1.7** | Verify authorization doesn't require tenant access | ✓ Auth tokens work with role only (no tenant scope) |
| **T1.8** | All stored data is in `default_default` namespace | ✓ Data isolation via workspace namespace only |
**Backward Compatibility Verification**:
```python
# Should work identically to main branch
rag = LightRAG(working_dir="./rag_data") # No tenant_id/kb_id
await rag.insert("document text")
results = await rag.query("query text")
# Should NOT require X-Tenant-ID header
curl -X POST http://localhost:8000/api/v1/documents/insert \
-H "Authorization: Bearer token" \
-H "Content-Type: application/json" \
-d '{"document": "text"}'
```
**Database Schema**:
- Uses legacy tables: `lightrag_doc_full`, `lightrag_doc_chunks`, etc.
- `workspace` field acts as tenant/KB namespace
- No `tenant_id`, `kb_id` columns added
- Composite indexes: `(workspace, id)` not `(tenant_id, kb_id, id)`
### Scenario 2: Single Tenant with Multiple KBs (Controlled Multi-Tenant)
**Goal**: Test multi-tenant architecture with a single tenant serving multiple knowledge bases.
**Configuration**:
```env
MULTITENANT_MODE=on
DEFAULT_TENANT=tenant-1
DEFAULT_KB=kb-default
CREATE_DEFAULT_KB=kb-default,kb-secondary,kb-experimental
```
**Behavior**:
- API requires X-Tenant-ID (resolves to tenant-1 if not provided)
- API requires X-KB-ID (can be any KB in the tenant)
- Different KBs can coexist for the same tenant
- Complete data isolation: `tenant-1:kb-default:entity1` vs `tenant-1:kb-secondary:entity1`
- Same tenant has access to all its KBs
**Testing Scenarios**:
| Test Case | Description | Expected Result |
|-----------|-------------|-----------------|
| **T2.1** | Insert into kb-default | ✓ Document stored in tenant-1_kb-default namespace |
| **T2.2** | Insert into kb-secondary | ✓ Document stored in tenant-1_kb-secondary namespace |
| **T2.3** | Query from kb-default returns kb-default data only | ✓ No cross-KB data leakage |
| **T2.4** | Query from kb-secondary returns kb-secondary data only | ✓ No cross-KB data leakage |
| **T2.5** | Switch KB in same request sequence | ✓ Isolation maintained at database level |
| **T2.6** | Create duplicate entity names in different KBs | ✓ Database allows duplicates (namespace isolated) |
| **T2.7** | Generate graph for kb-default | ✓ Graph contains only kb-default entities |
| **T2.8** | Database contains tenant_id and kb_id columns | ✓ Composite keys prevent collisions |
**Query Examples**:
```bash
# Query KB-1
curl -X POST http://localhost:8000/api/v1/query \
-H "Authorization: Bearer token" \
-H "X-Tenant-ID: tenant-1" \
-H "X-KB-ID: kb-default" \
-d '{"query": "text"}'
# Query KB-2 (same tenant)
curl -X POST http://localhost:8000/api/v1/query \
-H "Authorization: Bearer token" \
-H "X-Tenant-ID: tenant-1" \
-H "X-KB-ID: kb-secondary" \
-d '{"query": "text"}'
```
**Database Schema**:
- Tables include both `workspace` (legacy) and `tenant_id`, `kb_id` (new)
- Workspace auto-generated as `tenant_id_kb_id` for backward compatibility
- Composite indexes: `(tenant_id, kb_id, id)`
- Unique constraints prevent accidental cross-KB entity conflicts
### Scenario 3: Multiple Tenants (Full Multi-Tenant)
**Goal**: Demonstrate complete multi-tenant isolation with multiple independent organizations.
**Configuration**:
```env
MULTITENANT_MODE=demo
# Pre-configured demo tenants:
# - Tenant 1: "acme-corp"
# ├─ kb-prod (Production knowledge base)
# └─ kb-dev (Development knowledge base)
# - Tenant 2: "techstart"
# ├─ kb-main (Main knowledge base)
# └─ kb-backup (Backup knowledge base)
```
**Behavior**:
- API requires X-Tenant-ID and X-KB-ID on every request
- No "default" fallback - must be explicit
- Complete isolation: acme-corp cannot access techstart data
- Separate resource quotas per tenant
- Independent configurations (LLM models, embedding models)
**Testing Scenarios**:
| Test Case | Description | Expected Result |
|-----------|-------------|-----------------|
| **T3.1** | acme-corp inserts into kb-prod | ✓ Data isolated in acme-corp_kb-prod |
| **T3.2** | techstart inserts into kb-main | ✓ Data isolated in techstart_kb-main |
| **T3.3** | acme-corp queries kb-prod | ✓ Returns only acme-corp_kb-prod data |
| **T3.4** | techstart queries kb-main | ✓ Returns only techstart_kb-main data |
| **T3.5** | acme-corp attempts to query techstart KB | ✗ 403 Forbidden (access denied) |
| **T3.6** | User with acme-corp JWT tries techstart KB | ✗ Permission denied at API layer |
| **T3.7** | Different entity names in different tenants | ✓ Database allows identical names in different tenant_id values |
| **T3.8** | Database NEVER returns cross-tenant data | ✓ Query filters enforce `tenant_id` constraint |
| **T3.9** | Even with DB admin creds, workspace isolation works | ✓ Cannot accidentally query wrong tenant at SQL level |
| **T3.10** | Delete acme-corp entity doesn't affect techstart | ✓ DELETE uses composite key (tenant_id, kb_id, id) |
| **T3.11** | Verify JWT tokens scope to specific tenant | ✓ Token contains tenant_id, prevents cross-tenant access |
| **T3.12** | Resource quotas enforced per tenant | ✓ acme-corp quota limits don't affect techstart |
**Request Examples**:
```bash
# acme-corp accessing kb-prod
curl -X POST http://localhost:8000/api/v1/query \
-H "Authorization: Bearer acme-corp-token" \
-H "X-Tenant-ID: acme-corp" \
-H "X-KB-ID: kb-prod" \
-d '{"query": "revenue"}'
# techstart accessing kb-main
curl -X POST http://localhost:8000/api/v1/query \
-H "Authorization: Bearer techstart-token" \
-H "X-Tenant-ID: techstart" \
-H "X-KB-ID: kb-main" \
-d '{"query": "funding"}'
# acme-corp trying to access techstart data (should fail)
curl -X POST http://localhost:8000/api/v1/query \
-H "Authorization: Bearer acme-corp-token" \
-H "X-Tenant-ID: techstart" \
-H "X-KB-ID: kb-main" \
-d '{"query": "data"}' \
# Response: 403 Forbidden - "User does not have access to tenant"
```
**Database Verification**:
```sql
-- Verify tenant isolation at SQL level
SELECT COUNT(*) FROM lightrag_doc_full
WHERE tenant_id = 'acme-corp' AND kb_id = 'kb-prod'; -- Count acme-corp documents
SELECT COUNT(*) FROM lightrag_doc_full
WHERE tenant_id = 'techstart' AND kb_id = 'kb-main'; -- Count techstart documents
-- Verify no cross-tenant data in single query
SELECT DISTINCT tenant_id, kb_id FROM lightrag_doc_full; -- Should see: acme-corp, techstart only
-- Verify composite indexes exist
\di lightrag_doc_full* -- Should show idx on (tenant_id, kb_id, id)
```
## Implementation Details
### Environment Variable Configuration
**File: `./starter/env.example`** (updated with new options):
```env
# ============================================================================
# TESTING MODE CONFIGURATION
# ============================================================================
# Choose testing mode:
# off = Single-tenant compatibility mode (like main branch)
# on = Multi-tenant mode with single default tenant
# demo = Multi-tenant mode with 2 pre-configured tenants
MULTITENANT_MODE=demo
# For MULTITENANT_MODE=on, create additional KBs
# Format: kb_name,kb_name,kb_name
CREATE_DEFAULT_KB=kb-default,kb-secondary,kb-experimental
# ============================================================================
# TENANT CONFIGURATION (Used when MULTITENANT_MODE != off)
# ============================================================================
DEFAULT_TENANT=default
DEFAULT_KB=default
# Pre-configured demo tenants (for MULTITENANT_MODE=demo)
DEMO_TENANT_1_NAME=acme-corp
DEMO_TENANT_1_KBS=kb-prod,kb-dev
DEMO_TENANT_2_NAME=techstart
DEMO_TENANT_2_KBS=kb-main,kb-backup
# ============================================================================
```
### Docker Compose Modifications
**File: `./starter/docker-compose.yml`** (add initialization script):
```yaml
services:
postgres:
environment:
MULTITENANT_MODE: ${MULTITENANT_MODE:-demo}
volumes:
- ./init-postgres.sql:/docker-entrypoint-initdb.d/01-init.sql:ro
- ./init-demo-tenants.sql:/docker-entrypoint-initdb.d/02-demo-tenants.sql:ro # New
lightrag-api:
environment:
MULTITENANT_MODE: ${MULTITENANT_MODE:-demo}
DEFAULT_TENANT: ${DEFAULT_TENANT:-default}
DEFAULT_KB: ${DEFAULT_KB:-default}
CREATE_DEFAULT_KB: ${CREATE_DEFAULT_KB:-kb-default}
```
### Initialization SQL Scripts
**New File: `./starter/init-demo-tenants.sql`**
Creates the demo tenants and KBs when `MULTITENANT_MODE=demo`:
```sql
-- Only run if MULTITENANT_MODE=demo
-- This is handled via environment variable in docker-entrypoint
INSERT INTO tenants (tenant_id, tenant_name, description, is_active)
VALUES
('acme-corp', 'ACME Corporation', 'Demo tenant 1: Large enterprise', true),
('techstart', 'TechStart Inc', 'Demo tenant 2: Startup', true);
INSERT INTO knowledge_bases (kb_id, tenant_id, kb_name, description, is_active)
VALUES
('kb-prod', 'acme-corp', 'Production KB', 'Live production data', true),
('kb-dev', 'acme-corp', 'Development KB', 'Dev/staging data', true),
('kb-main', 'techstart', 'Main KB', 'Primary knowledge base', true),
('kb-backup', 'techstart', 'Backup KB', 'Backup and archival', true);
```
### Testing Procedures
#### Procedure 1: Run Compatibility Mode Tests
```bash
cd ./starter
# 1. Configure for compatibility mode
cp env.example .env
echo "MULTITENANT_MODE=off" >> .env
echo "WORKSPACE_ISOLATION_TYPE=legacy" >> .env
# 2. Start services
make setup
make up
make init-db
# 3. Run compatibility tests
pytest ../tests/test_backward_compatibility.py -v
# 4. Run manual tests
python3 reproduce_issue.py # Should work without tenant headers
```
**Expected Results**:
- All T1.x test cases pass
- No authorization failures due to missing tenant context
- Database uses workspace namespace only
- Queries return data regardless of tenant headers (or missing headers)
#### Procedure 2: Run Single-Tenant Multi-KB Tests
```bash
cd ./starter
# 1. Configure for single tenant with multiple KBs
cp env.example .env
echo "MULTITENANT_MODE=on" >> .env
echo "DEFAULT_TENANT=tenant-1" >> .env
echo "CREATE_DEFAULT_KB=kb-default,kb-secondary,kb-experimental" >> .env
# 2. Start services
make setup
make up
make init-db
# 3. Run isolation tests
pytest ../tests/test_multi_tenant_backends.py::TestTenantIsolation -v
# 4. Manual test: Verify KB isolation
# Create document in kb-default
curl -X POST http://localhost:8000/api/v1/documents/insert \
-H "X-Tenant-ID: tenant-1" \
-H "X-KB-ID: kb-default" \
-d '{"document": "document in kb-default"}'
# Query should return document
curl -X POST http://localhost:8000/api/v1/query \
-H "X-Tenant-ID: tenant-1" \
-H "X-KB-ID: kb-default"
# Query in different KB should return NOTHING
curl -X POST http://localhost:8000/api/v1/query \
-H "X-Tenant-ID: tenant-1" \
-H "X-KB-ID: kb-secondary"
```
**Expected Results**:
- All T2.x test cases pass
- Data isolated by KB within same tenant
- No cross-KB data leakage at API or database level
- Composite keys work correctly
#### Procedure 3: Run Full Multi-Tenant Tests
```bash
cd ./starter
# 1. Configure for full multi-tenant demo mode (default)
cp env.example .env
echo "MULTITENANT_MODE=demo" >> .env
# 2. Start services
make setup
make up
make init-db
# 3. Run full multi-tenant tests
pytest ../tests/test_multi_tenant_backends.py -v
pytest ../tests/test_tenant_security.py -v
# 4. Manual test: Verify cross-tenant isolation
# Insert as acme-corp
curl -X POST http://localhost:8000/api/v1/documents/insert \
-H "Authorization: Bearer acme-token" \
-H "X-Tenant-ID: acme-corp" \
-H "X-KB-ID: kb-prod" \
-d '{"document": "acme document"}'
# Try to query as techstart (should fail or return empty)
curl -X POST http://localhost:8000/api/v1/query \
-H "Authorization: Bearer acme-token" \
-H "X-Tenant-ID: techstart" \
-H "X-KB-ID: kb-main"
# Expected: 403 Forbidden
# Query as acme should succeed
curl -X POST http://localhost:8000/api/v1/query \
-H "Authorization: Bearer acme-token" \
-H "X-Tenant-ID: acme-corp" \
-H "X-KB-ID: kb-prod"
# Expected: 200 OK with results
# 5. Database verification
make db-shell
# SELECT COUNT(*) FROM lightrag_doc_full WHERE tenant_id='acme-corp';
# SELECT DISTINCT tenant_id FROM lightrag_doc_full;
```
**Expected Results**:
- All T3.x test cases pass
- Cross-tenant access denied (403)
- Complete data isolation at database level
- No authorization bypasses
## Testing Matrix
| Mode | Tenant Headers Required | Default Tenant | Multiple KBs | Multiple Tenants | Test File |
|------|-------------------------|------------------|--------------|------------------|-----------|
| **off** | No | Always default | ❌ Single workspace | ❌ Single tenant | `test_backward_compatibility.py` |
| **on** | Yes | Provided/resolved | ✓ Multiple per tenant | ❌ Single tenant only | `test_multi_tenant_backends.py` |
| **demo** | Yes | None/explicit | ✓ Multiple per tenant | ✓ 2 pre-configured | `test_tenant_security.py` |
## Test Coverage
### Unit Tests
- **test_backward_compatibility.py**: Validates old code paths still work
- **test_multi_tenant_backends.py**: Validates storage layer isolation
- **test_tenant_models.py**: Validates data models
- **test_tenant_security.py**: Validates permission/authorization
### Integration Tests
- **API Layer**: test_tenant_api_routes.py
- **Database**: test_tenant_storage_phase3.py
- **Graph Operations**: test_graph_storage.py
### Manual Verification
- Database schema validation (composite keys, indexes)
- Cross-tenant access attempts (should fail)
- KB isolation verification
- Authorization enforcement
## Consequences
### Positive
1. **Flexible Testing**: Can test backward compatibility and new multi-tenant features
2. **Clear Procedures**: Step-by-step procedures for each testing scenario
3. **Reproducibility**: Environment variables make tests repeatable
4. **Safety**: Explicit mode selection prevents accidental data mixing
5. **Documentation**: Clear understanding of what each mode does
6. **Validation**: Comprehensive test matrix covers all scenarios
### Negative/Tradeoffs
1. **Configuration Complexity**: Three modes add configuration overhead
2. **Initialization Scripts**: Must maintain both legacy and multi-tenant initialization
3. **Testing Duration**: Running all three modes sequentially takes time
4. **Documentation Maintenance**: Must keep mode-specific docs up to date
5. **Docker Image Size**: Includes both legacy and new code paths
## Rollback/Migration
### From Compatibility Mode to Multi-Tenant
```bash
# 1. Back up existing data
make db-backup
# 2. Switch mode
sed -i 's/MULTITENANT_MODE=off/MULTITENANT_MODE=on/' .env
# 3. Migrate database schema
# This requires: DROP old tables, CREATE new tables with tenant columns
# Migration script: scripts/migrate_to_multitenant.sql
# 4. Restart services
make restart
```
### From Multi-Tenant back to Compatibility Mode
```bash
# 1. Back up multi-tenant data
make db-backup
# 2. Switch mode
sed -i 's/MULTITENANT_MODE=on/MULTITENANT_MODE=off/' .env
# 3. Extract single tenant data
# SELECT * FROM lightrag_doc_full WHERE tenant_id='default'
# INTO workspace-based tables
# 4. Restart services
make restart
```
## Verification Checklist
Before considering the ADR complete:
- [ ] `MULTITENANT_MODE=off` works identically to main branch
- [ ] `MULTITENANT_MODE=on` prevents cross-KB data access
- [ ] `MULTITENANT_MODE=demo` prevents cross-tenant data access
- [ ] Environment variable switching is seamless
- [ ] All test cases (T1-T3) pass in their respective modes
- [ ] Database schema matches mode requirements
- [ ] Documentation reflects actual implementation
- [ ] Integration tests run successfully
- [ ] Manual verification procedures validate isolation
- [ ] Authorization failures work correctly (403, 401, etc.)
## References
### Related ADRs
- ADR 001: Multi-Tenant Architecture Overview
- ADR 002: Implementation Strategy
- ADR 003: Data Models and Storage
- ADR 004: API Design
- ADR 005: Security Analysis
### Implementation Files
- `lightrag/models/tenant.py`: TenantContext, Tenant, KnowledgeBase models
- `lightrag/tenant_rag_manager.py`: Per-tenant instance management
- `lightrag/api/dependencies.py`: Tenant context extraction
- `tests/test_backward_compatibility.py`: Legacy compatibility tests
- `tests/test_multi_tenant_backends.py`: Multi-tenant backend tests
- `tests/test_tenant_security.py`: Security validation
### Starter Files
- `starter/docker-compose.yml`: Service orchestration
- `starter/env.example`: Configuration template
- `starter/Makefile`: Testing procedures
- `starter/init-postgres.sql`: Database initialization
## Next Steps
1. **Implement environment variable handling** in docker-entrypoint-initdb.d scripts
2. **Create demo tenant initialization** SQL script (init-demo-tenants.sql)
3. **Update Makefile** with mode-specific test targets
4. **Create test runner script** that runs appropriate tests for each mode
5. **Document mode selection** in README.md
6. **Create CI/CD workflow** to test all three modes automatically
7. **Add health checks** that validate mode-specific expectations
8. **Create migration scripts** for switching between modes
9. **Update all existing ADRs** to reference this testing strategy
10. **Add mode detection** to API startup (warn if wrong mode configuration)
---
**Document Version**: 1.0
**Last Updated**: 2025-11-22
**Author**: Architecture Design Process
**Status**: Proposed - Ready for Implementation Review
**Implementation Notes**:
- Based on actual code examination of feat/multi-tenant branch
- Verified against: tenant.py, tenant_rag_manager.py, dependencies.py, docker-compose.yml
- Tested scenarios aligned with actual test files in tests/ directory
- Configuration options match env.example and existing environment setup