LightRAG/docs/archives/adr/008-multi-tenant-testing-strategy.md
Raphael MANSUY 2b292d4924
2025-12-04 18:09:15 +08:00

ADR 008: Multi-Tenant Testing Strategy for ./starter Environment

Status: Proposed

Context

The ./starter directory provides a local development and testing environment for the LightRAG multi-tenant implementation. This environment must support two distinct operational modes:

  1. No Multi-Tenant Mode (Compatibility Mode): Behaves like the original single-workspace LightRAG, maintaining backward compatibility with the main branch
  2. Multi-Tenant Mode (Production Mode): Demonstrates full multi-tenant isolation with multiple tenants and knowledge bases

The testing strategy must ensure both modes work correctly and that switching between them is seamless and reproducible.

Problem Statement

Current state (as of November 2025):

  • The multi-tenant architecture is implemented in feat/multi-tenant branch
  • The ./starter directory uses Docker Compose with PostgreSQL, Redis, LightRAG API, and WebUI
  • The environment supports automatic tenant/KB resolution with "default" tenant and KB names
  • No clear testing protocol exists for validating both single-tenant and multi-tenant scenarios
  • Documentation doesn't reflect the actual implementation details

Key Challenges

  1. Backward Compatibility: Must verify that old single-tenant code paths still work
  2. Switching Modes: Need clear procedures to enable/disable multi-tenancy for testing
  3. Data Isolation: Must verify tenant/KB isolation at both API and database levels
  4. Reproducibility: Tests must be idempotent and produce consistent results
  5. Environment Configuration: Starter must clearly document all configuration options for each mode

Decision

Multi-Mode Testing Architecture

We implement a configurable testing environment in ./starter that can operate in two modes via environment variables:

Mode Selection:
├─ MULTITENANT_MODE=off    → Single-tenant compatibility mode (like main branch)
├─ MULTITENANT_MODE=on     → Multi-tenant mode with a single default tenant
└─ MULTITENANT_MODE=demo   → Multi-tenant demo mode (2 pre-configured tenants; default in env.example)
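
As an illustration of how the mode switch could be read and validated, here is a minimal sketch. `TenantMode` and `resolve_mode` are hypothetical names, not part of the LightRAG code base; the `demo` fallback mirrors the `${MULTITENANT_MODE:-demo}` default used in the Docker Compose file in this ADR.

```python
import os
from enum import Enum


class TenantMode(Enum):
    """Hypothetical enum for the three modes described in this ADR."""
    OFF = "off"    # single-tenant compatibility mode
    ON = "on"      # multi-tenant mode, single default tenant
    DEMO = "demo"  # multi-tenant mode, two pre-configured tenants


def resolve_mode(env=None, default="demo"):
    """Read MULTITENANT_MODE from a mapping (os.environ by default) and validate it."""
    env = os.environ if env is None else env
    raw = env.get("MULTITENANT_MODE", default).strip().lower()
    try:
        return TenantMode(raw)
    except ValueError:
        allowed = ", ".join(m.value for m in TenantMode)
        raise ValueError(
            f"MULTITENANT_MODE={raw!r} is invalid; expected one of: {allowed}"
        ) from None
```

Failing fast on an unknown value avoids silently starting in the wrong mode, which is the risk the explicit mode selection is meant to remove.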

Scenario 1: No Multi-Tenant Mode (Compatibility Mode)

Goal: Verify that LightRAG works exactly as it did before multi-tenancy was added.

Configuration:

MULTITENANT_MODE=off
DEFAULT_TENANT=default
DEFAULT_KB=default
WORKSPACE_ISOLATION_TYPE=legacy

Behavior:

  • API endpoints work WITHOUT requiring X-Tenant-ID or X-KB-ID headers
  • All operations use "default" tenant and "default" KB internally
  • Storage uses the legacy workspace namespace {tenant_id}_{kb_id}, which resolves to default_default
  • No tenant context validation errors
  • Fully backward compatible with main branch code
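
The behavior above can be sketched as a small resolver. This is an assumption-laden illustration: `Context` and `resolve_compat_context` are hypothetical stand-ins (not the real `TenantContext` from `lightrag/models/tenant.py`), and the sketch assumes that tenant headers are simply ignored in compatibility mode.

```python
from typing import Mapping, NamedTuple


class Context(NamedTuple):
    """Hypothetical stand-in for a resolved tenant/KB pair."""
    tenant_id: str
    kb_id: str

    @property
    def workspace(self) -> str:
        # Legacy namespace: tenant and KB joined with an underscore.
        return f"{self.tenant_id}_{self.kb_id}"


def resolve_compat_context(
    headers: Mapping[str, str],
    default_tenant: str = "default",
    default_kb: str = "default",
) -> Context:
    """Compatibility mode: always use the defaults, whatever headers are sent.

    The headers argument is accepted only to show that it is not consulted.
    """
    return Context(default_tenant, default_kb)
```

With or without `X-Tenant-ID` / `X-KB-ID`, the resolved workspace stays `default_default`, which is what test cases T1.1 to T1.8 rely on.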

Testing Scenarios:

| Test Case | Description | Expected Result |
|-----------|-------------|-----------------|
| T1.1 | Upload document without tenant headers | ✓ Document stored in default workspace |
| T1.2 | Query without tenant headers | ✓ Results returned from default workspace |
| T1.3 | Create knowledge graph without tenant headers | ✓ Graph entities stored in default workspace |
| T1.4 | Access WebUI without specifying tenant | ✓ UI works with implicit default tenant |
| T1.5 | Database contains no tenant_id/kb_id fields | ✓ Tables use legacy workspace field only |
| T1.6 | Mix of old client and new client requests | ✓ Both work without conflicts |
| T1.7 | Verify authorization doesn't require tenant access | ✓ Auth tokens work with role only (no tenant scope) |
| T1.8 | All stored data is in default_default namespace | ✓ Data isolation via workspace namespace only |

Backward Compatibility Verification:

# Python (inside an async function): should work identically to main branch
rag = LightRAG(working_dir="./rag_data")  # No tenant_id/kb_id
await rag.ainsert("document text")  # async API; insert() is the synchronous wrapper
results = await rag.aquery("query text")

# Shell: should NOT require X-Tenant-ID header
curl -X POST http://localhost:8000/api/v1/documents/insert \
  -H "Authorization: Bearer token" \
  -H "Content-Type: application/json" \
  -d '{"document": "text"}'

Database Schema:

  • Uses legacy tables: lightrag_doc_full, lightrag_doc_chunks, etc.
  • workspace field acts as tenant/KB namespace
  • No tenant_id, kb_id columns added
  • Composite indexes: (workspace, id) not (tenant_id, kb_id, id)

Scenario 2: Single Tenant with Multiple KBs (Controlled Multi-Tenant)

Goal: Test multi-tenant architecture with a single tenant serving multiple knowledge bases.

Configuration:

MULTITENANT_MODE=on
DEFAULT_TENANT=tenant-1
DEFAULT_KB=kb-default
CREATE_DEFAULT_KB=kb-default,kb-secondary,kb-experimental

Behavior:

  • API expects X-Tenant-ID; if the header is missing, the tenant resolves to tenant-1 (DEFAULT_TENANT)
  • API requires X-KB-ID (can be any KB in the tenant)
  • Different KBs can coexist for the same tenant
  • Complete data isolation: tenant-1:kb-default:entity1 vs tenant-1:kb-secondary:entity1
  • Same tenant has access to all its KBs

Testing Scenarios:

| Test Case | Description | Expected Result |
|-----------|-------------|-----------------|
| T2.1 | Insert into kb-default | ✓ Document stored in tenant-1_kb-default namespace |
| T2.2 | Insert into kb-secondary | ✓ Document stored in tenant-1_kb-secondary namespace |
| T2.3 | Query from kb-default returns kb-default data only | ✓ No cross-KB data leakage |
| T2.4 | Query from kb-secondary returns kb-secondary data only | ✓ No cross-KB data leakage |
| T2.5 | Switch KB in same request sequence | ✓ Isolation maintained at database level |
| T2.6 | Create duplicate entity names in different KBs | ✓ Database allows duplicates (namespace isolated) |
| T2.7 | Generate graph for kb-default | ✓ Graph contains only kb-default entities |
| T2.8 | Database contains tenant_id and kb_id columns | ✓ Composite keys prevent collisions |

Query Examples:

# Query KB-1
curl -X POST http://localhost:8000/api/v1/query \
  -H "Authorization: Bearer token" \
  -H "X-Tenant-ID: tenant-1" \
  -H "X-KB-ID: kb-default" \
  -d '{"query": "text"}'

# Query KB-2 (same tenant)
curl -X POST http://localhost:8000/api/v1/query \
  -H "Authorization: Bearer token" \
  -H "X-Tenant-ID: tenant-1" \
  -H "X-KB-ID: kb-secondary" \
  -d '{"query": "text"}'

Database Schema:

  • Tables include both workspace (legacy) and tenant_id, kb_id (new)
  • Workspace auto-generated as {tenant_id}_{kb_id} for backward compatibility
  • Composite indexes: (tenant_id, kb_id, id)
  • Unique constraints prevent accidental cross-KB entity conflicts
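
To make the composite-key idea concrete, the following sketch uses an in-memory SQLite database as a stand-in for PostgreSQL. The `entities` table and its columns are hypothetical and only illustrate the `(tenant_id, kb_id, id)` key; the real LightRAG schema may differ.

```python
import sqlite3

# In-memory SQLite stands in for PostgreSQL; the table is illustrative only.
conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE entities (
        tenant_id TEXT NOT NULL,
        kb_id     TEXT NOT NULL,
        id        TEXT NOT NULL,
        payload   TEXT,
        PRIMARY KEY (tenant_id, kb_id, id)
    )
    """
)

# The same entity id can live in two KBs of the same tenant without colliding.
conn.executemany(
    "INSERT INTO entities VALUES (?, ?, ?, ?)",
    [
        ("tenant-1", "kb-default", "entity1", "from kb-default"),
        ("tenant-1", "kb-secondary", "entity1", "from kb-secondary"),
    ],
)


def fetch(tenant_id, kb_id):
    """Return (id, payload) rows scoped to exactly one tenant/KB pair."""
    cur = conn.execute(
        "SELECT id, payload FROM entities WHERE tenant_id = ? AND kb_id = ?",
        (tenant_id, kb_id),
    )
    return cur.fetchall()


def insert_duplicate():
    """Try to re-insert an existing key; return True only if the DB accepted it."""
    try:
        conn.execute(
            "INSERT INTO entities VALUES (?, ?, ?, ?)",
            ("tenant-1", "kb-default", "entity1", "duplicate"),
        )
    except sqlite3.IntegrityError:
        return False
    return True
```

Queries scoped by both columns see only their own KB, while a duplicate inside the same KB is rejected by the primary key, which corresponds to T2.6 and T2.8.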

Scenario 3: Multiple Tenants (Full Multi-Tenant)

Goal: Demonstrate complete multi-tenant isolation with multiple independent organizations.

Configuration:

MULTITENANT_MODE=demo
# Pre-configured demo tenants:
# - Tenant 1: "acme-corp"
#   ├─ kb-prod (Production knowledge base)
#   └─ kb-dev (Development knowledge base)
# - Tenant 2: "techstart"
#   ├─ kb-main (Main knowledge base)
#   └─ kb-backup (Backup knowledge base)

Behavior:

  • API requires X-Tenant-ID and X-KB-ID on every request
  • No "default" fallback - must be explicit
  • Complete isolation: acme-corp cannot access techstart data
  • Separate resource quotas per tenant
  • Independent configurations (LLM models, embedding models)

Testing Scenarios:

| Test Case | Description | Expected Result |
|-----------|-------------|-----------------|
| T3.1 | acme-corp inserts into kb-prod | ✓ Data isolated in acme-corp_kb-prod |
| T3.2 | techstart inserts into kb-main | ✓ Data isolated in techstart_kb-main |
| T3.3 | acme-corp queries kb-prod | ✓ Returns only acme-corp_kb-prod data |
| T3.4 | techstart queries kb-main | ✓ Returns only techstart_kb-main data |
| T3.5 | acme-corp attempts to query techstart KB | ✗ 403 Forbidden (access denied) |
| T3.6 | User with acme-corp JWT tries techstart KB | ✗ Permission denied at API layer |
| T3.7 | Identical entity names in different tenants | ✓ Database allows identical names under different tenant_id values |
| T3.8 | Database NEVER returns cross-tenant data | ✓ Query filters enforce tenant_id constraint |
| T3.9 | Even with DB admin creds, workspace isolation works | ✓ Cannot accidentally query wrong tenant at SQL level |
| T3.10 | Delete acme-corp entity doesn't affect techstart | ✓ DELETE uses composite key (tenant_id, kb_id, id) |
| T3.11 | Verify JWT tokens scope to specific tenant | ✓ Token contains tenant_id, prevents cross-tenant access |
| T3.12 | Resource quotas enforced per tenant | ✓ acme-corp quota limits don't affect techstart |

Request Examples:

# acme-corp accessing kb-prod
curl -X POST http://localhost:8000/api/v1/query \
  -H "Authorization: Bearer acme-corp-token" \
  -H "X-Tenant-ID: acme-corp" \
  -H "X-KB-ID: kb-prod" \
  -d '{"query": "revenue"}'

# techstart accessing kb-main
curl -X POST http://localhost:8000/api/v1/query \
  -H "Authorization: Bearer techstart-token" \
  -H "X-Tenant-ID: techstart" \
  -H "X-KB-ID: kb-main" \
  -d '{"query": "funding"}'

# acme-corp trying to access techstart data (should fail)
curl -X POST http://localhost:8000/api/v1/query \
  -H "Authorization: Bearer acme-corp-token" \
  -H "X-Tenant-ID: techstart" \
  -H "X-KB-ID: kb-main" \
  -d '{"query": "data"}'
# Response: 403 Forbidden - "User does not have access to tenant"

Database Verification:

-- Verify tenant isolation at SQL level
SELECT COUNT(*) FROM lightrag_doc_full 
WHERE tenant_id = 'acme-corp' AND kb_id = 'kb-prod';  -- Count acme-corp documents

SELECT COUNT(*) FROM lightrag_doc_full 
WHERE tenant_id = 'techstart' AND kb_id = 'kb-main';  -- Count techstart documents

-- Verify no cross-tenant data in single query
SELECT DISTINCT tenant_id, kb_id FROM lightrag_doc_full;  -- Should see: acme-corp, techstart only

-- Verify composite indexes exist; expect an index on (tenant_id, kb_id, id).
-- The comment stays on its own line because psql treats trailing text as arguments to \di.
\di lightrag_doc_full*
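
The output of the `SELECT DISTINCT tenant_id, kb_id` check can also be compared against the demo configuration automatically. This is a minimal sketch; `unexpected_pairs` is a hypothetical helper and the expected set is copied from the demo tenants listed in this ADR.

```python
from typing import FrozenSet, Iterable, Set, Tuple

Pair = Tuple[str, str]

# Tenant/KB pairs that demo mode is expected to contain (from this ADR).
EXPECTED_DEMO_PAIRS: FrozenSet[Pair] = frozenset(
    {
        ("acme-corp", "kb-prod"),
        ("acme-corp", "kb-dev"),
        ("techstart", "kb-main"),
        ("techstart", "kb-backup"),
    }
)


def unexpected_pairs(
    rows: Iterable[Pair],
    expected: FrozenSet[Pair] = EXPECTED_DEMO_PAIRS,
) -> Set[Pair]:
    """Return (tenant_id, kb_id) pairs found in the table but not configured."""
    return set(rows) - set(expected)
```

An empty result means every stored row belongs to a configured tenant/KB pair; anything else points at data written under an unexpected namespace.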

Implementation Details

Environment Variable Configuration

File: ./starter/env.example (updated with new options):

# ============================================================================
# TESTING MODE CONFIGURATION
# ============================================================================

# Choose testing mode:
#   off   = Single-tenant compatibility mode (like main branch)
#   on    = Multi-tenant mode with single default tenant
#   demo  = Multi-tenant mode with 2 pre-configured tenants
MULTITENANT_MODE=demo

# For MULTITENANT_MODE=on, create additional KBs
# Format: kb_name,kb_name,kb_name
CREATE_DEFAULT_KB=kb-default,kb-secondary,kb-experimental

# ============================================================================
# TENANT CONFIGURATION (Used when MULTITENANT_MODE != off)
# ============================================================================

DEFAULT_TENANT=default
DEFAULT_KB=default

# Pre-configured demo tenants (for MULTITENANT_MODE=demo)
DEMO_TENANT_1_NAME=acme-corp
DEMO_TENANT_1_KBS=kb-prod,kb-dev

DEMO_TENANT_2_NAME=techstart
DEMO_TENANT_2_KBS=kb-main,kb-backup

# ============================================================================
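
For illustration, the comma-separated KB lists and the numbered `DEMO_TENANT_n_*` variables above could be parsed as follows. `split_csv` and `demo_tenants` are hypothetical helpers; how the actual starter scripts consume these variables is not specified in this ADR.

```python
import re
from typing import Dict, List, Mapping

_NAME_KEY = re.compile(r"^DEMO_TENANT_(\d+)_NAME$")


def split_csv(value: str) -> List[str]:
    """Split 'kb-a, kb-b,,' into ['kb-a', 'kb-b'], dropping empty items."""
    return [item.strip() for item in value.split(",") if item.strip()]


def demo_tenants(env: Mapping[str, str]) -> Dict[str, List[str]]:
    """Map each demo tenant name to its list of KB names, in numeric order."""
    numbered = []
    for key, name in env.items():
        match = _NAME_KEY.match(key)
        if match:
            numbered.append((int(match.group(1)), name))
    tenants: Dict[str, List[str]] = {}
    for n, name in sorted(numbered):
        tenants[name] = split_csv(env.get(f"DEMO_TENANT_{n}_KBS", ""))
    return tenants
```

With the values from `env.example`, `demo_tenants` yields `acme-corp` with `kb-prod` and `kb-dev`, and `techstart` with `kb-main` and `kb-backup`.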

Docker Compose Modifications

File: ./starter/docker-compose.yml (add initialization script):

services:
  postgres:
    environment:
      MULTITENANT_MODE: ${MULTITENANT_MODE:-demo}
    volumes:
      - ./init-postgres.sql:/docker-entrypoint-initdb.d/01-init.sql:ro
      - ./init-demo-tenants.sql:/docker-entrypoint-initdb.d/02-demo-tenants.sql:ro  # New

  lightrag-api:
    environment:
      MULTITENANT_MODE: ${MULTITENANT_MODE:-demo}
      DEFAULT_TENANT: ${DEFAULT_TENANT:-default}
      DEFAULT_KB: ${DEFAULT_KB:-default}
      CREATE_DEFAULT_KB: ${CREATE_DEFAULT_KB:-kb-default}

Initialization SQL Scripts

New File: ./starter/init-demo-tenants.sql

Creates the demo tenants and KBs when MULTITENANT_MODE=demo:

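-- Intended to run only when MULTITENANT_MODE=demo.
-- Note: the stock postgres image executes every script in /docker-entrypoint-initdb.d
-- unconditionally, so the mode check must be added separately (see Next Steps, item 1).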

INSERT INTO tenants (tenant_id, tenant_name, description, is_active)
VALUES 
  ('acme-corp', 'ACME Corporation', 'Demo tenant 1: Large enterprise', true),
  ('techstart', 'TechStart Inc', 'Demo tenant 2: Startup', true);

INSERT INTO knowledge_bases (kb_id, tenant_id, kb_name, description, is_active)
VALUES 
  ('kb-prod', 'acme-corp', 'Production KB', 'Live production data', true),
  ('kb-dev', 'acme-corp', 'Development KB', 'Dev/staging data', true),
  ('kb-main', 'techstart', 'Main KB', 'Primary knowledge base', true),
  ('kb-backup', 'techstart', 'Backup KB', 'Backup and archival', true);

Testing Procedures

Procedure 1: Run Compatibility Mode Tests

cd ./starter

# 1. Configure for compatibility mode
cp env.example .env
echo "MULTITENANT_MODE=off" >> .env
echo "WORKSPACE_ISOLATION_TYPE=legacy" >> .env

# 2. Start services
make setup
make up
make init-db

# 3. Run compatibility tests
pytest ../tests/test_backward_compatibility.py -v

# 4. Run manual tests
python3 reproduce_issue.py  # Should work without tenant headers
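
Because `env.example` already sets `MULTITENANT_MODE`, appending with `echo ... >> .env` leaves two assignments for the same key, and which one wins depends on the tool reading the file. A minimal sketch of an in-place alternative follows; `set_env_value` is a hypothetical helper, not part of the starter Makefile.

```python
def set_env_value(text: str, key: str, value: str) -> str:
    """Return .env content with exactly one `key=value` assignment.

    The first existing assignment is replaced in place, later duplicates are
    dropped, and the key is appended if it was not present. Commented-out
    lines are left untouched.
    """
    prefix = f"{key}="
    out = []
    replaced = False
    for line in text.splitlines():
        if line.strip().startswith(prefix):
            if not replaced:
                out.append(f"{key}={value}")
                replaced = True
            continue  # skip duplicate assignments
        out.append(line)
    if not replaced:
        out.append(f"{key}={value}")
    return "\n".join(out) + "\n"
```

A possible usage is to read `.env`, apply `set_env_value(content, "MULTITENANT_MODE", "off")`, and write the result back before running `make up`.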

Expected Results:

  • All T1.x test cases pass
  • No authorization failures due to missing tenant context
  • Database uses workspace namespace only
  • Queries return data regardless of tenant headers (or missing headers)

Procedure 2: Run Single-Tenant Multi-KB Tests

cd ./starter

# 1. Configure for single tenant with multiple KBs
cp env.example .env
echo "MULTITENANT_MODE=on" >> .env
echo "DEFAULT_TENANT=tenant-1" >> .env
echo "CREATE_DEFAULT_KB=kb-default,kb-secondary,kb-experimental" >> .env

# 2. Start services
make setup
make up
make init-db

# 3. Run isolation tests
pytest ../tests/test_multi_tenant_backends.py::TestTenantIsolation -v

# 4. Manual test: Verify KB isolation
# Create document in kb-default
curl -X POST http://localhost:8000/api/v1/documents/insert \
  -H "X-Tenant-ID: tenant-1" \
  -H "X-KB-ID: kb-default" \
  -d '{"document": "document in kb-default"}'

# Query kb-default: should return the document
curl -X POST http://localhost:8000/api/v1/query \
  -H "X-Tenant-ID: tenant-1" \
  -H "X-KB-ID: kb-default" \
  -d '{"query": "document in kb-default"}'

# Same query against a different KB: should return NOTHING
curl -X POST http://localhost:8000/api/v1/query \
  -H "X-Tenant-ID: tenant-1" \
  -H "X-KB-ID: kb-secondary" \
  -d '{"query": "document in kb-default"}'

Expected Results:

  • All T2.x test cases pass
  • Data isolated by KB within same tenant
  • No cross-KB data leakage at API or database level
  • Composite keys work correctly

Procedure 3: Run Full Multi-Tenant Tests

cd ./starter

# 1. Configure for full multi-tenant demo mode (default)
cp env.example .env
echo "MULTITENANT_MODE=demo" >> .env

# 2. Start services
make setup
make up
make init-db

# 3. Run full multi-tenant tests
pytest ../tests/test_multi_tenant_backends.py -v
pytest ../tests/test_tenant_security.py -v

# 4. Manual test: Verify cross-tenant isolation
# Insert as acme-corp
curl -X POST http://localhost:8000/api/v1/documents/insert \
  -H "Authorization: Bearer acme-token" \
  -H "X-Tenant-ID: acme-corp" \
  -H "X-KB-ID: kb-prod" \
  -d '{"document": "acme document"}'

# Try to query techstart data with the acme-corp token (should be rejected)
curl -X POST http://localhost:8000/api/v1/query \
  -H "Authorization: Bearer acme-token" \
  -H "X-Tenant-ID: techstart" \
  -H "X-KB-ID: kb-main" \
  -d '{"query": "acme document"}'
# Expected: 403 Forbidden

# Query as acme-corp should succeed
curl -X POST http://localhost:8000/api/v1/query \
  -H "Authorization: Bearer acme-token" \
  -H "X-Tenant-ID: acme-corp" \
  -H "X-KB-ID: kb-prod" \
  -d '{"query": "acme document"}'
# Expected: 200 OK with results

# 5. Database verification
make db-shell
# SELECT COUNT(*) FROM lightrag_doc_full WHERE tenant_id='acme-corp';
# SELECT DISTINCT tenant_id FROM lightrag_doc_full;

Expected Results:

  • All T3.x test cases pass
  • Cross-tenant access denied (403)
  • Complete data isolation at database level
  • No authorization bypasses

Testing Matrix

| Mode | Tenant Headers Required | Default Tenant | Multiple KBs | Multiple Tenants | Test File |
|------|-------------------------|----------------|--------------|------------------|-----------|
| off | No | Always default | Single workspace | Single tenant | test_backward_compatibility.py |
| on | Yes | Provided/resolved | ✓ Multiple per tenant | Single tenant only | test_multi_tenant_backends.py |
| demo | Yes | None/explicit | ✓ Multiple per tenant | ✓ 2 pre-configured | test_tenant_security.py |
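
The matrix maps naturally onto a small test-runner helper (see Next Steps, item 4). The sketch below is hypothetical; the pytest targets are taken from the three procedures in this ADR and are relative to `./starter`.

```python
from typing import Dict, List

# pytest targets per mode, as used in Procedures 1 to 3 of this ADR.
TESTS_BY_MODE: Dict[str, List[str]] = {
    "off": ["../tests/test_backward_compatibility.py"],
    "on": ["../tests/test_multi_tenant_backends.py::TestTenantIsolation"],
    "demo": [
        "../tests/test_multi_tenant_backends.py",
        "../tests/test_tenant_security.py",
    ],
}


def pytest_command(mode: str) -> List[str]:
    """Build the pytest argument list for a given MULTITENANT_MODE value."""
    try:
        targets = TESTS_BY_MODE[mode]
    except KeyError:
        raise ValueError(f"unknown mode: {mode!r}") from None
    return ["pytest", *targets, "-v"]
```

The returned list can be passed to `subprocess.run(pytest_command(mode), check=True)` from the `./starter` directory once the services for that mode are up.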

Test Coverage

Unit Tests

  • test_backward_compatibility.py: Validates old code paths still work
  • test_multi_tenant_backends.py: Validates storage layer isolation
  • test_tenant_models.py: Validates data models
  • test_tenant_security.py: Validates permission/authorization

Integration Tests

  • API Layer: test_tenant_api_routes.py
  • Database: test_tenant_storage_phase3.py
  • Graph Operations: test_graph_storage.py

Manual Verification

  • Database schema validation (composite keys, indexes)
  • Cross-tenant access attempts (should fail)
  • KB isolation verification
  • Authorization enforcement

Consequences

Positive

  1. Flexible Testing: Can test backward compatibility and new multi-tenant features
  2. Clear Procedures: Step-by-step procedures for each testing scenario
  3. Reproducibility: Environment variables make tests repeatable
  4. Safety: Explicit mode selection prevents accidental data mixing
  5. Documentation: Clear understanding of what each mode does
  6. Validation: Comprehensive test matrix covers all scenarios

Negative/Tradeoffs

  1. Configuration Complexity: Three modes add configuration overhead
  2. Initialization Scripts: Must maintain both legacy and multi-tenant initialization
  3. Testing Duration: Running all three modes sequentially takes time
  4. Documentation Maintenance: Must keep mode-specific docs up to date
  5. Docker Image Size: Includes both legacy and new code paths

Rollback/Migration

From Compatibility Mode to Multi-Tenant

# 1. Back up existing data
make db-backup

# 2. Switch mode
sed -i 's/MULTITENANT_MODE=off/MULTITENANT_MODE=on/' .env

# 3. Migrate database schema
# This requires: DROP old tables, CREATE new tables with tenant columns
# Migration script: scripts/migrate_to_multitenant.sql

# 4. Restart services
make restart

From Multi-Tenant back to Compatibility Mode

# 1. Back up multi-tenant data
make db-backup

# 2. Switch mode
sed -i 's/MULTITENANT_MODE=on/MULTITENANT_MODE=off/' .env

# 3. Extract single-tenant data into the workspace-based tables
# (pseudocode: copy the rows of lightrag_doc_full WHERE tenant_id='default')

# 4. Restart services
make restart

Verification Checklist

Before considering the ADR complete:

  • MULTITENANT_MODE=off works identically to main branch
  • MULTITENANT_MODE=on prevents cross-KB data access
  • MULTITENANT_MODE=demo prevents cross-tenant data access
  • Environment variable switching is seamless
  • All test cases (T1-T3) pass in their respective modes
  • Database schema matches mode requirements
  • Documentation reflects actual implementation
  • Integration tests run successfully
  • Manual verification procedures validate isolation
  • Authorization failures return the expected status codes (401, 403, etc.)

References

  • ADR 001: Multi-Tenant Architecture Overview
  • ADR 002: Implementation Strategy
  • ADR 003: Data Models and Storage
  • ADR 004: API Design
  • ADR 005: Security Analysis

Implementation Files

  • lightrag/models/tenant.py: TenantContext, Tenant, KnowledgeBase models
  • lightrag/tenant_rag_manager.py: Per-tenant instance management
  • lightrag/api/dependencies.py: Tenant context extraction
  • tests/test_backward_compatibility.py: Legacy compatibility tests
  • tests/test_multi_tenant_backends.py: Multi-tenant backend tests
  • tests/test_tenant_security.py: Security validation

Starter Files

  • starter/docker-compose.yml: Service orchestration
  • starter/env.example: Configuration template
  • starter/Makefile: Testing procedures
  • starter/init-postgres.sql: Database initialization

Next Steps

  1. Implement environment variable handling in docker-entrypoint-initdb.d scripts
  2. Create demo tenant initialization SQL script (init-demo-tenants.sql)
  3. Update Makefile with mode-specific test targets
  4. Create test runner script that runs appropriate tests for each mode
  5. Document mode selection in README.md
  6. Create CI/CD workflow to test all three modes automatically
  7. Add health checks that validate mode-specific expectations
  8. Create migration scripts for switching between modes
  9. Update all existing ADRs to reference this testing strategy
  10. Add mode detection to API startup (warn if wrong mode configuration)
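
As a starting point for item 10, a startup check could compare the selected mode with the other variables from `env.example`. This is only a sketch under stated assumptions: `config_warnings` is a hypothetical function, and the specific rules (for example, expecting `WORKSPACE_ISOLATION_TYPE=legacy` when the mode is `off`) are inferred from the scenario configurations in this ADR rather than from the implementation.

```python
from typing import List, Mapping

VALID_MODES = ("off", "on", "demo")


def config_warnings(env: Mapping[str, str]) -> List[str]:
    """Return human-readable warnings for mode/config mismatches."""
    mode = env.get("MULTITENANT_MODE", "demo")
    if mode not in VALID_MODES:
        return [f"MULTITENANT_MODE={mode!r} is not one of {VALID_MODES}"]

    warnings: List[str] = []
    if mode == "off":
        # Compatibility mode always uses the "default" tenant and KB.
        for key in ("DEFAULT_TENANT", "DEFAULT_KB"):
            value = env.get(key, "default")
            if value != "default":
                warnings.append(f"{key}={value!r} conflicts with MULTITENANT_MODE=off")
        if env.get("WORKSPACE_ISOLATION_TYPE") != "legacy":
            warnings.append("WORKSPACE_ISOLATION_TYPE should be 'legacy' when MULTITENANT_MODE=off")
    elif mode == "demo":
        # Demo mode relies on the two pre-configured tenants.
        for n in (1, 2):
            if not env.get(f"DEMO_TENANT_{n}_NAME"):
                warnings.append(f"DEMO_TENANT_{n}_NAME is not set but MULTITENANT_MODE=demo")
    return warnings
```

Logging these warnings at API startup, rather than raising, keeps the check non-blocking while still surfacing a misconfigured environment.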

Document Version: 1.0
Last Updated: 2025-11-22
Author: Architecture Design Process
Status: Proposed - Ready for Implementation Review

Implementation Notes:

  • Based on actual code examination of feat/multi-tenant branch
  • Verified against: tenant.py, tenant_rag_manager.py, dependencies.py, docker-compose.yml
  • Tested scenarios aligned with actual test files in tests/ directory
  • Configuration options match env.example and existing environment setup