# Multi-Tenant Visual Reference

> Quick reference guide with a visual-first approach to multi-tenant concepts

**Last Updated**: November 20, 2025
**Status**: Production Ready
**Purpose**: Quick lookup for diagrams, patterns, and implementation checklists

---

## Color Scheme & Design

### Professional Pastel Palette

The documentation uses 5 pastel colors selected for accessibility. Each has a light pastel background, a bold dark accent, and a text color:

| Color | Use Case | Pastel (Light) | Bold (Dark) | Text |
|-------|----------|----------------|-------------|------|
| Teal | Storage/Data | #E0F2F1 | #00796B | #004D40 |
| Purple | Tenants/Organization | #F3E5F5 | #6A1B9A | #38006B |
| Green | Success/Deployment | #E8F5E9 | #2E7D32 | #1B5E20 |
| Orange | Vectors/Performance | #FFF3E0 | #E65100 | #BF360C |
| Red | Security/Warnings | #FFEBEE | #C62828 | #C62828 |

**Design Philosophy**:
- Pastel backgrounds reduce eye strain
- Bold accent colors provide contrast
- Designed for colorblind accessibility
- Professional yet pleasant appearance

---

## System Architecture Diagram

```
┌────────────────────────────────────────────────────────────────┐
│                  LightRAG Multi-Tenant System                  │
├────────────────────────────────────────────────────────────────┤
│                                                                │
│  ┌─────────────────────────┐      ┌──────────────────────────┐ │
│  │   Client Applications   │      │  API Gateway/Middleware  │ │
│  │                         │      │ (Extract Tenant Context) │ │
│  │  - Web App              │─────>│  - tenant_id             │ │
│  │  - Mobile App           │      │  - kb_id                 │ │
│  │  - CLI Tools            │      │                          │ │
│  │  - Batch Jobs           │      │  - Validate Access       │ │
│  └─────────────────────────┘      │  - Log Operations        │ │
│                                   └──────────┬───────────────┘ │
│                                              │                 │
│                                              ▼                 │
│                                   ┌──────────────────────┐     │
│                                   │    LightRAG Core     │     │
│                                   │    (Tenant-Aware)    │     │
│                                   │                      │     │
│                                   │  - Query Builder     │     │
│                                   │  - Filter Generator  │     │
│                                   │  - Response Handler  │     │
│                                   └──────────┬───────────┘     │
│                                              │                 │
│             ┌────────────────────────────────┼──────────┐      │
│             │                                │          │      │
│             ▼                                ▼          ▼      │
│  ┌──────────────────────┐  ┌─────────────────────┐  ┌───────┐  │
│  │  Relational DB       │  │  Document DB        │  │ KV    │  │
│  │  (PostgreSQL)        │  │  (MongoDB)          │  │ Store │  │
│  │                      │  │                     │  │(Redis)│  │
│  │  Rows by:            │  │  Docs by:           │  │ Keys: │  │
│  │  (tenant, kb, id)    │  │  {tenant,kb,...}    │  │t:k:key│  │
│  │                      │  │                     │  │       │  │
│  └──────────────────────┘  └─────────────────────┘  └───────┘  │
│             │                         │                 │      │
│             │                         │                 │      │
│  ┌─────────────────────────────────────────────────────────┐   │
│  │                 Vector DBs & Graph DBs                  │   │
│  │  ┌──────────────┐  ┌──────────────┐  ┌──────────────┐   │   │
│  │  │  Qdrant      │  │  Neo4j       │  │  NetworkX    │   │   │
│  │  │  Metadata    │  │  Node Props  │  │  Subgraph    │   │   │
│  │  │  Filter      │  │ WHERE clause │  │  Extract     │   │   │
│  │  └──────────────┘  └──────────────┘  └──────────────┘   │   │
│  │                                                         │   │
│  │     All scoped to (tenant_id, kb_id) automatically      │   │
│  └─────────────────────────────────────────────────────────┘   │
│                                                                │
│  Core Principle: NO tenant context escapes storage layer       │
│                                                                │
└────────────────────────────────────────────────────────────────┘
```
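
The gateway step in the diagram (extract `tenant_id` and `kb_id`, then reject requests that lack them) can be sketched as a small function. This is a minimal sketch, not LightRAG's actual middleware: the `TenantContext` dataclass, `MissingTenantContext`, and `extract_tenant_context` are hypothetical names, and the `tenant-id` / `kb-id` header names are assumed from the Query Execution Flow example in this guide.

```python
from dataclasses import dataclass
from typing import Mapping


@dataclass(frozen=True)
class TenantContext:
    """Hypothetical immutable (tenant, kb) pair passed down to the storage layer."""
    tenant_id: str
    kb_id: str


class MissingTenantContext(ValueError):
    """Raised when a request does not carry both identifiers."""


def extract_tenant_context(headers: Mapping[str, str]) -> TenantContext:
    # HTTP header names are case-insensitive, so normalize before lookup.
    normalized = {name.lower(): value.strip() for name, value in headers.items()}
    tenant_id = normalized.get("tenant-id", "")
    kb_id = normalized.get("kb-id", "")
    if not tenant_id or not kb_id:
        # Fail closed: no default scope is guessed here.
        raise MissingTenantContext("both tenant-id and kb-id headers are required")
    return TenantContext(tenant_id=tenant_id, kb_id=kb_id)


ctx = extract_tenant_context({"Tenant-Id": "acme-corp", "KB-Id": "kb-prod"})
print(ctx)
```

Making the context frozen means downstream code cannot silently switch tenants after validation.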

---

## Data Isolation Layers

```
┌─────────────────────────────────────────────────────────┐
│              Data Isolation - Three Layers              │
├─────────────────────────────────────────────────────────┤
│                                                         │
│  LAYER 1: Tenant Isolation                              │
│  ┌──────────────────────────────────────────────────┐   │
│  │ Highest level: Different organizations/customers │   │
│  │                                                  │   │
│  │    Acme Corp         │  TechStart Inc            │   │
│  │    tenant:acme       │  tenant:techstart         │   │
│  │                                                  │   │
│  │ Complete separation - no cross-tenant access     │   │
│  └──────────────────────────────────────────────────┘   │
│                                                         │
│  LAYER 2: Knowledge Base Isolation                      │
│  ┌──────────────────────────────────────────────────┐   │
│  │ Within tenant: Different projects/environments   │   │
│  │                                                  │   │
│  │ Acme Corp:                                       │   │
│  │   ├─ kb-prod (Production)                        │   │
│  │   ├─ kb-staging (Pre-production)                 │   │
│  │   └─ kb-dev (Development)                        │   │
│  │                                                  │   │
│  │ Data in kb-prod never leaks to kb-staging        │   │
│  └──────────────────────────────────────────────────┘   │
│                                                         │
│  LAYER 3: Resource Isolation                            │
│  ┌──────────────────────────────────────────────────┐   │
│  │ Within kb: Documents, entities, vectors, etc.    │   │
│  │                                                  │   │
│  │ kb-prod:                                         │   │
│  │   ├─ Document: "sales-report-2025"               │   │
│  │   ├─ Entity: "John Doe"                          │   │
│  │   ├─ Vector: <embedding vector>                  │   │
│  │   └─ Relation: "manages" (between entities)      │   │
│  │                                                  │   │
│  │ All accessed only via (tenant, kb) context       │   │
│  └──────────────────────────────────────────────────┘   │
│                                                         │
│  Access Pattern: tenant -> kb -> resources              │
│                                                         │
└─────────────────────────────────────────────────────────┘
```
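
The `tenant -> kb -> resources` access pattern can be illustrated with a toy in-memory store. This is only a sketch of the idea under stated assumptions: `ScopedStore` and its methods are hypothetical and do not correspond to any LightRAG storage class.

```python
from collections import defaultdict


class ScopedStore:
    """Hypothetical in-memory store laid out as tenant -> kb -> resource id -> value."""

    def __init__(self):
        self._data = defaultdict(lambda: defaultdict(dict))

    def put(self, tenant_id, kb_id, resource_id, value):
        self._data[tenant_id][kb_id][resource_id] = value

    def get(self, tenant_id, kb_id, resource_id):
        # dict.get() never triggers the default factory, so reads do not
        # create empty scopes for unknown tenants or knowledge bases.
        return self._data.get(tenant_id, {}).get(kb_id, {}).get(resource_id)

    def list_ids(self, tenant_id, kb_id):
        return sorted(self._data.get(tenant_id, {}).get(kb_id, {}))


store = ScopedStore()
store.put("acme", "kb-prod", "sales-report-2025", "final")
store.put("acme", "kb-staging", "sales-report-2025", "draft")

print(store.get("acme", "kb-prod", "sales-report-2025"))       # layer 2: per-kb value
print(store.get("techstart", "kb-prod", "sales-report-2025"))  # layer 1: other tenant sees nothing
```

Because every read and write requires both identifiers, there is no code path that returns a resource without a scope.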

---

## Query Execution Flow

```
START
  │
  ├─> Receive Request
  │     GET /api/documents?status=active
  │     Header: tenant-id=acme-corp
  │     Header: kb-id=kb-prod
  │
  ├─> Extract Tenant Context
  │     tenant_id = "acme-corp"
  │     kb_id = "kb-prod"
  │     [VALIDATED: User owns this tenant/kb]
  │
  ├─> Build Application Query
  │     Base: "SELECT * FROM documents WHERE status='active'"
  │
  ├─> Apply Tenant Filter
  │     Final: "SELECT * FROM documents"
  │            "WHERE status='active'"
  │            "AND tenant_id='acme-corp'"
  │            "AND kb_id='kb-prod'"
  │
  ├─> Execute Query (Storage Layer)
  │     PostgreSQL/MongoDB/Redis/etc.
  │     [ENFORCED: Only returns scoped rows]
  │
  ├─> Process Results
  │     ├─> Acme Corp's documents: RETURNED
  │     ├─> TechStart's documents: FILTERED OUT
  │     └─> Other tenants' documents: FILTERED OUT
  │
  ├─> Return to Client
  │     {
  │       "tenant": "acme-corp",
  │       "kb": "kb-prod",
  │       "documents": [...],
  │       "count": 42
  │     }
  │
END (with tenant/kb context intact)
```
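
The "Apply Tenant Filter" step can be tried end to end with a throwaway database. The sketch below makes two assumptions: SQLite (from the Python standard library) stands in for PostgreSQL, and `apply_tenant_filter` is a hypothetical helper, not the project's `TenantSQLBuilder`. Unlike the literal SQL shown in the flow, it passes the tenant values as bound parameters rather than interpolating them into the string.

```python
import sqlite3


def apply_tenant_filter(base_sql, params, tenant_id, kb_id):
    """Append tenant/kb conditions to a query that already has a WHERE clause."""
    scoped_sql = f"{base_sql} AND tenant_id = :tenant_id AND kb_id = :kb_id"
    scoped_params = {**params, "tenant_id": tenant_id, "kb_id": kb_id}
    return scoped_sql, scoped_params


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE documents (tenant_id TEXT, kb_id TEXT, id TEXT, status TEXT)"
)
conn.executemany(
    "INSERT INTO documents VALUES (?, ?, ?, ?)",
    [
        ("acme-corp", "kb-prod", "doc-1", "active"),
        ("acme-corp", "kb-dev", "doc-2", "active"),
        ("techstart", "kb-main", "doc-1", "active"),
    ],
)

sql, params = apply_tenant_filter(
    "SELECT id FROM documents WHERE status = :status",
    {"status": "active"},
    tenant_id="acme-corp",
    kb_id="kb-prod",
)
rows = conn.execute(sql, params).fetchall()
print(rows)  # only the acme-corp / kb-prod row survives the filter
```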

---

## Composite Key Pattern

```
┌────────────────────────────────────────────────────────────┐
│                  Composite Key Structure                   │
├────────────────────────────────────────────────────────────┤
│                                                            │
│  Single Key (OLD - Not Tenant-Aware)                       │
│  ┌────────────────────────────────────────────┐            │
│  │ id: 12345                                  │            │
│  │                                            │            │
│  │ Problem: Same ID in different              │            │
│  │          tenants = collision!              │            │
│  └────────────────────────────────────────────┘            │
│                                                            │
│  Composite Key (NEW - Tenant-Aware)                        │
│  ┌────────────────────────────────────────────┐            │
│  │ (tenant_id, kb_id, id)                     │            │
│  │ ("acme", "kb-prod", "doc-123")             │            │
│  │                                            │            │
│  │ Same ID with different contexts:           │            │
│  │ ("acme", "kb-prod", "doc-123")             │ <- Acme    │
│  │ ("acme", "kb-dev", "doc-123")              │ <- Acme    │
│  │ ("techstart", "kb-main", "doc-123")        │ <- Tech    │
│  │                                            │            │
│  │ All unique! No collisions!                 │            │
│  └────────────────────────────────────────────┘            │
│                                                            │
│  Storage Implementation                                    │
│  ┌────────────────────────────────────────────┐            │
│  │ PostgreSQL:                                │            │
│  │   PRIMARY KEY (tenant_id, kb_id, id)       │            │
│  │                                            │            │
│  │ MongoDB:                                   │            │
│  │   db.documents.createIndex({               │            │
│  │     tenant_id: 1,                          │            │
│  │     kb_id: 1,                              │            │
│  │     id: 1                                  │            │
│  │   })                                       │            │
│  │                                            │            │
│  │ Redis:                                     │            │
│  │   key = "tenant:kb:id"                     │            │
│  │   key = "acme:kb-prod:doc-123"             │            │
│  └────────────────────────────────────────────┘            │
│                                                            │
└────────────────────────────────────────────────────────────┘
```
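
The collision behavior described above can be checked directly. In this sketch SQLite stands in for PostgreSQL (an assumption made so the example runs without a server), and the table layout is illustrative rather than LightRAG's real schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE documents (
        tenant_id TEXT NOT NULL,
        kb_id     TEXT NOT NULL,
        id        TEXT NOT NULL,
        title     TEXT,
        PRIMARY KEY (tenant_id, kb_id, id)
    )
    """
)

# The same document id is accepted three times because the scope differs.
conn.executemany(
    "INSERT INTO documents VALUES (?, ?, ?, ?)",
    [
        ("acme", "kb-prod", "doc-123", "Report"),
        ("acme", "kb-dev", "doc-123", "Draft"),
        ("techstart", "kb-main", "doc-123", "Spec"),
    ],
)
count = conn.execute(
    "SELECT COUNT(*) FROM documents WHERE id = 'doc-123'"
).fetchone()[0]


def insert_duplicate():
    """Try to reuse an existing (tenant_id, kb_id, id) triple."""
    try:
        conn.execute(
            "INSERT INTO documents VALUES (?, ?, ?, ?)",
            ("acme", "kb-prod", "doc-123", "Again"),
        )
    except sqlite3.IntegrityError:
        return False
    return True


duplicate_accepted = insert_duplicate()
print(count, duplicate_accepted)
```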

---

## Data Organization by Backend

```
┌──────────────────────────────────────────────────────────────┐
│            How Each Backend Organizes Tenant Data            │
├──────────────────────────────────────────────────────────────┤
│                                                              │
│  PostgreSQL                                                  │
│  ┌────────────────────────────────────────────────────────┐  │
│  │ Table: documents                                       │  │
│  │ ┌─────────┬──────┬────┬──────────┬──────────┐          │  │
│  │ │tenant_id│kb_id │ id │ title    │ content  │          │  │
│  │ ├─────────┼──────┼────┼──────────┼──────────┤          │  │
│  │ │ acme    │ prod │ 1  │ Report   │ [data]   │          │  │
│  │ │ acme    │ dev  │ 2  │ Draft    │ [data]   │          │  │
│  │ │ tech    │ main │ 1  │ Spec     │ [data]   │          │  │
│  │ └─────────┴──────┴────┴──────────┴──────────┘          │  │
│  │ Row filtering:                                         │  │
│  │   WHERE tenant_id='acme' AND kb_id='prod'              │  │
│  └────────────────────────────────────────────────────────┘  │
│                                                              │
│  MongoDB                                                     │
│  ┌────────────────────────────────────────────────────────┐  │
│  │ Collection: documents                                  │  │
│  │ Document 1: {tenant_id:"acme", kb_id:"prod", id:1}     │  │
│  │ Document 2: {tenant_id:"acme", kb_id:"dev", id:2}      │  │
│  │ Document 3: {tenant_id:"tech", kb_id:"main", id:1}     │  │
│  │                                                        │  │
│  │ Filter: {tenant_id:"acme", kb_id:"prod"}               │  │
│  │ Returns: Document 1 only                               │  │
│  └────────────────────────────────────────────────────────┘  │
│                                                              │
│  Redis                                                       │
│  ┌────────────────────────────────────────────────────────┐  │
│  │ Key Namespace Pattern:                                 │  │
│  │   "acme:prod:doc:1" -> Document 1 (Acme)               │  │
│  │   "acme:dev:doc:2"  -> Document 2 (Acme)               │  │
│  │   "tech:main:doc:1" -> Document 1 (Tech)               │  │
│  │                                                        │  │
│  │ Query pattern: "acme:prod:*"                           │  │
│  │ Returns: All keys matching tenant:kb scope             │  │
│  └────────────────────────────────────────────────────────┘  │
│                                                              │
│  Qdrant (Vector DB)                                          │
│  ┌────────────────────────────────────────────────────────┐  │
│  │ Collection: embeddings                                 │  │
│  │                                                        │  │
│  │ Point 1: {                                             │  │
│  │   "vector": [...],                                     │  │
│  │   "payload": {                                         │  │
│  │     "tenant_id": "acme",                               │  │
│  │     "kb_id": "prod"                                    │  │
│  │   }                                                    │  │
│  │ }                                                      │  │
│  │                                                        │  │
│  │ Search filter:                                         │  │
│  │   {"must": [                                           │  │
│  │     {"key":"tenant_id", "match":{"value":"acme"}},     │  │
│  │     {"key":"kb_id", "match":{"value":"prod"}}          │  │
│  │   ]}                                                   │  │
│  │ Returns: Only vectors with matching metadata           │  │
│  └────────────────────────────────────────────────────────┘  │
│                                                              │
│  Neo4j (Graph DB)                                            │
│  ┌────────────────────────────────────────────────────────┐  │
│  │ Node structure:                                        │  │
│  │   (Entity {                                            │  │
│  │     tenant_id: "acme",                                 │  │
│  │     kb_id: "prod",                                     │  │
│  │     name: "John Doe"                                   │  │
│  │   })                                                   │  │
│  │                                                        │  │
│  │ Query:                                                 │  │
│  │   MATCH (n:Entity)                                     │  │
│  │   WHERE n.tenant_id = 'acme'                           │  │
│  │     AND n.kb_id = 'prod'                               │  │
│  │   RETURN n                                             │  │
│  │ Returns: Entities scoped to acme:prod                  │  │
│  └────────────────────────────────────────────────────────┘  │
│                                                              │
└──────────────────────────────────────────────────────────────┘
```
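
The per-backend scoping rules above reduce to a few small builders that produce plain strings and dictionaries. The function names below are hypothetical (they are not the helpers shipped in `lightrag/kg/`), the `tenant_id` / `kb_id` field names follow the Qdrant and Neo4j examples, and `fnmatchcase` only simulates the `"acme:prod:*"` pattern locally; a real Redis deployment would match keys on the server.

```python
from fnmatch import fnmatchcase


def redis_key(tenant_id, kb_id, *parts):
    """Build a namespaced key such as 'acme:prod:doc:1'."""
    return ":".join((tenant_id, kb_id, *parts))


def mongo_filter(tenant_id, kb_id, extra=None):
    """Merge caller filters with the scope; scope keys are written last so they win."""
    return {**(extra or {}), "tenant_id": tenant_id, "kb_id": kb_id}


def qdrant_filter(tenant_id, kb_id):
    """Payload filter in the same shape as the search filter shown in the diagram."""
    return {
        "must": [
            {"key": "tenant_id", "match": {"value": tenant_id}},
            {"key": "kb_id", "match": {"value": kb_id}},
        ]
    }


keys = [
    redis_key("acme", "prod", "doc", "1"),
    redis_key("acme", "dev", "doc", "2"),
    redis_key("tech", "main", "doc", "1"),
]
scoped_keys = [k for k in keys if fnmatchcase(k, redis_key("acme", "prod", "*"))]

print(scoped_keys)
print(mongo_filter("acme", "prod", {"status": "active"}))
print(qdrant_filter("acme", "prod"))
```

Writing the scope keys last in `mongo_filter` means a caller-supplied `tenant_id` in `extra` cannot widen the query.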

---

## Security Boundaries

```
┌────────────────────────────────────────────────────┐
│           Security Boundary Enforcement            │
├────────────────────────────────────────────────────┤
│                                                    │
│  Client Request                                    │
│  GET /documents?tenant=acme-corp&kb=kb-prod        │
│           │                                        │
│           ▼                                        │
│  API Layer - VALIDATE                              │
│  ┌────────────────────────────────────────────┐    │
│  │ Check: User has permission for tenant      │    │
│  │ Check: kb_id belongs to tenant             │    │
│  │ Failure: Return 403 Forbidden              │    │
│  └────────────────────────────────────────────┘    │
│           │                                        │
│           ▼ (validated)                            │
│  Query Builder - ENFORCE                           │
│  ┌────────────────────────────────────────────┐    │
│  │ Base query: SELECT * FROM documents        │    │
│  │                                            │    │
│  │ Add filter: WHERE tenant_id='acme-corp'    │    │
│  │ Add filter: AND kb_id='kb-prod'            │    │
│  │                                            │    │
│  │ Even if an app developer forgets tenant    │    │
│  │ context, storage layer won't return data   │    │
│  └────────────────────────────────────────────┘    │
│           │                                        │
│           ▼                                        │
│  Storage Layer - DATABASE                          │
│  ┌────────────────────────────────────────────┐    │
│  │ PostgreSQL executes:                       │    │
│  │   SELECT * FROM documents                  │    │
│  │   WHERE tenant_id='acme-corp'              │    │
│  │   AND kb_id='kb-prod'                      │    │
│  │                                            │    │
│  │ Result: Only matching rows returned        │    │
│  │ Impossible to get other tenants' data      │    │
│  └────────────────────────────────────────────┘    │
│           │                                        │
│           ▼                                        │
│  Response                                          │
│  Documents from acme-corp/kb-prod ONLY             │
│                                                    │
│  Key Point: TWO layers of protection               │
│  1. API validation (user has access)               │
│  2. Database enforcement (scope in query)          │
│                                                    │
│  If either fails: NO DATA LEAKED                   │
│                                                    │
└────────────────────────────────────────────────────┘
```
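
The two checks in the API layer (the user may use the tenant, and the knowledge base belongs to that tenant) can be sketched as follows. The membership tables, the `Forbidden` exception, and `validate_access` are hypothetical stand-ins for whatever user and tenant service the deployment actually uses.

```python
class Forbidden(Exception):
    """Maps to an HTTP 403 response in the API layer."""
    status_code = 403


# Hypothetical membership data; a real system would query its user/tenant service.
USER_TENANTS = {"alice": {"acme-corp"}, "bob": {"techstart"}}
TENANT_KBS = {
    "acme-corp": {"kb-prod", "kb-staging", "kb-dev"},
    "techstart": {"kb-main"},
}


def validate_access(user, tenant_id, kb_id):
    """Raise Forbidden unless both API-layer checks pass."""
    if tenant_id not in USER_TENANTS.get(user, set()):
        raise Forbidden(f"user {user!r} has no access to tenant {tenant_id!r}")
    if kb_id not in TENANT_KBS.get(tenant_id, set()):
        raise Forbidden(f"kb {kb_id!r} does not belong to tenant {tenant_id!r}")


def is_allowed(user, tenant_id, kb_id):
    """Boolean convenience wrapper around validate_access."""
    try:
        validate_access(user, tenant_id, kb_id)
    except Forbidden:
        return False
    return True


print(is_allowed("alice", "acme-corp", "kb-prod"))  # own tenant, own kb
print(is_allowed("bob", "acme-corp", "kb-prod"))    # foreign tenant
print(is_allowed("alice", "acme-corp", "kb-main"))  # kb from another tenant
```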

---

## Implementation Decision Tree

```
START: Need to implement a multi-tenant feature?
  │
  ├─> YES, new feature
  │     │
  │     ├─> Data needs tenant/kb context?
  │     │     │
  │     │     ├─> YES
  │     │     │     └─> Use TenantContext in all queries
  │     │     │         Add tenant_id, kb_id to schema
  │     │     │         Use support module helpers
  │     │     │
  │     │     └─> NO (metadata, config, etc.)
  │     │           └─> Store normally, reference by tenant later
  │     │
  │     └─> Done: Feature is multi-tenant safe
  │
  ├─> NO, maintaining existing feature
  │     │
  │     ├─> Feature crosses tenant boundaries?
  │     │     │
  │     │     ├─> YES (e.g., searching across tenants)
  │     │     │     └─> Explicitly separate results by tenant
  │     │     │         Never merge tenant data
  │     │     │         Document cross-tenant behavior
  │     │     │
  │     │     └─> NO (operates within single tenant)
  │     │           └─> Add tenant filter to query
  │     │               Test with multiple tenants
  │     │
  │     └─> Done: Feature remains tenant-safe
  │
  └─> Migration time
        │
        ├─> Have existing single-tenant data?
        │     │
        │     ├─> YES
        │     │     └─> Run migration script with dry-run
        │     │         Backup data
        │     │         Verify statistics
        │     │         Apply migration
        │     │         Run tests
        │     │
        │     └─> NO (new deployment)
        │           └─> Deploy with multi-tenant enabled
        │               No migration needed
        │
        └─> Done: Data is multi-tenant compatible
```

---

## Quick Implementation Checklist

```
BEFORE IMPLEMENTATION
[ ] Read Section 2: Data Isolation Layers
[ ] Review relevant backend examples (Section 6)
[ ] Check if new tables needed - plan composite keys
[ ] Get team buy-in on tenant context requirements

DURING IMPLEMENTATION
[ ] Add tenant_id, kb_id to schema (if new data)
[ ] Use TenantSQLBuilder/MongoTenantHelper/etc.
[ ] Extract tenant context from request headers
[ ] Add tests with multiple tenants
[ ] Add tenant context to logging/monitoring
[ ] Update documentation with tenant notes

BEFORE TESTING
[ ] Verify composite indexes exist
[ ] Check that all queries include tenant filter
[ ] Review code for hardcoded assumptions
[ ] Ensure tenant context flows through async tasks
[ ] Set up test data for multiple tenants

TESTING
[ ] Single tenant operations work
[ ] Multiple tenant queries return correct data
[ ] Cross-tenant queries return nothing
[ ] Edge cases: empty results, large datasets
[ ] Performance: check index usage with EXPLAIN
[ ] Concurrent operations from multiple tenants

BEFORE PRODUCTION
[ ] Run full test suite multiple times
[ ] Load test with multiple tenants
[ ] Backup production database
[ ] Have rollback plan ready
[ ] Monitor tenant-specific metrics
[ ] Update runbooks for multi-tenant queries

AFTER DEPLOYMENT
[ ] Monitor for 24+ hours
[ ] Check logs for any tenant context issues
[ ] Verify performance didn't degrade
[ ] Get user feedback from different tenants
[ ] Document any lessons learned
```
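
One checklist item, "Ensure tenant context flows through async tasks", is easy to get wrong, so here is one possible approach using the standard library's `contextvars`. This is an illustrative sketch: `current_scope`, `index_document`, and `handle_request` are hypothetical names, and LightRAG may propagate context differently.

```python
import asyncio
import contextvars
from typing import Optional, Tuple

# Holds the (tenant_id, kb_id) pair for the request currently being handled.
current_scope: contextvars.ContextVar[Optional[Tuple[str, str]]] = contextvars.ContextVar(
    "current_scope", default=None
)


async def index_document(doc_id):
    """Background work that must run inside a tenant scope."""
    scope = current_scope.get()
    if scope is None:
        raise RuntimeError("tenant context missing in background task")
    await asyncio.sleep(0)  # yield so the two requests interleave
    return (*scope, doc_id)


async def handle_request(tenant_id, kb_id, doc_id):
    current_scope.set((tenant_id, kb_id))
    # create_task() copies the current context, so the child task sees this scope.
    return await asyncio.create_task(index_document(doc_id))


async def main():
    # gather() runs each request in its own task, each with its own context copy.
    return await asyncio.gather(
        handle_request("acme", "prod", "doc-1"),
        handle_request("tech", "main", "doc-1"),
    )


results = asyncio.run(main())
print(results)
```

Each request sets the variable inside its own task, so concurrent requests from different tenants do not overwrite each other's scope.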

---

## Integration Points

```
┌──────────────────────────────────────────────────────┐
│        Where Multi-Tenancy Touches the System        │
├──────────────────────────────────────────────────────┤
│                                                      │
│  API Layer                                           │
│  ├─> Authentication: Get user's tenant ID            │
│  ├─> Headers: Extract tenant_id, kb_id               │
│  ├─> Validation: Verify user owns tenant             │
│  └─> Responses: Always include tenant context        │
│                                                      │
│  Query Layer                                         │
│  ├─> Query Builder: Add tenant filters               │
│  ├─> Parameters: Include tenant values               │
│  ├─> Optimization: Use composite indexes             │
│  └─> Caching: Key by (tenant, kb, ...)               │
│                                                      │
│  Storage Layer                                       │
│  ├─> Schema: (tenant_id, kb_id) in composite key     │
│  ├─> Indexes: Multi-column indexes                   │
│  ├─> Constraints: Prevent ID collisions              │
│  └─> Filters: WHERE clause enforcement               │
│                                                      │
│  Monitoring & Logging                                │
│  ├─> Logs: Include tenant in all entries             │
│  ├─> Metrics: Track per-tenant usage                 │
│  ├─> Alerts: Tenant-specific thresholds              │
│  └─> Audit: Record who accessed what data            │
│                                                      │
│  Testing                                             │
│  ├─> Unit Tests: Test with multiple tenants          │
│  ├─> Integration: Test isolation between tenants     │
│  ├─> Performance: Benchmark multi-tenant queries     │
│  └─> Security: Verify no data leaks                  │
│                                                      │
└──────────────────────────────────────────────────────┘
```
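
The "Caching: Key by (tenant, kb, ...)" point deserves a concrete illustration, since a cache keyed only by the query is a common source of cross-tenant leaks. The `TenantCache` class below is a hypothetical sketch, not the tenant-aware instance cache used by LightRAG.

```python
from typing import Any, Callable, Dict, Tuple


class TenantCache:
    """Hypothetical cache whose keys always start with (tenant_id, kb_id)."""

    def __init__(self) -> None:
        self._entries: Dict[Tuple[str, str, str], Any] = {}
        self.misses = 0

    def get_or_compute(
        self, tenant_id: str, kb_id: str, key: str, compute: Callable[[], Any]
    ) -> Any:
        cache_key = (tenant_id, kb_id, key)
        if cache_key not in self._entries:
            self.misses += 1
            self._entries[cache_key] = compute()
        return self._entries[cache_key]

    def invalidate_scope(self, tenant_id: str, kb_id: str) -> int:
        """Drop every entry for one (tenant, kb) scope; return how many were removed."""
        stale = [k for k in self._entries if k[:2] == (tenant_id, kb_id)]
        for k in stale:
            del self._entries[k]
        return len(stale)


cache = TenantCache()
acme_answer = cache.get_or_compute("acme", "prod", "q:revenue", lambda: "acme result")
tech_answer = cache.get_or_compute("tech", "main", "q:revenue", lambda: "tech result")
acme_again = cache.get_or_compute("acme", "prod", "q:revenue", lambda: "recomputed")
print(acme_answer, tech_answer, acme_again, cache.misses)
```

The same query string is cached separately per scope, and `invalidate_scope` lets one tenant's re-indexing clear its entries without touching anyone else's.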

---

## Performance Characteristics

| Scenario | Single Tenant | Multi Tenant | Notes |
|----------|---------------|--------------|-------|
| **Query Speed** | Baseline | +0-5% overhead | Composite index adds slight insert overhead |
| **Storage Size** | Baseline | +5-10% | tenant_id, kb_id columns add overhead |
| **Index Count** | Fewer | More | Composite indexes needed |
| **Query Plans** | Simple | Clear | WHERE clause filters effectively |
| **Concurrent Access** | Good | Excellent | Isolation prevents lock contention |
| **Cache Efficiency** | High | Medium | Must key by (tenant, kb) |

---

## Quick Reference Patterns

### Pattern 1: Simple Query

```python
# PostgreSQL
from lightrag.kg.postgres_tenant_support import TenantSQLBuilder

sql = "SELECT * FROM documents WHERE status = :status"
filtered_sql, params = TenantSQLBuilder.build_filtered_query(
    sql, tenant_id="acme", kb_id="prod",
    additional_params=[{"status": "active"}]
)
results = await db.query(filtered_sql, params)
# Returns: Only active documents from acme/prod
```

### Pattern 2: Filter + Sort

```python
# MongoDB
from lightrag.kg.mongo_tenant_support import MongoTenantHelper

query = MongoTenantHelper.get_tenant_filter(
    tenant_id="acme", kb_id="prod",
    additional_filter={"status": "active"}
)
# With an async driver (e.g. Motor), find() returns a cursor rather than an
# awaitable, so materialize the results with to_list().
cursor = collection.find(query).sort("created_at", -1).limit(10)
results = await cursor.to_list(length=10)
# Returns: Latest 10 active docs from acme/prod
```

### Pattern 3: Batch Operations

```python
# Redis batch
import json

from lightrag.kg.redis_tenant_support import RedisTenantNamespace

ns = RedisTenantNamespace(redis, "acme", "prod")

# Batch set
await ns.mset({
    "user:1": json.dumps(user1_data),
    "user:2": json.dumps(user2_data),
})

# Batch get - all scoped to acme:prod
users = await ns.mget("user:1", "user:2")
# Keys expanded to "acme:prod:user:1", "acme:prod:user:2"
```

---

## Learning Path

A structured 7-step progression to understand multi-tenant architecture:

1. **Understand the Problem** (10 min)
   - Read: Section 1 - Overview
   - Study the Real-World Scenario diagram
   - Why: Single deployment, multiple customers

2. **Learn the Concepts** (15 min)
   - Read: Section 2 - Data Isolation Layers
   - Read: Section 3 - Composite Key Pattern
   - Why: How isolation actually works

3. **See the Architecture** (10 min)
   - Read: Section 4 - System Architecture Diagram
   - Read: Section 5 - Query Execution Flow
   - Why: How requests are processed

4. **Find Your Backend** (10 min)
   - Read: Section 6 - Data Organization by Backend
   - Find your database type (PostgreSQL/MongoDB/Redis/etc.)
   - Why: Each backend has a different approach

5. **Implement the Pattern** (20 min)
   - Read: Section 7 - Quick Reference Patterns
   - Copy the relevant example for your backend
   - Adapt it to your use case
   - Why: Actual working code you can use

6. **Secure It** (15 min)
   - Read: Section 8 - Security Boundaries
   - Review the checklist
   - Why: Prevent cross-tenant data leaks

7. **Test & Deploy** (30 min)
   - Use: Section 9 - Quick Implementation Checklist
   - Run multi-tenant tests
   - Deploy to production
   - Monitor for issues
   - Why: Ensure reliability

**Total Time**: ~90 minutes to full understanding and implementation

---

## Success Criteria

After implementing multi-tenant support, verify:

- [YES] Multiple tenants can exist in same deployment
- [YES] Tenant A cannot access Tenant B's data
- [YES] Queries automatically scoped to tenant
- [YES] No breaking changes to existing code
- [YES] All 10 backends supported
- [YES] Performance within baseline +5%
- [YES] Composite indexes created
- [YES] Tests pass with multiple tenants
- [YES] Logging includes tenant context
- [YES] Backward compatible with single-tenant code

---

## Common Questions

**Q: Do I need to change my existing code?**
A: No. Multi-tenant support is built in with defaults. Use the support modules for new features.

**Q: What about backward compatibility?**
A: Complete. Legacy code uses the "default" tenant automatically.

**Q: How do I test multi-tenant isolation?**
A: Create test data in 2+ tenants, then verify that queries return only scoped data.

**Q: Can I run single-tenant and multi-tenant workloads together?**
A: Yes. All data coexists, and legacy code uses the default tenant.

**Q: What if a query is missing a tenant filter?**
A: It returns an empty result (safe), and the logs show the missing context.

---

## Resources

- Full Details: See `0001-multi-tenant-architecture.md`
- Navigation: See `0003-multi-tenant-documentation-index.md`
- Code Modules: See `lightrag/kg/` directory
- Tests: See `tests/test_multi_tenant_*.py`

---

**Questions?** Review the learning path or check full architecture guide