# Multi-Tenant vs Workspace Architecture Audit Report **Date:** 2024-12-05 **Status:** ✅ PASSED - No Redundancy Found **Author:** AI Audit Agent ## Executive Summary This audit evaluates whether the **Multi-Tenant feature** (local HKU implementation) is redundant with the **Workspace feature** (upstream HKUDS/LightRAG). **Verdict: NOT REDUNDANT** - The features serve different purposes in a well-designed layered architecture: | Feature | Layer | Purpose | |---------|-------|---------| | **Workspace** (upstream) | Storage Layer | Low-level data isolation mechanism in database tables | | **Tenant** (local) | Application Layer | High-level multi-tenant SaaS with user management, RBAC, and APIs | The Tenant feature **extends and uses** the Workspace feature - it's a proper abstraction layer, not duplication. --- ## 1. Workspace Feature (Upstream LightRAG) ### 1.1 Purpose The `workspace` parameter in LightRAG provides **storage-level data isolation** between different LightRAG instances. ### 1.2 Implementation **Core Parameter:** ```python # From lightrag/lightrag.py @dataclass class LightRAG: workspace: str = field(default_factory=lambda: os.getenv("WORKSPACE", "")) """Workspace for data isolation. Defaults to empty string if WORKSPACE environment variable is not set.""" ``` **Storage Isolation:** All storage classes receive the `workspace` parameter and use it in their primary keys: ```python # From lightrag/lightrag.py - storage initialization self.llm_response_cache = self.key_string_value_json_storage_cls( namespace=NameSpace.KV_STORE_LLM_RESPONSE_CACHE, workspace=self.workspace, # Passed to all storages ... ) ``` **Database Schema (PostgreSQL):** ```sql -- Every LIGHTRAG_* table has workspace in PRIMARY KEY CREATE TABLE LIGHTRAG_DOC_FULL ( id VARCHAR(255), workspace VARCHAR(255), ... CONSTRAINT LIGHTRAG_DOC_FULL_PK PRIMARY KEY (workspace, id) ); ``` ### 1.3 Environment Variables | Variable | Storage Type | Description | |----------|-------------|-------------| | `WORKSPACE` | Generic | Default workspace for all storages | | `POSTGRES_WORKSPACE` | PostgreSQL | PostgreSQL-specific workspace | | `REDIS_WORKSPACE` | Redis | Redis-specific workspace | | `MONGODB_WORKSPACE` | MongoDB | MongoDB-specific workspace | | `MILVUS_WORKSPACE` | Milvus | Milvus-specific workspace | | `QDRANT_WORKSPACE` | Qdrant | Qdrant-specific workspace | | `NEO4J_WORKSPACE` | Neo4j | Neo4j-specific workspace | ### 1.4 Limitations The workspace feature provides **only storage isolation**: - ❌ No user management - ❌ No authentication/authorization - ❌ No CRUD API for workspace management - ❌ No metadata or descriptions - ❌ No UI support - ❌ No concept of multiple knowledge bases per workspace --- ## 2. Multi-Tenant Feature (Local Implementation) ### 2.1 Purpose The Multi-Tenant feature provides a **complete SaaS multi-tenancy layer** on top of LightRAG, including: - Organization (tenant) management - Multiple knowledge bases per tenant - Role-based access control (RBAC) - User-tenant membership - REST API for management - WebUI for tenant/KB selection ### 2.2 Key Components | Component | File | Purpose | |-----------|------|---------| | **Tenant Model** | `lightrag/models/tenant.py` | Data models for Tenant, KnowledgeBase, TenantContext | | **TenantService** | `lightrag/services/tenant_service.py` | CRUD operations, access verification | | **TenantRAGManager** | `lightrag/tenant_rag_manager.py` | Manages RAG instances per tenant/KB | | **Tenant Routes** | `lightrag/api/routers/tenant_routes.py` | REST API endpoints | | **Security** | `lightrag/security.py` | Validation, path traversal prevention | ### 2.3 How Tenant Uses Workspace **Critical Integration Point:** ```python # From lightrag/tenant_rag_manager.py async def get_rag_instance(self, tenant_id: str, kb_id: str, user_id: str): # SECURITY: Validate identifiers tenant_id = validate_identifier(tenant_id, "tenant_id") kb_id = validate_identifier(kb_id, "kb_id") # Create composite workspace tenant_working_dir, composite_workspace = validate_working_directory( self.base_working_dir, tenant_id, kb_id ) # composite_workspace = f"{tenant_id}:{kb_id}" # Create RAG instance with composite workspace instance = LightRAG( working_dir=tenant_working_dir, workspace=composite_workspace, # Uses workspace under the hood! ... ) ``` **The Tenant feature DELEGATES to Workspace for actual data isolation.** ### 2.4 Database Schema **Management Tables (Tenant Layer):** ```sql -- Tenant metadata CREATE TABLE tenants ( tenant_id VARCHAR(255) UNIQUE NOT NULL, name VARCHAR(255) NOT NULL, description TEXT, metadata JSONB, ... ); -- Knowledge bases within tenants CREATE TABLE knowledge_bases ( tenant_id VARCHAR(255) REFERENCES tenants(tenant_id), kb_id VARCHAR(255) NOT NULL, name VARCHAR(255) NOT NULL, ... ); -- User access control CREATE TABLE user_tenant_memberships ( user_id VARCHAR(255) NOT NULL, tenant_id VARCHAR(255) REFERENCES tenants(tenant_id), role VARCHAR(50) NOT NULL, -- owner, admin, editor, viewer ... ); ``` **Generated Columns for Integration:** ```sql -- LIGHTRAG_* tables have generated columns to extract tenant/kb ALTER TABLE LIGHTRAG_DOC_FULL ADD COLUMN tenant_id VARCHAR(255) GENERATED ALWAYS AS ( CASE WHEN workspace LIKE '%:%' THEN SPLIT_PART(workspace, ':', 1) ELSE workspace END ) STORED, kb_id VARCHAR(255) GENERATED ALWAYS AS ( CASE WHEN workspace LIKE '%:%' THEN SPLIT_PART(workspace, ':', 2) ELSE 'default' END ) STORED; ``` This allows querying data by tenant/KB without modifying the core storage implementation. ### 2.5 Roles and Permissions | Role | Permissions | |------|-------------| | **Owner** | Full control, manage members, delete tenant | | **Admin** | Create/delete KBs, manage documents | | **Editor** | Create/update/delete documents, run queries | | **Viewer** | Read documents, run queries | --- ## 3. Architecture Comparison ### 3.1 Feature Matrix | Aspect | Workspace (Upstream) | Tenant (Local) | |--------|---------------------|----------------| | Data Isolation | ✅ Storage-level | ✅ Uses workspace | | User Management | ❌ | ✅ Full RBAC | | Authentication | ❌ | ✅ JWT tokens | | Authorization | ❌ | ✅ Role-based | | CRUD API | ❌ | ✅ REST endpoints | | Multiple KBs | ❌ One per workspace | ✅ Many per tenant | | Configuration | ❌ Global only | ✅ Per-tenant | | Quotas/Limits | ❌ | ✅ Per-tenant | | Metadata | ❌ | ✅ Rich metadata | | UI Support | ❌ | ✅ Selection UI | | File Storage | ✅ Subdirectories | ✅ Uses subdirs | | Backward Compatible | ✅ | ✅ Single-tenant mode | ### 3.2 Layered Architecture ``` ┌─────────────────────────────────────────────────────────────┐ │ WebUI / REST API │ │ - Tenant/KB selection │ │ - Document upload, query interface │ ├─────────────────────────────────────────────────────────────┤ │ Authentication Layer │ │ - JWT token validation │ │ - User session management │ ├─────────────────────────────────────────────────────────────┤ │ Authorization Layer │ │ - TenantService.verify_user_access() │ │ - Role-based permission checks │ ├─────────────────────────────────────────────────────────────┤ │ TenantRAGManager (Instance Cache) │ │ - Manages per-tenant/KB LightRAG instances │ │ - LRU eviction for memory management │ │ - Creates composite_workspace = "{tenant}:{kb}" │ ├─────────────────────────────────────────────────────────────┤ │ LightRAG Core │ │ - Uses workspace for storage isolation │ │ - KV, Vector, Graph, DocStatus storages │ ├─────────────────────────────────────────────────────────────┤ │ PostgreSQL / Storage Backend │ │ - PRIMARY KEY (workspace, id) for isolation │ │ - Generated columns extract tenant_id, kb_id │ └─────────────────────────────────────────────────────────────┘ ``` --- ## 4. Findings ### 4.1 No Redundancy Found ✅ The Tenant feature is **complementary**, not redundant: 1. **Workspace** = Storage mechanism (HOW data is isolated) 2. **Tenant** = Application layer (WHO can access WHAT data) They work together: ``` User Request → Tenant Auth → TenantRAGManager → workspace="{tenant}:{kb}" → Storage ``` ### 4.2 Design Quality Assessment | Criterion | Score | Notes | |-----------|-------|-------| | Separation of Concerns | ⭐⭐⭐⭐⭐ | Clean layered architecture | | Code Reuse | ⭐⭐⭐⭐⭐ | Tenant uses workspace, doesn't duplicate | | Security | ⭐⭐⭐⭐ | Validation, RBAC, path traversal prevention | | Backward Compatibility | ⭐⭐⭐⭐⭐ | Single-tenant mode still works | | Database Design | ⭐⭐⭐⭐ | Generated columns enable efficient queries | ### 4.3 Positive Design Decisions 1. **Composite Workspace Format:** Using `{tenant_id}:{kb_id}` as workspace allows multiple KBs per tenant while reusing storage isolation 2. **Generated Columns:** PostgreSQL generated columns (`tenant_id`, `kb_id`) enable efficient queries without schema changes to core tables 3. **Instance Caching:** TenantRAGManager caches RAG instances with LRU eviction for performance 4. **Security Validation:** `validate_identifier()` and `validate_working_directory()` prevent injection and path traversal 5. **Environment Toggle:** `LIGHTRAG_MULTI_TENANT` allows switching between single-tenant and multi-tenant modes --- ## 5. Recommendations ### 5.1 Improvements Needed | Priority | Issue | Recommendation | |----------|-------|----------------| | **High** | Cascade Delete | Add cleanup of LIGHTRAG_* tables when tenant is deleted | | **Medium** | Documentation | Document workspace naming convention clearly | | **Medium** | Orphan Prevention | Add DB triggers to validate tenant/kb exists on insert | | **Low** | Naming Clarity | Consider renaming `workspace` to `isolation_key` in docs | ### 5.2 Implementation: Cascade Delete Add this to `TenantService.delete_tenant()`: ```python async def delete_tenant(self, tenant_id: str) -> bool: # Existing: delete KBs kbs_result = await self.list_knowledge_bases(tenant_id) for kb in kbs_result.get("items", []): await self.delete_knowledge_base(tenant_id, kb.kb_id) # NEW: Clean up LIGHTRAG_* tables if hasattr(self.kv_storage, 'db') and self.kv_storage.db: await self.kv_storage.db.execute( "DELETE FROM LIGHTRAG_DOC_FULL WHERE workspace LIKE $1", [f"{tenant_id}:%"] ) # Repeat for other LIGHTRAG_* tables... # Existing: delete tenant metadata await self.kv_storage.delete([f"{self.tenant_namespace}:{tenant_id}"]) return True ``` ### 5.3 Documentation Update Add this to README or multi-tenancy docs: ```markdown ## Workspace vs Multi-Tenant LightRAG supports two isolation modes: ### Single-Tenant Mode (Default) - Set `WORKSPACE=myworkspace` environment variable - All data stored under one workspace - No authentication required ### Multi-Tenant Mode - Set `LIGHTRAG_MULTI_TENANT=true` - Workspace format: `{tenant_id}:{kb_id}` - Full authentication and RBAC - Multiple knowledge bases per tenant ``` --- ## 6. Conclusion **The Multi-Tenant implementation is well-designed and NOT redundant with the Workspace feature.** The architecture correctly layers: 1. **Workspace (upstream)** for storage-level isolation 2. **Tenant (local)** for application-level multi-tenancy This follows best practices for extending open-source projects: - Minimal changes to core code - Clear abstraction layers - Backward compatibility maintained **Recommendation:** Approve the current implementation with minor improvements for cascade delete and documentation clarity. --- ## Appendix A: File Reference | File | Purpose | |------|---------| | `lightrag/lightrag.py` | Core LightRAG class with workspace parameter | | `lightrag/kg/postgres_impl.py` | PostgreSQL storage with workspace in PK | | `lightrag/models/tenant.py` | Tenant, KnowledgeBase, TenantContext models | | `lightrag/services/tenant_service.py` | Tenant/KB CRUD, access verification | | `lightrag/tenant_rag_manager.py` | RAG instance management per tenant/KB | | `lightrag/api/routers/tenant_routes.py` | REST API for tenant management | | `lightrag/security.py` | Identifier validation, security utilities | | `starter/init-postgres.sql` | Database schema with generated columns | ## Appendix B: Environment Variables ### Workspace Variables (Upstream) - `WORKSPACE` - Default workspace name - `POSTGRES_WORKSPACE` - PostgreSQL-specific workspace - `REDIS_WORKSPACE` - Redis-specific workspace - `MONGODB_WORKSPACE` - MongoDB-specific workspace ### Tenant Variables (Local) - `LIGHTRAG_MULTI_TENANT` - Enable multi-tenant mode (true/false) - `LIGHTRAG_SUPER_ADMIN_USERS` - Comma-separated super admin usernames - `REQUIRE_USER_AUTH` - Require authentication (true/false)