LightRAG/docs/0009-multi-tenant-vs-workspace-audit.md
2025-12-05 14:31:13 +08:00

14 KiB

Multi-Tenant vs Workspace Architecture Audit Report

Date: 2024-12-05 Status: PASSED - No Redundancy Found Author: AI Audit Agent

Executive Summary

This audit evaluates whether the Multi-Tenant feature (local HKU implementation) is redundant with the Workspace feature (upstream HKUDS/LightRAG).

Verdict: NOT REDUNDANT - The features serve different purposes in a well-designed layered architecture:

Feature Layer Purpose
Workspace (upstream) Storage Layer Low-level data isolation mechanism in database tables
Tenant (local) Application Layer High-level multi-tenant SaaS with user management, RBAC, and APIs

The Tenant feature extends and uses the Workspace feature - it's a proper abstraction layer, not duplication.


1. Workspace Feature (Upstream LightRAG)

1.1 Purpose

The workspace parameter in LightRAG provides storage-level data isolation between different LightRAG instances.

1.2 Implementation

Core Parameter:

# From lightrag/lightrag.py
@dataclass
class LightRAG:
    workspace: str = field(default_factory=lambda: os.getenv("WORKSPACE", ""))
    """Workspace for data isolation. Defaults to empty string if WORKSPACE environment variable is not set."""

Storage Isolation: All storage classes receive the workspace parameter and use it in their primary keys:

# From lightrag/lightrag.py - storage initialization
self.llm_response_cache = self.key_string_value_json_storage_cls(
    namespace=NameSpace.KV_STORE_LLM_RESPONSE_CACHE,
    workspace=self.workspace,  # Passed to all storages
    ...
)

Database Schema (PostgreSQL):

-- Every LIGHTRAG_* table has workspace in PRIMARY KEY
CREATE TABLE LIGHTRAG_DOC_FULL (
    id VARCHAR(255),
    workspace VARCHAR(255),
    ...
    CONSTRAINT LIGHTRAG_DOC_FULL_PK PRIMARY KEY (workspace, id)
);

1.3 Environment Variables

Variable Storage Type Description
WORKSPACE Generic Default workspace for all storages
POSTGRES_WORKSPACE PostgreSQL PostgreSQL-specific workspace
REDIS_WORKSPACE Redis Redis-specific workspace
MONGODB_WORKSPACE MongoDB MongoDB-specific workspace
MILVUS_WORKSPACE Milvus Milvus-specific workspace
QDRANT_WORKSPACE Qdrant Qdrant-specific workspace
NEO4J_WORKSPACE Neo4j Neo4j-specific workspace

1.4 Limitations

The workspace feature provides only storage isolation:

  • No user management
  • No authentication/authorization
  • No CRUD API for workspace management
  • No metadata or descriptions
  • No UI support
  • No concept of multiple knowledge bases per workspace

2. Multi-Tenant Feature (Local Implementation)

2.1 Purpose

The Multi-Tenant feature provides a complete SaaS multi-tenancy layer on top of LightRAG, including:

  • Organization (tenant) management
  • Multiple knowledge bases per tenant
  • Role-based access control (RBAC)
  • User-tenant membership
  • REST API for management
  • WebUI for tenant/KB selection

2.2 Key Components

Component File Purpose
Tenant Model lightrag/models/tenant.py Data models for Tenant, KnowledgeBase, TenantContext
TenantService lightrag/services/tenant_service.py CRUD operations, access verification
TenantRAGManager lightrag/tenant_rag_manager.py Manages RAG instances per tenant/KB
Tenant Routes lightrag/api/routers/tenant_routes.py REST API endpoints
Security lightrag/security.py Validation, path traversal prevention

2.3 How Tenant Uses Workspace

Critical Integration Point:

# From lightrag/tenant_rag_manager.py
async def get_rag_instance(self, tenant_id: str, kb_id: str, user_id: str):
    # SECURITY: Validate identifiers
    tenant_id = validate_identifier(tenant_id, "tenant_id")
    kb_id = validate_identifier(kb_id, "kb_id")

    # Create composite workspace
    tenant_working_dir, composite_workspace = validate_working_directory(
        self.base_working_dir, tenant_id, kb_id
    )
    # composite_workspace = f"{tenant_id}:{kb_id}"

    # Create RAG instance with composite workspace
    instance = LightRAG(
        working_dir=tenant_working_dir,
        workspace=composite_workspace,  # Uses workspace under the hood!
        ...
    )

The Tenant feature DELEGATES to Workspace for actual data isolation.

2.4 Database Schema

Management Tables (Tenant Layer):

-- Tenant metadata
CREATE TABLE tenants (
    tenant_id VARCHAR(255) UNIQUE NOT NULL,
    name VARCHAR(255) NOT NULL,
    description TEXT,
    metadata JSONB,
    ...
);

-- Knowledge bases within tenants
CREATE TABLE knowledge_bases (
    tenant_id VARCHAR(255) REFERENCES tenants(tenant_id),
    kb_id VARCHAR(255) NOT NULL,
    name VARCHAR(255) NOT NULL,
    ...
);

-- User access control
CREATE TABLE user_tenant_memberships (
    user_id VARCHAR(255) NOT NULL,
    tenant_id VARCHAR(255) REFERENCES tenants(tenant_id),
    role VARCHAR(50) NOT NULL,  -- owner, admin, editor, viewer
    ...
);

Generated Columns for Integration:

-- LIGHTRAG_* tables have generated columns to extract tenant/kb
ALTER TABLE LIGHTRAG_DOC_FULL ADD COLUMN
    tenant_id VARCHAR(255) GENERATED ALWAYS AS (
        CASE WHEN workspace LIKE '%:%'
             THEN SPLIT_PART(workspace, ':', 1)
             ELSE workspace END
    ) STORED,
    kb_id VARCHAR(255) GENERATED ALWAYS AS (
        CASE WHEN workspace LIKE '%:%'
             THEN SPLIT_PART(workspace, ':', 2)
             ELSE 'default' END
    ) STORED;

This allows querying data by tenant/KB without modifying the core storage implementation.

2.5 Roles and Permissions

Role Permissions
Owner Full control, manage members, delete tenant
Admin Create/delete KBs, manage documents
Editor Create/update/delete documents, run queries
Viewer Read documents, run queries

3. Architecture Comparison

3.1 Feature Matrix

Aspect Workspace (Upstream) Tenant (Local)
Data Isolation Storage-level Uses workspace
User Management Full RBAC
Authentication JWT tokens
Authorization Role-based
CRUD API REST endpoints
Multiple KBs One per workspace Many per tenant
Configuration Global only Per-tenant
Quotas/Limits Per-tenant
Metadata Rich metadata
UI Support Selection UI
File Storage Subdirectories Uses subdirs
Backward Compatible Single-tenant mode

3.2 Layered Architecture

┌─────────────────────────────────────────────────────────────┐
│                    WebUI / REST API                         │
│   - Tenant/KB selection                                     │
│   - Document upload, query interface                        │
├─────────────────────────────────────────────────────────────┤
│                 Authentication Layer                         │
│   - JWT token validation                                    │
│   - User session management                                 │
├─────────────────────────────────────────────────────────────┤
│                 Authorization Layer                          │
│   - TenantService.verify_user_access()                     │
│   - Role-based permission checks                           │
├─────────────────────────────────────────────────────────────┤
│              TenantRAGManager (Instance Cache)              │
│   - Manages per-tenant/KB LightRAG instances               │
│   - LRU eviction for memory management                     │
│   - Creates composite_workspace = "{tenant}:{kb}"          │
├─────────────────────────────────────────────────────────────┤
│                     LightRAG Core                           │
│   - Uses workspace for storage isolation                   │
│   - KV, Vector, Graph, DocStatus storages                  │
├─────────────────────────────────────────────────────────────┤
│                PostgreSQL / Storage Backend                  │
│   - PRIMARY KEY (workspace, id) for isolation              │
│   - Generated columns extract tenant_id, kb_id             │
└─────────────────────────────────────────────────────────────┘

4. Findings

4.1 No Redundancy Found

The Tenant feature is complementary, not redundant:

  1. Workspace = Storage mechanism (HOW data is isolated)
  2. Tenant = Application layer (WHO can access WHAT data)

They work together:

User Request → Tenant Auth → TenantRAGManager → workspace="{tenant}:{kb}" → Storage

4.2 Design Quality Assessment

Criterion Score Notes
Separation of Concerns Clean layered architecture
Code Reuse Tenant uses workspace, doesn't duplicate
Security Validation, RBAC, path traversal prevention
Backward Compatibility Single-tenant mode still works
Database Design Generated columns enable efficient queries

4.3 Positive Design Decisions

  1. Composite Workspace Format: Using {tenant_id}:{kb_id} as workspace allows multiple KBs per tenant while reusing storage isolation

  2. Generated Columns: PostgreSQL generated columns (tenant_id, kb_id) enable efficient queries without schema changes to core tables

  3. Instance Caching: TenantRAGManager caches RAG instances with LRU eviction for performance

  4. Security Validation: validate_identifier() and validate_working_directory() prevent injection and path traversal

  5. Environment Toggle: LIGHTRAG_MULTI_TENANT allows switching between single-tenant and multi-tenant modes


5. Recommendations

5.1 Improvements Needed

Priority Issue Recommendation
High Cascade Delete Add cleanup of LIGHTRAG_* tables when tenant is deleted
Medium Documentation Document workspace naming convention clearly
Medium Orphan Prevention Add DB triggers to validate tenant/kb exists on insert
Low Naming Clarity Consider renaming workspace to isolation_key in docs

5.2 Implementation: Cascade Delete

Add this to TenantService.delete_tenant():

async def delete_tenant(self, tenant_id: str) -> bool:
    # Existing: delete KBs
    kbs_result = await self.list_knowledge_bases(tenant_id)
    for kb in kbs_result.get("items", []):
        await self.delete_knowledge_base(tenant_id, kb.kb_id)

    # NEW: Clean up LIGHTRAG_* tables
    if hasattr(self.kv_storage, 'db') and self.kv_storage.db:
        await self.kv_storage.db.execute(
            "DELETE FROM LIGHTRAG_DOC_FULL WHERE workspace LIKE $1",
            [f"{tenant_id}:%"]
        )
        # Repeat for other LIGHTRAG_* tables...

    # Existing: delete tenant metadata
    await self.kv_storage.delete([f"{self.tenant_namespace}:{tenant_id}"])
    return True

5.3 Documentation Update

Add this to README or multi-tenancy docs:

## Workspace vs Multi-Tenant

LightRAG supports two isolation modes:

### Single-Tenant Mode (Default)
- Set `WORKSPACE=myworkspace` environment variable
- All data stored under one workspace
- No authentication required

### Multi-Tenant Mode
- Set `LIGHTRAG_MULTI_TENANT=true`
- Workspace format: `{tenant_id}:{kb_id}`
- Full authentication and RBAC
- Multiple knowledge bases per tenant

6. Conclusion

The Multi-Tenant implementation is well-designed and NOT redundant with the Workspace feature.

The architecture correctly layers:

  1. Workspace (upstream) for storage-level isolation
  2. Tenant (local) for application-level multi-tenancy

This follows best practices for extending open-source projects:

  • Minimal changes to core code
  • Clear abstraction layers
  • Backward compatibility maintained

Recommendation: Approve the current implementation with minor improvements for cascade delete and documentation clarity.


Appendix A: File Reference

File Purpose
lightrag/lightrag.py Core LightRAG class with workspace parameter
lightrag/kg/postgres_impl.py PostgreSQL storage with workspace in PK
lightrag/models/tenant.py Tenant, KnowledgeBase, TenantContext models
lightrag/services/tenant_service.py Tenant/KB CRUD, access verification
lightrag/tenant_rag_manager.py RAG instance management per tenant/KB
lightrag/api/routers/tenant_routes.py REST API for tenant management
lightrag/security.py Identifier validation, security utilities
starter/init-postgres.sql Database schema with generated columns

Appendix B: Environment Variables

Workspace Variables (Upstream)

  • WORKSPACE - Default workspace name
  • POSTGRES_WORKSPACE - PostgreSQL-specific workspace
  • REDIS_WORKSPACE - Redis-specific workspace
  • MONGODB_WORKSPACE - MongoDB-specific workspace

Tenant Variables (Local)

  • LIGHTRAG_MULTI_TENANT - Enable multi-tenant mode (true/false)
  • LIGHTRAG_SUPER_ADMIN_USERS - Comma-separated super admin usernames
  • REQUIRE_USER_AUTH - Require authentication (true/false)