Raphaël MANSUY cf75f71291 ci: fix lint/pre-commit issues and apply autoformat

2025-12-05 14:31:13 +08:00

14 KiB

Raw Blame History

Multi-Tenant vs Workspace Architecture Audit Report

Date: 2024-12-05 Status: ✅ PASSED - No Redundancy Found Author: AI Audit Agent

Executive Summary

This audit evaluates whether the Multi-Tenant feature (local HKU implementation) is redundant with the Workspace feature (upstream HKUDS/LightRAG).

Verdict: NOT REDUNDANT - The features serve different purposes in a well-designed layered architecture:

Feature	Layer	Purpose
Workspace (upstream)	Storage Layer	Low-level data isolation mechanism in database tables
Tenant (local)	Application Layer	High-level multi-tenant SaaS with user management, RBAC, and APIs

The Tenant feature extends and uses the Workspace feature - it's a proper abstraction layer, not duplication.

1. Workspace Feature (Upstream LightRAG)

1.1 Purpose

The workspace parameter in LightRAG provides storage-level data isolation between different LightRAG instances.

1.2 Implementation

Core Parameter:

# From lightrag/lightrag.py
@dataclass
class LightRAG:
    workspace: str = field(default_factory=lambda: os.getenv("WORKSPACE", ""))
    """Workspace for data isolation. Defaults to empty string if WORKSPACE environment variable is not set."""

Storage Isolation: All storage classes receive the workspace parameter and use it in their primary keys:

# From lightrag/lightrag.py - storage initialization
self.llm_response_cache = self.key_string_value_json_storage_cls(
    namespace=NameSpace.KV_STORE_LLM_RESPONSE_CACHE,
    workspace=self.workspace,  # Passed to all storages
    ...
)

Database Schema (PostgreSQL):

-- Every LIGHTRAG_* table has workspace in PRIMARY KEY
CREATE TABLE LIGHTRAG_DOC_FULL (
    id VARCHAR(255),
    workspace VARCHAR(255),
    ...
    CONSTRAINT LIGHTRAG_DOC_FULL_PK PRIMARY KEY (workspace, id)
);

1.3 Environment Variables

Variable	Storage Type	Description
`WORKSPACE`	Generic	Default workspace for all storages
`POSTGRES_WORKSPACE`	PostgreSQL	PostgreSQL-specific workspace
`REDIS_WORKSPACE`	Redis	Redis-specific workspace
`MONGODB_WORKSPACE`	MongoDB	MongoDB-specific workspace
`MILVUS_WORKSPACE`	Milvus	Milvus-specific workspace
`QDRANT_WORKSPACE`	Qdrant	Qdrant-specific workspace
`NEO4J_WORKSPACE`	Neo4j	Neo4j-specific workspace

1.4 Limitations

The workspace feature provides only storage isolation:

❌ No user management
❌ No authentication/authorization
❌ No CRUD API for workspace management
❌ No metadata or descriptions
❌ No UI support
❌ No concept of multiple knowledge bases per workspace

2. Multi-Tenant Feature (Local Implementation)

2.1 Purpose

The Multi-Tenant feature provides a complete SaaS multi-tenancy layer on top of LightRAG, including:

Organization (tenant) management
Multiple knowledge bases per tenant
Role-based access control (RBAC)
User-tenant membership
REST API for management
WebUI for tenant/KB selection

2.2 Key Components

Component	File	Purpose
Tenant Model	`lightrag/models/tenant.py`	Data models for Tenant, KnowledgeBase, TenantContext
TenantService	`lightrag/services/tenant_service.py`	CRUD operations, access verification
TenantRAGManager	`lightrag/tenant_rag_manager.py`	Manages RAG instances per tenant/KB
Tenant Routes	`lightrag/api/routers/tenant_routes.py`	REST API endpoints
Security	`lightrag/security.py`	Validation, path traversal prevention

2.3 How Tenant Uses Workspace

Critical Integration Point:

# From lightrag/tenant_rag_manager.py
async def get_rag_instance(self, tenant_id: str, kb_id: str, user_id: str):
    # SECURITY: Validate identifiers
    tenant_id = validate_identifier(tenant_id, "tenant_id")
    kb_id = validate_identifier(kb_id, "kb_id")

    # Create composite workspace
    tenant_working_dir, composite_workspace = validate_working_directory(
        self.base_working_dir, tenant_id, kb_id
    )
    # composite_workspace = f"{tenant_id}:{kb_id}"

    # Create RAG instance with composite workspace
    instance = LightRAG(
        working_dir=tenant_working_dir,
        workspace=composite_workspace,  # Uses workspace under the hood!
        ...
    )

The Tenant feature DELEGATES to Workspace for actual data isolation.

2.4 Database Schema

Management Tables (Tenant Layer):

-- Tenant metadata
CREATE TABLE tenants (
    tenant_id VARCHAR(255) UNIQUE NOT NULL,
    name VARCHAR(255) NOT NULL,
    description TEXT,
    metadata JSONB,
    ...
);

-- Knowledge bases within tenants
CREATE TABLE knowledge_bases (
    tenant_id VARCHAR(255) REFERENCES tenants(tenant_id),
    kb_id VARCHAR(255) NOT NULL,
    name VARCHAR(255) NOT NULL,
    ...
);

-- User access control
CREATE TABLE user_tenant_memberships (
    user_id VARCHAR(255) NOT NULL,
    tenant_id VARCHAR(255) REFERENCES tenants(tenant_id),
    role VARCHAR(50) NOT NULL,  -- owner, admin, editor, viewer
    ...
);

Generated Columns for Integration:

-- LIGHTRAG_* tables have generated columns to extract tenant/kb
ALTER TABLE LIGHTRAG_DOC_FULL ADD COLUMN
    tenant_id VARCHAR(255) GENERATED ALWAYS AS (
        CASE WHEN workspace LIKE '%:%'
             THEN SPLIT_PART(workspace, ':', 1)
             ELSE workspace END
    ) STORED,
    kb_id VARCHAR(255) GENERATED ALWAYS AS (
        CASE WHEN workspace LIKE '%:%'
             THEN SPLIT_PART(workspace, ':', 2)
             ELSE 'default' END
    ) STORED;

This allows querying data by tenant/KB without modifying the core storage implementation.

2.5 Roles and Permissions

Role	Permissions
Owner	Full control, manage members, delete tenant
Admin	Create/delete KBs, manage documents
Editor	Create/update/delete documents, run queries
Viewer	Read documents, run queries

3. Architecture Comparison

3.1 Feature Matrix

Aspect	Workspace (Upstream)	Tenant (Local)
Data Isolation	✅ Storage-level	✅ Uses workspace
User Management	❌	✅ Full RBAC
Authentication	❌	✅ JWT tokens
Authorization	❌	✅ Role-based
CRUD API	❌	✅ REST endpoints
Multiple KBs	❌ One per workspace	✅ Many per tenant
Configuration	❌ Global only	✅ Per-tenant
Quotas/Limits	❌	✅ Per-tenant
Metadata	❌	✅ Rich metadata
UI Support	❌	✅ Selection UI
File Storage	✅ Subdirectories	✅ Uses subdirs
Backward Compatible	✅	✅ Single-tenant mode

3.2 Layered Architecture

┌─────────────────────────────────────────────────────────────┐
│                    WebUI / REST API                         │
│   - Tenant/KB selection                                     │
│   - Document upload, query interface                        │
├─────────────────────────────────────────────────────────────┤
│                 Authentication Layer                         │
│   - JWT token validation                                    │
│   - User session management                                 │
├─────────────────────────────────────────────────────────────┤
│                 Authorization Layer                          │
│   - TenantService.verify_user_access()                     │
│   - Role-based permission checks                           │
├─────────────────────────────────────────────────────────────┤
│              TenantRAGManager (Instance Cache)              │
│   - Manages per-tenant/KB LightRAG instances               │
│   - LRU eviction for memory management                     │
│   - Creates composite_workspace = "{tenant}:{kb}"          │
├─────────────────────────────────────────────────────────────┤
│                     LightRAG Core                           │
│   - Uses workspace for storage isolation                   │
│   - KV, Vector, Graph, DocStatus storages                  │
├─────────────────────────────────────────────────────────────┤
│                PostgreSQL / Storage Backend                  │
│   - PRIMARY KEY (workspace, id) for isolation              │
│   - Generated columns extract tenant_id, kb_id             │
└─────────────────────────────────────────────────────────────┘

4. Findings

4.1 No Redundancy Found ✅

The Tenant feature is complementary, not redundant:

Workspace = Storage mechanism (HOW data is isolated)
Tenant = Application layer (WHO can access WHAT data)

They work together:

User Request → Tenant Auth → TenantRAGManager → workspace="{tenant}:{kb}" → Storage

4.2 Design Quality Assessment

Criterion	Score	Notes
Separation of Concerns	⭐⭐⭐⭐⭐	Clean layered architecture
Code Reuse	⭐⭐⭐⭐⭐	Tenant uses workspace, doesn't duplicate
Security	⭐⭐⭐⭐	Validation, RBAC, path traversal prevention
Backward Compatibility	⭐⭐⭐⭐⭐	Single-tenant mode still works
Database Design	⭐⭐⭐⭐	Generated columns enable efficient queries

4.3 Positive Design Decisions

Composite Workspace Format: Using {tenant_id}:{kb_id} as workspace allows multiple KBs per tenant while reusing storage isolation
Generated Columns: PostgreSQL generated columns (tenant_id, kb_id) enable efficient queries without schema changes to core tables
Instance Caching: TenantRAGManager caches RAG instances with LRU eviction for performance
Security Validation: validate_identifier() and validate_working_directory() prevent injection and path traversal
Environment Toggle: LIGHTRAG_MULTI_TENANT allows switching between single-tenant and multi-tenant modes

5. Recommendations

5.1 Improvements Needed

Priority	Issue	Recommendation
High	Cascade Delete	Add cleanup of LIGHTRAG_* tables when tenant is deleted
Medium	Documentation	Document workspace naming convention clearly
Medium	Orphan Prevention	Add DB triggers to validate tenant/kb exists on insert
Low	Naming Clarity	Consider renaming `workspace` to `isolation_key` in docs

5.2 Implementation: Cascade Delete

Add this to TenantService.delete_tenant():

async def delete_tenant(self, tenant_id: str) -> bool:
    # Existing: delete KBs
    kbs_result = await self.list_knowledge_bases(tenant_id)
    for kb in kbs_result.get("items", []):
        await self.delete_knowledge_base(tenant_id, kb.kb_id)

    # NEW: Clean up LIGHTRAG_* tables
    if hasattr(self.kv_storage, 'db') and self.kv_storage.db:
        await self.kv_storage.db.execute(
            "DELETE FROM LIGHTRAG_DOC_FULL WHERE workspace LIKE $1",
            [f"{tenant_id}:%"]
        )
        # Repeat for other LIGHTRAG_* tables...

    # Existing: delete tenant metadata
    await self.kv_storage.delete([f"{self.tenant_namespace}:{tenant_id}"])
    return True

5.3 Documentation Update

Add this to README or multi-tenancy docs:

## Workspace vs Multi-Tenant

LightRAG supports two isolation modes:

### Single-Tenant Mode (Default)
- Set `WORKSPACE=myworkspace` environment variable
- All data stored under one workspace
- No authentication required

### Multi-Tenant Mode
- Set `LIGHTRAG_MULTI_TENANT=true`
- Workspace format: `{tenant_id}:{kb_id}`
- Full authentication and RBAC
- Multiple knowledge bases per tenant

6. Conclusion

The Multi-Tenant implementation is well-designed and NOT redundant with the Workspace feature.

The architecture correctly layers:

Workspace (upstream) for storage-level isolation
Tenant (local) for application-level multi-tenancy

This follows best practices for extending open-source projects:

Minimal changes to core code
Clear abstraction layers
Backward compatibility maintained

Recommendation: Approve the current implementation with minor improvements for cascade delete and documentation clarity.

Appendix A: File Reference

File	Purpose
`lightrag/lightrag.py`	Core LightRAG class with workspace parameter
`lightrag/kg/postgres_impl.py`	PostgreSQL storage with workspace in PK
`lightrag/models/tenant.py`	Tenant, KnowledgeBase, TenantContext models
`lightrag/services/tenant_service.py`	Tenant/KB CRUD, access verification
`lightrag/tenant_rag_manager.py`	RAG instance management per tenant/KB
`lightrag/api/routers/tenant_routes.py`	REST API for tenant management
`lightrag/security.py`	Identifier validation, security utilities
`starter/init-postgres.sql`	Database schema with generated columns

14 KiB

Raw Blame History

Multi-Tenant vs Workspace Architecture Audit Report

Executive Summary

1. Workspace Feature (Upstream LightRAG)

1.1 Purpose

1.2 Implementation

1.3 Environment Variables

1.4 Limitations

2. Multi-Tenant Feature (Local Implementation)

2.1 Purpose

2.2 Key Components

2.3 How Tenant Uses Workspace

2.4 Database Schema

2.5 Roles and Permissions

3. Architecture Comparison

3.1 Feature Matrix

3.2 Layered Architecture

4. Findings

4.1 No Redundancy Found ✅

4.2 Design Quality Assessment

4.3 Positive Design Decisions

5. Recommendations

5.1 Improvements Needed

5.2 Implementation: Cascade Delete

5.3 Documentation Update

6. Conclusion

Appendix A: File Reference

Appendix B: Environment Variables

Workspace Variables (Upstream)

Tenant Variables (Local)

14 KiB Raw Blame History

Multi-Tenant vs Workspace Architecture Audit Report

Executive Summary

1. Workspace Feature (Upstream LightRAG)

1.1 Purpose

1.2 Implementation

1.3 Environment Variables

1.4 Limitations

2. Multi-Tenant Feature (Local Implementation)

2.1 Purpose

2.2 Key Components

2.3 How Tenant Uses Workspace

2.4 Database Schema

2.5 Roles and Permissions

3. Architecture Comparison

3.1 Feature Matrix

3.2 Layered Architecture

4. Findings

4.1 No Redundancy Found ✅

4.2 Design Quality Assessment

4.3 Positive Design Decisions

5. Recommendations

5.1 Improvements Needed

5.2 Implementation: Cascade Delete

5.3 Documentation Update

6. Conclusion

Appendix A: File Reference

Appendix B: Environment Variables

Workspace Variables (Upstream)

Tenant Variables (Local)

14 KiB

Raw Blame History