14 KiB
Multi-Tenant vs Workspace Architecture Audit Report
Date: 2024-12-05 Status: ✅ PASSED - No Redundancy Found Author: AI Audit Agent
Executive Summary
This audit evaluates whether the Multi-Tenant feature (local HKU implementation) is redundant with the Workspace feature (upstream HKUDS/LightRAG).
Verdict: NOT REDUNDANT - The features serve different purposes in a well-designed layered architecture:
| Feature | Layer | Purpose |
|---|---|---|
| Workspace (upstream) | Storage Layer | Low-level data isolation mechanism in database tables |
| Tenant (local) | Application Layer | High-level multi-tenant SaaS with user management, RBAC, and APIs |
The Tenant feature extends and uses the Workspace feature - it's a proper abstraction layer, not duplication.
1. Workspace Feature (Upstream LightRAG)
1.1 Purpose
The workspace parameter in LightRAG provides storage-level data isolation between different LightRAG instances.
1.2 Implementation
Core Parameter:
# From lightrag/lightrag.py
@dataclass
class LightRAG:
workspace: str = field(default_factory=lambda: os.getenv("WORKSPACE", ""))
"""Workspace for data isolation. Defaults to empty string if WORKSPACE environment variable is not set."""
Storage Isolation:
All storage classes receive the workspace parameter and use it in their primary keys:
# From lightrag/lightrag.py - storage initialization
self.llm_response_cache = self.key_string_value_json_storage_cls(
namespace=NameSpace.KV_STORE_LLM_RESPONSE_CACHE,
workspace=self.workspace, # Passed to all storages
...
)
Database Schema (PostgreSQL):
-- Every LIGHTRAG_* table has workspace in PRIMARY KEY
CREATE TABLE LIGHTRAG_DOC_FULL (
id VARCHAR(255),
workspace VARCHAR(255),
...
CONSTRAINT LIGHTRAG_DOC_FULL_PK PRIMARY KEY (workspace, id)
);
1.3 Environment Variables
| Variable | Storage Type | Description |
|---|---|---|
WORKSPACE |
Generic | Default workspace for all storages |
POSTGRES_WORKSPACE |
PostgreSQL | PostgreSQL-specific workspace |
REDIS_WORKSPACE |
Redis | Redis-specific workspace |
MONGODB_WORKSPACE |
MongoDB | MongoDB-specific workspace |
MILVUS_WORKSPACE |
Milvus | Milvus-specific workspace |
QDRANT_WORKSPACE |
Qdrant | Qdrant-specific workspace |
NEO4J_WORKSPACE |
Neo4j | Neo4j-specific workspace |
1.4 Limitations
The workspace feature provides only storage isolation:
- ❌ No user management
- ❌ No authentication/authorization
- ❌ No CRUD API for workspace management
- ❌ No metadata or descriptions
- ❌ No UI support
- ❌ No concept of multiple knowledge bases per workspace
2. Multi-Tenant Feature (Local Implementation)
2.1 Purpose
The Multi-Tenant feature provides a complete SaaS multi-tenancy layer on top of LightRAG, including:
- Organization (tenant) management
- Multiple knowledge bases per tenant
- Role-based access control (RBAC)
- User-tenant membership
- REST API for management
- WebUI for tenant/KB selection
2.2 Key Components
| Component | File | Purpose |
|---|---|---|
| Tenant Model | lightrag/models/tenant.py |
Data models for Tenant, KnowledgeBase, TenantContext |
| TenantService | lightrag/services/tenant_service.py |
CRUD operations, access verification |
| TenantRAGManager | lightrag/tenant_rag_manager.py |
Manages RAG instances per tenant/KB |
| Tenant Routes | lightrag/api/routers/tenant_routes.py |
REST API endpoints |
| Security | lightrag/security.py |
Validation, path traversal prevention |
2.3 How Tenant Uses Workspace
Critical Integration Point:
# From lightrag/tenant_rag_manager.py
async def get_rag_instance(self, tenant_id: str, kb_id: str, user_id: str):
# SECURITY: Validate identifiers
tenant_id = validate_identifier(tenant_id, "tenant_id")
kb_id = validate_identifier(kb_id, "kb_id")
# Create composite workspace
tenant_working_dir, composite_workspace = validate_working_directory(
self.base_working_dir, tenant_id, kb_id
)
# composite_workspace = f"{tenant_id}:{kb_id}"
# Create RAG instance with composite workspace
instance = LightRAG(
working_dir=tenant_working_dir,
workspace=composite_workspace, # Uses workspace under the hood!
...
)
The Tenant feature DELEGATES to Workspace for actual data isolation.
2.4 Database Schema
Management Tables (Tenant Layer):
-- Tenant metadata
CREATE TABLE tenants (
tenant_id VARCHAR(255) UNIQUE NOT NULL,
name VARCHAR(255) NOT NULL,
description TEXT,
metadata JSONB,
...
);
-- Knowledge bases within tenants
CREATE TABLE knowledge_bases (
tenant_id VARCHAR(255) REFERENCES tenants(tenant_id),
kb_id VARCHAR(255) NOT NULL,
name VARCHAR(255) NOT NULL,
...
);
-- User access control
CREATE TABLE user_tenant_memberships (
user_id VARCHAR(255) NOT NULL,
tenant_id VARCHAR(255) REFERENCES tenants(tenant_id),
role VARCHAR(50) NOT NULL, -- owner, admin, editor, viewer
...
);
Generated Columns for Integration:
-- LIGHTRAG_* tables have generated columns to extract tenant/kb
ALTER TABLE LIGHTRAG_DOC_FULL ADD COLUMN
tenant_id VARCHAR(255) GENERATED ALWAYS AS (
CASE WHEN workspace LIKE '%:%'
THEN SPLIT_PART(workspace, ':', 1)
ELSE workspace END
) STORED,
kb_id VARCHAR(255) GENERATED ALWAYS AS (
CASE WHEN workspace LIKE '%:%'
THEN SPLIT_PART(workspace, ':', 2)
ELSE 'default' END
) STORED;
This allows querying data by tenant/KB without modifying the core storage implementation.
2.5 Roles and Permissions
| Role | Permissions |
|---|---|
| Owner | Full control, manage members, delete tenant |
| Admin | Create/delete KBs, manage documents |
| Editor | Create/update/delete documents, run queries |
| Viewer | Read documents, run queries |
3. Architecture Comparison
3.1 Feature Matrix
| Aspect | Workspace (Upstream) | Tenant (Local) |
|---|---|---|
| Data Isolation | ✅ Storage-level | ✅ Uses workspace |
| User Management | ❌ | ✅ Full RBAC |
| Authentication | ❌ | ✅ JWT tokens |
| Authorization | ❌ | ✅ Role-based |
| CRUD API | ❌ | ✅ REST endpoints |
| Multiple KBs | ❌ One per workspace | ✅ Many per tenant |
| Configuration | ❌ Global only | ✅ Per-tenant |
| Quotas/Limits | ❌ | ✅ Per-tenant |
| Metadata | ❌ | ✅ Rich metadata |
| UI Support | ❌ | ✅ Selection UI |
| File Storage | ✅ Subdirectories | ✅ Uses subdirs |
| Backward Compatible | ✅ | ✅ Single-tenant mode |
3.2 Layered Architecture
┌─────────────────────────────────────────────────────────────┐
│ WebUI / REST API │
│ - Tenant/KB selection │
│ - Document upload, query interface │
├─────────────────────────────────────────────────────────────┤
│ Authentication Layer │
│ - JWT token validation │
│ - User session management │
├─────────────────────────────────────────────────────────────┤
│ Authorization Layer │
│ - TenantService.verify_user_access() │
│ - Role-based permission checks │
├─────────────────────────────────────────────────────────────┤
│ TenantRAGManager (Instance Cache) │
│ - Manages per-tenant/KB LightRAG instances │
│ - LRU eviction for memory management │
│ - Creates composite_workspace = "{tenant}:{kb}" │
├─────────────────────────────────────────────────────────────┤
│ LightRAG Core │
│ - Uses workspace for storage isolation │
│ - KV, Vector, Graph, DocStatus storages │
├─────────────────────────────────────────────────────────────┤
│ PostgreSQL / Storage Backend │
│ - PRIMARY KEY (workspace, id) for isolation │
│ - Generated columns extract tenant_id, kb_id │
└─────────────────────────────────────────────────────────────┘
4. Findings
4.1 No Redundancy Found ✅
The Tenant feature is complementary, not redundant:
- Workspace = Storage mechanism (HOW data is isolated)
- Tenant = Application layer (WHO can access WHAT data)
They work together:
User Request → Tenant Auth → TenantRAGManager → workspace="{tenant}:{kb}" → Storage
4.2 Design Quality Assessment
| Criterion | Score | Notes |
|---|---|---|
| Separation of Concerns | ⭐⭐⭐⭐⭐ | Clean layered architecture |
| Code Reuse | ⭐⭐⭐⭐⭐ | Tenant uses workspace, doesn't duplicate |
| Security | ⭐⭐⭐⭐ | Validation, RBAC, path traversal prevention |
| Backward Compatibility | ⭐⭐⭐⭐⭐ | Single-tenant mode still works |
| Database Design | ⭐⭐⭐⭐ | Generated columns enable efficient queries |
4.3 Positive Design Decisions
-
Composite Workspace Format: Using
{tenant_id}:{kb_id}as workspace allows multiple KBs per tenant while reusing storage isolation -
Generated Columns: PostgreSQL generated columns (
tenant_id,kb_id) enable efficient queries without schema changes to core tables -
Instance Caching: TenantRAGManager caches RAG instances with LRU eviction for performance
-
Security Validation:
validate_identifier()andvalidate_working_directory()prevent injection and path traversal -
Environment Toggle:
LIGHTRAG_MULTI_TENANTallows switching between single-tenant and multi-tenant modes
5. Recommendations
5.1 Improvements Needed
| Priority | Issue | Recommendation |
|---|---|---|
| High | Cascade Delete | Add cleanup of LIGHTRAG_* tables when tenant is deleted |
| Medium | Documentation | Document workspace naming convention clearly |
| Medium | Orphan Prevention | Add DB triggers to validate tenant/kb exists on insert |
| Low | Naming Clarity | Consider renaming workspace to isolation_key in docs |
5.2 Implementation: Cascade Delete
Add this to TenantService.delete_tenant():
async def delete_tenant(self, tenant_id: str) -> bool:
# Existing: delete KBs
kbs_result = await self.list_knowledge_bases(tenant_id)
for kb in kbs_result.get("items", []):
await self.delete_knowledge_base(tenant_id, kb.kb_id)
# NEW: Clean up LIGHTRAG_* tables
if hasattr(self.kv_storage, 'db') and self.kv_storage.db:
await self.kv_storage.db.execute(
"DELETE FROM LIGHTRAG_DOC_FULL WHERE workspace LIKE $1",
[f"{tenant_id}:%"]
)
# Repeat for other LIGHTRAG_* tables...
# Existing: delete tenant metadata
await self.kv_storage.delete([f"{self.tenant_namespace}:{tenant_id}"])
return True
5.3 Documentation Update
Add this to README or multi-tenancy docs:
## Workspace vs Multi-Tenant
LightRAG supports two isolation modes:
### Single-Tenant Mode (Default)
- Set `WORKSPACE=myworkspace` environment variable
- All data stored under one workspace
- No authentication required
### Multi-Tenant Mode
- Set `LIGHTRAG_MULTI_TENANT=true`
- Workspace format: `{tenant_id}:{kb_id}`
- Full authentication and RBAC
- Multiple knowledge bases per tenant
6. Conclusion
The Multi-Tenant implementation is well-designed and NOT redundant with the Workspace feature.
The architecture correctly layers:
- Workspace (upstream) for storage-level isolation
- Tenant (local) for application-level multi-tenancy
This follows best practices for extending open-source projects:
- Minimal changes to core code
- Clear abstraction layers
- Backward compatibility maintained
Recommendation: Approve the current implementation with minor improvements for cascade delete and documentation clarity.
Appendix A: File Reference
| File | Purpose |
|---|---|
lightrag/lightrag.py |
Core LightRAG class with workspace parameter |
lightrag/kg/postgres_impl.py |
PostgreSQL storage with workspace in PK |
lightrag/models/tenant.py |
Tenant, KnowledgeBase, TenantContext models |
lightrag/services/tenant_service.py |
Tenant/KB CRUD, access verification |
lightrag/tenant_rag_manager.py |
RAG instance management per tenant/KB |
lightrag/api/routers/tenant_routes.py |
REST API for tenant management |
lightrag/security.py |
Identifier validation, security utilities |
starter/init-postgres.sql |
Database schema with generated columns |
Appendix B: Environment Variables
Workspace Variables (Upstream)
WORKSPACE- Default workspace namePOSTGRES_WORKSPACE- PostgreSQL-specific workspaceREDIS_WORKSPACE- Redis-specific workspaceMONGODB_WORKSPACE- MongoDB-specific workspace
Tenant Variables (Local)
LIGHTRAG_MULTI_TENANT- Enable multi-tenant mode (true/false)LIGHTRAG_SUPER_ADMIN_USERS- Comma-separated super admin usernamesREQUIRE_USER_AUTH- Require authentication (true/false)