# Multi-Tenant vs Workspace Architecture Audit Report

**Date:** 2024-12-05
**Status:** ✅ PASSED - No Redundancy Found
**Author:** AI Audit Agent

## Executive Summary

This audit evaluates whether the **Multi-Tenant feature** (local HKU implementation) is redundant with the **Workspace feature** (upstream HKUDS/LightRAG).

**Verdict: NOT REDUNDANT** - The features serve different purposes in a well-designed layered architecture:

| Feature | Layer | Purpose |
|---------|-------|---------|
| **Workspace** (upstream) | Storage Layer | Low-level data isolation mechanism in database tables |
| **Tenant** (local) | Application Layer | High-level multi-tenant SaaS with user management, RBAC, and APIs |

The Tenant feature **extends and uses** the Workspace feature - it's a proper abstraction layer, not duplication.

---

## 1. Workspace Feature (Upstream LightRAG)

### 1.1 Purpose
The `workspace` parameter in LightRAG provides **storage-level data isolation** between different LightRAG instances.

### 1.2 Implementation

**Core Parameter:**
```python
# From lightrag/lightrag.py
import os
from dataclasses import dataclass, field


@dataclass
class LightRAG:
    workspace: str = field(default_factory=lambda: os.getenv("WORKSPACE", ""))
    """Workspace for data isolation. Defaults to empty string if WORKSPACE environment variable is not set."""
```

**Storage Isolation:**
All storage classes receive the `workspace` parameter and use it in their primary keys:

```python
# From lightrag/lightrag.py - storage initialization
self.llm_response_cache = self.key_string_value_json_storage_cls(
    namespace=NameSpace.KV_STORE_LLM_RESPONSE_CACHE,
    workspace=self.workspace,  # Passed to all storages
    ...
)
```

**Database Schema (PostgreSQL):**
```sql
-- Every LIGHTRAG_* table has workspace in PRIMARY KEY
CREATE TABLE LIGHTRAG_DOC_FULL (
    id VARCHAR(255),
    workspace VARCHAR(255),
    ...
    CONSTRAINT LIGHTRAG_DOC_FULL_PK PRIMARY KEY (workspace, id)
);
```
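
To illustrate why a `(workspace, id)` primary key isolates data, here is a minimal sketch that uses Python's built-in `sqlite3` as a stand-in for PostgreSQL. The table name `doc_full`, the `content` column, and the `fetch_docs` helper are illustrative assumptions, not part of the upstream schema or API.

```python
import sqlite3

# In-memory SQLite stands in for PostgreSQL; the composite key behaves the same way.
conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE doc_full (
        id TEXT NOT NULL,
        workspace TEXT NOT NULL,
        content TEXT,
        PRIMARY KEY (workspace, id)
    )
    """
)

# The same document id can exist once per workspace without a key conflict.
conn.execute("INSERT INTO doc_full VALUES (?, ?, ?)", ("doc-1", "team_a", "alpha"))
conn.execute("INSERT INTO doc_full VALUES (?, ?, ?)", ("doc-1", "team_b", "beta"))


def fetch_docs(workspace: str) -> list[tuple[str, str]]:
    """Return (id, content) rows visible to a single workspace."""
    rows = conn.execute(
        "SELECT id, content FROM doc_full WHERE workspace = ? ORDER BY id",
        (workspace,),
    )
    return rows.fetchall()
```

Every read filters on `workspace`, so two instances sharing one database never see each other's rows as long as each query carries its own workspace value.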

### 1.3 Environment Variables

| Variable | Storage Type | Description |
|----------|-------------|-------------|
| `WORKSPACE` | Generic | Default workspace for all storages |
| `POSTGRES_WORKSPACE` | PostgreSQL | PostgreSQL-specific workspace |
| `REDIS_WORKSPACE` | Redis | Redis-specific workspace |
| `MONGODB_WORKSPACE` | MongoDB | MongoDB-specific workspace |
| `MILVUS_WORKSPACE` | Milvus | Milvus-specific workspace |
| `QDRANT_WORKSPACE` | Qdrant | Qdrant-specific workspace |
| `NEO4J_WORKSPACE` | Neo4j | Neo4j-specific workspace |

### 1.4 Limitations

The workspace feature provides **only storage isolation**:
- ❌ No user management
- ❌ No authentication/authorization
- ❌ No CRUD API for workspace management
- ❌ No metadata or descriptions
- ❌ No UI support
- ❌ No concept of multiple knowledge bases per workspace

---

## 2. Multi-Tenant Feature (Local Implementation)

### 2.1 Purpose
The Multi-Tenant feature provides a **complete SaaS multi-tenancy layer** on top of LightRAG, including:
- Organization (tenant) management
- Multiple knowledge bases per tenant
- Role-based access control (RBAC)
- User-tenant membership
- REST API for management
- WebUI for tenant/KB selection

### 2.2 Key Components

| Component | File | Purpose |
|-----------|------|---------|
| **Tenant Model** | `lightrag/models/tenant.py` | Data models for Tenant, KnowledgeBase, TenantContext |
| **TenantService** | `lightrag/services/tenant_service.py` | CRUD operations, access verification |
| **TenantRAGManager** | `lightrag/tenant_rag_manager.py` | Manages RAG instances per tenant/KB |
| **Tenant Routes** | `lightrag/api/routers/tenant_routes.py` | REST API endpoints |
| **Security** | `lightrag/security.py` | Validation, path traversal prevention |

### 2.3 How Tenant Uses Workspace

**Critical Integration Point:**

```python
# From lightrag/tenant_rag_manager.py
async def get_rag_instance(self, tenant_id: str, kb_id: str, user_id: str):
    # SECURITY: Validate identifiers
    tenant_id = validate_identifier(tenant_id, "tenant_id")
    kb_id = validate_identifier(kb_id, "kb_id")

    # Create composite workspace
    tenant_working_dir, composite_workspace = validate_working_directory(
        self.base_working_dir, tenant_id, kb_id
    )
    # composite_workspace = f"{tenant_id}:{kb_id}"

    # Create RAG instance with composite workspace
    instance = LightRAG(
        working_dir=tenant_working_dir,
        workspace=composite_workspace,  # Uses workspace under the hood!
        ...
    )
```

**The Tenant feature DELEGATES to Workspace for actual data isolation.**
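
The validation and composite-key steps in the excerpt above can be sketched as follows. This is not the code in `lightrag/security.py`: the allowed character set, the length limit, and the helper names `check_identifier` and `build_composite_workspace` are assumptions chosen for illustration.

```python
import re

# ASSUMPTION: identifiers are limited to letters, digits, "_" and "-".
_IDENTIFIER_RE = re.compile(r"[A-Za-z0-9_-]{1,64}")


def check_identifier(value: str, field_name: str) -> str:
    """Reject anything that could smuggle a ':' separator or a path segment."""
    if not _IDENTIFIER_RE.fullmatch(value):
        raise ValueError(f"invalid {field_name}: {value!r}")
    return value


def build_composite_workspace(tenant_id: str, kb_id: str) -> str:
    """Join validated ids into the '{tenant_id}:{kb_id}' workspace string."""
    tenant_id = check_identifier(tenant_id, "tenant_id")
    kb_id = check_identifier(kb_id, "kb_id")
    return f"{tenant_id}:{kb_id}"
```

Rejecting `:` in either identifier matters because the separator is later used to split the workspace back into tenant and KB parts.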

### 2.4 Database Schema

**Management Tables (Tenant Layer):**
```sql
-- Tenant metadata
CREATE TABLE tenants (
    tenant_id VARCHAR(255) UNIQUE NOT NULL,
    name VARCHAR(255) NOT NULL,
    description TEXT,
    metadata JSONB,
    ...
);

-- Knowledge bases within tenants
CREATE TABLE knowledge_bases (
    tenant_id VARCHAR(255) REFERENCES tenants(tenant_id),
    kb_id VARCHAR(255) NOT NULL,
    name VARCHAR(255) NOT NULL,
    ...
);

-- User access control
CREATE TABLE user_tenant_memberships (
    user_id VARCHAR(255) NOT NULL,
    tenant_id VARCHAR(255) REFERENCES tenants(tenant_id),
    role VARCHAR(50) NOT NULL,  -- owner, admin, editor, viewer
    ...
);
```

**Generated Columns for Integration:**
```sql
-- LIGHTRAG_* tables have generated columns to extract tenant/kb
ALTER TABLE LIGHTRAG_DOC_FULL
    ADD COLUMN tenant_id VARCHAR(255) GENERATED ALWAYS AS (
        CASE WHEN workspace LIKE '%:%'
             THEN SPLIT_PART(workspace, ':', 1)
             ELSE workspace END
    ) STORED,
    -- Each added column needs its own ADD COLUMN clause
    ADD COLUMN kb_id VARCHAR(255) GENERATED ALWAYS AS (
        CASE WHEN workspace LIKE '%:%'
             THEN SPLIT_PART(workspace, ':', 2)
             ELSE 'default' END
    ) STORED;
```

This allows querying data by tenant/KB without modifying the core storage implementation.
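
For application code that needs the same mapping outside the database, the `CASE` expressions above translate to a few lines of Python. The function name `split_workspace` is hypothetical; the behaviour mirrors `SPLIT_PART`, which returns only the requested field even when the string contains more separators.

```python
def split_workspace(workspace: str) -> tuple[str, str]:
    """Mirror the generated-column CASE expressions for tenant_id and kb_id."""
    if ":" in workspace:
        parts = workspace.split(":")
        # Equivalent to SPLIT_PART(workspace, ':', 1) and SPLIT_PART(workspace, ':', 2)
        return parts[0], parts[1]
    # No separator: the whole workspace is the tenant and the KB falls back to 'default'.
    return workspace, "default"
```

A plain single-tenant workspace such as `legacy` therefore appears as tenant `legacy` with KB `default`.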

### 2.5 Roles and Permissions

| Role | Permissions |
|------|-------------|
| **Owner** | Full control, manage members, delete tenant |
| **Admin** | Create/delete KBs, manage documents |
| **Editor** | Create/update/delete documents, run queries |
| **Viewer** | Read documents, run queries |
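
A role check along these lines could back the table above. The sketch assumes the roles form a strict hierarchy (each role inherits everything below it), which the table suggests but does not state, and the action names are hypothetical labels rather than identifiers from `TenantService`.

```python
from enum import IntEnum


class Role(IntEnum):
    """Roles ordered so that a higher value implies every lower role's rights."""

    VIEWER = 1
    EDITOR = 2
    ADMIN = 3
    OWNER = 4


# Minimum role per action; the action names are hypothetical labels for the table rows.
REQUIRED_ROLE = {
    "read_document": Role.VIEWER,
    "run_query": Role.VIEWER,
    "write_document": Role.EDITOR,
    "manage_kb": Role.ADMIN,
    "manage_members": Role.OWNER,
    "delete_tenant": Role.OWNER,
}


def is_allowed(role: str, action: str) -> bool:
    """Return True if `role` (e.g. "editor") may perform `action`."""
    try:
        return Role[role.upper()] >= REQUIRED_ROLE[action]
    except KeyError:
        # Unknown roles or actions are denied by default.
        return False
```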

---

## 3. Architecture Comparison

### 3.1 Feature Matrix

| Aspect | Workspace (Upstream) | Tenant (Local) |
|--------|---------------------|----------------|
| Data Isolation | ✅ Storage-level | ✅ Uses workspace |
| User Management | ❌ | ✅ Full RBAC |
| Authentication | ❌ | ✅ JWT tokens |
| Authorization | ❌ | ✅ Role-based |
| CRUD API | ❌ | ✅ REST endpoints |
| Multiple KBs | ❌ One per workspace | ✅ Many per tenant |
| Configuration | ❌ Global only | ✅ Per-tenant |
| Quotas/Limits | ❌ | ✅ Per-tenant |
| Metadata | ❌ | ✅ Rich metadata |
| UI Support | ❌ | ✅ Selection UI |
| File Storage | ✅ Subdirectories | ✅ Uses subdirs |
| Backward Compatible | ✅ | ✅ Single-tenant mode |

### 3.2 Layered Architecture

```
┌─────────────────────────────────────────────────────────────┐
│                      WebUI / REST API                       │
│  - Tenant/KB selection                                      │
│  - Document upload, query interface                         │
├─────────────────────────────────────────────────────────────┤
│                    Authentication Layer                     │
│  - JWT token validation                                     │
│  - User session management                                  │
├─────────────────────────────────────────────────────────────┤
│                     Authorization Layer                     │
│  - TenantService.verify_user_access()                       │
│  - Role-based permission checks                             │
├─────────────────────────────────────────────────────────────┤
│              TenantRAGManager (Instance Cache)              │
│  - Manages per-tenant/KB LightRAG instances                 │
│  - LRU eviction for memory management                       │
│  - Creates composite_workspace = "{tenant}:{kb}"            │
├─────────────────────────────────────────────────────────────┤
│                        LightRAG Core                        │
│  - Uses workspace for storage isolation                     │
│  - KV, Vector, Graph, DocStatus storages                    │
├─────────────────────────────────────────────────────────────┤
│                PostgreSQL / Storage Backend                 │
│  - PRIMARY KEY (workspace, id) for isolation                │
│  - Generated columns extract tenant_id, kb_id               │
└─────────────────────────────────────────────────────────────┘
```
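
The instance-cache layer in the diagram can be sketched with an `OrderedDict`. This is a simplified, synchronous illustration and not the actual `TenantRAGManager`: the class name `InstanceCache`, the `factory` callback, and `max_size` are assumptions, and a real manager would also need to be async-safe and to finalize the storages of an evicted instance.

```python
from collections import OrderedDict
from typing import Callable, Generic, TypeVar

T = TypeVar("T")


class InstanceCache(Generic[T]):
    """Keep at most `max_size` instances, evicting the least recently used."""

    def __init__(self, factory: Callable[[str], T], max_size: int = 2) -> None:
        self._factory = factory
        self._max_size = max_size
        self._items: "OrderedDict[str, T]" = OrderedDict()

    def get(self, tenant_id: str, kb_id: str) -> T:
        key = f"{tenant_id}:{kb_id}"  # same composite key as the workspace
        if key in self._items:
            self._items.move_to_end(key)  # mark as most recently used
            return self._items[key]
        instance = self._factory(key)
        self._items[key] = instance
        if len(self._items) > self._max_size:
            self._items.popitem(last=False)  # drop the least recently used
        return instance

    def keys(self) -> list[str]:
        """Cached keys, ordered from least to most recently used."""
        return list(self._items)
```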

---

## 4. Findings

### 4.1 No Redundancy Found ✅

The Tenant feature is **complementary**, not redundant:

1. **Workspace** = Storage mechanism (HOW data is isolated)
2. **Tenant** = Application layer (WHO can access WHAT data)

They work together:
```
User Request → Tenant Auth → TenantRAGManager → workspace="{tenant}:{kb}" → Storage
```

### 4.2 Design Quality Assessment

| Criterion | Score | Notes |
|-----------|-------|-------|
| Separation of Concerns | ⭐⭐⭐⭐⭐ | Clean layered architecture |
| Code Reuse | ⭐⭐⭐⭐⭐ | Tenant uses workspace, doesn't duplicate |
| Security | ⭐⭐⭐⭐ | Validation, RBAC, path traversal prevention |
| Backward Compatibility | ⭐⭐⭐⭐⭐ | Single-tenant mode still works |
| Database Design | ⭐⭐⭐⭐ | Generated columns enable efficient queries |

### 4.3 Positive Design Decisions

1. **Composite Workspace Format:** Using `{tenant_id}:{kb_id}` as workspace allows multiple KBs per tenant while reusing storage isolation

2. **Generated Columns:** PostgreSQL generated columns (`tenant_id`, `kb_id`) enable efficient queries without schema changes to core tables

3. **Instance Caching:** TenantRAGManager caches RAG instances with LRU eviction for performance

4. **Security Validation:** `validate_identifier()` and `validate_working_directory()` prevent injection and path traversal

5. **Environment Toggle:** `LIGHTRAG_MULTI_TENANT` allows switching between single-tenant and multi-tenant modes

---

## 5. Recommendations

### 5.1 Improvements Needed

| Priority | Issue | Recommendation |
|----------|-------|----------------|
| **High** | Cascade Delete | Add cleanup of LIGHTRAG_* tables when tenant is deleted |
| **Medium** | Documentation | Document workspace naming convention clearly |
| **Medium** | Orphan Prevention | Add DB triggers to validate tenant/kb exists on insert |
| **Low** | Naming Clarity | Consider renaming `workspace` to `isolation_key` in docs |

### 5.2 Implementation: Cascade Delete

Add this to `TenantService.delete_tenant()`:

```python
async def delete_tenant(self, tenant_id: str) -> bool:
    # Existing: delete KBs
    kbs_result = await self.list_knowledge_bases(tenant_id)
    for kb in kbs_result.get("items", []):
        await self.delete_knowledge_base(tenant_id, kb.kb_id)

    # NEW: Clean up LIGHTRAG_* tables
    if hasattr(self.kv_storage, 'db') and self.kv_storage.db:
        # NOTE: "_" and "%" are LIKE wildcards; escape them in tenant_id
        # if validate_identifier() allows those characters.
        await self.kv_storage.db.execute(
            "DELETE FROM LIGHTRAG_DOC_FULL WHERE workspace LIKE $1",
            [f"{tenant_id}:%"]
        )
        # Repeat for other LIGHTRAG_* tables...

    # Existing: delete tenant metadata
    await self.kv_storage.delete([f"{self.tenant_namespace}:{tenant_id}"])
    return True
```
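
One caveat with the `LIKE` prefix match is that `_` and `%` are wildcards, so a tenant id such as `team_a` would also match workspaces like `teamXa:kb1`. A small escaping helper avoids this; `like_prefix_for_tenant` is a hypothetical name, and the sketch assumes backslash as the escape character, which is PostgreSQL's default for `LIKE`.

```python
def like_prefix_for_tenant(tenant_id: str, escape: str = "\\") -> str:
    """Build a LIKE pattern matching only workspaces of the form '<tenant_id>:...'."""
    escaped = (
        tenant_id.replace(escape, escape + escape)  # escape the escape char first
        .replace("%", escape + "%")
        .replace("_", escape + "_")
    )
    return f"{escaped}:%"
```

The resulting string would be passed as the `$1` parameter in place of `f"{tenant_id}:%"`.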

### 5.3 Documentation Update

Add this to README or multi-tenancy docs:

```markdown
## Workspace vs Multi-Tenant

LightRAG supports two isolation modes:

### Single-Tenant Mode (Default)
- Set `WORKSPACE=myworkspace` environment variable
- All data stored under one workspace
- No authentication required

### Multi-Tenant Mode
- Set `LIGHTRAG_MULTI_TENANT=true`
- Workspace format: `{tenant_id}:{kb_id}`
- Full authentication and RBAC
- Multiple knowledge bases per tenant
```

---

## 6. Conclusion

**The Multi-Tenant implementation is well-designed and NOT redundant with the Workspace feature.**

The architecture correctly layers:
1. **Workspace (upstream)** for storage-level isolation
2. **Tenant (local)** for application-level multi-tenancy

This follows best practices for extending open-source projects:
- Minimal changes to core code
- Clear abstraction layers
- Backward compatibility maintained

**Recommendation:** Approve the current implementation with minor improvements for cascade delete and documentation clarity.

---

## Appendix A: File Reference

| File | Purpose |
|------|---------|
| `lightrag/lightrag.py` | Core LightRAG class with workspace parameter |
| `lightrag/kg/postgres_impl.py` | PostgreSQL storage with workspace in PK |
| `lightrag/models/tenant.py` | Tenant, KnowledgeBase, TenantContext models |
| `lightrag/services/tenant_service.py` | Tenant/KB CRUD, access verification |
| `lightrag/tenant_rag_manager.py` | RAG instance management per tenant/KB |
| `lightrag/api/routers/tenant_routes.py` | REST API for tenant management |
| `lightrag/security.py` | Identifier validation, security utilities |
| `starter/init-postgres.sql` | Database schema with generated columns |

## Appendix B: Environment Variables

### Workspace Variables (Upstream)
- `WORKSPACE` - Default workspace name
- `POSTGRES_WORKSPACE` - PostgreSQL-specific workspace
- `REDIS_WORKSPACE` - Redis-specific workspace
- `MONGODB_WORKSPACE` - MongoDB-specific workspace
- `MILVUS_WORKSPACE` - Milvus-specific workspace
- `QDRANT_WORKSPACE` - Qdrant-specific workspace
- `NEO4J_WORKSPACE` - Neo4j-specific workspace

### Tenant Variables (Local)
- `LIGHTRAG_MULTI_TENANT` - Enable multi-tenant mode (true/false)
- `LIGHTRAG_SUPER_ADMIN_USERS` - Comma-separated super admin usernames
- `REQUIRE_USER_AUTH` - Require authentication (true/false)
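
As a rough illustration of how the tenant variables might be read at startup, the sketch below parses the two boolean flags and the comma-separated admin list. The function `read_tenant_settings`, the returned dictionary keys, and the set of accepted truthy spellings are assumptions, not the project's actual configuration code.

```python
import os
from typing import Mapping, Optional

# ASSUMPTION: these spellings count as "true"; anything else is treated as false.
_TRUE_VALUES = {"1", "true", "yes", "on"}


def read_tenant_settings(env: Optional[Mapping[str, str]] = None) -> dict:
    """Parse the tenant-related environment variables into plain Python values."""
    env = os.environ if env is None else env

    def flag(name: str) -> bool:
        return env.get(name, "false").strip().lower() in _TRUE_VALUES

    # Split the comma-separated list and drop empty entries and stray whitespace.
    admins = [
        name.strip()
        for name in env.get("LIGHTRAG_SUPER_ADMIN_USERS", "").split(",")
        if name.strip()
    ]
    return {
        "multi_tenant": flag("LIGHTRAG_MULTI_TENANT"),
        "require_user_auth": flag("REQUIRE_USER_AUTH"),
        "super_admins": admins,
    }
```

With no variables set, the sketch falls back to single-tenant mode with no super admins, matching the default mode described in section 5.3.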