ADR 002: Implementation Strategy - Multi-Tenant, Multi-Knowledge-Base Architecture
Status: Proposed
Overview
This document provides a detailed, step-by-step implementation strategy for the multi-tenant, multi-knowledge-base (MT-MKB) architecture. It includes specific code changes, file modifications, new components, and testing strategies.
Phase 1: Core Infrastructure (Weeks 1-3)
1.1 Database Schema Changes
Files to Create/Modify
- New: lightrag/models/tenant.py - Tenant and KnowledgeBase models
- New: lightrag/models/__init__.py - Model exports
- Modify: All storage implementations (PostgreSQL, Neo4j, MongoDB, etc.)
1.1.1 Tenant and KnowledgeBase Models
File: lightrag/models/tenant.py
from dataclasses import dataclass, field
from typing import Optional, Dict, Any
from datetime import datetime
from uuid import uuid4


@dataclass
class ResourceQuota:
    """Resource limits for a tenant"""
    max_documents: int = 10000
    max_storage_gb: float = 100.0
    max_concurrent_queries: int = 10
    max_monthly_api_calls: int = 100000
    max_kb_per_tenant: int = 50


@dataclass
class TenantConfig:
    """Per-tenant configuration for models and parameters"""
    llm_model: str = "gpt-4o-mini"
    embedding_model: str = "bge-m3:latest"
    rerank_model: Optional[str] = None
    chunk_size: int = 1200
    chunk_overlap: int = 100
    top_k: int = 40
    cosine_threshold: float = 0.2
    enable_llm_cache: bool = True
    custom_metadata: Dict[str, Any] = field(default_factory=dict)


@dataclass
class Tenant:
    """Tenant representation"""
    tenant_id: str = field(default_factory=lambda: str(uuid4()))
    tenant_name: str = ""
    description: Optional[str] = None
    config: TenantConfig = field(default_factory=TenantConfig)
    quota: ResourceQuota = field(default_factory=ResourceQuota)
    is_active: bool = True
    created_at: datetime = field(default_factory=datetime.utcnow)
    updated_at: datetime = field(default_factory=datetime.utcnow)
    metadata: Dict[str, Any] = field(default_factory=dict)


@dataclass
class KnowledgeBase:
    """Knowledge Base representation"""
    kb_id: str = field(default_factory=lambda: str(uuid4()))
    tenant_id: str = ""  # Foreign key to Tenant
    kb_name: str = ""
    description: Optional[str] = None
    is_active: bool = True
    doc_count: int = 0
    storage_used_mb: float = 0.0
    last_indexed_at: Optional[datetime] = None
    created_at: datetime = field(default_factory=datetime.utcnow)
    updated_at: datetime = field(default_factory=datetime.utcnow)
    metadata: Dict[str, Any] = field(default_factory=dict)


@dataclass
class TenantContext:
    """Request-scoped tenant context"""
    tenant_id: str
    kb_id: str
    user_id: str
    role: str  # admin, editor, viewer
    permissions: Dict[str, bool] = field(default_factory=dict)

    @property
    def workspace_namespace(self) -> str:
        """Backward compatible workspace namespace"""
        return f"{self.tenant_id}_{self.kb_id}"
1.1.2 PostgreSQL Schema Migration
File: lightrag/kg/migrations/001_add_tenant_schema.sql
-- Create tenants table
CREATE TABLE IF NOT EXISTS tenants (
tenant_id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
tenant_name VARCHAR(255) NOT NULL,
description TEXT,
llm_model VARCHAR(255) DEFAULT 'gpt-4o-mini',
embedding_model VARCHAR(255) DEFAULT 'bge-m3:latest',
rerank_model VARCHAR(255),
chunk_size INTEGER DEFAULT 1200,
chunk_overlap INTEGER DEFAULT 100,
top_k INTEGER DEFAULT 40,
cosine_threshold FLOAT DEFAULT 0.2,
enable_llm_cache BOOLEAN DEFAULT TRUE,
max_documents INTEGER DEFAULT 10000,
max_storage_gb FLOAT DEFAULT 100.0,
max_concurrent_queries INTEGER DEFAULT 10,
max_monthly_api_calls INTEGER DEFAULT 100000,
is_active BOOLEAN DEFAULT TRUE,
metadata JSONB DEFAULT '{}',
created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
updated_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
created_by VARCHAR(255),
updated_by VARCHAR(255)
);
-- Create knowledge_bases table
CREATE TABLE IF NOT EXISTS knowledge_bases (
kb_id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
tenant_id UUID NOT NULL REFERENCES tenants(tenant_id) ON DELETE CASCADE,
kb_name VARCHAR(255) NOT NULL,
description TEXT,
doc_count INTEGER DEFAULT 0,
storage_used_mb FLOAT DEFAULT 0.0,
is_active BOOLEAN DEFAULT TRUE,
last_indexed_at TIMESTAMP,
metadata JSONB DEFAULT '{}',
created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
updated_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
created_by VARCHAR(255),
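updated_by VARCHAR(255),
UNIQUE(tenant_id, kb_name)
);
-- PostgreSQL does not accept inline INDEX clauses in CREATE TABLE
CREATE INDEX IF NOT EXISTS idx_tenant_kb ON knowledge_bases(tenant_id, kb_id);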
-- Create api_keys table (for per-tenant API keys)
CREATE TABLE IF NOT EXISTS api_keys (
api_key_id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
tenant_id UUID NOT NULL REFERENCES tenants(tenant_id) ON DELETE CASCADE,
key_name VARCHAR(255) NOT NULL,
hashed_key VARCHAR(255) NOT NULL UNIQUE,
knowledge_base_ids UUID[] DEFAULT NULL, -- NULL = all KBs
permissions TEXT[] DEFAULT ARRAY['query', 'document:read'],
is_active BOOLEAN DEFAULT TRUE,
last_used_at TIMESTAMP,
created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
expires_at TIMESTAMP,
created_by VARCHAR(255)
);
-- Add tenant/kb columns to existing tables with defaults for backward compatibility
ALTER TABLE IF EXISTS kv_store_full_docs
ADD COLUMN IF NOT EXISTS tenant_id UUID DEFAULT NULL,
ADD COLUMN IF NOT EXISTS kb_id UUID DEFAULT NULL;
ALTER TABLE IF EXISTS kv_store_text_chunks
ADD COLUMN IF NOT EXISTS tenant_id UUID DEFAULT NULL,
ADD COLUMN IF NOT EXISTS kb_id UUID DEFAULT NULL;
ALTER TABLE IF EXISTS vector_store_entities
ADD COLUMN IF NOT EXISTS tenant_id UUID DEFAULT NULL,
ADD COLUMN IF NOT EXISTS kb_id UUID DEFAULT NULL;
-- Create indexes for tenant/kb filtering
CREATE INDEX IF NOT EXISTS idx_kv_store_tenant_kb ON kv_store_full_docs(tenant_id, kb_id);
CREATE INDEX IF NOT EXISTS idx_chunks_tenant_kb ON kv_store_text_chunks(tenant_id, kb_id);
CREATE INDEX IF NOT EXISTS idx_vectors_tenant_kb ON vector_store_entities(tenant_id, kb_id);
1.1.3 MongoDB Schema
File: lightrag/kg/migrations/mongo_001_add_tenant_collections.py
import motor.motor_asyncio  # type: ignore


async def migrate_add_tenant_collections(client: motor.motor_asyncio.AsyncIOMotorClient):
    """Add tenant and knowledge base collections to MongoDB"""
    db = client.lightrag
    # Create tenants collection with schema validation.
    # MongoDB's $jsonSchema does not support the "default" keyword, so defaults
    # (llm_model "gpt-4o-mini", embedding_model "bge-m3:latest", is_active True)
    # must be applied in application code when documents are written.
    await db.create_collection("tenants", validator={
        "$jsonSchema": {
            "bsonType": "object",
            "required": ["tenant_id", "tenant_name", "created_at"],
            "properties": {
                "tenant_id": {"bsonType": "string"},
                "tenant_name": {"bsonType": "string"},
                "description": {"bsonType": "string"},
                "llm_model": {"bsonType": "string"},
                "embedding_model": {"bsonType": "string"},
                "is_active": {"bsonType": "bool"},
                "metadata": {"bsonType": "object"},
                "created_at": {"bsonType": "date"},
                "updated_at": {"bsonType": "date"},
            }
        }
    })
    # Create knowledge_bases collection (is_active defaults to True in application code)
    await db.create_collection("knowledge_bases", validator={
        "$jsonSchema": {
            "bsonType": "object",
            "required": ["kb_id", "tenant_id", "kb_name"],
            "properties": {
                "kb_id": {"bsonType": "string"},
                "tenant_id": {"bsonType": "string"},
                "kb_name": {"bsonType": "string"},
                "description": {"bsonType": "string"},
                "is_active": {"bsonType": "bool"},
                "metadata": {"bsonType": "object"},
                "created_at": {"bsonType": "date"},
            }
        }
    })
    # Create indexes
    await db.tenants.create_index("tenant_id", unique=True)
    await db.knowledge_bases.create_index([("tenant_id", 1), ("kb_id", 1)], unique=True)
    await db.knowledge_bases.create_index([("tenant_id", 1)])
    # Add tenant_id and kb_id indexes to existing collections
    for collection_name in ["documents", "chunks", "entities"]:
        col = db[collection_name]
        await col.create_index([("tenant_id", 1), ("kb_id", 1)])
1.2 Create Tenant Management Service
File: lightrag/services/tenant_service.py
from dataclasses import asdict
from typing import Optional, List, Dict, Any
from uuid import UUID
from lightrag.models.tenant import Tenant, KnowledgeBase, TenantContext, TenantConfig
from lightrag.base import BaseKVStorage
class TenantService:
"""Service for managing tenants and knowledge bases"""
def __init__(self, kv_storage: BaseKVStorage):
self.kv_storage = kv_storage
self.tenant_namespace = "__tenants__"
self.kb_namespace = "__knowledge_bases__"
async def create_tenant(self, tenant_name: str, config: Optional[TenantConfig] = None) -> Tenant:
"""Create a new tenant"""
tenant = Tenant(tenant_name=tenant_name, config=config or TenantConfig())
await self.kv_storage.upsert({
f"{self.tenant_namespace}:{tenant.tenant_id}": {
"id": tenant.tenant_id,
"name": tenant.tenant_name,
"config": asdict(tenant.config),
"quota": asdict(tenant.quota),
"is_active": tenant.is_active,
"created_at": tenant.created_at.isoformat(),
"updated_at": tenant.updated_at.isoformat(),
}
})
return tenant
async def get_tenant(self, tenant_id: str) -> Optional[Tenant]:
"""Retrieve a tenant by ID"""
data = await self.kv_storage.get_by_id(f"{self.tenant_namespace}:{tenant_id}")
if not data:
return None
return self._deserialize_tenant(data)
async def create_knowledge_base(self, tenant_id: str, kb_name: str, description: Optional[str] = None) -> KnowledgeBase:
"""Create a new knowledge base for a tenant"""
# Verify tenant exists
tenant = await self.get_tenant(tenant_id)
if not tenant:
raise ValueError(f"Tenant {tenant_id} not found")
kb = KnowledgeBase(
tenant_id=tenant_id,
kb_name=kb_name,
description=description
)
await self.kv_storage.upsert({
f"{self.kb_namespace}:{tenant_id}:{kb.kb_id}": {
"id": kb.kb_id,
"tenant_id": kb.tenant_id,
"kb_name": kb.kb_name,
"description": kb.description,
"is_active": kb.is_active,
"created_at": kb.created_at.isoformat(),
}
})
return kb
async def list_knowledge_bases(self, tenant_id: str) -> List[KnowledgeBase]:
"""List all knowledge bases for a tenant"""
# Implementation depends on storage backend
pass
def _deserialize_tenant(self, data: Dict[str, Any]) -> Tenant:
"""Convert stored data to Tenant object"""
pass
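The `_deserialize_tenant` stub has to invert the record written by `create_tenant`. The following is a minimal sketch of that round trip, assuming the stored layout shown above (keys `id`, `name`, `config`, `quota`, ISO-8601 timestamps). The trimmed `TenantConfig` and `ResourceQuota` stand-ins and the `serialize_tenant` helper are illustrative only and not part of the proposed service.

```python
from dataclasses import asdict, dataclass, field
from datetime import datetime, timezone
from typing import Any, Dict
from uuid import uuid4


@dataclass
class TenantConfig:  # trimmed stand-in for the model in 1.1.1
    llm_model: str = "gpt-4o-mini"
    top_k: int = 40


@dataclass
class ResourceQuota:  # trimmed stand-in for the model in 1.1.1
    max_documents: int = 10000


def _now() -> datetime:
    return datetime.now(timezone.utc)


@dataclass
class Tenant:
    tenant_id: str = field(default_factory=lambda: str(uuid4()))
    tenant_name: str = ""
    config: TenantConfig = field(default_factory=TenantConfig)
    quota: ResourceQuota = field(default_factory=ResourceQuota)
    is_active: bool = True
    created_at: datetime = field(default_factory=_now)
    updated_at: datetime = field(default_factory=_now)


def serialize_tenant(tenant: Tenant) -> Dict[str, Any]:
    """Mirror the dict that create_tenant stores (hypothetical helper)."""
    return {
        "id": tenant.tenant_id,
        "name": tenant.tenant_name,
        "config": asdict(tenant.config),
        "quota": asdict(tenant.quota),
        "is_active": tenant.is_active,
        "created_at": tenant.created_at.isoformat(),
        "updated_at": tenant.updated_at.isoformat(),
    }


def deserialize_tenant(data: Dict[str, Any]) -> Tenant:
    """Rebuild a Tenant; missing optional keys fall back to model defaults."""
    return Tenant(
        tenant_id=data["id"],
        tenant_name=data["name"],
        config=TenantConfig(**data.get("config", {})),
        quota=ResourceQuota(**data.get("quota", {})),
        is_active=data.get("is_active", True),
        created_at=datetime.fromisoformat(data["created_at"]),
        updated_at=datetime.fromisoformat(data["updated_at"]),
    )
```

Keeping the two functions side by side makes it easy to assert in a unit test that `deserialize_tenant(serialize_tenant(t)) == t` whenever a field is added to the model.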
1.3 Update Storage Base Classes
File: lightrag/base.py (Modifications)
Add tenant context to all StorageNameSpace classes:
@dataclass
class StorageNameSpace(ABC):
namespace: str
workspace: str # Keep for backward compatibility
global_config: dict[str, Any]
tenant_id: Optional[str] = None # NEW
kb_id: Optional[str] = None # NEW
async def initialize(self):
"""Initialize the storage"""
pass
# Helper method to build composite workspace key
def _get_composite_workspace(self) -> str:
"""Build workspace key with tenant/kb isolation"""
if self.tenant_id and self.kb_id:
return f"{self.tenant_id}_{self.kb_id}"
elif self.workspace:
return self.workspace
else:
return "_" # Default for backward compatibility
1.4 Update Storage Implementations
PostgreSQL Storage Update
File: lightrag/kg/postgres_impl.py (Key modifications)
# Modify all queries to include tenant/kb filters
class PGKVStorage(BaseKVStorage):
    async def upsert(self, data: dict[str, dict[str, Any]]) -> None:
        # Add tenant/kb columns when upserting
        for key, value in data.items():
            if self.tenant_id and self.kb_id:
                value['tenant_id'] = self.tenant_id
                value['kb_id'] = self.kb_id
        # Original upsert logic with tenant/kb in WHERE clause
        # ... existing code ...

    async def query_with_tenant_filter(self, query: str) -> List[Any]:
        """Execute query with automatic tenant/kb filtering"""
        # Assumes `query` has no bind parameters of its own and no trailing
        # ORDER BY / LIMIT clause, because the filter is appended at the end.
        if self.tenant_id and self.kb_id:
            # Add WHERE clause filters
            if "WHERE" in query.upper():
                query += " AND tenant_id = $1 AND kb_id = $2"
            else:
                query += " WHERE tenant_id = $1 AND kb_id = $2"
            return await self._execute(query, [self.tenant_id, self.kb_id])
        return await self._execute(query)


class PGVectorStorage(BaseVectorStorage):
    async def query(self, query: str, top_k: int, query_embedding: Optional[list[float]] = None) -> list[dict[str, Any]]:
        # Distance cutoff. The original snippet left `threshold` undefined; it is
        # assumed here to come from the storage's configured cosine threshold.
        threshold = self.cosine_better_than_threshold
        # Add tenant/kb filtering.
        # Note: in pgvector `<->` is L2 distance; `<=>` is cosine distance.
        sql = """
            SELECT * FROM vector_store_entities
            WHERE tenant_id = $1 AND kb_id = $2
              AND vector <-> $3 < $4
            ORDER BY vector <-> $3
            LIMIT $5
        """
        # Filter results by tenant/kb
        results = await self._execute(sql, [self.tenant_id, self.kb_id, query_embedding, threshold, top_k])
        return results
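Appending a filter to the end of an arbitrary SQL string breaks as soon as the query already has bind parameters or a trailing ORDER BY / LIMIT. The sketch below shows one way to number the tenant placeholders after the existing ones and to place the predicate before trailing clauses. `add_tenant_filter` is a hypothetical helper, not part of LightRAG, and its regex-based splitting is an assumption that only holds for flat queries without subqueries or keywords inside string literals.

```python
import re
from typing import Any, List, Sequence, Tuple

_TRAILING = re.compile(r"\b(GROUP BY|ORDER BY|LIMIT|OFFSET)\b", re.IGNORECASE)
_WHERE = re.compile(r"\bWHERE\b", re.IGNORECASE)


def add_tenant_filter(
    query: str, params: Sequence[Any], tenant_id: str, kb_id: str
) -> Tuple[str, List[Any]]:
    """Add tenant/KB predicates, numbering placeholders after existing ones."""
    n = len(params)
    predicate = f"tenant_id = ${n + 1} AND kb_id = ${n + 2}"
    # Split off trailing clauses so the predicate stays inside the WHERE part
    match = _TRAILING.search(query)
    if match:
        head, tail = query[: match.start()], query[match.start():]
    else:
        head, tail = query, ""
    joiner = " AND " if _WHERE.search(head) else " WHERE "
    filtered = f"{head.rstrip()}{joiner}{predicate}"
    if tail:
        filtered += " " + tail
    return filtered, [*params, tenant_id, kb_id]


if __name__ == "__main__":
    sql, args = add_tenant_filter(
        "SELECT * FROM docs WHERE status = $1 ORDER BY id LIMIT 5",
        ["done"], "t1", "kb1",
    )
    print(sql, args)
```

An existing WHERE clause that contains a top-level OR would need to be parenthesised before the predicate is appended, otherwise operator precedence could let rows from other tenants through. For that reason, writing the tenant predicate explicitly in each query is the safer default, and a helper like this is best limited to simple, known query shapes.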
JSON Storage Update
File: lightrag/kg/json_kv_impl.py (Key modifications)
import os

@dataclass
class JsonKVStorage(BaseKVStorage):
    async def _get_file_path(self) -> str:
        """Get file path with tenant/kb isolation"""
        working_dir = self.global_config["working_dir"]
        # Build tenant/kb specific directory
        if self.tenant_id and self.kb_id:
            dir_path = os.path.join(working_dir, self.tenant_id, self.kb_id)
        elif self.workspace:
            dir_path = os.path.join(working_dir, self.workspace)
        else:
            dir_path = working_dir
        # The file name is the same in every case; only the directory differs
        file_name = f"kv_store_{self.namespace}.json"
        os.makedirs(dir_path, exist_ok=True)
        return os.path.join(dir_path, file_name)

    async def upsert(self, data: dict[str, dict[str, Any]]) -> None:
        """Insert with tenant/kb context"""
        # Add tenant/kb to metadata
        for key, value in data.items():
            if self.tenant_id:
                value['__tenant_id__'] = self.tenant_id
            if self.kb_id:
                value['__kb_id__'] = self.kb_id
        # Original upsert logic
        # ... existing code ...
Phase 2: API Layer (Weeks 2-3)
2.1 Create Tenant-Aware Request Models
File: lightrag/api/models/requests.py (New)
from pydantic import BaseModel, Field, validator
from typing import Optional, List
from uuid import UUID
class TenantRequest(BaseModel):
"""Base model for tenant-scoped requests"""
tenant_id: str = Field(..., description="Tenant identifier")
kb_id: str = Field(..., description="Knowledge base identifier")
class CreateTenantRequest(BaseModel):
tenant_name: str = Field(..., min_length=1, max_length=255)
description: Optional[str] = None
llm_model: Optional[str] = None
embedding_model: Optional[str] = None
class CreateKnowledgeBaseRequest(BaseModel):
kb_name: str = Field(..., min_length=1, max_length=255)
description: Optional[str] = None
class DocumentAddRequest(TenantRequest):
"""Request to add documents to a knowledge base"""
document_path: str = Field(..., description="Path to document")
metadata: Optional[dict] = None
class QueryRequest(TenantRequest):
"""Request to query a knowledge base"""
query: str = Field(..., min_length=3)
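# Anchored so that values such as "localX" are rejected (Pydantic v2 renames `regex` to `pattern`)
mode: str = Field(default="mix", regex="^(local|global|hybrid|naive|mix|bypass)$")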
top_k: Optional[int] = None
stream: Optional[bool] = None
2.2 Create Tenant-Aware Dependency Injection
File: lightrag/api/dependencies.py (New)
from fastapi import Depends, HTTPException, Path, Header
from typing import Optional
from lightrag.models.tenant import TenantContext
from lightrag.services.tenant_service import TenantService
# validate_api_key is assumed to live alongside validate_token
from lightrag.api.auth import validate_token, validate_api_key


# Defined before get_tenant_context, which references it in Depends()
async def get_tenant_service() -> TenantService:
    """Get singleton tenant service"""
    # This should be initialized at app startup
    pass


async def get_tenant_context(
    tenant_id: str = Path(..., description="Tenant ID"),
    kb_id: str = Path(..., description="Knowledge Base ID"),
    authorization: Optional[str] = Header(None),
    api_key: Optional[str] = Header(None, alias="X-API-Key"),
    tenant_service: TenantService = Depends(get_tenant_service),
) -> TenantContext:
    """
    Dependency to extract and validate tenant context from request.
    Verifies user has access to the specified tenant/KB.
    """
    # Claims stay empty for API-key requests, which then fall back to the
    # "viewer" role and no extra permissions below
    token_data: dict = {}
    # Determine authentication method
    if authorization and authorization.startswith("Bearer "):
        # JWT token authentication
        token = authorization[7:]
        try:
            token_data = await validate_token(token)
        except Exception:
            raise HTTPException(status_code=401, detail="Invalid token")
        user_id = token_data.get("sub")
        token_tenant_id = token_data.get("tenant_id")
        # Verify user's tenant matches request tenant
        if token_tenant_id != tenant_id:
            raise HTTPException(status_code=403, detail="Access denied: tenant mismatch")
        # Verify user can access this KB
        accessible_kbs = token_data.get("knowledge_base_ids", [])
        if kb_id not in accessible_kbs and "*" not in accessible_kbs:
            raise HTTPException(status_code=403, detail="Access denied: KB not accessible")
    elif api_key:
        # API key authentication
        user_id = await validate_api_key(api_key, tenant_id, kb_id)
        if not user_id:
            raise HTTPException(status_code=401, detail="Invalid API key")
    else:
        raise HTTPException(status_code=401, detail="Missing authentication")
    # Verify the tenant exists and is active (a KB existence check belongs here as well)
    tenant = await tenant_service.get_tenant(tenant_id)
    if not tenant or not tenant.is_active:
        raise HTTPException(status_code=404, detail="Tenant not found")
    # Return validated context
    return TenantContext(
        tenant_id=tenant_id,
        kb_id=kb_id,
        user_id=user_id,
        role=token_data.get("role", "viewer"),
        permissions=token_data.get("permissions", {})
    )
2.3 Create Tenant-Aware API Routes
File: lightrag/api/routers/tenant_routes.py (New)
from fastapi import APIRouter, Depends, HTTPException
from typing import List, Optional
from lightrag.api.models.requests import CreateTenantRequest, CreateKnowledgeBaseRequest
from lightrag.api.dependencies import get_tenant_context, get_tenant_service
from lightrag.models.tenant import TenantConfig, TenantContext

router = APIRouter(prefix="/api/v1/tenants", tags=["tenants"])

# NOTE: get_tenant_context also declares a kb_id path parameter, which the
# tenant-level routes below do not provide; a tenant-only variant of the
# dependency may be needed for them.


@router.post("", status_code=201)
async def create_tenant(
    request: CreateTenantRequest,
    tenant_service = Depends(get_tenant_service),
) -> dict:
    """Create a new tenant"""
    # Only the model overrides belong in TenantConfig; unset fields keep their defaults
    overrides = request.dict(include={"llm_model", "embedding_model"}, exclude_none=True)
    tenant = await tenant_service.create_tenant(
        tenant_name=request.tenant_name,
        config=TenantConfig(**overrides),
    )
    return {"status": "success", "data": tenant}


@router.get("/{tenant_id}")
async def get_tenant(
    tenant_context: TenantContext = Depends(get_tenant_context),
    tenant_service = Depends(get_tenant_service),
) -> dict:
    """Get tenant details"""
    tenant = await tenant_service.get_tenant(tenant_context.tenant_id)
    return {"status": "success", "data": tenant}


@router.post("/{tenant_id}/knowledge-bases", status_code=201)
async def create_knowledge_base(
    request: CreateKnowledgeBaseRequest,
    tenant_context: TenantContext = Depends(get_tenant_context),
    tenant_service = Depends(get_tenant_service),
) -> dict:
    """Create a knowledge base in a tenant"""
    kb = await tenant_service.create_knowledge_base(
        tenant_id=tenant_context.tenant_id,
        kb_name=request.kb_name,
        description=request.description
    )
    return {"status": "success", "data": kb}


@router.get("/{tenant_id}/knowledge-bases")
async def list_knowledge_bases(
    tenant_context: TenantContext = Depends(get_tenant_context),
    tenant_service = Depends(get_tenant_service),
) -> dict:
    """List all knowledge bases in a tenant"""
    kbs = await tenant_service.list_knowledge_bases(tenant_context.tenant_id)
    return {"status": "success", "data": kbs}
2.4 Update Query Routes for Multi-Tenancy
File: lightrag/api/routers/query_routes.py (Modifications)
@router.post("/api/v1/tenants/{tenant_id}/knowledge-bases/{kb_id}/query")
async def query_knowledge_base(
request: QueryRequest,
tenant_context: TenantContext = Depends(get_tenant_context),
rag_manager = Depends(get_rag_instance_manager),
) -> QueryResponse:
"""
Query a specific knowledge base with tenant isolation.
The request context is automatically scoped to the tenant/KB
via dependency injection.
"""
# Get tenant-specific RAG instance (with per-tenant config)
rag = await rag_manager.get_rag_instance(
tenant_id=tenant_context.tenant_id,
kb_id=tenant_context.kb_id
)
# Execute query with tenant context
result = await rag.aquery(
query=request.query,
param=QueryParam(mode=request.mode, top_k=request.top_k or 40),
# Inject tenant context into query execution
tenant_context=tenant_context
)
return QueryResponse(response=result["response"])
2.5 Update Document Routes for Multi-Tenancy
File: lightrag/api/routers/document_routes.py (Modifications)
@router.post("/api/v1/tenants/{tenant_id}/knowledge-bases/{kb_id}/documents/add")
async def add_document(
file: UploadFile = File(...),
tenant_context: TenantContext = Depends(get_tenant_context),
rag_manager = Depends(get_rag_instance_manager),
) -> dict:
"""
Add a document to a specific knowledge base.
Tenant/KB context is enforced through dependency injection.
"""
# Get tenant-specific RAG instance
rag = await rag_manager.get_rag_instance(
tenant_id=tenant_context.tenant_id,
kb_id=tenant_context.kb_id
)
# Insert document with tenant/KB context automatically
result = await rag.ainsert(
file_path=file.filename,
tenant_id=tenant_context.tenant_id,
kb_id=tenant_context.kb_id
)
return {"status": "success", "data": result}
@router.delete("/api/v1/tenants/{tenant_id}/knowledge-bases/{kb_id}/documents/{doc_id}")
async def delete_document(
doc_id: str,
tenant_context: TenantContext = Depends(get_tenant_context),
rag_manager = Depends(get_rag_instance_manager),
) -> dict:
"""Delete document with tenant isolation"""
rag = await rag_manager.get_rag_instance(
tenant_id=tenant_context.tenant_id,
kb_id=tenant_context.kb_id
)
# Verify document belongs to this tenant/KB before deletion
result = await rag.adelete_by_doc_id(
doc_id=doc_id,
tenant_id=tenant_context.tenant_id,
kb_id=tenant_context.kb_id
)
return {"status": "success", "message": "Document deleted"}
Phase 3: LightRAG Integration (Weeks 2-4)
3.1 Create Tenant-Aware LightRAG Instance Manager
File: lightrag/tenant_rag_manager.py (New)
import asyncio
import os
from typing import Dict, Optional, Tuple
from lightrag import LightRAG
from lightrag.models.tenant import TenantContext, TenantConfig
from lightrag.services.tenant_service import TenantService
class TenantRAGManager:
"""
Manages LightRAG instances per tenant/KB combination.
Handles caching, initialization, and cleanup of instances.
"""
def __init__(
self,
base_working_dir: str,
tenant_service: TenantService,
max_cached_instances: int = 100,
):
self.base_working_dir = base_working_dir
self.tenant_service = tenant_service
self.max_cached_instances = max_cached_instances
self._instances: Dict[Tuple[str, str], LightRAG] = {}
self._lock = asyncio.Lock()
async def get_rag_instance(
self,
tenant_id: str,
kb_id: str,
) -> LightRAG:
"""
Get or create a LightRAG instance for a tenant/KB combination.
Instances are cached to avoid repeated initialization.
Each instance uses a separate namespace for complete isolation.
"""
cache_key = (tenant_id, kb_id)
# Return cached instance if exists
if cache_key in self._instances:
instance = self._instances[cache_key]
if instance._storages_status.value >= 1: # INITIALIZED
return instance
async with self._lock:
# Double-check locking pattern
if cache_key in self._instances:
return self._instances[cache_key]
# Get tenant config
tenant = await self.tenant_service.get_tenant(tenant_id)
if not tenant:
raise ValueError(f"Tenant {tenant_id} not found")
# Create tenant-specific working directory
tenant_working_dir = os.path.join(
self.base_working_dir,
tenant_id,
kb_id
)
# Create LightRAG instance with tenant-specific config and workspace
instance = LightRAG(
working_dir=tenant_working_dir,
workspace=f"{tenant_id}_{kb_id}", # Backward compatible workspace
# Use tenant-specific models and settings
llm_model_name=tenant.config.llm_model,
embedding_func=self._get_embedding_func(tenant.config),
llm_model_func=self._get_llm_func(tenant.config),
# ... other tenant-specific configurations ...
)
# Initialize storages
await instance.initialize_storages()
# Cache the instance
if len(self._instances) >= self.max_cached_instances:
# Evict oldest entry
oldest_key = next(iter(self._instances))
await self._instances[oldest_key].finalize_storages()
del self._instances[oldest_key]
self._instances[cache_key] = instance
return instance
async def cleanup_instance(self, tenant_id: str, kb_id: str) -> None:
"""Clean up and remove a cached instance"""
cache_key = (tenant_id, kb_id)
if cache_key in self._instances:
await self._instances[cache_key].finalize_storages()
del self._instances[cache_key]
async def cleanup_all(self) -> None:
"""Clean up all cached instances"""
for instance in self._instances.values():
await instance.finalize_storages()
self._instances.clear()
def _get_embedding_func(self, tenant: TenantConfig):
"""Create embedding function with tenant-specific model"""
# Use tenant's embedding model configuration
# Can be overridden from global config
pass
def _get_llm_func(self, tenant: TenantConfig):
"""Create LLM function with tenant-specific model"""
# Use tenant's LLM model configuration
pass
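The eviction step in `get_rag_instance` removes the first key in insertion order, so a frequently used instance can be evicted while an idle one stays cached. If least-recently-used eviction is wanted instead, one option is an `OrderedDict` that is reordered on every hit. The sketch below is a generic cache under that assumption; `AsyncLRUCache` and its `create`/`close` callbacks are hypothetical names standing in for instance construction and `finalize_storages()`.

```python
import asyncio
from collections import OrderedDict
from typing import Awaitable, Callable, Generic, Hashable, List, TypeVar

K = TypeVar("K", bound=Hashable)
V = TypeVar("V")


class AsyncLRUCache(Generic[K, V]):
    """Bounded cache that evicts the least recently used entry."""

    def __init__(
        self,
        max_size: int,
        create: Callable[[K], Awaitable[V]],
        close: Callable[[V], Awaitable[None]],
    ) -> None:
        self._max_size = max_size
        self._create = create
        self._close = close
        self._items: "OrderedDict[K, V]" = OrderedDict()
        self._lock = asyncio.Lock()

    async def get(self, key: K) -> V:
        async with self._lock:
            if key in self._items:
                # Mark as most recently used
                self._items.move_to_end(key)
                return self._items[key]
            value = await self._create(key)
            self._items[key] = value
            if len(self._items) > self._max_size:
                # popitem(last=False) removes the least recently used entry
                _, evicted = self._items.popitem(last=False)
                await self._close(evicted)
            return value

    def keys(self) -> List[K]:
        return list(self._items)


async def _demo() -> List[str]:
    async def create(key: str) -> str:
        return f"rag:{key}"

    async def close(value: str) -> None:
        print(f"closing {value}")

    cache: AsyncLRUCache[str, str] = AsyncLRUCache(2, create, close)
    for key in ("a", "b", "a", "c"):
        await cache.get(key)
    return cache.keys()


if __name__ == "__main__":
    print(asyncio.run(_demo()))
```

Holding the lock while `create` runs keeps the sketch simple but serialises initialisation across all tenants; a per-key lock would remove that bottleneck at the cost of more bookkeeping.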
3.2 Modify LightRAG Query Methods
File: lightrag/lightrag.py (Key modifications)
async def aquery(
self,
query: str,
param: QueryParam,
tenant_context: Optional[TenantContext] = None, # NEW
) -> QueryResult:
"""
Query with optional tenant context for filtering.
Args:
query: The query string
param: Query parameters
tenant_context: Tenant context for data isolation (NEW)
"""
# If tenant context provided, inject it into all storage operations
if tenant_context:
# Temporarily set tenant/kb context on storages
original_tenant = getattr(self, '_tenant_id', None)
original_kb = getattr(self, '_kb_id', None)
self._tenant_id = tenant_context.tenant_id
self._kb_id = tenant_context.kb_id
try:
# Existing query logic
# All storage operations will now respect tenant/kb context
result = await self._execute_query(query, param)
return result
finally:
# Restore original context
if tenant_context:
self._tenant_id = original_tenant
self._kb_id = original_kb
async def ainsert(
self,
file_path: str,
tenant_id: Optional[str] = None, # NEW
kb_id: Optional[str] = None, # NEW
**kwargs,
) -> InsertionResult:
"""Insert documents with optional tenant/KB context"""
if tenant_id:
self._tenant_id = tenant_id
if kb_id:
self._kb_id = kb_id
# Existing insertion logic
# Documents will be stored with tenant/kb metadata
result = await self._process_documents(file_path, **kwargs)
return result
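Storing the active tenant on `self._tenant_id` is shared state: two concurrent requests on the same instance can overwrite each other's context between `await` points, and `ainsert` never restores the previous value. One alternative, sketched here under the assumption that storages can read the scope from a module-level accessor, is a `contextvars.ContextVar`, which asyncio copies per task. `tenant_scope`, `current_scope`, and `fake_query` are hypothetical names used only for illustration.

```python
import asyncio
from contextlib import contextmanager
from contextvars import ContextVar
from typing import Iterator, List, Optional, Tuple

Scope = Tuple[str, str]  # (tenant_id, kb_id)

_current_scope: ContextVar[Optional[Scope]] = ContextVar("tenant_scope", default=None)


@contextmanager
def tenant_scope(tenant_id: str, kb_id: str) -> Iterator[None]:
    """Set the tenant/KB scope for the current task and restore it on exit."""
    token = _current_scope.set((tenant_id, kb_id))
    try:
        yield
    finally:
        _current_scope.reset(token)


def current_scope() -> Optional[Scope]:
    """What a storage layer would call to learn the active tenant/KB."""
    return _current_scope.get()


async def fake_query(tenant_id: str, kb_id: str, delay: float) -> Optional[Scope]:
    # Stand-in for aquery(): the scope must survive awaits inside the query
    with tenant_scope(tenant_id, kb_id):
        await asyncio.sleep(delay)
        return current_scope()


async def main() -> List[Optional[Scope]]:
    # Two overlapping "requests"; each gather() task gets its own context copy
    return await asyncio.gather(
        fake_query("t1", "kb1", 0.02),
        fake_query("t2", "kb2", 0.01),
    )


if __name__ == "__main__":
    print(asyncio.run(main()))
```

Because each task sees only its own value, the overlapping calls do not leak scope into one another, and the scope is cleared again once the `with` block exits.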
Phase 4: Testing & Deployment (Week 4)
4.1 Unit Tests
File: tests/test_tenant_isolation.py (New)
import pytest
from lightrag.models.tenant import Tenant, KnowledgeBase, TenantContext
from lightrag.services.tenant_service import TenantService
@pytest.mark.asyncio
class TestTenantIsolation:
async def test_tenant_creation(self, tenant_service):
"""Test creating a tenant"""
tenant = await tenant_service.create_tenant("Test Tenant")
assert tenant.tenant_name == "Test Tenant"
assert tenant.is_active is True
async def test_knowledge_base_creation(self, tenant_service):
"""Test creating KB in a tenant"""
tenant = await tenant_service.create_tenant("Tenant 1")
kb = await tenant_service.create_knowledge_base(
tenant.tenant_id,
"KB 1"
)
assert kb.tenant_id == tenant.tenant_id
async def test_cross_tenant_data_isolation(self, tenant_service, rag_manager):
"""Test that data from one tenant cannot be accessed by another"""
# Create two tenants
tenant1 = await tenant_service.create_tenant("Tenant 1")
tenant2 = await tenant_service.create_tenant("Tenant 2")
# Create KBs
kb1 = await tenant_service.create_knowledge_base(tenant1.tenant_id, "KB1")
kb2 = await tenant_service.create_knowledge_base(tenant2.tenant_id, "KB2")
# Add documents to each KB
rag1 = await rag_manager.get_rag_instance(tenant1.tenant_id, kb1.kb_id)
rag2 = await rag_manager.get_rag_instance(tenant2.tenant_id, kb2.kb_id)
# Verify documents are isolated
# Query in tenant2 should not return documents from tenant1
pass
async def test_query_with_tenant_context(self, rag_manager):
"""Test queries include tenant context"""
context = TenantContext(
tenant_id="tenant1",
kb_id="kb1",
user_id="user1",
role="admin"
)
# Execute query with context
# Verify only tenant1/kb1 data returned
pass
4.2 Integration Tests
File: tests/test_api_tenant_routes.py (New)
import pytest
from fastapi.testclient import TestClient
@pytest.mark.asyncio
class TestTenantAPIs:
async def test_create_tenant_endpoint(self, client: TestClient, auth_token):
"""Test POST /api/v1/tenants"""
response = client.post(
"/api/v1/tenants",
json={"tenant_name": "New Tenant"},
headers={"Authorization": f"Bearer {auth_token}"}
)
assert response.status_code == 201
data = response.json()
assert data["status"] == "success"
assert "tenant_id" in data["data"]
async def test_create_knowledge_base_endpoint(self, client: TestClient, tenant_id, auth_token):
"""Test POST /api/v1/tenants/{tenant_id}/knowledge-bases"""
response = client.post(
f"/api/v1/tenants/{tenant_id}/knowledge-bases",
json={"kb_name": "KB 1"},
headers={"Authorization": f"Bearer {auth_token}"}
)
assert response.status_code == 201
data = response.json()
assert "kb_id" in data["data"]
async def test_cross_tenant_access_denied(self, client: TestClient, tenant1_token, tenant2_id):
"""Test accessing tenant2 with tenant1 token fails"""
response = client.get(
f"/api/v1/tenants/{tenant2_id}",
headers={"Authorization": f"Bearer {tenant1_token}"}
)
assert response.status_code == 403
async def test_query_with_tenant_isolation(self, client: TestClient, tenant_id, kb_id, auth_token):
"""Test query is isolated to tenant/KB"""
# Add document to KB
# Query should only search that KB
pass
4.3 Migration Script
File: scripts/migrate_workspace_to_tenant.py (New)
"""
Migration script to convert existing workspaces to multi-tenant architecture.
Creates a default tenant for each workspace.
"""
import asyncio
import argparse
import os
from lightrag.services.tenant_service import TenantService


async def migrate_workspaces_to_tenants(
    working_dir: str,
    storage_config: dict
):
    """
    Migrate existing workspace-based deployments to multi-tenant.
    For each workspace directory:
    1. Create a tenant with that workspace name
    2. Create a default KB
    3. Map workspace data to tenant/KB
    """
    # NOTE: TenantService (section 1.2) expects a BaseKVStorage instance;
    # building one from storage_config is deployment-specific and left open here.
    tenant_service = TenantService(storage_config)
    # Scan working directory for existing workspaces
    # (assumes each workspace is a sub-directory of working_dir)
    workspaces = sorted(
        entry.name for entry in os.scandir(working_dir) if entry.is_dir()
    )
    for workspace_name in workspaces:
        print(f"Migrating workspace: {workspace_name}")
        # Create tenant from workspace. create_tenant() as defined in 1.2 takes
        # no metadata argument, so the workspace origin is not recorded here.
        tenant = await tenant_service.create_tenant(
            tenant_name=workspace_name or "default",
        )
        # Create default KB
        kb = await tenant_service.create_knowledge_base(
            tenant.tenant_id,
            kb_name="default",
            description="Default knowledge base (migrated from workspace)"
        )
        # Migrate data from workspace files to tenant/KB storage
        # Update storage paths and metadata
        print(f"  ✓ Created tenant {tenant.tenant_id}")
        print(f"  ✓ Created KB {kb.kb_id}")
    print("\nMigration complete!")


if __name__ == "__main__":
    parser = argparse.ArgumentParser(description="Migrate workspaces to multi-tenant")
    parser.add_argument("--working-dir", required=True)
    args = parser.parse_args()
    asyncio.run(migrate_workspaces_to_tenants(args.working_dir, {}))
4.4 Deployment Checklist
## Pre-Deployment Checklist
### Database & Schema
- [ ] Database migration scripts tested on staging
- [ ] Backup of production database created
- [ ] Index creation verified on prod-like data volume
- [ ] Schema rollback scripts prepared
### Code Changes
- [ ] All unit tests passing (100% coverage of new code)
- [ ] Integration tests passing
- [ ] Load testing completed (1000+ tenant/KB combinations)
- [ ] Security audit completed
- [ ] Code review approved by 2+ team members
### Documentation
- [ ] API documentation updated
- [ ] Migration guide prepared
- [ ] Tenant management guide written
- [ ] Troubleshooting guide created
### Deployment
- [ ] Feature flag to enable multi-tenancy (default: off)
- [ ] Gradual rollout: 10% → 50% → 100%
- [ ] Health checks monitor tenant isolation
- [ ] Rollback plan tested
- [ ] Team trained on new architecture
- [ ] On-call engineer assigned for release window
### Post-Deployment
- [ ] Monitor error rates and latency
- [ ] Verify tenant data isolation (spot checks)
- [ ] Collect feedback from early adopters
- [ ] Performance baseline established
Configuration Examples
Environment Variables
# Tenant Manager Configuration
TENANT_ENABLED=true
MAX_CACHED_INSTANCES=100
TENANT_CONFIG_SYNC_INTERVAL=300
# Storage Configuration (remains the same)
LIGHTRAG_KV_STORAGE=PGKVStorage
LIGHTRAG_VECTOR_STORAGE=PGVectorStorage
LIGHTRAG_GRAPH_STORAGE=PGGraphStorage
# Tenant Service Configuration
TENANT_SERVICE_STORAGE=PostgreSQL
TENANT_DB_HOST=localhost
TENANT_DB_PORT=5432
TENANT_DB_NAME=lightrag_tenants
Python Configuration
# In config.py or app initialization
import os


# Process-level settings; distinct from the TenantConfig dataclass in
# lightrag/models/tenant.py, so import with care if both are used together
class TenantConfig:
    ENABLED = os.getenv("TENANT_ENABLED", "false").lower() == "true"
    MAX_CACHED_INSTANCES = int(os.getenv("MAX_CACHED_INSTANCES", "100"))
    SYNC_INTERVAL = int(os.getenv("TENANT_CONFIG_SYNC_INTERVAL", "300"))
    # Storage for tenant metadata
    STORAGE_TYPE = os.getenv("TENANT_SERVICE_STORAGE", "PostgreSQL")
    STORAGE_CONFIG = {
        "host": os.getenv("TENANT_DB_HOST"),
        "port": int(os.getenv("TENANT_DB_PORT", "5432")),
        "database": os.getenv("TENANT_DB_NAME", "lightrag_tenants"),
    }
Testing Strategy
Unit Testing (40% of tests)
- Tenant service operations
- Storage isolation logic
- Configuration management
- Authentication/authorization
Integration Testing (40% of tests)
- API endpoint functionality
- Cross-component data flow
- Tenant context propagation
- Error handling
System Testing (20% of tests)
- End-to-end workflows per tenant
- Multi-tenant concurrent operations
- Resource quota enforcement
- Performance under load
Performance Targets
| Metric | Target | Measurement |
|---|---|---|
| Query latency | <10ms overhead | Per query with/without tenant filtering |
| API response time | <200ms p99 | Single query endpoint |
| Storage overhead | <3% | Per-tenant metadata vs. data |
| Memory per instance | <500MB | Per cached LightRAG instance |
| Tenant isolation overhead | <15% | Compare to single-tenant baseline |
Known Limitations & Future Work
Phase 1 Limitations
- No cross-tenant queries or data sharing
- No tenant-to-tenant access delegation
- No per-tenant storage encryption
- No real-time multi-region replication
- No automatic tenant data backup management
Future Enhancements (Phase 2)
- Cross-tenant sharing: Allow tenants to share specific KB data
- Advanced RBAC: Support custom roles and fine-grained permissions
- Encryption at rest: Per-tenant data encryption
- Audit logging: Comprehensive audit trail with retention policies
- Multi-region: Replicate tenant data across regions
- Tenant quotas: Storage, API call, and compute quotas with enforcement
- SSO integration: Enterprise SSO (SAML, OIDC) support
Document Version: 1.0
Last Updated: 2025-11-20
Phase Duration: 3-4 weeks
Estimated Effort: 160 developer hours
Team Size: 2-3 backend engineers