Enable a single LightRAG server instance to serve multiple isolated workspaces via HTTP header-based routing. This allows multi-tenant SaaS deployments where each tenant's data is completely isolated.

Key features:
- Header-based workspace routing (LIGHTRAG-WORKSPACE, X-Workspace-ID fallback)
- Process-local pool of LightRAG instances with LRU eviction
- FastAPI dependency (get_rag) for workspace resolution per request
- Full backward compatibility - existing deployments work unchanged
- Strict multi-tenant mode option (LIGHTRAG_ALLOW_DEFAULT_WORKSPACE=false)
- Configurable pool size (LIGHTRAG_MAX_WORKSPACES_IN_POOL)
- Graceful shutdown with workspace finalization

Configuration:
- LIGHTRAG_DEFAULT_WORKSPACE: Default workspace (falls back to WORKSPACE)
- LIGHTRAG_ALLOW_DEFAULT_WORKSPACE: Require explicit header when false
- LIGHTRAG_MAX_WORKSPACES_IN_POOL: Max concurrent workspace instances (default: 50)

Files:
- New: lightrag/api/workspace_manager.py (core multi-workspace module)
- New: tests/test_multi_workspace_server.py (17 unit tests)
- New: render.yaml (Render deployment blueprint)
- Modified: All route files to use get_rag dependency
- Updated: README.md, env.example with documentation

🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <noreply@anthropic.com>

# Research: Multi-Workspace Server Support

**Date**: 2025-12-01
**Feature**: 001-multi-workspace-server

## Executive Summary

Research confirms that the existing LightRAG codebase provides a solid foundation for multi-workspace support at the server level. The core library already isolates data by workspace; the gap is purely at the API server layer.

## Research Findings

### 1. Existing Workspace Support in LightRAG Core

**Decision**: Leverage existing `workspace` parameter in `LightRAG` class

**Findings**:
- `LightRAG` class accepts `workspace: str` parameter (default: `os.getenv("WORKSPACE", "")`)
- Storage implementations use `get_final_namespace(namespace, workspace)` to create isolated keys
- Namespace format: `"{workspace}:{namespace}"` when workspace is set, else just `"{namespace}"`
- Pipeline status, locks, and in-memory state are all workspace-aware via `shared_storage.py`
- `DocumentManager` creates workspace-specific input directories

**Evidence**:
```python
# lightrag/lightrag.py
workspace: str = field(default_factory=lambda: os.getenv("WORKSPACE", ""))

# lightrag/kg/shared_storage.py
def get_final_namespace(namespace: str, workspace: str | None = None) -> str:
    if workspace is None:
        workspace = get_default_workspace()
    if not workspace:
        return namespace
    return f"{workspace}:{namespace}"
```

**Implications**: No changes needed to core isolation; just need to instantiate separate `LightRAG` objects with different `workspace` values.
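
To make the isolation concrete, here is a minimal, self-contained sketch of the key composition quoted above. It re-implements the quoted rule locally rather than importing LightRAG. The name `final_namespace`, the `default_workspace` argument (the real helper obtains its default via `get_default_workspace()`), and the namespace and tenant names are all illustrative assumptions.

```python
from __future__ import annotations


def final_namespace(namespace: str, workspace: str | None = None,
                    default_workspace: str = "") -> str:
    """Local re-implementation of the quoted rule, for illustration only."""
    if workspace is None:
        workspace = default_workspace
    if not workspace:
        return namespace
    return f"{workspace}:{namespace}"


# Two tenants writing to the same logical namespace get distinct keys.
key_a = final_namespace("doc_status", "tenant_a")   # "tenant_a:doc_status"
key_b = final_namespace("doc_status", "tenant_b")   # "tenant_b:doc_status"

# With no workspace configured, the key is the bare namespace, which is
# what keeps existing single-workspace data addressable.
legacy = final_namespace("doc_status")              # "doc_status"
```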

### 2. Current Server Architecture

**Decision**: Refactor from closure pattern to FastAPI dependency injection

**Findings**:
- Server creates a single global `LightRAG` instance in `create_app(args)`
- Routes receive the RAG instance via closure (factory function pattern):
  ```python
  def create_document_routes(rag: LightRAG, doc_manager, api_key):
      @router.post("/scan")
      async def scan_for_new_documents(...):
          # rag captured from enclosing scope
  ```
- This pattern prevents per-request workspace switching

**Alternative Considered**: Keep closure pattern and add workspace switching to existing instance
- **Rejected Because**: LightRAG instance configuration is immutable after creation; switching workspace would require re-initializing storage connections

**Chosen Approach**: Replace closure with FastAPI `Depends()` that resolves workspace → instance
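
The shape of that change can be sketched without any framework. The example below is an illustration under assumptions, not the server's code: `FakeRAG`, `make_get_rag`, and the `lookup` callable are hypothetical, and plain dicts stand in for request headers (real HTTP header lookup is case-insensitive, a dict is not). In a FastAPI app the `get_rag` coroutine would take a `Request` and be attached to each route with `Depends(get_rag)`.

```python
from __future__ import annotations

import asyncio
from typing import Awaitable, Callable, Mapping


class FakeRAG:
    """Hypothetical stand-in for a LightRAG instance."""

    def __init__(self, workspace: str) -> None:
        self.workspace = workspace


def make_get_rag(
    lookup: Callable[[str], Awaitable[FakeRAG]], default_workspace: str
) -> Callable[[Mapping[str, str]], Awaitable[FakeRAG]]:
    """Build the per-request resolver that replaces the captured instance."""

    async def get_rag(headers: Mapping[str, str]) -> FakeRAG:
        workspace = headers.get("LIGHTRAG-WORKSPACE", "").strip() or default_workspace
        return await lookup(workspace)

    return get_rag


async def scan_for_new_documents(rag: FakeRAG) -> str:
    # The handler receives its instance as an argument instead of capturing one.
    return f"scanning inputs for workspace {rag.workspace!r}"


async def demo() -> list[str]:
    async def lookup(workspace: str) -> FakeRAG:
        return FakeRAG(workspace)  # a real server would consult the instance pool

    get_rag = make_get_rag(lookup, default_workspace="default")
    requests = [{"LIGHTRAG-WORKSPACE": "tenant_a"}, {}]
    return [await scan_for_new_documents(await get_rag(h)) for h in requests]


results = asyncio.run(demo())
```

Because the handler no longer closes over a single instance, two requests handled by the same route function can operate on different workspaces.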

### 3. Instance Pool Design

**Decision**: Use `asyncio.Lock` protected dictionary with LRU eviction

**Findings**:
- Python's `asyncio.Lock` is appropriate for protecting async operations
- LRU eviction via `collections.OrderedDict` or manual tracking
- Instance initialization is async (`await rag.initialize_storages()`)
- Concurrent requests for same new workspace must share initialization

**Pattern**:
```python
_instances: dict[str, LightRAG] = {}
_lock = asyncio.Lock()
_lru_order: list[str] = []  # Most recent at end

async def get_instance(workspace: str) -> LightRAG:
    async with _lock:
        if workspace in _instances:
            # Move to end of LRU list
            _lru_order.remove(workspace)
            _lru_order.append(workspace)
            return _instances[workspace]

        # Evict if at capacity
        if len(_instances) >= max_pool_size:
            oldest = _lru_order.pop(0)
            await _instances[oldest].finalize_storages()
            del _instances[oldest]

        # Create and initialize
        instance = LightRAG(workspace=workspace, ...)
        await instance.initialize_storages()
        _instances[workspace] = instance
        _lru_order.append(workspace)
        return instance
```

**Alternative Considered**: Use `async_lru` library or `cachetools.TTLCache`
- **Rejected Because**: Adds external dependency; simple dict+lock is sufficient and well-understood
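
The findings mention `collections.OrderedDict` as one way to track recency. The following is a runnable sketch of that variant under stated assumptions: `StubRAG` is a hypothetical stand-in that only mimics the two lifecycle coroutines named above, and `WorkspacePool`, `init_count`, and `workspaces()` are illustrative names rather than the contents of `workspace_manager.py`.

```python
from __future__ import annotations

import asyncio
from collections import OrderedDict


class StubRAG:
    """Hypothetical stand-in exposing the two lifecycle coroutines named above."""

    def __init__(self, workspace: str) -> None:
        self.workspace = workspace
        self.finalized = False

    async def initialize_storages(self) -> None:
        await asyncio.sleep(0)  # yield control, as real storage setup would

    async def finalize_storages(self) -> None:
        self.finalized = True


class WorkspacePool:
    def __init__(self, max_size: int) -> None:
        self._max_size = max_size
        self._instances: OrderedDict[str, StubRAG] = OrderedDict()
        self._lock = asyncio.Lock()
        self.init_count = 0

    async def get(self, workspace: str) -> StubRAG:
        async with self._lock:
            if workspace in self._instances:
                self._instances.move_to_end(workspace)  # mark most recently used
                return self._instances[workspace]
            if len(self._instances) >= self._max_size:
                _, oldest = self._instances.popitem(last=False)  # least recently used
                await oldest.finalize_storages()
            instance = StubRAG(workspace)
            await instance.initialize_storages()
            self.init_count += 1
            self._instances[workspace] = instance
            return instance

    def workspaces(self) -> list[str]:
        return list(self._instances)  # least recent first


async def demo() -> tuple[WorkspacePool, StubRAG]:
    pool = WorkspacePool(max_size=2)
    # Concurrent requests for one new workspace share a single initialization.
    first, second = await asyncio.gather(pool.get("a"), pool.get("a"))
    assert first is second
    await pool.get("b")
    await pool.get("a")  # "a" becomes most recent, so "b" is now the LRU entry
    await pool.get("c")  # pool is full: "b" is evicted and finalized
    return pool, first


pool, instance_a = asyncio.run(demo())
```

One trade-off worth noting: because the lock is held while a workspace initializes, requests for other workspaces wait during that time. This matches the pattern above and keeps the logic simple, at the cost of serializing cold starts.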

### 4. Header Routing Strategy

**Decision**: `LIGHTRAG-WORKSPACE` primary, `X-Workspace-ID` fallback

**Findings**:
- Custom headers conventionally use `X-` prefix, but this is deprecated per RFC 6648
- Product-specific headers (e.g., `LIGHTRAG-WORKSPACE`) are clearer and recommended
- Fallback to common convention (`X-Workspace-ID`) aids adoption

**Implementation**:
```python
def get_workspace_from_request(request: Request) -> str | None:
    workspace = request.headers.get("LIGHTRAG-WORKSPACE", "").strip()
    if not workspace:
        workspace = request.headers.get("X-Workspace-ID", "").strip()
    return workspace or None
```

### 5. Configuration Schema

**Decision**: Three new environment variables

| Variable | Type | Default | Description |
|----------|------|---------|-------------|
| `LIGHTRAG_DEFAULT_WORKSPACE` | str | Value of `WORKSPACE`, else `""` | Workspace used when the request carries no workspace header |
| `LIGHTRAG_ALLOW_DEFAULT_WORKSPACE` | bool | `true` | If false, reject requests without a workspace header |
| `LIGHTRAG_MAX_WORKSPACES_IN_POOL` | int | `50` | Maximum number of workspace instances kept in the pool |

**Rationale**:
- `LIGHTRAG_` prefix namespaces new vars to avoid conflicts
- `ALLOW_DEFAULT_WORKSPACE=false` enables strict multi-tenant mode
- Default pool size of 50 balances memory vs. reinitialization overhead
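
As a rough illustration of how these three variables could be read, the sketch below loads them into a small settings object. `WorkspaceSettings`, `load_settings`, and the accepted boolean spellings are assumptions made for the example; the note above only fixes the variable names, types, and defaults.

```python
from __future__ import annotations

import os
from dataclasses import dataclass
from typing import Mapping


@dataclass(frozen=True)
class WorkspaceSettings:
    default_workspace: str
    allow_default_workspace: bool
    max_workspaces_in_pool: int


def _parse_bool(raw: str) -> bool:
    # Assumed convention: anything other than these spellings counts as false.
    return raw.strip().lower() in {"1", "true", "yes", "on"}


def load_settings(env: Mapping[str, str] = os.environ) -> WorkspaceSettings:
    return WorkspaceSettings(
        # New variable first, then the legacy WORKSPACE, then empty.
        default_workspace=env.get("LIGHTRAG_DEFAULT_WORKSPACE", env.get("WORKSPACE", "")),
        allow_default_workspace=_parse_bool(env.get("LIGHTRAG_ALLOW_DEFAULT_WORKSPACE", "true")),
        max_workspaces_in_pool=int(env.get("LIGHTRAG_MAX_WORKSPACES_IN_POOL", "50")),
    )


# An existing deployment that only sets WORKSPACE keeps its behavior.
legacy = load_settings({"WORKSPACE": "acme"})

# Strict multi-tenant mode with a smaller pool.
strict = load_settings({
    "LIGHTRAG_ALLOW_DEFAULT_WORKSPACE": "false",
    "LIGHTRAG_MAX_WORKSPACES_IN_POOL": "10",
})
```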

### 6. Workspace Identifier Validation

**Decision**: Alphanumeric, hyphens, underscores; 1-64 characters

**Findings**:
- Must be safe for filesystem paths (workspace creates subdirectories)
- Must be safe for database keys (used in storage namespacing)
- Must prevent injection attacks (path traversal, SQL injection)

**Validation Regex**: `^[a-zA-Z0-9][a-zA-Z0-9_-]{0,63}$`
- Starts with alphanumeric (prevents hidden directories like `.hidden`)
- Allows hyphens and underscores for readability
- Max 64 chars (reasonable for identifiers, fits in most DB column sizes)
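
A short sketch of applying this rule in Python follows. The helper name `is_valid_workspace` and the sample identifiers are illustrative. One detail matters in practice: with `re.match`, a trailing `$` also matches just before a final newline, so `"tenant\n"` would pass; `re.fullmatch` avoids that.

```python
import re

# Same character classes and length bounds as the regex above; fullmatch
# supplies the anchoring at both ends.
WORKSPACE_RE = re.compile(r"[a-zA-Z0-9][a-zA-Z0-9_-]{0,63}")


def is_valid_workspace(identifier: str) -> bool:
    return WORKSPACE_RE.fullmatch(identifier) is not None


accepted = ["tenant-42", "a", "Team_1", "a" * 64]
rejected = ["", ".hidden", "../etc", "_leading", "has space", "a" * 65, "tenant\n"]

results = {name: is_valid_workspace(name) for name in accepted + rejected}
```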

### 7. Error Handling

**Decision**: Return 400 for missing/invalid workspace; 503 for initialization failures

| Scenario | HTTP Status | Error Message |
|----------|-------------|---------------|
| Missing header, default disabled | 400 | `Missing LIGHTRAG-WORKSPACE header` |
| Invalid workspace identifier | 400 | `Invalid workspace identifier: must be alphanumeric...` |
| Workspace initialization fails | 503 | `Failed to initialize workspace: {details}` |
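
One way to keep this mapping in a single place is to attach the status code to the exception type. The sketch below is an assumption-laden illustration: the exception classes, `resolve_workspace`, and `to_response` are hypothetical names, and the `{"detail": ...}` body is only an assumed response shape. The message strings are copied from the table as written, including the elided text.

```python
from __future__ import annotations

import re

_VALID = re.compile(r"[a-zA-Z0-9][a-zA-Z0-9_-]{0,63}")


class WorkspaceError(Exception):
    """Hypothetical base class carrying the HTTP status to return."""

    status_code = 500


class MissingWorkspaceError(WorkspaceError):
    status_code = 400


class InvalidWorkspaceError(WorkspaceError):
    status_code = 400


class WorkspaceInitError(WorkspaceError):
    status_code = 503


def resolve_workspace(header_value: str | None, default: str, allow_default: bool) -> str:
    """Pick the workspace for a request or raise the matching error."""
    if not header_value:
        if not allow_default:
            raise MissingWorkspaceError("Missing LIGHTRAG-WORKSPACE header")
        return default  # may be "" in single-workspace deployments
    if not _VALID.fullmatch(header_value):
        raise InvalidWorkspaceError("Invalid workspace identifier: must be alphanumeric...")
    return header_value


def to_response(exc: WorkspaceError) -> tuple[int, dict[str, str]]:
    return exc.status_code, {"detail": str(exc)}


def status_for(header_value: str | None, allow_default: bool) -> int:
    try:
        resolve_workspace(header_value, default="", allow_default=allow_default)
    except WorkspaceError as exc:
        return to_response(exc)[0]
    return 200


init_failure = to_response(WorkspaceInitError("Failed to initialize workspace: boom"))
```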

### 8. Logging Strategy

**Decision**: Log workspace identifier at INFO level; never log credentials

**Implementation**:
- Log workspace on request: `logger.info(f"Request to workspace: {workspace}")`
- Log pool events: `logger.info(f"Initialized workspace: {workspace}")`
- Log evictions: `logger.info(f"Evicted workspace from pool: {workspace}")`
- NEVER log: API keys, storage credentials, auth tokens

### 9. Test Strategy

**Decision**: Pytest with markers following existing patterns

**Test Categories**:
1. **Unit tests** (`@pytest.mark.offline`): Workspace resolution, validation, pool logic
2. **Integration tests** (`@pytest.mark.integration`): Full request flow with mock LLM/embedding
3. **Backward compatibility tests** (`@pytest.mark.offline`): Single-workspace mode unchanged

**Key Test Scenarios**:
- Two workspaces → ingest document in A → query from B returns nothing
- No header + `ALLOW_DEFAULT_WORKSPACE=true` → uses default
- No header + `ALLOW_DEFAULT_WORKSPACE=false` → returns 400
- Pool at capacity → evicts LRU → new workspace initializes

## Resolved Questions

| Question | Resolution |
|----------|------------|
| How to handle concurrent init of same workspace? | `asyncio.Lock` ensures single initialization; others wait |
| Should evicted workspace finalize storage? | Yes, call `finalize_storages()` to release resources |
| How to share config between instances? | Clone config; only `workspace` differs per instance |
| Where to put pool management code? | New module `workspace_manager.py` |

## Next Steps

1. Create `data-model.md` with entity definitions
2. Document contracts (no new API endpoints; header-based routing is transparent)
3. Create `quickstart.md` for multi-workspace deployment