
Research: Multi-Workspace Server Support

Date: 2025-12-01
Feature: 001-multi-workspace-server

Executive Summary

Research confirms that the existing LightRAG codebase provides a solid foundation for multi-workspace support at the server level. The core library already has workspace isolation; the gap is purely at the API server layer.

Research Findings

1. Existing Workspace Support in LightRAG Core

Decision: Leverage existing workspace parameter in LightRAG class

Findings:

  • LightRAG class accepts workspace: str parameter (default: os.getenv("WORKSPACE", ""))
  • Storage implementations use get_final_namespace(namespace, workspace) to create isolated keys
  • Namespace format: "{workspace}:{namespace}" when workspace is set, else just "{namespace}"
  • Pipeline status, locks, and in-memory state are all workspace-aware via shared_storage.py
  • DocumentManager creates workspace-specific input directories

Evidence:

# lightrag/lightrag.py
workspace: str = field(default_factory=lambda: os.getenv("WORKSPACE", ""))

# lightrag/kg/shared_storage.py
def get_final_namespace(namespace: str, workspace: str | None = None) -> str:
    if workspace is None:
        workspace = get_default_workspace()
    if not workspace:
        return namespace
    return f"{workspace}:{namespace}"

Implications: No changes needed to core isolation; just need to instantiate separate LightRAG objects with different workspace values.

2. Current Server Architecture

Decision: Refactor from closure pattern to FastAPI dependency injection

Findings:

  • Server creates a single global LightRAG instance in create_app(args)
  • Routes receive the RAG instance via closure (factory function pattern):
    def create_document_routes(rag: LightRAG, doc_manager, api_key):
        @router.post("/scan")
        async def scan_for_new_documents(...):
            # rag captured from enclosing scope
    
  • This pattern prevents per-request workspace switching

Alternative Considered: Keep closure pattern and add workspace switching to existing instance

  • Rejected Because: LightRAG instance configuration is immutable after creation; switching workspace would require re-initializing storage connections

Chosen Approach: Replace closure with FastAPI Depends() that resolves workspace → instance
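A minimal sketch of the per-request resolution flow, shown without FastAPI plumbing so it runs standalone (in the server, get_rag would be wired as a Depends() parameter; the pool stand-in below returns a placeholder string, not a real LightRAG instance):

```python
import asyncio

# Hypothetical per-request resolver. In the server this would be declared as
# `rag = Depends(get_rag)` on each route instead of a closure-captured `rag`.
_pool: dict[str, object] = {}

async def get_instance(workspace: str) -> object:
    # Stand-in for the pooled LightRAG lookup from section 3.
    if workspace not in _pool:
        _pool[workspace] = f"LightRAG(workspace={workspace!r})"  # placeholder
    return _pool[workspace]

async def get_rag(headers: dict[str, str]) -> object:
    # Primary header first, then the fallback, mirroring section 4.
    workspace = headers.get("LIGHTRAG-WORKSPACE") or headers.get("X-Workspace-ID") or ""
    return await get_instance(workspace)

async def main() -> None:
    a = await get_rag({"LIGHTRAG-WORKSPACE": "tenant-a"})
    b = await get_rag({"LIGHTRAG-WORKSPACE": "tenant-b"})
    assert a != b  # each workspace resolves to its own instance
    print(a)

asyncio.run(main())
```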

3. Instance Pool Design

Decision: Use asyncio.Lock protected dictionary with LRU eviction

Findings:

  • Python's asyncio.Lock is appropriate for protecting async operations
  • LRU eviction via collections.OrderedDict or manual tracking
  • Instance initialization is async (await rag.initialize_storages())
  • Concurrent requests for same new workspace must share initialization

Pattern:

import asyncio

_instances: dict[str, LightRAG] = {}
_lock = asyncio.Lock()
_lru_order: list[str] = []  # Most recent at end
max_pool_size = 50  # configured via LIGHTRAG_MAX_WORKSPACES_IN_POOL

async def get_instance(workspace: str) -> LightRAG:
    async with _lock:
        if workspace in _instances:
            # Move to end of LRU list
            _lru_order.remove(workspace)
            _lru_order.append(workspace)
            return _instances[workspace]

        # Evict if at capacity
        if len(_instances) >= max_pool_size:
            oldest = _lru_order.pop(0)
            await _instances[oldest].finalize_storages()
            del _instances[oldest]

        # Create and initialize
        instance = LightRAG(workspace=workspace, ...)
        await instance.initialize_storages()
        _instances[workspace] = instance
        _lru_order.append(workspace)
        return instance

Alternative Considered: Use async_lru library or cachetools.TTLCache

  • Rejected Because: Adds external dependency; simple dict+lock is sufficient and well-understood

4. Header Routing Strategy

Decision: LIGHTRAG-WORKSPACE primary, X-Workspace-ID fallback

Findings:

  • Custom headers conventionally use X- prefix, but this is deprecated per RFC 6648
  • Product-specific headers (e.g., LIGHTRAG-WORKSPACE) are clearer and recommended
  • Fallback to common convention (X-Workspace-ID) aids adoption

Implementation:

from fastapi import Request

def get_workspace_from_request(request: Request) -> str | None:
    workspace = request.headers.get("LIGHTRAG-WORKSPACE", "").strip()
    if not workspace:
        workspace = request.headers.get("X-Workspace-ID", "").strip()
    return workspace or None

5. Configuration Schema

Decision: Three new environment variables

Variable                          Type  Default              Description
LIGHTRAG_DEFAULT_WORKSPACE        str   "" (from WORKSPACE)  Default workspace when no header
LIGHTRAG_ALLOW_DEFAULT_WORKSPACE  bool  true                 If false, reject requests without header
LIGHTRAG_MAX_WORKSPACES_IN_POOL   int   50                   Maximum concurrent workspace instances

Rationale:

  • LIGHTRAG_ prefix namespaces new vars to avoid conflicts
  • ALLOW_DEFAULT_WORKSPACE=false enables strict multi-tenant mode
  • Default pool size of 50 balances memory vs. reinitialization overhead

6. Workspace Identifier Validation

Decision: Alphanumeric, hyphens, underscores; 1-64 characters

Findings:

  • Must be safe for filesystem paths (workspace creates subdirectories)
  • Must be safe for database keys (used in storage namespacing)
  • Must prevent injection attacks (path traversal, SQL injection)

Validation Regex: ^[a-zA-Z0-9][a-zA-Z0-9_-]{0,63}$

  • Starts with alphanumeric (prevents hidden directories like .hidden)
  • Allows hyphens and underscores for readability
  • Max 64 chars (reasonable for identifiers, fits in most DB column sizes)
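The rule above, wrapped in a helper (the function name is illustrative):

```python
import re

# The validation regex from above: leading alphanumeric, then up to 63 more
# characters drawn from alphanumerics, hyphens, and underscores (64 max).
WORKSPACE_RE = re.compile(r"^[a-zA-Z0-9][a-zA-Z0-9_-]{0,63}$")

def is_valid_workspace(workspace: str) -> bool:
    # fullmatch avoids the trailing-newline edge case that `$` alone permits.
    return WORKSPACE_RE.fullmatch(workspace) is not None

print(is_valid_workspace("tenant-a"))  # True
print(is_valid_workspace(".hidden"))   # False: must start alphanumeric
print(is_valid_workspace("a/../etc"))  # False: path traversal rejected
print(is_valid_workspace("x" * 64))    # True: exactly 64 chars
print(is_valid_workspace("x" * 65))    # False: too long
```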

7. Error Handling

Decision: Return 400 for missing/invalid workspace; 503 for initialization failures

Scenario                          HTTP Status  Error Message
Missing header, default disabled  400          Missing LIGHTRAG-WORKSPACE header
Invalid workspace identifier      400          Invalid workspace identifier: must be alphanumeric...
Workspace initialization fails    503          Failed to initialize workspace: {details}

8. Logging Strategy

Decision: Log workspace identifier at INFO level; never log credentials

Implementation:

  • Log workspace on request: logger.info(f"Request to workspace: {workspace}")
  • Log pool events: logger.info(f"Initialized workspace: {workspace}")
  • Log evictions: logger.info(f"Evicted workspace from pool: {workspace}")
  • NEVER log: API keys, storage credentials, auth tokens

9. Test Strategy

Decision: Pytest with markers following existing patterns

Test Categories:

  1. Unit tests (@pytest.mark.offline): Workspace resolution, validation, pool logic
  2. Integration tests (@pytest.mark.integration): Full request flow with mock LLM/embedding
  3. Backward compatibility tests (@pytest.mark.offline): Single-workspace mode unchanged

Key Test Scenarios:

  • Two workspaces → ingest document in A → query from B returns nothing
  • No header + ALLOW_DEFAULT_WORKSPACE=true → uses default
  • No header + ALLOW_DEFAULT_WORKSPACE=false → returns 400
  • Pool at capacity → evicts LRU → new workspace initializes
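The first scenario can be sketched as an offline unit test against the namespacing rule from section 1 (a stub dictionary stands in for real storage; names are illustrative):

```python
# Hypothetical offline unit test for cross-workspace isolation. In the real
# suite this would carry @pytest.mark.offline; here it runs standalone.
def final_namespace(namespace: str, workspace: str) -> str:
    # Stub of the namespacing rule: "{workspace}:{namespace}" when set.
    return f"{workspace}:{namespace}" if workspace else namespace

def test_cross_workspace_isolation():
    store: dict[str, str] = {}
    # Ingest a document in workspace A.
    store[final_namespace("docs", "tenant-a")] = "ingested in A"
    # A query from workspace B sees nothing ingested in A.
    assert final_namespace("docs", "tenant-b") not in store

test_cross_workspace_isolation()
print("ok")
```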

Resolved Questions

Question                                          Resolution
How to handle concurrent init of same workspace?  asyncio.Lock ensures single initialization; others wait
Should evicted workspace finalize storage?        Yes, call finalize_storages() to release resources
How to share config between instances?            Clone config; only workspace differs per instance
Where to put pool management code?                New module workspace_manager.py

Next Steps

  1. Create data-model.md with entity definitions
  2. Document contracts (no new API endpoints; header-based routing is transparent)
  3. Create quickstart.md for multi-workspace deployment