Clément THOMAS 62b2a71dda feat(api): add multi-workspace server support for multi-tenant deployments

Enable a single LightRAG server instance to serve multiple isolated workspaces
via HTTP header-based routing. This allows multi-tenant SaaS deployments where
each tenant's data is completely isolated.

Key features:
- Header-based workspace routing (LIGHTRAG-WORKSPACE, X-Workspace-ID fallback)
- Process-local pool of LightRAG instances with LRU eviction
- FastAPI dependency (get_rag) for workspace resolution per request
- Full backward compatibility - existing deployments work unchanged
- Strict multi-tenant mode option (LIGHTRAG_ALLOW_DEFAULT_WORKSPACE=false)
- Configurable pool size (LIGHTRAG_MAX_WORKSPACES_IN_POOL)
- Graceful shutdown with workspace finalization

Configuration:
- LIGHTRAG_DEFAULT_WORKSPACE: Default workspace (falls back to WORKSPACE)
- LIGHTRAG_ALLOW_DEFAULT_WORKSPACE: Require explicit header when false
- LIGHTRAG_MAX_WORKSPACES_IN_POOL: Max concurrent workspace instances (default: 50)

Files:
- New: lightrag/api/workspace_manager.py (core multi-workspace module)
- New: tests/test_multi_workspace_server.py (17 unit tests)
- New: render.yaml (Render deployment blueprint)
- Modified: All route files to use get_rag dependency
- Updated: README.md, env.example with documentation

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>

2025-12-01 12:07:22 +01:00


Feature Specification: Multi-Workspace Server Support

Feature Branch: 001-multi-workspace-server
Created: 2025-12-01
Status: Draft
Input: Multi-workspace/multi-tenant support at the server level for LightRAG Server with instance pooling and header-based workspace routing

User Scenarios & Testing (mandatory)

User Story 1 - Tenant-Isolated Document Ingestion (Priority: P1)

As a SaaS platform operator, I need each tenant's documents to be stored and indexed completely separately so that one tenant's data never appears in another tenant's queries, ensuring privacy and data isolation for multi-tenant deployments.

Why this priority: This is the core value proposition - without workspace isolation, the feature cannot support multi-tenant use cases. A SaaS operator cannot deploy without this guarantee.

Independent Test: Can be fully tested by ingesting a document for Tenant A, then querying from Tenant B and verifying the document is not accessible. Delivers the fundamental isolation guarantee.

Acceptance Scenarios:

Given a server with multi-workspace enabled, When Tenant A sends a document upload request with workspace header "tenant_a", Then the document is stored in Tenant A's isolated workspace only
Given Tenant A has ingested documents, When Tenant B queries the server with workspace header "tenant_b", Then Tenant B receives no results from Tenant A's documents
Given Tenant A has ingested documents, When Tenant A queries with workspace header "tenant_a", Then Tenant A receives results from their own documents

User Story 2 - Header-Based Workspace Routing (Priority: P1)

As an API client developer, I need to specify which workspace my requests should target by including a header, so that my application can interact with the correct tenant's data without managing multiple server URLs.

Why this priority: This is the mechanism that enables isolation, so it is as critical as US1. Without header routing, clients cannot target specific workspaces.

Independent Test: Can be fully tested by sending requests with different workspace headers and verifying each targets the correct workspace.

Acceptance Scenarios:

Given a valid request, When the LIGHTRAG-WORKSPACE header is set to "workspace_x", Then the request operates on workspace "workspace_x"
Given a valid request without LIGHTRAG-WORKSPACE header, When the X-Workspace-ID header is set to "workspace_y", Then the request operates on workspace "workspace_y" (fallback)
Given a request with both headers set to different values, When the server receives the request, Then LIGHTRAG-WORKSPACE takes precedence
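
The precedence rule in these scenarios can be sketched in a few lines of Python. This is only an illustration, not the code in lightrag/api/workspace_manager.py: the function name `workspace_from_headers` is hypothetical, and treating a blank header value as "not set" is an assumption the spec does not state.

```python
from typing import Mapping, Optional

PRIMARY_HEADER = "LIGHTRAG-WORKSPACE"
FALLBACK_HEADER = "X-Workspace-ID"


def workspace_from_headers(headers: Mapping[str, str]) -> Optional[str]:
    """Return the workspace named by the request headers, or None if absent.

    Hypothetical helper: the primary header wins, the fallback header is
    consulted only when the primary one is missing.
    """
    # HTTP header names are case-insensitive, so normalise keys before lookup.
    lowered = {name.lower(): value.strip() for name, value in headers.items()}
    for name in (PRIMARY_HEADER, FALLBACK_HEADER):
        value = lowered.get(name.lower())
        if value:  # assumption: a blank value counts as "header not set"
            return value
    return None
```

Returning `None` instead of raising keeps the decision about missing headers (default workspace or rejection) in a separate step.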

User Story 3 - Backward Compatible Single-Workspace Mode (Priority: P2)

As an existing LightRAG user, I need my current deployment to continue working without changes, so that upgrading to the new version doesn't break my single-tenant setup or require configuration changes.

Why this priority: Critical for adoption - existing users must not be disrupted. However, new multi-tenant deployments are the primary goal.

Independent Test: Can be fully tested by deploying the new version with existing configuration and verifying all existing functionality works unchanged.

Acceptance Scenarios:

Given an existing deployment using WORKSPACE env var, When no workspace header is sent in requests, Then requests use the configured default workspace
Given an existing deployment, When upgraded to the new version without config changes, Then all existing functionality works identically
Given default workspace is configured, When requests arrive without workspace headers, Then the server serves requests from the default workspace without errors

User Story 4 - Configurable Missing Header Behavior (Priority: P2)

As an operator of a strict multi-tenant deployment, I need to require workspace headers on all requests, so that I can prevent accidental data leakage from misconfigured clients defaulting to a shared workspace.

Why this priority: Important for security-conscious deployments but not required for basic functionality.

Independent Test: Can be fully tested by disabling default workspace and verifying requests without headers are rejected.

Acceptance Scenarios:

Given default workspace is disabled in configuration, When a request arrives without any workspace header, Then the server rejects the request with a clear error message
Given default workspace is enabled in configuration, When a request arrives without any workspace header, Then the request proceeds using the default workspace
Given a rejected request due to missing header, When the client receives the error, Then the error message clearly indicates a workspace header is required
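
A minimal sketch of the missing-header decision, assuming the header value has already been extracted. `MissingWorkspaceError` and `resolve_workspace` are hypothetical names chosen for illustration; the spec only requires that the rejection carries a clear message.

```python
from typing import Optional


class MissingWorkspaceError(Exception):
    """Hypothetical error raised when no workspace can be resolved."""


def resolve_workspace(
    header_value: Optional[str],
    default_workspace: Optional[str],
    allow_default: bool,
) -> str:
    """Pick the workspace for a request, or raise with an explanatory message."""
    if header_value:
        return header_value
    if not allow_default:
        # Strict multi-tenant mode: never fall back silently.
        raise MissingWorkspaceError(
            "Workspace header required: set LIGHTRAG-WORKSPACE (or X-Workspace-ID)."
        )
    if not default_workspace:
        # Default allowed but nothing configured to fall back to.
        raise MissingWorkspaceError(
            "No default workspace is configured; send a workspace header."
        )
    return default_workspace
```

In a FastAPI dependency, such an error would typically be translated into an `HTTPException` with a 4xx status code so the client sees the message.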

User Story 5 - Workspace Instance Management (Priority: P3)

As an operator of a high-traffic multi-tenant deployment, I need the server to efficiently manage workspace instances, so that the server can handle many tenants without excessive memory usage or startup delays.

Why this priority: Performance optimization - important for scale but basic functionality works without it.

Independent Test: Can be tested by monitoring memory usage as workspaces are created and verifying resource limits are respected.

Acceptance Scenarios:

Given a request for a new workspace, When the workspace has not been accessed before, Then the server initializes it on-demand without blocking other requests
Given the maximum workspace limit is configured, When the limit is reached and a new workspace is requested, Then the least recently used workspace is released to make room
Given multiple concurrent requests for the same new workspace, When processed simultaneously, Then only one initialization occurs and all requests share the same instance
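
The three scenarios above (on-demand initialization, LRU eviction, a single shared initialization) can be illustrated with a small asyncio pool. This is a sketch under stated assumptions, not the shipped implementation: `WorkspacePool`, its `factory` argument, and `loaded()` are hypothetical, and a real server would also finalize the evicted instance's storages.

```python
import asyncio
from collections import OrderedDict
from typing import Any, Awaitable, Callable, Dict, List


class WorkspacePool:
    """Hypothetical process-local pool with LRU eviction and single-flight init."""

    def __init__(self, factory: Callable[[str], Awaitable[Any]], max_size: int = 50):
        self._factory = factory          # coroutine that builds one instance
        self._max_size = max_size
        self._instances: "OrderedDict[str, Any]" = OrderedDict()
        self._pending: Dict[str, "asyncio.Task[Any]"] = {}

    async def get(self, workspace: str) -> Any:
        if workspace in self._instances:
            self._instances.move_to_end(workspace)  # mark as most recently used
            return self._instances[workspace]
        task = self._pending.get(workspace)
        if task is None:
            # First caller starts initialization; later callers reuse the task.
            task = asyncio.create_task(self._create(workspace))
            self._pending[workspace] = task
        # shield() keeps one cancelled request from cancelling the shared init.
        return await asyncio.shield(task)

    async def _create(self, workspace: str) -> Any:
        try:
            instance = await self._factory(workspace)
            while len(self._instances) >= self._max_size:
                # Drop the least recently used entry (finalization omitted here).
                self._instances.popitem(last=False)
            self._instances[workspace] = instance
            return instance
        finally:
            # On failure the entry is cleared, so a later request can retry.
            self._pending.pop(workspace, None)

    def loaded(self) -> List[str]:
        """Workspace names currently in the pool, least recently used first."""
        return list(self._instances)
```

Because only the requests for one workspace await its initialization task, requests for other workspaces are not blocked, and a failed initialization affects only the callers waiting on that workspace.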

Edge Cases

What happens when workspace identifier contains special characters (slashes, unicode, empty string)?
- System validates identifiers and rejects invalid patterns with clear error messages
How does the system handle concurrent initialization requests for the same workspace?
- System ensures only one initialization occurs; concurrent requests wait for completion
What happens when a workspace initialization fails (storage unavailable)?
- System returns an error for that request without affecting other workspaces
How does the system behave when the instance pool is full?
- System evicts least-recently-used workspace and initializes the new one
What happens if the default workspace is not configured but required?
- System returns a 400 error clearly indicating the missing configuration

Requirements (mandatory)

Functional Requirements

Workspace Routing:

FR-001: System MUST read workspace identifier from the LIGHTRAG-WORKSPACE request header
FR-002: System MUST fall back to X-Workspace-ID header if LIGHTRAG-WORKSPACE is not present
FR-003: System MUST support configuring a default workspace for requests without headers
FR-004: System MUST support rejecting requests without workspace headers (configurable)
FR-005: System MUST validate workspace identifiers (alphanumeric, hyphens, underscores, 1-64 characters)
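
FR-005 maps naturally onto a single regular expression. The sketch below assumes "alphanumeric" means ASCII letters and digits, which the spec does not say explicitly; `is_valid_workspace_id` is a hypothetical helper name.

```python
import re

# Assumption: ASCII letters and digits only, plus hyphen and underscore, 1-64 chars.
_WORKSPACE_ID = re.compile(r"[A-Za-z0-9_-]{1,64}")


def is_valid_workspace_id(value: str) -> bool:
    """Return True when value satisfies the FR-005 identifier rules."""
    # fullmatch() rejects partial matches such as "tenant/a" or trailing newlines.
    return _WORKSPACE_ID.fullmatch(value) is not None
```

Rejecting slashes and dots at this layer also keeps identifiers safe to embed in file paths or storage namespaces.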

Instance Management:

FR-006: System MUST maintain separate isolated workspace instances per workspace identifier
FR-007: System MUST initialize workspace instances on first access (lazy initialization)
FR-008: System MUST support configuring a maximum number of concurrent workspace instances
FR-009: System MUST evict least-recently-used instances when the limit is reached
FR-010: System MUST ensure thread-safe workspace instance access under concurrent requests

Data Isolation:

FR-011: System MUST ensure documents ingested in one workspace are not accessible from other workspaces
FR-012: System MUST ensure queries in one workspace only return results from that workspace
FR-013: System MUST ensure graph operations in one workspace do not affect other workspaces

Backward Compatibility:

FR-014: System MUST work unchanged for existing deployments without workspace headers
FR-015: System MUST respect existing WORKSPACE environment variable as default
FR-016: System MUST NOT change existing request/response formats

Security:

FR-017: System MUST enforce authentication before workspace routing (workspace header does not bypass auth)
FR-018: System MUST log workspace identifiers in access logs for audit purposes
FR-019: System MUST NOT log sensitive configuration values (credentials, API keys)

Configuration:

FR-020: System MUST support LIGHTRAG_DEFAULT_WORKSPACE environment variable
FR-021: System MUST support LIGHTRAG_ALLOW_DEFAULT_WORKSPACE environment variable (true/false)
FR-022: System MUST support LIGHTRAG_MAX_WORKSPACES_IN_POOL environment variable (optional)
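
One way the three variables could be read at startup is sketched below. The `WorkspaceSettings` dataclass and `load_workspace_settings` function are hypothetical. The fallback to WORKSPACE and the pool default of 50 come from the commit description, while defaulting LIGHTRAG_ALLOW_DEFAULT_WORKSPACE to true is an assumption inferred from the backward-compatibility requirement.

```python
import os
from dataclasses import dataclass
from typing import Mapping


@dataclass(frozen=True)
class WorkspaceSettings:
    """Hypothetical container for the multi-workspace configuration."""

    default_workspace: str
    allow_default_workspace: bool
    max_workspaces_in_pool: int


def load_workspace_settings(env: Mapping[str, str] = os.environ) -> WorkspaceSettings:
    """Read the workspace settings from an environment-like mapping."""
    # LIGHTRAG_DEFAULT_WORKSPACE wins; the legacy WORKSPACE variable is the fallback.
    default = env.get("LIGHTRAG_DEFAULT_WORKSPACE") or env.get("WORKSPACE", "")
    # Assumption: unset means "true", so existing deployments keep working.
    allow = env.get("LIGHTRAG_ALLOW_DEFAULT_WORKSPACE", "true").strip().lower() == "true"
    max_pool = int(env.get("LIGHTRAG_MAX_WORKSPACES_IN_POOL", "50"))
    return WorkspaceSettings(default, allow, max_pool)
```

Passing the mapping as a parameter keeps the function easy to test without mutating the process environment.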

Key Entities

Workspace: A logical isolation boundary identified by a unique string. Contains all data (documents, embeddings, graphs) for one tenant. Key attributes: identifier (string), creation time, last access time
Workspace Instance: A running instance serving requests for a specific workspace. Relationship: one-to-one with Workspace when active
Instance Pool: Collection of active workspace instances. Key attributes: maximum size, current size, eviction policy (LRU)

Success Criteria (mandatory)

Measurable Outcomes

SC-001: Existing single-workspace deployments continue working with zero configuration changes after upgrade
SC-002: Data from Workspace A is never returned in queries from Workspace B (100% isolation)
SC-003: First request to a new workspace completes initialization within 5 seconds under normal conditions
SC-004: Workspace switching via header adds less than 10ms overhead per request
SC-005: Server supports at least 50 concurrent workspace instances (configurable)
SC-006: Memory usage per workspace instance remains comparable to that of a single-workspace deployment
SC-007: All multi-workspace functionality is covered by automated tests demonstrating isolation

Assumptions

Workspace identifiers are provided by trusted upstream systems (API gateway, SaaS platform) after authentication
The underlying storage backends (databases, vector stores) support namespace isolation through the existing workspace parameter
Operators will configure appropriate memory limits based on their workload
LRU eviction is acceptable for workspace instance management (frequently accessed workspaces stay loaded)
