LightRAG/specs/001-multi-workspace-server/spec.md
Clément THOMAS 62b2a71dda feat(api): add multi-workspace server support for multi-tenant deployments
Enable a single LightRAG server instance to serve multiple isolated workspaces
via HTTP header-based routing. This allows multi-tenant SaaS deployments where
each tenant's data is completely isolated.

Key features:
- Header-based workspace routing (LIGHTRAG-WORKSPACE, X-Workspace-ID fallback)
- Process-local pool of LightRAG instances with LRU eviction
- FastAPI dependency (get_rag) for workspace resolution per request
- Full backward compatibility - existing deployments work unchanged
- Strict multi-tenant mode option (LIGHTRAG_ALLOW_DEFAULT_WORKSPACE=false)
- Configurable pool size (LIGHTRAG_MAX_WORKSPACES_IN_POOL)
- Graceful shutdown with workspace finalization

Configuration:
- LIGHTRAG_DEFAULT_WORKSPACE: Default workspace (falls back to WORKSPACE)
- LIGHTRAG_ALLOW_DEFAULT_WORKSPACE: Require explicit header when false
- LIGHTRAG_MAX_WORKSPACES_IN_POOL: Max concurrent workspace instances (default: 50)

Files:
- New: lightrag/api/workspace_manager.py (core multi-workspace module)
- New: tests/test_multi_workspace_server.py (17 unit tests)
- New: render.yaml (Render deployment blueprint)
- Modified: All route files to use get_rag dependency
- Updated: README.md, env.example with documentation

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-12-01 12:07:22 +01:00

11 KiB

Feature Specification: Multi-Workspace Server Support

Feature Branch: 001-multi-workspace-server Created: 2025-12-01 Status: Draft Input: Multi-workspace/multi-tenant support at the server level for LightRAG Server with instance pooling and header-based workspace routing

User Scenarios & Testing (mandatory)

User Story 1 - Tenant-Isolated Document Ingestion (Priority: P1)

As a SaaS platform operator, I need each tenant's documents to be stored and indexed completely separately so that one tenant's data never appears in another tenant's queries, ensuring privacy and data isolation for multi-tenant deployments.

Why this priority: This is the core value proposition - without workspace isolation, the feature cannot support multi-tenant use cases. A SaaS operator cannot deploy without this guarantee.

Independent Test: Can be fully tested by ingesting a document for Tenant A, then querying from Tenant B and verifying the document is not accessible. Delivers the fundamental isolation guarantee.

Acceptance Scenarios:

  1. Given a server with multi-workspace enabled, When Tenant A sends a document upload request with workspace header "tenant_a", Then the document is stored in Tenant A's isolated workspace only
  2. Given Tenant A has ingested documents, When Tenant B queries the server with workspace header "tenant_b", Then Tenant B receives no results from Tenant A's documents
  3. Given Tenant A has ingested documents, When Tenant A queries with workspace header "tenant_a", Then Tenant A receives results from their own documents

User Story 2 - Header-Based Workspace Routing (Priority: P1)

As an API client developer, I need to specify which workspace my requests should target by including a header, so that my application can interact with the correct tenant's data without managing multiple server URLs.

Why this priority: This is the mechanism that enables isolation - equally critical as US1. Without header routing, clients cannot target specific workspaces.

Independent Test: Can be fully tested by sending requests with different workspace headers and verifying each targets the correct workspace.

Acceptance Scenarios:

  1. Given a valid request, When the LIGHTRAG-WORKSPACE header is set to "workspace_x", Then the request operates on workspace "workspace_x"
  2. Given a valid request without LIGHTRAG-WORKSPACE header, When the X-Workspace-ID header is set to "workspace_y", Then the request operates on workspace "workspace_y" (fallback)
  3. Given a request with both headers set to different values, When the server receives the request, Then LIGHTRAG-WORKSPACE takes precedence

User Story 3 - Backward Compatible Single-Workspace Mode (Priority: P2)

As an existing LightRAG user, I need my current deployment to continue working without changes, so that upgrading to the new version doesn't break my single-tenant setup or require configuration changes.

Why this priority: Critical for adoption - existing users must not be disrupted. However, new multi-tenant deployments are the primary goal.

Independent Test: Can be fully tested by deploying the new version with existing configuration and verifying all existing functionality works unchanged.

Acceptance Scenarios:

  1. Given an existing deployment using WORKSPACE env var, When no workspace header is sent in requests, Then requests use the configured default workspace
  2. Given an existing deployment, When upgraded to the new version without config changes, Then all existing functionality works identically
  3. Given default workspace is configured, When requests arrive without workspace headers, Then the server serves requests from the default workspace without errors

User Story 4 - Configurable Missing Header Behavior (Priority: P2)

As an operator of a strict multi-tenant deployment, I need to require workspace headers on all requests, so that I can prevent accidental data leakage from misconfigured clients defaulting to a shared workspace.

Why this priority: Important for security-conscious deployments but not required for basic functionality.

Independent Test: Can be fully tested by disabling default workspace and verifying requests without headers are rejected.

Acceptance Scenarios:

  1. Given default workspace is disabled in configuration, When a request arrives without any workspace header, Then the server rejects the request with a clear error message
  2. Given default workspace is enabled in configuration, When a request arrives without any workspace header, Then the request proceeds using the default workspace
  3. Given a rejected request due to missing header, When the client receives the error, Then the error message clearly indicates a workspace header is required

User Story 5 - Workspace Instance Management (Priority: P3)

As an operator of a high-traffic multi-tenant deployment, I need the server to efficiently manage workspace instances, so that the server can handle many tenants without excessive memory usage or startup delays.

Why this priority: Performance optimization - important for scale but basic functionality works without it.

Independent Test: Can be tested by monitoring memory usage as workspaces are created and verifying resource limits are respected.

Acceptance Scenarios:

  1. Given a request for a new workspace, When the workspace has not been accessed before, Then the server initializes it on-demand without blocking other requests
  2. Given the maximum workspace limit is configured, When the limit is reached and a new workspace is requested, Then the least recently used workspace is released to make room
  3. Given multiple concurrent requests for the same new workspace, When processed simultaneously, Then only one initialization occurs and all requests share the same instance

Edge Cases

  • What happens when workspace identifier contains special characters (slashes, unicode, empty string)?
    • System validates identifiers and rejects invalid patterns with clear error messages
  • How does the system handle concurrent initialization requests for the same workspace?
    • System ensures only one initialization occurs; concurrent requests wait for completion
  • What happens when a workspace initialization fails (storage unavailable)?
    • System returns an error for that request without affecting other workspaces
  • How does the system behave when the instance pool is full?
    • System evicts least-recently-used workspace and initializes the new one
  • What happens if the default workspace is not configured but required?
    • System returns a 400 error clearly indicating the missing configuration

Requirements (mandatory)

Functional Requirements

Workspace Routing:

  • FR-001: System MUST read workspace identifier from the LIGHTRAG-WORKSPACE request header
  • FR-002: System MUST fall back to X-Workspace-ID header if LIGHTRAG-WORKSPACE is not present
  • FR-003: System MUST support configuring a default workspace for requests without headers
  • FR-004: System MUST support rejecting requests without workspace headers (configurable)
  • FR-005: System MUST validate workspace identifiers (alphanumeric, hyphens, underscores, 1-64 characters)

Instance Management:

  • FR-006: System MUST maintain separate isolated workspace instances per workspace identifier
  • FR-007: System MUST initialize workspace instances on first access (lazy initialization)
  • FR-008: System MUST support configuring a maximum number of concurrent workspace instances
  • FR-009: System MUST evict least-recently-used instances when the limit is reached
  • FR-010: System MUST ensure thread-safe workspace instance access under concurrent requests

Data Isolation:

  • FR-011: System MUST ensure documents ingested in one workspace are not accessible from other workspaces
  • FR-012: System MUST ensure queries in one workspace only return results from that workspace
  • FR-013: System MUST ensure graph operations in one workspace do not affect other workspaces

Backward Compatibility:

  • FR-014: System MUST work unchanged for existing deployments without workspace headers
  • FR-015: System MUST respect existing WORKSPACE environment variable as default
  • FR-016: System MUST not change existing request/response formats

Security:

  • FR-017: System MUST enforce authentication before workspace routing (workspace header does not bypass auth)
  • FR-018: System MUST log workspace identifiers in access logs for audit purposes
  • FR-019: System MUST NOT log sensitive configuration values (credentials, API keys)

Configuration:

  • FR-020: System MUST support LIGHTRAG_DEFAULT_WORKSPACE environment variable
  • FR-021: System MUST support LIGHTRAG_ALLOW_DEFAULT_WORKSPACE environment variable (true/false)
  • FR-022: System MUST support LIGHTRAG_MAX_WORKSPACES_IN_POOL environment variable (optional)

Key Entities

  • Workspace: A logical isolation boundary identified by a unique string. Contains all data (documents, embeddings, graphs) for one tenant. Key attributes: identifier (string), creation time, last access time
  • Workspace Instance: A running instance serving requests for a specific workspace. Relationship: one-to-one with Workspace when active
  • Instance Pool: Collection of active workspace instances. Key attributes: maximum size, current size, eviction policy (LRU)

Success Criteria (mandatory)

Measurable Outcomes

  • SC-001: Existing single-workspace deployments continue working with zero configuration changes after upgrade
  • SC-002: Data from Workspace A is never returned in queries from Workspace B (100% isolation)
  • SC-003: First request to a new workspace completes initialization within 5 seconds under normal conditions
  • SC-004: Workspace switching via header adds less than 10ms overhead per request
  • SC-005: Server supports at least 50 concurrent workspace instances (configurable)
  • SC-006: Memory usage per workspace instance remains proportional to single-workspace deployment
  • SC-007: All multi-workspace functionality is covered by automated tests demonstrating isolation

Assumptions

  • Workspace identifiers are provided by trusted upstream systems (API gateway, SaaS platform) after authentication
  • The underlying storage backends (databases, vector stores) support namespace isolation through the existing workspace parameter
  • Operators will configure appropriate memory limits based on their workload
  • LRU eviction is acceptable for workspace instance management (frequently accessed workspaces stay loaded)