LightRAG/specs/001-multi-workspace-server/spec.md

# Feature Specification: Multi-Workspace Server Support

**Feature Branch**: `001-multi-workspace-server`
**Created**: 2025-12-01
**Status**: Draft
**Input**: Multi-workspace/multi-tenant support at the server level for LightRAG Server with instance pooling and header-based workspace routing

## User Scenarios & Testing *(mandatory)*

### User Story 1 - Tenant-Isolated Document Ingestion (Priority: P1)

As a SaaS platform operator, I need each tenant's documents to be stored and indexed completely separately so that one tenant's data never appears in another tenant's queries, ensuring privacy and data isolation for multi-tenant deployments.

**Why this priority**: This is the core value proposition - without workspace isolation, the feature cannot support multi-tenant use cases. A SaaS operator cannot deploy without this guarantee.

**Independent Test**: Can be fully tested by ingesting a document for Tenant A, then querying from Tenant B and verifying the document is not accessible. Delivers the fundamental isolation guarantee.

**Acceptance Scenarios**:

1. **Given** a server with multi-workspace enabled, **When** Tenant A sends a document upload request with workspace header "tenant_a", **Then** the document is stored in Tenant A's isolated workspace only
2. **Given** Tenant A has ingested documents, **When** Tenant B queries the server with workspace header "tenant_b", **Then** Tenant B receives no results from Tenant A's documents
3. **Given** Tenant A has ingested documents, **When** Tenant A queries with workspace header "tenant_a", **Then** Tenant A receives results from their own documents
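
The acceptance scenarios above can be sketched as an executable check. This is only an illustration of the shape of the test: `InMemoryWorkspaceStore` is a hypothetical stand-in for the server (not a LightRAG class), and substring matching stands in for real retrieval.

```python
from collections import defaultdict
from typing import DefaultDict, List


class InMemoryWorkspaceStore:
    """Hypothetical stand-in for the server: one document list per workspace."""

    def __init__(self) -> None:
        self._docs: DefaultDict[str, List[str]] = defaultdict(list)

    def ingest(self, workspace: str, text: str) -> None:
        # Documents are only ever appended to the list of the named workspace.
        self._docs[workspace].append(text)

    def query(self, workspace: str, term: str) -> List[str]:
        # Only the named workspace is searched; .get() avoids creating new keys.
        return [d for d in self._docs.get(workspace, []) if term.lower() in d.lower()]


def check_isolation(store: InMemoryWorkspaceStore) -> None:
    """Scenarios 1-3: ingest as tenant_a, query as tenant_b, then as tenant_a."""
    doc = "Quarterly revenue report for Tenant A"
    store.ingest("tenant_a", doc)
    assert store.query("tenant_b", "revenue") == []
    assert store.query("tenant_a", "revenue") == [doc]
```

A real test would issue HTTP requests with the workspace header instead of calling the store directly, but the assertions would keep the same structure.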

---

### User Story 2 - Header-Based Workspace Routing (Priority: P1)

As an API client developer, I need to specify which workspace my requests should target by including a header, so that my application can interact with the correct tenant's data without managing multiple server URLs.

**Why this priority**: This is the mechanism that enables isolation, so it is as critical as US1. Without header routing, clients cannot target specific workspaces.

**Independent Test**: Can be fully tested by sending requests with different workspace headers and verifying each targets the correct workspace.

**Acceptance Scenarios**:

1. **Given** a valid request, **When** the `LIGHTRAG-WORKSPACE` header is set to "workspace_x", **Then** the request operates on workspace "workspace_x"
2. **Given** a valid request without `LIGHTRAG-WORKSPACE` header, **When** the `X-Workspace-ID` header is set to "workspace_y", **Then** the request operates on workspace "workspace_y" (fallback)
3. **Given** a request with both headers set to different values, **When** the server receives the request, **Then** `LIGHTRAG-WORKSPACE` takes precedence
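
The precedence rule in these scenarios can be sketched as a small pure function. The function name is hypothetical, and treating an empty or whitespace-only header value as absent is an assumption that this specification does not state.

```python
from typing import Mapping, Optional

PRIMARY_HEADER = "LIGHTRAG-WORKSPACE"
FALLBACK_HEADER = "X-Workspace-ID"


def resolve_workspace_header(headers: Mapping[str, str]) -> Optional[str]:
    """Return the workspace named by the request headers, or None if absent."""
    # HTTP header names are case-insensitive, so normalize names before lookup.
    normalized = {name.lower(): value.strip() for name, value in headers.items()}
    # The primary header is checked first, so it wins when both are present.
    for name in (PRIMARY_HEADER, FALLBACK_HEADER):
        value = normalized.get(name.lower())
        if value:  # assumption: an empty value counts as "not set"
            return value
    return None
```

Returning `None` rather than raising keeps the decision about missing headers (default workspace or rejection) outside of header parsing.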

---

### User Story 3 - Backward Compatible Single-Workspace Mode (Priority: P2)

As an existing LightRAG user, I need my current deployment to continue working without changes, so that upgrading to the new version doesn't break my single-tenant setup or require configuration changes.

**Why this priority**: Critical for adoption - existing users must not be disrupted. However, new multi-tenant deployments are the primary goal.

**Independent Test**: Can be fully tested by deploying the new version with existing configuration and verifying all existing functionality works unchanged.

**Acceptance Scenarios**:

1. **Given** an existing deployment using `WORKSPACE` env var, **When** no workspace header is sent in requests, **Then** requests use the configured default workspace
2. **Given** an existing deployment, **When** upgraded to the new version without config changes, **Then** all existing functionality works identically
3. **Given** default workspace is configured, **When** requests arrive without workspace headers, **Then** the server serves requests from the default workspace without errors

---

### User Story 4 - Configurable Missing Header Behavior (Priority: P2)

As an operator of a strict multi-tenant deployment, I need to require workspace headers on all requests, so that I can prevent accidental data leakage from misconfigured clients defaulting to a shared workspace.

**Why this priority**: Important for security-conscious deployments but not required for basic functionality.

**Independent Test**: Can be fully tested by disabling default workspace and verifying requests without headers are rejected.

**Acceptance Scenarios**:

1. **Given** default workspace is disabled in configuration, **When** a request arrives without any workspace header, **Then** the server rejects the request with a clear error message
2. **Given** default workspace is enabled in configuration, **When** a request arrives without any workspace header, **Then** the request proceeds using the default workspace
3. **Given** a rejected request due to missing header, **When** the client receives the error, **Then** the error message clearly indicates a workspace header is required
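
A minimal sketch of the decision these scenarios describe. `WorkspacePolicy`, `MissingWorkspaceHeaderError`, and `select_workspace` are hypothetical names, and how the error maps to an HTTP status is left to the server layer.

```python
from dataclasses import dataclass
from typing import Optional


class MissingWorkspaceHeaderError(Exception):
    """Raised when a request names no workspace and no default may be used."""


@dataclass(frozen=True)
class WorkspacePolicy:
    default_workspace: Optional[str]  # None means "no default configured"
    allow_default: bool               # False means "header is mandatory"


def select_workspace(header_value: Optional[str], policy: WorkspacePolicy) -> str:
    """Pick the workspace for a request, or raise with an actionable message."""
    if header_value:
        return header_value
    if policy.allow_default and policy.default_workspace is not None:
        return policy.default_workspace
    raise MissingWorkspaceHeaderError(
        "A workspace header is required: set LIGHTRAG-WORKSPACE or X-Workspace-ID."
    )
```

For example, `select_workspace(None, WorkspacePolicy("shared", allow_default=True))` returns `"shared"`, while the same call with `allow_default=False` raises.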

---

### User Story 5 - Workspace Instance Management (Priority: P3)

As an operator of a high-traffic multi-tenant deployment, I need the server to efficiently manage workspace instances, so that the server can handle many tenants without excessive memory usage or startup delays.

**Why this priority**: Performance optimization - important for scale but basic functionality works without it.

**Independent Test**: Can be tested by monitoring memory usage as workspaces are created and verifying resource limits are respected.

**Acceptance Scenarios**:

1. **Given** a request for a new workspace, **When** the workspace has not been accessed before, **Then** the server initializes it on-demand without blocking other requests
2. **Given** the maximum workspace limit is configured, **When** the limit is reached and a new workspace is requested, **Then** the least recently used workspace is released to make room
3. **Given** multiple concurrent requests for the same new workspace, **When** processed simultaneously, **Then** only one initialization occurs and all requests share the same instance
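
One way to meet these three scenarios in an asyncio server is to pool initialization tasks rather than finished instances. The sketch below is an assumption-laden illustration, not the planned implementation: `WorkspacePool`, `factory`, and `on_evict` are hypothetical names, and it does not wait for in-flight requests on an evicted instance to finish.

```python
import asyncio
from collections import OrderedDict
from typing import Awaitable, Callable, Generic, Optional, TypeVar

T = TypeVar("T")


class WorkspacePool(Generic[T]):
    """LRU pool keyed by workspace id; each entry is the task that builds the instance."""

    def __init__(
        self,
        factory: Callable[[str], Awaitable[T]],
        max_size: int,
        on_evict: Optional[Callable[[T], Awaitable[None]]] = None,
    ) -> None:
        if max_size < 1:
            raise ValueError("max_size must be at least 1")
        self._factory = factory
        self._max_size = max_size
        self._on_evict = on_evict
        self._tasks: "OrderedDict[str, asyncio.Future]" = OrderedDict()

    async def get(self, workspace: str) -> T:
        task = self._tasks.get(workspace)
        evicted = []
        if task is None:
            # No await between lookup and insert, so concurrent callers on the
            # same event loop all find this one task: a single initialization.
            task = asyncio.ensure_future(self._factory(workspace))
            self._tasks[workspace] = task
            while len(self._tasks) > self._max_size:
                _, oldest = self._tasks.popitem(last=False)  # least recently used
                evicted.append(oldest)
        else:
            self._tasks.move_to_end(workspace)  # mark as most recently used
        for old in evicted:
            await self._release(old)
        try:
            # shield: cancelling one waiter must not cancel the shared init task.
            return await asyncio.shield(task)
        except Exception:
            # Drop a failed initialization so a later request can retry it.
            if self._tasks.get(workspace) is task:
                del self._tasks[workspace]
            raise

    async def _release(self, task: "asyncio.Future") -> None:
        try:
            instance = await task
        except Exception:
            return  # initialization failed, so there is nothing to release
        if self._on_evict is not None:
            await self._on_evict(instance)
```

Because the pool stores tasks, a slow initialization for one workspace only delays callers waiting on that workspace; requests for other workspaces proceed on the event loop.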

---

### Edge Cases

- What happens when a workspace identifier contains special characters (slashes, Unicode) or is an empty string?
  - System validates identifiers and rejects invalid patterns with clear error messages
- How does the system handle concurrent initialization requests for the same workspace?
  - System ensures only one initialization occurs; concurrent requests wait for completion
- What happens when a workspace initialization fails (storage unavailable)?
  - System returns an error for that request without affecting other workspaces
- How does the system behave when the instance pool is full?
  - System evicts least-recently-used workspace and initializes the new one
- What happens if the default workspace is not configured but required?
  - System returns a 400 error clearly indicating the missing configuration

## Requirements *(mandatory)*

### Functional Requirements

**Workspace Routing:**
- **FR-001**: System MUST read workspace identifier from the `LIGHTRAG-WORKSPACE` request header
- **FR-002**: System MUST fall back to `X-Workspace-ID` header if `LIGHTRAG-WORKSPACE` is not present
- **FR-003**: System MUST support configuring a default workspace for requests without headers
- **FR-004**: System MUST support rejecting requests without workspace headers (configurable)
- **FR-005**: System MUST validate workspace identifiers (alphanumeric, hyphens, underscores, 1-64 characters)
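
The identifier rule in FR-005 fits in a single regular expression. The sketch below assumes "alphanumeric" means ASCII letters and digits only, which this specification does not state explicitly; the function name is hypothetical.

```python
import re

# Assumption: ASCII letters and digits only, plus hyphen and underscore, 1-64 chars.
_WORKSPACE_ID = re.compile(r"[A-Za-z0-9_-]{1,64}")


def is_valid_workspace_id(identifier: str) -> bool:
    """Return True if the identifier satisfies the FR-005 pattern."""
    # fullmatch rejects partial matches, trailing newlines, and the empty string.
    return _WORKSPACE_ID.fullmatch(identifier) is not None
```

Rejecting slashes and dots at this layer also keeps identifiers safe to use in storage paths or namespace prefixes.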

**Instance Management:**
- **FR-006**: System MUST maintain separate isolated workspace instances per workspace identifier
- **FR-007**: System MUST initialize workspace instances on first access (lazy initialization)
- **FR-008**: System MUST support configuring a maximum number of concurrent workspace instances
- **FR-009**: System MUST evict least-recently-used instances when the limit is reached
- **FR-010**: System MUST ensure thread-safe workspace instance access under concurrent requests

**Data Isolation:**
- **FR-011**: System MUST ensure documents ingested in one workspace are not accessible from other workspaces
- **FR-012**: System MUST ensure queries in one workspace only return results from that workspace
- **FR-013**: System MUST ensure graph operations in one workspace do not affect other workspaces

**Backward Compatibility:**
- **FR-014**: System MUST work unchanged for existing deployments without workspace headers
- **FR-015**: System MUST respect existing `WORKSPACE` environment variable as default
- **FR-016**: System MUST NOT change existing request/response formats

**Security:**
- **FR-017**: System MUST enforce authentication before workspace routing (workspace header does not bypass auth)
- **FR-018**: System MUST log workspace identifiers in access logs for audit purposes
- **FR-019**: System MUST NOT log sensitive configuration values (credentials, API keys)

**Configuration:**
- **FR-020**: System MUST support `LIGHTRAG_DEFAULT_WORKSPACE` environment variable
- **FR-021**: System MUST support `LIGHTRAG_ALLOW_DEFAULT_WORKSPACE` environment variable (true/false)
- **FR-022**: System MUST support `LIGHTRAG_MAX_WORKSPACES_IN_POOL` environment variable (optional)
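
A sketch of how these three variables might be read at startup. Several choices here are assumptions rather than requirements of this specification: `LIGHTRAG_DEFAULT_WORKSPACE` taking precedence over the existing `WORKSPACE` variable, the default workspace being allowed when `LIGHTRAG_ALLOW_DEFAULT_WORKSPACE` is unset, and an unset pool limit being represented as `None`. `WorkspaceSettings` and `load_workspace_settings` are hypothetical names.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class WorkspaceSettings:
    default_workspace: Optional[str]
    allow_default_workspace: bool
    max_workspaces_in_pool: Optional[int]  # None: no limit configured


def _parse_bool(name: str, raw: str) -> bool:
    value = raw.strip().lower()
    if value == "true":
        return True
    if value == "false":
        return False
    raise ValueError(f"{name} must be 'true' or 'false', got {raw!r}")


def load_workspace_settings(env: Optional[Mapping[str, str]] = None) -> WorkspaceSettings:
    """Read workspace settings from the environment (or a mapping, for tests)."""
    if env is None:
        env = os.environ
    # Assumption: the new variable overrides the legacy WORKSPACE variable.
    default = env.get("LIGHTRAG_DEFAULT_WORKSPACE")
    if default is None:
        default = env.get("WORKSPACE")
    raw_allow = env.get("LIGHTRAG_ALLOW_DEFAULT_WORKSPACE")
    # Assumption: unset means allowed, so existing deployments keep working.
    allow = True if raw_allow is None else _parse_bool("LIGHTRAG_ALLOW_DEFAULT_WORKSPACE", raw_allow)
    raw_max = env.get("LIGHTRAG_MAX_WORKSPACES_IN_POOL")
    max_pool: Optional[int] = None
    if raw_max is not None:
        max_pool = int(raw_max)  # raises ValueError on non-numeric input
        if max_pool < 1:
            raise ValueError("LIGHTRAG_MAX_WORKSPACES_IN_POOL must be at least 1")
    return WorkspaceSettings(default, allow, max_pool)
```

Failing fast on malformed values at startup avoids a server that silently runs with a different isolation policy than the operator intended.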

### Key Entities

- **Workspace**: A logical isolation boundary identified by a unique string. Contains all data (documents, embeddings, graphs) for one tenant. Key attributes: identifier (string), creation time, last access time
- **Workspace Instance**: A running instance serving requests for a specific workspace. Relationship: one-to-one with Workspace when active
- **Instance Pool**: Collection of active workspace instances. Key attributes: maximum size, current size, eviction policy (LRU)

## Success Criteria *(mandatory)*

### Measurable Outcomes

- **SC-001**: Existing single-workspace deployments continue working with zero configuration changes after upgrade
- **SC-002**: Data from Workspace A is never returned in queries from Workspace B (100% isolation)
- **SC-003**: First request to a new workspace completes initialization within 5 seconds under normal conditions
- **SC-004**: Workspace switching via header adds less than 10ms overhead per request
- **SC-005**: Server supports at least 50 concurrent workspace instances (configurable)
- **SC-006**: Memory usage per workspace instance remains comparable to the memory usage of a single-workspace deployment
- **SC-007**: All multi-workspace functionality is covered by automated tests demonstrating isolation

## Assumptions

- Workspace identifiers are provided by trusted upstream systems (API gateway, SaaS platform) after authentication
- The underlying storage backends (databases, vector stores) support namespace isolation through the existing workspace parameter
- Operators will configure appropriate memory limits based on their workload
- LRU eviction is acceptable for workspace instance management (frequently accessed workspaces stay loaded)