
# Data Model: Multi-Workspace Server Support

**Date**: 2025-12-01
**Feature**: 001-multi-workspace-server

## Overview

This feature adds server-level workspace management without introducing any new persistent data models. The entities described here exist only at runtime and manage per-workspace LightRAG instances.

## Entities

### WorkspaceInstance

Represents a running LightRAG instance serving requests for a specific workspace.

| Attribute | Type | Description |
|-----------|------|-------------|
| `workspace_id` | `str` | Unique identifier for the workspace (validated, 1-64 chars) |
| `rag_instance` | `LightRAG` | The initialized LightRAG object |
| `created_at` | `datetime` | When the instance was first created |
| `last_accessed_at` | `datetime` | When the instance was last used (for LRU) |
| `status` | `enum` | `initializing`, `ready`, `finalizing`, `error` |

**Validation Rules**:
- `workspace_id` must match: `^[a-zA-Z0-9][a-zA-Z0-9_-]{0,63}$`
- `workspace_id` must not be an empty string (use an explicit default workspace instead)
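
The validation rules above can be checked with a small helper. This is a minimal sketch, not the project's implementation: the pattern is copied from the rule above, while the names `WORKSPACE_ID_PATTERN` and `is_valid_workspace_id` are hypothetical.

```python
import re

# Pattern copied from the validation rule: one alphanumeric character
# followed by up to 63 alphanumerics, underscores, or hyphens (1-64 chars).
WORKSPACE_ID_PATTERN = re.compile(r"^[a-zA-Z0-9][a-zA-Z0-9_-]{0,63}$")


def is_valid_workspace_id(workspace_id: str) -> bool:
    """Return True if workspace_id satisfies the validation rules.

    Hypothetical helper name; only the pattern comes from the spec.
    """
    # fullmatch() is used instead of match() because `$` alone would also
    # accept a value that ends with a trailing newline.
    return WORKSPACE_ID_PATTERN.fullmatch(workspace_id) is not None
```

Because the pattern requires at least one leading alphanumeric character, the empty string is rejected without a separate check.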

**State Transitions**:
```
┌─────────────┐     ┌───────┐     ┌────────────┐
│ initializing│ ──► │ ready │ ──► │ finalizing │
└─────────────┘     └───────┘     └────────────┘
      │                 │
      ▼                 ▼
  ┌───────┐         ┌───────┐
  │ error │         │ error │
  └───────┘         └───────┘
```
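
One way to make the diagram above enforceable is a transition table keyed by status. The sketch below is illustrative only: the status values come from the `status` attribute, but `WorkspaceStatus`, `ALLOWED_TRANSITIONS`, and `can_transition` are hypothetical names. It also assumes that `finalizing` and `error` are terminal, since the diagram shows no arrows leaving them.

```python
from enum import Enum


class WorkspaceStatus(Enum):
    INITIALIZING = "initializing"
    READY = "ready"
    FINALIZING = "finalizing"
    ERROR = "error"


# Allowed transitions, read directly off the state diagram.
# Assumption: states with no outgoing arrow are treated as terminal.
ALLOWED_TRANSITIONS = {
    WorkspaceStatus.INITIALIZING: {WorkspaceStatus.READY, WorkspaceStatus.ERROR},
    WorkspaceStatus.READY: {WorkspaceStatus.FINALIZING, WorkspaceStatus.ERROR},
    WorkspaceStatus.FINALIZING: set(),
    WorkspaceStatus.ERROR: set(),
}


def can_transition(current: WorkspaceStatus, target: WorkspaceStatus) -> bool:
    """Return True if the diagram permits moving from current to target."""
    return target in ALLOWED_TRANSITIONS[current]
```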

### WorkspacePool

A bounded collection of active `WorkspaceInstance` objects that evicts the least recently used instance when full.

| Attribute | Type | Description |
|-----------|------|-------------|
| `max_size` | `int` | Maximum concurrent instances (from config) |
| `instances` | `dict[str, WorkspaceInstance]` | Active instances by workspace_id |
| `lru_order` | `list[str]` | Workspace IDs ordered by last access |
| `lock` | `asyncio.Lock` | Protects concurrent access |

**Invariants**:
- `len(instances) <= max_size`
- `set(lru_order) == set(instances.keys())`
- Only one instance per workspace_id

**Operations**:

| Operation | Description | Complexity |
|-----------|-------------|------------|
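| `get(workspace_id)` | Get or create instance, updates LRU | O(1) lookup; the LRU update is O(1) only with an ordered map, O(n) with a plain `list` |
| `evict_lru()` | Remove least recently used instance | O(1) with an ordered map or deque; O(n) when popping the front of a `list` |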
| `finalize_all()` | Clean shutdown of all instances | O(n) |
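
The operations above can be sketched as follows. This is an illustration under stated assumptions, not the project's implementation: it merges `instances` and `lru_order` into a single `collections.OrderedDict` so that both the LRU update and eviction are constant-time, and the `create` and `finalize` callables are hypothetical hooks standing in for instance construction and cleanup. For simplicity the lock is held while an instance is created, which serializes initialization across workspaces.

```python
import asyncio
from collections import OrderedDict
from typing import Any, Awaitable, Callable


class WorkspacePoolSketch:
    """Illustrative LRU pool; names and hooks are assumptions."""

    def __init__(
        self,
        max_size: int,
        create: Callable[[str], Awaitable[Any]],
        finalize: Callable[[Any], Awaitable[None]],
    ) -> None:
        if max_size < 1:
            raise ValueError("max_size must be at least 1")
        self.max_size = max_size
        self._create = create      # hypothetical async factory
        self._finalize = finalize  # hypothetical async cleanup hook
        # Key order doubles as LRU order: least recent first, most recent last.
        self._instances: OrderedDict = OrderedDict()
        self._lock = asyncio.Lock()

    async def get(self, workspace_id: str) -> Any:
        """Return the instance for workspace_id, creating it if needed."""
        async with self._lock:
            if workspace_id in self._instances:
                self._instances.move_to_end(workspace_id)  # O(1) LRU update
                return self._instances[workspace_id]
            if len(self._instances) >= self.max_size:
                await self._evict_lru()
            instance = await self._create(workspace_id)
            self._instances[workspace_id] = instance
            return instance

    async def _evict_lru(self) -> None:
        # The caller must already hold the lock.
        _, instance = self._instances.popitem(last=False)  # oldest entry
        await self._finalize(instance)

    async def finalize_all(self) -> None:
        """Finalize every instance, least recently used first."""
        async with self._lock:
            while self._instances:
                await self._evict_lru()

    def workspace_ids(self) -> list:
        """Workspace IDs in LRU order (least recent first)."""
        return list(self._instances)
```

Using one ordered mapping keeps the invariant `set(lru_order) == set(instances.keys())` true by construction, because there is no second structure to drift out of sync.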

### WorkspaceConfig

Configuration for multi-workspace behavior (runtime, not persisted).

| Attribute | Type | Default | Description |
|-----------|------|---------|-------------|
| `default_workspace` | `str` | `""` | Workspace used when no workspace header is present |
| `allow_default_workspace` | `bool` | `true` | Whether to accept requests that carry no workspace header |
| `max_workspaces_in_pool` | `int` | `50` | Pool size limit |

**Sources** (in priority order):
1. Environment variables (`LIGHTRAG_DEFAULT_WORKSPACE`, etc.)
2. Existing `WORKSPACE` env var (backward compatibility)
3. Hardcoded defaults
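
The priority order above can be sketched for `default_workspace`, using only the two variable names that appear in the list. The function name `resolve_default_workspace` is hypothetical, and treating a variable that is set but empty as unset is an assumption; the source does not say how that case is handled.

```python
import os
from typing import Mapping, Optional


def resolve_default_workspace(env: Optional[Mapping[str, str]] = None) -> str:
    """Pick the default workspace using the documented priority order.

    `env` defaults to os.environ; passing a dict makes the function testable.
    """
    env = os.environ if env is None else env
    # 1. New variable, 2. legacy variable kept for backward compatibility.
    for name in ("LIGHTRAG_DEFAULT_WORKSPACE", "WORKSPACE"):
        value = env.get(name)
        if value:  # assumption: an empty value counts as "not set"
            return value
    # 3. Hardcoded default from the WorkspaceConfig table.
    return ""
```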

## Relationships

```
┌─────────────────┐
│ WorkspaceConfig │
└────────┬────────┘
         │ configures
         ▼
┌─────────────────┐       contains        ┌───────────────────┐
│  WorkspacePool  │◄─────────────────────►│ WorkspaceInstance │
└─────────────────┘                       └───────────────────┘
         │                                         │
         │ validates workspace_id                  │ wraps
         ▼                                         ▼
┌─────────────────┐                       ┌───────────────────┐
│ HTTP Request    │                       │ LightRAG (core)   │
│ (workspace hdr) │                       │                   │
└─────────────────┘                       └───────────────────┘
```

## Data Flow

### Request Processing

```
1. HTTP Request arrives
   │
2. Extract workspace from headers
   │  ├─ LIGHTRAG-WORKSPACE header (primary)
   │  └─ X-Workspace-ID header (fallback)
   │
3. If no header:
   │  ├─ allow_default_workspace=true → use default_workspace
   │  └─ allow_default_workspace=false → return 400
   │
4. Validate workspace_id format
   │  └─ Invalid → return 400
   │
5. WorkspacePool.get(workspace_id)
   │  ├─ Instance exists → update LRU, return instance
   │  └─ Instance missing:
   │       ├─ Pool full → evict LRU instance
   │       └─ Create new instance, initialize, add to pool
   │
6. Route handler receives LightRAG instance
   │
7. Process request using instance
   │
8. Return response
```
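
Steps 2 to 4 of the flow above can be sketched as a single framework-independent function. Everything named here (`resolve_workspace`, `WorkspaceRequestError`, `HEADER_NAMES`) is hypothetical; only the header names, the pattern, and the 400 responses come from this document. One point is an explicit assumption: the flow lists validation after the default is chosen, while the default for `default_workspace` is the empty string, which the pattern rejects. The source does not say how these interact, so this sketch validates only header-supplied values and returns the configured default unchanged.

```python
import re
from typing import Mapping

WORKSPACE_ID_PATTERN = re.compile(r"[a-zA-Z0-9][a-zA-Z0-9_-]{0,63}")
HEADER_NAMES = ("LIGHTRAG-WORKSPACE", "X-Workspace-ID")  # primary, fallback


class WorkspaceRequestError(Exception):
    """Hypothetical error that a route layer would map to HTTP 400."""

    status_code = 400


def resolve_workspace(
    headers: Mapping[str, str],
    default_workspace: str = "",
    allow_default_workspace: bool = True,
) -> str:
    """Return the workspace ID for a request, or raise WorkspaceRequestError."""
    # HTTP header names are case-insensitive, so normalise before lookup.
    lowered = {name.lower(): value for name, value in headers.items()}

    # Step 2: primary header first, then the fallback header.
    for name in HEADER_NAMES:
        value = lowered.get(name.lower(), "").strip()
        if value:
            # Step 4: validate the header-supplied ID.
            if not WORKSPACE_ID_PATTERN.fullmatch(value):
                raise WorkspaceRequestError(f"invalid workspace id: {value!r}")
            return value

    # Step 3: no header present.
    if not allow_default_workspace:
        raise WorkspaceRequestError("a workspace header is required")
    return default_workspace  # assumption: the default is not re-validated
```

The returned ID would then be passed to `WorkspacePool.get(workspace_id)` as in step 5.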

### Instance Lifecycle

```
1. First request for workspace arrives
   │
2. WorkspacePool creates WorkspaceInstance
   │  status: initializing
   │
3. LightRAG object created with workspace parameter
   │
4. await rag.initialize_storages()
   │
5. Instance status → ready
   │  Added to pool and LRU list
   │
6. Instance serves requests...
   │  last_accessed_at updated on each access
   │
7. Pool is at max_size, a new workspace is requested, and this instance is the LRU entry
   │
8. Instance status → finalizing
   │
9. await rag.finalize_storages()
   │
10. Instance removed from pool
```
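
The lifecycle above can be sketched with a small dataclass and two coroutines. The attribute names follow the `WorkspaceInstance` table, and `initialize_storages()` and `finalize_storages()` are the calls named in the steps; the class name `WorkspaceInstanceSketch` and the helpers `initialize_instance` and `finalize_instance` are hypothetical. The `rag_instance` is typed as `Any` so that any object exposing those two coroutines can stand in for a real LightRAG object.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Any


def _now() -> datetime:
    return datetime.now(timezone.utc)


@dataclass
class WorkspaceInstanceSketch:
    workspace_id: str
    rag_instance: Any                  # expected to expose the two storage coroutines
    status: str = "initializing"       # step 2: new instances start here
    created_at: datetime = field(default_factory=_now)
    last_accessed_at: datetime = field(default_factory=_now)

    def touch(self) -> None:
        """Step 6: record an access so LRU ordering stays current."""
        self.last_accessed_at = _now()


async def initialize_instance(instance: WorkspaceInstanceSketch) -> None:
    """Steps 4-5: initialize storages, then mark the instance ready."""
    try:
        await instance.rag_instance.initialize_storages()
    except Exception:
        instance.status = "error"      # initializing -> error
        raise
    instance.status = "ready"          # initializing -> ready


async def finalize_instance(instance: WorkspaceInstanceSketch) -> None:
    """Steps 8-9: mark the instance as finalizing, then release storages."""
    instance.status = "finalizing"     # ready -> finalizing
    await instance.rag_instance.finalize_storages()
```

Removing the instance from the pool (step 10) is left to the pool itself, which owns the mapping from workspace ID to instance.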

## No Persistent Schema Changes

This feature does not modify:
- Storage schemas (KV, vector, graph)
- Database tables
- File formats

Workspace isolation at the data layer is already handled by the LightRAG core using namespace prefixing.