Why this change is needed:
PR #2366 introduces critical workspace isolation functionality to resolve
multi-instance concurrency issues, but lacks comprehensive automated tests
to validate the implementation. Without proper test coverage, we cannot
ensure the feature works correctly across all scenarios mentioned in the PR.
What this test suite covers:
1. Pipeline Status Isolation: Verifies different workspaces maintain
independent pipeline status without interference
2. Lock Mechanism: Validates the new keyed lock system works correctly
- Different workspaces can acquire locks in parallel
- Same workspace locks serialize properly
- No deadlocks occur
3. Backward Compatibility: Ensures legacy code without workspace parameters
continues to work using default workspace
4. Multi-Workspace Concurrency: Confirms multiple LightRAG instances with
different workspaces can run concurrently without data interference
Testing approach:
- All tests are automated and deterministic
- Uses timing assertions to verify parallel vs serial lock behavior
- Validates data isolation through direct namespace data inspection
- Comprehensive error handling and detailed test output
Test results:
All 9 test cases passed successfully, confirming the workspace isolation
feature is working correctly across all key scenarios.
Impact:
Provides confidence that PR #2366's workspace isolation feature is
production-ready and won't introduce regressions.
* Acquire lock before setting ContextVar
* Prevent state corruption on cancellation
* Fix permanent lock brick scenario
* Store context only after success
* Handle acquisition failure properly
* Capture max_token_size before decorator
* Apply wrapper after capturing attribute
* Prevent decorator from stripping dataclass
* Ensure token limit is properly set
- Add EMBEDDING_TOKEN_LIMIT env var
- Set max_token_size on embedding func
- Add token limit property to LightRAG
- Validate summary length vs limit
- Log warning when limit exceeded
• Replace truthy checks with `is not None`
• Handle empty dict edge case properly
• Prevent data reload failures
• Add comprehensive test coverage
• Fix JsonKVStorage and DocStatusStorage
• Reload cleaned data after sanitization
• Update shared memory with clean data
• Add specific surrogate char tests
• Test migration sanitization flow
• Prevent dirty data in memory
- Precompile regex pattern at module level
- Zero-copy path for clean strings
- Use C-level regex for performance
- Remove deprecated _sanitize_json_data
- Fast detection for common case
- Fast path for clean data (no sanitization)
- Slow path sanitizes during encoding
- Reload shared memory after sanitization
- Custom encoder avoids deep copies
- Comprehensive test coverage
- Bump API version to 0254
- Remove response format UI controls
- Hard-code response_type in query params
- Add migration for version 19
- Clean up settings store structure
• Add _sanitize_json_data helper function
• Recursively clean strings in data
• Sanitize before JSON serialization
• Prevent encoding-related crashes
• Use existing sanitize_text_for_encoding
- Update import from PyPDF2 to pypdf
- Change dependency to pypdf>=6.1.0
- Update all requirements files
- Remove PyPDF2 from lock file
- Use modern pypdf library
- Add Awaitable and Union type imports
- Update chunking_func type annotation
- Handle coroutine results with await
- Add return type validation
- Update docstring for async support
Fixes two compatibility issues in workspace isolation:
1. Problem: lightrag_server.py calls initialize_pipeline_status()
without workspace parameter, causing pipeline to initialize in
global namespace instead of rag's workspace.
Solution: Add set_default_workspace() mechanism in shared_storage.
LightRAG.initialize_storages() now sets default workspace, which
initialize_pipeline_status() uses when called without parameters.
2. Problem: /health endpoint hardcoded to use "pipeline_status",
cannot return workspace-specific status or support frontend
workspace selection.
Solution: Add LIGHTRAG-WORKSPACE header support. Endpoint now
extracts workspace from header or falls back to server default,
returning correct workspace-specific pipeline status.
Changes:
- lightrag/kg/shared_storage.py: Add set/get_default_workspace()
- lightrag/lightrag.py: Call set_default_workspace() in initialize_storages()
- lightrag/api/lightrag_server.py: Add get_workspace_from_request() helper,
update /health endpoint to support LIGHTRAG-WORKSPACE header
Testing:
- Backward compatibility: Old code works without modification
- Multi-instance safety: Explicit workspace passing preserved
- /health endpoint: Supports both default and header-specified workspaces
Related: #2353