- Fix final_namespace error in get_namespace_data()
- Fix get_workspace_from_request return type
- Add workspace param to pipeline status calls
(cherry picked from commit 52c812b9a0)
Fixes two compatibility issues in workspace isolation:
1. Problem: lightrag_server.py calls initialize_pipeline_status()
without a workspace parameter, causing the pipeline to initialize in
the global namespace instead of the RAG instance's workspace.
Solution: Add a set_default_workspace() mechanism in shared_storage.
LightRAG.initialize_storages() now sets the default workspace, which
initialize_pipeline_status() uses when called without parameters
(see the sketch below).
2. Problem: The /health endpoint is hardcoded to use "pipeline_status",
so it cannot return workspace-specific status or support frontend
workspace selection.
Solution: Add LIGHTRAG-WORKSPACE header support. The endpoint now
extracts the workspace from the header, or falls back to the server
default, and returns the correct workspace-specific pipeline status.
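A minimal sketch of the default-workspace mechanism described above, assuming module-level state in shared_storage; the bodies are illustrative, not the actual implementation:
```python
# Illustrative sketch only; the real code lives in lightrag/kg/shared_storage.py.
_default_workspace: str = ""

def set_default_workspace(workspace: str) -> None:
    """Record the workspace to use when callers pass none explicitly."""
    global _default_workspace
    _default_workspace = workspace or ""

def get_default_workspace() -> str:
    return _default_workspace

async def initialize_pipeline_status(workspace: str | None = None) -> None:
    # Fall back to the default set by LightRAG.initialize_storages()
    ws = workspace if workspace is not None else get_default_workspace()
    ...  # initialize the pipeline namespace for ws
```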
Changes:
- lightrag/kg/shared_storage.py: Add set/get_default_workspace()
- lightrag/lightrag.py: Call set_default_workspace() in initialize_storages()
- lightrag/api/lightrag_server.py: Add get_workspace_from_request() helper,
update /health endpoint to support LIGHTRAG-WORKSPACE header
Testing:
- Backward compatibility: Old code works without modification
- Multi-instance safety: Explicit workspace passing preserved
- /health endpoint: Supports both default and header-specified workspaces
Related: #2353
(cherry picked from commit 18a4870229)
Problem:
In multi-tenant scenarios, different workspaces share a single global
pipeline_status namespace, causing pipelines from different tenants to
block each other, severely impacting concurrent processing performance.
Solution:
- Extended get_namespace_data() to recognize workspace-specific pipeline
namespaces with the pattern "{workspace}:pipeline", following the GraphDB
pattern (a sketch follows this list)
- Added workspace parameter to initialize_pipeline_status() for per-tenant
isolated pipeline namespaces
- Updated all 7 call sites to use workspace-aware locks:
* lightrag.py: process_document_queue(), aremove_document()
* document_routes.py: background_delete_documents(), clear_documents(),
cancel_pipeline(), get_pipeline_status(), delete_documents()
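A hedged sketch of the keyed lookup described above; the storage plumbing is simplified and the helper name pipeline_namespace() is illustrative:
```python
# Simplified sketch; the real implementation is in lightrag/kg/shared_storage.py.
_shared_data: dict[str, dict] = {}

def pipeline_namespace(workspace: str) -> str:
    # Empty workspace keeps the legacy global key for backward compatibility
    return f"{workspace}:pipeline" if workspace else "pipeline_status"

async def get_namespace_data(namespace: str) -> dict:
    if namespace not in _shared_data:
        # Fail fast: an uninitialized pipeline raises a clear error
        raise KeyError(
            f"Namespace '{namespace}' not initialized; "
            "call initialize_pipeline_status() first"
        )
    return _shared_data[namespace]
```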
Impact:
- Different workspaces can process documents concurrently without blocking
- Backward compatible: empty workspace defaults to "pipeline_status"
- Maintains fail-fast: uninitialized pipeline raises clear error
- Expected N× performance improvement for N concurrent tenants
Bug fixes:
- Fixed AttributeError by using self.workspace instead of self.global_config
- Fixed pipeline status endpoint to show workspace-specific status
- Fixed delete endpoint to check workspace-specific busy flag
Code changes: 4 files, 141 insertions(+), 28 deletions(-)
Testing: All syntax checks passed, comprehensive workspace isolation tests completed
(cherry picked from commit eb52ec94d7)
• Monitor pipeline busy->idle transitions
• Reload labels on dropdown open if needed
• Add onBeforeOpen callback to AsyncSelect
• Clear refresh flags after processing
• Improve label sync with backend state
(cherry picked from commit 58c83f9da5)
- Add cancelPipeline API endpoint
- Add cancel button to status dialog
- Update status response type
- Add cancellation UI translations
- Handle cancellation request states
(cherry picked from commit f89b5ab101)
- Set Monday 2AM for GitHub Actions
- Set Wednesday 2AM for Python deps
- Set Friday 2AM for web UI deps
- Use Asia/Shanghai timezone
- Spread updates across weekdays (config sketched below)
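A minimal .github/dependabot.yml sketch matching the schedule above; the ecosystems and directories are assumptions about the repo layout:
```yaml
version: 2
updates:
  - package-ecosystem: "github-actions"
    directory: "/"
    schedule:
      interval: "weekly"
      day: "monday"
      time: "02:00"
      timezone: "Asia/Shanghai"
  - package-ecosystem: "pip"
    directory: "/"
    schedule:
      interval: "weekly"
      day: "wednesday"
      time: "02:00"
      timezone: "Asia/Shanghai"
  - package-ecosystem: "npm"
    directory: "/lightrag_webui"
    schedule:
      interval: "weekly"
      day: "friday"
      time: "02:00"
      timezone: "Asia/Shanghai"
```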
(cherry picked from commit 6476021619)
• Pin httpx version in api extra
• Extract test dependencies to new extra
• Move httpx pin from evaluation to api
• Add api dependency to evaluation extra
• Separate test from evaluation concerns (layout sketched below)
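An illustrative pyproject.toml layout for the extras split described above; the exact pins and package lists are assumptions:
```toml
[project.optional-dependencies]
api = [
    "httpx>=0.27,<1.0",  # pin now lives in the api extra (version is illustrative)
    # ...other API server dependencies...
]
test = [
    "pytest",
    "pytest-asyncio",
]
evaluation = [
    "lightrag-hku[api]",  # evaluation now pulls in the api extra
    # ...evaluation-only dependencies...
]
```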
(cherry picked from commit 268e4ff6f1)
• Use different mock LLM per workspace
• Add persistent test directory
• Create workspace-specific responses
• Skip cleanup for inspection
(cherry picked from commit 99262adaaa)
• Add async sleep to mock functions
• Test concurrent ainsert operations
• Use asyncio.gather for parallel exec
• Measure concurrent execution time (see the sketch below)
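A hedged sketch of the test shape described above; the mock latency and helper names are illustrative, not the actual test code:
```python
import asyncio
import time

async def mock_llm(prompt: str, **kwargs) -> str:
    await asyncio.sleep(0.5)  # simulated latency makes overlap measurable
    return "mock response"

async def run_concurrent_inserts(rag_a, rag_b) -> float:
    """Run two ainsert calls in parallel and return the wall-clock time."""
    start = time.perf_counter()
    await asyncio.gather(
        rag_a.ainsert("Document for workspace A"),
        rag_b.ainsert("Document for workspace B"),
    )
    elapsed = time.perf_counter() - start
    # With real concurrency this is ~1x the mock latency rather than ~2x.
    return elapsed
```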
(cherry picked from commit 6ae0c14438)
• Remove unused tempfile import
• Use consistent project temp/ structure
• Clean up existing directories first
• Create directories with os.makedirs
• Use descriptive test directory names
(cherry picked from commit 4fef731f37)
- Add GitHub Actions workflow for CI
- Mark integration tests requiring services
- Add offline test markers for isolated tests
- Skip integration tests by default
- Configure pytest markers and collection (conftest.py sketch below)
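A sketch of how a conftest.py could skip integration tests by default; the --run-integration option name is an assumption, not necessarily the flag the repo uses:
```python
import pytest

def pytest_addoption(parser):
    parser.addoption(
        "--run-integration",
        action="store_true",
        default=False,
        help="run tests that require external services",
    )

def pytest_collection_modifyitems(config, items):
    if config.getoption("--run-integration"):
        return
    skip = pytest.mark.skip(reason="needs --run-integration and live services")
    for item in items:
        # Skip anything marked @pytest.mark.integration unless opted in
        if "integration" in item.keywords:
            item.add_marker(skip)
```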
(cherry picked from commit 4ea2124001)
- Extract pytest deps to own group
- Reference pytest group in evaluation
- Add pytest config to pyproject.toml
- Update uv.lock with new structure
(cherry picked from commit b7b8d15632)
Why this change is needed:
PR #2366 introduces critical workspace isolation functionality to resolve
multi-instance concurrency issues, but lacks comprehensive automated tests
to validate the implementation. Without proper test coverage, we cannot
ensure the feature works correctly across all scenarios mentioned in the PR.
What this test suite covers:
1. Pipeline Status Isolation: Verifies different workspaces maintain
independent pipeline status without interference
2. Lock Mechanism: Validates the new keyed lock system works correctly
- Different workspaces can acquire locks in parallel
- Same workspace locks serialize properly
- No deadlocks occur
3. Backward Compatibility: Ensures legacy code without workspace parameters
continues to work using default workspace
4. Multi-Workspace Concurrency: Confirms multiple LightRAG instances with
different workspaces can run concurrently without data interference
Testing approach:
- All tests are automated and deterministic
- Uses timing assertions to verify parallel vs serial lock behavior (sketched below)
- Validates data isolation through direct namespace data inspection
- Comprehensive error handling and detailed test output
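A minimal sketch of the timing-assertion idea, using plain asyncio.Lock objects as stand-ins for the per-workspace keyed locks; the thresholds are illustrative:
```python
import asyncio
import time

async def hold_lock(lock: asyncio.Lock, seconds: float) -> None:
    async with lock:
        await asyncio.sleep(seconds)

async def assert_parallel_vs_serial() -> None:
    lock_a = asyncio.Lock()  # stand-in for workspace A's keyed lock
    lock_b = asyncio.Lock()  # stand-in for workspace B's keyed lock

    start = time.perf_counter()
    await asyncio.gather(hold_lock(lock_a, 0.2), hold_lock(lock_b, 0.2))
    assert time.perf_counter() - start < 0.35  # different workspaces overlap

    start = time.perf_counter()
    await asyncio.gather(hold_lock(lock_a, 0.2), hold_lock(lock_a, 0.2))
    assert time.perf_counter() - start > 0.35  # same workspace serializes
```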
Test results:
All 9 test cases passed successfully, confirming the workspace isolation
feature is working correctly across all key scenarios.
Impact:
Provides confidence that PR #2366's workspace isolation feature is
production-ready and won't introduce regressions.
(cherry picked from commit 4742fc8efa)
Add specific content assertions to detect cross-contamination between workspaces.
Previously the test only checked that the workspaces held different data; it now
verifies:
- Each workspace contains only its own text content
- Each workspace does NOT contain the other workspace's content
- Cross-contamination would be immediately detected
This ensures the test can find problems, not just pass.
Changes:
- Add assertions for "Artificial Intelligence" and "Machine Learning" in project_a
- Add assertions for "Deep Learning" and "Neural Networks" in project_b
- Add negative assertions to verify data leakage doesn't occur (sketched below)
- Add detailed output messages showing what was verified
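A condensed sketch of the positive and negative assertions described above; how each workspace's stored text is read back (the text_a/text_b inputs) is left abstract:
```python
def check_isolation(text_a: str, text_b: str) -> None:
    # Positive: each workspace contains only its own content
    assert "Artificial Intelligence" in text_a and "Machine Learning" in text_a
    assert "Deep Learning" in text_b and "Neural Networks" in text_b
    # Negative: neither workspace leaked into the other
    assert "Deep Learning" not in text_a and "Neural Networks" not in text_a
    assert "Artificial Intelligence" not in text_b and "Machine Learning" not in text_b
```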
Testing:
- pytest tests/test_workspace_isolation.py::test_lightrag_end_to_end_workspace_isolation
- Test passes with proper content isolation verified
(cherry picked from commit 3ec736932e)
Implemented two critical test scenarios:
Test 10 - JsonKVStorage Integration Test:
- Instantiate two JsonKVStorage instances with different workspaces
- Write different data to each instance (entity1, entity2)
- Read back and verify complete data isolation
- Verify workspace directories are created correctly
- Result: Data correctly isolated, no mixing between workspaces
Test 11 - LightRAG End-to-End Test:
- Instantiate two LightRAG instances with different workspaces
- Insert different documents to each instance
- Verify workspace directory structure (project_a/, project_b/)
- Verify file separation and data isolation
- Result: All 8 storage files created separately per workspace
- Document data correctly isolated between workspaces (see the sketch below)
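A condensed, hypothetical sketch of the Test 11 scenario; make_rag stands in for whatever fixture wires up the mock LLM and embedding functions:
```python
import os

async def test_e2e_workspace_isolation(make_rag):
    rag_a = make_rag(working_dir="temp/e2e", workspace="project_a")
    rag_b = make_rag(working_dir="temp/e2e", workspace="project_b")
    await rag_a.initialize_storages()
    await rag_b.initialize_storages()

    await rag_a.ainsert("Artificial Intelligence fundamentals.")
    await rag_b.ainsert("Deep Learning architectures.")

    # Each workspace gets its own subdirectory with separate storage files
    assert os.path.isdir("temp/e2e/project_a")
    assert os.path.isdir("temp/e2e/project_b")
```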
Test Results: 23/23 passed
- 19 unit tests
- 2 integration tests (JsonKVStorage data + file structure)
- 2 E2E tests (LightRAG file structure + data isolation)
Coverage: 100% - Unit, Integration, and E2E validated
(cherry picked from commit 3e759f46d1)
Why this change is needed:
The test file was using a custom TestResults class for tracking test
execution and results, which is not standard practice for pytest-based
test suites. This makes the tests harder to integrate with CI/CD pipelines
and reduces compatibility with pytest plugins and tooling.
How it solves it:
- Removed custom TestResults class and manual result tracking
- Added @pytest.mark.asyncio decorator to all async test functions
- Converted all results.add() calls to standard pytest assert statements
- Added pytest fixture (setup_shared_data) for common test setup
- Removed custom main() runner (pytest handles test discovery/execution)
- Kept all test logic, assertions, and debugging print statements intact (see the sketch below)
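A before/after flavor of the conversion in miniature; the test name and data are illustrative:
```python
import pytest

@pytest.fixture
def setup_shared_data():
    # Common setup previously handled by the custom main() runner
    yield {"workspace": "test_ws"}

@pytest.mark.asyncio
async def test_pipeline_status_isolation(setup_shared_data):
    status_a, status_b = {"busy": False}, {"busy": True}
    # was: results.add("pipeline isolation", status_a != status_b)
    assert status_a != status_b  # standard pytest assertion instead
```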
Impact:
- All 11 test functions maintain identical behavior and coverage
- Tests now follow pytest conventions and integrate with pytest ecosystem
- Test output is cleaner and more informative with pytest's reporting
- Easier to run selective tests using pytest's filtering options
Testing:
Verified by running: uv run pytest tests/test_workspace_isolation.py -v
Result: All 11 tests passed in 2.41s
(cherry picked from commit 288498ccdc)
- Add --docling CLI flag for easier setup
- Add numpy version constraints
- Exclude docling on macOS (fork-safety)
(cherry picked from commit c246eff725)
- Add --docling CLI flag for easier setup
- Add numpy version constraints
- Exclude docling on macOS (fork-safety)
(cherry picked from commit a24d8181c2)
Add 5 markdown documents that users can index to reproduce evaluation results.
Changes:
- Add sample_documents/ folder with 5 markdown files covering LightRAG features
- Update sample_dataset.json with 3 improved, specific test questions
- Shorten and correct evaluation README (removed outdated info about mock responses)
- Add sample_documents reference with expected ~95% RAGAS score
Test Results with sample documents:
- Average RAGAS Score: 95.28%
- Faithfulness: 100%, Answer Relevance: 96.67%
- Context Recall: 88.89%, Context Precision: 95.56%
(cherry picked from commit a172cf893d)
**Lint Fixes (ruff)**:
- Sort imports alphabetically (I001)
- Add blank line after import traceback (E302)
- Add trailing comma to dict literals (COM812)
- Reformat writer.writerow for readability (E501)
**Rename test_dataset.json → sample_dataset.json**:
- Avoids .gitignore pattern conflict (test_* is ignored)
- More descriptive name - it's a sample/template, not actual test data
- Updated all references in eval_rag_quality.py and README.md
Resolves lint-and-format CI check failure.
Addresses reviewer feedback about test dataset naming.
(cherry picked from commit 5cdb4b0ef2)
This contribution adds optional Langfuse support for LLM observability and tracing.
Langfuse provides a drop-in replacement for the OpenAI client that automatically
tracks all LLM interactions without requiring code changes.
Features:
- Optional Langfuse integration with graceful fallback
- Automatic LLM request/response tracing
- Token usage tracking
- Latency metrics
- Error tracking
- Zero code changes required for existing functionality
Implementation:
- Modified lightrag/llm/openai.py to conditionally use Langfuse's AsyncOpenAI
- Falls back to standard OpenAI client if Langfuse is not installed
- Logs observability status on import (sketched below)
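A minimal sketch of the conditional import; Langfuse ships a drop-in client at langfuse.openai, while the flag name and log wording here are illustrative:
```python
import logging

try:
    from langfuse.openai import AsyncOpenAI  # traced drop-in replacement
    _langfuse_enabled = True
except ImportError:
    from openai import AsyncOpenAI  # standard client, behavior unchanged
    _langfuse_enabled = False

logging.getLogger(__name__).info(
    "Langfuse observability: %s", "enabled" if _langfuse_enabled else "disabled"
)
```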
Configuration:
To enable Langfuse tracing, install the observability extras and set environment variables:
```bash
pip install lightrag-hku[observability]
export LANGFUSE_PUBLIC_KEY="your_public_key"
export LANGFUSE_SECRET_KEY="your_secret_key"
export LANGFUSE_HOST="https://cloud.langfuse.com" # or your self-hosted instance
```
If Langfuse is not installed or environment variables are not set, LightRAG
will use the standard OpenAI client without any functionality changes.
Changes:
- Modified lightrag/llm/openai.py (added optional Langfuse import)
- Updated pyproject.toml with optional 'observability' dependencies
Dependencies (optional):
- langfuse>=3.8.1
(cherry picked from commit 626b42bc40)