- Add workspace param to test storage init
- Remove get_nodes_by_chunk_ids tests
- Remove get_edges_by_chunk_ids tests
- Clean up batch operations test function
(cherry picked from commit 6b0f9795be)
Added comprehensive documentation for the new include_chunk_content parameter
that enables retrieval of actual chunk text content in API responses.
Documentation Updates:
- Added "Include Chunk Content in References" section to API README
- Explained use cases: RAG evaluation, debugging, citations, transparency
- Provided JSON request/response examples
- Clarified parameter interaction with include_references
OpenAPI/Swagger Examples:
- Added "Response with chunk content" example to /query endpoint
- Shows complete reference structure with content field
- Demonstrates realistic chunk text content
This makes the feature discoverable through:
1. API documentation (README.md)
2. Interactive Swagger UI (http://localhost:9621/docs)
3. Code examples for developers
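A hedged Python sketch of the documented request shape; the endpoint path and port come from this commit, while the response's "references"/"content" field names are assumptions for illustration:

```python
# Minimal sketch of calling /query with the new parameter. Field names in
# the response loop are assumptions based on the commit message above.
import httpx

payload = {
    "query": "What are the main findings?",
    "include_references": True,
    "include_chunk_content": True,  # return actual chunk text in references
}
resp = httpx.post("http://localhost:9621/query", json=payload)
for ref in resp.json().get("references", []):
    print(ref.get("content"))       # chunk text attached to each reference
```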
(cherry picked from commit 963ad4c637)
- Pass chunk storages to merge function
- Merge relation chunk tracking data
- Merge entity chunk tracking data
- Delete old chunk tracking records
- Persist chunk storage updates
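A minimal sketch of the merge flow in the bullets above; the KV method names follow LightRAG's KV interface, and the {"chunk_ids": [...]} record shape is an assumption:

```python
# Merge chunk tracking records from source entities into a target entity.
async def merge_chunk_tracking(entity_chunks_kv, source_names, target_name):
    merged: list[str] = []
    for name in [*source_names, target_name]:
        record = await entity_chunks_kv.get_by_id(name) or {}
        for chunk_id in record.get("chunk_ids", []):
            if chunk_id not in merged:           # dedupe, keep insertion order
                merged.append(chunk_id)
    await entity_chunks_kv.delete(source_names)  # drop old tracking records
    await entity_chunks_kv.upsert({target_name: {"chunk_ids": merged}})
    await entity_chunks_kv.index_done_callback() # persist the updates
```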
(cherry picked from commit 2c09adb8d3)
- Add entity_chunks & relation_chunks storage
- Implement KEEP/FIFO limit strategies
- Update env.example with new settings
- Add migration for chunk tracking data
- Support all KV storage backends
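A minimal sketch of the two limit strategies named above; the function name and the default cap are illustrative, not the project's actual API:

```python
# Apply a size limit to a chunk-id list using the KEEP or FIFO strategy.
def apply_chunk_limit(existing: list[str], new: list[str],
                      strategy: str = "FIFO", max_chunks: int = 100) -> list[str]:
    combined = existing + [c for c in new if c not in existing]
    if len(combined) <= max_chunks:
        return combined
    if strategy == "KEEP":            # KEEP: retain the oldest, drop overflow
        return combined[:max_chunks]
    return combined[-max_chunks:]     # FIFO: evict oldest, keep the newest
```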
(cherry picked from commit dc62c78f98)
• Track if we acquired the pipeline lock
• Auto-acquire pipeline when idle
• Only release if we acquired it
• Prevent concurrent deletion conflicts
• Improve deletion job validation
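A minimal sketch of the acquire-if-idle pattern in the bullets above; the status dict, "busy" flag, and function name are illustrative:

```python
# Run a deletion job, auto-acquiring the pipeline only when it is idle.
async def run_deletion_job(pipeline_status: dict, status_lock):
    acquired_here = False                    # track if we acquired the pipeline
    async with status_lock:
        if not pipeline_status.get("busy"):
            pipeline_status["busy"] = True   # auto-acquire while idle
            acquired_here = True
    if not acquired_here:
        raise RuntimeError("pipeline busy: concurrent deletion rejected")
    try:
        ...                                  # run the actual deletion work
    finally:
        async with status_lock:
            pipeline_status["busy"] = False  # only release what we acquired
```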
(cherry picked from commit 4048fc4b89)
* Acquire lock before setting ContextVar
* Prevent state corruption on cancellation
* Fix scenario that left the lock permanently stuck
* Store context only after success
* Handle acquisition failure properly
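A minimal sketch of the ordering fix above: the ContextVar is written only after acquisition succeeds, so a CancelledError raised while waiting can never leave a stale context behind. Names are illustrative:

```python
import asyncio
from contextvars import ContextVar

_held_lock: ContextVar["asyncio.Lock | None"] = ContextVar("_held_lock", default=None)

async def acquire_tracked(lock: asyncio.Lock) -> None:
    await lock.acquire()   # may raise CancelledError: nothing recorded yet
    _held_lock.set(lock)   # store context only after success
```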
(cherry picked from commit e8383df3b8)
- Use ContextVar for per-coroutine storage
- Prevent state interference between coroutines
- Add re-entrance protection check
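A minimal sketch of the per-coroutine tracking described above; each asyncio task sees its own ContextVar value, so concurrent coroutines cannot interfere. Names are illustrative:

```python
import asyncio
from contextvars import ContextVar

_current_lock: ContextVar[object] = ContextVar("_current_lock", default=None)

async def guarded_acquire(lock: asyncio.Lock) -> None:
    if _current_lock.get() is lock:          # re-entrance protection check
        raise RuntimeError("re-entrant acquisition of the same lock")
    await lock.acquire()
    _current_lock.set(lock)                  # per-coroutine storage
```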
(cherry picked from commit b6a5a90eaf)
- Fix final_namespace error in get_namespace_data()
- Fix get_workspace_from_request return type
- Add workspace param to pipeline status calls
(cherry picked from commit 52c812b9a0)
Fixes two compatibility issues in workspace isolation:
1. Problem: lightrag_server.py calls initialize_pipeline_status()
without a workspace parameter, causing the pipeline to initialize in
the global namespace instead of the rag instance's workspace.
Solution: Add a set_default_workspace() mechanism in shared_storage.
LightRAG.initialize_storages() now sets the default workspace, which
initialize_pipeline_status() uses when called without parameters.
2. Problem: The /health endpoint is hardcoded to "pipeline_status",
so it cannot return workspace-specific status or support frontend
workspace selection.
Solution: Add LIGHTRAG-WORKSPACE header support. The endpoint now
extracts the workspace from the header or falls back to the server
default, returning the correct workspace-specific pipeline status.
Changes:
- lightrag/kg/shared_storage.py: Add set/get_default_workspace()
- lightrag/lightrag.py: Call set_default_workspace() in initialize_storages()
- lightrag/api/lightrag_server.py: Add get_workspace_from_request() helper,
update /health endpoint to support LIGHTRAG-WORKSPACE header
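A minimal sketch of the two mechanisms listed above; the function names mirror this commit, while the module-level variable and the FastAPI-style header lookup are illustrative:

```python
_default_workspace: str = ""

def set_default_workspace(workspace: str) -> None:
    global _default_workspace
    _default_workspace = workspace        # set by LightRAG.initialize_storages()

def get_default_workspace() -> str:
    return _default_workspace

def get_workspace_from_request(request) -> str:
    # Prefer the LIGHTRAG-WORKSPACE header, else fall back to the server default.
    return request.headers.get("LIGHTRAG-WORKSPACE") or get_default_workspace()
```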
Testing:
- Backward compatibility: Old code works without modification
- Multi-instance safety: Explicit workspace passing preserved
- /health endpoint: Supports both default and header-specified workspaces
Related: #2353
(cherry picked from commit 18a4870229)
Problem:
In multi-tenant scenarios, different workspaces share a single global
pipeline_status namespace, causing pipelines from different tenants to
block each other, severely impacting concurrent processing performance.
Solution:
- Extended get_namespace_data() to recognize workspace-specific pipeline
namespaces with pattern "{workspace}:pipeline" (following GraphDB pattern)
- Added workspace parameter to initialize_pipeline_status() for per-tenant
isolated pipeline namespaces
- Updated all 7 call sites to use workspace-aware locks:
* lightrag.py: process_document_queue(), aremove_document()
* document_routes.py: background_delete_documents(), clear_documents(),
cancel_pipeline(), get_pipeline_status(), delete_documents()
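A minimal sketch of the namespace resolution; the "{workspace}:pipeline" pattern and the "pipeline_status" fallback are taken from this commit, the function name is illustrative:

```python
def resolve_pipeline_namespace(workspace: str = "") -> str:
    if workspace:
        return f"{workspace}:pipeline"   # per-tenant isolated namespace
    return "pipeline_status"             # backward-compatible global default
```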
Impact:
- Different workspaces can process documents concurrently without blocking
- Backward compatible: empty workspace defaults to "pipeline_status"
- Maintains fail-fast: uninitialized pipeline raises clear error
- Expected up to N× throughput improvement for N concurrent tenants
Bug fixes:
- Fixed AttributeError by using self.workspace instead of self.global_config
- Fixed pipeline status endpoint to show workspace-specific status
- Fixed delete endpoint to check workspace-specific busy flag
Code changes: 4 files, 141 insertions(+), 28 deletions(-)
Testing: All syntax checks passed, comprehensive workspace isolation tests completed
(cherry picked from commit eb52ec94d7)
• Monitor pipeline busy->idle transitions
• Reload labels on dropdown open if needed
• Add onBeforeOpen callback to AsyncSelect
• Clear refresh flags after processing
• Improve label sync with backend state
(cherry picked from commit 58c83f9da5)
- Add cancelPipeline API endpoint
- Add cancel button to status dialog
- Update status response type
- Add cancellation UI translations
- Handle cancellation request states
(cherry picked from commit f89b5ab101)
- Set Monday 2AM for GitHub Actions
- Set Wednesday 2AM for Python deps
- Set Friday 2AM for web UI deps
- Use Asia/Shanghai timezone
- Spread updates across weekdays
(cherry picked from commit 6476021619)
• Pin httpx version in api extra
• Extract test dependencies to new extra
• Move httpx pin from evaluation to api
• Add api dependency to evaluation extra
• Separate test from evaluation concerns
(cherry picked from commit 268e4ff6f1)
• Use different mock LLM per workspace
• Add persistent test directory
• Create workspace-specific responses
• Skip cleanup for inspection
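A minimal sketch of a workspace-specific mock LLM as described above; the factory name and response format are illustrative, and the signature loosely follows LightRAG's llm_model_func convention:

```python
def make_mock_llm(workspace: str):
    async def mock_llm(prompt: str, **kwargs) -> str:
        # Tag each response so results are traceable to their workspace.
        return f"[{workspace}] mock response for: {prompt[:40]}"
    return mock_llm
```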
(cherry picked from commit 99262adaaa)
• Add async sleep to mock functions
• Test concurrent ainsert operations
• Use asyncio.gather for parallel execution
• Measure concurrent execution time
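A minimal sketch of the concurrency check described above; rag_a and rag_b are assumed pre-initialized LightRAG instances whose mock LLM sleeps, making overlapping execution observable in the elapsed time:

```python
import asyncio
import time

async def test_concurrent_ainsert(rag_a, rag_b):
    start = time.perf_counter()
    await asyncio.gather(                    # run both inserts in parallel
        rag_a.ainsert("document for workspace A"),
        rag_b.ainsert("document for workspace B"),
    )
    elapsed = time.perf_counter() - start
    print(f"concurrent ainsert took {elapsed:.2f}s")  # ≈ one insert, not two
```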
(cherry picked from commit 6ae0c14438)
• Remove unused tempfile import
• Use consistent project temp/ structure
• Clean up existing directories first
• Create directories with os.makedirs
• Use descriptive test directory names
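A minimal sketch of the directory convention in the bullets above; the temp/ root and the helper name are illustrative:

```python
import os
import shutil

def fresh_test_dir(name: str) -> str:
    path = os.path.join("temp", name)   # consistent project temp/ structure
    if os.path.exists(path):
        shutil.rmtree(path)             # clean up any previous run first
    os.makedirs(path)
    return path
```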
(cherry picked from commit 4fef731f37)
- Add GitHub Actions workflow for CI
- Mark integration tests requiring services
- Add offline test markers for isolated tests
- Skip integration tests by default
- Configure pytest markers and collection
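A minimal conftest.py sketch of the marker setup described above; the marker names come from this commit, the opt-in flag name is illustrative:

```python
import pytest

def pytest_addoption(parser):
    parser.addoption("--run-integration", action="store_true",
                     help="run tests marked as requiring external services")

def pytest_configure(config):
    config.addinivalue_line("markers", "integration: requires external services")
    config.addinivalue_line("markers", "offline: runs fully isolated")

def pytest_collection_modifyitems(config, items):
    if config.getoption("--run-integration"):
        return                            # explicitly opted in: run everything
    skip = pytest.mark.skip(reason="integration tests skipped by default")
    for item in items:
        if "integration" in item.keywords:
            item.add_marker(skip)
```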
(cherry picked from commit 4ea2124001)