Commit graph

5675 commits

Author SHA1 Message Date
BukeLy
04041e76e0 test: Add comprehensive workspace isolation test suite for PR #2366
Why this change is needed:
PR #2366 introduces critical workspace isolation functionality to resolve
multi-instance concurrency issues, but lacks comprehensive automated tests
to validate the implementation. Without proper test coverage, we cannot
ensure the feature works correctly across all scenarios mentioned in the PR.

What this test suite covers:
1. Pipeline Status Isolation: Verifies different workspaces maintain
   independent pipeline status without interference
2. Lock Mechanism: Validates the new keyed lock system works correctly
   - Different workspaces can acquire locks in parallel
   - Same workspace locks serialize properly
   - No deadlocks occur
3. Backward Compatibility: Ensures legacy code without workspace parameters
   continues to work using default workspace
4. Multi-Workspace Concurrency: Confirms multiple LightRAG instances with
   different workspaces can run concurrently without data interference

Testing approach:
- All tests are automated and deterministic
- Uses timing assertions to verify parallel vs serial lock behavior
- Validates data isolation through direct namespace data inspection
- Comprehensive error handling and detailed test output

Test results:
All 9 test cases passed successfully, confirming the workspace isolation
feature is working correctly across all key scenarios.

Impact:
Provides confidence that PR #2366's workspace isolation feature is
production-ready and won't introduce regressions.
2025-11-17 11:33:07 +08:00
yangdx
71edb73fd9 Remove manual initialize_pipeline_status() calls across codebase
- Auto-init pipeline status in storages
- Remove redundant import statements
- Simplify initialization pattern
- Update docs and examples
2025-11-17 07:28:41 +08:00
yangdx
b7edb1318b Auto-initialize pipeline status in LightRAG.initialize_storages()
• Remove manual initialize_pipeline_status calls
• Auto-init in initialize_storages method
• Update error messages for clarity
• Warn on workspace conflicts
2025-11-17 07:14:02 +08:00
yangdx
091385798e Fix NamespaceLock context variable timing to prevent lock bricking
* Acquire lock before setting ContextVar
* Prevent state corruption on cancellation
* Fix permanent lock brick scenario
* Store context only after success
* Handle acquisition failure properly
2025-11-17 06:43:37 +08:00
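The ordering described in 091385798e (acquire the lock first, record per-coroutine state only after success) can be illustrated with a minimal sketch. NamespaceLockSketch and demo are hypothetical names used for illustration only; this is not the project's actual NamespaceLock.

```python
import asyncio
from contextvars import ContextVar


class NamespaceLockSketch:
    """Hypothetical stand-in for a reusable async lock wrapper."""

    def __init__(self, lock: asyncio.Lock) -> None:
        self._lock = lock
        # Each asyncio task runs in its own context, so this flag is per task.
        self._held = ContextVar("held", default=False)

    async def __aenter__(self) -> "NamespaceLockSketch":
        if self._held.get():
            raise RuntimeError("lock already held in this coroutine")
        # Acquire first. If this await is cancelled, no state has been
        # recorded, so a later attempt is not rejected by the check above.
        await self._lock.acquire()
        self._held.set(True)
        return self

    async def __aexit__(self, exc_type, exc, tb) -> None:
        self._held.set(False)
        self._lock.release()


async def demo() -> bool:
    guard = NamespaceLockSketch(asyncio.Lock())

    async def use() -> None:
        async with guard:
            await asyncio.sleep(0.01)

    holder = asyncio.create_task(use())
    waiter = asyncio.create_task(use())
    await asyncio.sleep(0)  # holder acquires; waiter blocks in acquire()
    waiter.cancel()  # cancel while the waiter is still waiting for the lock
    await asyncio.gather(holder, waiter, return_exceptions=True)

    await use()  # the lock can still be taken and released afterwards
    return not guard._lock.locked()
```

If the flag were set before the await on acquire(), a cancellation at that point would leave the flag set without the lock being held, which is presumably the "bricking" scenario the commit refers to.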
yangdx
b8fab6c944 Remove final_namespace attribute for in-memory storage and use namespace in clean_llm_query_cache.py 2025-11-17 06:28:34 +08:00
yangdx
97199b56f9 Fix workspace filtering logic in get_all_update_flags_status
• Handle namespaces with/without prefixes
• Fix workspace matching logic
2025-11-17 06:16:26 +08:00
yangdx
5f80890fcd Fix pipeline status namespace check to handle root case
- Add check for bare "pipeline_status"
- Handle namespace without prefix
2025-11-17 06:01:23 +08:00
yangdx
602e14456a Standardize empty workspace handling from "_" to "" across storage
* Unify empty workspace behavior by changing workspace from "_" to ""
* Fixed incorrect empty workspace detection in get_all_update_flags_status()
2025-11-17 05:58:11 +08:00
yangdx
83cf878548 Fix NamespaceLock concurrent coroutine safety with ContextVar
- Use ContextVar for per-coroutine storage
- Prevent state interference between coroutines
- Add re-entrance protection check
2025-11-17 05:27:31 +08:00
yangdx
91be4ccecb Refactor storage classes to use namespace instead of final_namespace 2025-11-17 05:07:53 +08:00
yangdx
829087638f Fix missing function call parentheses in get_all_update_flags_status 2025-11-17 04:11:06 +08:00
yangdx
501008c19f Refactor namespace lock to support reusable async context manager
• Add NamespaceLock class wrapper
• Fix lock re-entrance issues
• Enable concurrent lock usage
• Fresh context per async with block
• Update get_namespace_lock API
2025-11-17 04:07:37 +08:00
yangdx
1915d25912 Fix workspace isolation for pipeline status across all operations
- Fix final_namespace error in get_namespace_data()
- Fix get_workspace_from_request return type
- Add workspace param to pipeline status calls
2025-11-17 03:45:51 +08:00
yangdx
de404ccff0 Refactor workspace handling to use default workspace and namespace locks
- Remove DB-specific workspace configs
- Add default workspace auto-setting
- Replace global locks with namespace locks
- Simplify pipeline status management
- Remove redundant graph DB locking
2025-11-17 02:32:00 +08:00
yangdx
af3da52c78 Merge branch 'main' into feature/pipeline-workspace-isolation 2025-11-16 20:26:00 +08:00
chengjie
5f153582a9 fix: Add default workspace support for backward compatibility
Fixes two compatibility issues in workspace isolation:

1. Problem: lightrag_server.py calls initialize_pipeline_status()
   without workspace parameter, causing pipeline to initialize in
   global namespace instead of rag's workspace.

   Solution: Add set_default_workspace() mechanism in shared_storage.
   LightRAG.initialize_storages() now sets default workspace, which
   initialize_pipeline_status() uses when called without parameters.

2. Problem: /health endpoint hardcoded to use "pipeline_status",
   cannot return workspace-specific status or support frontend
   workspace selection.

   Solution: Add LIGHTRAG-WORKSPACE header support. Endpoint now
   extracts workspace from header or falls back to server default,
   returning correct workspace-specific pipeline status.

Changes:
- lightrag/kg/shared_storage.py: Add set/get_default_workspace()
- lightrag/lightrag.py: Call set_default_workspace() in initialize_storages()
- lightrag/api/lightrag_server.py: Add get_workspace_from_request() helper,
  update /health endpoint to support LIGHTRAG-WORKSPACE header

Testing:
- Backward compatibility: Old code works without modification
- Multi-instance safety: Explicit workspace passing preserved
- /health endpoint: Supports both default and header-specified workspaces

Related: #2353
2025-11-15 12:36:03 +08:00
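The header fallback described in 5f153582a9 might look roughly like the sketch below. Only the LIGHTRAG-WORKSPACE header name comes from the commit message; resolve_workspace and its parameters are hypothetical and do not reproduce the real get_workspace_from_request() helper.

```python
from typing import Mapping

WORKSPACE_HEADER = "LIGHTRAG-WORKSPACE"


def resolve_workspace(headers: Mapping[str, str], server_default: str) -> str:
    """Return the workspace named in the request, else the server default."""
    # HTTP header names are case-insensitive, so normalise before the lookup.
    normalised = {name.lower(): value for name, value in headers.items()}
    requested = normalised.get(WORKSPACE_HEADER.lower(), "").strip()
    # An absent or blank header falls back to the server-wide default.
    return requested or server_default
```

For example, resolve_workspace({"lightrag-workspace": "tenant_a"}, "main") selects "tenant_a", while an empty header mapping yields "main".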
Daniel.y
3b76eea20b
Merge pull request #2359 from danielaskdd/embedding-limit
Refact: Add Embedding Token Limit Configuration and Improve Error Handling
2025-11-15 01:27:26 +08:00
yangdx
8722103550 Update env.example
• Comment out Ollama config
• Set OpenAI as active default
• Add EMBEDDING_TOKEN_LIMIT option
• Add Gemini embedding configuration
2025-11-15 01:25:56 +08:00
yangdx
b5589ce4d5 Merge branch 'main' into embedding-limit 2025-11-15 01:10:34 +08:00
Daniel.y
9a2ddcee31
Merge pull request #2360 from danielaskdd/macos-gunicorn-numpy
Add macOS fork safety check for Gunicorn multi-worker mode
2025-11-15 01:02:41 +08:00
yangdx
4343db753a Add macOS fork safety check for Gunicorn multi-worker mode
• Check OBJC_DISABLE_INITIALIZE_FORK_SAFETY
• Prevent NumPy/Accelerate crashes
• Show detailed error message
• Provide multiple fix options
• Exit early if misconfigured
2025-11-15 00:58:23 +08:00
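A startup check of the kind 4343db753a describes could be sketched as follows. The function name, its parameters, and the message text are assumptions made for illustration; only the OBJC_DISABLE_INITIALIZE_FORK_SAFETY variable is taken from the commit message.

```python
from typing import Mapping, Optional


def fork_safety_error(
    system: str, workers: int, env: Mapping[str, str]
) -> Optional[str]:
    """Return an error message if a multi-worker start on macOS looks unsafe."""
    # Only macOS (reported as "Darwin") with forked workers is affected.
    if system != "Darwin" or workers <= 1:
        return None
    if env.get("OBJC_DISABLE_INITIALIZE_FORK_SAFETY") == "YES":
        return None
    return (
        "Multi-worker mode on macOS requires "
        "OBJC_DISABLE_INITIALIZE_FORK_SAFETY=YES to be exported before start."
    )
```

A caller would typically pass platform.system(), the configured worker count, and os.environ, then exit early when a message is returned.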
Daniel.y
c6850ac5ac
Merge pull request #2358 from sleeepyin/main
Update the value corresponding to the extracted entity relationship keywords
2025-11-14 23:47:58 +08:00
yangdx
5dec4deac7 Improve embedding config priority and add debug logging
• Fix embedding_dim priority logic
• Add final config logging
2025-11-14 23:22:44 +08:00
yangdx
de4412dd40 Fix embedding token limit initialization order
* Capture max_token_size before decorator
* Apply wrapper after capturing attribute
* Prevent decorator from stripping dataclass
* Ensure token limit is properly set
2025-11-14 22:56:03 +08:00
yangdx
963a0a5db1 Refactor embedding function creation with proper attribute inheritance
- Extract max_token_size from providers
- Avoid double-wrapping EmbeddingFunc
- Improve configuration priority logic
- Add comprehensive debug logging
- Return complete EmbeddingFunc instance
2025-11-14 22:29:08 +08:00
yangdx
39b49e92ff Convert embedding_token_limit from property to field with __post_init__
• Remove property decorator
• Add field with init=False
• Set value in __post_init__ method
• embedding_token_limit is now in config dictionary
2025-11-14 20:58:41 +08:00
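The property-to-field change in 39b49e92ff matters because dataclasses.asdict() only reports fields, not properties. A minimal sketch, with EmbeddingStub and RagConfigSketch as hypothetical stand-ins for the real classes:

```python
from dataclasses import asdict, dataclass, field
from typing import Optional


@dataclass
class EmbeddingStub:
    """Hypothetical embedding wrapper carrying a token limit."""

    max_token_size: Optional[int] = None


@dataclass
class RagConfigSketch:
    embedding_func: EmbeddingStub = field(default_factory=EmbeddingStub)
    # init=False: callers cannot pass this; it is derived in __post_init__.
    embedding_token_limit: Optional[int] = field(default=None, init=False)

    def __post_init__(self) -> None:
        self.embedding_token_limit = self.embedding_func.max_token_size


# Because it is a real field, the derived value appears in the config dict.
config = asdict(RagConfigSketch(EmbeddingStub(max_token_size=8192)))
```

Had embedding_token_limit remained a property, it would be absent from the dictionary produced by asdict().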
yangdx
ab4d7ac2b0 Add configurable embedding token limit with validation
- Add EMBEDDING_TOKEN_LIMIT env var
- Set max_token_size on embedding func
- Add token limit property to LightRAG
- Validate summary length vs limit
- Log warning when limit exceeded
2025-11-14 19:28:36 +08:00
yangdx
680e36c6eb Improve Bedrock error handling with retry logic and custom exceptions
• Add specific exception types
• Implement proper retry mechanism
• Better error classification
• Enhanced logging and validation
• Enable embedding retry decorator
2025-11-14 18:51:41 +08:00
yangdx
05852e1ab2 Add max_token_size parameter to embedding function decorators
- Add max_token_size=8192 to all embed funcs
- Move siliconcloud to deprecated folder
- Import wrap_embedding_func_with_attrs
- Update EmbeddingFunc docstring
- Fix langfuse import type annotation
2025-11-14 18:41:43 +08:00
Sleeep
b88d785469
Merge branch 'HKUDS:main' into main 2025-11-14 16:49:30 +08:00
Daniel.y
399a23c3a6
Merge pull request #2356 from danielaskdd/improve-error-handling
Fix: Robust error handling for async database operations in graph storage
2025-11-14 11:16:14 +08:00
yangdx
4401f86f07 Refactor exception handling in MemgraphStorage label methods 2025-11-14 11:01:26 +08:00
yangdx
1ccef2b932 Fix null reference errors in graph database error handling
- Initialize result vars to None
- Add null checks before consume calls
- Prevent crashes in except blocks
- Apply fix to both Neo4J and Memgraph
2025-11-14 10:39:04 +08:00
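The pattern named in 1ccef2b932 (initialize the result variable to None, then check it before cleanup) can be sketched without a real database driver. FailingSession and fetch_labels are hypothetical; the run()/consume() shape only loosely mirrors an async graph database session.

```python
import asyncio


class FailingSession:
    """Hypothetical stub whose run() fails before any result exists."""

    async def run(self, query: str):
        raise ConnectionError("database unavailable")


async def fetch_labels(session, query: str) -> list:
    result = None  # bound before the try so the except block can test it
    try:
        result = await session.run(query)
        labels = [record["label"] async for record in result]
        await result.consume()
        return labels
    except Exception:
        # Without the None check, a failure inside run() would raise a second
        # error here and hide the original one.
        if result is not None:
            await result.consume()
        raise
```

With this shape, a failure in run() surfaces as the original ConnectionError rather than an UnboundLocalError or AttributeError raised from the cleanup path.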
chengjie
2f3620b7e9 feat: Add workspace isolation support for pipeline status
Problem:
In multi-tenant scenarios, different workspaces share a single global
pipeline_status namespace, causing pipelines from different tenants to
block each other, severely impacting concurrent processing performance.

Solution:
- Extended get_namespace_data() to recognize workspace-specific pipeline
  namespaces with pattern "{workspace}:pipeline" (following GraphDB pattern)
- Added workspace parameter to initialize_pipeline_status() for per-tenant
  isolated pipeline namespaces
- Updated all 7 call sites to use workspace-aware locks:
  * lightrag.py: process_document_queue(), aremove_document()
  * document_routes.py: background_delete_documents(), clear_documents(),
    cancel_pipeline(), get_pipeline_status(), delete_documents()

Impact:
- Different workspaces can process documents concurrently without blocking
- Backward compatible: empty workspace defaults to "pipeline_status"
- Maintains fail-fast: uninitialized pipeline raises clear error
- Expected N× performance improvement for N concurrent tenants

Bug fixes:
- Fixed AttributeError by using self.workspace instead of self.global_config
- Fixed pipeline status endpoint to show workspace-specific status
- Fixed delete endpoint to check workspace-specific busy flag

Code changes: 4 files, 141 insertions(+), 28 deletions(-)

Testing: All syntax checks passed, comprehensive workspace isolation tests completed
2025-11-13 22:31:14 +08:00
yangdx
c164c8f631 Merge branch 'main' of github.com:HKUDS/LightRAG 2025-11-13 20:42:47 +08:00
yangdx
1889301597 Merge branch 'feat/add_cloud_ollama_support' 2025-11-13 20:41:58 +08:00
yangdx
77ad906d3a Improve error handling and logging in cloud model detection 2025-11-13 20:41:44 +08:00
Daniel.y
28fba19b11
Merge pull request #2352 from danielaskdd/docling-gunicorn-multi-worker
Refact: Enhance DOCLING integration with lazy loading and macOS safeguards
2025-11-13 20:37:48 +08:00
yangdx
cc031a3db9 Add macOS compatibility check for DOCLING with multi-worker Gunicorn 2025-11-13 19:18:04 +08:00
LacombeLouis
844537e378 Add a better regex 2025-11-13 12:17:51 +01:00
yangdx
a24d8181c2 Improve docling integration with macOS compatibility and CLI flag
- Add --docling CLI flag for easier setup
- Add numpy version constraints
- Exclude docling on macOS (fork-safety)
2025-11-13 18:58:09 +08:00
Daniel.y
76adde3858
Merge pull request #2351 from danielaskdd/lazy-config-loading
Refact: Implement Lazy Configuration Initialization for API Server
2025-11-13 15:55:35 +08:00
Sleeep
89e63aa49b
Update edge keywords extraction in graph visualization
When building the Neo4j graph, the keywords value was read from d7 by default;
it should be d9 after the change.
2025-11-13 15:52:14 +08:00
yangdx
e6588f9119 Update uv.lock 2025-11-13 15:31:51 +08:00
yangdx
746c069ab0 Implement lazy configuration initialization for API server
• Add lazy config initialization
• Maintain backward compatibility
• Support programmatic usage
• Add gunicorn dependency
• Explicit config in entry points
2025-11-13 15:28:05 +08:00
Daniel.y
470e2fd1f9
Merge pull request #2350 from danielaskdd/reduce-dynamic-import
Refactor: Remove blocking dependency installation from document upload handlers
2025-11-13 15:06:05 +08:00
yangdx
4b31942e2a refactor: move document deps to api group, remove dynamic imports
- Merge offline-docs into api extras
- Remove pipmaster dynamic installs
- Add async document processing
- Pre-check docling availability
- Update offline deployment docs
2025-11-13 13:34:09 +08:00
yangdx
8765974467 Merge branch 'tongda/main' 2025-11-13 12:56:28 +08:00
yangdx
c230d1a28d Replace asyncio.iscoroutine with inspect.isawaitable for better detection 2025-11-13 12:56:01 +08:00
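The difference behind c230d1a28d can be shown with a small self-contained check: asyncio.iscoroutine() recognizes coroutine objects only, while inspect.isawaitable() also accepts futures and other objects that define __await__. The helper names below are illustrative.

```python
import asyncio
import inspect


async def compute() -> int:
    return 42


async def classify() -> dict:
    coro = compute()
    future = asyncio.get_running_loop().create_future()
    future.set_result(42)
    checks = {
        # Each tuple is (asyncio.iscoroutine, inspect.isawaitable).
        "coroutine": (asyncio.iscoroutine(coro), inspect.isawaitable(coro)),
        "future": (asyncio.iscoroutine(future), inspect.isawaitable(future)),
    }
    await coro  # await both so nothing is left pending
    await future
    return checks


checks = asyncio.run(classify())
```

Here the future is awaitable but is not a coroutine, so code that branches on asyncio.iscoroutine() alone would skip awaiting it.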
yangdx
297e460740 Merge branch 'main' into tongda/main 2025-11-13 12:37:37 +08:00