Commit graph

32 commits

Author SHA1 Message Date
chengjie
9e3c64df03 fix: critical bugs in workspace lock multiprocess synchronization
Bug 1a - RuntimeError when _registry_guard is None:
- Added explicit check for _registry_guard initialization
- Now raises a clear RuntimeError instead of a cryptic TypeError
- Helps users understand they need to call initialize_share_data() first

Bug 1b - Workspace async_locks not visible across processes:
- Created new _workspace_async_locks dict for per-process storage
- Fixed issue where async_locks modifications in one process were invisible to others
- This is the correct design, since asyncio.Lock objects cannot be pickled or shared

Why per-process async_locks:
- asyncio.Lock objects cannot be shared across processes
- Each process needs its own asyncio.Lock instances for coroutine sync
- Cross-process sync is handled by Manager.RLock() in _sync_locks
- Within-process async sync is handled by per-process asyncio.Lock
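
The per-process half of that split can be sketched roughly as follows. This is a minimal illustration, not the code from the commit: the helper name `get_workspace_async_lock` and the key format are assumptions, and only the dictionary name `_workspace_async_locks` comes from the message above.

```python
import asyncio

# Per-process registry: each process builds its own dict, so nothing here is
# pickled or shared. The name mirrors the commit message; contents are a sketch.
_workspace_async_locks = {}


def get_workspace_async_lock(key):
    """Return this process's asyncio.Lock for `key`, creating it on first use.

    Hypothetical helper for illustration only.
    """
    lock = _workspace_async_locks.get(key)
    if lock is None:
        lock = asyncio.Lock()
        _workspace_async_locks[key] = lock
    return lock


async def demo():
    """Two coroutines contend for the same key; the lock serializes them."""
    order = []

    async def worker(name):
        async with get_workspace_async_lock("tenant_a:storage"):
            order.append(name + "-start")
            await asyncio.sleep(0)  # yield to the event loop while holding the lock
            order.append(name + "-end")

    await asyncio.gather(worker("a"), worker("b"))
    return order
```

Coroutines in the same process that ask for the same key get the same lock object and therefore run their critical sections one after another. Coordination between processes would still need a separate shared lock, which this sketch does not model.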

Testing:
- All 17 existing workspace lock tests pass
- Added 3 new tests specifically for bug verification
- Total 20 tests passing

Impact:
- Fixes potential race conditions in multiprocess scenarios
- Ensures proper synchronization both across and within processes
- Maintains backward compatibility
2025-11-11 00:15:06 +08:00
chengjie
27de78113d style: apply code formatting to pass pre-commit checks
- Split long function calls across multiple lines
- Split long function definitions across multiple lines
- Add blank line after docstring in test function

These changes are purely formatting to comply with the project's
linting standards (black/ruff). No functional changes.
2025-11-11 00:10:54 +08:00
chengjie
5d31412bd7 feat: add workspace isolation support to unified lock functions
Why this change is needed:
The current locking system uses global locks shared across all users
and workspaces, causing blocking issues in multi-tenant scenarios.
When one tenant performs document indexing, all other tenants are
blocked waiting for the same global lock. This severely limits
the system's ability to serve multiple users concurrently.

How it solves it:
- Add optional `workspace` parameter to 5 lock functions
- Implement lazy creation of workspace-specific locks with proper synchronization
- Store workspace locks in new `_sync_locks` dictionary
- Support both multi-process (RLock) and single-process (asyncio.Lock) modes
- Empty workspace parameter uses global lock for backward compatibility
- Extract common logic into `_get_workspace_lock()` to eliminate duplication

Impact:
- Enables concurrent operations across different workspaces
- Foundation for PR2 (pipeline status isolation)
- Zero impact on existing code (all parameters optional with defaults)
- Each workspace now has independent lock instances
- Thread-safe lazy creation using _registry_guard in multiprocess mode
- Automatic creation of async_locks for workspace locks in multiprocess mode

Code Quality Improvements (Linus review feedback):
- Fixed race condition: lazy creation protected by _registry_guard
- Eliminated code duplication: common logic extracted to _get_workspace_lock()
- Added async_lock support: workspace locks now have companion async_locks
- Handles None workspace parameter gracefully
- Clear separation of concerns: one function handles all workspace logic

Testing:
- 17 new test cases covering:
  - Basic functionality and naming
  - Workspace isolation and independence
  - Backward compatibility with empty workspace
  - Concurrent operations (3 workspaces in parallel)
  - Performance (creating 1000 workspace locks takes <2s)
  - Edge cases (special characters, unicode, long names)
- All existing tests pass (21/21 excluding env issues)
- Verified lock serialization within workspace
- Verified lock independence across workspaces

Files modified:
- lightrag/kg/shared_storage.py: refactored lock functions + synchronization
- tests/test_workspace_locks.py: comprehensive test suite
2025-11-10 22:51:49 +08:00
yangdx
36501b82f5 Initialize shared storage for all graph storage types in graph unit test 2025-11-06 19:24:12 +08:00
yangdx
0c47d1a2d1 Fix linting 2025-11-06 19:12:40 +08:00
yangdx
f3b2ba8152 Translate graph storage test from Chinese to English 2025-11-06 19:11:35 +08:00
yangdx
6b0f9795be Add workspace parameter and remove chunk-based query unit tests
- Add workspace param to test storage init
- Remove get_nodes_by_chunk_ids tests
- Remove get_edges_by_chunk_ids tests
- Clean up batch operations test function
2025-11-06 18:18:01 +08:00
yangdx
b3ed264707 Refactor PostgreSQL retry config to use centralized configuration
• Move retry config to ClientManager
• Remove env var parsing from PostgreSQLDB
• Add config params to test setup
2025-10-10 03:44:13 +08:00
yangdx
bd535e3e7a Add PostgreSQL connection retry configuration options
- Add retry environment variables
- Fix asyncpg import in retry tests
2025-10-10 03:06:21 +08:00
yangdx
e758204ab2 Add PostgreSQL connection retry mechanism with comprehensive error handling
• Implement connection retry with backoff
• Add transient error detection
• Pool management with timeout guards
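
The general shape of retry with backoff, transient error detection, and a timeout guard could look like the sketch below. It is generic asyncio code, not the PostgreSQL implementation from this commit: the function name `with_retry`, its parameters, and the set of exceptions treated as transient are all assumptions.

```python
import asyncio

# Assumed set of errors worth retrying; the real implementation may classify
# driver-specific exceptions instead.
TRANSIENT_ERRORS = (ConnectionError, TimeoutError, asyncio.TimeoutError)


async def with_retry(operation, *, attempts=3, base_delay=0.01,
                     max_delay=1.0, timeout=5.0):
    """Await `operation()` with a per-attempt timeout and exponential backoff.

    Non-transient errors propagate immediately; transient ones are retried
    until `attempts` is exhausted, then the last error is re-raised.
    """
    for attempt in range(1, attempts + 1):
        try:
            # Timeout guard: a hung attempt counts as a transient failure.
            return await asyncio.wait_for(operation(), timeout=timeout)
        except TRANSIENT_ERRORS:
            if attempt == attempts:
                raise
            # Delay doubles each attempt, capped at max_delay.
            await asyncio.sleep(min(max_delay, base_delay * 2 ** (attempt - 1)))
```

Capping the delay keeps a long outage from producing unbounded waits between attempts.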
2025-10-10 03:06:01 +08:00
yangdx
6190fa8985 Fix linting 2025-10-06 04:57:11 +08:00
yangdx
91387628ff Add test script for aquery_data endpoint validation 2025-10-06 03:59:50 +08:00
yangdx
46187b2507 Fix conditional logic in streaming response parser of unit test
• Change elif to if for response field
• Change elif to if for error field
• Allow multiple data types per chunk
• Fix mutually exclusive conditions
• Enable concurrent field processing
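
The effect of switching from `elif` to `if` can be illustrated with a small parser. This is a sketch that assumes each streamed chunk is one JSON object per line; the `references` field name and the function `parse_chunk` are assumptions, while `response` and `error` are the fields named above.

```python
import json


def parse_chunk(line):
    """Collect every known field present in one streamed JSON chunk.

    Independent `if` checks let a single chunk contribute several fields.
    With an if/elif chain, only the first matching field would be read.
    """
    data = json.loads(line)
    seen = {}
    if "references" in data:  # assumed field name
        seen["references"] = data["references"]
    if "response" in data:
        seen["response"] = data["response"]
    if "error" in data:
        seen["error"] = data["error"]
    return seen
```

A chunk that carries both a partial response and an error is then reported with both fields, rather than silently dropping the error.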
2025-09-27 21:43:46 +08:00
yangdx
bcf30a4c8a Add comprehensive reference testing for query endpoints
- Add reference format validation
- Test streaming response parsing
- Check reference consistency
- Support references enable/disable
- Add --references-only test mode
2025-09-25 16:56:09 +08:00
yangdx
5eb4a4b799 feat: simplify citations, add reference merging, and restructure API response format 2025-09-24 14:30:10 +08:00
yangdx
c0d5abba6b Fix linting 2025-09-15 02:59:21 +08:00
yangdx
b1c8206346 Add aquery_data endpoint for structured retrieval without LLM generation
- Add QueryDataResponse model
- Implement /query/data endpoint
- Add aquery_data method to LightRAG
- Return entities, relationships, chunks
2025-09-15 02:15:14 +08:00
yangdx
a69194c079 Merge branch 'main' into add-Memgraph-graph-db 2025-07-04 23:53:07 +08:00
yangdx
f15e67c82c Update comments 2025-06-29 21:53:05 +08:00
DavIvek
c0a3638d01 fix memgraph_impl.py according to test_graph_storage.py 2025-06-27 15:35:20 +02:00
Ken Chen
a3865caaea Implement get_nodes_by_chunk_ids and get_edges_by_chunk_ids, 2025-06-25 22:17:17 +08:00
yangdx
e9dcac7caf Update graph db test 2025-04-17 23:09:01 +08:00
yangdx
09cca6dbe6 Update graph db unit test 2025-04-17 22:58:49 +08:00
yangdx
54f720cb27 Fix linting 2025-04-16 14:55:54 +08:00
yangdx
d370c0ae12 Fix graph unit test edge direction problem 2025-04-16 14:33:25 +08:00
yangdx
2a950f3ff9 Fix linting 2025-04-16 14:07:22 +08:00
yangdx
e6b2a035ea Update graph unit test 2025-04-16 14:06:05 +08:00
yangdx
1de74c9228 Fix linting 2025-04-15 12:34:04 +08:00
yangdx
262c93d8da Add batch query unit test for graph storage 2025-04-13 01:07:39 +08:00
yangdx
394a6063ba Fix linting 2025-04-04 03:41:05 +08:00
yangdx
99cce237df Add graph storage unit test 2025-04-04 03:40:46 +08:00
Yannick Stephan
55cd900e8e clean comments and unused libs 2025-02-18 21:12:06 +01:00