Bug 1a - RuntimeError when _registry_guard is None:
- Added explicit check for _registry_guard initialization
- Now raises clear RuntimeError instead of cryptic TypeError
- Helps users understand they need to call initialize_share_data() first
Bug 1b - Workspace async_locks not visible across processes:
- Created new _workspace_async_locks dict for per-process storage
- Fixed issue where async_locks modifications in one process were invisible to others
- This is correct design since asyncio.Lock objects cannot be pickled/shared
Why per-process async_locks:
- asyncio.Lock objects cannot be shared across processes
- Each process needs its own asyncio.Lock instances for coroutine sync
- Cross-process sync is handled by Manager.RLock() in _sync_locks
- Within-process async sync is handled by per-process asyncio.Lock
Testing:
- All 17 existing workspace lock tests pass
- Added 3 new tests specifically for bug verification
- Total 20 tests passing
Impact:
- Fixes potential race conditions in multiprocess scenarios
- Ensures proper synchronization both across and within processes
- Maintains backward compatibility
- Split long function calls across multiple lines
- Split long function definitions across multiple lines
- Add blank line after docstring in test function
These changes are purely formatting to comply with the project's
linting standards (black/ruff). No functional changes.
Why this change is needed:
The current locking system uses global locks shared across all users
and workspaces, causing blocking issues in multi-tenant scenarios.
When one tenant performs document indexing, all other tenants are
blocked waiting for the same global lock. This severely limits
the system's ability to serve multiple users concurrently.
How it solves it:
- Add optional `workspace` parameter to 5 lock functions
- Implement lazy creation of workspace-specific locks with proper synchronization
- Store workspace locks in new `_sync_locks` dictionary
- Support both multi-process (RLock) and single-process (asyncio.Lock) modes
- Empty workspace parameter uses global lock for backward compatibility
- Extract common logic into `_get_workspace_lock()` to eliminate duplication
Impact:
- Enables concurrent operations across different workspaces
- Foundation for PR2 (pipeline status isolation)
- Zero impact on existing code (all parameters optional with defaults)
- Each workspace now has independent lock instances
- Thread-safe lazy creation using _registry_guard in multiprocess mode
- Automatic creation of async_locks for workspace locks in multiprocess mode
Code Quality Improvements (Linus review feedback):
- Fixed race condition: lazy creation protected by _registry_guard
- Eliminated code duplication: common logic extracted to _get_workspace_lock()
- Added async_lock support: workspace locks now have companion async_locks
- Handles None workspace parameter gracefully
- Clear separation of concerns: one function handles all workspace logic
Testing:
- 17 new test cases covering:
- Basic functionality and naming
- Workspace isolation and independence
- Backward compatibility with empty workspace
- Concurrent operations (3 workspaces in parallel)
- Performance (1000 workspace lock creation <2s)
- Edge cases (special characters, unicode, long names)
- All existing tests pass (21/21 excluding env issues)
- Verified lock serialization within workspace
- Verified lock independence across workspaces
Files modified:
- lightrag/kg/shared_storage.py: refactored lock functions + synchronization
- tests/test_workspace_locks.py: comprehensive test suite
• Remove premature ID normalization
• Add lookup mapping for node resolution
• Filter results by requested nodes only
• Improve error logging with workspace
- Batch index existence checks into single query (16+ queries -> 1 query)
- Batch timestamp column checks into single query (8 queries -> 1 query)
- Batch field length checks into single query (5 queries -> 1 query)
Performance improvement: ~70-80% faster initialization (35s -> 5-10s)
Key optimizations:
1. check_tables(): Use ANY($1) to check all indexes at once
2. _migrate_timestamp_columns(): Batch all column type checks
3. _migrate_field_lengths(): Batch all field definition checks
All changes are backward compatible with no schema or API changes.
Reduces database round-trips by batching information_schema queries.
- Add entity_chunks & relation_chunks storage
- Implement KEEP/FIFO limit strategies
- Update env.example with new settings
- Add migration for chunk tracking data
- Support all KV storage
Prepared statement caching is disabled by setting
`statement_cache_size=0` in the `asyncpg` connection pool parameters.
This is necessary to prevent
`asyncpg.exceptions.InvalidSQLStatementNameError` when using
transaction-level connection poolers like Supabase Supavisor or
pgbouncer, which do not support prepared statements.
- Store file_path in full_docs storage
- Update PostgreSQL implementation by map file_path to doc_name
- Other storage implementation automatically handles the new field
- Add get_doc_by_file_path to all storages
- Skip processed files in scan operation
- Check duplicates in upload endpoints
- Check duplicates in text insert APIs
- Return status info in duplicate responses
• Add CJK analyzer for Chinese text
• Auto-detect Chinese characters
• Recreate index if needed
• Separate Chinese/Latin search logic
• Improve fallback for Chinese queries
- Add get_popular_labels() method
- Add search_labels() with fuzzy matching
- Use native SQL for better performance
- Include proper scoring and ranking
• Add workspace attribute to storage classes
• Use workspace-specific index names
• Implement Atlas Search with fallbacks
• Add entity search and popular labels
• Improve index migration strategy