LightRAG

Author	SHA1	Message	Date
chengjie	bd423004d9	fix: Resolve pre-commit linting failures in PR1 Why this change is needed: The CI pipeline was failing on the "Run pre-commit" step with 6 F841 errors (unused variables) and 1 formatting issue (missing trailing comma). This was blocking the PR from being merged. How it solves it: 1. Added trailing comma to _get_workspace_lock() function signature to match Python formatting standards 2. Changed unused lock variables (lock1, lock2) to underscore (_) in test_finalization_cleanup.py to indicate intentional disposal of return values (locks are created for their side effects only) 3. Applied ruff-format auto-formatting fixes Impact: - All pre-commit checks now pass locally - CI pipeline should pass on next push - No functional changes, only code style fixes Testing: Verified with: uv run pre-commit run --files <modified files> All checks passed: trailing whitespace, end of files, ruff-format, ruff	2025-11-11 00:52:48 +08:00
chengjie	0bd162a416	fix: ensure finalize_share_data properly cleans up workspace locks Why this change is needed: The finalize_share_data() function was not properly cleaning up workspace lock-related global variables (_sync_locks, _workspace_async_locks, and lock registry variables). This caused stale references to remain after finalization, leading to EOFError or BrokenPipeError when trying to re-initialize or when processes tried to use locks after the Manager was shut down. How it solves it: 1. Added comprehensive cleanup of all Manager.dict proxies before Manager shutdown (_sync_locks, _lock_registry, _lock_registry_count, _lock_cleanup_data) 2. Added cleanup of per-process _workspace_async_locks dictionary 3. Reset all lock-related globals to None at end of finalization: - _workers, _lock_registry, _lock_registry_count, _lock_cleanup_data - _registry_guard, _storage_keyed_lock, _sync_locks - _workspace_async_locks, _earliest_mp_cleanup_time, _last_mp_cleanup_time Impact: - Prevents EOFError/BrokenPipeError in production deployments - Enables safe re-initialization after finalization - Critical for proper resource cleanup in multi-process deployments - Fixes memory leaks from stale lock references Testing: - Added 3 comprehensive tests in test_finalization_cleanup.py - All 23 workspace lock tests pass (17 original + 3 bug fixes + 3 finalization) - Tests verify clean re-initialization after finalization in both single-process and multiprocess modes	2025-11-11 00:23:42 +08:00
chengjie	9e3c64df03	fix: critical bugs in workspace lock multiprocess synchronization Bug 1a - RuntimeError when _registry_guard is None: - Added explicit check for _registry_guard initialization - Now raises clear RuntimeError instead of cryptic TypeError - Helps users understand they need to call initialize_share_data() first Bug 1b - Workspace async_locks not visible across processes: - Created new _workspace_async_locks dict for per-process storage - Fixed issue where async_locks modifications in one process were invisible to others - This is correct design since asyncio.Lock objects cannot be pickled/shared Why per-process async_locks: - asyncio.Lock objects cannot be shared across processes - Each process needs its own asyncio.Lock instances for coroutine sync - Cross-process sync is handled by Manager.RLock() in _sync_locks - Within-process async sync is handled by per-process asyncio.Lock Testing: - All 17 existing workspace lock tests pass - Added 3 new tests specifically for bug verification - Total 20 tests passing Impact: - Fixes potential race conditions in multiprocess scenarios - Ensures proper synchronization both across and within processes - Maintains backward compatibility	2025-11-11 00:15:06 +08:00
chengjie	27de78113d	style: apply code formatting to pass pre-commit checks - Split long function calls across multiple lines - Split long function definitions across multiple lines - Add blank line after docstring in test function These changes are purely formatting to comply with the project's linting standards (black/ruff). No functional changes.	2025-11-11 00:10:54 +08:00
chengjie	5d31412bd7	feat: add workspace isolation support to unified lock functions Why this change is needed: The current locking system uses global locks shared across all users and workspaces, causing blocking issues in multi-tenant scenarios. When one tenant performs document indexing, all other tenants are blocked waiting for the same global lock. This severely limits the system's ability to serve multiple users concurrently. How it solves it: - Add optional `workspace` parameter to 5 lock functions - Implement lazy creation of workspace-specific locks with proper synchronization - Store workspace locks in new `_sync_locks` dictionary - Support both multi-process (RLock) and single-process (asyncio.Lock) modes - Empty workspace parameter uses global lock for backward compatibility - Extract common logic into `_get_workspace_lock()` to eliminate duplication Impact: - Enables concurrent operations across different workspaces - Foundation for PR2 (pipeline status isolation) - Zero impact on existing code (all parameters optional with defaults) - Each workspace now has independent lock instances - Thread-safe lazy creation using _registry_guard in multiprocess mode - Automatic creation of async_locks for workspace locks in multiprocess mode Code Quality Improvements (Linus review feedback): - Fixed race condition: lazy creation protected by _registry_guard - Eliminated code duplication: common logic extracted to _get_workspace_lock() - Added async_lock support: workspace locks now have companion async_locks - Handles None workspace parameter gracefully - Clear separation of concerns: one function handles all workspace logic Testing: - 17 new test cases covering: - Basic functionality and naming - Workspace isolation and independence - Backward compatibility with empty workspace - Concurrent operations (3 workspaces in parallel) - Performance (1000 workspace lock creation <2s) - Edge cases (special characters, unicode, long names) - All existing tests pass (21/21 excluding env issues) - Verified lock serialization within workspace - Verified lock independence across workspaces Files modified: - lightrag/kg/shared_storage.py: refactored lock functions + synchronization - tests/test_workspace_locks.py: comprehensive test suite	2025-11-10 22:51:49 +08:00
yangdx	913fa1e415	Add concurrency warning for JsonKVStorage in cleanup tool	2025-11-09 23:04:04 +08:00
yangdx	1f9d0735c3	Bump API version to 0253	2025-11-09 14:42:22 +08:00
yangdx	37b7118901	Fix table alignment and add validation for empty cleanup selections	2025-11-09 14:17:56 +08:00
yangdx	1485cb82e9	Add LLM query cache cleanup tool for KV storage backends - Interactive cleanup workflow - Supports all KV storage types - Batch deletion with progress - Comprehensive error reporting - Preserves workspace isolation	2025-11-09 13:37:33 +08:00
yangdx	2f16065256	Refactor keyword_extraction from kwargs to explicit parameter • Add keyword_extraction param to functions • Remove kwargs.pop() calls • Update function signatures • Improve parameter documentation • Make parameter handling consistent	2025-11-09 12:02:17 +08:00
yangdx	88ab73f6ae	HotFix: Restore streaming response in OpenAI LLM The stream and timeout parameters were moved from **kwargs to explicit parameters in a previous commit, but were not being passed to the OpenAI API, causing streaming responses to fail and fall back to non-streaming mode. Fixes the issue where stream=True was being silently ignored, resulting in unexpected non-streaming behavior.	2025-11-09 11:52:26 +08:00
yangdx	7bc6ccea19	Add uv package manager support to installation docs	2025-11-09 04:31:07 +08:00
yangdx	754d2ad297	Add documentation for LLM cache migration between storage types	2025-11-09 00:41:07 +08:00
yangdx	a75efb06dc	Fix: prevent source data corruption by target upsert function • Prevent mutation bugs by using copy() when storing cache values • Protect filtered cache data and ensure batch data isolation	2025-11-09 00:02:19 +08:00
yangdx	987bc09cab	Update LLM cache migration docs and improve UX prompts	2025-11-08 23:48:19 +08:00
yangdx	1a91bcdb5f	Improve storage config validation and add config.ini fallback support • Add MongoDB env requirements • Support config.ini fallback • Warn on missing env vars • Check available storage count • Show config source info	2025-11-08 22:48:49 +08:00
yangdx	57ee7d5ac8	Merge branch 'main' into llm-cache-migrate	2025-11-08 22:15:46 +08:00
yangdx	3d9de5ed03	feat: improve Gemini client error handling and retry logic • Add google-api-core dependency • Add specific exception handling • Create InvalidResponseError class • Update retry decorators • Fix empty response handling	2025-11-08 22:10:09 +08:00
yangdx	1864b28242	Add colored output formatting to migration confirmation display	2025-11-08 21:16:41 +08:00
yangdx	e95b02fb55	Refactor storage selection UI with dynamic numbering and inline prompts • Remove standalone get_user_choice method • Add dynamic sequential numbering • Inline choice validation logic • Remove redundant storage type prints • Improve excluded storage handling	2025-11-08 20:42:27 +08:00
yangdx	5be04263b2	Fix deadlock in JSON cache migration and prevent same storage selection - Snapshot JSON data before yielding batches - Release lock during batch processing - Exclude source type from target selection - Add detailed docstring for lock behavior - Filter available storage types properly	2025-11-08 19:58:36 +08:00
yangdx	6b9f13c792	Enhance LLM cache migration tool with streaming and improved UX - Add streaming migration for memory efficiency - Implement graceful exit with Enter/0 - Add progress indicators for counting - Optimize batch processing by storage type - Update docs with new progress displays	2025-11-08 19:38:00 +08:00
yangdx	d0d31e9262	Improve LLM cache migration tool configuration and messaging	2025-11-08 18:52:33 +08:00
yangdx	6fc54d3625	Move LLM cache migration tool to lightrag.tools module - Relocated tool to proper package structure - Updated import paths and documentation - Added shared storage initialization - Fixed module path resolution - Updated usage instructions	2025-11-08 18:33:13 +08:00
yangdx	cf732dbfc6	Bump core version to 1.4.9.9 and API to 0252	2025-11-08 11:27:26 +08:00
yangdx	a624a9508a	Add Gemini to APIs requiring embedding dimension parameter	2025-11-08 03:54:50 +08:00
yangdx	de4ed73652	Add Gemini embedding support - Implement gemini_embed function - Add gemini to embedding binding choices - Add L2 normalization for dims < 3072	2025-11-08 03:34:30 +08:00
yangdx	f83ea3394e	Add section header comment for Gemini binding options	2025-11-08 02:07:31 +08:00
yangdx	0b2a15c452	Centralize embedding_send_dim config through args instead of env var	2025-11-08 01:52:23 +08:00
yangdx	03cc6262c4	Prohibit direct access to internal functions of EmbeddingFunc. • Fix similarity search error in query stage • Remove redundant null checks • Improve log readability	2025-11-08 01:43:36 +08:00
yangdx	ffeeae4208	refactor: simplify jina embedding dimension handling	2025-11-07 22:09:57 +08:00
yangdx	01b07b2be5	Refactor Jina embedding dimension by changing param to optional with default	2025-11-07 22:04:34 +08:00
yangdx	d95efcb9ad	Fix linting	2025-11-07 21:27:54 +08:00
yangdx	ce28f30ca6	Add embedding_dim parameter support to embedding functions • Pass embedding_dim to jina_embed call • Pass embedding_dim to openai_embed call	2025-11-07 21:23:59 +08:00
yangdx	c14f25b7f8	Add mandatory dimension parameter handling for Jina API compliance	2025-11-07 21:08:34 +08:00
yangdx	d8a6355e41	Merge branch 'main' into apply-dim-to-embedding-call	2025-11-07 20:48:22 +08:00
yangdx	33a1482f7f	Add optional embedding dimension parameter control via env var * Add EMBEDDING_SEND_DIM environment variable * Update Jina/OpenAI embed functions * Add send_dimensions to EmbeddingFunc * Auto-inject embedding_dim when enabled * Add parameter validation warnings	2025-11-07 20:46:40 +08:00
yangdx	fc40a36968	Add timeout support to Gemini LLM and improve parameter handling • Add timeout parameter to Gemini client • Convert timeout seconds to milliseconds • Update function signatures consistently • Add Gemini thinking config example • Clean up parameter documentation	2025-11-07 15:50:14 +08:00
yangdx	3cb4eae492	Add Chain of Thought support to Gemini LLM integration - Extract thoughts from response parts - Add COT enable/disable parameter	2025-11-07 15:22:14 +08:00
yangdx	6686edfd35	Update Gemini LLM options: add seed and thinking config, remove MIME type	2025-11-07 14:32:42 +08:00
Yasiru Rangana	d94aae9c5e	Add dimensions parameter support to openai_embed()	2025-11-07 09:55:06 +11:00
yangdx	8c27555358	Fix Gemini response parsing to avoid warnings from non-text parts	2025-11-07 04:00:37 +08:00
yangdx	ea141e2779	Fix: Remove redundant entity/relation chunk deletions	2025-11-07 02:56:16 +08:00
yangdx	5bcd2926ca	Bump API version to 0251	2025-11-06 21:45:47 +08:00
yangdx	04ed709b34	Optimize entity deletion by batching edge queries to avoid N+1 problem • Add batch get_nodes_edges_batch call • Remove individual get_node_edges calls • Improve query performance	2025-11-06 21:34:47 +08:00
yangdx	3276b7a49d	Fix linting	2025-11-06 20:48:51 +08:00
yangdx	155f59759b	Fix node ID normalization and improve batch operation consistency • Remove premature ID normalization • Add lookup mapping for node resolution • Filter results by requested nodes only • Improve error logging with workspace	2025-11-06 20:34:53 +08:00
yangdx	807d2461d3	Remove unused chunk-based node/edge retrieval methods	2025-11-06 18:17:10 +08:00
yangdx	831e658ed8	Update readme	2025-11-06 16:26:07 +08:00
yangdx	6e36ff41e1	Fix linting	2025-11-06 16:01:24 +08:00

3587 commits