LightRAG

Author	SHA1	Message	Date
yangdx	72f68c2a61	Update env.example	2025-11-17 12:54:32 +08:00
yangdx	a08bc72635	Fix empty dict handling after JSON sanitization • Replace truthy checks with `is not None` • Handle empty dict edge case properly • Prevent data reload failures • Add comprehensive test coverage • Fix JsonKVStorage and DocStatusStorage	2025-11-17 12:54:32 +08:00
yangdx	cca0800ed4	Fix migration to reload sanitized data and prevent memory corruption • Reload cleaned data after sanitization • Update shared memory with clean data • Add specific surrogate char tests • Test migration sanitization flow • Prevent dirty data in memory	2025-11-17 12:54:32 +08:00
yangdx	7f54f47093	Optimize JSON string sanitization with precompiled regex and zero-copy - Precompile regex pattern at module level - Zero-copy path for clean strings - Use C-level regex for performance - Remove deprecated _sanitize_json_data - Fast detection for common case	2025-11-17 12:54:32 +08:00
yangdx	f289cf6225	Optimize JSON write with fast/slow path to reduce memory usage - Fast path for clean data (no sanitization) - Slow path sanitizes during encoding - Reload shared memory after sanitization - Custom encoder avoids deep copies - Comprehensive test coverage	2025-11-17 12:54:32 +08:00
yangdx	93a3e47134	Remove deprecated response_type parameter from query settings - Bump API version to 0254 - Remove response format UI controls - Hard-code response_type in query params - Add migration for version 19 - Clean up settings store structure	2025-11-17 12:54:32 +08:00
yangdx	abeaac84fa	Improve JSON data sanitization to handle tuples and dict keys - Sanitize dictionary keys - Preserve tuple types - Handle nested structures better	2025-11-17 12:54:32 +08:00
yangdx	5885637ebf	Add specialized JSON string sanitizer to prevent UTF-8 encoding errors • Remove surrogate characters (U+D800-DFFF) • Filter Unicode non-characters • Direct char-by-char filtering	2025-11-17 12:54:32 +08:00
yangdx	23cbb9c9b2	Add data sanitization to JSON writing to prevent UTF-8 encoding errors • Add _sanitize_json_data helper function • Recursively clean strings in data • Sanitize before JSON serialization • Prevent encoding-related crashes • Use existing sanitize_text_for_encoding	2025-11-17 12:54:32 +08:00
yangdx	ff8f158891	Update env.example	2025-11-17 12:54:32 +08:00
yangdx	c434879c7a	Replace PyPDF2 with pypdf for PDF processing - Update import from PyPDF2 to pypdf - Change dependency to pypdf>=6.1.0 - Update all requirements files - Remove PyPDF2 from lock file - Use modern pypdf library	2025-11-17 12:54:32 +08:00
yangdx	af5423919b	Support async chunking functions in LightRAG processing pipeline - Add Awaitable and Union type imports - Update chunking_func type annotation - Handle coroutine results with await - Add return type validation - Update docstring for async support	2025-11-17 12:54:32 +08:00
Tong Da	5016025453	Simpler version: detect whether the chunking_func result is a coroutine	2025-11-17 12:54:32 +08:00
Tong Da	7740500693	Support async chunking func to improve processing performance when a heavy `chunking_func` is passed in by the user	2025-11-17 12:54:32 +08:00
BukeLy	18a4870229	fix: Add default workspace support for backward compatibility Fixes two compatibility issues in workspace isolation: 1. Problem: lightrag_server.py calls initialize_pipeline_status() without workspace parameter, causing pipeline to initialize in global namespace instead of rag's workspace. Solution: Add set_default_workspace() mechanism in shared_storage. LightRAG.initialize_storages() now sets default workspace, which initialize_pipeline_status() uses when called without parameters. 2. Problem: /health endpoint hardcoded to use "pipeline_status", cannot return workspace-specific status or support frontend workspace selection. Solution: Add LIGHTRAG-WORKSPACE header support. Endpoint now extracts workspace from header or falls back to server default, returning correct workspace-specific pipeline status. Changes: - lightrag/kg/shared_storage.py: Add set/get_default_workspace() - lightrag/lightrag.py: Call set_default_workspace() in initialize_storages() - lightrag/api/lightrag_server.py: Add get_workspace_from_request() helper, update /health endpoint to support LIGHTRAG-WORKSPACE header Testing: - Backward compatibility: Old code works without modification - Multi-instance safety: Explicit workspace passing preserved - /health endpoint: Supports both default and header-specified workspaces Related: #2353	2025-11-17 12:54:20 +08:00
BukeLy	eb52ec94d7	feat: Add workspace isolation support for pipeline status Problem: In multi-tenant scenarios, different workspaces share a single global pipeline_status namespace, causing pipelines from different tenants to block each other, severely impacting concurrent processing performance. Solution: - Extended get_namespace_data() to recognize workspace-specific pipeline namespaces with pattern "{workspace}:pipeline" (following GraphDB pattern) - Added workspace parameter to initialize_pipeline_status() for per-tenant isolated pipeline namespaces - Updated all 7 call sites to use workspace-aware locks: * lightrag.py: process_document_queue(), aremove_document() * document_routes.py: background_delete_documents(), clear_documents(), cancel_pipeline(), get_pipeline_status(), delete_documents() Impact: - Different workspaces can process documents concurrently without blocking - Backward compatible: empty workspace defaults to "pipeline_status" - Maintains fail-fast: uninitialized pipeline raises clear error - Expected N× performance improvement for N concurrent tenants Bug fixes: - Fixed AttributeError by using self.workspace instead of self.global_config - Fixed pipeline status endpoint to show workspace-specific status - Fixed delete endpoint to check workspace-specific busy flag Code changes: 4 files, 141 insertions(+), 28 deletions(-) Testing: All syntax checks passed, comprehensive workspace isolation tests completed	2025-11-17 12:53:44 +08:00
yangdx	e8f5f57ec7	Update qdrant-client minimum version from 1.7.0 to 1.11.0 • Bump qdrant-client to >=1.11.0 • Update pyproject.toml dependency • Update requirements files • Sync uv.lock with new version • Maintain <2.0.0 upper bound	2025-11-10 11:54:48 +08:00
yangdx	913fa1e415	Add concurrency warning for JsonKVStorage in cleanup tool	2025-11-09 23:04:04 +08:00
yangdx	1f9d0735c3	Bump API version to 0253	2025-11-09 14:42:22 +08:00
Daniel.y	3110ca518b	Merge pull request #2335 from danielaskdd/llm-cache-cleanup Feat: Add LLM Query Cache Cleanup Tool	2025-11-09 14:27:58 +08:00
yangdx	37b7118901	Fix table alignment and add validation for empty cleanup selections	2025-11-09 14:17:56 +08:00
yangdx	1485cb82e9	Add LLM query cache cleanup tool for KV storage backends - Interactive cleanup workflow - Supports all KV storage types - Batch deletion with progress - Comprehensive error reporting - Preserves workspace isolation	2025-11-09 13:37:33 +08:00
Daniel.y	8859eaade7	Merge pull request #2334 from danielaskdd/hotfix-opena-streaming HotFix: Restore OpenAI Streaming Response & Refactor keyword_extraction Parameter	2025-11-09 12:25:20 +08:00
yangdx	2f16065256	Refactor keyword_extraction from kwargs to explicit parameter • Add keyword_extraction param to functions • Remove kwargs.pop() calls • Update function signatures • Improve parameter documentation • Make parameter handling consistent	2025-11-09 12:02:17 +08:00
yangdx	88ab73f6ae	HotFix: Restore streaming response in OpenAI LLM The stream and timeout parameters were moved from **kwargs to explicit parameters in a previous commit, but were not being passed to the OpenAI API, causing streaming responses to fail and fall back to non-streaming mode. Fixes the issue where stream=True was being silently ignored, resulting in unexpected non-streaming behavior.	2025-11-09 11:52:26 +08:00
yangdx	c12bc372dc	Update README	2025-11-09 04:35:41 +08:00
yangdx	7bc6ccea19	Add uv package manager support to installation docs	2025-11-09 04:31:07 +08:00
yangdx	80f2e691fc	Remove redundant i18n import that triggered the Vite “dynamic + static import” warning	2025-11-09 02:48:11 +08:00
yangdx	1334b3d896	Update uv.lock	2025-11-09 02:32:30 +08:00
yangdx	754d2ad297	Add documentation for LLM cache migration between storage types	2025-11-09 00:41:07 +08:00
Daniel.y	8adf3180d6	Merge pull request #2330 from danielaskdd/llm-cache-migrate Feat: Add LLM Cache Migration Tool	2025-11-09 00:12:32 +08:00
yangdx	a75efb06dc	Fix: prevent source data corruption by target upsert function • Prevent mutation bugs by using copy() when storing cache values • Protect filtered cache data and ensure batch data isolation	2025-11-09 00:02:19 +08:00
yangdx	987bc09cab	Update LLM cache migration docs and improve UX prompts	2025-11-08 23:48:19 +08:00
yangdx	1a91bcdb5f	Improve storage config validation and add config.ini fallback support • Add MongoDB env requirements • Support config.ini fallback • Warn on missing env vars • Check available storage count • Show config source info	2025-11-08 22:48:49 +08:00
yangdx	57ee7d5ac8	Merge branch 'main' into llm-cache-migrate	2025-11-08 22:15:46 +08:00
Daniel.y	85bb98b307	Merge pull request #2331 from danielaskdd/gemini-retry Fix Gemini driver retry mechanism	2025-11-08 22:14:56 +08:00
yangdx	3d9de5ed03	feat: improve Gemini client error handling and retry logic • Add google-api-core dependency • Add specific exception handling • Create InvalidResponseError class • Update retry decorators • Fix empty response handling	2025-11-08 22:10:09 +08:00
yangdx	1864b28242	Add colored output formatting to migration confirmation display	2025-11-08 21:16:41 +08:00
yangdx	e95b02fb55	Refactor storage selection UI with dynamic numbering and inline prompts • Remove standalone get_user_choice method • Add dynamic sequential numbering • Inline choice validation logic • Remove redundant storage type prints • Improve excluded storage handling	2025-11-08 20:42:27 +08:00
yangdx	b72632e4d4	Add async generator lock management rule to cline extension	2025-11-08 20:03:59 +08:00
yangdx	5be04263b2	Fix deadlock in JSON cache migration and prevent same storage selection - Snapshot JSON data before yielding batches - Release lock during batch processing - Exclude source type from target selection - Add detailed docstring for lock behavior - Filter available storage types properly	2025-11-08 19:58:36 +08:00
yangdx	6b9f13c792	Enhance LLM cache migration tool with streaming and improved UX - Add streaming migration for memory efficiency - Implement graceful exit with Enter/0 - Add progress indicators for counting - Optimize batch processing by storage type - Update docs with new progress displays	2025-11-08 19:38:00 +08:00
yangdx	d0d31e9262	Improve LLM cache migration tool configuration and messaging	2025-11-08 18:52:33 +08:00
yangdx	6fc54d3625	Move LLM cache migration tool to lightrag.tools module - Relocated tool to proper package structure - Updated import paths and documentation - Added shared storage initialization - Fixed module path resolution - Updated usage instructions	2025-11-08 18:33:13 +08:00
yangdx	0f2c0de8df	Fix linting	2025-11-08 18:16:03 +08:00
yangdx	55274dde59	Add LLM cache migration tool for KV storage backends - Supports JSON/Redis/PostgreSQL/MongoDB - Batch migration with error tracking - Workspace-aware data transfer - Memory-efficient pagination - Comprehensive migration reporting	2025-11-08 17:57:22 +08:00
yangdx	cf732dbfc6	Bump core version to 1.4.9.9 and API to 0252	2025-11-08 11:27:26 +08:00
Daniel.y	29a349f25b	Merge pull request #2329 from danielaskdd/gemini-embedding Feat: Add Gemini Embedding Support to LightRAG	2025-11-08 04:10:52 +08:00
yangdx	a624a9508a	Add Gemini to APIs requiring embedding dimension parameter	2025-11-08 03:54:50 +08:00
yangdx	de4ed73652	Add Gemini embedding support - Implement gemini_embed function - Add gemini to embedding binding choices - Add L2 normalization for dims < 3072	2025-11-08 03:34:30 +08:00

5622 commits