Commit graph

5622 commits

Author SHA1 Message Date
yangdx
72f68c2a61 Update env.example 2025-11-17 12:54:32 +08:00
yangdx
a08bc72635 Fix empty dict handling after JSON sanitization
• Replace truthy checks with `is not None`
• Handle empty dict edge case properly
• Prevent data reload failures
• Add comprehensive test coverage
• Fix JsonKVStorage and DocStatusStorage
2025-11-17 12:54:32 +08:00
yangdx
cca0800ed4 Fix migration to reload sanitized data and prevent memory corruption
• Reload cleaned data after sanitization
• Update shared memory with clean data
• Add specific surrogate char tests
• Test migration sanitization flow
• Prevent dirty data in memory
2025-11-17 12:54:32 +08:00
yangdx
7f54f47093 Optimize JSON string sanitization with precompiled regex and zero-copy
- Precompile regex pattern at module level
- Zero-copy path for clean strings
- Use C-level regex for performance
- Remove deprecated _sanitize_json_data
- Fast detection for common case
2025-11-17 12:54:32 +08:00
yangdx
f289cf6225 Optimize JSON write with fast/slow path to reduce memory usage
- Fast path for clean data (no sanitization)
- Slow path sanitizes during encoding
- Reload shared memory after sanitization
- Custom encoder avoids deep copies
- Comprehensive test coverage
2025-11-17 12:54:32 +08:00
yangdx
93a3e47134 Remove deprecated response_type parameter from query settings
- Bump API version to 0254
- Remove response format UI controls
- Hard-code response_type in query params
- Add migration for version 19
- Clean up settings store structure
2025-11-17 12:54:32 +08:00
yangdx
abeaac84fa Improve JSON data sanitization to handle tuples and dict keys
- Sanitize dictionary keys
- Preserve tuple types
- Handle nested structures better
2025-11-17 12:54:32 +08:00
yangdx
5885637ebf Add specialized JSON string sanitizer to prevent UTF-8 encoding errors
• Remove surrogate characters (U+D800-DFFF)
• Filter Unicode non-characters
• Direct char-by-char filtering
2025-11-17 12:54:32 +08:00
yangdx
23cbb9c9b2 Add data sanitization to JSON writing to prevent UTF-8 encoding errors
• Add _sanitize_json_data helper function
• Recursively clean strings in data
• Sanitize before JSON serialization
• Prevent encoding-related crashes
• Use existing sanitize_text_for_encoding
2025-11-17 12:54:32 +08:00
yangdx
ff8f158891 Update env.example 2025-11-17 12:54:32 +08:00
yangdx
c434879c7a Replace PyPDF2 with pypdf for PDF processing
- Update import from PyPDF2 to pypdf
- Change dependency to pypdf>=6.1.0
- Update all requirements files
- Remove PyPDF2 from lock file
- Use modern pypdf library
2025-11-17 12:54:32 +08:00
yangdx
af5423919b Support async chunking functions in LightRAG processing pipeline
- Add Awaitable and Union type imports
- Update chunking_func type annotation
- Handle coroutine results with await
- Add return type validation
- Update docstring for async support
2025-11-17 12:54:32 +08:00
Tong Da
5016025453 easier version: detect chunking_func result is coroutine or not 2025-11-17 12:54:32 +08:00
Tong Da
7740500693 support async chunking func to improve processing performance when a heavy chunking_func is passed in by user 2025-11-17 12:54:32 +08:00
BukeLy
18a4870229 fix: Add default workspace support for backward compatibility
Fixes two compatibility issues in workspace isolation:

1. Problem: lightrag_server.py calls initialize_pipeline_status()
   without workspace parameter, causing pipeline to initialize in
   global namespace instead of rag's workspace.

   Solution: Add set_default_workspace() mechanism in shared_storage.
   LightRAG.initialize_storages() now sets default workspace, which
   initialize_pipeline_status() uses when called without parameters.

2. Problem: /health endpoint hardcoded to use "pipeline_status",
   cannot return workspace-specific status or support frontend
   workspace selection.

   Solution: Add LIGHTRAG-WORKSPACE header support. Endpoint now
   extracts workspace from header or falls back to server default,
   returning correct workspace-specific pipeline status.

Changes:
- lightrag/kg/shared_storage.py: Add set/get_default_workspace()
- lightrag/lightrag.py: Call set_default_workspace() in initialize_storages()
- lightrag/api/lightrag_server.py: Add get_workspace_from_request() helper,
  update /health endpoint to support LIGHTRAG-WORKSPACE header

Testing:
- Backward compatibility: Old code works without modification
- Multi-instance safety: Explicit workspace passing preserved
- /health endpoint: Supports both default and header-specified workspaces

Related: #2353
2025-11-17 12:54:20 +08:00
BukeLy
eb52ec94d7 feat: Add workspace isolation support for pipeline status
Problem:
In multi-tenant scenarios, different workspaces share a single global
pipeline_status namespace, causing pipelines from different tenants to
block each other, severely impacting concurrent processing performance.

Solution:
- Extended get_namespace_data() to recognize workspace-specific pipeline
  namespaces with pattern "{workspace}:pipeline" (following GraphDB pattern)
- Added workspace parameter to initialize_pipeline_status() for per-tenant
  isolated pipeline namespaces
- Updated all 7 call sites to use workspace-aware locks:
  * lightrag.py: process_document_queue(), aremove_document()
  * document_routes.py: background_delete_documents(), clear_documents(),
    cancel_pipeline(), get_pipeline_status(), delete_documents()

Impact:
- Different workspaces can process documents concurrently without blocking
- Backward compatible: empty workspace defaults to "pipeline_status"
- Maintains fail-fast: uninitialized pipeline raises clear error
- Expected N× performance improvement for N concurrent tenants

Bug fixes:
- Fixed AttributeError by using self.workspace instead of self.global_config
- Fixed pipeline status endpoint to show workspace-specific status
- Fixed delete endpoint to check workspace-specific busy flag

Code changes: 4 files, 141 insertions(+), 28 deletions(-)

Testing: All syntax checks passed, comprehensive workspace isolation tests completed
2025-11-17 12:53:44 +08:00
yangdx
e8f5f57ec7 Update qdrant-client minimum version from 1.7.0 to 1.11.0
• Bump qdrant-client to >=1.11.0
• Update pyproject.toml dependency
• Update requirements files
• Sync uv.lock with new version
• Maintain <2.0.0 upper bound
2025-11-10 11:54:48 +08:00
yangdx
913fa1e415 Add concurrency warning for JsonKVStorage in cleanup tool 2025-11-09 23:04:04 +08:00
yangdx
1f9d0735c3 Bump API version to 0253 2025-11-09 14:42:22 +08:00
Daniel.y
3110ca518b
Merge pull request #2335 from danielaskdd/llm-cache-cleanup
Feat: Add LLM Query Cache Cleanup Tool
2025-11-09 14:27:58 +08:00
yangdx
37b7118901 Fix table alignment and add validation for empty cleanup selections 2025-11-09 14:17:56 +08:00
yangdx
1485cb82e9 Add LLM query cache cleanup tool for KV storage backends
- Interactive cleanup workflow
- Supports all KV storage types
- Batch deletion with progress
- Comprehensive error reporting
- Preserves workspace isolation
2025-11-09 13:37:33 +08:00
Daniel.y
8859eaade7
Merge pull request #2334 from danielaskdd/hotfix-opena-streaming
HotFix: Restore OpenAI Streaming Response & Refactor keyword_extraction Parameter
2025-11-09 12:25:20 +08:00
yangdx
2f16065256 Refactor keyword_extraction from kwargs to explicit parameter
• Add keyword_extraction param to functions
• Remove kwargs.pop() calls
• Update function signatures
• Improve parameter documentation
• Make parameter handling consistent
2025-11-09 12:02:17 +08:00
yangdx
88ab73f6ae HotFix: Restore streaming response in OpenAI LLM
The stream and timeout parameters were moved from **kwargs to explicit
parameters in a previous commit, but were not being passed to the OpenAI
API, causing streaming responses to fail and fall back to non-streaming
mode.Fixes the issue where stream=True was being silently ignored, resulting
in unexpected non-streaming behavior.
2025-11-09 11:52:26 +08:00
yangdx
c12bc372dc Update README 2025-11-09 04:35:41 +08:00
yangdx
7bc6ccea19 Add uv package manager support to installation docs 2025-11-09 04:31:07 +08:00
yangdx
80f2e691fc Remove redundant i18n import triggered the Vite “dynamic + static import” warning 2025-11-09 02:48:11 +08:00
yangdx
1334b3d896 Update uv.lock 2025-11-09 02:32:30 +08:00
yangdx
754d2ad297 Add documentation for LLM cache migration between storage types 2025-11-09 00:41:07 +08:00
Daniel.y
8adf3180d6
Merge pull request #2330 from danielaskdd/llm-cache-migrate
Feat: Add LLM Cache Migration Tool
2025-11-09 00:12:32 +08:00
yangdx
a75efb06dc Fix: prevent source data corruption by target upsert function
• Prevent mutations bugs by using copy() when storing cache values
• Protect filtered cache data and ensure batch data isolation
2025-11-09 00:02:19 +08:00
yangdx
987bc09cab Update LLM cache migration docs and improve UX prompts 2025-11-08 23:48:19 +08:00
yangdx
1a91bcdb5f Improve storage config validation and add config.ini fallback support
• Add MongoDB env requirements
• Support config.ini fallback
• Warn on missing env vars
• Check available storage count
• Show config source info
2025-11-08 22:48:49 +08:00
yangdx
57ee7d5ac8 Merge branch 'main' into llm-cache-migrate 2025-11-08 22:15:46 +08:00
Daniel.y
85bb98b307
Merge pull request #2331 from danielaskdd/gemini-retry
Fix Gemini driver retry mechanism
2025-11-08 22:14:56 +08:00
yangdx
3d9de5ed03 feat: improve Gemini client error handling and retry logic
• Add google-api-core dependency
• Add specific exception handling
• Create InvalidResponseError class
• Update retry decorators
• Fix empty response handling
2025-11-08 22:10:09 +08:00
yangdx
1864b28242 Add colored output formatting to migration confirmation display 2025-11-08 21:16:41 +08:00
yangdx
e95b02fb55 Refactor storage selection UI with dynamic numbering and inline prompts
• Remove standalone get_user_choice method
• Add dynamic sequential numbering
• Inline choice validation logic
• Remove redundant storage type prints
• Improve excluded storage handling
2025-11-08 20:42:27 +08:00
yangdx
b72632e4d4 Add async generator lock management rule to cline extension 2025-11-08 20:03:59 +08:00
yangdx
5be04263b2 Fix deadlock in JSON cache migration and prevent same storage selection
- Snapshot JSON data before yielding batches
- Release lock during batch processing
- Exclude source type from target selection
- Add detailed docstring for lock behavior
- Filter available storage types properly
2025-11-08 19:58:36 +08:00
yangdx
6b9f13c792 Enhance LLM cache migration tool with streaming and improved UX
- Add streaming migration for memory efficiency
- Implement graceful exit with Enter/0
- Add progress indicators for counting
- Optimize batch processing by storage type
- Update docs with new progress displays
2025-11-08 19:38:00 +08:00
yangdx
d0d31e9262 Improve LLM cache migration tool configuration and messaging 2025-11-08 18:52:33 +08:00
yangdx
6fc54d3625 Move LLM cache migration tool to lightrag.tools module
- Relocated tool to proper package structure
- Updated import paths and documentation
- Added shared storage initialization
- Fixed module path resolution
- Updated usage instructions
2025-11-08 18:33:13 +08:00
yangdx
0f2c0de8df Fix linting 2025-11-08 18:16:03 +08:00
yangdx
55274dde59 Add LLM cache migration tool for KV storage backends
- Supports JSON/Redis/PostgreSQL/MongoDB
- Batch migration with error tracking
- Workspace-aware data transfer
- Memory-efficient pagination
- Comprehensive migration reporting
2025-11-08 17:57:22 +08:00
yangdx
cf732dbfc6 Bump core version to 1.4.9.9 and API to 0252 2025-11-08 11:27:26 +08:00
Daniel.y
29a349f25b
Merge pull request #2329 from danielaskdd/gemini-embedding
Feat: Add Gemini Embedding Support to LightRAG
2025-11-08 04:10:52 +08:00
yangdx
a624a9508a Add Gemini to APIs requiring embedding dimension parameter 2025-11-08 03:54:50 +08:00
yangdx
de4ed73652 Add Gemini embedding support
- Implement gemini_embed function
- Add gemini to embedding binding choices
- Add L2 normalization for dims < 3072
2025-11-08 03:34:30 +08:00