yangdx
39b49e92ff
Convert embedding_token_limit from property to field with __post_init__
...
• Remove property decorator
• Add field with init=False
• Set value in __post_init__ method
• embedding_token_limit is now in config dictionary
2025-11-14 20:58:41 +08:00
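A minimal sketch of the property-to-field conversion described above, assuming a dataclass-based config. `RagConfig` and this `EmbeddingFunc` are hypothetical stand-ins, not the project's classes; only the `embedding_token_limit` name comes from the commit.

```python
from dataclasses import asdict, dataclass, field
from typing import Optional


@dataclass
class EmbeddingFunc:
    # Hypothetical stand-in for the real embedding wrapper.
    embedding_dim: int
    max_token_size: Optional[int] = None


@dataclass
class RagConfig:
    # Hypothetical container; only embedding_token_limit mirrors the commit.
    embedding_func: Optional[EmbeddingFunc] = None
    # init=False: not a constructor argument, but still a real field,
    # so dataclasses.asdict() includes it (a property would be skipped).
    embedding_token_limit: Optional[int] = field(default=None, init=False)

    def __post_init__(self) -> None:
        # Derive the value once, after the generated __init__ has run.
        if self.embedding_func is not None:
            self.embedding_token_limit = self.embedding_func.max_token_size


config = asdict(RagConfig(embedding_func=EmbeddingFunc(1024, max_token_size=8192)))
```

Because the value is a field rather than a property, it shows up in the dictionary built from the dataclass, which matches the last bullet of the commit.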
yangdx
ab4d7ac2b0
Add configurable embedding token limit with validation
...
- Add EMBEDDING_TOKEN_LIMIT env var
- Set max_token_size on embedding func
- Add token limit property to LightRAG
- Validate summary length vs limit
- Log warning when limit exceeded
2025-11-14 19:28:36 +08:00
yangdx
680e36c6eb
Improve Bedrock error handling with retry logic and custom exceptions
...
• Add specific exception types
• Implement proper retry mechanism
• Better error classification
• Enhanced logging and validation
• Enable embedding retry decorator
2025-11-14 18:51:41 +08:00
yangdx
05852e1ab2
Add max_token_size parameter to embedding function decorators
...
- Add max_token_size=8192 to all embed funcs
- Move siliconcloud to deprecated folder
- Import wrap_embedding_func_with_attrs
- Update EmbeddingFunc docstring
- Fix langfuse import type annotation
2025-11-14 18:41:43 +08:00
yangdx
4401f86f07
Refactor exception handling in MemgraphStorage label methods
2025-11-14 11:01:26 +08:00
yangdx
1ccef2b932
Fix null reference errors in graph database error handling
...
- Initialize result vars to None
- Add null checks before consume calls
- Prevent crashes in except blocks
- Apply fix to both Neo4J and Memgraph
2025-11-14 10:39:04 +08:00
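The pattern named in the bullets above can be sketched as follows. `FakeResult` and `fetch_labels` are hypothetical; the real storage classes and driver calls are not shown in this log.

```python
import asyncio


class FakeResult:
    """Hypothetical stand-in for a driver result that must be consumed."""

    def __init__(self) -> None:
        self.consumed = False

    async def consume(self) -> None:
        self.consumed = True


async def fetch_labels(fail_early: bool, results: list) -> str:
    result = None  # defined before try, so the except block can always read it
    try:
        if fail_early:
            raise RuntimeError("query failed before a result existed")
        result = FakeResult()
        results.append(result)
        raise RuntimeError("failed while reading records")
    except RuntimeError as exc:
        # Only consume when the query got far enough to produce a result.
        if result is not None:
            await result.consume()
        return f"handled: {exc}"
```

Without the `result = None` line, the early-failure path would raise `UnboundLocalError` inside the except block and mask the original error.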
yangdx
c164c8f631
Merge branch 'main' of github.com:HKUDS/LightRAG
2025-11-13 20:42:47 +08:00
yangdx
1889301597
Merge branch 'feat/add_cloud_ollama_support'
2025-11-13 20:41:58 +08:00
yangdx
77ad906d3a
Improve error handling and logging in cloud model detection
2025-11-13 20:41:44 +08:00
yangdx
cc031a3db9
Add macOS compatibility check for DOCLING with multi-worker Gunicorn
2025-11-13 19:18:04 +08:00
LacombeLouis
844537e378
Add a better regex
2025-11-13 12:17:51 +01:00
yangdx
a24d8181c2
Improve docling integration with macOS compatibility and CLI flag
...
- Add --docling CLI flag for easier setup
- Add numpy version constraints
- Exclude docling on macOS (fork-safety)
2025-11-13 18:58:09 +08:00
yangdx
746c069ab0
Implement lazy configuration initialization for API server
...
• Add lazy config initialization
• Maintain backward compatibility
• Support programmatic usage
• Add gunicorn dependency
• Explicit config in entry points
2025-11-13 15:28:05 +08:00
yangdx
4b31942e2a
refactor: move document deps to api group, remove dynamic imports
...
- Merge offline-docs into api extras
- Remove pipmaster dynamic installs
- Add async document processing
- Pre-check docling availability
- Update offline deployment docs
2025-11-13 13:34:09 +08:00
yangdx
c230d1a28d
Replace asyncio.iscoroutine with inspect.isawaitable for better detection
2025-11-13 12:56:01 +08:00
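A short illustration of why the broader check detects more cases. `DeferredChunks` is a hypothetical awaitable object, used only to show a value that can be awaited without being a coroutine.

```python
import asyncio
import inspect


class DeferredChunks:
    """Hypothetical awaitable that is not a coroutine object."""

    def __await__(self):
        async def _load():
            return ["chunk"]

        # Delegate to a coroutine's own awaitable iterator.
        return _load().__await__()


async def make_chunks():
    return ["chunk"]


deferred = DeferredChunks()
coro = make_chunks()

# (asyncio.iscoroutine, inspect.isawaitable) for each kind of value.
checks = {
    "coroutine": (asyncio.iscoroutine(coro), inspect.isawaitable(coro)),
    "custom awaitable": (
        asyncio.iscoroutine(deferred),
        inspect.isawaitable(deferred),
    ),
}
coro.close()  # avoid a "never awaited" warning for the unused coroutine
```

Futures, tasks, and objects that define `__await__` all pass `inspect.isawaitable`, so a caller that awaits on that basis handles them as well as plain coroutines.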
yangdx
297e460740
Merge branch 'main' into tongda/main
2025-11-13 12:37:37 +08:00
yangdx
940bec0b31
Support async chunking functions in LightRAG processing pipeline
...
- Add Awaitable and Union type imports
- Update chunking_func type annotation
- Handle coroutine results with await
- Add return type validation
- Update docstring for async support
2025-11-13 12:37:15 +08:00
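A minimal sketch of accepting either a sync or an async chunking function, as the bullets above describe. `run_chunking`, the type aliases, and the sample chunkers are hypothetical names; the real pipeline's signature is not shown in this log.

```python
import asyncio
import inspect
from typing import Any, Awaitable, Callable, Dict, List, Union

Chunks = List[Dict[str, Any]]
ChunkingFunc = Callable[[str], Union[Chunks, Awaitable[Chunks]]]


async def run_chunking(chunking_func: ChunkingFunc, text: str) -> Chunks:
    result = chunking_func(text)
    # An async function returns an awaitable here; resolve it first.
    if inspect.isawaitable(result):
        result = await result
    # Validate the resolved value rather than trusting the callable.
    if not isinstance(result, (list, tuple)):
        raise TypeError(
            f"chunking_func must return a list or tuple, got {type(result).__name__}"
        )
    return list(result)


def sync_chunker(text: str) -> Chunks:
    return [{"content": part} for part in text.split()]


async def async_chunker(text: str) -> Chunks:
    await asyncio.sleep(0)  # placeholder for heavy or I/O-bound work
    return sync_chunker(text)
```

Both `run_chunking(sync_chunker, "a b")` and `run_chunking(async_chunker, "a b")` are awaited the same way by the caller.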
Louis Lacombe
f7432a260e
Add support for environment variable fallback for API key and default host for cloud models
2025-11-12 16:11:05 +00:00
yangdx
70cc2419f2
Fix empty dict handling after JSON sanitization
...
• Replace truthy checks with `is not None`
• Handle empty dict edge case properly
• Prevent data reload failures
• Add comprehensive test coverage
• Fix JsonKVStorage and DocStatusStorage
2025-11-12 16:40:57 +08:00
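The edge case behind this fix, shown in isolation. Both function names are hypothetical; the point is only that an empty dict is falsy but is not `None`.

```python
from typing import Optional


def should_reload_truthy(cleaned: Optional[dict]) -> bool:
    # Buggy form: an empty dict is falsy, so valid-but-empty data is skipped.
    return bool(cleaned)


def should_reload_explicit(cleaned: Optional[dict]) -> bool:
    # Fixed form: only a missing value (None) means "nothing to reload".
    return cleaned is not None


empty_after_sanitization: dict = {}
```

With the truthy check, data that sanitizes down to `{}` is treated the same as "no data", which is the reload failure the commit describes.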
yangdx
dcf1d28681
Fix migration to reload sanitized data and prevent memory corruption
...
• Reload cleaned data after sanitization
• Update shared memory with clean data
• Add specific surrogate char tests
• Test migration sanitization flow
• Prevent dirty data in memory
2025-11-12 16:16:28 +08:00
yangdx
6de4123f74
Optimize JSON string sanitization with precompiled regex and zero-copy
...
- Precompile regex pattern at module level
- Zero-copy path for clean strings
- Use C-level regex for performance
- Remove deprecated _sanitize_json_data
- Fast detection for common case
2025-11-12 15:42:07 +08:00
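A sketch of the precompiled-pattern approach with a no-copy path for clean strings. The function name and the exact character class are assumptions; the log only says that surrogates and Unicode non-characters are filtered.

```python
import re

# Compiled once at import time. The class covers the surrogate range and the
# U+FFFE/U+FFFF non-characters; the set used by the project may differ.
_UNSAFE_CHARS = re.compile(r"[\ud800-\udfff\ufffe\uffff]")


def sanitize_json_string(text: str) -> str:
    # Common case: nothing to remove, so return the same object unchanged.
    if _UNSAFE_CHARS.search(text) is None:
        return text
    # Rare case: build a new string without the offending characters.
    return _UNSAFE_CHARS.sub("", text)
```

Scanning with `search` first means clean strings are never rebuilt, and the scan itself runs inside the regex engine rather than in a Python-level loop.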
yangdx
777c987371
Optimize JSON write with fast/slow path to reduce memory usage
...
- Fast path for clean data (no sanitization)
- Slow path sanitizes during encoding
- Reload shared memory after sanitization
- Custom encoder avoids deep copies
- Comprehensive test coverage
2025-11-12 13:48:56 +08:00
yangdx
8c07c91833
Remove deprecated response_type parameter from query settings
...
- Bump API version to 0254
- Remove response format UI controls
- Hard-code response_type in query params
- Add migration for version 19
- Clean up settings store structure
2025-11-12 12:19:30 +08:00
yangdx
f28a0c25b1
Improve JSON data sanitization to handle tuples and dict keys
...
- Sanitize dictionary keys
- Preserve tuple types
- Handle nested structures better
2025-11-12 00:50:18 +08:00
yangdx
6918a88f92
Add specialized JSON string sanitizer to prevent UTF-8 encoding errors
...
• Remove surrogate characters (U+D800-DFFF)
• Filter Unicode non-characters
• Direct char-by-char filtering
2025-11-12 00:38:47 +08:00
yangdx
d1f4b6e515
Add data sanitization to JSON writing to prevent UTF-8 encoding errors
...
• Add _sanitize_json_data helper function
• Recursively clean strings in data
• Sanitize before JSON serialization
• Prevent encoding-related crashes
• Use existing sanitize_text_for_encoding
2025-11-12 00:11:13 +08:00
yangdx
fdcb4d0b6d
Replace PyPDF2 with pypdf for PDF processing
...
- Update import from PyPDF2 to pypdf
- Change dependency to pypdf>=6.1.0
- Update all requirements files
- Remove PyPDF2 from lock file
- Use modern pypdf library
2025-11-11 01:38:09 +08:00
Tong Da
245df75d9c
Simpler version: detect whether the chunking_func result is a coroutine
2025-11-10 20:49:50 +08:00
yangdx
913fa1e415
Add concurrency warning for JsonKVStorage in cleanup tool
2025-11-09 23:04:04 +08:00
Tong Da
d137ba5843
Support async chunking functions to improve processing performance when the user passes in a heavy chunking_func
2025-11-09 14:52:42 +08:00
yangdx
1f9d0735c3
Bump API version to 0253
2025-11-09 14:42:22 +08:00
yangdx
37b7118901
Fix table alignment and add validation for empty cleanup selections
2025-11-09 14:17:56 +08:00
yangdx
1485cb82e9
Add LLM query cache cleanup tool for KV storage backends
...
- Interactive cleanup workflow
- Supports all KV storage types
- Batch deletion with progress
- Comprehensive error reporting
- Preserves workspace isolation
2025-11-09 13:37:33 +08:00
yangdx
2f16065256
Refactor keyword_extraction from kwargs to explicit parameter
...
• Add keyword_extraction param to functions
• Remove kwargs.pop() calls
• Update function signatures
• Improve parameter documentation
• Make parameter handling consistent
2025-11-09 12:02:17 +08:00
yangdx
88ab73f6ae
HotFix: Restore streaming response in OpenAI LLM
...
The stream and timeout parameters were moved from **kwargs to explicit
parameters in a previous commit, but were not being passed to the OpenAI
API, causing streaming responses to fail and fall back to non-streaming
mode. This fixes the issue where stream=True was silently ignored, resulting
in unexpected non-streaming behavior.
2025-11-09 11:52:26 +08:00
yangdx
7bc6ccea19
Add uv package manager support to installation docs
2025-11-09 04:31:07 +08:00
yangdx
754d2ad297
Add documentation for LLM cache migration between storage types
2025-11-09 00:41:07 +08:00
yangdx
a75efb06dc
Fix: prevent source data corruption by target upsert function
...
• Prevent mutation bugs by using copy() when storing cache values
• Protect filtered cache data and ensure batch data isolation
2025-11-09 00:02:19 +08:00
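A small illustration of the mutation problem and the `copy()` fix. `InMemoryTarget` and its `migrated` marker are hypothetical; they stand in for any upsert that modifies the dicts it receives.

```python
class InMemoryTarget:
    """Hypothetical target whose upsert mutates the values passed to it."""

    def __init__(self) -> None:
        self.data: dict = {}

    def upsert(self, batch: dict) -> None:
        for key, value in batch.items():
            value["migrated"] = True  # changes whatever dict it was handed
            self.data[key] = value


source = {"k1": {"return": "cached answer"}}
target = InMemoryTarget()

# Hand the target a shallow copy of each entry, so its edits stay local.
batch = {key: value.copy() for key, value in source.items()}
target.upsert(batch)
```

A shallow copy is enough here because only top-level keys are changed; nested values would still be shared and would need `copy.deepcopy` if the target modified them.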
yangdx
987bc09cab
Update LLM cache migration docs and improve UX prompts
2025-11-08 23:48:19 +08:00
yangdx
1a91bcdb5f
Improve storage config validation and add config.ini fallback support
...
• Add MongoDB env requirements
• Support config.ini fallback
• Warn on missing env vars
• Check available storage count
• Show config source info
2025-11-08 22:48:49 +08:00
yangdx
57ee7d5ac8
Merge branch 'main' into llm-cache-migrate
2025-11-08 22:15:46 +08:00
yangdx
3d9de5ed03
feat: improve Gemini client error handling and retry logic
...
• Add google-api-core dependency
• Add specific exception handling
• Create InvalidResponseError class
• Update retry decorators
• Fix empty response handling
2025-11-08 22:10:09 +08:00
yangdx
1864b28242
Add colored output formatting to migration confirmation display
2025-11-08 21:16:41 +08:00
yangdx
e95b02fb55
Refactor storage selection UI with dynamic numbering and inline prompts
...
• Remove standalone get_user_choice method
• Add dynamic sequential numbering
• Inline choice validation logic
• Remove redundant storage type prints
• Improve excluded storage handling
2025-11-08 20:42:27 +08:00
yangdx
5be04263b2
Fix deadlock in JSON cache migration and prevent same storage selection
...
- Snapshot JSON data before yielding batches
- Release lock during batch processing
- Exclude source type from target selection
- Add detailed docstring for lock behavior
- Filter available storage types properly
2025-11-08 19:58:36 +08:00
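The snapshot-then-release idea from the first two bullets can be sketched as below, assuming an `asyncio.Lock` guards the JSON data. `stream_batches` and `migrate` are hypothetical names, and the real tool's lock type is not shown in this log.

```python
import asyncio
from typing import AsyncIterator


async def stream_batches(
    data: dict, lock: asyncio.Lock, batch_size: int
) -> AsyncIterator[dict]:
    # Hold the lock only long enough to take a snapshot of the items.
    async with lock:
        snapshot = list(data.items())
    # The lock is released here, so consumers may acquire it between batches.
    for start in range(0, len(snapshot), batch_size):
        yield dict(snapshot[start : start + batch_size])


async def migrate() -> dict:
    lock = asyncio.Lock()
    source = {f"k{i}": i for i in range(5)}
    written: dict = {}
    async for batch in stream_batches(source, lock, batch_size=2):
        # asyncio.Lock is not reentrant: this would hang forever if the
        # generator were still holding the lock while yielding.
        async with lock:
            written.update(batch)
    return written
```

The trade-off is that the snapshot holds references to all items at once, in exchange for never yielding while the lock is held.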
yangdx
6b9f13c792
Enhance LLM cache migration tool with streaming and improved UX
...
- Add streaming migration for memory efficiency
- Implement graceful exit with Enter/0
- Add progress indicators for counting
- Optimize batch processing by storage type
- Update docs with new progress displays
2025-11-08 19:38:00 +08:00
yangdx
d0d31e9262
Improve LLM cache migration tool configuration and messaging
2025-11-08 18:52:33 +08:00
yangdx
6fc54d3625
Move LLM cache migration tool to lightrag.tools module
...
- Relocated tool to proper package structure
- Updated import paths and documentation
- Added shared storage initialization
- Fixed module path resolution
- Updated usage instructions
2025-11-08 18:33:13 +08:00
yangdx
cf732dbfc6
Bump core version to 1.4.9.9 and API to 0252
2025-11-08 11:27:26 +08:00
yangdx
a624a9508a
Add Gemini to APIs requiring embedding dimension parameter
2025-11-08 03:54:50 +08:00