LightRAG

Author	SHA1	Message	Date
yangdx	9262f66d13	Bump API version to 0255	2025-11-17 17:07:18 +08:00
yangdx	90f52acf0c	Fix linting	2025-11-17 12:28:53 +08:00
yangdx	c13f9116d9	Add embedding dimension validation to EmbeddingFunc wrapper • Validate total elements divisibility • Check vector count matches input count • Raise clear error messages on mismatch • Ensure embedding output correctness • Add docstring for EmbeddingFunc class	2025-11-17 12:26:54 +08:00
yangdx	b5589ce4d5	Merge branch 'main' into embedding-limit	2025-11-15 01:10:34 +08:00
yangdx	4343db753a	Add macOS fork safety check for Gunicorn multi-worker mode • Check OBJC_DISABLE_INITIALIZE_FORK_SAFETY • Prevent NumPy/Accelerate crashes • Show detailed error message • Provide multiple fix options • Exit early if misconfigured	2025-11-15 00:58:23 +08:00
yangdx	5dec4deac7	Improve embedding config priority and add debug logging • Fix embedding_dim priority logic • Add final config logging	2025-11-14 23:22:44 +08:00
yangdx	de4412dd40	Fix embedding token limit initialization order * Capture max_token_size before decorator * Apply wrapper after capturing attribute * Prevent decorator from stripping dataclass * Ensure token limit is properly set	2025-11-14 22:56:03 +08:00
yangdx	963a0a5db1	Refactor embedding function creation with proper attribute inheritance - Extract max_token_size from providers - Avoid double-wrapping EmbeddingFunc - Improve configuration priority logic - Add comprehensive debug logging - Return complete EmbeddingFunc instance	2025-11-14 22:29:08 +08:00
yangdx	39b49e92ff	Convert embedding_token_limit from property to field with __post_init__ • Remove property decorator • Add field with init=False • Set value in __post_init__ method • embedding_token_limit is now in config dictionary	2025-11-14 20:58:41 +08:00
yangdx	ab4d7ac2b0	Add configurable embedding token limit with validation - Add EMBEDDING_TOKEN_LIMIT env var - Set max_token_size on embedding func - Add token limit property to LightRAG - Validate summary length vs limit - Log warning when limit exceeded	2025-11-14 19:28:36 +08:00
yangdx	680e36c6eb	Improve Bedrock error handling with retry logic and custom exceptions • Add specific exception types • Implement proper retry mechanism • Better error classification • Enhanced logging and validation • Enable embedding retry decorator	2025-11-14 18:51:41 +08:00
yangdx	05852e1ab2	Add max_token_size parameter to embedding function decorators - Add max_token_size=8192 to all embed funcs - Move siliconcloud to deprecated folder - Import wrap_embedding_func_with_attrs - Update EmbeddingFunc docstring - Fix langfuse import type annotation	2025-11-14 18:41:43 +08:00
yangdx	4401f86f07	Refactor exception handling in MemgraphStorage label methods	2025-11-14 11:01:26 +08:00
yangdx	1ccef2b932	Fix null reference errors in graph database error handling - Initialize result vars to None - Add null checks before consume calls - Prevent crashes in except blocks - Apply fix to both Neo4J and Memgraph	2025-11-14 10:39:04 +08:00
yangdx	c164c8f631	Merge branch 'main' of github.com:HKUDS/LightRAG	2025-11-13 20:42:47 +08:00
yangdx	1889301597	Merge branch 'feat/add_cloud_ollama_support'	2025-11-13 20:41:58 +08:00
yangdx	77ad906d3a	Improve error handling and logging in cloud model detection	2025-11-13 20:41:44 +08:00
yangdx	cc031a3db9	Add macOS compatibility check for DOCLING with multi-worker Gunicorn	2025-11-13 19:18:04 +08:00
LacombeLouis	844537e378	Add a better regex	2025-11-13 12:17:51 +01:00
yangdx	a24d8181c2	Improve docling integration with macOS compatibility and CLI flag - Add --docling CLI flag for easier setup - Add numpy version constraints - Exclude docling on macOS (fork-safety)	2025-11-13 18:58:09 +08:00
yangdx	746c069ab0	Implement lazy configuration initialization for API server • Add lazy config initialization • Maintain backward compatibility • Support programmatic usage • Add gunicorn dependency • Explicit config in entry points	2025-11-13 15:28:05 +08:00
yangdx	4b31942e2a	refactor: move document deps to api group, remove dynamic imports - Merge offline-docs into api extras - Remove pipmaster dynamic installs - Add async document processing - Pre-check docling availability - Update offline deployment docs	2025-11-13 13:34:09 +08:00
yangdx	c230d1a28d	Replace asyncio.iscoroutine with inspect.isawaitable for better detection	2025-11-13 12:56:01 +08:00
yangdx	297e460740	Merge branch 'main' into tongda/main	2025-11-13 12:37:37 +08:00
yangdx	940bec0b31	Support async chunking functions in LightRAG processing pipeline - Add Awaitable and Union type imports - Update chunking_func type annotation - Handle coroutine results with await - Add return type validation - Update docstring for async support	2025-11-13 12:37:15 +08:00
Louis Lacombe	f7432a260e	Add support for environment variable fallback for API key and default host for cloud models	2025-11-12 16:11:05 +00:00
yangdx	70cc2419f2	Fix empty dict handling after JSON sanitization • Replace truthy checks with `is not None` • Handle empty dict edge case properly • Prevent data reload failures • Add comprehensive test coverage • Fix JsonKVStorage and DocStatusStorage	2025-11-12 16:40:57 +08:00
yangdx	dcf1d28681	Fix migration to reload sanitized data and prevent memory corruption • Reload cleaned data after sanitization • Update shared memory with clean data • Add specific surrogate char tests • Test migration sanitization flow • Prevent dirty data in memory	2025-11-12 16:16:28 +08:00
yangdx	6de4123f74	Optimize JSON string sanitization with precompiled regex and zero-copy - Precompile regex pattern at module level - Zero-copy path for clean strings - Use C-level regex for performance - Remove deprecated _sanitize_json_data - Fast detection for common case	2025-11-12 15:42:07 +08:00
yangdx	777c987371	Optimize JSON write with fast/slow path to reduce memory usage - Fast path for clean data (no sanitization) - Slow path sanitizes during encoding - Reload shared memory after sanitization - Custom encoder avoids deep copies - Comprehensive test coverage	2025-11-12 13:48:56 +08:00
yangdx	8c07c91833	Remove deprecated response_type parameter from query settings - Bump API version to 0254 - Remove response format UI controls - Hard-code response_type in query params - Add migration for version 19 - Clean up settings store structure	2025-11-12 12:19:30 +08:00
yangdx	f28a0c25b1	Improve JSON data sanitization to handle tuples and dict keys - Sanitize dictionary keys - Preserve tuple types - Handle nested structures better	2025-11-12 00:50:18 +08:00
yangdx	6918a88f92	Add specialized JSON string sanitizer to prevent UTF-8 encoding errors • Remove surrogate characters (U+D800-DFFF) • Filter Unicode non-characters • Direct char-by-char filtering	2025-11-12 00:38:47 +08:00
yangdx	d1f4b6e515	Add data sanitization to JSON writing to prevent UTF-8 encoding errors • Add _sanitize_json_data helper function • Recursively clean strings in data • Sanitize before JSON serialization • Prevent encoding-related crashes • Use existing sanitize_text_for_encoding	2025-11-12 00:11:13 +08:00
yangdx	fdcb4d0b6d	Replace PyPDF2 with pypdf for PDF processing - Update import from PyPDF2 to pypdf - Change dependency to pypdf>=6.1.0 - Update all requirements files - Remove PyPDF2 from lock file - Use modern pypdf library	2025-11-11 01:38:09 +08:00
Tong Da	245df75d9c	easier version: detect chunking_func result is coroutine or not	2025-11-10 20:49:50 +08:00
yangdx	913fa1e415	Add concurrency warning for JsonKVStorage in cleanup tool	2025-11-09 23:04:04 +08:00
Tong Da	d137ba5843	support async chunking func to improve processing performance when a heavy `chunking_func` is passed in by user	2025-11-09 14:52:42 +08:00
yangdx	1f9d0735c3	Bump API version to 0253	2025-11-09 14:42:22 +08:00
yangdx	37b7118901	Fix table alignment and add validation for empty cleanup selections	2025-11-09 14:17:56 +08:00
yangdx	1485cb82e9	Add LLM query cache cleanup tool for KV storage backends - Interactive cleanup workflow - Supports all KV storage types - Batch deletion with progress - Comprehensive error reporting - Preserves workspace isolation	2025-11-09 13:37:33 +08:00
yangdx	2f16065256	Refactor keyword_extraction from kwargs to explicit parameter • Add keyword_extraction param to functions • Remove kwargs.pop() calls • Update function signatures • Improve parameter documentation • Make parameter handling consistent	2025-11-09 12:02:17 +08:00
yangdx	88ab73f6ae	HotFix: Restore streaming response in OpenAI LLM. The stream and timeout parameters were moved from **kwargs to explicit parameters in a previous commit, but were not being passed to the OpenAI API, causing streaming responses to fail and fall back to non-streaming mode. Fixes the issue where stream=True was being silently ignored, resulting in unexpected non-streaming behavior.	2025-11-09 11:52:26 +08:00
yangdx	7bc6ccea19	Add uv package manager support to installation docs	2025-11-09 04:31:07 +08:00
yangdx	754d2ad297	Add documentation for LLM cache migration between storage types	2025-11-09 00:41:07 +08:00
yangdx	a75efb06dc	Fix: prevent source data corruption by target upsert function • Prevent mutation bugs by using copy() when storing cache values • Protect filtered cache data and ensure batch data isolation	2025-11-09 00:02:19 +08:00
yangdx	987bc09cab	Update LLM cache migration docs and improve UX prompts	2025-11-08 23:48:19 +08:00
yangdx	1a91bcdb5f	Improve storage config validation and add config.ini fallback support • Add MongoDB env requirements • Support config.ini fallback • Warn on missing env vars • Check available storage count • Show config source info	2025-11-08 22:48:49 +08:00
yangdx	57ee7d5ac8	Merge branch 'main' into llm-cache-migrate	2025-11-08 22:15:46 +08:00
yangdx	3d9de5ed03	feat: improve Gemini client error handling and retry logic • Add google-api-core dependency • Add specific exception handling • Create InvalidResponseError class • Update retry decorators • Fix empty response handling	2025-11-08 22:10:09 +08:00


3619 commits