LightRAG

Author	SHA1	Message	Date
yangdx	c87eb2cfcf	Increase timeout buffers for async function calls • Extend execution timeout buffer to 150s • Extend task duration buffer to 180s • Account for low-level retry delays • Improve health check phase handling • Reduce timeout-related failures	2025-09-06 23:56:24 +08:00
yangdx	6be462511f	Add error prefixing for better debugging context in async operations * Add create_prefixed_exception utility * Prefix entity processing errors * Prefix relationship processing errors * Prefix chunk extraction progress info * Maintain original exception chains	2025-09-05 21:28:00 +08:00
yangdx	2c551cb5db	Add support for Chinese book title marks in normalize_extracted_info	2025-09-04 18:51:57 +08:00
yangdx	9b516a8a53	Hot Fix: Preserve whitespace chars in text sanitization • Keep \t, \n, \r in control char removal	2025-09-04 10:58:29 +08:00
yangdx	a25ce7f078	Fix linting	2025-09-03 21:58:30 +08:00
yangdx	7ef2f0dff6	Add VDB error handling with retries for data consistency - Add safe_vdb_operation_with_exception util - Wrap VDB ops in entity/relationship code - Ensure exceptions propagate on failure - Add retry logic with configurable delays	2025-09-03 21:15:09 +08:00
yangdx	5b2deccbef	Improve text normalization and add entity type capitalization - Capitalize entity types with .title() - Add non-breaking space handling - Add narrow non-breaking space regex	2025-09-02 02:51:41 +08:00
yangdx	e95622ca7b	fix(utils): enhance remove_think_tags to handle orphaned </think> closing tags The function now properly handles cases where text contains </think> closing tags without corresponding <think> opening tags, which can occur due to content truncation or processing errors.	2025-09-01 07:17:30 +08:00
yangdx	c8c59c38b0	Fix entity types configuration to support JSON list parsing - Add JSON parsing for list env vars - Update entity types example format - Add list type support to get_env_value	2025-09-01 00:14:57 +08:00
yangdx	1a015a7015	Add queue_name parameter to priority_limit_async_func_call for better logging • Add queue_name parameter to decorator • Update all log messages with queue names • Pass specific names for LLM and embedding	2025-08-31 23:47:22 +08:00
yangdx	b747417961	feat: enhance text sanitization and normalization in extraction - Improve handling of redundant quotes in entity and relation names, types and keywords - Add HTML tag cleaning and Chinese symbol conversion - Filter out short numeric content and malformed text - Enhance entity type validation with character filtering	2025-08-31 13:17:20 +08:00
yangdx	d4bbc5dea9	refactor: Merge multi-step text sanitization into single function	2025-08-31 10:36:56 +08:00
yangdx	d7e0701b63	Improve logging setup and add error prefixes for LLM functions - Move logger init to top of file - Add console handler by default - Prefix LLM errors with "[LLM func]" - Update timeout log messages - Comment out pypinyin success log	2025-08-29 14:19:13 +08:00
yangdx	925e631a9a	refac: Add robust timeout handling for LLM requests	2025-08-29 13:50:35 +08:00
yangdx	99e28e815b	fix: prevent document processing failures from UTF-8 surrogate characters - Change sanitize_text_for_encoding to fail-fast instead of returning error placeholders - Add strict UTF-8 cleaning pipeline to entity/relationship extraction - Skip problematic entities/relationships instead of corrupting data Fixes document processing crashes when encountering surrogate characters (U+D800-U+DFFF)	2025-08-27 23:52:39 +08:00
yangdx	bf43e1b8c1	fix: Resolve default rerank config problem when env var missing - Read config from selected_rerank_func when env var missing - Make api_key optional for rerank function - Add response format validation with proper error handling - Update Cohere rerank default to official API endpoint	2025-08-23 01:07:59 +08:00
yangdx	580cb7906c	feat: Add multiple rerank provider support to LightRAG Server by adding new env vars and cli params - Add --enable-rerank CLI argument and ENABLE_RERANK env var - Simplify rerank configuration logic to only check enable flag and binding - Update health endpoint to show enable_rerank and rerank_configured status - Improve logging messages for rerank enable/disable states - Maintain backward compatibility with default value True	2025-08-22 19:29:45 +08:00
yangdx	b5c230abdd	optimize: avoid duplicate embedding calls in _build_query_context Reduces API costs and improves query performance while maintaining backward compatibility.	2025-08-21 16:49:24 +08:00
yangdx	ced3aef7cb	refactor: simplify text encoding by removing redundant safe_encode_for_llm	2025-08-19 19:37:46 +08:00
yangdx	806081645f	Refactor text cleaning to use sanitize_text_for_encoding consistently • Replace clean_text with sanitize_text • Remove deprecated clean_text function • Add whitespace trimming to sanitizer • Improve UTF-8 encoding safety • Consolidate text cleaning logic	2025-08-19 19:20:01 +08:00
yangdx	f9cf544805	Add text sanitization to prevent UTF-8 encoding errors in LLM calls • Remove surrogate characters • Clean control characters • Sanitize input and history messages • Add comprehensive error handling • Log sanitization activities	2025-08-19 18:50:52 +08:00
yangdx	64015548df	Refactor MD5 hash functions and consolidate Unicode error handling	2025-08-19 17:49:23 +08:00
yangdx	64058c771f	Refactor: Harden `compute_args_hash` against Unicode errors	2025-08-19 17:19:39 +08:00
yangdx	d3fde60938	refactor: remove file_path and created_at from context, improve token truncation - Remove file_path and created_at fields from entity and relationship contexts - Update token truncation to include full JSON serialization instead of content only	2025-08-18 18:30:09 +08:00
yangdx	453efeb924	Fix file path length checking to use UTF-8 byte length instead of char count	2025-08-18 13:59:27 +08:00
yangdx	14e083a1a6	fix: replace pyuca with pypinyin for Chinese pinyin sorting and add file_path sort	2025-08-17 15:21:24 +08:00
yangdx	61469c0a56	Add Chinese pinyin sorting support across document operations • Replace pyuca with centralized utils function • Add pinyin sort keys for file paths • Update MongoDB indexes with zh collation • Migrate existing indexes for compatibility • Support Chinese chars in Redis/JSON storage • Keep PostgreSQL sorting order controlled by database collation	2025-08-17 12:45:48 +08:00
yangdx	4a19d0de25	Add chunk tracking system to monitor chunk sources and frequencies • Track chunk sources (E/R/C types) • Log frequency and order metadata • Preserve chunk_id through processing • Add debug logging for chunk tracking • Handle rerank and truncation operations	2025-08-14 22:58:26 +08:00
yangdx	a8b7890470	Rename chunk selection functions for better clarity	2025-08-14 16:01:13 +08:00
yangdx	a11e8d77eb	Improve missing-vector warning logic in vector similarity - Check for any missing vectors - Separate no-vector vs partial-vector warnings - Ensure early return on empty vectors	2025-08-14 14:24:15 +08:00
yangdx	2e5487305e	Merge branch 'main' into pick-trunk-by-vector	2025-08-14 03:12:38 +08:00
yangdx	7fb11193b0	Fix linting	2025-08-14 03:07:29 +08:00
yangdx	331dcf0509	Remove query params from cache key generation for keyword extraction	2025-08-14 02:57:39 +08:00
yangdx	3343833571	Remove query params from cache key generation for keyword extraction	2025-08-14 02:36:01 +08:00
yangdx	f1dafa0d01	feat: KG-related chunk selection by vector similarity - Add env switch to toggle weighted polling vs vector-similarity strategy - Implement similarity-based sorting with fallback to weighted - Introduce batch vector read API for vector storage - Implement vector store and retrieve functions for NanoVectorDB - Preserve default behavior (weighted polling selection method)	2025-08-13 18:16:42 +08:00
zrguo	f1c7233763	Avoid UTF-8 BOM	2025-08-12 17:06:54 +08:00
yangdx	0463963520	fix: include all query parameters in LLM cache hash key generation - Add missing query parameters (top_k, enable_rerank, max_tokens, etc.) to cache key generation in kg_query, naive_query, and extract_keywords_only functions - Add queryparam field to CacheData structure and PostgreSQL storage for debugging - Update PostgreSQL schema with automatic migration for queryparam JSONB column - Prevent incorrect cache hits between queries with different parameters Fixes issue where different query parameters incorrectly shared the same cached results.	2025-08-05 18:03:10 +08:00
yangdx	cb75e6631e	Remove quantized embedding info from LLM cache - Delete quantize_embedding function - Delete dequantize_embedding function - Remove embedding fields from CacheData - Update save_to_cache to exclude embedding data - Clean up unused quantization-related code	2025-08-05 17:58:34 +08:00
yangdx	32af45ff46	refactor: improve JSON parsing reliability with json-repair library Replace regex-based JSON extraction with json-repair for better handling of malformed LLM responses. Remove deprecated JSON parsing utilities and clean up keyword_extraction parameter across LLM providers. - Remove locate_json_string_body_from_string() and convert_response_to_json() - Use json-repair.loads() in extract_keywords_only() for robust parsing - Clean up LLM interfaces and remove unused parameters - Add json-repair dependency	2025-08-01 19:36:20 +08:00
yangdx	2af8a93dc7	fix: resolve _sort_key error in Redis get_docs_paginated function	2025-07-31 02:16:56 +08:00
yangdx	d0bc5e7c4a	Extend path filter to also cover POST requests	2025-07-31 02:06:56 +08:00
yangdx	3e5efd0b27	Add /documents/paginated to filtered logging paths	2025-07-31 02:00:00 +08:00
yangdx	6014b9bf73	feat: add track_id support for document processing progress monitoring - Add get_docs_by_track_id() method to all storage backends (MongoDB, PostgreSQL, Redis, JSON) - Implement automatic track_id generation with upload_/insert_ prefixes - Add /track_status/{track_id} API endpoint for frontend progress queries - Create database indexes for efficient track_id lookups - Enable real-time document processing status tracking across all storage types	2025-07-29 22:24:21 +08:00
yangdx	9923821d75	refactor: Remove deprecated `max_token_size` from embedding configuration This parameter is no longer used. Its removal simplifies the API and clarifies that token length management is handled by upstream text chunking logic rather than the embedding wrapper.	2025-07-29 10:49:35 +08:00
yangdx	e09929b42e	Refine rerank filtering log message for clarity	2025-07-27 16:57:38 +08:00
yangdx	f4bca7bfb2	Fix linting	2025-07-27 16:50:45 +08:00
yangdx	a9565d7379	feat: Skip rerank filtering when `min_rerank_score` is 0.0	2025-07-27 16:50:12 +08:00
yangdx	ebaff228aa	feat: Add rerank score filtering with configurable threshold - Add DEFAULT_MIN_RERANK_SCORE constant (default: 0.0) - Add MIN_RERANK_SCORE environment variable support - Filter chunks with rerank scores below threshold in process_chunks_unified - Add info-level logging for filtering operations - Handle empty results gracefully after filtering - Maintain backward compatibility with non-reranked chunks	2025-07-27 16:37:44 +08:00
yangdx	a67f93acc9	Replace hardcoded max tokens with DEFAULT_MAX_TOTAL_TOKENS constant - Use constant in process_chunks_unified - Update WebUI default to match (32000)	2025-07-26 11:23:54 +08:00
yangdx	7b915b34f6	Refactor: move build_file_path function from operate.py to utils.py	2025-07-26 10:52:59 +08:00


261 commits