LightRAG

Author	SHA1	Message	Date
yangdx	0221213b9b	Improve entity summarization with JSONL format and fix tuple delimiters • Convert descriptions to JSONL format • Add token-based truncation helper • Enhance entity name consistency rules • Improve summarization prompt clarity • Fix tuple delimiter corruption patterns	2025-09-12 12:32:08 +08:00
yangdx	1892ed23cc	Change tuple delimiter from <|SEP|> to <|S|> across codebase • Update prompt instruction clarity • Correct utility function examples • Update regex pattern comments	2025-09-12 08:57:46 +08:00
yangdx	c07bcbff44	Fix tuple delimiter corruption patterns and add missing edge cases	2025-09-12 08:35:37 +08:00
yangdx	8660bf34e4	Add timestamp tracking for LLM responses and entity/relationship data - Track timestamps for cache hits/misses - Add timestamp to entity/relationship objects - Sort descriptions by timestamp order - Preserve temporal ordering in merges	2025-09-12 04:34:12 +08:00
yangdx	40688def20	Refactor tuple delimiter corruption fix into reusable utility function - Extract regex fixes to utils module - Add case-insensitive delimiter handling	2025-09-12 04:10:14 +08:00
yangdx	a49c8e4a0d	Refactor JSON serialization to use newline-separated format - Replace json.dumps with line-by-line format - Apply to entities, relations, text units - Update truncation key functions - Maintain ensure_ascii=False setting - Improve context readability	2025-09-10 11:59:25 +08:00
yangdx	2dd143c935	Refactor conversation history handling to use LLM native message format • Remove get_conversation_turns utility • Pass history_messages to LLM directly • Clean up prompt template formatting	2025-09-10 11:56:58 +08:00
yangdx	09abb656b8	Improve log message formatting for better readability	2025-09-09 17:41:09 +08:00
yangdx	d218f15a62	Refactor entity extraction with system prompts and output limits - Add system/user prompt separation - Set max tokens for endless output fix - Improve extraction error logging - Update cache type from extract to summary	2025-09-08 15:20:45 +08:00
yangdx	c87eb2cfcf	Increase timeout buffers for async function calls • Extend execution timeout buffer to 150s • Extend task duration buffer to 180s • Account for low-level retry delays • Improve health check phase handling • Reduce timeout-related failures	2025-09-06 23:56:24 +08:00
yangdx	6be462511f	Add error prefixing for better debugging context in async operations * Add create_prefixed_exception utility * Prefix entity processing errors * Prefix relationship processing errors * Prefix chunk extraction progress info * Maintain original exception chains	2025-09-05 21:28:00 +08:00
yangdx	2c551cb5db	Add support for Chinese book title marks in normalize_extracted_info	2025-09-04 18:51:57 +08:00
yangdx	9b516a8a53	Hot Fix: Preserve whitespace chars in text sanitization • Keep \t, \n, \r in control char removal	2025-09-04 10:58:29 +08:00
yangdx	a25ce7f078	Fix linting	2025-09-03 21:58:30 +08:00
yangdx	7ef2f0dff6	Add VDB error handling with retries for data consistency - Add safe_vdb_operation_with_exception util - Wrap VDB ops in entity/relationship code - Ensure exceptions propagate on failure - Add retry logic with configurable delays	2025-09-03 21:15:09 +08:00
yangdx	5b2deccbef	Improve text normalization and add entity type capitalization - Capitalize entity types with .title() - Add non-breaking space handling - Add narrow non-breaking space regex	2025-09-02 02:51:41 +08:00
yangdx	e95622ca7b	fix(utils): enhance remove_think_tags to handle orphaned </think> closing tags The function now properly handles cases where text contains </think> closing tags without corresponding <think> opening tags, which can occur due to content truncation or processing errors.	2025-09-01 07:17:30 +08:00
yangdx	c8c59c38b0	Fix entity types configuration to support JSON list parsing - Add JSON parsing for list env vars - Update entity types example format - Add list type support to get_env_value	2025-09-01 00:14:57 +08:00
yangdx	1a015a7015	Add queue_name parameter to priority_limit_async_func_call for better logging • Add queue_name parameter to decorator • Update all log messages with queue names • Pass specific names for LLM and embedding	2025-08-31 23:47:22 +08:00
yangdx	b747417961	feat: enhance text extraction text sanitization and normalization - Improve redundant quotes in entity and relation name, type and keywords - Add HTML tag cleaning and Chinese symbol conversion - Filter out short numeric content and malformed text - Enhance entity type validation with character filtering	2025-08-31 13:17:20 +08:00
yangdx	d4bbc5dea9	refactor: Merge multi-step text sanitization into single function	2025-08-31 10:36:56 +08:00
yangdx	d7e0701b63	Improve logging setup and add error prefixes for LLM functions - Move logger init to top of file - Add console handler by default - Prefix LLM errors with "[LLM func]" - Update timeout log messages - Comment out pypinyin success log	2025-08-29 14:19:13 +08:00
yangdx	925e631a9a	refac: Add robust time out handling for LLM request	2025-08-29 13:50:35 +08:00
yangdx	99e28e815b	fix: prevent document processing failures from UTF-8 surrogate characters - Change sanitize_text_for_encoding to fail-fast instead of returning error placeholders - Add strict UTF-8 cleaning pipeline to entity/relationship extraction - Skip problematic entities/relationships instead of corrupting data Fixes document processing crashes when encountering surrogate characters (U+D800-U+DFFF)	2025-08-27 23:52:39 +08:00
yangdx	bf43e1b8c1	fix: Resolve default rerank config problem when env var missing - Read config from selected_rerank_func when env var missing - Make api_key optional for rerank function - Add response format validation with proper error handling - Update Cohere rerank default to official API endpoint	2025-08-23 01:07:59 +08:00
yangdx	580cb7906c	feat: Add multiple rerank provider support to LightRAG Server by adding new env vars and cli params - Add --enable-rerank CLI argument and ENABLE_RERANK env var - Simplify rerank configuration logic to only check enable flag and binding - Update health endpoint to show enable_rerank and rerank_configured status - Improve logging messages for rerank enable/disable states - Maintain backward compatibility with default value True	2025-08-22 19:29:45 +08:00
yangdx	b5c230abdd	optimize: avoid duplicate embedding calls in _build_query_context Reduces API costs and improves query performance while maintaining backward compatibility.	2025-08-21 16:49:24 +08:00
yangdx	ced3aef7cb	refactor: simplify text encoding by removing redundant safe_encode_for_llm	2025-08-19 19:37:46 +08:00
yangdx	806081645f	Refactor text cleaning to use sanitize_text_for_encoding consistently • Replace clean_text with sanitize_text • Remove deprecated clean_text function • Add whitespace trimming to sanitizer • Improve UTF-8 encoding safety • Consolidate text cleaning logic	2025-08-19 19:20:01 +08:00
yangdx	f9cf544805	Add text sanitization to prevent UTF-8 encoding errors in LLM calls • Remove surrogate characters • Clean control characters • Sanitize input and history messages • Add comprehensive error handling • Log sanitization activities	2025-08-19 18:50:52 +08:00
yangdx	64015548df	Refactor MD5 hash functions and consolidate Unicode error handling	2025-08-19 17:49:23 +08:00
yangdx	64058c771f	Refactor: Harden `compute_args_hash` against Unicode errors	2025-08-19 17:19:39 +08:00
yangdx	d3fde60938	refactor: remove file_path and created_at from context, improve token truncation - Remove file_path and created_at fields from entity and relationship contexts - Update token truncation to include full JSON serialization instead of content only	2025-08-18 18:30:09 +08:00
yangdx	453efeb924	Fix file path length checking to use UTF-8 byte length instead of char count	2025-08-18 13:59:27 +08:00
yangdx	14e083a1a6	fix: replace pyuca with pypinyin for Chinese pinyin sorting and add file_path sort	2025-08-17 15:21:24 +08:00
yangdx	61469c0a56	Add Chinese pinyin sorting support across document operations • Replace pyuca with centralized utils function • Add pinyin sort keys for file paths • Update MongoDB indexes with zh collation • Migrate existing indexes for compatibility • Support Chinese chars in Redis/JSON storage • Keep PostgreSQL sorting order controlled by Database Collate order	2025-08-17 12:45:48 +08:00
yangdx	4a19d0de25	Add chunk tracking system to monitor chunk sources and frequencies • Track chunk sources (E/R/C types) • Log frequency and order metadata • Preserve chunk_id through processing • Add debug logging for chunk tracking • Handle rerank and truncation operations	2025-08-14 22:58:26 +08:00
yangdx	a8b7890470	Rename chunk selection functions for better clarity	2025-08-14 16:01:13 +08:00
yangdx	a11e8d77eb	Improve missing-vector warning logic in vector similarity - Check for any missing vectors - Separate no-vector vs partial-vector warnings - Ensure early return on empty vectors	2025-08-14 14:24:15 +08:00
yangdx	2e5487305e	Merge branch 'main' into pick-trunk-by-vector	2025-08-14 03:12:38 +08:00
yangdx	7fb11193b0	Fix linting	2025-08-14 03:07:29 +08:00
yangdx	331dcf0509	Remove query params from cache key generation for keyword extraction	2025-08-14 02:57:39 +08:00
yangdx	3343833571	Remove query params from cache key generation for keyword extraction	2025-08-14 02:36:01 +08:00
yangdx	f1dafa0d01	feat: KG related chunks selection by vector similarity - Add env switch to toggle weighted polling vs vector-similarity strategy - Implement similarity-based sorting with fallback to weighted - Introduce batch vector read API for vector storage - Implement vector store and retrieve function for Nanovector DB - Preserve default behavior (weighted polling selection method)	2025-08-13 18:16:42 +08:00
zrguo	f1c7233763	Avoid UTF-8 BOM	2025-08-12 17:06:54 +08:00
yangdx	0463963520	fix: include all query parameters in LLM cache hash key generation - Add missing query parameters (top_k, enable_rerank, max_tokens, etc.) to cache key generation in kg_query, naive_query, and extract_keywords_only functions - Add queryparam field to CacheData structure and PostgreSQL storage for debugging - Update PostgreSQL schema with automatic migration for queryparam JSONB column - Prevent incorrect cache hits between queries with different parameters Fixes issue where different query parameters incorrectly shared the same cached results.	2025-08-05 18:03:10 +08:00
yangdx	cb75e6631e	Remove quantized embedding info from LLM cache - Delete quantize_embedding function - Delete dequantize_embedding function - Remove embedding fields from CacheData - Update save_to_cache to exclude embedding data - Clean up unused quantization-related code	2025-08-05 17:58:34 +08:00
yangdx	32af45ff46	refactor: improve JSON parsing reliability with json-repair library Replace regex-based JSON extraction with json-repair for better handling of malformed LLM responses. Remove deprecated JSON parsing utilities and clean up keyword_extraction parameter across LLM providers. - Remove locate_json_string_body_from_string() and convert_response_to_json() - Use json-repair.loads() in extract_keywords_only() for robust parsing - Clean up LLM interfaces and remove unused parameters - Add json-repair dependency	2025-08-01 19:36:20 +08:00
yangdx	2af8a93dc7	fix: resolve _sort_key error in Redis get_docs_paginated function	2025-07-31 02:16:56 +08:00
yangdx	d0bc5e7c4a	Extend path filter to also cover POST requests	2025-07-31 02:06:56 +08:00
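
Commit e95622ca7b describes handling `</think>` closing tags that appear without a matching `<think>` opening tag, for example after truncation. The following is only a minimal sketch of that idea, not the repository's code: the function name `strip_think_tags` is hypothetical, and discarding everything before an orphaned closing tag is an assumption about the intended behavior.

```python
import re


def strip_think_tags(text: str) -> str:
    """Hypothetical helper: remove reasoning blocks from model output."""
    # Remove complete <think>...</think> blocks first.
    text = re.sub(r"<think>.*?</think>", "", text, flags=re.DOTALL)
    # Assumption: if a closing tag remains, its opening tag was lost
    # (e.g. truncated), so the text before it is treated as reasoning
    # and dropped along with the tag itself.
    text = re.sub(r"^.*?</think>", "", text, flags=re.DOTALL)
    return text.strip()
```

Text that contains no tags at all passes through unchanged apart from whitespace trimming.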


220 commits
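
Several commits above (f9cf544805, 806081645f, 9b516a8a53) concern sanitizing text so it can be encoded as UTF-8: removing surrogate characters, removing control characters while keeping `\t`, `\n` and `\r`, and trimming whitespace. A rough sketch of such a sanitizer is shown below. The name `sanitize_text` and the exact character ranges are assumptions made for illustration, and the fail-fast behavior mentioned in 99e28e815b is not modeled here.

```python
import re

# Lone surrogate code points (U+D800-U+DFFF) cannot be encoded as UTF-8.
_SURROGATES = re.compile(r"[\ud800-\udfff]")
# ASCII control characters, excluding \t (0x09), \n (0x0a) and \r (0x0d).
_CONTROL = re.compile(r"[\x00-\x08\x0b\x0c\x0e-\x1f\x7f]")


def sanitize_text(text: str) -> str:
    """Hypothetical sanitizer: drop surrogates and control chars, then trim."""
    text = _SURROGATES.sub("", text)
    text = _CONTROL.sub("", text)
    return text.strip()
```

After sanitization, calling `.encode("utf-8")` on the result should no longer raise `UnicodeEncodeError` because of surrogates.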