Commit graph

239 commits

Author SHA1 Message Date
NeelM0906
f6d1fb98ac Fix Linting errors 2025-10-09 16:52:22 -04:00
yangdx
a528213210 Fix logging filter logic
• Fix boolean operator precedence in filter
• Consolidate GET/POST condition logic
2025-09-26 19:42:33 +08:00
yangdx
8cd4139cbf refactor: fix double query problem by add aquery_llm function for consistent response handling
- Add new aquery_llm/query_llm methods providing structured responses
- Consolidate /query and /query/stream endpoints to use unified aquery_llm
- Optimize cache handling by moving cache checks before LLM calls
2025-09-26 19:05:03 +08:00
yangdx
5eb4a4b799 feat: simplify citations, add reference merging, and restructure API response format 2025-09-24 14:30:10 +08:00
yangdx
6e2eab5c23 Add ID fields to entities, relations, and chunks in raw data query results 2025-09-21 23:31:35 +08:00
yangdx
18e886d7e9 Improve context item identification with meaningful IDs
- Add EN prefix to entitie IDs
- Add RE prefix to relation IDs
-Add DC prefix chunk IDs
- Enhance traceability across contexts
2025-09-21 20:19:14 +08:00
yangdx
37d01e2df8 fix: Ensures complete metadata (source_id, created_at, file_path) is preserved in aquery_data responses 2025-09-15 03:45:09 +08:00
yangdx
e71229698d refactor: centralize metadata generation in query functions
- Remove processing_info generation from _convert_to_user_format function
- Move all metadata generation (keywords, processing_info) to kg_query and naive_query functions
- Simplify _convert_to_user_format to focus only on data format conversion
2025-09-15 03:11:07 +08:00
yangdx
b1c8206346 Add aquery_data endpoint for structured retrieval without LLM generation
- Add QueryDataResponse model
- Implement /query/data endpoint
- Add aquery_data method to LightRAG
- Return entities, relationships, chunks
2025-09-15 02:15:14 +08:00
yangdx
82a67354d0 Code formatting improvements and style consistency fixes
* Remove trailing whitespace
* Fix function signature ellipsis style
2025-09-14 17:49:02 +08:00
yangdx
87bb8a023b Fix tuple delimiter regex patterns and add debug logging
- Add debug logs for malformed records
- Fix regex for consecutive delimiters
- Handle missing closing brackets
2025-09-14 17:29:27 +08:00
yangdx
70fee5bbeb Fix syntax warning by removin examples from fix_tuple_delimiter_corruption docstring 2025-09-14 12:37:21 +08:00
yangdx
20c5127c7c Merge branch 'optimize-extraction' into return-data-only 2025-09-14 12:33:37 +08:00
yangdx
ff705a2323 Fix tuple delimiter corruption when missing closing bracket, Handle <|#: -> <|#|> pattern 2025-09-14 11:44:21 +08:00
yangdx
1dc96f3959 Merge branch 'optimize-extraction' into return-data-only 2025-09-14 05:37:48 +08:00
yangdx
2686fc526e Change entity type from CreativeWork to Content and update delimiter
• Replace CreativeWork with Content type
• Improve LLM output error messages
• Update prompt for binary relationships
• Fix delimiter corruption examples
2025-09-14 00:55:15 +08:00
yangdx
0ffb5d5f2d Replace search API with aquery_data for consistent raw data retrieval, mirroring aquery results
• Reuse existing query logic paths and remove kg_search function entirely
• Update kg_query/naive_query to return raw data as needed
2025-09-13 15:30:29 +08:00
yangdx
8088b7e07a Fix tuple delimiter corruption handling and update documentation 2025-09-12 18:03:37 +08:00
yangdx
8a3e2c03a9 Fix tuple delimiter corruption patterns with pipes and brackets
- Handle <||S||> malformed delimiters
- Fix <||> empty pipe sequences
- Repair <|| incomplete patterns
- Process ||S|| missing brackets
- Improve delimiter normalization
2025-09-12 17:45:32 +08:00
yangdx
0221213b9b Improve entity summarization with JSONL format and fix tuple delimiters
• Convert descriptions to JSONL format
• Add token-based truncation helper
• Enhance entity name consistency rules
• Improve summarization prompt clarity
• Fix tuple delimiter corruption patterns
2025-09-12 12:32:08 +08:00
yangdx
1892ed23cc Change tuple delimiter from <|SEP|> to <|S|> across codebase
• Update prompt instruction clarity
• Correct utility function examples
• Update regex pattern comments
2025-09-12 08:57:46 +08:00
yangdx
c07bcbff44 Fix tuple delimiter corruption patterns and add missing edge cases 2025-09-12 08:35:37 +08:00
yangdx
8660bf34e4 Add timestamp tracking for LLM responses and entity/relationship data
- Track timestamps for cache hits/misses
- Add timestamp to entity/relationship objects
- Sort descriptions by timestamp order
- Preserve temporal ordering in merges
2025-09-12 04:34:12 +08:00
yangdx
40688def20 Refactor tuple delimiter corruption fix into reusable utility function
- Extract regex fixes to utils module
- Add case-insensitive delimiter handling
2025-09-12 04:10:14 +08:00
yangdx
a49c8e4a0d Refactor JSON serialization to use newline-separated format
- Replace json.dumps with line-by-line format
- Apply to entities, relations, text units
- Update truncation key functions
- Maintain ensure_ascii=False setting
- Improve context readability
2025-09-10 11:59:25 +08:00
yangdx
2dd143c935 Refactor conversation history handling to use LLM native message format
• Remove get_conversation_turns utility
• Pass history_messages to LLM directly
• Clean up prompt template formatting
2025-09-10 11:56:58 +08:00
yangdx
09abb656b8 Improve log message formatting for better readability 2025-09-09 17:41:09 +08:00
yangdx
d218f15a62 Refactor entity extraction with system prompts and output limits
- Add system/user prompt separation
- Set max tokens for endless output fix
- Improve extraction error logging
- Update cache type from extract to summary
2025-09-08 15:20:45 +08:00
yangdx
c87eb2cfcf Increase timeout buffers for async function calls
• Extend execution timeout buffer to 150s
• Extend task duration buffer to 180s
• Account for low-level retry delays
• Improve health check phase handling
• Reduce timeout-related failures
2025-09-06 23:56:24 +08:00
yangdx
6be462511f Add error prefixing for better debugging context in async operations
* Add create_prefixed_exception utility
* Prefix entity processing errors
* Prefix relationship processing errors
* Prefix chunk extraction progress info
* Maintain original exception chains
2025-09-05 21:28:00 +08:00
yangdx
2c551cb5db Add support for Chinese book title marks in normalize_extracted_info 2025-09-04 18:51:57 +08:00
yangdx
9b516a8a53 Hot Fix: Preserve whitespace chars in text sanitization
• Keep \t, \n, \r in control char removal
2025-09-04 10:58:29 +08:00
yangdx
a25ce7f078 Fix linting 2025-09-03 21:58:30 +08:00
yangdx
7ef2f0dff6 Add VDB error handling with retries for data consistency
- Add safe_vdb_operation_with_exception util
- Wrap VDB ops in entity/relationship code
- Ensure exceptions propagate on failure
- Add retry logic with configurable delays
2025-09-03 21:15:09 +08:00
yangdx
5b2deccbef Improve text normalization and add entity type capitalization
- Capitalize entity types with .title()
- Add non-breaking space handling
- Add narrow non-breaking space regex
2025-09-02 02:51:41 +08:00
yangdx
e95622ca7b fix(utils): enhance remove_think_tags to handle orphaned </think> closing tags
The function now properly handles cases where text contains </think> closing tags
without corresponding <think> opening tags, which can occur due to content
truncation or processing errors.
2025-09-01 07:17:30 +08:00
yangdx
c8c59c38b0 Fix entity types configuration to support JSON list parsing
- Add JSON parsing for list env vars
- Update entity types example format
- Add list type support to get_env_value
2025-09-01 00:14:57 +08:00
yangdx
1a015a7015 Add queue_name parameter to priority_limit_async_func_call for better logging
• Add queue_name parameter to decorator
• Update all log messages with queue names
• Pass specific names for LLM and embedding
2025-08-31 23:47:22 +08:00
yangdx
b747417961 feat: enhance text extraction text sanitization and normalization
- Improve reduntant quotes in entity and relation name, type and keywords
- Add HTML tag cleaning and Chinese symbol conversion
- Filter out short numeric content and malformed text
- Enhance entity type validation with character filtering
2025-08-31 13:17:20 +08:00
yangdx
d4bbc5dea9 refactor: Merge multi-step text sanitization into single function 2025-08-31 10:36:56 +08:00
yangdx
d7e0701b63 Improve logging setup and add error prefixes for LLM functions
- Move logger init to top of file
- Add console handler by default
- Prefix LLM errors with "[LLM func]"
- Update timeout log messages
- Comment out pypinyin success log
2025-08-29 14:19:13 +08:00
yangdx
925e631a9a refac: Add robust time out handling for LLM request 2025-08-29 13:50:35 +08:00
yangdx
99e28e815b fix: prevent document processing failures from UTF-8 surrogate characters
- Change sanitize_text_for_encoding to fail-fast instead of returning error placeholders
- Add strict UTF-8 cleaning pipeline to entity/relationship extraction
- Skip problematic entities/relationships instead of corrupting data

Fixes document processing crashes when encountering surrogate characters (U+D800-U+DFFF)
2025-08-27 23:52:39 +08:00
yangdx
bf43e1b8c1 fix: Resolve default rerank config problem when env var missing
- Read config from selected_rerank_func when env var missing
- Make api_key optional for rerank function
- Add response format validation with proper error handling
- Update Cohere rerank default to official API endpoint
2025-08-23 01:07:59 +08:00
yangdx
580cb7906c feat: Add multiple rerank provider support to LightRAG Server by adding new env vars and cli params
- Add --enable-rerank CLI argument and ENABLE_RERANK env var
- Simplify rerank configuration logic to only check enable flag and binding
- Update health endpoint to show enable_rerank and rerank_configured status
- Improve logging messages for rerank enable/disable states
- Maintain backward compatibility with default value True
2025-08-22 19:29:45 +08:00
yangdx
b5c230abdd optimize: avoid duplicate embedding calls in _build_query_context
Reduces API costs and improves query performance while maintaining backward compatibility.
2025-08-21 16:49:24 +08:00
yangdx
ced3aef7cb refactor: simplify text encoding by removing redundant safe_encode_for_llm 2025-08-19 19:37:46 +08:00
yangdx
806081645f Refactor text cleaning to use sanitize_text_for_encoding consistently
• Replace clean_text with sanitize_text
• Remove deprecated clean_text function
• Add whitespace trimming to sanitizer
• Improve UTF-8 encoding safety
• Consolidate text cleaning logic
2025-08-19 19:20:01 +08:00
yangdx
f9cf544805 Add text sanitization to prevent UTF-8 encoding errors in LLM calls
• Remove surrogate characters
• Clean control characters
• Sanitize input and history messages
• Add comprehensive error handling
• Log sanitization activities
2025-08-19 18:50:52 +08:00
yangdx
64015548df Refactor MD5 hash functions and consolidate Unicode error handling 2025-08-19 17:49:23 +08:00