Commit graph

589 commits

Author SHA1 Message Date
yangdx
7c463f0fb5 Change entity type formatting from title case to lowercase without spaces 2025-09-21 00:56:56 +08:00
yangdx
77569ddea2 Add chunk key to entity extraction logging output 2025-09-17 02:21:11 +08:00
yangdx
0e8d973d44 Shorten progress prefix in entity extraction error messages 2025-09-16 15:48:37 +08:00
yangdx
ecaee43788 Add error handling with chunk ID prefixing in entity extraction 2025-09-16 13:41:49 +08:00
yangdx
37d01e2df8 fix: Ensures complete metadata (source_id, created_at, file_path) is preserved in aquery_data responses 2025-09-15 03:45:09 +08:00
yangdx
e71229698d refactor: centralize metadata generation in query functions
- Remove processing_info generation from _convert_to_user_format function
- Move all metadata generation (keywords, processing_info) to kg_query and naive_query functions
- Simplify _convert_to_user_format to focus only on data format conversion
2025-09-15 03:11:07 +08:00
yangdx
c0d5abba6b Fix linting 2025-09-15 02:59:21 +08:00
yangdx
b1c8206346 Add aquery_data endpoint for structured retrieval without LLM generation
- Add QueryDataResponse model
- Implement /query/data endpoint
- Add aquery_data method to LightRAG
- Return entities, relationships, chunks
2025-09-15 02:15:14 +08:00
yangdx
82a67354d0 Code formatting improvements and style consistency fixes
* Remove trailing whitespace
* Fix function signature ellipsis style
2025-09-14 17:49:02 +08:00
yangdx
87bb8a023b Fix tuple delimiter regex patterns and add debug logging
- Add debug logs for malformed records
- Fix regex for consecutive delimiters
- Handle missing closing brackets
2025-09-14 17:29:27 +08:00
yangdx
4de1473875 Improve entity extraction prompts and error message formatting
• Fix typo in error log message
• Clarify format requirements in prompts
• Make extraction instructions clearer
• Improve user prompt consistency
2025-09-14 13:45:59 +08:00
yangdx
20c5127c7c Merge branch 'optimize-extraction' into return-data-only 2025-09-14 12:33:37 +08:00
yangdx
619553021e Fix delimiter processing and optimize case-sensitive handling
• Fix completion_delimiter reference bug
• Add case check before lowercase conversion
• Improve delimiter corruption handling
• Optimize redundant processing logic
2025-09-14 12:23:48 +08:00
yangdx
fd48afdb00 Use "relation" instead of "relationship" in extraction prompt, and support both formats for safety 2025-09-14 11:43:35 +08:00
yangdx
1dc96f3959 Merge branch 'optimize-extraction' into return-data-only 2025-09-14 05:37:48 +08:00
yangdx
b820d8d588 Fix entity/relationship record parsing in extraction result processing 2025-09-14 05:35:01 +08:00
yangdx
4f5ad76c2c Add entity vector database upsert for newly added entities by edges upserts 2025-09-14 05:04:45 +08:00
yangdx
7cc2b69bcf Fix linting 2025-09-14 05:02:02 +08:00
yangdx
cddd81a86c Fix LLM output format errors in extraction result processing
- Handle tuple_delimiter as record separator
- Add format validation and correction
- Add warning for format errors
2025-09-14 04:13:01 +08:00
yangdx
2686fc526e Change entity type from CreativeWork to Content and update delimiter
• Replace CreativeWork with Content type
• Improve LLM output error messages
• Update prompt for binary relationships
• Fix delimiter corruption examples
2025-09-14 00:55:15 +08:00
yangdx
0ffb5d5f2d Replace search API with aquery_data for consistent raw data retrieval, mirroring aquery results
• Reuse existing query logic paths and remove kg_search function entirely
• Update kg_query/naive_query to return raw data as needed
2025-09-13 15:30:29 +08:00
yangdx
4ce5f9014c Improve error messages in entity and relationship extraction 2025-09-13 11:20:03 +08:00
yangdx
9a2e8be5a7 Fix extraction validation and delimiter comment accuracy
• Change < to != for exact length check
• Fix entity validation from 4 to exact 4
• Fix relationship validation to exact 5
• Correct delimiter comment example
2025-09-12 18:13:25 +08:00
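Note on 9a2e8be5a7: the switch from `<` to `!=` means a record with too many fields is now rejected as well as one with too few. A minimal sketch of that exact-length check follows. The function name `classify_record` and the tag strings are hypothetical and not taken from the repository; only the field counts (4 for entities, 5 for relationships) come from the commit message.

```python
from typing import List, Optional

ENTITY_FIELDS = 4    # exact count stated in the commit message
RELATION_FIELDS = 5  # exact count stated in the commit message


def classify_record(fields: List[str]) -> Optional[str]:
    """Return 'entity' or 'relation' for a well-formed record, else None.

    Hypothetical helper: the first field is assumed to be the record tag.
    """
    if not fields:
        return None
    tag = fields[0].strip().strip('"').lower()
    # An exact match (!=) rejects both short and over-long records,
    # whereas a "<" check would silently accept extra fields.
    if tag == "entity":
        return "entity" if len(fields) == ENTITY_FIELDS else None
    if tag in ("relation", "relationship"):
        return "relation" if len(fields) == RELATION_FIELDS else None
    return None
```

With a `<` check, a five-field entity record would have passed; under the exact check it is treated as malformed.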
yangdx
69ca447f45 Sort description by timestamp then description length to improve merge consistency 2025-09-12 13:59:26 +08:00
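Note on 69ca447f45: a two-level sort of this kind can be written as a single tuple key. The sketch below is illustrative only. The record shape (`timestamp`, `description` keys) and the choice to put longer descriptions first among equal timestamps are assumptions; the commit message does not state the direction of the length tie-break.

```python
from typing import Dict, List


def sort_descriptions(items: List[Dict]) -> List[Dict]:
    """Order records by timestamp, then by description length.

    Assumption: ties on timestamp are broken with the longer
    description first (hence the negated length).
    """
    return sorted(
        items,
        key=lambda d: (d["timestamp"], -len(d["description"])),
    )
```

Because the key is fully determined by the record contents, the same set of descriptions yields the same order regardless of arrival order, except for records that tie on both timestamp and length.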
yangdx
0221213b9b Improve entity summarization with JSONL format and fix tuple delimiters
• Convert descriptions to JSONL format
• Add token-based truncation helper
• Enhance entity name consistency rules
• Improve summarization prompt clarity
• Fix tuple delimiter corruption patterns
2025-09-12 12:32:08 +08:00
yangdx
1892ed23cc Change tuple delimiter from <|SEP|> to <|S|> across codebase
• Update prompt instruction clarity
• Correct utility function examples
• Update regex pattern comments
2025-09-12 08:57:46 +08:00
yangdx
8660bf34e4 Add timestamp tracking for LLM responses and entity/relationship data
- Track timestamps for cache hits/misses
- Add timestamp to entity/relationship objects
- Sort descriptions by timestamp order
- Preserve temporal ordering in merges
2025-09-12 04:34:12 +08:00
yangdx
40688def20 Refactor tuple delimiter corruption fix into reusable utility function
- Extract regex fixes to utils module
- Add case-insensitive delimiter handling
2025-09-12 04:10:14 +08:00
yangdx
b9f80263b8 Simplify tuple delimiter regex patterns for LLM output fixing
• Consolidate 6 regex patterns into 3
• More efficient pattern matching
• Clearer comments and examples
• Same functionality, less code
• Better maintainability
2025-09-12 00:56:40 +08:00
yangdx
78eadc1d6c Rename function to clarify rebuild vs process extraction contexts 2025-09-11 23:21:27 +08:00
yangdx
4ce823b4dd Handle empty context in mix mode and improve query logging 2025-09-11 18:58:37 +08:00
yangdx
c8a17f7ea5 Improve extraction failure log message formatting and consistency 2025-09-11 14:03:21 +08:00
yangdx
7f83a58497 Refactor extraction delimiters from ## to newlines and change tuple delimiter to <|SEP|>
• Add robust delimiter fixing logic
• Update prompts for single-line format
2025-09-11 13:44:44 +08:00
yangdx
7fe47fac84 Fix linting 2025-09-10 18:38:21 +08:00
yangdx
db6bba80c9 Log all merges at appropriate level 2025-09-10 18:37:13 +08:00
yangdx
a4bfdb7ddf Fix logging condition to show merges even when no fragments exist if LLM is used 2025-09-10 18:22:10 +08:00
yangdx
02e7462645 feat: enhance LLM output format tolerance for bracket processing
- Expand bracket tolerance to support additional characters: < > " '
- Implement symmetric handling for both leading and trailing characters
- Replace simple string matching with robust regex-based pattern detection
- Maintain full backward compatibility with existing bracket formats
2025-09-10 18:10:06 +08:00
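Note on 02e7462645: the symmetric, regex-based bracket handling described above might look roughly like the sketch below. The patterns and the name `strip_record_brackets` are hypothetical and are not the repository's actual code. The tolerated characters `< > " '` come from the commit message, and the backtick is included on the assumption that the earlier backtick support (00de0a4be8) is retained.

```python
import re

# Optional stray characters before "(" at the start of a record.
_LEADING = re.compile(r'^[<>"\'`]*\(')
# Optional stray characters after ")" at the end of a record.
_TRAILING = re.compile(r'\)[<>"\'`]*$')


def strip_record_brackets(record: str) -> str:
    """Remove one wrapping bracket pair plus any tolerated stray characters."""
    record = record.strip()
    record = _LEADING.sub("", record, count=1)
    record = _TRAILING.sub("", record, count=1)
    return record
```

Each side is handled independently, so a record that is missing only its opening or only its closing bracket is still cleaned on the side that is present.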
yangdx
00de0a4be8 Handle backtick-wrapped brackets in extraction result parsing
* Support `( and `( start patterns
* Support )` and )` end patterns
* Graceful fallback to warning logs
* Strip 2 chars for backtick variants
* Maintain existing bracket logic
2025-09-10 17:15:03 +08:00
yangdx
19014c6471 feat: enhance entity/relationship merging with description length comparison
- Implement description length comparison in gleaning merge logic (extract_entities)
- Apply same logic to knowledge graph reconstruction (_rebuild_knowledge_from_chunks)
- Prioritize entities/relationships with longer descriptions for better quality
- Use list() instead of extend() for performance optimization when replacing
2025-09-10 17:06:57 +08:00
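Note on 19014c6471: a minimal sketch of the length-based preference follows, assuming results are held as a dict that maps a name to a list of record dicts, each with a `description` key. That data shape, the function names, and the use of the first record's description for the comparison are assumptions for illustration, not the repository's implementation.

```python
from typing import Dict, List

Records = Dict[str, List[dict]]


def _description_length(records: List[dict]) -> int:
    """Length of the first record's description, or 0 if there is none."""
    if not records:
        return 0
    return len(records[0].get("description", ""))


def merge_gleaned(base: Records, gleaned: Records) -> Records:
    """Merge a gleaning pass into the base results, keeping longer descriptions."""
    for name, records in gleaned.items():
        if name not in base:
            base[name] = list(records)
        elif _description_length(records) > _description_length(base[name]):
            # Replace outright with a fresh list instead of extending,
            # as the commit message describes.
            base[name] = list(records)
    return base
```

Under this rule an existing entry is replaced only when the new description is strictly longer, so equal-length results keep the earlier extraction.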
yangdx
e3ebf45a18 Add logging for missing brackets in extraction result processing 2025-09-10 16:10:42 +08:00
yangdx
24242c5bb8 Fix indentation for logging and status updates in merge functions 2025-09-10 15:26:35 +08:00
yangdx
c4506438cd Only log merge messages when there are existing fragments to merge 2025-09-10 15:14:33 +08:00
yangdx
a49c8e4a0d Refactor JSON serialization to use newline-separated format
- Replace json.dumps with line-by-line format
- Apply to entities, relations, text units
- Update truncation key functions
- Maintain ensure_ascii=False setting
- Improve context readability
2025-09-10 11:59:25 +08:00
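Note on a49c8e4a0d: a newline-separated layout can be produced by serializing each item on its own line instead of dumping the whole list at once. The helper name `to_json_lines` is hypothetical; `json.dumps` and its `ensure_ascii` parameter are standard library.

```python
import json
from typing import Iterable


def to_json_lines(items: Iterable[dict]) -> str:
    """Serialize each item as one JSON object per line."""
    # ensure_ascii=False keeps non-ASCII text readable instead of \u escapes.
    return "\n".join(json.dumps(item, ensure_ascii=False) for item in items)
```

One object per line also makes truncation simpler, since dropping trailing lines still leaves every remaining line a complete JSON object.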
yangdx
2dd143c935 Refactor conversation history handling to use LLM native message format
• Remove get_conversation_turns utility
• Pass history_messages to LLM directly
• Clean up prompt template formatting
2025-09-10 11:56:58 +08:00
yangdx
e078ab7103 Fix cache handling and context return logic for query parameters
• Skip cache when only_need_prompt is set
• Update only_need_context condition logic
• Prevent cache bypass in prompt-only mode
2025-09-10 11:31:48 +08:00
yangdx
6774058670 Merge branch 'main' into tongda/main 2025-09-09 22:43:17 +08:00
yangdx
077d9be5d7 Add Deepseek Style Chain of Thought (CoT) Support for OpenAI Compatible LLM providers
- Add enable_cot parameter to all LLM APIs
- Implement CoT for OpenAI with <think> tags
- Log warnings for unsupported providers
- Enable CoT in query operations
- Handle streaming and non-streaming CoT
2025-09-09 22:34:36 +08:00
yangdx
3477e9f919 Merge branch 'main' into tongda/main 2025-09-09 18:27:56 +08:00
yangdx
09abb656b8 Improve log message formatting for better readability 2025-09-09 17:41:09 +08:00
yangdx
d218f15a62 Refactor entity extraction with system prompts and output limits
- Add system/user prompt separation
- Set max tokens for endless output fix
- Improve extraction error logging
- Update cache type from extract to summary
2025-09-08 15:20:45 +08:00