yangdx
9a2e8be5a7
Fix extraction validation and delimiter comment accuracy
...
• Change < to != for exact length check
• Fix entity validation from 4 to exact 4
• Fix relationship validation to exact 5
• Correct delimiter comment example
2025-09-12 18:13:25 +08:00
yangdx
69ca447f45
Sort description by timestamp then description length to improves merge consistency
2025-09-12 13:59:26 +08:00
yangdx
0221213b9b
Improve entity summarization with JSONL format and fix tuple delimiters
...
• Convert descriptions to JSONL format
• Add token-based truncation helper
• Enhance entity name consistency rules
• Improve summarization prompt clarity
• Fix tuple delimiter corruption patterns
2025-09-12 12:32:08 +08:00
yangdx
1892ed23cc
Change tuple delimiter from <|SEP|> to <|S|> across codebase
...
• Update prompt instruction clarity
• Correct utility function examples
• Update regex pattern comments
2025-09-12 08:57:46 +08:00
yangdx
8660bf34e4
Add timestamp tracking for LLM responses and entity/relationship data
...
- Track timestamps for cache hits/misses
- Add timestamp to entity/relationship objects
- Sort descriptions by timestamp order
- Preserve temporal ordering in merges
2025-09-12 04:34:12 +08:00
yangdx
40688def20
Refactor tuple delimiter corruption fix into reusable utility function
...
- Extract regex fixes to utils module
- Add case-insensitive delimiter handling
2025-09-12 04:10:14 +08:00
yangdx
b9f80263b8
Simplify tuple delimiter regex patterns for LLM output fixing
...
• Consolidate 6 regex patterns into 3
• More efficient pattern matching
• Clearer comments and examples
• Same functionality, less code
• Better maintainability
2025-09-12 00:56:40 +08:00
yangdx
78eadc1d6c
Rename function to clarify rebuild vs process extraction contexts
2025-09-11 23:21:27 +08:00
yangdx
4ce823b4dd
Handle empty context in mix mode and improve query logging
2025-09-11 18:58:37 +08:00
yangdx
c8a17f7ea5
Improve extraction failure log message formatting and consistency
2025-09-11 14:03:21 +08:00
yangdx
7f83a58497
Refactor extraction delimiters from ## to newlines and change tuple delimiter to <|SEP|>
...
• Add robust delimiter fixing logic
• Update prompts for single-line format
2025-09-11 13:44:44 +08:00
yangdx
7fe47fac84
Fix linting
2025-09-10 18:38:21 +08:00
yangdx
db6bba80c9
Log all merges at appropriate level
2025-09-10 18:37:13 +08:00
yangdx
a4bfdb7ddf
Fix logging condition to show merges even when no fragments exist if LLM is used
2025-09-10 18:22:10 +08:00
yangdx
02e7462645
feat: enhance LLM output format tolerance for bracket processing
...
- Expand bracket tolerance to support additional characters: < > " '
- Implement symmetric handling for both leading and trailing characters
- Replace simple string matching with robust regex-based pattern detection
- Maintain full backward compatibility with existing bracket formats
2025-09-10 18:10:06 +08:00
yangdx
00de0a4be8
Handle backtick-wrapped brackets in extraction result parsing
...
* Support `( and `( start patterns
* Support )` and )` end patterns
* Graceful fallback to warning logs
* Strip 2 chars for backtick variants
* Maintain existing bracket logic
2025-09-10 17:15:03 +08:00
yangdx
19014c6471
feat: enhance entity/relationship merging with description length comparison
...
- Implement description length comparison in gleaning merge logic (extract_entities)
- Apply same logic to knowledge graph reconstruction (_rebuild_knowledge_from_chunks)
- Prioritize entities/relationships with longer descriptions for better quality
- Use list() instead of extend() for performance optimization when replacing
2025-09-10 17:06:57 +08:00
yangdx
e3ebf45a18
Add logging for missing brackets in extraction result processing
2025-09-10 16:10:42 +08:00
yangdx
24242c5bb8
Fix indentation for logging and status updates in merge functions
2025-09-10 15:26:35 +08:00
yangdx
c4506438cd
Only log merge messages when there are existing fragments to merge
2025-09-10 15:14:33 +08:00
yangdx
a49c8e4a0d
Refactor JSON serialization to use newline-separated format
...
- Replace json.dumps with line-by-line format
- Apply to entities, relations, text units
- Update truncation key functions
- Maintain ensure_ascii=False setting
- Improve context readability
2025-09-10 11:59:25 +08:00
yangdx
2dd143c935
Refactor conversation history handling to use LLM native message format
...
• Remove get_conversation_turns utility
• Pass history_messages to LLM directly
• Clean up prompt template formatting
2025-09-10 11:56:58 +08:00
yangdx
e078ab7103
Fix cache handling and context return logic for query parameters
...
• Skip cache when only_need_prompt is set
• Update only_need_context condition logic
• Prevent cache bypass in prompt-only mode
2025-09-10 11:31:48 +08:00
yangdx
6774058670
Merge branch 'main' into tongda/main
2025-09-09 22:43:17 +08:00
yangdx
077d9be5d7
Add Deepseek Style Chain of Thought (CoT) Support for OpenAI Compatible LLM providers
...
- Add enable_cot parameter to all LLM APIs
- Implement CoT for OpenAI with <think> tags
- Log warnings for unsupported providers
- Enable CoT in query operations
- Handle streaming and non-streaming CoT
2025-09-09 22:34:36 +08:00
yangdx
3477e9f919
Merge branch 'main' into tongda/main
2025-09-09 18:27:56 +08:00
yangdx
09abb656b8
Improve log message formatting for better readability
2025-09-09 17:41:09 +08:00
yangdx
d218f15a62
Refactor entity extraction with system prompts and output limits
...
- Add system/user prompt separation
- Set max tokens for endless output fix
- Improve extraction error logging
- Update cache type from extract to summary
2025-09-08 15:20:45 +08:00
yangdx
0a62f02e84
Improve edge logging format and exception prefixes
2025-09-06 08:35:52 +08:00
yangdx
6be462511f
Add error prefixing for better debugging context in async operations
...
* Add create_prefixed_exception utility
* Prefix entity processing errors
* Prefix relationship processing errors
* Prefix chunk extraction progress info
* Maintain original exception chains
2025-09-05 21:28:00 +08:00
yangdx
385668dec5
Fix malformed tuple delimiters in extraction result processing
2025-09-05 17:14:42 +08:00
yangdx
83b54975a2
fix: resolve "Task exception was never retrieved" warnings in async task handling
...
- Handle multiple simultaneous exceptions correctly
- Maintain fast-fail behavior while ensuring proper exception cleanup to
prevent asyncio warnings
2025-09-04 12:40:41 +08:00
yangdx
7ef2f0dff6
Add VDB error handling with retries for data consistency
...
- Add safe_vdb_operation_with_exception util
- Wrap VDB ops in entity/relationship code
- Ensure exceptions propagate on failure
- Add retry logic with configurable delays
2025-09-03 21:15:09 +08:00
yangdx
c86f863fa4
feat: optimize entity extraction for smaller LLMs
...
Simplify entity relationship extraction process to improve compatibility
and performance with smaller, less capable language models.
Changes:
- Remove iterative gleaning loop with LLM-based continuation decisions
- Simplify to single gleaning pass when entity_extract_max_gleaning > 0
- Streamline entity extraction prompts with clearer instructions
- Add explicit completion delimiter signals in all examples
2025-09-03 10:33:01 +08:00
yangdx
5b2deccbef
Improve text normalization and add entity type capitalization
...
- Capitalize entity types with .title()
- Add non-breaking space handling
- Add narrow non-breaking space regex
2025-09-02 02:51:41 +08:00
yangdx
3f8a9abe7e
Refactor extraction result processing to reduce code duplication
...
• Extract shared processing logic
• Add delimiter pattern fixes
• Improve bracket standardization
2025-09-02 01:22:29 +08:00
yangdx
3cdc98f366
Improve extraction parsing with better bracket handling and delimiter fixes
...
• Standardize Chinese/English brackets
• Fix incomplete tuple delimiters
• Remove duplicate delimiter fix code
• Support mixed bracket formats
• Enhance record parsing robustness
2025-09-02 00:26:04 +08:00
yangdx
8bbf307aeb
Fix regex to match multiline content in extraction parsing
...
• Remove non-greedy quantifier
• Add DOTALL flag for multiline matching
• Apply to both parsing functions
• Enable cross-line content extraction
2025-09-01 10:35:06 +08:00
yangdx
7baeb186c6
Fix regex to use non-greedy matching for parentheses extraction
2025-09-01 10:10:45 +08:00
Tong Da
dc7ce98c7e
Add search interface to lightrag.
2025-09-01 02:40:40 +08:00
Tong Da
14fe3e4387
remove unused import
2025-09-01 02:24:56 +08:00
Tong Da
a60a8704ba
Add search method to lightrag. Search is for retrieve structured objects (entities, relations, chunks) in their raw data format.
2025-09-01 02:19:58 +08:00
yangdx
5fd7682f16
Fix LLM output instability for <|> tuple delimiter
...
- Replace <||> with <|>
- Replace < | > with <|>
- Apply fix in both functions
- Handle delimiter variations
- Improve parsing reliability
2025-09-01 01:22:27 +08:00
yangdx
4e751e0653
refac: Enhance extraction with improved prompts and parser
...
- **Prompts**: Restructured prompts with clearer steps and quality guidelines. Simplified the relationship tuple by removing `relationship_strength`
- **Model**: Updated default entity types to be more comprehensive and consistently capitalized (e.g., `Location`, `Product`)
2025-08-31 22:24:11 +08:00
yangdx
75de40da41
Fix typo in relationship extraction log messages
2025-08-31 17:45:16 +08:00
yangdx
97c9600085
Improve extraction error handling and field validation
...
• Add field count validation warnings
• Fix relationship field count (5→6)
• Change error logs to warnings
2025-08-31 17:33:42 +08:00
yangdx
b747417961
feat: enhance text extraction text sanitization and normalization
...
- Improve reduntant quotes in entity and relation name, type and keywords
- Add HTML tag cleaning and Chinese symbol conversion
- Filter out short numeric content and malformed text
- Enhance entity type validation with character filtering
2025-08-31 13:17:20 +08:00
yangdx
d4bbc5dea9
refactor: Merge multi-step text sanitization into single function
2025-08-31 10:36:56 +08:00
yangdx
03d0fa3014
perf: add optional query_embedding parameter to avoid redundant embedding calls
2025-08-29 18:15:45 +08:00
yangdx
a923d378dd
Remove deprecated ID-based filtering from vector storage queries
...
- Remove ids param from QueryParam
- Simplify BaseVectorStorage.query signature
- Update all vector storage implementations
- Streamline PostgreSQL query templates
- Remove ID filtering from operate.py calls
2025-08-29 17:06:48 +08:00