Raphaël MANSUY
0c46370940
cherry-pick 90f52acf
2025-12-04 19:19:04 +08:00
Raphaël MANSUY
44a05f7a25
cherry-pick c14f25b7
2025-12-04 19:19:01 +08:00
Raphaël MANSUY
b38177de80
cherry-pick a9fec267
2025-12-04 19:18:36 +08:00
Raphaël MANSUY
2846a18ab3
cherry-pick f5558240
2025-12-04 19:18:36 +08:00
Raphaël MANSUY
a3d7f4b985
cherry-pick 17c2a929
2025-12-04 19:18:35 +08:00
Raphaël MANSUY
d85c5a5875
cherry-pick 4e740af7
2025-12-04 19:18:16 +08:00
Raphaël MANSUY
7ffecec08e
cherry-pick f28a0c25
2025-12-04 19:15:03 +08:00
Raphaël MANSUY
e73248eb24
cherry-pick d1f4b6e5
2025-12-04 19:15:03 +08:00
Raphaël MANSUY
1368d3a1fe
cherry-pick abeaac84
2025-12-04 19:15:03 +08:00
Raphaël MANSUY
9ba9254cfb
cherry-pick 6de4123f
2025-12-04 19:15:02 +08:00
Raphaël MANSUY
4ec5073aaa
cherry-pick 23cbb9c9
2025-12-04 19:15:02 +08:00
Raphaël MANSUY
93778770ab
fix: sync core modules with upstream after Wave 2
2025-12-04 19:14:52 +08:00
Raphaël MANSUY
a0514eec1a
cherry-pick 0c4cba38
2025-12-04 19:14:28 +08:00
Raphaël MANSUY
7fa455ff07
cherry-pick c13f9116
2025-12-04 19:14:28 +08:00
Raphaël MANSUY
3558adae47
cherry-pick 05852e1a
2025-12-04 19:14:26 +08:00
Raphaël MANSUY
cacea8ab56
cherry-pick 33a1482f
2025-12-04 19:14:26 +08:00
Raphaël MANSUY
f7f9a9e6cf
fix: sync all core modules with upstream after Wave 1
2025-12-04 19:13:48 +08:00
DivinesLight
b9fc6f19dd
Quick fix to limit source_id ballooning while inserting nodes
...
(cherry picked from commit 54f0a7d1ca)
2025-12-04 19:11:23 +08:00
yangdx
7e0f12c28e
Enhance entity/relation editing with chunk tracking synchronization
...
• Add chunk storage sync to edit ops
• Implement incremental chunk ID updates
• Support entity renaming migrations
• Normalize relation keys consistently
• Preserve chunk references on edits
(cherry picked from commit 3fbd704bf9)
2025-12-04 19:11:19 +08:00
yangdx
cb5451faf8
Add entity/relation chunk tracking with configurable source ID limits
...
- Add entity_chunks & relation_chunks storage
- Implement KEEP/FIFO limit strategies
- Update env.example with new settings
- Add migration for chunk tracking data
- Support all KV storage
(cherry picked from commit dc62c78f98)
2025-12-04 19:11:19 +08:00
yangdx
322ff19f72
Remove ascii_colors dependency and fix stream handling errors
...
• Remove ascii_colors.trace_exception calls
• Add SafeStreamHandler for closed streams
• Patch ascii_colors console handler
• Prevent ValueError on stream close
• Improve logging error handling
(cherry picked from commit 0fb2925c6a)
2025-12-04 19:11:13 +08:00
yangdx
ed79218550
Optimize JSON write with fast/slow path to reduce memory usage
...
- Fast path for clean data (no sanitization)
- Slow path sanitizes during encoding
- Reload shared memory after sanitization
- Custom encoder avoids deep copies
- Comprehensive test coverage
(cherry picked from commit 777c987371)
2025-12-04 19:09:04 +08:00
yangdx
a528213210
Fix logging filter logic
...
• Fix boolean operator precedence in filter
• Consolidate GET/POST condition logic
2025-09-26 19:42:33 +08:00
yangdx
8cd4139cbf
refactor: fix double query problem by adding aquery_llm function for consistent response handling
...
- Add new aquery_llm/query_llm methods providing structured responses
- Consolidate /query and /query/stream endpoints to use unified aquery_llm
- Optimize cache handling by moving cache checks before LLM calls
2025-09-26 19:05:03 +08:00
yangdx
5eb4a4b799
feat: simplify citations, add reference merging, and restructure API response format
2025-09-24 14:30:10 +08:00
yangdx
6e2eab5c23
Add ID fields to entities, relations, and chunks in raw data query results
2025-09-21 23:31:35 +08:00
yangdx
18e886d7e9
Improve context item identification with meaningful IDs
...
- Add EN prefix to entity IDs
- Add RE prefix to relation IDs
- Add DC prefix to chunk IDs
- Enhance traceability across contexts
2025-09-21 20:19:14 +08:00
yangdx
37d01e2df8
fix: Ensures complete metadata (source_id, created_at, file_path) is preserved in aquery_data responses
2025-09-15 03:45:09 +08:00
yangdx
e71229698d
refactor: centralize metadata generation in query functions
...
- Remove processing_info generation from _convert_to_user_format function
- Move all metadata generation (keywords, processing_info) to kg_query and naive_query functions
- Simplify _convert_to_user_format to focus only on data format conversion
2025-09-15 03:11:07 +08:00
yangdx
b1c8206346
Add aquery_data endpoint for structured retrieval without LLM generation
...
- Add QueryDataResponse model
- Implement /query/data endpoint
- Add aquery_data method to LightRAG
- Return entities, relationships, chunks
2025-09-15 02:15:14 +08:00
yangdx
82a67354d0
Code formatting improvements and style consistency fixes
...
* Remove trailing whitespace
* Fix function signature ellipsis style
2025-09-14 17:49:02 +08:00
yangdx
87bb8a023b
Fix tuple delimiter regex patterns and add debug logging
...
- Add debug logs for malformed records
- Fix regex for consecutive delimiters
- Handle missing closing brackets
2025-09-14 17:29:27 +08:00
yangdx
70fee5bbeb
Fix syntax warning by removing examples from fix_tuple_delimiter_corruption docstring
2025-09-14 12:37:21 +08:00
yangdx
20c5127c7c
Merge branch 'optimize-extraction' into return-data-only
2025-09-14 12:33:37 +08:00
yangdx
ff705a2323
Fix tuple delimiter corruption when closing bracket is missing; handle <|#: -> <|#|> pattern
2025-09-14 11:44:21 +08:00
yangdx
1dc96f3959
Merge branch 'optimize-extraction' into return-data-only
2025-09-14 05:37:48 +08:00
yangdx
2686fc526e
Change entity type from CreativeWork to Content and update delimiter
...
• Replace CreativeWork with Content type
• Improve LLM output error messages
• Update prompt for binary relationships
• Fix delimiter corruption examples
2025-09-14 00:55:15 +08:00
yangdx
0ffb5d5f2d
Replace search API with aquery_data for consistent raw data retrieval, mirroring aquery results
...
• Reuse existing query logic paths and remove kg_search function entirely
• Update kg_query/naive_query to return raw data as needed
2025-09-13 15:30:29 +08:00
yangdx
8088b7e07a
Fix tuple delimiter corruption handling and update documentation
2025-09-12 18:03:37 +08:00
yangdx
8a3e2c03a9
Fix tuple delimiter corruption patterns with pipes and brackets
...
- Handle <||S||> malformed delimiters
- Fix <||> empty pipe sequences
- Repair <|| incomplete patterns
- Process ||S|| missing brackets
- Improve delimiter normalization
2025-09-12 17:45:32 +08:00
yangdx
0221213b9b
Improve entity summarization with JSONL format and fix tuple delimiters
...
• Convert descriptions to JSONL format
• Add token-based truncation helper
• Enhance entity name consistency rules
• Improve summarization prompt clarity
• Fix tuple delimiter corruption patterns
2025-09-12 12:32:08 +08:00
yangdx
1892ed23cc
Change tuple delimiter from <|SEP|> to <|S|> across codebase
...
• Update prompt instruction clarity
• Correct utility function examples
• Update regex pattern comments
2025-09-12 08:57:46 +08:00
yangdx
c07bcbff44
Fix tuple delimiter corruption patterns and add missing edge cases
2025-09-12 08:35:37 +08:00
yangdx
8660bf34e4
Add timestamp tracking for LLM responses and entity/relationship data
...
- Track timestamps for cache hits/misses
- Add timestamp to entity/relationship objects
- Sort descriptions by timestamp order
- Preserve temporal ordering in merges
2025-09-12 04:34:12 +08:00
yangdx
40688def20
Refactor tuple delimiter corruption fix into reusable utility function
...
- Extract regex fixes to utils module
- Add case-insensitive delimiter handling
2025-09-12 04:10:14 +08:00
yangdx
a49c8e4a0d
Refactor JSON serialization to use newline-separated format
...
- Replace json.dumps with line-by-line format
- Apply to entities, relations, text units
- Update truncation key functions
- Maintain ensure_ascii=False setting
- Improve context readability
2025-09-10 11:59:25 +08:00
yangdx
2dd143c935
Refactor conversation history handling to use LLM native message format
...
• Remove get_conversation_turns utility
• Pass history_messages to LLM directly
• Clean up prompt template formatting
2025-09-10 11:56:58 +08:00
yangdx
09abb656b8
Improve log message formatting for better readability
2025-09-09 17:41:09 +08:00
yangdx
d218f15a62
Refactor entity extraction with system prompts and output limits
...
- Add system/user prompt separation
- Set max tokens to fix endless output
- Improve extraction error logging
- Update cache type from extract to summary
2025-09-08 15:20:45 +08:00
yangdx
c87eb2cfcf
Increase timeout buffers for async function calls
...
• Extend execution timeout buffer to 150s
• Extend task duration buffer to 180s
• Account for low-level retry delays
• Improve health check phase handling
• Reduce timeout-related failures
2025-09-06 23:56:24 +08:00