Commit graph

570 commits

Author SHA1 Message Date
Raphaël MANSUY
93778770ab fix: sync core modules with upstream after Wave 2 2025-12-04 19:14:52 +08:00
Raphaël MANSUY
f5e653451a cherry-pick 37e8898c 2025-12-04 19:14:28 +08:00
Raphaël MANSUY
f7f9a9e6cf fix: sync all core modules with upstream after Wave 1 2025-12-04 19:13:48 +08:00
yangdx
2ea1fccf1a Refactor deduplication calculation and remove unused variables
(cherry picked from commit 1154c5683f)
2025-12-04 19:11:23 +08:00
DivinesLight
b9fc6f19dd Quick fix to limit source_id ballooning while inserting nodes
(cherry picked from commit 54f0a7d1ca)
2025-12-04 19:11:23 +08:00
yangdx
7f7574c8b7 Add token limit validation for character-only chunking
- Add ChunkTokenLimitExceededError exception
- Validate chunks against token limits
- Include chunk preview in error messages
- Add comprehensive test coverage
- Log warnings for oversized chunks

(cherry picked from commit f988a22652)
2025-12-04 19:11:22 +08:00
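For readers skimming the log, the validation this commit describes could look roughly like the sketch below. Only the exception name `ChunkTokenLimitExceededError` comes from the commit message. The constructor fields, the `split_by_character` helper, and the whitespace-based `count_tokens` stand-in are assumptions made for illustration, not the project's actual code.

```python
class ChunkTokenLimitExceededError(ValueError):
    """Raised when a chunk exceeds the token limit.

    The class name is taken from the commit message; the fields are assumed.
    """

    def __init__(self, chunk_tokens: int, chunk_token_limit: int, chunk_preview: str):
        self.chunk_tokens = chunk_tokens
        self.chunk_token_limit = chunk_token_limit
        self.chunk_preview = chunk_preview
        super().__init__(
            f"Chunk has {chunk_tokens} tokens, limit is {chunk_token_limit}. "
            f"Preview: {chunk_preview!r}"
        )


def count_tokens(text: str) -> int:
    # Stand-in tokenizer (whitespace split); a real tokenizer would count differently.
    return len(text.split())


def split_by_character(text: str, separator: str, token_limit: int) -> list[str]:
    """Split on a separator only, then validate every chunk against the limit."""
    chunks = [c.strip() for c in text.split(separator) if c.strip()]
    for chunk in chunks:
        n = count_tokens(chunk)
        if n > token_limit:
            # Include a short preview so the oversized chunk can be located.
            raise ChunkTokenLimitExceededError(n, token_limit, chunk[:40])
    return chunks
```

Raising instead of silently truncating keeps character-only chunking predictable: the caller decides whether to re-split or raise the limit.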
yangdx
6e3ff18570 Adjust chunking parameters to match the default environment variable settings
(cherry picked from commit e77340d4a1)
2025-12-04 19:11:21 +08:00
EightyOliveira
b8dc5de81a refactor(chunking): rename params and improve docstring for chunking_by_token_size
(cherry picked from commit dacca334e0)
2025-12-04 19:11:21 +08:00
yangdx
cb5451faf8 Add entity/relation chunk tracking with configurable source ID limits
- Add entity_chunks & relation_chunks storage
- Implement KEEP/FIFO limit strategies
- Update env.example with new settings
- Add migration for chunk tracking data
- Support all KV storage

(cherry picked from commit dc62c78f98)
2025-12-04 19:11:19 +08:00
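A minimal sketch of the two limit strategies named in this commit. The commit message does not define them, so this assumes KEEP retains the earliest source IDs and FIFO evicts the oldest to make room for the newest. The function name `apply_source_id_limit` and its parameters are hypothetical.

```python
def apply_source_id_limit(source_ids: list[str], limit: int, method: str = "KEEP") -> list[str]:
    """Cap the number of chunk IDs tracked for one entity or relation.

    Assumed semantics (not confirmed by the commit message):
      KEEP - keep the first `limit` IDs, ignore later ones
      FIFO - drop the oldest IDs, keep the most recent `limit`
    """
    if limit <= 0 or len(source_ids) <= limit:
        return list(source_ids)  # no limit configured, or nothing to trim
    if method == "KEEP":
        return source_ids[:limit]
    if method == "FIFO":
        return source_ids[-limit:]
    raise ValueError(f"Unknown limit method: {method!r}")
```

With `["c1", "c2", "c3", "c4"]` and a limit of 2, KEEP would return `["c1", "c2"]` and FIFO would return `["c3", "c4"]` under these assumptions.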
yangdx
687d2b6b13 Improve error handling and add cancellation checks in pipeline
(cherry picked from commit 77336e50b6)
2025-12-04 19:11:15 +08:00
yangdx
a471f1ca0e Add pipeline cancellation feature for graceful processing termination
• Add cancel_pipeline API endpoint
• Implement PipelineCancelledException
• Add cancellation checks in main loop
• Handle task cancellation gracefully
• Mark cancelled docs as FAILED

(cherry picked from commit 743aefc655)
2025-12-04 19:11:15 +08:00
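The cancellation flow listed in this commit can be sketched with a shared flag checked in the main loop. `PipelineCancelledException` is named in the commit message. The `run_pipeline` function, the `cancel_after` test hook, and the status strings other than FAILED are assumptions for illustration only.

```python
import asyncio


class PipelineCancelledException(Exception):
    """Signals that the user asked the pipeline to stop (name from the commit)."""


async def run_pipeline(doc_ids, cancel_event, cancel_after=None):
    """Process documents in order, checking for cancellation before each one."""
    status = {doc_id: "PENDING" for doc_id in doc_ids}
    try:
        for doc_id in doc_ids:
            if cancel_event.is_set():
                raise PipelineCancelledException("cancelled by user")
            await asyncio.sleep(0)  # placeholder for the real processing work
            status[doc_id] = "PROCESSED"
            if doc_id == cancel_after:
                # Hypothetical hook: simulates a cancel request arriving mid-run.
                cancel_event.set()
    except PipelineCancelledException:
        # Mark every document that was not finished as FAILED.
        for doc_id, state in status.items():
            if state == "PENDING":
                status[doc_id] = "FAILED"
    return status
```

Checking the flag only between documents lets the document in progress finish cleanly, which is one plausible reading of "graceful" here.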
yangdx
37d48bafb6 Simplify skip logging and reduce pipeline status updates
(cherry picked from commit a5253244f9)
2025-12-04 19:11:14 +08:00
Raphaël MANSUY
ed73def994 fix: sync core modules with upstream for compatibility 2025-12-04 19:10:46 +08:00
yangdx
a42222d7f9 Resolve lock leakage issue during user cancellation handling
• Change default log level to INFO
• Force enable error logging output
• Add lock cleanup rollback protection
• Handle LLM cache persistence errors
• Fix async task exception handling

(cherry picked from commit a9ec15e669)
2025-12-04 19:09:01 +08:00
yangdx
e4be3549c3 Improve entity identifier truncation warning message format
(cherry picked from commit 00aa5e53a7)
2025-12-04 19:09:00 +08:00
yangdx
6de4bb9113 Fix logging message formatting
(cherry picked from commit e0fd31a60d)
2025-12-04 19:08:46 +08:00
yangdx
dbb0b3afb4 Fix hl_keywords and ll_keywords cache logic
- Remove hl_keywords and ll_keywords from keyword extraction cache
- Add hl_keywords and ll_keywords to LLM query cache
2025-09-27 15:26:52 +08:00
yangdx
8cd4139cbf refactor: fix double query problem by adding aquery_llm function for consistent response handling
- Add new aquery_llm/query_llm methods providing structured responses
- Consolidate /query and /query/stream endpoints to use unified aquery_llm
- Optimize cache handling by moving cache checks before LLM calls
2025-09-26 19:05:03 +08:00
yangdx
cbdc4c4bdf Refactor prompts and context building for better maintainability
- Extract context templates to PROMPTS
- Unify token calculation logic
- Simplify user_prompt formatting
- Reduce code duplication
- Improve prompt structure consistency
2025-09-26 12:39:06 +08:00
yangdx
fba2356c81 Move user_prompt to system prompt
- Refactor query prompt handling to separate user prompts in system context
- Simplify user_query to only contain query
- Apply changes to both kg_query and naive_query
2025-09-26 10:02:01 +08:00
yangdx
b848ca49e6 Fix linting 2025-09-25 16:22:00 +08:00
yangdx
b08b8a6a6a Add reference list support to query API endpoints with unified result handling
• Add include_references param to QueryRequest
• Extend QueryResponse with references field
• Create unified QueryResult data structures
• Refactor kg_query and naive_query functions
• Update streaming to send references first
2025-09-25 16:21:42 +08:00
yangdx
5eb4a4b799 feat: simplify citations, add reference merging, and restructure API response format 2025-09-24 14:30:10 +08:00
yangdx
367f3df038 Fix log message 2025-09-23 11:25:55 +08:00
yangdx
a4442a8613 Optimize log message 2025-09-23 11:22:14 +08:00
yangdx
86186c0c85 Update log message 2025-09-23 11:08:33 +08:00
yangdx
6e2eab5c23 Add ID fields to entities, relations, and chunks in raw data query results 2025-09-21 23:31:35 +08:00
yangdx
18e886d7e9 Improve context item identification with meaningful IDs
- Add EN prefix to entity IDs
- Add RE prefix to relation IDs
- Add DC prefix to chunk IDs
- Enhance traceability across contexts
2025-09-21 20:19:14 +08:00
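As an illustration of the prefixing scheme in this commit, the sketch below tags each context item with a typed ID. The EN, RE, and DC prefixes come from the commit message. The sequential numbering, the `id` key, and the `assign_context_ids` helper are assumptions.

```python
def assign_context_ids(entities, relations, chunks):
    """Return copies of the context items with prefixed, sequential IDs."""

    def tag(items, prefix):
        # Numbering from 1 is an assumed convention, not confirmed by the commit.
        return [{**item, "id": f"{prefix}{i}"} for i, item in enumerate(items, start=1)]

    return tag(entities, "EN"), tag(relations, "RE"), tag(chunks, "DC")
```

A typed prefix makes it clear whether a reference in the generated answer points at an entity, a relation, or a document chunk.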
yangdx
8f0fb3c9eb Include user query in prompt returns 2025-09-21 15:24:20 +08:00
yangdx
6eb37e270a Refactor query handling and improve RAG response prompts
- Move user_prompt to query concatenation
- Remove DEFAULT_USER_PROMPT constant
- Enhance prompt clarity and structure
- Standardize citation formatting
- Improve step-by-step instructions
2025-09-21 15:16:24 +08:00
yangdx
523028f8d0 Remove deprecated truncated fields from token truncation return
• Drop truncated_entities field
• Drop truncated_relations field
2025-09-21 11:00:48 +08:00
yangdx
7c463f0fb5 Change entity type formatting from title case to lowercase without spaces 2025-09-21 00:56:56 +08:00
yangdx
77569ddea2 Add chunk key to entity extraction logging output 2025-09-17 02:21:11 +08:00
yangdx
0e8d973d44 Shorten progress prefix in entity extraction error messages 2025-09-16 15:48:37 +08:00
yangdx
ecaee43788 Add error handling with chunk ID prefixing in entity extraction 2025-09-16 13:41:49 +08:00
yangdx
37d01e2df8 fix: Ensures complete metadata (source_id, created_at, file_path) is preserved in aquery_data responses 2025-09-15 03:45:09 +08:00
yangdx
e71229698d refactor: centralize metadata generation in query functions
- Remove processing_info generation from _convert_to_user_format function
- Move all metadata generation (keywords, processing_info) to kg_query and naive_query functions
- Simplify _convert_to_user_format to focus only on data format conversion
2025-09-15 03:11:07 +08:00
yangdx
c0d5abba6b Fix linting 2025-09-15 02:59:21 +08:00
yangdx
b1c8206346 Add aquery_data endpoint for structured retrieval without LLM generation
- Add QueryDataResponse model
- Implement /query/data endpoint
- Add aquery_data method to LightRAG
- Return entities, relationships, chunks
2025-09-15 02:15:14 +08:00
yangdx
82a67354d0 Code formatting improvements and style consistency fixes
* Remove trailing whitespace
* Fix function signature ellipsis style
2025-09-14 17:49:02 +08:00
yangdx
87bb8a023b Fix tuple delimiter regex patterns and add debug logging
- Add debug logs for malformed records
- Fix regex for consecutive delimiters
- Handle missing closing brackets
2025-09-14 17:29:27 +08:00
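One way to tolerate the malformed records this commit mentions is to collapse runs of the delimiter and accept a missing closing bracket. This is a sketch only: the delimiter value `<|>` and the `split_record` helper are assumptions, and the project's real patterns may differ.

```python
import re

TUPLE_DELIMITER = "<|>"  # assumed delimiter value, for illustration only


def split_record(record: str, delimiter: str = TUPLE_DELIMITER) -> list[str]:
    """Split one extraction record into fields, tolerating common LLM slips."""
    record = record.strip()
    # Strip the surrounding brackets if present; a missing ")" is tolerated.
    if record.startswith("("):
        record = record[1:]
    if record.endswith(")"):
        record = record[:-1]
    # Treat consecutive delimiters as one. Note this also drops fields that
    # were intentionally left empty.
    pattern = f"(?:{re.escape(delimiter)})+"
    return [field.strip() for field in re.split(pattern, record) if field.strip()]
```

For example, both `(entity<|><|>Alice<|>person` and `(entity<|>Alice<|>person)` yield the same three fields under this sketch.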
yangdx
4de1473875 Improve entity extraction prompts and error message formatting
• Fix typo in error log message
• Clarify format requirements in prompts
• Make extraction instructions clearer
• Improve user prompt consistency
2025-09-14 13:45:59 +08:00
yangdx
20c5127c7c Merge branch 'optimize-extraction' into return-data-only 2025-09-14 12:33:37 +08:00
yangdx
619553021e Fix delimiter processing and optimize case-sensitive handling
• Fix completion_delimiter reference bug
• Add case check before lowercase conversion
• Improve delimiter corruption handling
• Optimize redundant processing logic
2025-09-14 12:23:48 +08:00
yangdx
fd48afdb00 Use "relation" instead of "relationship" in extraction prompt, and support both formats for safety 2025-09-14 11:43:35 +08:00
yangdx
1dc96f3959 Merge branch 'optimize-extraction' into return-data-only 2025-09-14 05:37:48 +08:00
yangdx
b820d8d588 Fix entity/relationship record parsing in extraction result processing 2025-09-14 05:35:01 +08:00
yangdx
4f5ad76c2c Add entity vector database upsert for entities newly added by edge upserts 2025-09-14 05:04:45 +08:00
yangdx
7cc2b69bcf Fix linting 2025-09-14 05:02:02 +08:00
yangdx
cddd81a86c Fix LLM output format errors in extraction result processing
- Handle tuple_delimiter as record separator
- Add format validation and correction
- Add warning for format errors
2025-09-14 04:13:01 +08:00