Commit graph

590 commits

Author SHA1 Message Date
Raphaël MANSUY
817b5dbdb2 cherry-pick 4f12fe12 2025-12-04 19:19:22 +08:00
Raphaël MANSUY
c53b7cba76 cherry-pick ec2ea4fd 2025-12-04 19:19:00 +08:00
Raphaël MANSUY
f7c8804a52 cherry-pick 3fa79026 2025-12-04 19:18:40 +08:00
Raphaël MANSUY
d8e98ca362 cherry-pick 29c4a91d 2025-12-04 19:18:39 +08:00
Raphaël MANSUY
803315e60c cherry-pick 97a2ee4e 2025-12-04 19:18:38 +08:00
Raphaël MANSUY
458c3aa38a cherry-pick 5ee9a2f8 2025-12-04 19:18:38 +08:00
Raphaël MANSUY
09bab5f49f cherry-pick 78ad8873 2025-12-04 19:18:38 +08:00
Raphaël MANSUY
77a715f61b cherry-pick 904b1f46 2025-12-04 19:18:37 +08:00
Raphaël MANSUY
4231e38281 cherry-pick fe890fca 2025-12-04 19:18:37 +08:00
Raphaël MANSUY
2054c35d15 cherry-pick cd1c48be 2025-12-04 19:18:37 +08:00
Raphaël MANSUY
18a8f57b89 cherry-pick be3d274a 2025-12-04 19:18:37 +08:00
Raphaël MANSUY
ef2355a7ac cherry-pick a809245a 2025-12-04 19:18:37 +08:00
Raphaël MANSUY
646b1fad38 cherry-pick 80668aae 2025-12-04 19:18:37 +08:00
Raphaël MANSUY
b5d68c1756 cherry-pick 665f60b9 2025-12-04 19:18:37 +08:00
Raphaël MANSUY
ab6e8a9cf4 cherry-pick 3ed2abd8 2025-12-04 19:18:37 +08:00
Raphaël MANSUY
5ac376ed63 cherry-pick e01c998e 2025-12-04 19:18:36 +08:00
Raphaël MANSUY
b38177de80 cherry-pick a9fec267 2025-12-04 19:18:36 +08:00
Raphaël MANSUY
9b1579f2df cherry-pick 29bac49f 2025-12-04 19:18:35 +08:00
Raphaël MANSUY
a3d7f4b985 cherry-pick 17c2a929 2025-12-04 19:18:35 +08:00
Raphaël MANSUY
d85c5a5875 cherry-pick 4e740af7 2025-12-04 19:18:16 +08:00
Raphaël MANSUY
93778770ab fix: sync core modules with upstream after Wave 2 2025-12-04 19:14:52 +08:00
Raphaël MANSUY
f5e653451a cherry-pick 37e8898c 2025-12-04 19:14:28 +08:00
Raphaël MANSUY
f7f9a9e6cf fix: sync all core modules with upstream after Wave 1 2025-12-04 19:13:48 +08:00
yangdx
2ea1fccf1a Refactor deduplication calculation and remove unused variables
(cherry picked from commit 1154c5683f)
2025-12-04 19:11:23 +08:00
DivinesLight
b9fc6f19dd Quick fix to limit source_id ballooning while inserting nodes
(cherry picked from commit 54f0a7d1ca)
2025-12-04 19:11:23 +08:00
yangdx
7f7574c8b7 Add token limit validation for character-only chunking
- Add ChunkTokenLimitExceededError exception
- Validate chunks against token limits
- Include chunk preview in error messages
- Add comprehensive test coverage
- Log warnings for oversized chunks

(cherry picked from commit f988a22652)
2025-12-04 19:11:22 +08:00
yangdx
6e3ff18570 Adjust chunking parameters to match the default environment variable settings
(cherry picked from commit e77340d4a1)
2025-12-04 19:11:21 +08:00
EightyOliveira
b8dc5de81a refactor(chunking): rename params and improve docstring for chunking_by_token_size
(cherry picked from commit dacca334e0)
2025-12-04 19:11:21 +08:00
yangdx
cb5451faf8 Add entity/relation chunk tracking with configurable source ID limits
- Add entity_chunks & relation_chunks storage
- Implement KEEP/FIFO limit strategies
- Update env.example with new settings
- Add migration for chunk tracking data
- Support all KV storage

(cherry picked from commit dc62c78f98)
2025-12-04 19:11:19 +08:00
yangdx
687d2b6b13 Improve error handling and add cancellation checks in pipeline
(cherry picked from commit 77336e50b6)
2025-12-04 19:11:15 +08:00
yangdx
a471f1ca0e Add pipeline cancellation feature for graceful processing termination
• Add cancel_pipeline API endpoint
• Implement PipelineCancelledException
• Add cancellation checks in main loop
• Handle task cancellation gracefully
• Mark cancelled docs as FAILED

(cherry picked from commit 743aefc655)
2025-12-04 19:11:15 +08:00
yangdx
37d48bafb6 Simplify skip logging and reduce pipeline status updates
(cherry picked from commit a5253244f9)
2025-12-04 19:11:14 +08:00
Raphaël MANSUY
ed73def994 fix: sync core modules with upstream for compatibility 2025-12-04 19:10:46 +08:00
yangdx
a42222d7f9 Resolve lock leakage issue during user cancellation handling
• Change default log level to INFO
• Force enable error logging output
• Add lock cleanup rollback protection
• Handle LLM cache persistence errors
• Fix async task exception handling

(cherry picked from commit a9ec15e669)
2025-12-04 19:09:01 +08:00
yangdx
e4be3549c3 Improve entity identifier truncation warning message format
(cherry picked from commit 00aa5e53a7)
2025-12-04 19:09:00 +08:00
yangdx
6de4bb9113 Fix logging message formatting
(cherry picked from commit e0fd31a60d)
2025-12-04 19:08:46 +08:00
yangdx
dbb0b3afb4 Fix hl_keywords and ll_keywords cache logic
- Remove hl_keywords and ll_keywords from keyword extraction cache
- Add hl_keywords and ll_keywords to LLM query cache
2025-09-27 15:26:52 +08:00
yangdx
8cd4139cbf refactor: fix double query problem by adding aquery_llm function for consistent response handling
- Add new aquery_llm/query_llm methods providing structured responses
- Consolidate /query and /query/stream endpoints to use unified aquery_llm
- Optimize cache handling by moving cache checks before LLM calls
2025-09-26 19:05:03 +08:00
yangdx
cbdc4c4bdf Refactor prompts and context building for better maintainability
- Extract context templates to PROMPTS
- Unify token calculation logic
- Simplify user_prompt formatting
- Reduce code duplication
- Improve prompt structure consistency
2025-09-26 12:39:06 +08:00
yangdx
fba2356c81 Move user_prompt to system prompt
- Refactor query prompt handling to separate user prompts in system context
- Simplify user_query to only contain query
- Apply changes to both kg_query and naive_query
2025-09-26 10:02:01 +08:00
yangdx
b848ca49e6 Fix linting 2025-09-25 16:22:00 +08:00
yangdx
b08b8a6a6a Add reference list support to query API endpoints with unified result handling
• Add include_references param to QueryRequest
• Extend QueryResponse with references field
• Create unified QueryResult data structures
• Refactor kg_query and naive_query functions
• Update streaming to send references first
2025-09-25 16:21:42 +08:00
yangdx
5eb4a4b799 feat: simplify citations, add reference merging, and restructure API response format 2025-09-24 14:30:10 +08:00
yangdx
367f3df038 Fix log message 2025-09-23 11:25:55 +08:00
yangdx
a4442a8613 Optimize log message 2025-09-23 11:22:14 +08:00
yangdx
86186c0c85 Update log message 2025-09-23 11:08:33 +08:00
yangdx
6e2eab5c23 Add ID fields to entities, relations, and chunks in raw data query results 2025-09-21 23:31:35 +08:00
yangdx
18e886d7e9 Improve context item identification with meaningful IDs
- Add EN prefix to entity IDs
- Add RE prefix to relation IDs
- Add DC prefix to chunk IDs
- Enhance traceability across contexts
2025-09-21 20:19:14 +08:00
yangdx
8f0fb3c9eb Include user query in prompt returns 2025-09-21 15:24:20 +08:00
yangdx
6eb37e270a Refactor query handling and improve RAG response prompts
- Move user_prompt to query concatenation
- Remove DEFAULT_USER_PROMPT constant
- Enhance prompt clarity and structure
- Standardize citation formatting
- Improve step-by-step instructions
2025-09-21 15:16:24 +08:00