Commit graph

4769 commits

Author SHA1 Message Date
yangdx
d9aa021682 Update env.example 2025-08-30 11:02:53 +08:00
Daniel.y
0c41be6f8f
Merge pull request #2026 from pedrofs/pedro/fix-env.example
fix: adjust the EMBEDDING_BINDING_HOST for openai in the env.example
2025-08-29 22:52:29 +08:00
Pedro Fernandes Steimbruch
8430e1a051 fix: adjust the EMBEDDING_BINDING_HOST for openai in the env.example 2025-08-29 09:48:42 -03:00
yangdx
43f32e8d97 Bump api version to 0209 2025-08-29 19:42:06 +08:00
Daniel.y
163ec26e10
Merge pull request #2025 from danielaskdd/remove-ids-filter
refac: Remove deprecated doc-id based filtering from vector storage queries
2025-08-29 19:39:42 +08:00
yangdx
f3989548b9 Fix MongoDB vector query embedding format compatibility
* Convert numpy arrays to lists
* Ensure MongoDB compatibility
2025-08-29 18:51:53 +08:00
yangdx
03d0fa3014 perf: add optional query_embedding parameter to avoid redundant embedding calls 2025-08-29 18:15:45 +08:00
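Note on the commit above: a minimal sketch of how an optional `query_embedding` argument can avoid redundant embedding calls, assuming one query text is searched against several stores. `ToyVectorStore`, `counting_embed`, and the dot-product scoring are hypothetical stand-ins for illustration, not the project's actual storage classes or signatures.

```python
from typing import Callable, List, Optional, Sequence, Tuple

Vector = Sequence[float]


class ToyVectorStore:
    """Hypothetical in-memory store, used only to illustrate the idea."""

    def __init__(self, embed: Callable[[str], Vector]) -> None:
        self._embed = embed
        self._rows: List[Tuple[str, Vector]] = []

    def add(self, doc_id: str, vector: Vector) -> None:
        self._rows.append((doc_id, vector))

    def query(
        self, text: str, top_k: int, query_embedding: Optional[Vector] = None
    ) -> List[str]:
        # Embed only when the caller did not supply a precomputed vector.
        vec = query_embedding if query_embedding is not None else self._embed(text)
        # Rank rows by dot product with the query vector, highest first.
        scored = sorted(
            self._rows,
            key=lambda row: -sum(a * b for a, b in zip(vec, row[1])),
        )
        return [doc_id for doc_id, _ in scored[:top_k]]


calls: List[str] = []


def counting_embed(text: str) -> List[float]:
    """Fake embedding function that records how often it is called."""
    calls.append(text)
    return [float(len(text)), 1.0]


entities = ToyVectorStore(counting_embed)
relations = ToyVectorStore(counting_embed)
entities.add("e1", [1.0, 0.0])
entities.add("e2", [0.0, 1.0])
relations.add("r1", [1.0, 1.0])

# Embed once, then reuse the vector for both stores.
shared = counting_embed("what is x?")
entity_hits = entities.query("what is x?", top_k=1, query_embedding=shared)
relation_hits = relations.query("what is x?", top_k=1, query_embedding=shared)
```

Keeping the parameter optional means callers that do not pass a vector still get the old behavior, at the cost of one embedding call per query.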
yangdx
a923d378dd Remove deprecated ID-based filtering from vector storage queries
- Remove ids param from QueryParam
- Simplify BaseVectorStorage.query signature
- Update all vector storage implementations
- Streamline PostgreSQL query templates
- Remove ID filtering from operate.py calls
2025-08-29 17:06:48 +08:00
Daniel.y
20b800d694
Merge pull request #2024 from danielaskdd/llm-error-handling
refac: Enhanced Timeout Handling for LLM Priority Queue
2025-08-29 15:26:30 +08:00
yangdx
d39afcb831 Add temperature guidance for Qwen3 models in env example 2025-08-29 15:13:52 +08:00
yangdx
d7e0701b63 Improve logging setup and add error prefixes for LLM functions
- Move logger init to top of file
- Add console handler by default
- Prefix LLM errors with "[LLM func]"
- Update timeout log messages
- Comment out pypinyin success log
2025-08-29 14:19:13 +08:00
yangdx
925e631a9a refac: Add robust timeout handling for LLM requests 2025-08-29 13:50:35 +08:00
yangdx
ac2db35160 Update env.example 2025-08-29 10:18:12 +08:00
Daniel.y
e51fa2439d
Merge pull request #2021 from SandmeyerX/docs/config-fix-env-comment-typos
docs(config): fix typo in .env comments
2025-08-28 23:05:19 +08:00
Sandmeyer
1cd27dc048
docs(config): fix typo in .env comments 2025-08-28 20:23:51 +08:00
Daniel.y
57ba2cabcb
Merge pull request #2017 from danielaskdd/improve-text-sanitize
Fix UTF-8 Encoding Issues Causing Document Processing Failures
2025-08-28 00:21:44 +08:00
yangdx
99e28e815b fix: prevent document processing failures from UTF-8 surrogate characters
- Change sanitize_text_for_encoding to fail-fast instead of returning error placeholders
- Add strict UTF-8 cleaning pipeline to entity/relationship extraction
- Skip problematic entities/relationships instead of corrupting data

Fixes document processing crashes when encountering surrogate characters (U+D800-U+DFFF)
2025-08-27 23:52:39 +08:00
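Note on the commit above: a minimal sketch of the fail-fast and skip behavior it describes, assuming lone surrogate code points (U+D800-U+DFFF) are what makes a string unencodable as UTF-8. `ensure_utf8` and `keep_encodable` are hypothetical names for illustration; they are not the project's `sanitize_text_for_encoding`.

```python
import re
from typing import Iterable, List

# Matches any surrogate code point (U+D800 through U+DFFF).
_SURROGATE = re.compile(r"[\ud800-\udfff]")


def ensure_utf8(text: str) -> str:
    """Return text unchanged if it is UTF-8 encodable; raise instead of patching it."""
    if _SURROGATE.search(text):
        raise ValueError("text contains surrogate characters")
    text.encode("utf-8")  # raises UnicodeEncodeError for anything else unencodable
    return text


def keep_encodable(items: Iterable[str]) -> List[str]:
    """Skip items that fail the strict check rather than storing corrupted data."""
    kept: List[str] = []
    for item in items:
        try:
            kept.append(ensure_utf8(item))
        except (ValueError, UnicodeEncodeError):
            continue
    return kept
```

Raising instead of returning a placeholder string lets the caller decide whether to drop the item, which is the trade-off the commit message describes.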
yangdx
4dfbe5e2db Rename workflow and remove latest tag from Docker build
• Rename docker-build-main to manual
• Remove latest tag from metadata
2025-08-27 15:14:23 +08:00
yangdx
6a2a592224 Fix linting 2025-08-27 12:51:50 +08:00
yangdx
8a0d06e557 Restore default entity types 2025-08-27 12:51:18 +08:00
yangdx
28e07c89f9 Fix linting 2025-08-27 12:35:51 +08:00
yangdx
2ccc39de9a Fix language fallback in summarize error 2025-08-27 12:34:27 +08:00
yangdx
0be4f0144b Merge branch 'entityTypesServerSupport' 2025-08-27 12:23:58 +08:00
yangdx
ff0a18e08c Unify SUMMARY_LANGUAGE and ENTITY_TYPES implementation method 2025-08-27 12:23:22 +08:00
Daniel.y
4edb0011b9
Merge pull request #2009 from LinkinPony/main
fix mismatch of 'error' and 'error_msg' in MongoDB
2025-08-27 11:45:45 +08:00
yangdx
194f46f239 Add json_repair dependency to project requirements 2025-08-27 11:14:09 +08:00
yangdx
cb0a035076 Update env.example 2025-08-27 11:12:52 +08:00
LinkinPony
45da0385eb
Merge branch 'HKUDS:main' into main 2025-08-27 09:22:39 +08:00
Thibo Rosemplatt
c3aabfc251 Merge branch 'main' into entityTypesServerSupport 2025-08-26 21:48:20 +02:00
Daniel.y
c975263fbc
Merge pull request #2013 from danielaskdd/fix-file-order
fix(webui): resolve document status grouping issue in DocumentManager
2025-08-26 23:57:21 +08:00
Daniel.y
82f72521f5
Merge pull request #2006 from danielaskdd/optimize-merge-stage
refac: Refactor LLM Summary Generation Algorithm
2025-08-26 23:56:56 +08:00
yangdx
c259b8f22c Update webui assets and bump api version to 0208 2025-08-26 23:05:00 +08:00
yangdx
7db788aa66 fix(webui): resolve document status grouping issue in DocumentManager
- Fix documents being grouped by status after pagination and sorting
- Use backend-sorted data directly from currentPageDocs instead of re-grouping
- Preserve backend sort order to prevent status-based grouping
- Maintain backward compatibility with legacy docs structure
- Ensure all sorting fields (file name, dates, ID) work correctly without status grouping

The issue occurred because the frontend was re-grouping already-sorted data
from the backend by status, breaking the intended sort order. Now documents
are displayed in the exact order returned by the backend API.

Fixes: Document list sorting by file name was grouping by status instead of
maintaining proper sort order across all documents.
2025-08-26 23:03:41 +08:00
yangdx
d3623cc9ae fix: resolve infinite loop risk in _handle_entity_relation_summary
- Ensure oversized descriptions are force-merged with subsequent ones
- Add len(current_list) <= 2 termination condition to guarantee convergence
- Implement token-based truncation in _summarize_descriptions to prevent overflow
2025-08-26 21:58:31 +08:00
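Note on the commit above: a rough sketch of a map-reduce style description merge with the three safeguards the message lists (force-merging an oversized description with the next one, stopping once two or fewer items remain, and token-based truncation). Everything here is an assumption for illustration: `reduce_descriptions`, the whitespace token counter, and the injected `summarize` callable are hypothetical and do not reproduce `_handle_entity_relation_summary`.

```python
from typing import Callable, List


def _count_tokens(text: str) -> int:
    # Stand-in tokenizer: one token per whitespace-separated word.
    return len(text.split())


def _fit(texts: List[str], budget: int) -> List[str]:
    # Token-based truncation: give every text an equal share of the budget.
    per_text = max(1, budget // len(texts))
    return [" ".join(t.split()[:per_text]) for t in texts]


def reduce_descriptions(
    descriptions: List[str],
    summarize: Callable[[List[str]], str],
    context_size: int,
) -> str:
    current = list(descriptions)
    if not current:
        return ""
    if len(current) == 1:
        return current[0]
    while True:
        total = sum(_count_tokens(d) for d in current)
        # Terminate when everything fits, or when at most two items remain.
        if total <= context_size or len(current) <= 2:
            return summarize(_fit(current, context_size))
        chunks: List[List[str]] = []
        chunk: List[str] = []
        used = 0
        for d in current:
            n = _count_tokens(d)
            # Only close a chunk once it holds at least two items, so an
            # oversized description is always merged with the one after it.
            if len(chunk) >= 2 and used + n > context_size:
                chunks.append(chunk)
                chunk, used = [], 0
            chunk.append(d)
            used += n
        chunks.append(chunk)
        # Map step: summarize each multi-item chunk; pass singletons through.
        current = [
            summarize(_fit(c, context_size)) if len(c) > 1 else c[0]
            for c in chunks
        ]
```

Because every chunk except possibly the last holds at least two items, each round strictly shrinks the list, so the loop reaches the two-item exit even when a single description exceeds the context size.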
yangdx
e0a755e42c Refactor prompt instructions to emphasize depth and completeness 2025-08-26 18:28:57 +08:00
yangdx
79e0226b2b Refactor: move force_llm_summary_on_merge to global_config access
- Remove parameter from function signature
- Access from global_config instead
- Improve code consistency
2025-08-26 18:02:39 +08:00
yangdx
01a2c79f29 Standardize prompt formatting and section headers across templates
- Remove hash delimiters
- Consistent section headers
- Add "Output:" labels
- Clean up example formatting
2025-08-26 14:42:52 +08:00
yangdx
6bcfe696ee feat: add output length recommendation and description type to LLM summary
- Add SUMMARY_LENGTH_RECOMMENDED parameter (600 tokens)
- Optimize prompt template for LLM summary
2025-08-26 14:41:12 +08:00
LinkinPony
ff4c747a2a fix mismatch of 'error' and 'error_msg' in MongoDB 2025-08-26 10:43:56 +08:00
yangdx
025f70089a Simplify status messages in knowledge rebuild operations 2025-08-26 04:26:15 +08:00
yangdx
84416d104d Increase default LLM summary merge threshold from 4 to 8 for reducing summary trigger frequency 2025-08-26 03:57:35 +08:00
yangdx
9eb2be79b8 feat: track actual LLM usage in entity/relation merging
- Modified _handle_entity_relation_summary to return tuple[str, bool]
- Updated merge functions to log "LLMmerg" vs "Merging" based on actual LLM usage
- Replaced hardcoded fragment count prediction with real-time LLM usage tracking
2025-08-26 03:56:18 +08:00
yangdx
cb0fe38b9a Fix linting 2025-08-26 02:22:34 +08:00
yangdx
de2daf6565 refac: Rename summary_max_tokens to summary_context_size and add comprehensive parameter validation for summary configuration
- Update algorithm logic in operate.py for better token management
- Fix health endpoint to use correct parameter names
2025-08-26 01:35:50 +08:00
yangdx
91767ffcee Improve warning message formatting in entity/relationship rebuild 2025-08-25 21:55:29 +08:00
yangdx
15cdd0dd8f fix: Sort cached extraction results by the create_time within each chunk
This ensures the KG rebuilds maintain the original creation order of the first extraction result for each chunk.
2025-08-25 21:41:33 +08:00
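Note on the commit above: a small sketch of sorting cached extraction results by creation time within each chunk. The record layout (`chunk_id`, `create_time`, `content` keys) and the function name are assumptions for illustration, not the project's cache schema.

```python
from collections import defaultdict
from typing import Any, Dict, List

Record = Dict[str, Any]


def group_by_chunk_sorted(records: List[Record]) -> Dict[str, List[Record]]:
    """Group cached results by chunk, then order each group by create_time."""
    grouped: Dict[str, List[Record]] = defaultdict(list)
    for record in records:
        grouped[record["chunk_id"]].append(record)
    # sorted() is stable, so records with equal timestamps keep their input order.
    return {
        chunk_id: sorted(items, key=lambda r: r["create_time"])
        for chunk_id, items in grouped.items()
    }


cached = [
    {"chunk_id": "c1", "create_time": 30, "content": "retry"},
    {"chunk_id": "c2", "create_time": 20, "content": "only"},
    {"chunk_id": "c1", "create_time": 10, "content": "first"},
]
ordered = group_by_chunk_sorted(cached)
```

With each group sorted this way, the first element of a chunk's list is its earliest extraction result, which is the one the commit message says a rebuild should follow.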
yangdx
882d6857d8 feat: Implement map-reduce summarization to handle merging a large number of descriptions 2025-08-25 21:03:16 +08:00
yangdx
0b1b264a5d refactor: optimize graph lock scope in document deletion
- Move dependency analysis outside graph database lock
- Add persistence call before lock release to prevent dirty reads
2025-08-25 17:46:32 +08:00
yangdx
cac8e189e7 Remove redundant entity vector deletion before upsert 2025-08-25 17:18:51 +08:00
yangdx
9b6de7512d Optimize the stability of description merging order 2025-08-25 17:10:51 +08:00