Commit graph

4675 commits

Author SHA1 Message Date
hzywhite
e07d4bb70b merge 2025-09-05 15:04:04 +08:00
hzywhite
482a09d397 merge 2025-09-05 15:03:19 +08:00
hzywhite
8d800239d6 merge 2025-09-05 15:02:49 +08:00
hzywhite
e3ea87da24 merge 2025-09-05 15:01:50 +08:00
hzywhite
2a453fbe37 webui 2025-09-04 11:24:06 +08:00
hzywhite
7c8db78057 merge 2025-09-04 11:05:22 +08:00
hzywhite
82a0f8cc1f merge 2025-09-04 10:57:41 +08:00
hzywhite
e27031587d merge 2025-09-04 10:27:38 +08:00
hzywhite
bd533783e1 Update document_routes.py 2025-09-02 06:51:32 +08:00
hzywhite
cb003593df Update document_routes.py 2025-09-02 06:50:12 +08:00
hzywhite
745aa085db summary 2025-09-02 06:21:08 +08:00
hzywhite
36c81039b1 Summary 2025-09-02 06:15:29 +08:00
hzywhite
d8b2264d8b summary 2025-09-02 03:54:20 +08:00
yangdx
9b7ed84e05 Improve document deletion error handling and message consistency
- Standardize deletion log messages
- Add try-catch for file operations
- Improve enqueued file error handling
2025-08-20 11:01:24 +08:00
yangdx
a4c4b1182a Fix logging level usage in Redis retry decorator
* Replace string with logging.WARNING constant
2025-08-20 05:21:15 +08:00
yangdx
485c4b7de7 Change document deletion warnings to info level logging 2025-08-20 03:28:42 +08:00
Daniel.y
ac9647d117 Merge pull request #1983 from danielaskdd/santitize-text
Fix: resolved UTF-8 encoding error during document processing
2025-08-20 02:52:19 +08:00
Daniel.y
a98b814df5 Merge pull request #1982 from danielaskdd/pipeline-remove-enqueued-file
Fix(UI): Implement XLSX format upload support for web UI
2025-08-19 19:58:18 +08:00
yangdx
ced3aef7cb refactor: simplify text encoding by removing redundant safe_encode_for_llm 2025-08-19 19:37:46 +08:00
yangdx
806081645f Refactor text cleaning to use sanitize_text_for_encoding consistently
• Replace clean_text with sanitize_text
• Remove deprecated clean_text function
• Add whitespace trimming to sanitizer
• Improve UTF-8 encoding safety
• Consolidate text cleaning logic
2025-08-19 19:20:01 +08:00
yangdx
f9cf544805 Add text sanitization to prevent UTF-8 encoding errors in LLM calls
• Remove surrogate characters
• Clean control characters
• Sanitize input and history messages
• Add comprehensive error handling
• Log sanitization activities
2025-08-19 18:50:52 +08:00
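The sanitization steps listed in f9cf544805 (removing surrogate characters and cleaning control characters) can be sketched as below. This is a minimal illustration under stated assumptions, not the project's code: the name sanitize_text_sketch is hypothetical, and the exact set of control characters removed (C0 controls and DEL, keeping tab, newline and carriage return) is an assumption. The final strip() mirrors the whitespace trimming mentioned in 806081645f.

```python
import re

# Assumption: strip C0 control characters and DEL, but keep \t, \n and \r.
_CONTROL_CHARS = re.compile(r"[\x00-\x08\x0b\x0c\x0e-\x1f\x7f]")
# Lone UTF-16 surrogate code points cannot be encoded as UTF-8.
_SURROGATES = re.compile(r"[\ud800-\udfff]")


def sanitize_text_sketch(text: str) -> str:
    """Hypothetical helper: return text that encodes cleanly as UTF-8."""
    text = _SURROGATES.sub("", text)
    text = _CONTROL_CHARS.sub("", text)
    # Strict encode as a final check; raises UnicodeEncodeError if anything slipped through.
    text.encode("utf-8")
    return text.strip()


print(sanitize_text_sketch("  hello\x00 world\ud800 \n"))
```

Removing surrogates before any encode call matters because a strict "utf-8" encode raises on them, which is the class of error these commits address.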
yangdx
64015548df Refactor MD5 hash functions and consolidate Unicode error handling 2025-08-19 17:49:23 +08:00
yangdx
64058c771f Refactor: Harden compute_args_hash against Unicode errors 2025-08-19 17:19:39 +08:00
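A minimal sketch of what "hardening" an MD5 argument hash against Unicode errors can look like is shown below. The name compute_args_hash_sketch and the fallback to the "replace" error handler are assumptions for illustration; the commit message does not say which strategy compute_args_hash actually uses.

```python
import hashlib


def compute_args_hash_sketch(*args: object) -> str:
    """Hypothetical helper: MD5 hex digest of the concatenated string forms of args."""
    args_str = "".join(str(arg) for arg in args)
    try:
        data = args_str.encode("utf-8")
    except UnicodeEncodeError:
        # Lone surrogates fail strict encoding; substitute "?" instead of raising.
        data = args_str.encode("utf-8", errors="replace")
    return hashlib.md5(data).hexdigest()


print(compute_args_hash_sketch("query\ud800", "mode"))
```

The trade-off of "replace" is that distinct malformed inputs can collide on the same hash, which is usually acceptable for a cache key but worth keeping in mind.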
yangdx
2603e99005 Enhance file deletion to remove files from both input and enqueued dirs 2025-08-19 17:13:58 +08:00
yangdx
1f86543772 Update i18n translation and webui assets 2025-08-19 16:23:05 +08:00
yangdx
c6b30f1a03 Fix file type mappings for proper MIME type handling 2025-08-19 15:26:21 +08:00
yangdx
950221db59 Refactor keyword extraction rules and remove overlap constraint
• Require content in both keyword categories
• Remove no-overlap rule between lists
• Simplify edge case handling
• Clarify source of truth requirement
2025-08-19 15:12:15 +08:00
yangdx
0aa1bc8bf9 Update webui assets and bump api version to 0205 2025-08-19 15:11:34 +08:00
yangdx
e38df464ea Ensure front-end file type uploads are synchronized with back-end 2025-08-19 15:10:13 +08:00
yangdx
ac33cf693d Refactor keyword extraction rules and remove overlap constraint
• Require content in both keyword categories
• Remove no-overlap rule between lists
• Simplify edge case handling
• Clarify source of truth requirement
2025-08-19 15:07:40 +08:00
yangdx
9ed5b93467 Add [File Extraction] prefix to error messages and logs 2025-08-19 11:33:28 +08:00
Daniel.y
ce35b1dfd4 Merge pull request #1977 from danielaskdd/keywork-extract
Optimize keyword extraction prompt, and remove conversation history from keyword extraction
2025-08-19 00:47:02 +08:00
yangdx
92c0ad0076 Fix linting 2025-08-19 00:45:29 +08:00
yangdx
23334e7e51 Update prompt.py 2025-08-19 00:29:33 +08:00
yangdx
2a7fec2873 Optimize keyword extraction prompt, and remove conversation history from keyword extraction.
- Remove history context processing
- Update prompt to focus on single query
- Clarify high/low level keyword types
- Improve JSON output instructions
- Add edge case handling guidance
2025-08-18 23:35:04 +08:00
yangdx
ee15629f26 Merge branch 'pg-optimization' 2025-08-18 22:34:08 +08:00
yangdx
cdfbd2114f Merge branch 'main' into pg-optimization 2025-08-18 22:24:37 +08:00
yangdx
d54c8f973b Merge branch 'Matt23-star/main' into pg-optimization 2025-08-18 22:23:47 +08:00
yangdx
1c4d6fde58 Change log level from info to debug for document storage message 2025-08-18 20:04:29 +08:00
Daniel.y
5fc2400a70 Merge pull request #1976 from danielaskdd/kg-context-file-path
Refactor: Remove file_path and created_at from entity and relation query context sent to LLM
2025-08-18 19:40:54 +08:00
yangdx
368d2b00d6 Update webui assets and bump api version to 0204 2025-08-18 19:33:46 +08:00
yangdx
d5e8f1e860 Update default query parameters for better performance
- Increase chunk_top_k from 10 to 20
- Reduce max_entity_tokens to 6000
- Reduce max_relation_tokens to 8000
- Update web UI default values
- Fix max_total_tokens to 30000
2025-08-18 19:32:11 +08:00
yangdx
8d7a7e4ad6 Refactor prompt templates with improved guidelines and citation formats 2025-08-18 19:14:32 +08:00
yangdx
d3fde60938 refactor: remove file_path and created_at from context, improve token truncation
- Remove file_path and created_at fields from entity and relationship contexts
- Update token truncation to include full JSON serialization instead of content only
2025-08-18 18:30:09 +08:00
Daniel.y
1484c4adfa Merge pull request #1975 from danielaskdd/milvus-file-path-len
Refac: Increase file_path field length to 32768 and add schema migration for Milvus DB
2025-08-18 17:17:00 +08:00
yangdx
a9d6807432 Fix query window size limitation for Milvus data migration 2025-08-18 16:29:03 +08:00
yangdx
47b8caaf64 Stop execution on validation errors in Milvus storage
• Stop execution on validation errors to prevent potential data loss
2025-08-18 14:15:07 +08:00
yangdx
453efeb924 Fix file path length checking to use UTF-8 byte length instead of char count 2025-08-18 13:59:27 +08:00
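The difference that 453efeb924 addresses, character count versus UTF-8 byte length, can be illustrated with a short sketch. The helper name file_path_fits is hypothetical, and treating 32768 (the limit from dcec511f72) as a byte budget is an assumption made only for this example.

```python
MAX_FILE_PATH_BYTES = 32768  # assumption: limit from dcec511f72, treated here as bytes


def file_path_fits(path: str, limit: int = MAX_FILE_PATH_BYTES) -> bool:
    """Hypothetical check: compare the UTF-8 encoded size, not len(path)."""
    return len(path.encode("utf-8")) <= limit


path = "文档/报告.pdf"
print(len(path), len(path.encode("utf-8")))  # 9 17
```

Each CJK character here takes 3 bytes in UTF-8, so a check based on len() undercounts: with a 10-unit limit, this path passes a character-count check but fails a byte-length check.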
Daniel.y
b27664298a Merge pull request #1971 from danielaskdd/failed-2-pending
Change the status from PROCESSING/FAILED to PENDING at the beginning of the document processing pipeline
2025-08-18 12:03:49 +08:00
yangdx
dcec511f72 feat: increase file path length limit to 32768 and add schema migration for Milvus DB
- Bump path limit to 32768 chars
- Add migration detection logic
- Implement dual-client migration
- Auto-migrate old collections
2025-08-18 04:37:12 +08:00