Commit graph

4626 commits

Author SHA1 Message Date
yangdx
d3fde60938 refactor: remove file_path and created_at from context, improve token truncation
- Remove file_path and created_at fields from entity and relationship contexts
- Update token truncation to include full JSON serialization instead of content only
2025-08-18 18:30:09 +08:00
Daniel.y
1484c4adfa
Merge pull request #1975 from danielaskdd/milvus-file-path-len
Refac: Increase file_path field length to 32768 and add schema migration for Milvus DB
2025-08-18 17:17:00 +08:00
yangdx
a9d6807432 Fix query window size limitation for Milvus data migration 2025-08-18 16:29:03 +08:00
yangdx
47b8caaf64 Stop execution on validation errors in Milvus storage
• Stop execution on validation errors to prevent potential data loss
2025-08-18 14:15:07 +08:00
yangdx
453efeb924 Fix file path length checking to use UTF-8 byte length instead of char count 2025-08-18 13:59:27 +08:00
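Note on 453efeb924: the difference between character count and UTF-8 byte length matters for any field limit expressed in bytes. The following is a minimal sketch of the idea, not the repository's code; the helper names `utf8_byte_length` and `fits_field` and the sample path are hypothetical.

```python
def utf8_byte_length(text: str) -> int:
    """Return the number of bytes the string occupies when encoded as UTF-8."""
    return len(text.encode("utf-8"))


def fits_field(path: str, max_bytes: int) -> bool:
    """Hypothetical check: True if the path fits a byte-limited field."""
    return utf8_byte_length(path) <= max_bytes


# Four CJK characters (3 bytes each in UTF-8) plus five ASCII characters.
sample_path = "文档/报告.txt"
char_count = len(sample_path)                # 9 characters
byte_count = utf8_byte_length(sample_path)   # 17 bytes
```

A limit of 9 would pass a character-count check for this path but fail the byte-length check, which is the case the commit message describes.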
Daniel.y
b27664298a
Merge pull request #1971 from danielaskdd/failed-2-pending
Change the status from PROCESSING/FAILED to PENDING at the beginning of document processing pipeline
2025-08-18 12:03:49 +08:00
yangdx
dcec511f72 feat: increase file path length limit to 32768 and add schema migration for Milvus DB
- Bump path limit to 32768 chars
- Add migration detection logic
- Implement dual-client migration
- Auto-migrate old collections
2025-08-18 04:37:12 +08:00
yangdx
377f1a022e fix: reset PROCESSING/FAILED docs to PENDING at the beginning of document processing pipeline
- Reset documents with PROCESSING/FAILED status to PENDING when they pass consistency checks
- Update doc_status storage and clear error messages/metadata on reset
2025-08-18 00:49:52 +08:00
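Note on 377f1a022e: the status reset described above can be pictured roughly as follows. This is a sketch under stated assumptions, not the project's implementation: the `DocStatus` enum, the `error_msg` key, and the in-memory dict standing in for doc_status storage are all hypothetical, and the consistency check mentioned in the commit is omitted.

```python
from enum import Enum


class DocStatus(str, Enum):
    # Hypothetical status values, for illustration only.
    PENDING = "pending"
    PROCESSING = "processing"
    PROCESSED = "processed"
    FAILED = "failed"


def reset_for_retry(docs: dict) -> list:
    """Reset PROCESSING/FAILED documents to PENDING and clear their error text.

    `docs` maps a document id to a mutable record dict. Returns the ids reset.
    """
    reset_ids = []
    for doc_id, record in docs.items():
        if record["status"] in (DocStatus.PROCESSING, DocStatus.FAILED):
            record["status"] = DocStatus.PENDING
            record.pop("error_msg", None)  # drop stale error details, if any
            reset_ids.append(doc_id)
    return reset_ids


docs = {
    "doc-1": {"status": DocStatus.FAILED, "error_msg": "parse error"},
    "doc-2": {"status": DocStatus.PROCESSED},
    "doc-3": {"status": DocStatus.PROCESSING},
}
reset_ids = reset_for_retry(docs)
```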
yangdx
add8b07a21 Improve logging messages for document processing clarity 2025-08-18 00:22:04 +08:00
yangdx
14e083a1a6 fix: replace pyuca with pypinyin for Chinese pinyin sorting and add file_path sort 2025-08-17 15:21:24 +08:00
yangdx
1941df9cf6 Simplify warning message format for document deletion 2025-08-17 13:30:55 +08:00
Daniel.y
9cc9d62c89
Merge pull request #1967 from danielaskdd/pinyin-sort
Add Chinese pinyin sorting support across document operations
2025-08-17 13:18:59 +08:00
yangdx
d84715bae7 Improve MongoDB index migration with better conflict detection
• Enhanced conflict detection logic
• Improved index comparison method
2025-08-17 12:53:05 +08:00
yangdx
61469c0a56 Add Chinese pinyin sorting support across document operations
• Replace pyuca with centralized utils function
• Add pinyin sort keys for file paths
• Update MongoDB indexes with zh collation
• Migrate existing indexes for compatibility
• Support Chinese chars in Redis/JSON storage
• Keep PostgreSQL sorting order controlled by the database collation order
2025-08-17 12:45:48 +08:00
Daniel.y
a635d0625e
Merge pull request #1966 from danielaskdd/fix-select-all
Fix Document Selection Issues After Pagination Implementation
2025-08-17 10:54:19 +08:00
yangdx
6196bab00a Update webui assets and bump api version to 0203 2025-08-17 10:39:16 +08:00
yangdx
1af0803c62 fix(ui): fix selection state management in paginated views
- Replace DeselectDocumentsDialog with smart selection button
- Auto-reset selection on page/filter changes
- Remove deletion restrictions and update i18n
2025-08-17 10:38:12 +08:00
yangdx
3e4214cef3 Standardize document deletion warning messages for consistency 2025-08-17 09:35:46 +08:00
yangdx
3a7310873c Merge branch 'bedrock-support' 2025-08-17 02:23:44 +08:00
yangdx
da7e4b79e5 Update documentation in README files 2025-08-17 02:23:14 +08:00
yangdx
1ed77a2e53 Remove openai-ollama binding from LightRAG level args 2025-08-17 02:13:50 +08:00
Daniel.y
459b0e4c44
Merge pull request #1965 from danielaskdd/rm-enqueued-file
Feat: Optimize error handling for document processing pipeline
2025-08-17 01:59:33 +08:00
yangdx
301acfc274 Update webui assets 2025-08-17 01:54:39 +08:00
yangdx
bd8ed905e8 Translate Chinese comments to English in ClearDocumentsDialog 2025-08-17 01:53:37 +08:00
yangdx
e566267a20 Implement smart polling recovery after document scan completion
• Add 15-second recovery timer
• Restore intelligent intervals
2025-08-17 01:51:11 +08:00
yangdx
e064534941 feat(ui): enhance ClearDocumentsDialog with loading spinner and timeout protection
- Add loading spinner animation during document clearing operation
- Implement 30-second timeout protection to prevent hanging operations
- Disable all interactive controls during clearing to prevent duplicate requests
- Add comprehensive error handling with automatic state reset
2025-08-17 01:33:39 +08:00
yangdx
45365ff6ef Bump api version to 0202 2025-08-16 23:53:01 +08:00
yangdx
cceb46b320 fix: subdirectories are no longer processed during file scans
• Change rglob to glob for file scanning
• Simplify error logging messages
2025-08-16 23:46:33 +08:00
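Note on cceb46b320: `Path.rglob` walks subdirectories while `Path.glob` with a plain pattern matches only direct children, which is why swapping one for the other stops subdirectory scanning. A small sketch using a temporary directory; the function names and the `*.txt` pattern are illustrative assumptions.

```python
import tempfile
from pathlib import Path


def scan_top_level(root: Path, pattern: str = "*.txt") -> list:
    """List matching files directly inside root (no recursion)."""
    return sorted(p.name for p in root.glob(pattern) if p.is_file())


def scan_recursive(root: Path, pattern: str = "*.txt") -> list:
    """List matching files in root and all of its subdirectories."""
    return sorted(
        p.relative_to(root).as_posix() for p in root.rglob(pattern) if p.is_file()
    )


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "a.txt").write_text("top level")
    (root / "sub").mkdir()
    (root / "sub" / "b.txt").write_text("nested")

    top_level = scan_top_level(root)   # only a.txt
    recursive = scan_recursive(root)   # a.txt and sub/b.txt
```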
yangdx
f5b0c3d38c feat: Record file extraction error status in document pipeline
- Add apipeline_enqueue_error_documents function to LightRAG class for recording file processing errors in doc_status storage
- Enhance pipeline_enqueue_file with detailed error handling for all file processing stages:
  * File access errors (permissions, not found)
  * UTF-8 encoding errors
  * Format-specific processing errors (PDF, DOCX, PPTX, XLSX)
  * Content validation errors
  * Unsupported file type errors

This implementation ensures all file extraction failures are properly tracked and recorded in the doc_status storage system, providing better visibility into document processing issues and enabling improved error monitoring and debugging capabilities.
2025-08-16 23:08:52 +08:00
yangdx
ca4c18baaa Preserve failed documents during data consistency validation for manual review 2025-08-16 22:29:46 +08:00
yangdx
e1310c5262 Optimize document processing pipeline by removing duplicate step 2025-08-16 17:23:01 +08:00
yangdx
5591ef3ac8 Fix document filtering logic and improve logging for ignored docs 2025-08-16 17:22:08 +08:00
yangdx
5d00c4c7a8 feat: move processed files to __enqueued__ directory after processing, with filename conflict handling 2025-08-16 13:19:20 +08:00
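Note on 5d00c4c7a8: one common way to move a file into a holding directory without overwriting an earlier file of the same name is to append a numeric suffix. The sketch below illustrates that approach only; the `_001` suffix scheme and the function name `move_to_enqueued` are assumptions, and the commit may resolve conflicts differently.

```python
import shutil
import tempfile
from pathlib import Path


def move_to_enqueued(file_path: Path, dir_name: str = "__enqueued__") -> Path:
    """Move file_path into a sibling holding directory, avoiding name clashes."""
    target_dir = file_path.parent / dir_name
    target_dir.mkdir(exist_ok=True)

    target = target_dir / file_path.name
    counter = 1
    # Assumed scheme: doc.txt, doc_001.txt, doc_002.txt, ...
    while target.exists():
        target = target_dir / f"{file_path.stem}_{counter:03d}{file_path.suffix}"
        counter += 1

    shutil.move(str(file_path), str(target))
    return target


with tempfile.TemporaryDirectory() as tmp:
    source = Path(tmp) / "doc.txt"

    source.write_text("first upload")
    first_name = move_to_enqueued(source).name

    source.write_text("second upload, same filename")
    second_name = move_to_enqueued(source).name
```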
SJ
f7ca9ae16a Ruff formatted 2025-08-15 22:21:34 +00:00
yangdx
dc7a6e1c5b Update README 2025-08-16 06:15:27 +08:00
SJ
3aa3332505
Merge pull request #1 from HKUDS/main
merge
2025-08-15 17:09:03 -05:00
Daniel.y
bdd1169cfb
Merge pull request #1959 from danielaskdd/pick-trunk-by-vector
Feat: add KG-related chunk selection by vector similarity
2025-08-15 19:33:51 +08:00
yangdx
2a781dfb91 Update Neo4j database naming in env.example 2025-08-15 19:14:38 +08:00
yangdx
3a227e37b8 Add get_vectors_by_ids method to MongoVectorDBStorage 2025-08-15 16:53:14 +08:00
yangdx
7a7385a200 Add efficient vector retrieval by IDs to PGVectorStorage 2025-08-15 16:51:41 +08:00
yangdx
8f7031b882 Add get_vectors_by_ids method to QdrantVectorDBStorage 2025-08-15 16:46:52 +08:00
yangdx
a71499a180 Add get_vectors_by_ids method to MilvusVectorDBStorage 2025-08-15 16:36:50 +08:00
yangdx
1e2d5252d7 Add get_vectors_by_ids method and filter out vector data from query results 2025-08-15 16:32:26 +08:00
yangdx
6cab68bb47 Improve KG chunk selection documentation and configuration clarity 2025-08-15 10:09:44 +08:00
yangdx
3acb32f547 Add comments explaining chunk deduplication behavior in query context 2025-08-15 02:19:01 +08:00
yangdx
0b45d463df Add .clinerules to .gitignore 2025-08-15 00:43:45 +08:00
yangdx
f733ac829c Remove debug logging statements from query context building 2025-08-14 23:44:34 +08:00
yangdx
4a19d0de25 Add chunk tracking system to monitor chunk sources and frequencies
• Track chunk sources (E/R/C types)
• Log frequency and order metadata
• Preserve chunk_id through processing
• Add debug logging for chunk tracking
• Handle rerank and truncation operations
2025-08-14 22:58:26 +08:00
yangdx
a8b7890470 Rename chunk selection functions for better clarity 2025-08-14 16:01:13 +08:00
yangdx
a11e8d77eb Improve missing-vector warning logic in vector similarity
- Check for any missing vectors
- Separate no-vector vs partial-vector warnings
- Ensure early return on empty vectors
2025-08-14 14:24:15 +08:00
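Note on a11e8d77eb: the three bullet points above (check for any missing vectors, warn differently for "none" versus "some", return early when nothing is usable) can be sketched as below. This is an illustration, not the repository's code; the function name, logger name, and the dict-of-lists vector store are hypothetical.

```python
import logging

logger = logging.getLogger("chunk_selection_sketch")  # hypothetical logger name


def chunks_with_vectors(chunk_ids: list, vectors: dict) -> list:
    """Return the chunk ids that have a stored vector, warning about gaps."""
    if not chunk_ids:
        return []

    present = [cid for cid in chunk_ids if vectors.get(cid) is not None]
    missing = len(chunk_ids) - len(present)

    if not present:
        # No vectors at all: nothing to rank, so return early.
        logger.warning("No vectors found for any of %d chunks", len(chunk_ids))
        return []

    if missing:
        # Some vectors missing: continue with the chunks that do have one.
        logger.warning("Vectors missing for %d of %d chunks", missing, len(chunk_ids))

    return present


none_found = chunks_with_vectors(["a", "b"], {})
partial = chunks_with_vectors(["a", "b"], {"a": [0.1, 0.2]})
empty_input = chunks_with_vectors([], {})
```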