LightRAG

Author	SHA1	Message	Date
yangdx	d5e8f1e860	Update default query parameters for better performance - Increase chunk_top_k from 10 to 20 - Reduce max_entity_tokens to 6000 - Reduce max_relation_tokens to 8000 - Update web UI default values - Fix max_total_tokens to 30000	2025-08-18 19:32:11 +08:00
yangdx	8d7a7e4ad6	Refactor prompt templates with improved guidelines and citation formats	2025-08-18 19:14:32 +08:00
yangdx	d3fde60938	refactor: remove file_path and created_at from context, improve token truncation - Remove file_path and created_at fields from entity and relationship contexts - Update token truncation to include full JSON serialization instead of content only	2025-08-18 18:30:09 +08:00
yangdx	a9d6807432	Fix query window size limitation for Milvus data migration	2025-08-18 16:29:03 +08:00
yangdx	47b8caaf64	Stop execution on validation errors in Milvus storage • Stop execution on validation errors to prevent potential data loss	2025-08-18 14:15:07 +08:00
yangdx	453efeb924	Fix file path length checking to use UTF-8 byte length instead of char count	2025-08-18 13:59:27 +08:00
yangdx	dcec511f72	feat: increase file path length limit to 32768 and add schema migration for Milvus DB - Bump path limit to 32768 chars - Add migration detection logic - Implement dual-client migration - Auto-migrate old collections	2025-08-18 04:37:12 +08:00
yangdx	377f1a022e	fix: reset PROCESSING/FAILED docs to PENDING at the beginning of document processing pipeline - Reset documents with PROCESSING/FAILED status to PENDING when they pass consistency checks - Update doc_status storage and clear error messages/metadata on reset	2025-08-18 00:49:52 +08:00
yangdx	add8b07a21	Improve logging messages for document processing clarity	2025-08-18 00:22:04 +08:00
yangdx	14e083a1a6	fix: replace pyuca with pypinyin for Chinese pinyin sorting and add file_path sort	2025-08-17 15:21:24 +08:00
yangdx	1941df9cf6	Simplify warning message format for document deletion	2025-08-17 13:30:55 +08:00
yangdx	d84715bae7	Improve MongoDB index migration with better conflict detection • Enhanced conflict detection logic • Improved index comparison method	2025-08-17 12:53:05 +08:00
yangdx	61469c0a56	Add Chinese pinyin sorting support across document operations • Replace pyuca with centralized utils function • Add pinyin sort keys for file paths • Update MongoDB indexes with zh collation • Migrate existing indexes for compatibility • Support Chinese chars in Redis/JSON storage • Keep PostgreSQL sorting order controlled by Database Collate order	2025-08-17 12:45:48 +08:00
yangdx	6196bab00a	Update webui assets and bump api version to 0203	2025-08-17 10:39:16 +08:00
yangdx	3e4214cef3	Standardize document deletion warning messages for consistency	2025-08-17 09:35:46 +08:00
yangdx	3a7310873c	Merge branch 'bedrock-support'	2025-08-17 02:23:44 +08:00
yangdx	da7e4b79e5	Update documentation in README files	2025-08-17 02:23:14 +08:00
yangdx	1ed77a2e53	Remove openai-ollama binding from LightRAG level args	2025-08-17 02:13:50 +08:00
yangdx	301acfc274	Update webui assets	2025-08-17 01:54:39 +08:00
yangdx	45365ff6ef	Bump api version to 0202	2025-08-16 23:53:01 +08:00
yangdx	cceb46b320	fix: subdirectories are no longer processed during file scans • Change rglob to glob for file scanning • Simplify error logging messages	2025-08-16 23:46:33 +08:00
yangdx	f5b0c3d38c	feat: Recording file extraction error status to document pipeline - Add apipeline_enqueue_error_documents function to LightRAG class for recording file processing errors in doc_status storage - Enhance pipeline_enqueue_file with detailed error handling for all file processing stages: * File access errors (permissions, not found) * UTF-8 encoding errors * Format-specific processing errors (PDF, DOCX, PPTX, XLSX) * Content validation errors * Unsupported file type errors This implementation ensures all file extraction failures are properly tracked and recorded in the doc_status storage system, providing better visibility into document processing issues and enabling improved error monitoring and debugging capabilities.	2025-08-16 23:08:52 +08:00
yangdx	ca4c18baaa	Preserve failed documents during data consistency validation for manual review	2025-08-16 22:29:46 +08:00
yangdx	e1310c5262	Optimize document processing pipeline by removing duplicate step	2025-08-16 17:23:01 +08:00
yangdx	5591ef3ac8	Fix document filtering logic and improve logging for ignored docs	2025-08-16 17:22:08 +08:00
yangdx	5d00c4c7a8	feat: move processed files to __enqueued__ directory after processing with filename conflicts handling	2025-08-16 13:19:20 +08:00
SJ	f7ca9ae16a	Ruff formatted	2025-08-15 22:21:34 +00:00
yangdx	dc7a6e1c5b	Update README	2025-08-16 06:15:27 +08:00
SJ	3aa3332505	Merge pull request #1 from HKUDS/main merge	2025-08-15 17:09:03 -05:00
yangdx	2a781dfb91	Update Neo4j database naming in env.example	2025-08-15 19:14:38 +08:00
yangdx	3a227e37b8	Add get_vectors_by_ids method to MongoVectorDBStorage	2025-08-15 16:53:14 +08:00
yangdx	7a7385a200	Add efficient vector retrieval by IDs to PGVectorStorage	2025-08-15 16:51:41 +08:00
yangdx	8f7031b882	Add get_vectors_by_ids method to QdrantVectorDBStorage	2025-08-15 16:46:52 +08:00
yangdx	a71499a180	Add get_vectors_by_ids method to MilvusVectorDBStorage	2025-08-15 16:36:50 +08:00
yangdx	1e2d5252d7	Add get_vectors_by_ids method and filter out vector data from query results	2025-08-15 16:32:26 +08:00
yangdx	6cab68bb47	Improve KG chunk selection documentation and configuration clarity	2025-08-15 10:09:44 +08:00
yangdx	3acb32f547	Add comments explaining chunk deduplication behavior in query context	2025-08-15 02:19:01 +08:00
yangdx	f733ac829c	Remove debug logging statements from query context building	2025-08-14 23:44:34 +08:00
yangdx	4a19d0de25	Add chunk tracking system to monitor chunk sources and frequencies • Track chunk sources (E/R/C types) • Log frequency and order metadata • Preserve chunk_id through processing • Add debug logging for chunk tracking • Handle rerank and truncation operations	2025-08-14 22:58:26 +08:00
yangdx	a8b7890470	Rename chunk selection functions for better clarity	2025-08-14 16:01:13 +08:00
yangdx	a11e8d77eb	Improve missing-vector warning logic in vector similarity - Check for any missing vectors - Separate no-vector vs partial-vector warnings - Ensure early return on empty vectors	2025-08-14 14:24:15 +08:00
yangdx	5c7ae8721b	Merge branch 'main' into pick-trunk-by-vector	2025-08-14 13:11:14 +08:00
yangdx	3bba5fc506	Fix linting	2025-08-14 13:03:23 +08:00
yangdx	772f981e7e	fix: check and process queued docs even when upload directory is empty	2025-08-14 12:35:39 +08:00
yangdx	65a4437f78	Fix: Persist document data immediately after index update	2025-08-14 12:33:36 +08:00
yangdx	28fc075c59	Simplify inconsistency logging and cleanup messages	2025-08-14 11:49:58 +08:00
yangdx	17faeb2fb8	refactor: integrate document consistency validation into pipeline processing This ensures data consistency validation is part of the main processing pipeline and provides better monitoring of inconsistent document cleanup operations.	2025-08-14 11:38:36 +08:00
yangdx	a3f7bc5b7e	Merge branch 'main' into pick-trunk-by-vector	2025-08-14 06:19:57 +08:00
yangdx	b5ae84fac6	fix: Add data consistency validation to document processing pipeline - Add _validate_and_fix_document_consistency() method to detect and fix documents with missing content in full_docs storage - Integrate consistency check into apipeline_process_enqueue_documents() to automatically mark inconsistent documents as FAILED before processing - Prevent processing errors caused by documents having status records but missing actual content data	2025-08-14 06:18:34 +08:00
yangdx	cb122c63e4	Merge branch 'main' into pick-trunk-by-vector	2025-08-14 05:34:15 +08:00

2984 commits