yangdx
5eb4a4b799
feat: simplify citations, add reference merging, and restructure API response format
2025-09-24 14:30:10 +08:00
yangdx
c0d5abba6b
Fix linting
2025-09-15 02:59:21 +08:00
yangdx
b1c8206346
Add aquery_data endpoint for structured retrieval without LLM generation
...
- Add QueryDataResponse model
- Implement /query/data endpoint
- Add aquery_data method to LightRAG
- Return entities, relationships, chunks
2025-09-15 02:15:14 +08:00
yangdx
82a67354d0
Code formatting improvements and style consistency fixes
...
* Remove trailing whitespace
* Fix function signature ellipsis style
2025-09-14 17:49:02 +08:00
yangdx
0ffb5d5f2d
Replace search API with aquery_data for consistent raw data retrieval, mirroring aquery results
...
• Reuse existing query logic paths and remove kg_search function entirely
• Update kg_query/naive_query to return raw data as needed
2025-09-13 15:30:29 +08:00
yangdx
6774058670
Merge branch 'main' into tongda/main
2025-09-09 22:43:17 +08:00
yangdx
077d9be5d7
Add Deepseek Style Chain of Thought (CoT) Support for OpenAI Compatible LLM providers
...
- Add enable_cot parameter to all LLM APIs
- Implement CoT for OpenAI with <think> tags
- Log warnings for unsupported providers
- Enable CoT in query operations
- Handle streaming and non-streaming CoT
2025-09-09 22:34:36 +08:00
yangdx
3477e9f919
Merge branch 'main' into tongda/main
2025-09-09 18:27:56 +08:00
yangdx
3059089e7d
Fix logging order in pipeline history trimming
2025-09-08 23:00:44 +08:00
yangdx
9437df83cc
Add memory management for pipeline history messages
...
- Trim history at 10k messages
- Keep latest 5k messages
- Prevent memory growth
- Add logging for trim events
2025-09-08 15:56:35 +08:00
yangdx
387d817fc2
Remove trailing colons from queue names in function wrappers
2025-09-06 00:53:05 +08:00
yangdx
de972f6222
Rename method for clarity and improve code readability
...
- Rename _process_entity_relation_graph to _process_extract_entities
2025-09-04 11:48:31 +08:00
Tong Da
dc7ce98c7e
Add search interface to lightrag.
2025-09-01 02:40:40 +08:00
yangdx
1a015a7015
Add queue_name parameter to priority_limit_async_func_call for better logging
...
• Add queue_name parameter to decorator
• Update all log messages with queue names
• Pass specific names for LLM and embedding
2025-08-31 23:47:22 +08:00
yangdx
925e631a9a
refac: Add robust timeout handling for LLM requests
2025-08-29 13:50:35 +08:00
yangdx
ff0a18e08c
Unify SUMMARY_LANGUAGE and ENTITY_TYPES implementation method
2025-08-27 12:23:22 +08:00
Thibo Rosemplatt
c3aabfc251
Merge branch 'main' into entityTypesServerSupport
2025-08-26 21:48:20 +02:00
yangdx
d3623cc9ae
fix: resolve infinite loop risk in _handle_entity_relation_summary
...
- Ensure oversized descriptions are force-merged with subsequent ones
- Add len(current_list) <= 2 termination condition to guarantee convergence
- Implement token-based truncation in _summarize_descriptions to prevent overflow
2025-08-26 21:58:31 +08:00
yangdx
6bcfe696ee
feat: add output length recommendation and description type to LLM summary
...
- Add SUMMARY_LENGTH_RECOMMENDED parameter (600 tokens)
- Optimize prompt template for LLM summary
2025-08-26 14:41:12 +08:00
yangdx
cb0fe38b9a
Fix linting
2025-08-26 02:22:34 +08:00
yangdx
de2daf6565
refac: Rename summary_max_tokens to summary_context_size and add comprehensive parameter validation for summary configuration
...
- Update algorithm logic in operate.py for better token management
- Fix health endpoint to use correct parameter names
2025-08-26 01:35:50 +08:00
yangdx
0b1b264a5d
refactor: optimize graph lock scope in document deletion
...
- Move dependency analysis outside graph database lock
- Add persistence call before lock release to prevent dirty reads
2025-08-25 17:46:32 +08:00
Thibo Rosemplatt
d054ec5d00
Added entity_types as a user-defined variable (via .env)
2025-08-23 20:16:11 +02:00
yangdx
bf43e1b8c1
fix: Resolve default rerank config problem when env var missing
...
- Read config from selected_rerank_func when env var missing
- Make api_key optional for rerank function
- Add response format validation with proper error handling
- Update Cohere rerank default to official API endpoint
2025-08-23 01:07:59 +08:00
yangdx
0e67ead8fa
Rename MAX_TOKENS to SUMMARY_MAX_TOKENS for clarity
2025-08-21 10:15:20 +08:00
yangdx
9b7ed84e05
Improve document deletion error handling and message consistency
...
- Standardize deletion log messages
- Add try-catch for file operations
- Improve enqueued file error handling
2025-08-20 11:01:24 +08:00
yangdx
485c4b7de7
Change document deletion warnings to info level logging
2025-08-20 03:28:42 +08:00
yangdx
806081645f
Refactor text cleaning to use sanitize_text_for_encoding consistently
...
• Replace clean_text with sanitize_text
• Remove deprecated clean_text function
• Add whitespace trimming to sanitizer
• Improve UTF-8 encoding safety
• Consolidate text cleaning logic
2025-08-19 19:20:01 +08:00
yangdx
e38df464ea
Ensure front-end file type uploads are synchronized with back-end
2025-08-19 15:10:13 +08:00
yangdx
1c4d6fde58
Change log level from info to debug for document storage message
2025-08-18 20:04:29 +08:00
yangdx
377f1a022e
fix: reset PROCESSING/FAILED docs to PENDING at the beginning of document processing pipeline
...
- Reset documents with PROCESSING/FAILED status to PENDING when they pass consistency checks
- Update doc_status storage and clear error messages/metadata on reset
2025-08-18 00:49:52 +08:00
yangdx
add8b07a21
Improve logging messages for document processing clarity
2025-08-18 00:22:04 +08:00
yangdx
1941df9cf6
Simplify warning message format for document deletion
2025-08-17 13:30:55 +08:00
yangdx
3e4214cef3
Standardize document deletion warning messages for consistency
2025-08-17 09:35:46 +08:00
yangdx
cceb46b320
fix: subdirectories are no longer processed during file scans
...
• Change rglob to glob for file scanning
• Simplify error logging messages
2025-08-16 23:46:33 +08:00
yangdx
f5b0c3d38c
feat: Record file extraction error status in document pipeline
...
- Add apipeline_enqueue_error_documents function to LightRAG class for recording file processing errors in doc_status storage
- Enhance pipeline_enqueue_file with detailed error handling for all file processing stages:
  * File access errors (permissions, not found)
  * UTF-8 encoding errors
  * Format-specific processing errors (PDF, DOCX, PPTX, XLSX)
  * Content validation errors
  * Unsupported file type errors
This implementation ensures all file extraction failures are properly tracked and recorded in the doc_status storage system, providing better visibility into document processing issues and enabling improved error monitoring and debugging capabilities.
2025-08-16 23:08:52 +08:00
yangdx
ca4c18baaa
Preserve failed documents during data consistency validation for manual review
2025-08-16 22:29:46 +08:00
yangdx
e1310c5262
Optimize document processing pipeline by removing duplicate step
2025-08-16 17:23:01 +08:00
yangdx
5591ef3ac8
Fix document filtering logic and improve logging for ignored docs
2025-08-16 17:22:08 +08:00
yangdx
5c7ae8721b
Merge branch 'main' into pick-trunk-by-vector
2025-08-14 13:11:14 +08:00
yangdx
3bba5fc506
Fix linting
2025-08-14 13:03:23 +08:00
yangdx
65a4437f78
Fix: Persist document data immediately after index update
2025-08-14 12:33:36 +08:00
yangdx
28fc075c59
Simplify inconsistency logging and cleanup messages
2025-08-14 11:49:58 +08:00
yangdx
17faeb2fb8
refactor: integrate document consistency validation into pipeline processing
...
This ensures data consistency validation is part of the main processing pipeline and provides better monitoring of inconsistent document cleanup operations.
2025-08-14 11:38:36 +08:00
yangdx
a3f7bc5b7e
Merge branch 'main' into pick-trunk-by-vector
2025-08-14 06:19:57 +08:00
yangdx
b5ae84fac6
fix: Add data consistency validation to document processing pipeline
...
- Add _validate_and_fix_document_consistency() method to detect and fix documents with missing content in full_docs storage
- Integrate consistency check into apipeline_process_enqueue_documents() to automatically mark inconsistent documents as FAILED before processing
- Prevent processing errors caused by documents having status records but missing actual content data
2025-08-14 06:18:34 +08:00
yangdx
f1dafa0d01
feat: KG related chunks selection by vector similarity
...
- Add env switch to toggle weighted polling vs vector-similarity strategy
- Implement similarity-based sorting with fallback to weighted
- Introduce batch vector read API for vector storage
- Implement vector store and retrieve functions for Nanovector DB
- Preserve default behavior (weighted polling selection method)
2025-08-13 18:16:42 +08:00
yangdx
0b2c3d06c7
Remove redundant collection listing check
2025-08-12 15:24:06 +08:00
yangdx
fc8ca1a706
Fix: add multi-process lock for initialize and drop methods for all storage
2025-08-12 04:25:09 +08:00
yangdx
44204abef7
Fix linting
2025-08-10 10:59:32 +08:00