Commit graph

193 commits

Author SHA1 Message Date
Jon
cf2a024e37 feat: Add endpoint and UI to retry failed documents
Add a new `/documents/reprocess_failed` API endpoint and corresponding
UI button to retry processing of failed and pending documents. This
addresses a common recovery scenario when document processing fails due
to server crashes, network errors, or LLM service outages.

Backend changes:
- Add ReprocessResponse model with status, message, and track_id fields
- Add POST /documents/reprocess_failed endpoint that triggers background
  reprocessing of FAILED, PENDING, and interrupted PROCESSING documents
- Reuses existing apipeline_process_enqueue_documents for consistency
- Includes comprehensive docstring and logging for observability

Frontend changes:
- Add TypeScript types and API function for the new endpoint
- Add retry handler with intelligent polling (fast refresh → normal)
- Add "Retry Failed" button in Documents page toolbar
- Button disabled when pipeline is busy to prevent duplicate operations
- Complete i18n support (English and Chinese translations)

This feature provides a convenient way to recover from processing
failures without requiring a full filesystem rescan.
2025-10-04 16:46:29 -04:00
yangdx
83d99e1424 fix(OllamaAPI): Add validation to ensure last message is from user role
• Validate last message role is "user"
• Raise 400 error for invalid role
• Improve API request validation
• Prevent invalid message sequences
2025-10-01 20:48:37 +08:00
yangdx
df43afc89b Relax conversation history role validation requirements
• Remove strict role value checking
• Allow any non-empty string roles
2025-09-29 13:10:15 +08:00
yangdx
7cba458f22 Limit deprecated documents endpoint to 1000 records with fair distribution 2025-09-28 11:18:10 +08:00
yangdx
91be53ffd2 Fix linting 2025-09-27 22:36:38 +08:00
yangdx
e0ac05db90 Simplify query route documentation and clarify conversation history 2025-09-27 22:36:16 +08:00
yangdx
f66a0aad8b Update query streaming endpoint docs to clarify behavior 2025-09-27 22:27:49 +08:00
yangdx
e7948df541 Fix linting 2025-09-27 15:13:07 +08:00
yangdx
1766cddd6c Fix mode parameter serialization error in Ollama chat API
• Use mode.value for API requests
• Add debug logging in aquery_llm
2025-09-27 15:11:51 +08:00
yangdx
81caee3498 Enhance query API with streaming control and comprehensive documentation
- Add stream parameter to QueryRequest
- Support non-streaming in /query/stream
- Add detailed OpenAPI response schemas
- Expand endpoint documentation
- Include usage examples and error handling
2025-09-27 11:53:31 +08:00
yangdx
a528213210 Fix logging filter logic
• Fix boolean operator precedence in filter
• Consolidate GET/POST condition logic
2025-09-26 19:42:33 +08:00
yangdx
3ba06478a8 fix http log message order for streaming respond
- Move aquery_llm call outside generator
- Execute query before stream starts
2025-09-26 19:27:44 +08:00
yangdx
8cd4139cbf refactor: fix double query problem by add aquery_llm function for consistent response handling
- Add new aquery_llm/query_llm methods providing structured responses
- Consolidate /query and /query/stream endpoints to use unified aquery_llm
- Optimize cache handling by moving cache checks before LLM calls
2025-09-26 19:05:03 +08:00
yangdx
b848ca49e6 Fix linting 2025-09-25 16:22:00 +08:00
yangdx
b08b8a6a6a Add reference list support to query API endpoints with unified result handling
• Add include_references param to QueryRequest
• Extend QueryResponse with references field
• Create unified QueryResult data structures
• Refactor kg_query and naive_query functions
• Update streaming to send references first
2025-09-25 16:21:42 +08:00
yangdx
699ca3ba00 Remove deprecated history_turns and ids parameters from query API endpoint
• Update QueryParam documentation
• Mark history_turns as deprecated
• Clean up splash screen display
• Clarify conversation_history usage
2025-09-25 04:58:57 +08:00
yangdx
5eb4a4b799 feat: simplify citations, add reference merging, and restructure API response format 2025-09-24 14:30:10 +08:00
yangdx
2adb8efdc7 Add duplicate document detection and skip processed files in scanning
- Add get_doc_by_file_path to all storages
- Skip processed files in scan operation
- Check duplicates in upload endpoints
- Check duplicates in text insert APIs
- Return status info in duplicate responses
2025-09-23 17:30:54 +08:00
yangdx
26c9ba4cb5 Make graph label methods required in BaseGraphStorage interface
• Remove fallback compatibility code
• Add get_popular_labels to ABC
• Add search_labels to ABC
• Enforce consistent implementation
• Clean up error handling paths
2025-09-20 12:40:36 +08:00
yangdx
9db8f2fce5 feat: Add popular labels and search APIs with history management
- Add popular/search label endpoints
- Implement SearchHistoryManager utility
- Replace client-side with server search
- Add graph data version tracking
- Update UI for better label discovery
2025-09-20 02:03:47 +08:00
yangdx
8f6287e27e Add path traversal security validation for file deletion operations
• Add validate_file_path_security function
• Prevent path traversal attacks
• Validate file paths before deletion
• Check both input and enqueued dirs
• Log security violations
2025-09-17 01:12:44 +08:00
yangdx
c0d5abba6b Fix linting 2025-09-15 02:59:21 +08:00
yangdx
b1c8206346 Add aquery_data endpoint for structured retrieval without LLM generation
- Add QueryDataResponse model
- Implement /query/data endpoint
- Add aquery_data method to LightRAG
- Return entities, relationships, chunks
2025-09-15 02:15:14 +08:00
yangdx
17d665c9f3 Limit history messages to latest 1000 entries with truncation indicator
• Limit history to 1000 latest messages
• Add truncation message when needed
• Show count of truncated messages
• Update API documentation
• Prevent memory issues with large logs
2025-09-05 12:31:36 +08:00
yangdx
25b5d176cd Fix label selection with leading/trailing whitespace
• Fix AsyncSelect value trimming issue
• Preserve whitespace in label display
• Use safe keys for command items
• Add GraphControl dependency fix
• Add debug logging for graph labels
2025-08-31 02:54:39 +08:00
yangdx
3d5e6226a9 Refactored rerank_example file to utilize the updated rerank function. 2025-08-23 22:51:41 +08:00
yangdx
9b7ed84e05 Improve document deletion error handling and message consistency
- Standardize deletion log messages
- Add try-catch for file operations
- Improve enqueued file error handling
2025-08-20 11:01:24 +08:00
yangdx
2603e99005 Enhance file deletion to remove files from both input and enqueued dirs 2025-08-19 17:13:58 +08:00
yangdx
9ed5b93467 Add [File Extraction] prefix to error messages and logs 2025-08-19 11:33:28 +08:00
yangdx
377f1a022e fix: reset PROCESSING/FAILED docs to PENDING at the beginging of document processing pipeline
- Reset documents with PROCESSING/FAILED status to PENDING when they pass consistency checks
- Update doc_status storage and clear error messages/metadata on reset
2025-08-18 00:49:52 +08:00
yangdx
add8b07a21 Improve logging messages for document processing clarity 2025-08-18 00:22:04 +08:00
yangdx
14e083a1a6 fix: replace pyuca with pypinyin for Chinese pinyin sorting and add file_path sort 2025-08-17 15:21:24 +08:00
yangdx
61469c0a56 Add Chinese pinyin sorting support across document operations
• Replace pyuca with centralized utils function
• Add pinyin sort keys for file paths
• Update MongoDB indexes with zh collation
• Migrate existing indexes for compatibility
• Support Chinese chars in Redis/JSON storage
• Keep PostgreSQL sorting order controled by Database Collate order
2025-08-17 12:45:48 +08:00
yangdx
cceb46b320 fix: subdirectories are no longer processed during file scans
• Change rglob to glob for file scanning
• Simplify error logging messages
2025-08-16 23:46:33 +08:00
yangdx
f5b0c3d38c feat: Recording file extraction error status to document pipeline
- Add apipeline_enqueue_error_documents function to LightRAG class for recording file processing errors in doc_status storage
- Enhance pipeline_enqueue_file with detailed error handling for all file processing stages:
  * File access errors (permissions, not found)
  * UTF-8 encoding errors
  * Format-specific processing errors (PDF, DOCX, PPTX, XLSX)
  * Content validation errors
  * Unsupported file type errors

This implementation ensures all file extraction failures are properly tracked and recorded in the doc_status storage system, providing better visibility into document processing issues and enabling improved error monitoring and debugging capabilities.
2025-08-16 23:08:52 +08:00
yangdx
5d00c4c7a8 feat: move processed files to __enqueued__ directory after processing with filename conflicts handling 2025-08-16 13:19:20 +08:00
yangdx
3bba5fc506 Fix linting 2025-08-14 13:03:23 +08:00
yangdx
772f981e7e fix: check and process queued docs even when upload directory is empty 2025-08-14 12:35:39 +08:00
yangdx
fd0ae4646f Fixes crash when processing files with UTF-8 encoding error
- Fix TypeError "cannot unpack non-iterable bool object" in document processing
- Change all error returns from `False` to `(False, "")` for consistency
- Ensure pipeline_enqueue_file always returns tuple (bool, str)
- Add missing return statement for no-content-extracted case
- Improve error handling for UTF-8 encoding issues and unsupported file types
2025-08-14 05:31:38 +08:00
yangdx
c22315ea6d refactor: remove selective LLM cache clearing functionality
- Remove optional 'modes' parameter from aclear_cache() and clear_cache() methods
- Replace deprecated drop_cache_by_modes() with drop() method for complete cache clearing
- Update API endpoint to ignore mode-specific parameters and clear all cache
- Simplify frontend clearCache() function to send empty request body

This change ensures all LLM cache is cleared together.
2025-08-05 23:51:51 +08:00
yangdx
e04d8ed8a7 Improved storage drop logging with namespace details
- Added namespace and workspace to drop logs
2025-08-04 00:56:39 +08:00
yangdx
7505195303 fix: add full_entities and full_relations to clear_documents storage list 2025-08-03 23:02:58 +08:00
yangdx
0eac1a883a Feat: add file path sorting for document manager
- Add file_path sorting support to all database backends (JSON, Redis, PostgreSQL, MongoDB)
- Implement smart column header switching between "ID" and "File Name" based on display mode
- Add automatic sort field switching when toggling between ID and file name display
- Create composite indexes for workspace+file_path in PostgreSQL and MongoDB for better query performance
- Update frontend to maintain sort state when switching display modes
- Add internationalization support for "fileName" in English and Chinese locales

This enhancement improves user experience by providing intuitive file-based sorting
while maintaining performance through optimized database indexes.
2025-07-30 18:46:55 +08:00
yangdx
74eecc46e5 feat(pagination): Implement document list pagination backends and frontend UI
- Add pagination support to BaseDocStatusStorage interface and all implementations (PostgreSQL, MongoDB, Redis, JSON)
- Implement RESTful API endpoints for paginated document queries and status counts
- Create reusable pagination UI components with internationalization support
- Optimize performance with database-level pagination and efficient in-memory processing
- Maintain backward compatibility while adding configurable page sizes (10-200 items)
2025-07-30 17:58:32 +08:00
yangdx
c24c2ff2f6 Remove deprecated temp file saving function
- Delete unused save_temp_file function
2025-07-30 14:23:08 +08:00
yangdx
29e829113b Fix status key serialization issue in get_rack_status 2025-07-30 04:45:48 +08:00
yangdx
7207598fc4 Fix track_id bugs and add track_id to scanning response 2025-07-30 03:06:20 +08:00
yangdx
6f958d5aee feat: add metadata timestamps to document processing and update frontend compatibility
- Add metadata field to doc_status storage with Unix timestamps for processing start/end times
- Update frontend API types: error -> error_msg, add track_id and metadata support
- Add getTrackStatus API method for document tracking functionality
- Fix frontend DocumentManager to use error_msg field for proper error display
- Ensure full compatibility between backend metadata changes and frontend UI
2025-07-30 00:04:27 +08:00
yangdx
6014b9bf73 feat: add track_id support for document processing progress monitoring
- Add get_docs_by_track_id() method to all storage backends (MongoDB, PostgreSQL, Redis, JSON)
- Implement automatic track_id generation with upload_/insert_ prefixes
- Add /track_status/{track_id} API endpoint for frontend progress queries
- Create database indexes for efficient track_id lookups
- Enable real-time document processing status tracking across all storage types
2025-07-29 22:24:21 +08:00
yangdx
f2ffff063b feat: refactor ollama server configuration management
- Add ollama_server_infos attribute to LightRAG class with default initialization
- Move default values to constants.py for centralized configuration
- Refactor OllamaServerInfos class with property accessors and CLI support
- Update OllamaAPI to get configuration through rag object instead of direct import
- Add command line arguments for simulated model name and tag
- Fix type imports to avoid circular dependencies
2025-07-28 01:38:35 +08:00