LightRAG

Author	SHA1	Message	Date
yangdx	19c16bc464	Add content deduplication check for document insertion endpoints • Check content hash before insertion • Return duplicated status if exists • Use sanitized text for hash computation • Apply to both single and batch inserts • Prevent duplicate content processing	2025-12-02 17:49:48 +08:00
yangdx	8d28b95966	Fix duplicate document responses to return original track_id - Return existing track_id for duplicates - Remove track_id generation in reprocess - Update reprocess response documentation - Clarify track_id behavior in comments - Update API response examples	2025-12-02 14:32:28 +08:00
yangdx	b7de694f48	Add comprehensive error logging across API routes - Add error logs to Ollama API endpoints - Replace logging with unified logger - Log streaming query errors - Add data query error logging - Include stack traces for debugging	2025-11-19 22:50:06 +08:00
yangdx	0fb2925c6a	Remove ascii_colors dependency and fix stream handling errors • Remove ascii_colors.trace_exception calls • Add SafeStreamHandler for closed streams • Patch ascii_colors console handler • Prevent ValueError on stream close • Improve logging error handling	2025-11-19 21:38:17 +08:00
yangdx	95cd0ece74	Fix DOCX table extraction by escaping special characters in cells - Add escape_cell() function - Escape backslashes first - Handle tabs and newlines - Preserve tab-delimited format - Prevent double-escaping issues	2025-11-19 09:54:35 +08:00
yangdx	87de2b3e9e	Update XLSX extraction documentation to reflect current implementation	2025-11-19 04:26:41 +08:00
yangdx	0244699d81	Optimize XLSX extraction by using sheet.max_column instead of two-pass scan • Remove two-pass row scanning approach • Use built-in sheet.max_column property • Simplify column width detection logic • Improve memory efficiency • Maintain column alignment preservation	2025-11-19 04:02:39 +08:00
yangdx	2b16016312	Optimize XLSX extraction to avoid storing all rows in memory • Remove intermediate row storage • Use iterator twice instead of list() • Preserve column alignment logic • Reduce memory footprint • Maintain same output format	2025-11-19 03:48:36 +08:00
yangdx	ef659a1e09	Preserve column alignment in XLSX extraction with two-pass processing • Two-pass approach for consistent width • Maintain tabular structure integrity • Determine max columns first pass • Extract with alignment second pass • Prevent column misalignment issues	2025-11-19 03:34:22 +08:00
yangdx	3efb1716b4	Enhance XLSX extraction with structured tab-delimited format and escaping - Add clear sheet separators - Escape special characters - Trim trailing empty columns - Preserve row structure - Single-pass optimization	2025-11-19 03:06:29 +08:00
yangdx	e7d2803a65	Remove text stripping in DOCX extraction to preserve whitespace • Keep original paragraph spacing • Preserve cell whitespace in tables • Maintain document formatting • Don't strip leading/trailing spaces	2025-11-19 02:12:27 +08:00
yangdx	186c8f0e16	Preserve blank paragraphs in DOCX extraction to maintain spacing • Remove text emptiness check • Always append paragraph text • Maintain document formatting • Preserve original spacing	2025-11-19 02:03:10 +08:00
yangdx	fa887d811b	Fix table column structure preservation in DOCX extraction • Always append cell text to maintain columns • Preserve empty cells in table structure • Check for any content before adding rows • Use tab separation for proper alignment • Improve table formatting consistency	2025-11-19 01:52:02 +08:00
yangdx	4438ba41a3	Enhance DOCX extraction to preserve document order with tables • Include tables in extracted content • Maintain original document order • Add spacing around tables • Use tabs to separate table cells • Process all body elements sequentially	2025-11-19 01:31:33 +08:00
yangdx	702cfd2981	Fix document deletion concurrency control and validation logic • Clarify job naming for single vs batch deletion • Update job name validation in busy pipeline check	2025-11-18 13:59:24 +08:00
yangdx	1745b30a5f	Fix missing workspace parameter in update flags status call	2025-11-18 12:55:48 +08:00
yangdx	52c812b9a0	Fix workspace isolation for pipeline status across all operations - Fix final_namespace error in get_namespace_data() - Fix get_workspace_from_request return type - Add workspace param to pipeline status calls	2025-11-17 12:54:33 +08:00
yangdx	926960e957	Refactor workspace handling to use default workspace and namespace locks - Remove DB-specific workspace configs - Add default workspace auto-setting - Replace global locks with namespace locks - Simplify pipeline status management - Remove redundant graph DB locking	2025-11-17 12:54:33 +08:00
yangdx	c246eff725	Improve docling integration with macOS compatibility and CLI flag - Add --docling CLI flag for easier setup - Add numpy version constraints - Exclude docling on macOS (fork-safety)	2025-11-17 12:54:32 +08:00
yangdx	69a0b74ce7	refactor: move document deps to api group, remove dynamic imports - Merge offline-docs into api extras - Remove pipmaster dynamic installs - Add async document processing - Pre-check docling availability - Update offline deployment docs	2025-11-17 12:54:32 +08:00
yangdx	c434879c7a	Replace PyPDF2 with pypdf for PDF processing - Update import from PyPDF2 to pypdf - Change dependency to pypdf>=6.1.0 - Update all requirements files - Remove PyPDF2 from lock file - Use modern pypdf library	2025-11-17 12:54:32 +08:00
BukeLy	eb52ec94d7	feat: Add workspace isolation support for pipeline status Problem: In multi-tenant scenarios, different workspaces share a single global pipeline_status namespace, causing pipelines from different tenants to block each other, severely impacting concurrent processing performance. Solution: - Extended get_namespace_data() to recognize workspace-specific pipeline namespaces with pattern "{workspace}:pipeline" (following GraphDB pattern) - Added workspace parameter to initialize_pipeline_status() for per-tenant isolated pipeline namespaces - Updated all 7 call sites to use workspace-aware locks: * lightrag.py: process_document_queue(), aremove_document() * document_routes.py: background_delete_documents(), clear_documents(), cancel_pipeline(), get_pipeline_status(), delete_documents() Impact: - Different workspaces can process documents concurrently without blocking - Backward compatible: empty workspace defaults to "pipeline_status" - Maintains fail-fast: uninitialized pipeline raises clear error - Expected N× performance improvement for N concurrent tenants Bug fixes: - Fixed AttributeError by using self.workspace instead of self.global_config - Fixed pipeline status endpoint to show workspace-specific status - Fixed delete endpoint to check workspace-specific busy flag Code changes: 4 files, 141 insertions(+), 28 deletions(-) Testing: All syntax checks passed, comprehensive workspace isolation tests completed	2025-11-17 12:53:44 +08:00
anouarbm	c9e1c6c1c2	fix(api): change content field to list in query responses BREAKING CHANGE: content field is now List[str] instead of str - Add ReferenceItem Pydantic model for type safety - Update /query and /query/stream to return content as list - Update OpenAPI schema and examples - Add migration guide to API README - Fix RAGAS evaluation to handle list format Addresses PR #2297 feedback. Tested with RAGAS: 97.37% score.	2025-11-03 04:57:08 +01:00
anouarbm	9d69e8d776	fix(api): Change content field from string to list in query responses BREAKING CHANGE: The `content` field in query response references is now an array of strings instead of a concatenated string. This preserves individual chunk boundaries when a single file has multiple chunks. Changes: - Update QueryResponse Pydantic model to accept List[str] for content - Modify query_text endpoint to return content as list (query_routes.py:425) - Modify query_text_stream endpoint to support chunk content enrichment - Update OpenAPI schema and examples to reflect array structure - Update API README with breaking change notice and migration guide - Fix RAGAS evaluation to flatten chunk content lists	2025-11-03 04:37:09 +01:00
anouarbm	0b5e3f9dc4	Use logger in RAG evaluation and optimize reference content joins	2025-11-02 18:43:53 +01:00
anouarbm	963ad4c637	docs: Add documentation and examples for include_chunk_content parameter Added comprehensive documentation for the new include_chunk_content parameter that enables retrieval of actual chunk text content in API responses. Documentation Updates: - Added "Include Chunk Content in References" section to API README - Explained use cases: RAG evaluation, debugging, citations, transparency - Provided JSON request/response examples - Clarified parameter interaction with include_references OpenAPI/Swagger Examples: - Added "Response with chunk content" example to /query endpoint - Shows complete reference structure with content field - Demonstrates realistic chunk text content This makes the feature discoverable through: 1. API documentation (README.md) 2. Interactive Swagger UI (http://localhost:9621/docs) 3. Code examples for developers	2025-11-02 17:53:27 +01:00
anouarbm	0bbef9814e	Optimize RAGAS evaluation with parallel execution and chunk content enrichment Added efficient RAG evaluation system with optimized API calls and comprehensive benchmarking. Key Features: - Single API call per evaluation (2x faster than before) - Parallel evaluation based on MAX_ASYNC environment variable - Chunk content enrichment in /query endpoint responses - Comprehensive benchmark statistics (averages) - NaN-safe metric calculations API Changes: - Added include_chunk_content parameter to QueryRequest (backward compatible) - /query endpoint enriches references with actual chunk content when requested - No breaking changes - default behavior unchanged Evaluation Improvements: - Parallel execution using asyncio.Semaphore (respects MAX_ASYNC) - Shared HTTP client with connection pooling - Proper timeout handling (3min connect, 5min read) - Debug output for context retrieval verification - Benchmark statistics with averages, min/max scores Results: - Average RAGAS Score: 0.9772 - Perfect Faithfulness: 1.0000 - Perfect Context Recall: 1.0000 - Perfect Context Precision: 1.0000 - Excellent Answer Relevance: 0.9087	2025-11-02 17:39:43 +01:00
yangdx	61b57cbb5d	Add PDF decryption support for password-protected files • Add PDF_DECRYPT_PASSWORD env variable • Check encryption status before reading • Handle decrypt errors gracefully • Log detailed error messages • Support both encrypted/plain PDFs	2025-11-01 15:01:17 +08:00
yangdx	c46c1b26a9	Add pycryptodome dependency for PDF encryption support	2025-10-31 01:49:42 +08:00
yangdx	5155edd8d2	feat: Improve entity merge and edit UX - API: The `graph/entity/edit` endpoint now returns a detailed `operation_summary` for better client-side handling of update, rename, and merge outcomes. - Web UI: Added an "auto-merge on rename" option. The UI now gracefully handles merge success, partial failures (update OK, merge fail), and other errors with specific user feedback.	2025-10-27 23:42:08 +08:00
yangdx	97034f06e3	Add allow_merge parameter to entity update API endpoint	2025-10-27 14:30:27 +08:00
yangdx	6015e8bc68	Refactor graph utils to use unified persistence callback - Add _persist_graph_updates function - Remove duplicate callback functions	2025-10-26 20:20:16 +08:00
yangdx	bf1897a67e	Normalize entity order for undirected graph consistency • Normalize entity pairs for storage • Update API docs for undirected edges	2025-10-26 15:53:31 +08:00
Daniel.y	c82485d94d	Merge pull request #2253 from Mobious/main Allow users to provide keywords with QueryRequest	2025-10-25 11:26:54 +08:00
yangdx	78ad8873b8	Add cancellation check in delete loop	2025-10-24 14:47:20 +08:00
yangdx	743aefc655	Add pipeline cancellation feature for graceful processing termination • Add cancel_pipeline API endpoint • Implement PipelineCancelledException • Add cancellation checks in main loop • Handle task cancellation gracefully • Mark cancelled docs as FAILED	2025-10-24 14:08:12 +08:00
Mobious	f24a261613	Allow users to provide keywords with QueryRequest	2025-10-23 12:53:19 -10:00
yangdx	8dc23eeff2	Fix RAGAnything compatibility problem • Use "preprocessed" to indicate multimodal processing is required • Update DocProcessingStatus to perform status conversion automatically • Remove multimodal_processed from DocStatus enum value • Update UI filter logic	2025-10-22 20:15:29 +08:00
yangdx	162370b6e6	Add optional LLM cache deletion when deleting documents • Add delete_llm_cache parameter to API • Collect cache IDs from text chunks • Delete cache after graph operations • Update UI with new checkbox option • Add i18n translations for cache option	2025-10-22 12:19:23 +08:00
yangdx	dc62c78f98	Add entity/relation chunk tracking with configurable source ID limits - Add entity_chunks & relation_chunks storage - Implement KEEP/FIFO limit strategies - Update env.example with new settings - Add migration for chunk tracking data - Support all KV storage	2025-10-20 15:24:15 +08:00
yangdx	130b4959dc	Add PREPROCESSED (multimodal_processed) status for multimodal document processing • Add DocStatus.PREPROCESSED enum value • Update API routes and response models • Add preprocessed filter in web UI • Update localization files • Handle preprocessed status in deletion	2025-10-14 14:02:05 +08:00
yangdx	12facac506	Enhance graph API endpoints with detailed docs and field validation - Remove redundant README section - Add Pydantic field validation - Expand endpoint docstrings - Include request/response examples - Document merge operation benefits	2025-10-10 12:49:00 +08:00
yangdx	85d1a563b3	Merge branch 'adminunblinded/main'	2025-10-10 12:31:47 +08:00
NeelM0906	b7c77396a0	Fix entity/relation creation endpoints to properly update vector stores - Changed create_entity to use rag.acreate_entity() instead of direct graph manipulation - Changed create_relation to use rag.acreate_relation() instead of direct graph manipulation - This ensures vector embeddings are created and entities/relations are searchable - Adds proper concurrency locks and metadata population	2025-10-09 17:02:17 -04:00
NeelM0906	f6d1fb98ac	Fix Linting errors	2025-10-09 16:52:22 -04:00
NeelM0906	9f44e89de7	Add knowledge graph manipulation endpoints Added three new REST API endpoints for direct knowledge graph manipulation: - POST /graph/entity/create: Create new entities in the knowledge graph - POST /graph/relation/create: Create relationships between entities - POST /graph/entities/merge: Merge duplicate/misspelled entities while preserving relationships The merge endpoint is particularly useful for consolidating entities discovered after document processing, fixing spelling errors, and cleaning up the knowledge graph. All relationships from source entities are transferred to the target entity, with intelligent handling of duplicate relationships. Updated API documentation in lightrag/api/README.md with usage examples for all three endpoints.	2025-10-08 15:59:47 -04:00
Jon	cf2a024e37	feat: Add endpoint and UI to retry failed documents Add a new `/documents/reprocess_failed` API endpoint and corresponding UI button to retry processing of failed and pending documents. This addresses a common recovery scenario when document processing fails due to server crashes, network errors, or LLM service outages. Backend changes: - Add ReprocessResponse model with status, message, and track_id fields - Add POST /documents/reprocess_failed endpoint that triggers background reprocessing of FAILED, PENDING, and interrupted PROCESSING documents - Reuses existing apipeline_process_enqueue_documents for consistency - Includes comprehensive docstring and logging for observability Frontend changes: - Add TypeScript types and API function for the new endpoint - Add retry handler with intelligent polling (fast refresh → normal) - Add "Retry Failed" button in Documents page toolbar - Button disabled when pipeline is busy to prevent duplicate operations - Complete i18n support (English and Chinese translations) This feature provides a convenient way to recover from processing failures without requiring a full filesystem rescan.	2025-10-04 16:46:29 -04:00
yangdx	83d99e1424	fix(OllamaAPI): Add validation to ensure last message is from user role • Validate last message role is "user" • Raise 400 error for invalid role • Improve API request validation • Prevent invalid message sequences	2025-10-01 20:48:37 +08:00
yangdx	df43afc89b	Relax conversation history role validation requirements • Remove strict role value checking • Allow any non-empty string roles	2025-09-29 13:10:15 +08:00
yangdx	7cba458f22	Limit deprecated documents endpoint to 1000 records with fair distribution	2025-09-28 11:18:10 +08:00


239 commits
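Commits 19c16bc464 and 8d28b95966 describe content deduplication on insert: hash the sanitized text, and if that hash already exists, return a `duplicated` status carrying the original `track_id` (for single and batch inserts alike). The sketch below illustrates that flow only. The `sanitize_text` rules, the MD5 digest with a `doc-` prefix, the in-memory `DocStore`, the `success` status and the `track_id` format are all assumptions, not LightRAG's code.

```python
import hashlib
import uuid
from dataclasses import dataclass, field


def sanitize_text(text: str) -> str:
    # Hypothetical sanitizer: drop NUL bytes and surrounding whitespace.
    return text.replace("\x00", "").strip()


def content_hash(text: str) -> str:
    # Hash the sanitized text, so whitespace-only differences still match.
    digest = hashlib.md5(sanitize_text(text).encode("utf-8")).hexdigest()
    return "doc-" + digest  # the "doc-" prefix is an assumption


@dataclass
class DocStore:
    # content hash -> track_id of the first insertion (hypothetical in-memory store)
    track_ids: dict = field(default_factory=dict)

    def insert(self, text: str) -> dict:
        doc_id = content_hash(text)
        if doc_id in self.track_ids:
            # Duplicate: skip processing and return the ORIGINAL track_id.
            return {"status": "duplicated", "track_id": self.track_ids[doc_id]}
        track_id = "insert_" + uuid.uuid4().hex[:8]  # hypothetical ID format
        self.track_ids[doc_id] = track_id
        return {"status": "success", "track_id": track_id}

    def insert_batch(self, texts: list) -> list:
        # Batch inserts apply the same check per item, including repeats
        # inside the batch itself.
        return [self.insert(t) for t in texts]
```

Returning the original `track_id` lets a client that re-uploads the same file keep polling the job it already knows about instead of receiving an ID that never progresses.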
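Commit 95cd0ece74 describes an `escape_cell()` helper for DOCX tables that escapes backslashes first and then tabs and newlines, so the tab-delimited layout survives. A minimal sketch of that ordering follows; the exact escape sequences and the handling of `\r` and `None` are assumptions, and `format_row` is a hypothetical helper. Escaping backslashes first is what prevents double-escaping: a cell that already contains the two characters `\t` becomes `\\t`, which stays distinguishable from an escaped real tab.

```python
def escape_cell(value) -> str:
    """Escape one table cell so it stays on a single tab-delimited line."""
    if value is None:
        return ""
    text = str(value)
    text = text.replace("\\", "\\\\")  # 1. backslashes first, to avoid double-escaping
    text = text.replace("\t", "\\t")   # 2. a real tab would start a new column
    text = text.replace("\r", "\\r")   # 3. carriage returns (assumed handling)
    text = text.replace("\n", "\\n")   # 4. a real newline would start a new row
    return text


def format_row(cells) -> str:
    # After escaping, a raw tab can only mean "next column".
    return "\t".join(escape_cell(c) for c in cells)
```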
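The XLSX commits from 3efb1716b4 to 0244699d81 converge on a tab-delimited format with sheet separators and escaped cells, where every row is padded to the same width taken from `sheet.max_column` rather than from a first pass over the rows, and the rows are never collected into a list. The pure function below sketches that single-pass approach under stated assumptions: the separator line format is hypothetical, and the escaping mirrors the DOCX cell escaping.

```python
from typing import Iterable, Sequence


def escape_cell(value) -> str:
    # Backslashes first, then tab/CR/newline, as in the DOCX cell escaping.
    if value is None:
        return ""
    return (str(value).replace("\\", "\\\\").replace("\t", "\\t")
            .replace("\r", "\\r").replace("\n", "\\n"))


def format_sheet(name: str, rows: Iterable[Sequence], max_column: int) -> str:
    """Render one worksheet as tab-delimited text in a single pass over `rows`.

    `rows` may be a lazy iterator such as openpyxl's
    ws.iter_rows(values_only=True); `max_column` supplies the table width up
    front, so no first pass over the rows is needed to find it.
    """
    lines = [f"=== Sheet: {name} ==="]  # hypothetical separator format
    for row in rows:
        cells = [escape_cell(v) for v in row[:max_column]]
        cells += [""] * (max_column - len(cells))  # pad short rows to full width
        lines.append("\t".join(cells))
    return "\n".join(lines)
```

With openpyxl the call would look like `format_sheet(ws.title, ws.iter_rows(values_only=True), ws.max_column)` for each `ws` in `workbook.worksheets`. Padding empty cells, rather than dropping them, keeps column N of every row under the same header.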
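Commit eb52ec94d7 (with follow-ups 926960e957 and 52c812b9a0) scopes pipeline status per workspace: the namespace becomes `"{workspace}:pipeline"`, an empty workspace falls back to `"pipeline_status"`, an uninitialized namespace fails fast, and locks are per namespace rather than global. The sketch below shows why that lets tenants run concurrently; `NamespaceRegistry` and `process` are hypothetical names, not LightRAG's API.

```python
import asyncio
from collections import defaultdict


def pipeline_namespace(workspace: str) -> str:
    # Per-tenant key following the "{workspace}:pipeline" pattern; an empty
    # workspace falls back to the legacy global "pipeline_status" namespace.
    return f"{workspace}:pipeline" if workspace else "pipeline_status"


class NamespaceRegistry:
    """Hypothetical per-namespace status dicts and locks (not LightRAG's API)."""

    def __init__(self) -> None:
        self._data: dict = {}
        self._locks = defaultdict(asyncio.Lock)  # one lock per namespace key

    def initialize(self, workspace: str = "") -> None:
        self._data.setdefault(pipeline_namespace(workspace), {"busy": False})

    def get_namespace_data(self, workspace: str = "") -> dict:
        ns = pipeline_namespace(workspace)
        if ns not in self._data:
            # Fail fast instead of silently creating state.
            raise KeyError(f"pipeline namespace {ns!r} is not initialized")
        return self._data[ns]

    def lock(self, workspace: str = "") -> asyncio.Lock:
        return self._locks[pipeline_namespace(workspace)]


async def process(reg: NamespaceRegistry, workspace: str, log: list) -> None:
    async with reg.lock(workspace):  # blocks only jobs in the same workspace
        status = reg.get_namespace_data(workspace)
        status["busy"] = True
        log.append(f"start {workspace}")
        await asyncio.sleep(0.01)  # stand-in for document processing
        log.append(f"end {workspace}")
        status["busy"] = False
```

Two jobs in workspaces `a` and `b` interleave (`start a`, `start b`, ...), while two jobs in the same workspace run strictly one after the other.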
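Commit 0bbef9814e runs RAGAS evaluations in parallel, bounded by `asyncio.Semaphore` sized from `MAX_ASYNC`, and computes NaN-safe averages. A minimal sketch of that pattern follows. `evaluate_one` is a hypothetical stand-in for the HTTP query plus scoring, and the fallback of 4 when `MAX_ASYNC` is unset is an assumption.

```python
import asyncio
import math
import os


def max_async_from_env(default: int = 4) -> int:
    # MAX_ASYNC is named in the commit; the fallback value is an assumption.
    return int(os.getenv("MAX_ASYNC", str(default)))


async def evaluate_one(case: dict) -> float:
    """Hypothetical stand-in for one /query call plus RAGAS scoring."""
    await asyncio.sleep(0.01)
    return case["score"]


async def evaluate_all(cases: list, max_async: int) -> tuple:
    sem = asyncio.Semaphore(max_async)  # at most max_async evaluations in flight
    active = peak = 0

    async def run(case: dict) -> float:
        nonlocal active, peak
        async with sem:
            active += 1
            peak = max(peak, active)
            try:
                return await evaluate_one(case)
            finally:
                active -= 1

    # gather returns results in input order, not completion order.
    scores = await asyncio.gather(*(run(c) for c in cases))
    return list(scores), peak


def nan_safe_mean(values: list) -> float:
    # Skip NaN metrics (e.g. a failed judgement) instead of poisoning the mean.
    clean = [v for v in values if not math.isnan(v)]
    return sum(clean) / len(clean) if clean else float("nan")
```

Usage: `scores, peak = asyncio.run(evaluate_all(cases, max_async_from_env()))`. Without the NaN filter, a single failed metric would turn the whole average into NaN.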
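Commit bf1897a67e normalizes entity pairs so that an undirected relation is stored under one key regardless of direction. The sketch below assumes a lexicographic ordering rule; the commit does not state which canonical order is used, and `merge_edges` is a hypothetical helper.

```python
def normalize_edge(src: str, tgt: str) -> tuple:
    # (A, B) and (B, A) name the same undirected edge; order them canonically.
    return (src, tgt) if src <= tgt else (tgt, src)


def merge_edges(edges) -> dict:
    """Group (src, tgt, description) triples under their canonical edge key."""
    merged = {}
    for src, tgt, desc in edges:
        merged.setdefault(normalize_edge(src, tgt), []).append(desc)
    return merged
```

Without normalization, `("Bob", "Alice")` and `("Alice", "Bob")` would be stored as two separate relations describing the same connection.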
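Commit dc62c78f98 adds configurable caps on the chunk (source) IDs tracked per entity and relation, with KEEP and FIFO strategies. The log does not define them, so the sketch below assumes KEEP retains the existing IDs and drops overflow from new ones, while FIFO evicts the oldest IDs to make room. The function name is hypothetical, and the corresponding env.example settings are not reproduced here.

```python
from enum import Enum


class LimitMethod(str, Enum):
    KEEP = "KEEP"  # assumed: once full, keep existing IDs and drop new ones
    FIFO = "FIFO"  # assumed: once full, evict the oldest IDs first


def apply_source_id_limit(existing: list, new: list, limit: int, method) -> list:
    """Merge new chunk IDs into an entity's source list and cap its length."""
    if limit <= 0:
        return []
    merged = list(existing)
    for chunk_id in new:
        if chunk_id not in merged:  # keep insertion order, skip repeats
            merged.append(chunk_id)
    if len(merged) <= limit:
        return merged
    if LimitMethod(method) is LimitMethod.KEEP:
        return merged[:limit]  # oldest IDs win
    return merged[-limit:]     # newest IDs win
```

Accepting either the enum member or its string value (`"KEEP"`, `"FIFO"`) makes the function easy to drive from an environment variable.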
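Commits 9d69e8d776 and c9e1c6c1c2 change each reference's `content` field in query responses from a concatenated string to a list of strings, one per chunk. Client code written against the old format can flatten the list as sketched below. The helper name and the blank-line separator are assumptions; accepting a plain string as well keeps the helper usable with responses produced before the change.

```python
from typing import List, Optional, Union


def reference_text(content: Optional[Union[str, List[str]]], sep: str = "\n\n") -> str:
    """Flatten a reference's `content` into one string for older client code."""
    if content is None:  # field absent or null
        return ""
    if isinstance(content, str):  # response produced before the change
        return content
    return sep.join(content)  # one list entry per chunk
```

Keep the list form wherever chunk boundaries matter (for example per-chunk citations), and flatten only at the point where a single string is required.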
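Commit 83d99e1424 makes the Ollama-compatible API reject requests whose last message is not from the `user` role, answering with HTTP 400. The framework-free sketch below shows the check. `BadRequest` is a hypothetical stand-in; in a FastAPI route the equivalent would be raising `fastapi.HTTPException(status_code=400, detail=...)`. The empty-list check is an added assumption.

```python
class BadRequest(ValueError):
    """Hypothetical stand-in for an HTTP 400 response."""
    status_code = 400


def last_user_prompt(messages: list) -> str:
    """Return the prompt to answer, rejecting sequences that do not end with a user turn."""
    if not messages:
        raise BadRequest("messages must not be empty")
    last = messages[-1]
    if last.get("role") != "user":
        raise BadRequest(f"last message must have role 'user', got {last.get('role')!r}")
    return last.get("content", "")
```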