LightRAG

Author	SHA1	Message	Date
clssck	663ada943a	chore: add citation system and enhance RAG UI components Add citation tracking and display system across backend and frontend components. Backend changes include citation.py for document attribution, enhanced query routes with citation metadata, improved prompt templates, and PostgreSQL schema updates. Frontend includes CitationMarker component, HoverCard UI, QuerySettings refinements, and ChatMessage enhancements for displaying document sources. Update dependencies and docker-compose test configuration for improved development workflow.	2025-12-01 17:50:00 +01:00
clssck	43af31f888	feat: add db_degree visibility and orphan connection UI Graph Connectivity Awareness: - Add db_degree property to all KG implementations (NetworkX, Postgres, Neo4j, Mongo, Memgraph) - Show database degree vs visual degree in node panel with amber badge - Add visual indicator (amber border) for nodes with hidden connections - Add "Load X hidden connection(s)" button to expand hidden neighbors - Add configurable "Expand Depth" setting (1-5) in graph settings - Use global maxNodes setting for node expansion consistency Orphan Connection UI: - Add OrphanConnectionDialog component for manual orphan entity connection - Add OrphanConnectionControl button in graph sidebar - Expose /graph/orphans/connect API endpoint for frontend use Backend Improvements: - Add get_orphan_entities() and connect_orphan_entities() to base storage - Add orphan connection configuration parameters - Improve entity extraction with relationship density requirements Frontend: - Add graphExpandDepth and graphIncludeOrphans to settings store - Add min_degree and include_orphans graph filtering parameters - Update translations (en.json, zh.json)	2025-11-29 21:08:07 +01:00
clssck	d2c9e6e2ec	test(lightrag): add orphan connection feature with quality validation tests Implement automatic orphan entity connection system that identifies entities with no relationships and creates meaningful connections via vector similarity + LLM validation. This improves knowledge graph connectivity and retrieval quality. Changes: - Add orphan connection configuration parameters (thresholds, cross-connect settings) - Implement aconnect_orphan_entities() method with 4-step validation pipeline - Add SQL templates for efficient orphan and candidate entity queries - Create POST /graph/orphans/connect API endpoint with configurable parameters - Add orphan connection validation prompt for LLM-based relationship verification - Include relationship density requirement in extraction prompts to prevent orphans - Update docker-compose.test.yml with optimized extraction parameters - Add quality validation test suite (run_quality_tests.py) for retrieval evaluation - Add unit test framework (test_orphan_connection_quality.py) with test cases - Enable auto-run of orphan connection after document processing	2025-11-28 18:23:30 +01:00
clssck	b6074b9a81	chore(lightrag, lightrag_webui): improve code quality and security - Extract PostgreSQL storage check into named variable for clarity - Move APIRouter initialization into create_table_routes function scope - Add robust type handling for database query results - Add input validation for table names and pagination parameters - Add regex-based SQL injection prevention for table name sanitization - Improve clipboard copy fallback logic and error handling - Add memoization for JSON serialization to prevent unnecessary recalculations - Hide meta column from table explorer UI display - Sort table columns alphabetically for consistent ordering - Add keyboard accessibility to status filter buttons - Add preprocessed status filter to document manager - Update @tanstack/react-query from 5.60.0 to 5.87.1 - Extract dev storage config into constant to reduce duplication - Update documentation comments for clarity	2025-11-27 21:39:42 +01:00
clssck	a9edadef45	feat: add Table Explorer feature with dynamic table data fetching and schema display - Implemented Table Explorer component to allow users to select and view database tables. - Added API calls for fetching table list, schema, and paginated data. - Introduced row detail modal for displaying and copying row data. - Enhanced DataTable component to support row click events. - Updated UI components for better user experience and accessibility. - Added mock data for development mode to facilitate testing. - Updated localization files to include new terms related to tables. - Modified settings store to include storage configuration for conditional UI rendering. - Improved styling and layout for various components to align with new design standards.	2025-11-27 18:27:14 +01:00
clssck	48c7732edc	feat: add automatic entity resolution with 3-layer matching Implement automatic entity resolution to prevent duplicate nodes in the knowledge graph. The system uses a 3-layer approach: 1. Case-insensitive exact matching (free, instant) 2. Fuzzy string matching >85% threshold (free, instant) 3. Vector similarity + LLM verification (for acronyms/synonyms) Key features: - Pre-resolution phase prevents race conditions in parallel processing - Numeric suffix detection blocks false matches (IL-4 ≠ IL-13) - PostgreSQL alias cache for fast lookups on subsequent ingestion - Configurable thresholds via environment variables Bug fixes included: - Fix fuzzy matching false positives for numbered entities - Fix alias cache not being populated (missing db parameter) - Skip entity_aliases table from generic id index creation New files: - lightrag/entity_resolution/ - Core resolution module - tests/test_entity_resolution/ - Unit tests - docker/postgres-age-vector/ - Custom PG image with pgvector + AGE - docker-compose.test.yml - Integration test environment Configuration (env.example): - ENTITY_RESOLUTION_ENABLED=true - ENTITY_RESOLUTION_FUZZY_THRESHOLD=0.85 - ENTITY_RESOLUTION_VECTOR_THRESHOLD=0.5 - ENTITY_RESOLUTION_MAX_CANDIDATES=3	2025-11-27 15:35:02 +01:00
yangdx	b7de694f48	Add comprehensive error logging across API routes - Add error logs to Ollama API endpoints - Replace logging with unified logger - Log streaming query errors - Add data query error logging - Include stack traces for debugging	2025-11-19 22:50:06 +08:00
yangdx	0fb2925c6a	Remove ascii_colors dependency and fix stream handling errors • Remove ascii_colors.trace_exception calls • Add SafeStreamHandler for closed streams • Patch ascii_colors console handler • Prevent ValueError on stream close • Improve logging error handling	2025-11-19 21:38:17 +08:00
yangdx	95cd0ece74	Fix DOCX table extraction by escaping special characters in cells - Add escape_cell() function - Escape backslashes first - Handle tabs and newlines - Preserve tab-delimited format - Prevent double-escaping issues	2025-11-19 09:54:35 +08:00
yangdx	87de2b3e9e	Update XLSX extraction documentation to reflect current implementation	2025-11-19 04:26:41 +08:00
yangdx	0244699d81	Optimize XLSX extraction by using sheet.max_column instead of two-pass scan • Remove two-pass row scanning approach • Use built-in sheet.max_column property • Simplify column width detection logic • Improve memory efficiency • Maintain column alignment preservation	2025-11-19 04:02:39 +08:00
yangdx	2b16016312	Optimize XLSX extraction to avoid storing all rows in memory • Remove intermediate row storage • Use iterator twice instead of list() • Preserve column alignment logic • Reduce memory footprint • Maintain same output format	2025-11-19 03:48:36 +08:00
yangdx	ef659a1e09	Preserve column alignment in XLSX extraction with two-pass processing • Two-pass approach for consistent width • Maintain tabular structure integrity • Determine max columns first pass • Extract with alignment second pass • Prevent column misalignment issues	2025-11-19 03:34:22 +08:00
yangdx	3efb1716b4	Enhance XLSX extraction with structured tab-delimited format and escaping - Add clear sheet separators - Escape special characters - Trim trailing empty columns - Preserve row structure - Single-pass optimization	2025-11-19 03:06:29 +08:00
yangdx	e7d2803a65	Remove text stripping in DOCX extraction to preserve whitespace • Keep original paragraph spacing • Preserve cell whitespace in tables • Maintain document formatting • Don't strip leading/trailing spaces	2025-11-19 02:12:27 +08:00
yangdx	186c8f0e16	Preserve blank paragraphs in DOCX extraction to maintain spacing • Remove text emptiness check • Always append paragraph text • Maintain document formatting • Preserve original spacing	2025-11-19 02:03:10 +08:00
yangdx	fa887d811b	Fix table column structure preservation in DOCX extraction • Always append cell text to maintain columns • Preserve empty cells in table structure • Check for any content before adding rows • Use tab separation for proper alignment • Improve table formatting consistency	2025-11-19 01:52:02 +08:00
yangdx	4438ba41a3	Enhance DOCX extraction to preserve document order with tables • Include tables in extracted content • Maintain original document order • Add spacing around tables • Use tabs to separate table cells • Process all body elements sequentially	2025-11-19 01:31:33 +08:00
yangdx	702cfd2981	Fix document deletion concurrency control and validation logic • Clarify job naming for single vs batch deletion • Update job name validation in busy pipeline check	2025-11-18 13:59:24 +08:00
yangdx	1745b30a5f	Fix missing workspace parameter in update flags status call	2025-11-18 12:55:48 +08:00
yangdx	52c812b9a0	Fix workspace isolation for pipeline status across all operations - Fix final_namespace error in get_namespace_data() - Fix get_workspace_from_request return type - Add workspace param to pipeline status calls	2025-11-17 12:54:33 +08:00
yangdx	926960e957	Refactor workspace handling to use default workspace and namespace locks - Remove DB-specific workspace configs - Add default workspace auto-setting - Replace global locks with namespace locks - Simplify pipeline status management - Remove redundant graph DB locking	2025-11-17 12:54:33 +08:00
yangdx	c246eff725	Improve docling integration with macOS compatibility and CLI flag - Add --docling CLI flag for easier setup - Add numpy version constraints - Exclude docling on macOS (fork-safety)	2025-11-17 12:54:32 +08:00
yangdx	69a0b74ce7	refactor: move document deps to api group, remove dynamic imports - Merge offline-docs into api extras - Remove pipmaster dynamic installs - Add async document processing - Pre-check docling availability - Update offline deployment docs	2025-11-17 12:54:32 +08:00
yangdx	c434879c7a	Replace PyPDF2 with pypdf for PDF processing - Update import from PyPDF2 to pypdf - Change dependency to pypdf>=6.1.0 - Update all requirements files - Remove PyPDF2 from lock file - Use modern pypdf library	2025-11-17 12:54:32 +08:00
BukeLy	eb52ec94d7	feat: Add workspace isolation support for pipeline status Problem: In multi-tenant scenarios, different workspaces share a single global pipeline_status namespace, causing pipelines from different tenants to block each other, severely impacting concurrent processing performance. Solution: - Extended get_namespace_data() to recognize workspace-specific pipeline namespaces with pattern "{workspace}:pipeline" (following GraphDB pattern) - Added workspace parameter to initialize_pipeline_status() for per-tenant isolated pipeline namespaces - Updated all 7 call sites to use workspace-aware locks: * lightrag.py: process_document_queue(), aremove_document() * document_routes.py: background_delete_documents(), clear_documents(), cancel_pipeline(), get_pipeline_status(), delete_documents() Impact: - Different workspaces can process documents concurrently without blocking - Backward compatible: empty workspace defaults to "pipeline_status" - Maintains fail-fast: uninitialized pipeline raises clear error - Expected N× performance improvement for N concurrent tenants Bug fixes: - Fixed AttributeError by using self.workspace instead of self.global_config - Fixed pipeline status endpoint to show workspace-specific status - Fixed delete endpoint to check workspace-specific busy flag Code changes: 4 files, 141 insertions(+), 28 deletions(-) Testing: All syntax checks passed, comprehensive workspace isolation tests completed	2025-11-17 12:53:44 +08:00
anouarbm	c9e1c6c1c2	fix(api): change content field to list in query responses BREAKING CHANGE: content field is now List[str] instead of str - Add ReferenceItem Pydantic model for type safety - Update /query and /query/stream to return content as list - Update OpenAPI schema and examples - Add migration guide to API README - Fix RAGAS evaluation to handle list format Addresses PR #2297 feedback. Tested with RAGAS: 97.37% score.	2025-11-03 04:57:08 +01:00
anouarbm	9d69e8d776	fix(api): Change content field from string to list in query responses BREAKING CHANGE: The `content` field in query response references is now an array of strings instead of a concatenated string. This preserves individual chunk boundaries when a single file has multiple chunks. Changes: - Update QueryResponse Pydantic model to accept List[str] for content - Modify query_text endpoint to return content as list (query_routes.py:425) - Modify query_text_stream endpoint to support chunk content enrichment - Update OpenAPI schema and examples to reflect array structure - Update API README with breaking change notice and migration guide - Fix RAGAS evaluation to flatten chunk content lists	2025-11-03 04:37:09 +01:00
anouarbm	0b5e3f9dc4	Use logger in RAG evaluation and optimize reference content joins	2025-11-02 18:43:53 +01:00
anouarbm	963ad4c637	docs: Add documentation and examples for include_chunk_content parameter Added comprehensive documentation for the new include_chunk_content parameter that enables retrieval of actual chunk text content in API responses. Documentation Updates: - Added "Include Chunk Content in References" section to API README - Explained use cases: RAG evaluation, debugging, citations, transparency - Provided JSON request/response examples - Clarified parameter interaction with include_references OpenAPI/Swagger Examples: - Added "Response with chunk content" example to /query endpoint - Shows complete reference structure with content field - Demonstrates realistic chunk text content This makes the feature discoverable through: 1. API documentation (README.md) 2. Interactive Swagger UI (http://localhost:9621/docs) 3. Code examples for developers	2025-11-02 17:53:27 +01:00
anouarbm	0bbef9814e	Optimize RAGAS evaluation with parallel execution and chunk content enrichment Added efficient RAG evaluation system with optimized API calls and comprehensive benchmarking. Key Features: - Single API call per evaluation (2x faster than before) - Parallel evaluation based on MAX_ASYNC environment variable - Chunk content enrichment in /query endpoint responses - Comprehensive benchmark statistics (moyennes) - NaN-safe metric calculations API Changes: - Added include_chunk_content parameter to QueryRequest (backward compatible) - /query endpoint enriches references with actual chunk content when requested - No breaking changes - default behavior unchanged Evaluation Improvements: - Parallel execution using asyncio.Semaphore (respects MAX_ASYNC) - Shared HTTP client with connection pooling - Proper timeout handling (3min connect, 5min read) - Debug output for context retrieval verification - Benchmark statistics with averages, min/max scores Results: - Moyenne RAGAS Score: 0.9772 - Perfect Faithfulness: 1.0000 - Perfect Context Recall: 1.0000 - Perfect Context Precision: 1.0000 - Excellent Answer Relevance: 0.9087	2025-11-02 17:39:43 +01:00
yangdx	61b57cbb5d	Add PDF decryption support for password-protected files • Add PDF_DECRYPT_PASSWORD env variable • Check encryption status before reading • Handle decrypt errors gracefully • Log detailed error messages • Support both encrypted/plain PDFs	2025-11-01 15:01:17 +08:00
yangdx	c46c1b26a9	Add pycryptodome dependency for PDF encryption support	2025-10-31 01:49:42 +08:00
yangdx	5155edd8d2	feat: Improve entity merge and edit UX - API: The `graph/entity/edit` endpoint now returns a detailed `operation_summary` for better client-side handling of update, rename, and merge outcomes. - Web UI: Added an "auto-merge on rename" option. The UI now gracefully handles merge success, partial failures (update OK, merge fail), and other errors with specific user feedback.	2025-10-27 23:42:08 +08:00
yangdx	97034f06e3	Add allow_merge parameter to entity update API endpoint	2025-10-27 14:30:27 +08:00
yangdx	6015e8bc68	Refactor graph utils to use unified persistence callback - Add _persist_graph_updates function - Remove duplicate callback functions	2025-10-26 20:20:16 +08:00
yangdx	bf1897a67e	Normalize entity order for undirected graph consistency • Normalize entity pairs for storage • Update API docs for undirected edges	2025-10-26 15:53:31 +08:00
Daniel.y	c82485d94d	Merge pull request #2253 from Mobious/main Allow users to provide keywords with QueryRequest	2025-10-25 11:26:54 +08:00
yangdx	78ad8873b8	Add cancellation check in delete loop	2025-10-24 14:47:20 +08:00
yangdx	743aefc655	Add pipeline cancellation feature for graceful processing termination • Add cancel_pipeline API endpoint • Implement PipelineCancelledException • Add cancellation checks in main loop • Handle task cancellation gracefully • Mark cancelled docs as FAILED	2025-10-24 14:08:12 +08:00
Mobious	f24a261613	Allow users to provide keywords with QueryRequest	2025-10-23 12:53:19 -10:00
yangdx	8dc23eeff2	Fix RayAnything compatible problem • Use "preprocessed" to indicate multimodal processing is required • Update DocProcessingStatus to process status convertion automatically • Remove multimodal_processed from DocStatus enum value • Update UI filter logic	2025-10-22 20:15:29 +08:00
yangdx	162370b6e6	Add optional LLM cache deletion when deleting documents • Add delete_llm_cache parameter to API • Collect cache IDs from text chunks • Delete cache after graph operations • Update UI with new checkbox option • Add i18n translations for cache option	2025-10-22 12:19:23 +08:00
yangdx	dc62c78f98	Add entity/relation chunk tracking with configurable source ID limits - Add entity_chunks & relation_chunks storage - Implement KEEP/FIFO limit strategies - Update env.example with new settings - Add migration for chunk tracking data - Support all KV storage	2025-10-20 15:24:15 +08:00
yangdx	130b4959dc	Add PREPROCESSED (multimodal_processed) status for multimodal document processing • Add DocStatus.PREPROCESSED enum value • Update API routes and response models • Add preprocessed filter in web UI • Update localization files • Handle preprocessed status in deletion	2025-10-14 14:02:05 +08:00
yangdx	12facac506	Enhance graph API endpoints with detailed docs and field validation - Remove redundant README section - Add Pydantic field validation - Expand endpoint docstrings - Include request/response examples - Document merge operation benefits	2025-10-10 12:49:00 +08:00
yangdx	85d1a563b3	Merge branch 'adminunblinded/main'	2025-10-10 12:31:47 +08:00
NeelM0906	b7c77396a0	Fix entity/relation creation endpoints to properly update vector stores - Changed create_entity to use rag.acreate_entity() instead of direct graph manipulation - Changed create_relation to use rag.acreate_relation() instead of direct graph manipulation - This ensures vector embeddings are created and entities/relations are searchable - Adds proper concurrency locks and metadata population	2025-10-09 17:02:17 -04:00
NeelM0906	f6d1fb98ac	Fix Linting errors	2025-10-09 16:52:22 -04:00
NeelM0906	9f44e89de7	Add knowledge graph manipulation endpoints Added three new REST API endpoints for direct knowledge graph manipulation: - POST /graph/entity/create: Create new entities in the knowledge graph - POST /graph/relation/create: Create relationships between entities - POST /graph/entities/merge: Merge duplicate/misspelled entities while preserving relationships The merge endpoint is particularly useful for consolidating entities discovered after document processing, fixing spelling errors, and cleaning up the knowledge graph. All relationships from source entities are transferred to the target entity, with intelligent handling of duplicate relationships. Updated API documentation in lightrag/api/README.md with usage examples for all three endpoints.	2025-10-08 15:59:47 -04:00

1 2 3 4 5

243 commits