LightRAG

Author	SHA1	Message	Date
clssck	663ada943a	chore: add citation system and enhance RAG UI components Add citation tracking and display system across backend and frontend components. Backend changes include citation.py for document attribution, enhanced query routes with citation metadata, improved prompt templates, and PostgreSQL schema updates. Frontend includes CitationMarker component, HoverCard UI, QuerySettings refinements, and ChatMessage enhancements for displaying document sources. Update dependencies and docker-compose test configuration for improved development workflow.	2025-12-01 17:50:00 +01:00
clssck	43af31f888	feat: add db_degree visibility and orphan connection UI Graph Connectivity Awareness: - Add db_degree property to all KG implementations (NetworkX, Postgres, Neo4j, Mongo, Memgraph) - Show database degree vs visual degree in node panel with amber badge - Add visual indicator (amber border) for nodes with hidden connections - Add "Load X hidden connection(s)" button to expand hidden neighbors - Add configurable "Expand Depth" setting (1-5) in graph settings - Use global maxNodes setting for node expansion consistency Orphan Connection UI: - Add OrphanConnectionDialog component for manual orphan entity connection - Add OrphanConnectionControl button in graph sidebar - Expose /graph/orphans/connect API endpoint for frontend use Backend Improvements: - Add get_orphan_entities() and connect_orphan_entities() to base storage - Add orphan connection configuration parameters - Improve entity extraction with relationship density requirements Frontend: - Add graphExpandDepth and graphIncludeOrphans to settings store - Add min_degree and include_orphans graph filtering parameters - Update translations (en.json, zh.json)	2025-11-29 21:08:07 +01:00
clssck	d2c9e6e2ec	test(lightrag): add orphan connection feature with quality validation tests Implement automatic orphan entity connection system that identifies entities with no relationships and creates meaningful connections via vector similarity + LLM validation. This improves knowledge graph connectivity and retrieval quality. Changes: - Add orphan connection configuration parameters (thresholds, cross-connect settings) - Implement aconnect_orphan_entities() method with 4-step validation pipeline - Add SQL templates for efficient orphan and candidate entity queries - Create POST /graph/orphans/connect API endpoint with configurable parameters - Add orphan connection validation prompt for LLM-based relationship verification - Include relationship density requirement in extraction prompts to prevent orphans - Update docker-compose.test.yml with optimized extraction parameters - Add quality validation test suite (run_quality_tests.py) for retrieval evaluation - Add unit test framework (test_orphan_connection_quality.py) with test cases - Enable auto-run of orphan connection after document processing	2025-11-28 18:23:30 +01:00
clssck	48c7732edc	feat: add automatic entity resolution with 3-layer matching Implement automatic entity resolution to prevent duplicate nodes in the knowledge graph. The system uses a 3-layer approach: 1. Case-insensitive exact matching (free, instant) 2. Fuzzy string matching >85% threshold (free, instant) 3. Vector similarity + LLM verification (for acronyms/synonyms) Key features: - Pre-resolution phase prevents race conditions in parallel processing - Numeric suffix detection blocks false matches (IL-4 ≠ IL-13) - PostgreSQL alias cache for fast lookups on subsequent ingestion - Configurable thresholds via environment variables Bug fixes included: - Fix fuzzy matching false positives for numbered entities - Fix alias cache not being populated (missing db parameter) - Skip entity_aliases table from generic id index creation New files: - lightrag/entity_resolution/ - Core resolution module - tests/test_entity_resolution/ - Unit tests - docker/postgres-age-vector/ - Custom PG image with pgvector + AGE - docker-compose.test.yml - Integration test environment Configuration (env.example): - ENTITY_RESOLUTION_ENABLED=true - ENTITY_RESOLUTION_FUZZY_THRESHOLD=0.85 - ENTITY_RESOLUTION_VECTOR_THRESHOLD=0.5 - ENTITY_RESOLUTION_MAX_CANDIDATES=3	2025-11-27 15:35:02 +01:00
yangdx	9c10c87554	Fix linting	2025-11-18 22:38:43 +08:00
EightyOliveira	dacca334e0	refactor(chunking): rename params and improve docstring for chunking_by_token_size	2025-11-18 15:46:28 +08:00
yangdx	702cfd2981	Fix document deletion concurrency control and validation logic • Clarify job naming for single vs batch deletion • Update job name validation in busy pipeline check	2025-11-18 13:59:24 +08:00
yangdx	4048fc4b89	Fix: auto-acquire pipeline when idle in document deletion • Track if we acquired the pipeline lock • Auto-acquire pipeline when idle • Only release if we acquired it • Prevent concurrent deletion conflicts • Improve deletion job validation	2025-11-18 13:25:13 +08:00
yangdx	6cef8df159	Reduce log level and improve workspace mismatch message clarity • Change warning to info level • Simplify workspace mismatch wording	2025-11-18 08:25:21 +08:00
yangdx	9d7b7981ce	Add pipeline status validation before document deletion	2025-11-17 14:58:10 +08:00
yangdx	e22ac52ebc	Auto-initialize pipeline status in LightRAG.initialize_storages() • Remove manual initialize_pipeline_status calls • Auto-init in initialize_storages method • Update error messages for clarity • Warn on workspace conflicts	2025-11-17 12:54:33 +08:00
yangdx	52c812b9a0	Fix workspace isolation for pipeline status across all operations - Fix final_namespace error in get_namespace_data() - Fix get_workspace_from_request return type - Add workspace param to pipeline status calls	2025-11-17 12:54:33 +08:00
yangdx	926960e957	Refactor workspace handling to use default workspace and namespace locks - Remove DB-specific workspace configs - Add default workspace auto-setting - Replace global locks with namespace locks - Simplify pipeline status management - Remove redundant graph DB locking	2025-11-17 12:54:33 +08:00
yangdx	2fb57e767d	Fix embedding token limit initialization order * Capture max_token_size before decorator * Apply wrapper after capturing attribute * Prevent decorator from stripping dataclass * Ensure token limit is properly set	2025-11-17 12:54:32 +08:00
yangdx	f0254773c6	Convert embedding_token_limit from property to field with __post_init__ • Remove property decorator • Add field with init=False • Set value in __post_init__ method • embedding_token_limit is now in config dictionary	2025-11-17 12:54:32 +08:00
yangdx	14a6c24ed7	Add configurable embedding token limit with validation - Add EMBEDDING_TOKEN_LIMIT env var - Set max_token_size on embedding func - Add token limit property to LightRAG - Validate summary length vs limit - Log warning when limit exceeded	2025-11-17 12:54:32 +08:00
yangdx	7d394fb0a4	Replace asyncio.iscoroutine with inspect.isawaitable for better detection	2025-11-17 12:54:32 +08:00
yangdx	af5423919b	Support async chunking functions in LightRAG processing pipeline - Add Awaitable and Union type imports - Update chunking_func type annotation - Handle coroutine results with await - Add return type validation - Update docstring for async support	2025-11-17 12:54:32 +08:00
Tong Da	5016025453	easier version: detect chunking_func result is coroutine or not	2025-11-17 12:54:32 +08:00
Tong Da	7740500693	support async chunking func to improve processing performance when a heavy `chunking_func` is passed in by user	2025-11-17 12:54:32 +08:00
BukeLy	18a4870229	fix: Add default workspace support for backward compatibility Fixes two compatibility issues in workspace isolation: 1. Problem: lightrag_server.py calls initialize_pipeline_status() without workspace parameter, causing pipeline to initialize in global namespace instead of rag's workspace. Solution: Add set_default_workspace() mechanism in shared_storage. LightRAG.initialize_storages() now sets default workspace, which initialize_pipeline_status() uses when called without parameters. 2. Problem: /health endpoint hardcoded to use "pipeline_status", cannot return workspace-specific status or support frontend workspace selection. Solution: Add LIGHTRAG-WORKSPACE header support. Endpoint now extracts workspace from header or falls back to server default, returning correct workspace-specific pipeline status. Changes: - lightrag/kg/shared_storage.py: Add set/get_default_workspace() - lightrag/lightrag.py: Call set_default_workspace() in initialize_storages() - lightrag/api/lightrag_server.py: Add get_workspace_from_request() helper, update /health endpoint to support LIGHTRAG-WORKSPACE header Testing: - Backward compatibility: Old code works without modification - Multi-instance safety: Explicit workspace passing preserved - /health endpoint: Supports both default and header-specified workspaces Related: #2353	2025-11-17 12:54:20 +08:00
BukeLy	eb52ec94d7	feat: Add workspace isolation support for pipeline status Problem: In multi-tenant scenarios, different workspaces share a single global pipeline_status namespace, causing pipelines from different tenants to block each other, severely impacting concurrent processing performance. Solution: - Extended get_namespace_data() to recognize workspace-specific pipeline namespaces with pattern "{workspace}:pipeline" (following GraphDB pattern) - Added workspace parameter to initialize_pipeline_status() for per-tenant isolated pipeline namespaces - Updated all 7 call sites to use workspace-aware locks: * lightrag.py: process_document_queue(), aremove_document() * document_routes.py: background_delete_documents(), clear_documents(), cancel_pipeline(), get_pipeline_status(), delete_documents() Impact: - Different workspaces can process documents concurrently without blocking - Backward compatible: empty workspace defaults to "pipeline_status" - Maintains fail-fast: uninitialized pipeline raises clear error - Expected N× performance improvement for N concurrent tenants Bug fixes: - Fixed AttributeError by using self.workspace instead of self.global_config - Fixed pipeline status endpoint to show workspace-specific status - Fixed delete endpoint to check workspace-specific busy flag Code changes: 4 files, 141 insertions(+), 28 deletions(-) Testing: All syntax checks passed, comprehensive workspace isolation tests completed	2025-11-17 12:53:44 +08:00
yangdx	ea141e2779	Fix: Remove redundant entity/relation chunk deletions	2025-11-07 02:56:16 +08:00
yangdx	04ed709b34	Optimize entity deletion by batching edge queries to avoid N+1 problem • Add batch get_nodes_edges_batch call • Remove individual get_node_edges calls • Improve query performance	2025-11-06 21:34:47 +08:00
yangdx	afb5e5c1cb	Fix edge cleanup when deleting entities to prevent orphaned relationships - Track edges to delete in set - Clean VDB before node deletion - Remove from relation chunks storage - Prevent orphaned relationship data	2025-10-31 02:36:15 +08:00
yangdx	c36afecba4	Remove redundant await call in file extraction pipeline	2025-10-30 20:35:41 +08:00
yangdx	3fa79026e0	Fix Entity Source IDs Tracking Problem - Handle existing node updates properly in edge merging stage - Fix source_ids merging logic - Reorder entity deletion and optimize node operations - Delete relationships before entities - Add edge existence debugging logs	2025-10-29 01:19:55 +08:00
yangdx	c81a56a113	Fix entity and relationship deletion when no chunk references remain	2025-10-28 16:02:35 +08:00
yangdx	5155edd8d2	feat: Improve entity merge and edit UX - API: The `graph/entity/edit` endpoint now returns a detailed `operation_summary` for better client-side handling of update, rename, and merge outcomes. - Web UI: Added an "auto-merge on rename" option. The UI now gracefully handles merge success, partial failures (update OK, merge fail), and other errors with specific user feedback.	2025-10-27 23:42:08 +08:00
yangdx	2c09adb8d3	Add chunk tracking support to entity merge functionality - Pass chunk storages to merge function - Merge relation chunk tracking data - Merge entity chunk tracking data - Delete old chunk tracking records - Persist chunk storage updates	2025-10-27 02:06:21 +08:00
yangdx	3fbd704bf9	Enhance entity/relation editing with chunk tracking synchronization • Add chunk storage sync to edit ops • Implement incremental chunk ID updates • Support entity renaming migrations • Normalize relation keys consistently • Preserve chunk references on edits	2025-10-26 14:34:56 +08:00
yangdx	29bf593663	Fix entity and relation chunk cleanup in deletion pipeline • Delete from entity_chunks storage • Delete from relation_chunks storage	2025-10-25 22:32:27 +08:00
yangdx	a9bc348446	Remove enable_logging parameter from data init lock call	2025-10-25 11:48:14 +08:00
yangdx	97a2ee4ef1	Rename rebuild function name and improve relationship logging format	2025-10-25 11:17:43 +08:00
yangdx	a9ec15e669	Resolve lock leakage issue during user cancellation handling • Change default log level to INFO • Force enable error logging output • Add lock cleanup rollback protection • Handle LLM cache persistence errors • Fix async task exception handling	2025-10-25 03:06:45 +08:00
yangdx	77336e50b6	Improve error handling and add cancellation checks in pipeline	2025-10-24 17:54:17 +08:00
yangdx	743aefc655	Add pipeline cancellation feature for graceful processing termination • Add cancel_pipeline API endpoint • Implement PipelineCancelledException • Add cancellation checks in main loop • Handle task cancellation gracefully • Mark cancelled docs as FAILED	2025-10-24 14:08:12 +08:00
yangdx	b76350a3bc	Fix linting	2025-10-22 12:53:42 +08:00
yangdx	d7e2527e1a	Handle cache deletion errors gracefully instead of raising exceptions	2025-10-22 12:53:19 +08:00
yangdx	162370b6e6	Add optional LLM cache deletion when deleting documents • Add delete_llm_cache parameter to API • Collect cache IDs from text chunks • Delete cache after graph operations • Update UI with new checkbox option • Add i18n translations for cache option	2025-10-22 12:19:23 +08:00
yangdx	e5e16b7bd1	Fix Redis data migration error • Use proper Redis connection context • Fix namespace pattern for key scanning • Propagate storage check exceptions • Remove defensive error swallowing	2025-10-21 16:27:04 +08:00
yangdx	a9fec26798	Add file path limit configuration for entities and relations • Add MAX_FILE_PATHS env variable • Implement file path count limiting • Support KEEP/FIFO strategies • Add truncation placeholder • Remove old build_file_path function	2025-10-20 20:12:53 +08:00
yangdx	dc62c78f98	Add entity/relation chunk tracking with configurable source ID limits - Add entity_chunks & relation_chunks storage - Implement KEEP/FIFO limit strategies - Update env.example with new settings - Add migration for chunk tracking data - Support all KV storage	2025-10-20 15:24:15 +08:00
yangdx	9f49e56a44	Merge branch 'main' into feat-entity-size-caps	2025-10-17 15:59:44 +08:00
DivinesLight	c06522b927	Get max source ID config from .env and LightRAG init	2025-10-15 18:24:38 +05:00
yangdx	29bac49fb9	Handle empty query results by returning None instead of fail responses • Return None when no context found • Add structured failure metadata • Use PROMPTS["fail_response"] for content • Keep API compatible	2025-10-15 12:04:49 +08:00
yangdx	130b4959dc	Add PREPROCESSED (multimodal_processed) status for multimodal document processing • Add DocStatus.PREPROCESSED enum value • Update API routes and response models • Add preprocessed filter in web UI • Update localization files • Handle preprocessed status in deletion	2025-10-14 14:02:05 +08:00
yangdx	074f0c8b23	Update docstring for adelete_by_doc_id method clarity	2025-10-12 10:12:45 +08:00
yangdx	457d51952e	Add doc_name field to full docs storage - Store file_path in full_docs storage - Update PostgreSQL implementation by mapping file_path to doc_name - Other storage implementations automatically handle the new field	2025-10-05 11:44:27 +08:00
yangdx	1766cddd6c	Fix mode parameter serialization error in Ollama chat API • Use mode.value for API requests • Add debug logging in aquery_llm	2025-09-27 15:11:51 +08:00
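
Commit 48c7732edc above describes the first two entity-resolution layers (case-insensitive exact match, then fuzzy match above an 0.85 threshold) plus a numeric-suffix guard so that IL-4 is not merged with IL-13. The following is a minimal sketch of that idea, not the code in lightrag/entity_resolution/: the names `numeric_suffix` and `resolve_entity` are hypothetical, and `difflib.SequenceMatcher` stands in for whatever fuzzy matcher the project actually uses. Layer 3 (vector similarity plus LLM verification) is left out.

```python
import re
from difflib import SequenceMatcher

# Assumed to mirror ENTITY_RESOLUTION_FUZZY_THRESHOLD=0.85 from the commit message.
FUZZY_THRESHOLD = 0.85


def numeric_suffix(name):
    """Return the trailing digits of a name such as 'IL-4', or None if there are none."""
    match = re.search(r"(\d+)\s*$", name)
    return match.group(1) if match else None


def resolve_entity(name, known_entities):
    """Return the known entity that `name` resolves to, or None if it looks new.

    Hypothetical helper: layers 1 and 2 only.
    """
    lowered = name.casefold()

    # Layer 1: case-insensitive exact match.
    for known in known_entities:
        if known.casefold() == lowered:
            return known

    # Layer 2: fuzzy string match, skipping candidates whose numeric
    # suffix differs so that numbered entities are never merged.
    best, best_score = None, 0.0
    for known in known_entities:
        if numeric_suffix(name) != numeric_suffix(known):
            continue
        score = SequenceMatcher(None, lowered, known.casefold()).ratio()
        if score > best_score:
            best, best_score = known, score
    if best_score > FUZZY_THRESHOLD:
        return best

    # Layer 3 (vector similarity + LLM verification) is not sketched here.
    return None
```

Under these assumptions, a call such as `resolve_entity("IL-13", ["IL-4"])` never reaches the similarity comparison, because the suffix guard rejects the candidate first.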


629 commits
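
Commits 7740500693 through 7d394fb0a4 in the log above add support for async chunking functions, detecting an awaitable result with `inspect.isawaitable` and validating the return type. A minimal sketch of that pattern follows; `run_chunking`, `sync_chunker`, and `async_chunker` are hypothetical names for illustration, and the real `chunking_func` signature in LightRAG takes more parameters than the single `text` argument assumed here.

```python
import asyncio
import inspect


async def run_chunking(chunking_func, text):
    """Call a sync or async chunking function and return its chunks as a list.

    Hypothetical helper; the single-argument signature is an assumption.
    """
    result = chunking_func(text)

    # An async function returns a coroutine here; inspect.isawaitable also
    # covers Futures, Tasks, and objects that define __await__.
    if inspect.isawaitable(result):
        result = await result

    # Validate the return type before the pipeline uses it.
    if not isinstance(result, (list, tuple)):
        raise TypeError(
            f"chunking_func must return a list or tuple, got {type(result).__name__}"
        )
    return list(result)


def sync_chunker(text):
    """Plain synchronous chunker: one chunk per whitespace-separated token."""
    return text.split()


async def async_chunker(text):
    """Async chunker that yields control once before returning its chunks."""
    await asyncio.sleep(0)
    return text.split()
```

Both chunkers can be driven the same way, for example `asyncio.run(run_chunking(async_chunker, "some text"))`, which is the point of checking the result rather than the function itself.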