LightRAG

Author	SHA1	Message	Date
clssck	43af31f888	feat: add db_degree visibility and orphan connection UI Graph Connectivity Awareness: - Add db_degree property to all KG implementations (NetworkX, Postgres, Neo4j, Mongo, Memgraph) - Show database degree vs visual degree in node panel with amber badge - Add visual indicator (amber border) for nodes with hidden connections - Add "Load X hidden connection(s)" button to expand hidden neighbors - Add configurable "Expand Depth" setting (1-5) in graph settings - Use global maxNodes setting for node expansion consistency Orphan Connection UI: - Add OrphanConnectionDialog component for manual orphan entity connection - Add OrphanConnectionControl button in graph sidebar - Expose /graph/orphans/connect API endpoint for frontend use Backend Improvements: - Add get_orphan_entities() and connect_orphan_entities() to base storage - Add orphan connection configuration parameters - Improve entity extraction with relationship density requirements Frontend: - Add graphExpandDepth and graphIncludeOrphans to settings store - Add min_degree and include_orphans graph filtering parameters - Update translations (en.json, zh.json)	2025-11-29 21:08:07 +01:00
clssck	ef7327bb3e	chore(docker-compose, lightrag): optimize test infrastructure and add evaluation tools Add comprehensive E2E testing infrastructure with PostgreSQL performance tuning, Gunicorn multi-worker support, and evaluation scripts for RAGAS-based quality assessment. Introduces 4 new evaluation utilities: compare_results.py for A/B test analysis, download_wikipedia.py for reproducible test datasets, e2e_test_harness.py for automated evaluation pipelines, and ingest_test_docs.py for batch document ingestion. Updates docker-compose.test.yml with aggressive async settings, memory limits, and optimized chunking parameters. Parallelize entity summarization in operate.py for improved extraction performance. Fix typos in merge node/edge logs.	2025-11-29 10:39:20 +01:00
clssck	48c7732edc	feat: add automatic entity resolution with 3-layer matching Implement automatic entity resolution to prevent duplicate nodes in the knowledge graph. The system uses a 3-layer approach: 1. Case-insensitive exact matching (free, instant) 2. Fuzzy string matching >85% threshold (free, instant) 3. Vector similarity + LLM verification (for acronyms/synonyms) Key features: - Pre-resolution phase prevents race conditions in parallel processing - Numeric suffix detection blocks false matches (IL-4 ≠ IL-13) - PostgreSQL alias cache for fast lookups on subsequent ingestion - Configurable thresholds via environment variables Bug fixes included: - Fix fuzzy matching false positives for numbered entities - Fix alias cache not being populated (missing db parameter) - Skip entity_aliases table from generic id index creation New files: - lightrag/entity_resolution/ - Core resolution module - tests/test_entity_resolution/ - Unit tests - docker/postgres-age-vector/ - Custom PG image with pgvector + AGE - docker-compose.test.yml - Integration test environment Configuration (env.example): - ENTITY_RESOLUTION_ENABLED=true - ENTITY_RESOLUTION_FUZZY_THRESHOLD=0.85 - ENTITY_RESOLUTION_VECTOR_THRESHOLD=0.5 - ENTITY_RESOLUTION_MAX_CANDIDATES=3	2025-11-27 15:35:02 +01:00
yangdx	4f12fe121d	Change entity extraction logging from warning to info level • Reduce log noise for empty entities	2025-11-27 11:00:34 +08:00
yangdx	f988a22652	Add token limit validation for character-only chunking - Add ChunkTokenLimitExceededError exception - Validate chunks against token limits - Include chunk preview in error messages - Add comprehensive test coverage - Log warnings for oversized chunks	2025-11-19 18:32:43 +08:00
yangdx	e77340d4a1	Adjust chunking parameters to match the default environment variable settings	2025-11-18 23:14:50 +08:00
EightyOliveira	dacca334e0	refactor(chunking): rename params and improve docstring for chunking_by_token_size	2025-11-18 15:46:28 +08:00
yangdx	ab4d7ac2b0	Add configurable embedding token limit with validation - Add EMBEDDING_TOKEN_LIMIT env var - Set max_token_size on embedding func - Add token limit property to LightRAG - Validate summary length vs limit - Log warning when limit exceeded	2025-11-14 19:28:36 +08:00
yangdx	03cc6262c4	Prohibit direct access to internal functions of EmbeddingFunc. • Fix similarity search error in query stage • Remove redundant null checks • Improve log readability	2025-11-08 01:43:36 +08:00
yangdx	ec2ea4fd3f	Rename function and variables for clarity in context building - Rename _build_llm_context to _build_context_str - Change text_units_context to chunks_context - Move string building before early return - Update log messages and comments - Consistent variable naming throughout	2025-11-01 12:15:24 +08:00
yangdx	3fa79026e0	Fix Entity Source IDs Tracking Problem - Handle existing node updates properly in edge merging stage - Fix source_ids merging logic - Reorder entity deletion and optimize node operations - Delete relationships before entities - Add edge existence debugging logs	2025-10-29 01:19:55 +08:00
yangdx	29c4a91dc3	Move relationship ID sorting to before vector DB operations • Remove verbose entity rebuild logging • Sort IDs before vector DB updates • Keep graph storage with original order	2025-10-28 19:13:48 +08:00
yangdx	5ee9a2f8c6	Fix entity consistency in knowledge graph rebuilding and merging • Sort src/tgt for consistent ordering • Create missing nodes before edges • Update entity chunks storage • Pass entity_vdb to rebuild function • Ensure entities exist in all storages	2025-10-25 21:37:03 +08:00
yangdx	97a2ee4ef1	Rename rebuild function name and improve relationship logging format	2025-10-25 11:17:43 +08:00
yangdx	a9ec15e669	Resolve lock leakage issue during user cancellation handling • Change default log level to INFO • Force enable error logging output • Add lock cleanup rollback protection • Handle LLM cache persistence errors • Fix async task exception handling	2025-10-25 03:06:45 +08:00
yangdx	77336e50b6	Improve error handling and add cancellation checks in pipeline	2025-10-24 17:54:17 +08:00
yangdx	78ad8873b8	Add cancellation check in delete loop	2025-10-24 14:47:20 +08:00
yangdx	743aefc655	Add pipeline cancellation feature for graceful processing termination • Add cancel_pipeline API endpoint • Implement PipelineCancelledException • Add cancellation checks in main loop • Handle task cancellation gracefully • Mark cancelled docs as FAILED	2025-10-24 14:08:12 +08:00
yangdx	00aa5e53a7	Improve entity identifier truncation warning message format	2025-10-22 15:56:19 +08:00
yangdx	904b1f46f9	Add entity name length truncation with configurable limit	2025-10-22 14:02:30 +08:00
yangdx	a809245aed	Preserve file path order by using lists instead of sets	2025-10-21 18:57:54 +08:00
yangdx	fe890fca15	Improve formatting of limit method info in rebuild functions	2025-10-21 18:34:06 +08:00
yangdx	3ed2abd82c	Improve logging to show source ID ratios when skipping entities/edges	2025-10-21 16:20:34 +08:00
yangdx	80668aae22	Improve file path truncation labels and UI consistency • Standardize FIFO/KEEP truncation labels • Update UI truncation text format	2025-10-21 15:39:31 +08:00
yangdx	be3d274a0b	Refactor node and edge merging logic with improved code structure • Add numbered steps for clarity • Improve early return handling • Enhance file path limiting logic	2025-10-21 15:16:47 +08:00
yangdx	a5253244f9	Simplify skip logging and reduce pipeline status updates	2025-10-21 06:33:34 +08:00
yangdx	cd1c48beaf	Standardize placeholder format to use colon separator consistently	2025-10-21 05:03:57 +08:00
yangdx	1154c5683f	Refactor deduplication calculation and remove unused variables	2025-10-21 04:41:15 +08:00
yangdx	665f60b90f	Refactor entity/relation merge to consolidate VDB operations within functions • Move VDB upserts into merge functions • Fix early return data structure issues • Update status messages (IGNORE_NEW → KEEP) • Consolidate error handling paths • Improve relationship content format	2025-10-21 03:19:34 +08:00
yangdx	e01c998ee9	Track placeholders in file paths for accurate source count display • Add has_placeholder tracking variable • Detect placeholder patterns in paths • Show + sign for truncated counts	2025-10-20 23:48:04 +08:00
yangdx	e0fd31a60d	Fix logging message formatting	2025-10-20 22:09:09 +08:00
yangdx	a9fec26798	Add file path limit configuration for entities and relations • Add MAX_FILE_PATHS env variable • Implement file path count limiting • Support KEEP/FIFO strategies • Add truncation placeholder • Remove old build_file_path function	2025-10-20 20:12:53 +08:00
yangdx	dc62c78f98	Add entity/relation chunk tracking with configurable source ID limits - Add entity_chunks & relation_chunks storage - Implement KEEP/FIFO limit strategies - Update env.example with new settings - Add migration for chunk tracking data - Support all KV storage	2025-10-20 15:24:15 +08:00
yangdx	9f49e56a44	Merge branch 'main' into feat-entity-size-caps	2025-10-17 15:59:44 +08:00
yangdx	35cd567c9e	Allow related chunks missing in knowledge graph queries	2025-10-17 00:19:30 +08:00
DivinesLight	c06522b927	Get max source ID config from .env and LightRAG init	2025-10-15 18:24:38 +05:00
yangdx	29bac49fb9	Handle empty query results by returning None instead of fail responses • Return None when no context found • Add structured failure metadata • Use PROMPTS["fail_response"] for content • Keep API compatible	2025-10-15 12:04:49 +08:00
haseebuchiha	d52c3377b4	Import from env, use default if none, and remove useless import	2025-10-14 16:14:03 +05:00
DivinesLight	54f0a7d1ca	Quick fix to limit source_id ballooning while inserting nodes	2025-10-14 14:47:04 +05:00
yangdx	85d1a563b3	Merge branch 'adminunblinded/main'	2025-10-10 12:31:47 +08:00
NeelM0906	f6d1fb98ac	Fix Linting errors	2025-10-09 16:52:22 -04:00
yangdx	aac787bafb	Clarify chunk tracking log message in _build_llm_context	2025-10-05 13:33:55 +08:00
yangdx	37e8898cf6	Simplify reference formatting in LLM context generation - Remove extra newlines in reference lists - Change code block type from text to generic	2025-10-01 22:20:58 +08:00
yangdx	dbb0b3afb4	Fix hl_keywords and ll_keywords cache logic - Remove hl_keywords and ll_keywords from keyword extraction cache - Add hl_keywords and ll_keywords to LLM query cache	2025-09-27 15:26:52 +08:00
yangdx	8cd4139cbf	refactor: fix double query problem by adding aquery_llm function for consistent response handling - Add new aquery_llm/query_llm methods providing structured responses - Consolidate /query and /query/stream endpoints to use unified aquery_llm - Optimize cache handling by moving cache checks before LLM calls	2025-09-26 19:05:03 +08:00
yangdx	cbdc4c4bdf	Refactor prompts and context building for better maintainability - Extract context templates to PROMPTS - Unify token calculation logic - Simplify user_prompt formatting - Reduce code duplication - Improve prompt structure consistency	2025-09-26 12:39:06 +08:00
yangdx	fba2356c81	Move user_prompt to system prompt - Refactor query prompt handling to separate user prompts in system context - Simplify user_query to only contain query - Apply changes to both kg_query and naive_query	2025-09-26 10:02:01 +08:00
yangdx	b848ca49e6	Fix linting	2025-09-25 16:22:00 +08:00
yangdx	b08b8a6a6a	Add reference list support to query API endpoints with unified result handling • Add include_references param to QueryRequest • Extend QueryResponse with references field • Create unified QueryResult data structures • Refactor kg_query and naive_query functions • Update streaming to send references first	2025-09-25 16:21:42 +08:00
yangdx	5eb4a4b799	feat: simplify citations, add reference merging, and restructure API response format	2025-09-24 14:30:10 +08:00
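
Commit 48c7732edc describes entity resolution as layered matching with a numeric-suffix guard. The following is a minimal sketch of the first two layers only, not the code in lightrag/entity_resolution/. The function names are hypothetical, difflib.SequenceMatcher stands in for whatever fuzzy matcher the module actually uses, and the guard here is assumed to compare trailing digits. Layer 3 (vector similarity plus LLM verification) is left out.

```python
import re
from difflib import SequenceMatcher

# Mirrors the ENTITY_RESOLUTION_FUZZY_THRESHOLD=0.85 setting named in the commit.
FUZZY_THRESHOLD = 0.85

_TRAILING_NUMBER = re.compile(r"(\d+)\s*$")


def numeric_suffix(name):
    """Return the trailing digits of a name, or None if it has none (hypothetical helper)."""
    match = _TRAILING_NUMBER.search(name)
    return match.group(1) if match else None


def resolve(name, known):
    """Return the known entity that `name` duplicates, or None if there is no match."""
    folded = name.casefold()

    # Layer 1: case-insensitive exact match.
    for candidate in known:
        if candidate.casefold() == folded:
            return candidate

    # Layer 2: fuzzy match above the threshold.
    best, best_score = None, 0.0
    for candidate in known:
        a, b = numeric_suffix(name), numeric_suffix(candidate)
        # Assumed guard: different trailing numbers never match (IL-4 vs IL-13).
        if a is not None and b is not None and a != b:
            continue
        score = SequenceMatcher(None, folded, candidate.casefold()).ratio()
        if score > FUZZY_THRESHOLD and score > best_score:
            best, best_score = candidate, score
    return best
```

With this sketch, "Microsft Corporation" resolves to "Microsoft Corporation", while "Interleukin-4" is not merged into "Interleukin-13" even though the two strings are more than 85% similar.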

597 commits
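
Commits dc62c78f98 and a9fec26798 mention KEEP and FIFO strategies for capping source IDs and file paths, and 665f60b90f renames IGNORE_NEW to KEEP. The sketch below is one reading of those two strategies, not the repository's implementation: KEEP is assumed to retain the oldest entries and ignore new ones, and FIFO is assumed to evict the oldest. The function name and signature are hypothetical. Lists are used rather than sets so that order is preserved, as in commit a809245aed.

```python
def apply_source_limit(existing, incoming, limit, strategy="FIFO"):
    """Merge two ordered ID lists and cap the result at `limit` entries (hypothetical helper)."""
    if limit <= 0:
        raise ValueError("limit must be a positive integer")

    # Order-preserving de-duplication: the first occurrence of each ID wins.
    merged = list(dict.fromkeys([*existing, *incoming]))
    if len(merged) <= limit:
        return merged

    if strategy == "KEEP":
        # Assumed meaning: keep the oldest entries, ignore anything past the cap.
        return merged[:limit]
    if strategy == "FIFO":
        # Assumed meaning: first in, first out, so the oldest entries are dropped.
        return merged[-limit:]
    raise ValueError(f"unknown strategy: {strategy!r}")
```

For example, merging ["c1", "c2"] with ["c3", "c4"] under a limit of 3 gives ["c1", "c2", "c3"] with KEEP and ["c2", "c3", "c4"] with FIFO.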