LightRAG

Author	SHA1	Message	Date
clssck	663ada943a	chore: add citation system and enhance RAG UI components Add citation tracking and display system across backend and frontend components. Backend changes include citation.py for document attribution, enhanced query routes with citation metadata, improved prompt templates, and PostgreSQL schema updates. Frontend includes CitationMarker component, HoverCard UI, QuerySettings refinements, and ChatMessage enhancements for displaying document sources. Update dependencies and docker-compose test configuration for improved development workflow.	2025-12-01 17:50:00 +01:00
clssck	43af31f888	feat: add db_degree visibility and orphan connection UI Graph Connectivity Awareness: - Add db_degree property to all KG implementations (NetworkX, Postgres, Neo4j, Mongo, Memgraph) - Show database degree vs visual degree in node panel with amber badge - Add visual indicator (amber border) for nodes with hidden connections - Add "Load X hidden connection(s)" button to expand hidden neighbors - Add configurable "Expand Depth" setting (1-5) in graph settings - Use global maxNodes setting for node expansion consistency Orphan Connection UI: - Add OrphanConnectionDialog component for manual orphan entity connection - Add OrphanConnectionControl button in graph sidebar - Expose /graph/orphans/connect API endpoint for frontend use Backend Improvements: - Add get_orphan_entities() and connect_orphan_entities() to base storage - Add orphan connection configuration parameters - Improve entity extraction with relationship density requirements Frontend: - Add graphExpandDepth and graphIncludeOrphans to settings store - Add min_degree and include_orphans graph filtering parameters - Update translations (en.json, zh.json)	2025-11-29 21:08:07 +01:00
clssck	ef7327bb3e	chore(docker-compose, lightrag): optimize test infrastructure and add evaluation tools Add comprehensive E2E testing infrastructure with PostgreSQL performance tuning, Gunicorn multi-worker support, and evaluation scripts for RAGAS-based quality assessment. Introduces 4 new evaluation utilities: compare_results.py for A/B test analysis, download_wikipedia.py for reproducible test datasets, e2e_test_harness.py for automated evaluation pipelines, and ingest_test_docs.py for batch document ingestion. Updates docker-compose.test.yml with aggressive async settings, memory limits, and optimized chunking parameters. Parallelize entity summarization in operate.py for improved extraction performance. Fix typos in merge node/edge logs.	2025-11-29 10:39:20 +01:00
clssck	48c7732edc	feat: add automatic entity resolution with 3-layer matching Implement automatic entity resolution to prevent duplicate nodes in the knowledge graph. The system uses a 3-layer approach: 1. Case-insensitive exact matching (free, instant) 2. Fuzzy string matching >85% threshold (free, instant) 3. Vector similarity + LLM verification (for acronyms/synonyms) Key features: - Pre-resolution phase prevents race conditions in parallel processing - Numeric suffix detection blocks false matches (IL-4 ≠ IL-13) - PostgreSQL alias cache for fast lookups on subsequent ingestion - Configurable thresholds via environment variables Bug fixes included: - Fix fuzzy matching false positives for numbered entities - Fix alias cache not being populated (missing db parameter) - Skip entity_aliases table from generic id index creation New files: - lightrag/entity_resolution/ - Core resolution module - tests/test_entity_resolution/ - Unit tests - docker/postgres-age-vector/ - Custom PG image with pgvector + AGE - docker-compose.test.yml - Integration test environment Configuration (env.example): - ENTITY_RESOLUTION_ENABLED=true - ENTITY_RESOLUTION_FUZZY_THRESHOLD=0.85 - ENTITY_RESOLUTION_VECTOR_THRESHOLD=0.5 - ENTITY_RESOLUTION_MAX_CANDIDATES=3	2025-11-27 15:35:02 +01:00
yangdx	4f12fe121d	Change entity extraction logging from warning to info level • Reduce log noise for empty entities	2025-11-27 11:00:34 +08:00
yangdx	f988a22652	Add token limit validation for character-only chunking - Add ChunkTokenLimitExceededError exception - Validate chunks against token limits - Include chunk preview in error messages - Add comprehensive test coverage - Log warnings for oversized chunks	2025-11-19 18:32:43 +08:00
yangdx	e77340d4a1	Adjust chunking parameters to match the default environment variable settings	2025-11-18 23:14:50 +08:00
EightyOliveira	dacca334e0	refactor(chunking): rename params and improve docstring for chunking_by_token_size	2025-11-18 15:46:28 +08:00
yangdx	ab4d7ac2b0	Add configurable embedding token limit with validation - Add EMBEDDING_TOKEN_LIMIT env var - Set max_token_size on embedding func - Add token limit property to LightRAG - Validate summary length vs limit - Log warning when limit exceeded	2025-11-14 19:28:36 +08:00
yangdx	03cc6262c4	Prohibit direct access to internal functions of EmbeddingFunc. • Fix similarity search error in query stage • Remove redundant null checks • Improve log readability	2025-11-08 01:43:36 +08:00
yangdx	ec2ea4fd3f	Rename function and variables for clarity in context building - Rename _build_llm_context to _build_context_str - Change text_units_context to chunks_context - Move string building before early return - Update log messages and comments - Consistent variable naming throughout	2025-11-01 12:15:24 +08:00
yangdx	3fa79026e0	Fix Entity Source IDs Tracking Problem - Handle existing node updates properly in edge merging stage - Fix source_ids merging logic - Reorder entity deletion and optimize node operations - Delete relationships before entities - Add edge existence debugging logs	2025-10-29 01:19:55 +08:00
yangdx	29c4a91dc3	Move relationship ID sorting to before vector DB operations • Remove verbose entity rebuild logging • Sort IDs before vector DB updates • Keep graph storage with original order	2025-10-28 19:13:48 +08:00
yangdx	5ee9a2f8c6	Fix entity consistency in knowledge graph rebuilding and merging • Sort src/tgt for consistent ordering • Create missing nodes before edges • Update entity chunks storage • Pass entity_vdb to rebuild function • Ensure entities exist in all storages	2025-10-25 21:37:03 +08:00
yangdx	97a2ee4ef1	Rename rebuild function name and improve relationship logging format	2025-10-25 11:17:43 +08:00
yangdx	a9ec15e669	Resolve lock leakage issue during user cancellation handling • Change default log level to INFO • Force enable error logging output • Add lock cleanup rollback protection • Handle LLM cache persistence errors • Fix async task exception handling	2025-10-25 03:06:45 +08:00
yangdx	77336e50b6	Improve error handling and add cancellation checks in pipeline	2025-10-24 17:54:17 +08:00
yangdx	78ad8873b8	Add cancellation check in delete loop	2025-10-24 14:47:20 +08:00
yangdx	743aefc655	Add pipeline cancellation feature for graceful processing termination • Add cancel_pipeline API endpoint • Implement PipelineCancelledException • Add cancellation checks in main loop • Handle task cancellation gracefully • Mark cancelled docs as FAILED	2025-10-24 14:08:12 +08:00
yangdx	00aa5e53a7	Improve entity identifier truncation warning message format	2025-10-22 15:56:19 +08:00
yangdx	904b1f46f9	Add entity name length truncation with configurable limit	2025-10-22 14:02:30 +08:00
yangdx	a809245aed	Preserve file path order by using lists instead of sets	2025-10-21 18:57:54 +08:00
yangdx	fe890fca15	Improve formatting of limit method info in rebuild functions	2025-10-21 18:34:06 +08:00
yangdx	3ed2abd82c	Improve logging to show source ID ratios when skipping entities/edges	2025-10-21 16:20:34 +08:00
yangdx	80668aae22	Improve file path truncation labels and UI consistency • Standardize FIFO/KEEP truncation labels • Update UI truncation text format	2025-10-21 15:39:31 +08:00
yangdx	be3d274a0b	Refactor node and edge merging logic with improved code structure • Add numbered steps for clarity • Improve early return handling • Enhance file path limiting logic	2025-10-21 15:16:47 +08:00
yangdx	a5253244f9	Simplify skip logging and reduce pipeline status updates	2025-10-21 06:33:34 +08:00
yangdx	cd1c48beaf	Standardize placeholder format to use colon separator consistently	2025-10-21 05:03:57 +08:00
yangdx	1154c5683f	Refactor deduplication calculation and remove unused variables	2025-10-21 04:41:15 +08:00
yangdx	665f60b90f	Refactor entity/relation merge to consolidate VDB operations within functions • Move VDB upserts into merge functions • Fix early return data structure issues • Update status messages (IGNORE_NEW → KEEP) • Consolidate error handling paths • Improve relationship content format	2025-10-21 03:19:34 +08:00
yangdx	e01c998ee9	Track placeholders in file paths for accurate source count display • Add has_placeholder tracking variable • Detect placeholder patterns in paths • Show + sign for truncated counts	2025-10-20 23:48:04 +08:00
yangdx	e0fd31a60d	Fix logging message formatting	2025-10-20 22:09:09 +08:00
yangdx	a9fec26798	Add file path limit configuration for entities and relations • Add MAX_FILE_PATHS env variable • Implement file path count limiting • Support KEEP/FIFO strategies • Add truncation placeholder • Remove old build_file_path function	2025-10-20 20:12:53 +08:00
yangdx	dc62c78f98	Add entity/relation chunk tracking with configurable source ID limits - Add entity_chunks & relation_chunks storage - Implement KEEP/FIFO limit strategies - Update env.example with new settings - Add migration for chunk tracking data - Support all KV storage	2025-10-20 15:24:15 +08:00
yangdx	9f49e56a44	Merge branch 'main' into feat-entity-size-caps	2025-10-17 15:59:44 +08:00
yangdx	35cd567c9e	Allow related chunks missing in knowledge graph queries	2025-10-17 00:19:30 +08:00
DivinesLight	c06522b927	Get max source Id config from .env and lightRAG init	2025-10-15 18:24:38 +05:00
yangdx	29bac49fb9	Handle empty query results by returning None instead of fail responses • Return None when no context found • Add structured failure metadata • Use PROMPTS["fail_response"] for content • Keep API compatible	2025-10-15 12:04:49 +08:00
haseebuchiha	d52c3377b4	Import from env and use default if none and removed useless import	2025-10-14 16:14:03 +05:00
DivinesLight	54f0a7d1ca	Quick fix to limit source_id ballooning while inserting nodes	2025-10-14 14:47:04 +05:00
yangdx	85d1a563b3	Merge branch 'adminunblinded/main'	2025-10-10 12:31:47 +08:00
NeelM0906	f6d1fb98ac	Fix Linting errors	2025-10-09 16:52:22 -04:00
yangdx	aac787bafb	Clarify chunk tracking log message in _build_llm_context	2025-10-05 13:33:55 +08:00
yangdx	37e8898cf6	Simplify reference formatting in LLM context generation - Remove extra newlines in reference lists - Change code block type from text to generic	2025-10-01 22:20:58 +08:00
yangdx	dbb0b3afb4	Fix hl_keywords and ll_keywords cache logic - Remove hl_keywords and ll_keywords from keyword extraction cache - Add hl_keywords and ll_keywords to LLM query cache	2025-09-27 15:26:52 +08:00
yangdx	8cd4139cbf	refactor: fix double query problem by adding aquery_llm function for consistent response handling - Add new aquery_llm/query_llm methods providing structured responses - Consolidate /query and /query/stream endpoints to use unified aquery_llm - Optimize cache handling by moving cache checks before LLM calls	2025-09-26 19:05:03 +08:00
yangdx	cbdc4c4bdf	Refactor prompts and context building for better maintainability - Extract context templates to PROMPTS - Unify token calculation logic - Simplify user_prompt formatting - Reduce code duplication - Improve prompt structure consistency	2025-09-26 12:39:06 +08:00
yangdx	fba2356c81	Move user_prompt to system prompt - Refactor query prompt handling to separate user prompts in system context - Simplify user_query to only contain query - Apply changes to both kg_query and naive_query	2025-09-26 10:02:01 +08:00
yangdx	b848ca49e6	Fix linting	2025-09-25 16:22:00 +08:00
yangdx	b08b8a6a6a	Add reference list support to query API endpoints with unified result handling • Add include_references param to QueryRequest • Extend QueryResponse with references field • Create unified QueryResult data structures • Refactor kg_query and naive_query functions • Update streaming to send references first	2025-09-25 16:21:42 +08:00
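
Note on commit 48c7732edc: the message describes a 3-layer entity resolution scheme. The first two layers and the numeric-suffix guard can be sketched roughly as below. This is a minimal illustration under stated assumptions, not the code in lightrag/entity_resolution/. The names resolve_entity and numeric_suffix are hypothetical, difflib.SequenceMatcher is a stand-in for whatever fuzzy matcher the project actually uses, and layer 3 (vector similarity plus LLM verification) is left out.

```python
import re
from difflib import SequenceMatcher

# Mirrors ENTITY_RESOLUTION_FUZZY_THRESHOLD=0.85 from the commit message.
FUZZY_THRESHOLD = 0.85

# Trailing digits, e.g. the "4" in "IL-4" (hypothetical suffix rule).
_NUMERIC_SUFFIX = re.compile(r"(\d+)\s*$")


def numeric_suffix(name):
    """Return the trailing number of a name as a string, or None."""
    match = _NUMERIC_SUFFIX.search(name)
    return match.group(1) if match else None


def resolve_entity(name, known_entities, threshold=FUZZY_THRESHOLD):
    """Return the matching known entity, or None if layers 1 and 2 find nothing."""
    folded = name.casefold().strip()

    # Layer 1: case-insensitive exact match.
    for candidate in known_entities:
        if candidate.casefold().strip() == folded:
            return candidate

    # Layer 2: fuzzy string match above the threshold.
    best, best_score = None, 0.0
    for candidate in known_entities:
        # Numeric suffix guard: names ending in different numbers never merge.
        if numeric_suffix(name) != numeric_suffix(candidate):
            continue
        score = SequenceMatcher(None, folded, candidate.casefold().strip()).ratio()
        if score > threshold and score > best_score:
            best, best_score = candidate, score

    # None means "unresolved here"; a real pipeline would go on to layer 3.
    return best
```

Under these assumptions, "Interleukin 4" resolves to an existing "Interleukin-4", while "Interleukin-13" does not: the strings are similar enough to pass the 0.85 threshold, but the trailing numbers differ, so the guard rejects the pair.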


598 commits