LightRAG/lightrag
clssck 48c7732edc feat: add automatic entity resolution with 3-layer matching
Implement automatic entity resolution to prevent duplicate nodes in the
knowledge graph. The system uses a 3-layer approach:

1. Case-insensitive exact matching (free, instant)
2. Fuzzy string matching above an 85% similarity threshold (free, instant)
3. Vector similarity + LLM verification (for acronyms/synonyms)
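
The three layers above can be read as a cascade that stops at the first hit. The following is a minimal sketch of that idea, not the code in lightrag/entity_resolution/: the function name `resolve_entity`, the two callback parameters, and the use of `difflib.SequenceMatcher` for the fuzzy score are all assumptions made for illustration.

```python
from difflib import SequenceMatcher
from typing import Callable, Iterable, List, Optional

FUZZY_THRESHOLD = 0.85  # same value as ENTITY_RESOLUTION_FUZZY_THRESHOLD


def resolve_entity(
    name: str,
    existing: Iterable[str],
    # Hypothetical hooks: a vector search and an LLM yes/no check.
    vector_candidates: Callable[[str], List[str]] = lambda _name: [],
    llm_same_entity: Callable[[str, str], bool] = lambda _a, _b: False,
) -> Optional[str]:
    """Return the existing entity that `name` resolves to, or None."""
    known = list(existing)

    # Layer 1: case-insensitive exact match.
    by_folded = {e.casefold(): e for e in known}
    hit = by_folded.get(name.casefold())
    if hit is not None:
        return hit

    # Layer 2: fuzzy string match; the best score must exceed the threshold.
    best, best_score = None, 0.0
    for e in known:
        score = SequenceMatcher(None, name.casefold(), e.casefold()).ratio()
        if score > best_score:
            best, best_score = e, score
    if best is not None and best_score > FUZZY_THRESHOLD:
        return best

    # Layer 3: vector-similar candidates, each confirmed by the LLM check.
    for candidate in vector_candidates(name):
        if llm_same_entity(name, candidate):
            return candidate
    return None
```

Ordering the layers from cheapest to most expensive means the LLM is only consulted for names that the two free layers cannot settle, such as acronyms.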

Key features:
- Pre-resolution phase prevents race conditions in parallel processing
- Numeric suffix detection blocks false matches (IL-4 ≠ IL-13)
- PostgreSQL alias cache for fast lookups on subsequent ingestion
- Configurable thresholds via environment variables
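
As an illustration of the numeric suffix guard, a fuzzy match could be vetoed whenever both names end in different numbers. This is a sketch under assumptions: the helper names `numeric_suffix` and `suffixes_conflict` are hypothetical, and the actual rule in the module may differ.

```python
import re
from typing import Optional

# Trailing run of digits, optionally followed by whitespace.
_SUFFIX = re.compile(r"(\d+)\s*$")


def numeric_suffix(name: str) -> Optional[int]:
    """Return the trailing number of `name`, or None if there is none."""
    match = _SUFFIX.search(name)
    return int(match.group(1)) if match else None


def suffixes_conflict(a: str, b: str) -> bool:
    """True when both names end in a number and the numbers differ."""
    sa, sb = numeric_suffix(a), numeric_suffix(b)
    return sa is not None and sb is not None and sa != sb
```

With this check, `suffixes_conflict("IL-4", "IL-13")` is true, so the pair would be rejected before any fuzzy score is considered, while a pair in which only one name carries a number is left to the other layers.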

Bug fixes included:
- Fix fuzzy matching false positives for numbered entities
- Fix alias cache not being populated (missing db parameter)
- Exclude the entity_aliases table from generic id index creation

New files:
- lightrag/entity_resolution/ - Core resolution module
- tests/test_entity_resolution/ - Unit tests
- docker/postgres-age-vector/ - Custom PG image with pgvector + AGE
- docker-compose.test.yml - Integration test environment

Configuration (env.example):
- ENTITY_RESOLUTION_ENABLED=true
- ENTITY_RESOLUTION_FUZZY_THRESHOLD=0.85
- ENTITY_RESOLUTION_VECTOR_THRESHOLD=0.5
- ENTITY_RESOLUTION_MAX_CANDIDATES=3
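
A minimal sketch of how these four variables could be read into a typed config object. The class and function names are hypothetical, and treating the env.example values as fallback defaults is an assumption rather than confirmed behavior.

```python
import os
from dataclasses import dataclass
from typing import Mapping

_TRUTHY = {"1", "true", "yes", "on"}


@dataclass(frozen=True)
class EntityResolutionConfig:
    # Defaults copy the env.example values (assumed, not confirmed).
    enabled: bool = True
    fuzzy_threshold: float = 0.85
    vector_threshold: float = 0.5
    max_candidates: int = 3


def load_config(env: Mapping[str, str] = os.environ) -> EntityResolutionConfig:
    """Build the config from environment variables, falling back to defaults."""
    return EntityResolutionConfig(
        enabled=env.get("ENTITY_RESOLUTION_ENABLED", "true").strip().lower() in _TRUTHY,
        fuzzy_threshold=float(env.get("ENTITY_RESOLUTION_FUZZY_THRESHOLD", "0.85")),
        vector_threshold=float(env.get("ENTITY_RESOLUTION_VECTOR_THRESHOLD", "0.5")),
        max_candidates=int(env.get("ENTITY_RESOLUTION_MAX_CANDIDATES", "3")),
    )
```

Passing a plain dict instead of `os.environ` makes the loader easy to exercise in unit tests.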
2025-11-27 15:35:02 +01:00
api feat: add automatic entity resolution with 3-layer matching 2025-11-27 15:35:02 +01:00
entity_resolution feat: add automatic entity resolution with 3-layer matching 2025-11-27 15:35:02 +01:00
evaluation Update LLM cache migration docs and improve UX prompts 2025-11-08 23:48:19 +08:00
kg feat: add automatic entity resolution with 3-layer matching 2025-11-27 15:35:02 +01:00
llm fix:exception handling order error 2025-11-25 16:36:41 +08:00
tools Refactor main function to provide sync CLI entry point 2025-11-21 13:11:55 +08:00
__init__.py Bump core version to 1.4.9.9 and API to 0252 2025-11-08 11:27:26 +08:00
base.py Remove unused chunk-based node/edge retrieval methods 2025-11-06 18:17:10 +08:00
constants.py Refactor entity merging with unified attribute merge function 2025-10-27 00:04:17 +08:00
exceptions.py Fix ChunkTokenLimitExceededError message formatting 2025-11-19 18:50:45 +08:00
lightrag.py feat: add automatic entity resolution with 3-layer matching 2025-11-27 15:35:02 +01:00
namespace.py Add entity/relation chunk tracking with configurable source ID limits 2025-10-20 15:24:15 +08:00
operate.py feat: add automatic entity resolution with 3-layer matching 2025-11-27 15:35:02 +01:00
prompt.py Fix typo in 'equipment' in prompt.py 2025-10-22 11:13:22 +08:00
rerank.py fix: Resolve default rerank config problem when env var missing 2025-08-23 01:07:59 +08:00
types.py
utils.py Fix double decoration in azure_openai_embed and document decorator usage 2025-11-21 18:03:53 +08:00
utils_graph.py Improve entity merge logging by removing redundant message and fixing typo 2025-10-31 17:16:59 +08:00