LightRAG/examples
clssck 48c7732edc feat: add automatic entity resolution with 3-layer matching
Implement automatic entity resolution to prevent duplicate nodes in the
knowledge graph. The system uses a 3-layer approach:

1. Case-insensitive exact matching (free, instant)
2. Fuzzy string matching >85% threshold (free, instant)
3. Vector similarity + LLM verification (for acronyms/synonyms)
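
The layered order above can be sketched as a single lookup function. This is a minimal illustration, not the code in lightrag/entity_resolution/: `resolve_entity`, `vector_candidates`, and `llm_same_entity` are hypothetical names, and `difflib.SequenceMatcher` stands in for whatever fuzzy matcher the module actually uses.

```python
from difflib import SequenceMatcher
from typing import Callable, Optional, Sequence


def fuzzy_ratio(a: str, b: str) -> float:
    """Case-insensitive similarity in [0, 1] (stand-in for the real matcher)."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def resolve_entity(
    name: str,
    existing: Sequence[str],
    fuzzy_threshold: float = 0.85,
    vector_candidates: Optional[Callable[[str], Sequence[str]]] = None,
    llm_same_entity: Optional[Callable[[str, str], bool]] = None,
) -> Optional[str]:
    """Return the existing entity that `name` resolves to, or None."""
    # Layer 1: case-insensitive exact match.
    by_folded = {e.casefold(): e for e in existing}
    hit = by_folded.get(name.casefold())
    if hit is not None:
        return hit

    # Layer 2: best fuzzy match, accepted only above the threshold.
    best = max(existing, key=lambda e: fuzzy_ratio(name, e), default=None)
    if best is not None and fuzzy_ratio(name, best) > fuzzy_threshold:
        return best

    # Layer 3: nearest neighbours from a vector index, each confirmed by an LLM.
    # Both callables are hypothetical hooks supplied by the caller.
    if vector_candidates is not None and llm_same_entity is not None:
        for candidate in vector_candidates(name):
            if llm_same_entity(name, candidate):
                return candidate

    return None
```

The cheap layers run first, so the vector search and LLM call are only reached for names such as acronyms that string comparison cannot resolve.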

Key features:
- Pre-resolution phase prevents race conditions in parallel processing
- Numeric suffix detection blocks false matches (IL-4 ≠ IL-13)
- PostgreSQL alias cache for fast lookups on subsequent ingestion
- Configurable thresholds via environment variables
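
The numeric suffix guard can be illustrated with a short sketch. It assumes the check compares trailing numbers before the fuzzy score is trusted; the helper names and the regular expression are hypothetical, and `difflib.SequenceMatcher` is again a stand-in for the real matcher.

```python
import re
from difflib import SequenceMatcher
from typing import Optional

# Digits at the very end of a name, e.g. the "13" in "IL-13".
_TRAILING_NUMBER = re.compile(r"(\d+)\s*$")


def numeric_suffix(name: str) -> Optional[int]:
    """Return the trailing number of `name`, or None if it has none."""
    match = _TRAILING_NUMBER.search(name)
    return int(match.group(1)) if match else None


def suffixes_conflict(a: str, b: str) -> bool:
    """True when both names end in a number and the numbers differ."""
    sa, sb = numeric_suffix(a), numeric_suffix(b)
    return sa is not None and sb is not None and sa != sb


def guarded_fuzzy_match(a: str, b: str, threshold: float = 0.85) -> bool:
    """Fuzzy match that refuses to merge differently numbered entities."""
    if suffixes_conflict(a, b):
        return False
    return SequenceMatcher(None, a.lower(), b.lower()).ratio() > threshold
```

Without the guard, "Interleukin-4" and "Interleukin-13" score about 0.89 under `SequenceMatcher`, which is above the 0.85 threshold; with it, `guarded_fuzzy_match("Interleukin-4", "Interleukin-13")` returns `False`.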

Bug fixes included:
- Fix fuzzy matching false positives for numbered entities
- Fix alias cache not being populated (missing db parameter)
- Skip entity_aliases table from generic id index creation

New files:
- lightrag/entity_resolution/ - Core resolution module
- tests/test_entity_resolution/ - Unit tests
- docker/postgres-age-vector/ - Custom PG image with pgvector + AGE
- docker-compose.test.yml - Integration test environment

Configuration (env.example):
- ENTITY_RESOLUTION_ENABLED=true
- ENTITY_RESOLUTION_FUZZY_THRESHOLD=0.85
- ENTITY_RESOLUTION_VECTOR_THRESHOLD=0.5
- ENTITY_RESOLUTION_MAX_CANDIDATES=3
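
One way these variables might be read at startup is sketched below. The variable names and default values come from the list above; the `EntityResolutionSettings` class, the `load_settings` function, and the set of strings accepted as "true" are assumptions for illustration, not the module's actual loader.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional

_TRUTHY = {"1", "true", "yes", "on"}  # assumed spellings of "enabled"


@dataclass(frozen=True)
class EntityResolutionSettings:
    enabled: bool = True
    fuzzy_threshold: float = 0.85
    vector_threshold: float = 0.5
    max_candidates: int = 3


def load_settings(env: Optional[Mapping[str, str]] = None) -> EntityResolutionSettings:
    """Build settings from environment variables, falling back to the defaults."""
    env = os.environ if env is None else env
    return EntityResolutionSettings(
        enabled=env.get("ENTITY_RESOLUTION_ENABLED", "true").strip().lower() in _TRUTHY,
        fuzzy_threshold=float(env.get("ENTITY_RESOLUTION_FUZZY_THRESHOLD", "0.85")),
        vector_threshold=float(env.get("ENTITY_RESOLUTION_VECTOR_THRESHOLD", "0.5")),
        max_candidates=int(env.get("ENTITY_RESOLUTION_MAX_CANDIDATES", "3")),
    )
```

Passing a plain dict instead of `os.environ` keeps the loader easy to test, for example `load_settings({"ENTITY_RESOLUTION_FUZZY_THRESHOLD": "0.9"})`.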
2025-11-27 15:35:02 +01:00
unofficial-sample | Remove manual initialize_pipeline_status() calls across codebase | 2025-11-17 12:54:33 +08:00
generate_query.py | Restore query generation example and fix README path reference | 2025-10-29 19:11:40 +08:00
graph_visual_with_html.py | fixed networkx | 2025-02-19 13:48:18 +01:00
graph_visual_with_neo4j.py | Update edge keywords extraction in graph visualization | 2025-11-17 12:54:32 +08:00
insert_custom_kg.py | Updated documentation examples to include chunk_order_index case | 2025-02-19 14:58:51 +01:00
lightrag_azure_openai_demo.py | Remove manual initialize_pipeline_status() calls across codebase | 2025-11-17 12:54:33 +08:00
lightrag_ollama_demo.py | Remove manual initialize_pipeline_status() calls across codebase | 2025-11-17 12:54:33 +08:00
lightrag_openai_compatible_demo.py | Remove manual initialize_pipeline_status() calls across codebase | 2025-11-17 12:54:33 +08:00
lightrag_openai_demo.py | Remove manual initialize_pipeline_status() calls across codebase | 2025-11-17 12:54:33 +08:00
lightrag_openai_mongodb_graph_demo.py | Remove manual initialize_pipeline_status() calls across codebase | 2025-11-17 12:54:33 +08:00
modalprocessors_example.py | Remove manual initialize_pipeline_status() calls across codebase | 2025-11-17 12:54:33 +08:00
raganything_example.py | Update raganything_example.py | 2025-07-04 11:32:01 +08:00
rerank_example.py | Remove manual initialize_pipeline_status() calls across codebase | 2025-11-17 12:54:33 +08:00
test_entity_resolution.py | feat: add automatic entity resolution with 3-layer matching | 2025-11-27 15:35:02 +01:00