graphiti/graphiti_core
Daniel Chalef 420676faf2
fix: Prevent duplicate edge facts within same episode (#955)
* fix: Prevent duplicate edge facts within same episode

This fixes three related bugs that allowed verbatim duplicate edge facts:

1. Fixed LLM deduplication: Changed related_edges_context to use integer
   indices instead of UUIDs, matching the EdgeDuplicate model expectations.

2. Fixed batch deduplication: Removed the episode skip in dedupe_edges_bulk
   that prevented comparing edges from the same episode. Added a
   self-comparison guard so that an edge is never compared against itself.

3. Added fast-path deduplication: Added exact string matching before parallel
   processing in resolve_extracted_edges to catch within-episode duplicates
   early. This avoids the race in which edges resolved concurrently cannot
   see each other.
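
A minimal sketch of the fast-path idea in fix 3, not the code in
resolve_extracted_edges. The `Edge` dataclass, the `dedupe_exact` helper, and
the choice of key (source, target, and the fact with outer whitespace trimmed)
are assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class Edge:
    # Hypothetical stand-in for an extracted edge; not the library's model.
    uuid: str
    source: str
    target: str
    fact: str


def dedupe_exact(edges: list[Edge]) -> list[Edge]:
    """Keep the first edge for each (source, target, fact) key, in order."""
    seen: set[tuple[str, str, str]] = set()
    unique: list[Edge] = []
    for edge in edges:
        # Assumption: "exact" means equal after trimming outer whitespace.
        key = (edge.source, edge.target, edge.fact.strip())
        if key in seen:
            continue  # verbatim duplicate within the same batch
        seen.add(key)
        unique.append(edge)
    return unique


edges = [
    Edge("e1", "alice", "acme", "Alice works at Acme"),
    Edge("e2", "alice", "acme", "Alice works at Acme"),
    Edge("e3", "bob", "acme", "Bob works at Acme"),
]
unique_edges = dedupe_exact(edges)
print([e.uuid for e in unique_edges])
```

Running this pass sequentially, before any concurrent resolution starts, means
the later tasks never receive two identical edges, so they do not need to see
each other's results.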

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>

* test: Add tests for edge deduplication fixes

Added three tests to verify the edge deduplication fixes:

1. test_dedupe_edges_bulk_deduplicates_within_episode: Verifies that
   dedupe_edges_bulk now compares edges from the same episode after
   the `if i == j: continue` episode skip was removed.

2. test_resolve_extracted_edge_uses_integer_indices_for_duplicates:
   Validates that the LLM receives integer indices for duplicate
   detection and correctly processes returned duplicate_facts.

3. test_resolve_extracted_edges_fast_path_deduplication: Confirms that
   the fast-path exact string matching deduplicates identical edges
   before parallel processing, preventing race conditions.
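
The integer-index behavior checked by the second test can be sketched as
follows. This is an illustration under assumptions, not the library's code:
`build_context`, `resolve_duplicates`, and the `idx`/`fact` keys are
hypothetical names; only `duplicate_facts` comes from the description above.

```python
def build_context(facts: list[str]) -> list[dict]:
    """Label each candidate fact with its list position instead of a UUID."""
    return [{"idx": i, "fact": fact} for i, fact in enumerate(facts)]


def resolve_duplicates(facts: list[str], duplicate_facts: list[int]) -> list[str]:
    """Map indices returned by the model back to facts.

    Out-of-range indices are dropped (an assumption of this sketch), since
    model output cannot be trusted to stay within bounds.
    """
    return [facts[i] for i in duplicate_facts if 0 <= i < len(facts)]


facts = ["Alice works at Acme", "Alice joined Acme in 2020"]
context = build_context(facts)
matched = resolve_duplicates(facts, [1, 7, -1])
print(context)
print(matched)
```

Because the context and the response use the same integer positions, a
returned index can be looked up directly without any UUID translation.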

* fix: Remove unused variables flagged by ruff

- Remove unused loop variable 'j' in bulk_utils.py
- Remove unused return value 'edges_by_episode' in test
- Replace unused 'edge_uuid' with '_' in test loop

---------

Co-authored-by: Claude <noreply@anthropic.com>
2025-10-01 07:30:30 -07:00

| Name | Last commit | Date |
| --- | --- | --- |
| cross_encoder | fix typo and model selector (#843) | 2025-08-18 11:15:45 -04:00 |
| driver | Graph quality updates (#922) | 2025-09-23 17:53:39 -04:00 |
| embedder | OpenSearch Integration for Neo4j (#896) | 2025-09-09 10:51:46 -04:00 |
| llm_client | Make natural language extraction configurable (#943) | 2025-09-30 11:09:03 -04:00 |
| migrations | cleanup (#894) | 2025-09-05 11:30:46 -04:00 |
| models | OpenSearch Integration for Neo4j (#896) | 2025-09-09 10:51:46 -04:00 |
| prompts | Improve JSON entity extraction prompt (#949) | 2025-09-30 11:00:14 -04:00 |
| search | fix-fulltext-syntax-error (#914) | 2025-09-23 10:52:44 -04:00 |
| telemetry | feat: add telemetry with PostHog and update Docker configurations (#633) | 2025-06-27 12:23:30 -07:00 |
| utils | fix: Prevent duplicate edge facts within same episode (#955) | 2025-10-01 07:30:30 -07:00 |
| __init__.py | chore: Fix packaging (#38) | 2024-08-25 10:07:50 -07:00 |
| edges.py | OpenSearch updates (#906) | 2025-09-14 01:43:37 -04:00 |
| errors.py | Add group ID validation and error handling (#618) | 2025-06-24 09:33:54 -07:00 |
| graph_queries.py | Graph quality updates (#922) | 2025-09-23 17:53:39 -04:00 |
| graphiti.py | Allow Edge extraction to keep discovered edge labels (#950) | 2025-09-29 21:32:47 -07:00 |
| graphiti_types.py | ensure ascii default to false (#817) | 2025-08-08 11:20:02 -04:00 |
| helpers.py | fix-fulltext-syntax-error (#914) | 2025-09-23 10:52:44 -04:00 |
| nodes.py | OpenSearch updates (#906) | 2025-09-14 01:43:37 -04:00 |
| py.typed | Add py.typed file (#105) | 2024-09-11 08:44:06 -04:00 |