graphiti/tests
Daniel Chalef 420676faf2
fix: Prevent duplicate edge facts within same episode (#955)
* fix: Prevent duplicate edge facts within same episode

This fixes three related bugs that allowed verbatim duplicate edge facts:

1. Fixed LLM deduplication: Changed related_edges_context to use integer
   indices instead of UUIDs, matching the EdgeDuplicate model expectations.

2. Fixed batch deduplication: Removed the episode skip in dedupe_edges_bulk
   that prevented comparing edges from the same episode. Added a self-comparison
   guard so that an edge is never compared against itself.

3. Added fast-path deduplication: Added exact string matching before parallel
   processing in resolve_extracted_edges to catch within-episode duplicates
   early, preventing race conditions where edges processed concurrently cannot
   see each other.
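
A minimal sketch of the fast-path idea in item 3, under stated assumptions: the `Edge` dataclass, `_normalize`, and `dedupe_exact` are hypothetical names for illustration, not Graphiti's actual models or functions, and the whitespace/case normalization is an added assumption beyond plain exact matching. The pass runs sequentially before any parallel work, so identical facts between the same two nodes are collapsed before concurrent tasks could miss each other.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Edge:
    # Hypothetical stand-in for an extracted edge; not Graphiti's actual model.
    uuid: str
    source_uuid: str
    target_uuid: str
    fact: str


def _normalize(fact: str) -> str:
    # Assumption: collapse whitespace and case so trivially different
    # strings still count as the same fact.
    return " ".join(fact.lower().split())


def dedupe_exact(edges: list[Edge]) -> list[Edge]:
    """Keep the first edge seen for each (source, target, normalized fact)."""
    seen: set[tuple[str, str, str]] = set()
    unique: list[Edge] = []
    for edge in edges:
        key = (edge.source_uuid, edge.target_uuid, _normalize(edge.fact))
        if key in seen:
            continue  # duplicate within the same batch, drop it
        seen.add(key)
        unique.append(edge)
    return unique
```

Only the edges that survive this pass would then go on to the slower, parallel resolution step.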

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>

* test: Add tests for edge deduplication fixes

Added three tests to verify the edge deduplication fixes:

1. test_dedupe_edges_bulk_deduplicates_within_episode: Verifies that
   dedupe_edges_bulk now compares edges from the same episode after
   removing the `if i == j: continue` episode skip.

2. test_resolve_extracted_edge_uses_integer_indices_for_duplicates:
   Validates that the LLM receives integer indices for duplicate
   detection and correctly processes returned duplicate_facts.

3. test_resolve_extracted_edges_fast_path_deduplication: Confirms that
   the fast-path exact string matching deduplicates identical edges
   before parallel processing, preventing race conditions.

* fix: Remove unused variables flagged by ruff

- Remove unused loop variable 'j' in bulk_utils.py
- Remove unused return value 'edges_by_episode' in test
- Replace unused 'edge_uuid' with '_' in test loop

---------

Co-authored-by: Claude <noreply@anthropic.com>
2025-10-01 07:30:30 -07:00
| Name | Last commit | Date |
| --- | --- | --- |
| cross_encoder | Gemini client improvements; Gemini reranker (#645) | 2025-06-30 12:55:17 -07:00 |
| driver | chore/prepare kuzu integration (#762) | 2025-07-29 09:07:34 -04:00 |
| embedder | save edge update (#721) | 2025-07-14 11:15:38 -04:00 |
| evals | add_episode() refactor (#421) | 2025-04-30 12:08:52 -04:00 |
| llm_client | save edge update (#721) | 2025-07-14 11:15:38 -04:00 |
| utils | fix: Prevent duplicate edge facts within same episode (#955) | 2025-10-01 07:30:30 -07:00 |
| helpers_test.py | Add support for Kuzu as the graph driver (#799) | 2025-08-27 11:45:21 -04:00 |
| test_edge_int.py | Improve node deduplication w/ deterministic matching, LLM fallbacks (#929) | 2025-09-25 07:13:19 -07:00 |
| test_entity_exclusion_int.py | Add support for Kuzu as the graph driver (#799) | 2025-08-27 11:45:21 -04:00 |
| test_graphiti_int.py | Add support for Kuzu as the graph driver (#799) | 2025-08-27 11:45:21 -04:00 |
| test_graphiti_mock.py | don't return index labels (#887) | 2025-09-02 12:02:33 -04:00 |
| test_node_int.py | Improve node deduplication w/ deterministic matching, LLM fallbacks (#929) | 2025-09-25 07:13:19 -07:00 |