* fix: Improve deduplication ID validation and logging
- Add comprehensive logging to verify IDs sent to LLM (sent vs received)
- Enhance prompt with explicit ID bounds (0 through N-1)
- Add validation warnings for missing and extra IDs from LLM responses
- Improve error message clarity for invalid dedupe IDs
- Log actual IDs sent to LLM to confirm no index leakage
This helps diagnose cases where the LLM returns IDs outside the valid
range (e.g., ID 19 when only 0-18 were sent).
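
The missing/extra ID check described above could look roughly like the sketch below. The function name `validate_dedupe_ids` and its return shape are hypothetical and only illustrate the idea; they are not taken from the actual change.

```python
import logging

logger = logging.getLogger(__name__)


def validate_dedupe_ids(
    sent_count: int, received_ids: list[int]
) -> tuple[list[int], list[int], list[int]]:
    """Compare IDs returned by the LLM against the expected range 0..sent_count-1.

    Hypothetical helper: returns (valid, missing, extra) as sorted lists.
    """
    expected = set(range(sent_count))
    received = set(received_ids)

    valid = sorted(received & expected)
    missing = sorted(expected - received)  # sent, but never returned
    extra = sorted(received - expected)  # returned, but never sent

    if missing:
        logger.warning('LLM response is missing IDs: %s', missing)
    if extra:
        logger.warning(
            'LLM returned IDs outside the valid range 0-%d: %s', sent_count - 1, extra
        )
    return valid, missing, extra
```

With 19 items sent (IDs 0-18), a response containing ID 19 but not ID 18 would report `[18]` as missing and `[19]` as extra.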
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <noreply@anthropic.com>
* fix: Remove redundant logging parameter
Address reviewer comment about redundant third parameter in debug log statement.
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <noreply@anthropic.com>
* fix: Address reviewer comments on list slicing and prompt clarity
- Fix list slicing bug: change <= to < to avoid gap when exactly 20 elements
(previously would skip element 10 when showing 21 elements)
- Consolidate redundant prompt phrasing while maintaining clarity
(reduced from 3 sentences to 2, keeping essential constraints)
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <noreply@anthropic.com>
* fix: Remove redundant prompt text to reduce token usage
Consolidate 'using these exact IDs (0 through N-1)' with following sentence
to eliminate repetition. Changes:
- Before: 'using these exact IDs (0 through {N-1}). Do not skip IDs or use IDs outside this range'
- After: 'with IDs 0 through {N-1}. Do not skip or add IDs'
Saves ~15 tokens per deduplication call.
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <noreply@anthropic.com>
---------
Co-authored-by: Claude <noreply@anthropic.com>
- Consolidate issue-triage.yml and issue-deduplication.yml into single workflow with sequential jobs
- Create daily_issue_maintenance.yml with three jobs:
- find-legacy-duplicates: Manual job to scan all open issues for duplicates
- check-stale-issues: Daily job to request confirmation on issues >60 days old
- close-unconfirmed-issues: Daily job to close issues without confirmation after 14 days
- Update triage to use gh CLI tools with database-specific labels (neo4j, falkordb, neptune)
- Separate deduplication into dedicated job using MCP GitHub tools
- Add "duplicate" label to both real-time and batch deduplication workflows
- Update claude-code-review.yml to use latest Sonnet model
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-authored-by: Claude <noreply@anthropic.com>
Add NodeSummaryFilter callback parameter to extract_attributes_from_nodes
and extract_attributes_from_node functions, allowing consumers to
selectively skip summary regeneration for specific nodes.
This enables downstream applications to implement custom logic for
throttling or filtering which nodes should have summaries regenerated,
reducing unnecessary LLM calls and token costs.
Key changes:
- Add NodeSummaryFilter type alias: Callable[[EntityNode], Awaitable[bool]]
- Update extract_attributes_from_nodes with optional should_summarize_node parameter
- Update extract_attributes_from_node with conditional summary generation logic
- Add 5 comprehensive test cases covering callback functionality
- Maintain full backwards compatibility (default None = all summaries generated)
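
A minimal sketch of how a consumer might use such a callback. The `EntityNode` dataclass, `summarize_nodes`, and `skip_nodes_with_summary` below are hypothetical stand-ins for illustration; only the `NodeSummaryFilter` alias shape and the `should_summarize_node` parameter name come from this change.

```python
import asyncio
from collections.abc import Awaitable, Callable
from dataclasses import dataclass
from typing import Optional


@dataclass
class EntityNode:
    """Hypothetical stand-in for the real EntityNode model."""

    name: str
    summary: str = ''


# Alias shape as described in the commit message.
NodeSummaryFilter = Callable[[EntityNode], Awaitable[bool]]


async def summarize_nodes(
    nodes: list[EntityNode],
    should_summarize_node: Optional[NodeSummaryFilter] = None,
) -> list[str]:
    """Return the names of nodes whose summaries were regenerated."""
    regenerated: list[str] = []
    for node in nodes:
        # Default None means every node gets a fresh summary.
        if should_summarize_node is not None and not await should_summarize_node(node):
            continue
        node.summary = f'summary of {node.name}'  # placeholder for the LLM call
        regenerated.append(node.name)
    return regenerated


async def skip_nodes_with_summary(node: EntityNode) -> bool:
    """Example filter: only summarize nodes that have no summary yet."""
    return not node.summary
```

Passing `skip_nodes_with_summary` leaves already-summarized nodes untouched, while omitting the callback keeps the previous behavior.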
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-authored-by: Claude <noreply@anthropic.com>
chore: Update Claude review prompt to focus on critical feedback only
Added instruction to eliminate positive feedback from code reviews, reducing noise and focusing on actionable improvements.
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-authored-by: Claude <noreply@anthropic.com>
* fix: Fix typo in JSON entity extraction prompt
Change "an entities" to "any entities" in guideline 1 of the extract_json prompt.
🤖 Generated with [Claude Code](https://claude.ai/code)
Co-Authored-By: Claude <noreply@anthropic.com>
* Update graphiti_core/prompts/extract_nodes.py
Co-authored-by: Daniel Chalef <131175+danielchalef@users.noreply.github.com>
---------
Co-authored-by: Claude <noreply@anthropic.com>
Co-authored-by: Daniel Chalef <131175+danielchalef@users.noreply.github.com>
* chore: Update edge extraction prompt to paraphrase instead of quote
- Changed instruction 5 to request paraphrasing rather than verbatim quoting
- Updated string quotes to use double quotes for consistency
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <noreply@anthropic.com>
* chore: Format edge_operations.py and update lock file
- Minor formatting fix in edge_operations.py list comprehension
- Update uv.lock with version bump to 0.21.0rc8
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <noreply@anthropic.com>
---------
Co-authored-by: Claude <noreply@anthropic.com>
* fix: Prevent duplicate edge facts within same episode
This fixes three related bugs that allowed verbatim duplicate edge facts:
1. Fixed LLM deduplication: Changed related_edges_context to use integer
indices instead of UUIDs, matching the EdgeDuplicate model expectations.
2. Fixed batch deduplication: Removed episode skip in dedupe_edges_bulk
that prevented comparing edges from the same episode. Added self-comparison
guard to prevent edge from comparing against itself.
3. Added fast-path deduplication: Added exact string matching before parallel
processing in resolve_extracted_edges to catch within-episode duplicates
early, preventing race conditions where concurrent edges can't see each other.
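
The fast path in item 3 can be sketched as a sequential pass that runs before any concurrent resolution. The `Edge` dataclass and `dedupe_exact` are hypothetical names, and the whitespace and case normalization is an assumption added for illustration; the actual change may compare raw strings only.

```python
from dataclasses import dataclass


@dataclass
class Edge:
    """Hypothetical stand-in for an extracted edge."""

    source_uuid: str
    target_uuid: str
    fact: str


def dedupe_exact(edges: list[Edge]) -> list[Edge]:
    """Drop edges with the same endpoints and the same fact text, keeping the first."""
    seen: set[tuple[str, str, str]] = set()
    unique: list[Edge] = []
    for edge in edges:
        # Assumption: collapse whitespace and ignore case before comparing.
        normalized_fact = ' '.join(edge.fact.lower().split())
        key = (edge.source_uuid, edge.target_uuid, normalized_fact)
        if key in seen:
            continue
        seen.add(key)
        unique.append(edge)
    return unique
```

Because this pass is sequential, two identical edges from the same episode are collapsed before they reach parallel processing, where neither could see the other.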
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <noreply@anthropic.com>
* test: Add tests for edge deduplication fixes
Added three tests to verify the edge deduplication fixes:
1. test_dedupe_edges_bulk_deduplicates_within_episode: Verifies that
dedupe_edges_bulk now compares edges from the same episode after
removing the `if i == j: continue` check.
2. test_resolve_extracted_edge_uses_integer_indices_for_duplicates:
Validates that the LLM receives integer indices for duplicate
detection and correctly processes returned duplicate_facts.
3. test_resolve_extracted_edges_fast_path_deduplication: Confirms that
the fast-path exact string matching deduplicates identical edges
before parallel processing, preventing race conditions.
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <noreply@anthropic.com>
* fix: Remove unused variables flagged by ruff
- Remove unused loop variable 'j' in bulk_utils.py
- Remove unused return value 'edges_by_episode' in test
- Replace unused 'edge_uuid' with '_' in test loop
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <noreply@anthropic.com>
---------
Co-authored-by: Claude <noreply@anthropic.com>
Replace MULTILINGUAL_EXTRACTION_RESPONSES constant with configurable
get_extraction_language_instruction() function to improve determinism
and allow customization.
Changes:
- Replace constant with function in client.py
- Update all LLM client implementations to use new function
- Maintain backward compatibility with same default behavior
- Enable users to override function for custom language requirements
Users can now customize extraction behavior by monkey-patching:
```python
import graphiti_core.llm_client.client as client
client.get_extraction_language_instruction = lambda: "Custom instruction"
```
🤖 Generated with [Claude Code](https://claude.ai/code)
Co-authored-by: Claude <noreply@anthropic.com>
Add guideline to extract entities from all JSON properties, not just primary fields like name/user. This ensures comprehensive entity extraction while maintaining the existing exclusion of date properties.
🤖 Generated with [Claude Code](https://claude.ai/code)
Co-authored-by: Claude <noreply@anthropic.com>
* chore: Update dependencies and enhance edge resolution logic
- Add new dependencies: boto3, opensearch-py, and langchain-aws to pyproject.toml.
- Modify Graphiti class to handle additional parameters in edge resolution.
- Improve edge type handling in deduplication logic by introducing custom edge type names.
- Enhance tests for edge resolution to cover new scenarios and ensure correct behavior.
This update improves the flexibility and functionality of edge operations while ensuring compatibility with new libraries.
* refactor: Clean up test_edge_operations.py and format response returns
- Remove unnecessary stubs for opensearchpy module.
- Format return values in llm_client.generate_response for consistency.
- Enhance readability by ensuring proper indentation and structure in test cases.
This refactor improves the clarity and maintainability of the test suite for edge operations.
* bump version to 0.30.0pre5 and enhance docstring for resolve_extracted_edge function
- Update version in pyproject.toml to 0.30.0pre5.
- Add detailed docstring to resolve_extracted_edge function in edge_operations.py, clarifying parameters and return values.
This update improves documentation clarity for the edge resolution process.
* fix: Add edge type validation based on node labels
- Add DEFAULT_EDGE_NAME constant for 'RELATES_TO'
- Implement pre-resolution validation to reset invalid edge names
- Add post-resolution validation for LLM-returned fact types
- Rename parameter from edge_types to edge_type_candidates for clarity
- Add comprehensive tests for validation scenarios
This ensures edges conform to edge_type_map constraints and prevents
misclassification when edge types don't match node label pairs.
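
As a rough illustration of the validation idea, the sketch below resets a custom edge name to the default when it is not allowed for the edge's node label pair. `DEFAULT_EDGE_NAME` comes from this change; the function `validate_edge_name` and the assumed shape of `edge_type_map` (a dict keyed by `(source_label, target_label)` tuples) are assumptions for illustration.

```python
DEFAULT_EDGE_NAME = 'RELATES_TO'


def validate_edge_name(
    name: str,
    custom_edge_type_names: set[str],
    edge_type_map: dict[tuple[str, str], list[str]],
    source_labels: list[str],
    target_labels: list[str],
) -> str:
    """Return name if it is valid for the label pair, else DEFAULT_EDGE_NAME."""
    # Names that are not custom edge types are left unchanged.
    if name not in custom_edge_type_names:
        return name

    # Collect every edge type permitted for any (source, target) label pair.
    allowed: set[str] = set()
    for source_label in source_labels:
        for target_label in target_labels:
            allowed.update(edge_type_map.get((source_label, target_label), []))

    return name if name in allowed else DEFAULT_EDGE_NAME
```

The same check could run twice: once before resolution on the extracted name, and once after resolution on the fact type returned by the LLM.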
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <noreply@anthropic.com>
* chore: Bump version to 0.30.0pre4
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <noreply@anthropic.com>
---------
Co-authored-by: Claude <noreply@anthropic.com>
* Refactor deduplication logic to enhance node resolution and track duplicate pairs (#929)
* Simplify deduplication process in bulk_utils by reusing canonical nodes.
* Update dedup_helpers to store duplicate pairs during resolution.
* Modify node_operations to append duplicate pairs when resolving nodes.
* Add tests to verify deduplication behavior and ensure correct state updates.
* revert to concurrent dedup with fanout and then reconciliation
* add performance note for deduplication loop in bulk_utils
* enhance deduplication logic in bulk_utils to handle missing canonical nodes gracefully
* Update graphiti_core/utils/bulk_utils.py
Co-authored-by: claude[bot] <209825114+claude[bot]@users.noreply.github.com>
* refactor deduplication logic in bulk_utils to use directed union-find for canonical UUID resolution
* implement _build_directed_uuid_map for efficient UUID resolution in bulk_utils
* document directed union-find lookup in bulk_utils for clarity
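
A minimal sketch of a directed union-find over duplicate pairs, assuming each pair is `(duplicate_uuid, canonical_uuid)` and that the right-hand side should win. The function name `build_directed_uuid_map` mirrors the helper mentioned above but is written here from scratch as an illustration, not copied from `bulk_utils`.

```python
def build_directed_uuid_map(pairs: list[tuple[str, str]]) -> dict[str, str]:
    """Map every UUID seen in pairs to its canonical UUID.

    Each pair is (source, target): source resolves to whatever target resolves to.
    """
    parent: dict[str, str] = {}

    def find(uuid: str) -> str:
        parent.setdefault(uuid, uuid)
        # Walk up to the root of this UUID's chain.
        root = uuid
        while parent[root] != root:
            root = parent[root]
        # Path compression: point every node on the chain directly at the root.
        while parent[uuid] != root:
            parent[uuid], uuid = root, parent[uuid]
        return root

    for source, target in pairs:
        # Directed union: the source's root is attached under the target's root.
        parent[find(source)] = find(target)

    return {uuid: find(uuid) for uuid in parent}
```

Chains such as `a -> b` and `b -> c` resolve to a single canonical UUID (`c`) regardless of the order in which concurrent dedup results arrive.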
---------
Co-authored-by: claude[bot] <209825114+claude[bot]@users.noreply.github.com>
* add repository guidelines and project structure documentation
* update neo4j image version and modify test command to disable specific databases
* implement deduplication helpers and integrate with node operations
* refactor string formatting to use single quotes in node operations
* enhance deduplication helpers with UUID indexing and update resolution logic
* implement exact fact matching (#931)