graphiti/graphiti_core/utils
Daniel Chalef 9aee3174bd
Refactor batch deduplication logic to enhance node resolution and track duplicate pairs (#929) (#936)
* Refactor deduplication logic to enhance node resolution and track duplicate pairs (#929)

* Simplify deduplication process in bulk_utils by reusing canonical nodes.
* Update dedup_helpers to store duplicate pairs during resolution.
* Modify node_operations to append duplicate pairs when resolving nodes.
* Add tests to verify deduplication behavior and ensure correct state updates.

* reveret to concurrent dedup with fanout and then reconcilation

* add performance note for deduplication loop in bulk_utils

* enhance deduplication logic in bulk_utils to handle missing canonical nodes gracefully

* Update graphiti_core/utils/bulk_utils.py

Co-authored-by: claude[bot] <209825114+claude[bot]@users.noreply.github.com>

* refactor deduplication logic in bulk_utils to use directed union-find for canonical UUID resolution

* implement _build_directed_uuid_map for efficient UUID resolution in bulk_utils

* document directed union-find lookup in bulk_utils for clarity

---------

Co-authored-by: claude[bot] <209825114+claude[bot]@users.noreply.github.com>
2025-09-26 08:40:18 -07:00
..
maintenance Refactor batch deduplication logic to enhance node resolution and track duplicate pairs (#929) (#936) 2025-09-26 08:40:18 -07:00
ontology_utils move summary out of attribute extraction (#792) 2025-07-31 12:15:21 -04:00
__init__.py refactor: use utc_now() for consistent UTC datetime handling (#234) 2024-12-09 10:36:04 -08:00
bulk_utils.py Refactor batch deduplication logic to enhance node resolution and track duplicate pairs (#929) (#936) 2025-09-26 08:40:18 -07:00
datetime_utils.py Add support for Kuzu as the graph driver (#799) 2025-08-27 11:45:21 -04:00