graphiti

Daniel Chalef 166c67492a Optimize MMR calculation with vectorized numpy operations		2025-07-18 11:54:15 -07:00

This commit implements a comprehensive optimization of the Maximal Marginal Relevance (MMR) calculation in the search utilities. The key improvements include:

## Algorithm Improvements
- True MMR Implementation: Replaced the previous diversity-aware scoring with proper iterative MMR algorithm that greedily selects documents one at a time
- Vectorized Operations: Leveraged numpy's optimized BLAS operations through matrix multiplication instead of individual dot products
- Adaptive Strategy: Uses different optimization strategies for small (≤100) and large datasets to balance performance and memory usage

## Performance Optimizations
- Memory Efficiency: Reduced memory complexity from O(n²) to O(n) for large datasets
- BLAS Optimization: Proper use of matrix multiplication leverages optimized BLAS libraries
- Batch Normalization: Added `normalize_embeddings_batch()` for efficient L2 normalization of multiple embeddings at once
- Early Termination: Stops selection when no candidates meet minimum score threshold

## Key Changes
- `maximal_marginal_relevance()`: Complete rewrite with proper iterative MMR algorithm
- `normalize_embeddings_batch()`: New function for efficient batch normalization
- `_mmr_small_dataset()`: Optimized implementation for small datasets using precomputed similarity matrices
- Added comprehensive test suite with 9 test cases covering edge cases, correctness, and performance scenarios

## Benefits
- Correctness: Now implements true MMR algorithm instead of approximate diversity scoring
- Memory Usage: O(n) memory complexity vs O(n²) for the original implementation
- Scalability: Better performance characteristics for large datasets
- Maintainability: Cleaner, more readable code with comprehensive test coverage

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <noreply@anthropic.com>
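The greedy selection loop described in the commit message can be illustrated with a minimal sketch. This is not the code from the commit. The names `mmr_select` and `normalize_rows`, the parameters `lambda_param` and `min_score`, and the textbook score `lambda * relevance - (1 - lambda) * max_similarity_to_selected` are all assumptions made for illustration. The sketch keeps only a running per-candidate maximum similarity, so it needs O(n) extra memory and never builds an n×n similarity matrix.

```python
import numpy as np


def normalize_rows(embeddings: np.ndarray) -> np.ndarray:
    """L2-normalize each row in one vectorized pass; zero rows stay zero."""
    norms = np.linalg.norm(embeddings, axis=1, keepdims=True)
    return embeddings / np.where(norms == 0.0, 1.0, norms)


def mmr_select(query, candidates, lambda_param=0.5, min_score=-2.0):
    """Hypothetical greedy MMR: return candidate indices in selection order."""
    if len(candidates) == 0:
        return []
    cand = normalize_rows(np.asarray(candidates, dtype=float))
    q = np.asarray(query, dtype=float)
    q_norm = np.linalg.norm(q)
    q = q / q_norm if q_norm > 0 else q

    # Cosine relevance of every candidate to the query: one matrix-vector product.
    relevance = cand @ q
    # Highest similarity of each candidate to anything selected so far.
    max_sim = np.full(len(cand), -np.inf)
    available = np.ones(len(cand), dtype=bool)
    selected = []

    while available.any():
        # Before the first pick there is nothing to be redundant with.
        penalty = np.where(np.isfinite(max_sim), max_sim, 0.0)
        scores = lambda_param * relevance - (1.0 - lambda_param) * penalty
        scores[~available] = -np.inf
        best = int(np.argmax(scores))
        if scores[best] < min_score:
            break  # early termination: no remaining candidate is good enough
        selected.append(best)
        available[best] = False
        # O(n) update against the newly selected item only.
        max_sim = np.maximum(max_sim, cand @ cand[best])
    return selected
```

With `lambda_param=1.0` the sketch reduces to plain relevance ranking; lower values penalize candidates that are close to items already selected.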
..
cross_encoder	Bulk ingestion (#698)	2025-07-10 12:14:49 -04:00
driver	[Bug Fix] Fix the Group ID usage with FalkorDB (#733)	2025-07-17 12:35:08 -04:00
embedder	save edge update (#721)	2025-07-14 11:15:38 -04:00
llm_client	save edge update (#721)	2025-07-14 11:15:38 -04:00
models	save edge update (#721)	2025-07-14 11:15:38 -04:00
prompts	Bulk updates (#732)	2025-07-16 02:26:33 -04:00
search	Optimize MMR calculation with vectorized numpy operations	2025-07-18 11:54:15 -07:00
telemetry	feat: add telemetry with PostHog and update Docker configurations (#633)	2025-06-27 12:23:30 -07:00
utils	make egg_operations more robust (#737)	2025-07-16 17:12:20 -04:00
__init__.py	chore: Fix packaging (#38)	2024-08-25 10:07:50 -07:00
edges.py	Return embeddings option in get_by_uuids (#736)	2025-07-16 11:09:10 -04:00
errors.py	Add group ID validation and error handling (#618)	2025-06-24 09:33:54 -07:00
graph_queries.py	update driver (#583)	2025-06-13 14:12:09 -04:00
graphiti.py	[Bug Fix] Fix the Group ID usage with FalkorDB (#733)	2025-07-17 12:35:08 -04:00
graphiti_types.py	Add support for falkordb (#575)	2025-06-13 12:06:57 -04:00
helpers.py	[Bug Fix] Fix the Group ID usage with FalkorDB (#733)	2025-07-17 12:35:08 -04:00
nodes.py	Return embeddings option in get_by_uuids (#736)	2025-07-16 11:09:10 -04:00
py.typed	Add py.typed file (#105)	2024-09-11 08:44:06 -04:00