This commit implements a comprehensive optimization of the Maximal Marginal Relevance (MMR)
calculation in the search utilities. The key improvements include:
## Algorithm Improvements
- **True MMR Implementation**: Replaced the previous diversity-aware scoring with proper
iterative MMR algorithm that greedily selects documents one at a time
- **Vectorized Operations**: Leveraged numpy's optimized BLAS operations through matrix
multiplication instead of individual dot products
- **Adaptive Strategy**: Uses different optimization strategies for small (≤100) and large
datasets to balance performance and memory usage
## Performance Optimizations
- **Memory Efficiency**: Reduced memory complexity from O(n²) to O(n) for large datasets
- **BLAS Optimization**: Proper use of matrix multiplication leverages optimized BLAS libraries
- **Batch Normalization**: Added `normalize_embeddings_batch()` for efficient L2 normalization
of multiple embeddings at once
- **Early Termination**: Stops selection when no candidates meet minimum score threshold
## Key Changes
- `maximal_marginal_relevance()`: Complete rewrite with proper iterative MMR algorithm
- `normalize_embeddings_batch()`: New function for efficient batch normalization
- `_mmr_small_dataset()`: Optimized implementation for small datasets using precomputed
similarity matrices
- Added comprehensive test suite with 9 test cases covering edge cases, correctness,
and performance scenarios
## Benefits
- **Correctness**: Now implements true MMR algorithm instead of approximate diversity scoring
- **Memory Usage**: O(n) memory complexity vs O(n²) for the original implementation
- **Scalability**: Better performance characteristics for large datasets
- **Maintainability**: Cleaner, more readable code with comprehensive test coverage
🤖 Generated with [Claude Code](https://claude.ai/code)
Co-Authored-By: Claude <noreply@anthropic.com>
* fix: remove global DEFAULT_DATABASE usage in favor of driver-specific
config
Fixes bugs introduced in PR #607. This removes reliance on the global
DEFAULT_DATABASE environment variable. It specifies the database within
each driver. PR #607 introduced a Neo4j compatability, as the database
names are different when attempting to support FalkorDB.
This refactor improves compatability across database types and ensures
future reliance by isolating the configuraiton to the driver level.
* fix: make falkordb support optional
This ensures that the the optional dependency and subsequent import is compliant with the graphiti-core project dependencies.
* chore: fmt code
* chore: undo changes to uv.lock
* fix: undo potentially breaking changes to drive interface
* fix: ensure a default database of "None" is provided - falling back to internal default
* chore: ensure default value exists for session and delete_all_indexes
* chore: fix typos and grammar
* chore: update package versions and dependencies in uv.lock and bulk_utils.py
* docs: update database configuration instructions for Neo4j and FalkorDB
Clarified default database names and how to override them in driver constructors. Updated testing requirements to include specific commands for running integration and unit tests.
* fix: ensure params defaults to an empty dictionary in Neo4jDriver
Updated the execute_query method to initialize params as an empty dictionary if not provided, ensuring compatibility with the database configuration.
---------
Co-authored-by: Urmzd <urmzd@dal.ca>
* migrate to pyright
* Refactor type checking to use Pyright, update dependencies, and clean up code.
- Replaced MyPy with Pyright in configuration files and CI workflows.
- Updated `pyproject.toml` and `uv.lock` to reflect new dependencies and versions.
- Adjusted type hints and fixed minor code issues across various modules for better compatibility with Pyright.
- Added new packages `backoff` and `posthog` to the project dependencies.
* Update CI workflows to install all extra dependencies for type checking and unit tests
* Update dependencies in uv.lock to replace MyPy with Pyright and add nodeenv package. Adjust type hinting in config.py for compatibility with Pyright.