Commit graph

8 commits

Author SHA1 Message Date
Daniel Chalef
166c67492a Optimize MMR calculation with vectorized numpy operations
This commit implements a comprehensive optimization of the Maximal Marginal Relevance (MMR)
calculation in the search utilities. The key improvements include:

## Algorithm Improvements
- **True MMR Implementation**: Replaced the previous diversity-aware scoring with proper
  iterative MMR algorithm that greedily selects documents one at a time
- **Vectorized Operations**: Leveraged numpy's optimized BLAS operations through matrix
  multiplication instead of individual dot products
- **Adaptive Strategy**: Uses different optimization strategies for small (≤100) and large
  datasets to balance performance and memory usage

## Performance Optimizations
- **Memory Efficiency**: Reduced memory complexity from O(n²) to O(n) for large datasets
- **BLAS Optimization**: Proper use of matrix multiplication leverages optimized BLAS libraries
- **Batch Normalization**: Added `normalize_embeddings_batch()` for efficient L2 normalization
  of multiple embeddings at once
- **Early Termination**: Stops selection when no candidates meet minimum score threshold

## Key Changes
- `maximal_marginal_relevance()`: Complete rewrite with proper iterative MMR algorithm
- `normalize_embeddings_batch()`: New function for efficient batch normalization
- `_mmr_small_dataset()`: Optimized implementation for small datasets using precomputed
  similarity matrices
- Added comprehensive test suite with 9 test cases covering edge cases, correctness,
  and performance scenarios

## Benefits
- **Correctness**: Now implements true MMR algorithm instead of approximate diversity scoring
- **Memory Usage**: O(n) memory complexity vs O(n²) for the original implementation
- **Scalability**: Better performance characteristics for large datasets
- **Maintainability**: Cleaner, more readable code with comprehensive test coverage

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-07-18 11:54:15 -07:00
Daniel Chalef
0f6ac57dab
chore: update version to 0.9.3 and restructure dependencies (#338)
* Bump version from 0.9.0 to 0.9.1 in pyproject.toml and update google-genai dependency to >=0.1.0

* Bump version from 0.9.1 to 0.9.2 in pyproject.toml

* Update google-genai dependency version to >=0.8.0 in pyproject.toml

* loc file

* Update pyproject.toml to version 0.9.3, restructure dependencies, and modify author format. Remove outdated Google API key note from README.md.

* upgrade poetry and ruff
2025-04-08 20:47:38 -07:00
Preston Rasmussen
9efa6762d7
entity typo (#274) 2025-02-24 12:44:17 -05:00
Preston Rasmussen
088029a80c
node label filters (#265)
* node label filters

* update

* add search filters

* updates

* bump versions

* update tests

* test update
2025-02-21 12:38:01 -05:00
Preston Rasmussen
d7c20c1f59
Search refactor + Community search (#111)
* WIP

* WIP

* WIP

* community search

* WIP

* WIP

* integration tested

* tests

* tests

* mypy

* mypy

* format
2024-09-16 14:03:05 -04:00
Preston Rasmussen
42fb590606
Add group ids (#89)
* set and retrieve group ids

* update add episode with group id support

* add episode and search functional

* update bulk

* mypy updates

* remove unused imports

* update unit tests

* unit tests

* add optional uuid field

* format

* mypy

* ellipsis
2024-09-06 12:33:42 -04:00
Preston Rasmussen
06d8d9359f
Add Missing Node and edge CRUD (#51)
* add CRUD operations and fix search limit bugs

* format

* update tests

* å

* update tests to double limit call

* add default field

* format

* import correct field
2024-08-27 16:18:01 -04:00
Daniel Chalef
2d0705fc1b
Add get_nodes_by_query method to Graphiti class (#49)
* Add get_nodes_by_query method to Graphiti class

Add a method to the Graphiti class that wraps `get_relevant_nodes` and returns a list of nodes given a query.

* Add `get_nodes_by_query` method to the `Graphiti` class in `graphiti_core/graphiti.py`.
* Import `generate_embedding` from `graphiti_core/llm_client/utils.py`.
* Use `generate_embedding` to generate an embedding for the query.
* Call `get_relevant_nodes` with the generated embedding and return the relevant nodes.

Add an embedding function to `llm_client/utils.py`.

* Add `generate_embedding` function to `graphiti_core/llm_client/utils.py`.
* Accept an embedder and model_id as parameters.
* Generate an embedding for the given text and return it.

---

For more details, open the [Copilot Workspace session](https://copilot-workspace.githubnext.com/getzep/graphiti?shareId=XXXX-XXXX-XXXX-XXXX).

* address comments left by @danielchalef on #49 (Add get_nodes_by_query method to Graphiti class);

* fix ellipsis name in cla config

* feat: Add get_nodes_by_query method to Graphiti class

* chore: Cleanup unused files, add hybrid node search, add tests

---------

Co-authored-by: ellipsis-dev[bot] <65095814+ellipsis-dev[bot]@users.noreply.github.com>
Co-authored-by: paulpaliychuk <pavlo.paliychuk.ca@gmail.com>
2024-08-26 20:00:28 -07:00