enhancement: Optimizing embedding calls in brute_force_search (#1101)

@Vasilije1990

- Use query_vector instead of query_text in brute_force_search


## Description

[Here](ef1aecd835/cognee/modules/retrieval/utils/brute_force_triplet_search.py (L163))
brute_force_search uses the vector engine to run the same search, with
the same query text, across multiple collections. The query is therefore
re-embedded for every collection, so the number of embedding calls grows
with the number of collections searched even though one call would do.

Since the
[search](ef1aecd835/cognee/infrastructure/databases/vector/vector_db_interface.py (L85))
interface already accepts precomputed query vectors, this PR changes
brute_force_search to embed the query once and pass the resulting
vector to every collection search.
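
As a rough illustration of the difference, here is a minimal sketch that counts embedding calls under both strategies. `StubEmbeddingEngine`, `StubVectorEngine`, and the helper functions are hypothetical stand-ins written for this example, not cognee classes; only the `query_text` / `query_vector` keyword names and the `embed_text([query])` call shape are taken from the change itself.

```python
import asyncio


class StubEmbeddingEngine:
    """Hypothetical stand-in that counts how many embedding calls are made."""

    def __init__(self):
        self.calls = 0

    async def embed_text(self, texts):
        self.calls += 1
        # Toy embedding: one float per text, not a real model.
        return [[float(len(t))] for t in texts]


class StubVectorEngine:
    """Hypothetical engine whose search accepts query_text or query_vector."""

    def __init__(self):
        self.embedding_engine = StubEmbeddingEngine()

    async def search(self, collection_name, query_text=None, query_vector=None, limit=5):
        if query_vector is None:
            # Without a precomputed vector, every search embeds the text again.
            query_vector = (await self.embedding_engine.embed_text([query_text]))[0]
        return [(collection_name, query_vector)]


async def search_by_text(engine, collections, query):
    # One embedding call per collection.
    return await asyncio.gather(
        *(engine.search(name, query_text=query) for name in collections)
    )


async def search_by_vector(engine, collections, query):
    # Embed once, then reuse the vector for every collection.
    query_vector = (await engine.embedding_engine.embed_text([query]))[0]
    return await asyncio.gather(
        *(engine.search(name, query_vector=query_vector) for name in collections)
    )


def count_embedding_calls(strategy, collections, query):
    engine = StubVectorEngine()
    asyncio.run(strategy(engine, collections, query))
    return engine.embedding_engine.calls
```

With four placeholder collection names, `count_embedding_calls(search_by_text, ...)` makes four embedding calls in this stub, while `count_embedding_calls(search_by_vector, ...)` makes one.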

If this is considered good practice, it might be worth adding a
direct query_vector argument to
[map_vector_distances_to_graph_edges](ef1aecd835/cognee/modules/graph/cognee_graph/CogneeGraph.py (L135))
and using it both
[here](ef1aecd835/cognee/modules/retrieval/utils/brute_force_triplet_search.py (L179))
and in any future uses of map_vector_distances_to_graph_edges.
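
One possible shape for such an argument, sketched under assumptions: `resolve_query_vector` and its `embed` parameter are hypothetical names introduced here, and this is not the actual signature of map_vector_distances_to_graph_edges. The idea is only that a caller-supplied vector takes precedence and the text is embedded as a fallback.

```python
import asyncio
from typing import Awaitable, Callable, List, Optional

Vector = List[float]


async def resolve_query_vector(
    embed: Callable[[List[str]], Awaitable[List[Vector]]],
    query_text: Optional[str] = None,
    query_vector: Optional[Vector] = None,
) -> Vector:
    """Return the caller's vector if given, otherwise embed query_text once."""
    if query_vector is not None:
        # A precomputed vector short-circuits the embedding call entirely.
        return query_vector
    if query_text is None:
        raise ValueError("Either query_text or query_vector must be provided")
    return (await embed([query_text]))[0]
```

A function that accepts both arguments this way stays backward compatible with text-only callers while letting brute_force_search pass along the vector it has already computed.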

## DCO Affirmation
I affirm that all code in every commit of this pull request conforms to
the terms of the Topoteretes Developer Certificate of Origin.

---------

Co-authored-by: Pedro Henrique Thompson Furtado <pedrothompson@petrobras.com.br>
Co-authored-by: Vasilije <8619304+Vasilije1990@users.noreply.github.com>
Co-authored-by: Daulet Amirkhanov <damirkhanov01@gmail.com>
Co-authored-by: hajdul88 <52442977+hajdul88@users.noreply.github.com>
Pedro Thompson 2025-07-22 08:50:25 -03:00 committed by GitHub
parent dad7da2e7b
commit 115585ee9c
2 changed files with 11 additions and 1 deletions


@@ -12,6 +12,7 @@ This guide will help you get started and ensure your contributions can be effici
- [Code of Conduct](CODE_OF_CONDUCT.md)
- [Discord Community](https://discord.gg/bcy8xFAtfd)
- [Issue Tracker](https://github.com/topoteretes/cognee/issues)
- [Cognee Docs](https://docs.cognee.ai)
## 1. 🚀 Ways to Contribute
@@ -69,6 +70,13 @@ Looking for a place to start? Try filtering for [good first issues](https://gith
git clone https://github.com/<your-github-username>/cognee.git
cd cognee
```
If you are working on vector and graph adapters:
1. Fork the [**cognee-community**](https://github.com/topoteretes/cognee-community) repository
2. Clone your fork:
```shell
git clone https://github.com/<your-github-username>/cognee-community.git
cd cognee-community
```
### Create a Branch


@@ -155,12 +155,14 @@ async def brute_force_search(
         logger.error("Failed to initialize vector engine: %s", e)
         raise RuntimeError("Initialization error") from e
+    query_vector = (await vector_engine.embedding_engine.embed_text([query]))[0]
     send_telemetry("cognee.brute_force_triplet_search EXECUTION STARTED", user.id)
     async def search_in_collection(collection_name: str):
         try:
             return await vector_engine.search(
-                collection_name=collection_name, query_text=query, limit=0
+                collection_name=collection_name, query_vector=query_vector, limit=0
             )
         except CollectionNotFoundError:
             return []