enhancement: Optimizing embedding calls in brute_force_search (#1101)
@Vasilije1990 - Use query_vector instead of query_text in brute_force_search

## Description

[Here](ef1aecd835/cognee/modules/retrieval/utils/brute_force_triplet_search.py (L163)) brute_force_search uses the vector engine to run the same search, with the same query text, across multiple collections. The query is embedded again for each collection, so the number of embedding calls grows with the number of collections searched even though the query text never changes.

Since the [search](ef1aecd835/cognee/infrastructure/databases/vector/vector_db_interface.py (L85)) interface is already designed to accept precomputed query vectors, I’m submitting an optimization to brute_force_search that embeds the query once and passes the resulting query_vector to every collection search.

If this is considered good practice, it might be worth adding a direct query_vector argument to [map_vector_distances_to_graph_edges](ef1aecd835/cognee/modules/graph/cognee_graph/CogneeGraph.py (L135)), and using it both [here](ef1aecd835/cognee/modules/retrieval/utils/brute_force_triplet_search.py (L179)) and in any future uses of map_vector_distances_to_graph_edges.

## DCO Affirmation

I affirm that all code in every commit of this pull request conforms to the terms of the Topoteretes Developer Certificate of Origin.

---------

Co-authored-by: Pedro Henrique Thompson Furtado <pedrothompson@petrobras.com.br>
Co-authored-by: Vasilije <8619304+Vasilije1990@users.noreply.github.com>
Co-authored-by: Daulet Amirkhanov <damirkhanov01@gmail.com>
Co-authored-by: hajdul88 <52442977+hajdul88@users.noreply.github.com>
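The effect described above can be illustrated with a small, self-contained sketch. It does not use cognee itself: `StubEmbeddingEngine`, `StubVectorEngine`, the helper functions, and the collection names are all hypothetical stand-ins, assumed only to mirror the shape of a search call that accepts either `query_text` or `query_vector`.

```python
import asyncio


class StubEmbeddingEngine:
    """Hypothetical stand-in for an embedding engine; counts its calls."""

    def __init__(self):
        self.calls = 0

    async def embed_text(self, texts):
        self.calls += 1
        # Toy embedding: one 1-dimensional vector per input text.
        return [[float(len(text))] for text in texts]


class StubVectorEngine:
    """Hypothetical vector engine whose search accepts text or a vector."""

    def __init__(self):
        self.embedding_engine = StubEmbeddingEngine()

    async def search(self, collection_name, query_text=None, query_vector=None):
        if query_vector is None:
            # A text query has to be embedded on every search call.
            query_vector = (await self.embedding_engine.embed_text([query_text]))[0]
        return [(collection_name, query_vector)]


async def search_by_text(engine, collections, query):
    # Before: each collection search embeds the same query text again.
    return await asyncio.gather(
        *[engine.search(name, query_text=query) for name in collections]
    )


async def search_by_vector(engine, collections, query):
    # After: embed once, then reuse the vector for every collection.
    query_vector = (await engine.embedding_engine.embed_text([query]))[0]
    return await asyncio.gather(
        *[engine.search(name, query_vector=query_vector) for name in collections]
    )


def count_embedding_calls(search_fn, collections, query):
    """Run one search strategy on a fresh stub engine and report embed calls."""
    engine = StubVectorEngine()
    asyncio.run(search_fn(engine, collections, query))
    return engine.embedding_engine.calls
```

With three hypothetical collections, `count_embedding_calls(search_by_text, ...)` reports three embedding calls, while `count_embedding_calls(search_by_vector, ...)` reports one, regardless of how many collections are searched.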
parent dad7da2e7b
commit 115585ee9c
2 changed files with 11 additions and 1 deletion
@@ -12,6 +12,7 @@ This guide will help you get started and ensure your contributions can be effici
 - [Code of Conduct](CODE_OF_CONDUCT.md)
 - [Discord Community](https://discord.gg/bcy8xFAtfd)
 - [Issue Tracker](https://github.com/topoteretes/cognee/issues)
+- [Cognee Docs](https://docs.cognee.ai)
 
 ## 1. 🚀 Ways to Contribute
 
@@ -69,6 +70,13 @@ Looking for a place to start? Try filtering for [good first issues](https://gith
 git clone https://github.com/<your-github-username>/cognee.git
 cd cognee
 ```
+In case you are working on Vector and Graph Adapters
+1. Fork the [**cognee**](https://github.com/topoteretes/cognee-community) repository
+2. Clone your fork:
+```shell
+git clone https://github.com/<your-github-username>/cognee-community.git
+cd cognee-community
+```
 
 ### Create a Branch
 
@@ -155,12 +155,14 @@ async def brute_force_search(
         logger.error("Failed to initialize vector engine: %s", e)
         raise RuntimeError("Initialization error") from e
 
+    query_vector = (await vector_engine.embedding_engine.embed_text([query]))[0]
+
     send_telemetry("cognee.brute_force_triplet_search EXECUTION STARTED", user.id)
 
     async def search_in_collection(collection_name: str):
         try:
             return await vector_engine.search(
-                collection_name=collection_name, query_text=query, limit=0
+                collection_name=collection_name, query_vector=query_vector, limit=0
             )
         except CollectionNotFoundError:
             return []
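The description also floats a direct query_vector argument for map_vector_distances_to_graph_edges. One way such an argument could be handled is sketched below, purely as an assumption about the design: `resolve_query_vector` and `toy_embed` are hypothetical names, not part of cognee, and the real method may resolve its inputs differently.

```python
import asyncio
from typing import Awaitable, Callable, List, Optional

Vector = List[float]
EmbedFn = Callable[[List[str]], Awaitable[List[Vector]]]


async def resolve_query_vector(
    embed_text: EmbedFn,
    query_text: Optional[str] = None,
    query_vector: Optional[Vector] = None,
) -> Vector:
    """Return the caller's vector if given, otherwise embed the text once."""
    if query_vector is not None:
        # A precomputed vector takes precedence, so no embedding call is made.
        return query_vector
    if query_text is None:
        raise ValueError("Provide either query_text or query_vector")
    return (await embed_text([query_text]))[0]


async def toy_embed(texts: List[str]) -> List[Vector]:
    # Toy embedding used only for this illustration.
    return [[float(len(text))] for text in texts]
```

A caller that already holds the vector computed in brute_force_search could pass it straight through, while existing callers that only have the query text would keep working unchanged.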