enhancement: Optimizing embedding calls in brute_force_search (#1101)
@Vasilije1990 - Use query_vector instead of query_text in brute_force_search

## Description

[Here](ef1aecd835/cognee/modules/retrieval/utils/brute_force_triplet_search.py (L163)), brute_force_search uses the vector engine to run the same search, with the same query text, across multiple collections. As a result, the number of embedding calls grows with the number of collections searched, even though every call embeds the same text.

Since the [search](ef1aecd835/cognee/infrastructure/databases/vector/vector_db_interface.py (L85)) interface is already designed to accept precomputed query vectors, I’m submitting an optimization to brute_force_search that embeds the query once and passes the resulting vector to each collection search.

If this is considered good practice, it might be worth adding a direct query_vector argument to [map_vector_distances_to_graph_edges](ef1aecd835/cognee/modules/graph/cognee_graph/CogneeGraph.py (L135)) and using it both [here](ef1aecd835/cognee/modules/retrieval/utils/brute_force_triplet_search.py (L179)) and in any future uses of map_vector_distances_to_graph_edges.

## DCO Affirmation

I affirm that all code in every commit of this pull request conforms to the terms of the Topoteretes Developer Certificate of Origin.

---------

Co-authored-by: Pedro Henrique Thompson Furtado <pedrothompson@petrobras.com.br>
Co-authored-by: Vasilije <8619304+Vasilije1990@users.noreply.github.com>
Co-authored-by: Daulet Amirkhanov <damirkhanov01@gmail.com>
Co-authored-by: hajdul88 <52442977+hajdul88@users.noreply.github.com>
parent d571b9f8bf
commit 16697d1668
1 changed file with 3 additions and 1 deletion
@@ -155,12 +155,14 @@ async def brute_force_search(
         logger.error("Failed to initialize vector engine: %s", e)
         raise RuntimeError("Initialization error") from e
 
+    query_vector = (await vector_engine.embedding_engine.embed_text([query]))[0]
+
     send_telemetry("cognee.brute_force_triplet_search EXECUTION STARTED", user.id)
 
     async def search_in_collection(collection_name: str):
         try:
             return await vector_engine.search(
-                collection_name=collection_name, query_text=query, limit=0
+                collection_name=collection_name, query_vector=query_vector, limit=0
             )
         except CollectionNotFoundError:
             return []
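
The effect of the change can be illustrated with a minimal, self-contained sketch. Everything here is an assumption made for illustration: `FakeEmbeddingEngine`, `FakeVectorEngine`, `search_by_text`, `search_by_vector`, and `count_embedding_calls` are hypothetical stand-ins, not cognee classes, and the toy `search` only mirrors the idea that a search call can take either `query_text` or a precomputed `query_vector`. The sketch counts embedding calls under both approaches.

```python
import asyncio
from typing import List, Optional


class FakeEmbeddingEngine:
    """Hypothetical stand-in that counts how often embed_text is called."""

    def __init__(self) -> None:
        self.calls = 0

    async def embed_text(self, texts: List[str]) -> List[List[float]]:
        self.calls += 1
        # Toy embedding: one 2-d vector per input text.
        return [[float(len(t)), 1.0] for t in texts]


class FakeVectorEngine:
    """Hypothetical engine whose search accepts either text or a vector."""

    def __init__(self) -> None:
        self.embedding_engine = FakeEmbeddingEngine()

    async def search(
        self,
        collection_name: str,
        query_text: Optional[str] = None,
        query_vector: Optional[List[float]] = None,
        limit: int = 0,
    ) -> List[str]:
        if query_vector is None:
            if query_text is None:
                raise ValueError("Provide query_text or query_vector")
            # A text query is embedded on every search call.
            query_vector = (await self.embedding_engine.embed_text([query_text]))[0]
        # Toy result: the collection name tagged with the first vector component.
        return [f"{collection_name}:{query_vector[0]:.0f}"]


async def search_by_text(engine: FakeVectorEngine, query: str, collections: List[str]):
    # Before the change: each collection search embeds the same text again.
    return await asyncio.gather(
        *(engine.search(collection_name=c, query_text=query) for c in collections)
    )


async def search_by_vector(engine: FakeVectorEngine, query: str, collections: List[str]):
    # After the change: embed once, then reuse the vector for every collection.
    query_vector = (await engine.embedding_engine.embed_text([query]))[0]
    return await asyncio.gather(
        *(engine.search(collection_name=c, query_vector=query_vector) for c in collections)
    )


def count_embedding_calls(search_fn, collections: List[str]):
    """Run one search strategy on a fresh engine; return (embedding calls, results)."""
    engine = FakeVectorEngine()
    results = asyncio.run(search_fn(engine, "hello", collections))
    return engine.embedding_engine.calls, results
```

With three collections, the text-based variant triggers three embedding calls while the vector-based variant triggers one, and both return the same results. The saving therefore scales with the number of collections searched.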