From 16697d16688e85125eabad70dd0e5474c7370e9e Mon Sep 17 00:00:00 2001
From: Pedro Thompson
Date: Tue, 22 Jul 2025 08:50:25 -0300
Subject: [PATCH] enhancement: Optimizing embedding calls in brute_force_search (#1101)
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

@Vasilije1990

- Use query_vector instead of query_text in brute_force_search

## Description
[Here](https://github.com/topoteretes/cognee/blob/ef1aecd835b1a2044eb724197bbef77f6dee5d3c/cognee/modules/retrieval/utils/brute_force_triplet_search.py#L163), brute_force_search uses the vector engine to run the same search, with the same query text, across multiple collections. Each search embeds the query again, so the number of embedding calls grows with the number of collections searched even though the text never changes.

Since the [search](https://github.com/topoteretes/cognee/blob/ef1aecd835b1a2044eb724197bbef77f6dee5d3c/cognee/infrastructure/databases/vector/vector_db_interface.py#L85) interface is already designed to accept precomputed query vectors, I’m submitting an optimization to brute_force_search that embeds the query once and passes the resulting query_vector to every collection search.

If this is considered good practice, it might be worth adding a direct query_vector argument to [map_vector_distances_to_graph_edges](https://github.com/topoteretes/cognee/blob/ef1aecd835b1a2044eb724197bbef77f6dee5d3c/cognee/modules/graph/cognee_graph/CogneeGraph.py#L135) and using it both [here](https://github.com/topoteretes/cognee/blob/ef1aecd835b1a2044eb724197bbef77f6dee5d3c/cognee/modules/retrieval/utils/brute_force_triplet_search.py#L179) and in any future uses of map_vector_distances_to_graph_edges.

## DCO Affirmation
I affirm that all code in every commit of this pull request conforms to the terms of the Topoteretes Developer Certificate of Origin.

---------

Co-authored-by: Pedro Henrique Thompson Furtado
Co-authored-by: Vasilije <8619304+Vasilije1990@users.noreply.github.com>
Co-authored-by: Daulet Amirkhanov
Co-authored-by: hajdul88 <52442977+hajdul88@users.noreply.github.com>
---
 cognee/modules/retrieval/utils/brute_force_triplet_search.py | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/cognee/modules/retrieval/utils/brute_force_triplet_search.py b/cognee/modules/retrieval/utils/brute_force_triplet_search.py
index 381520737..4667f4738 100644
--- a/cognee/modules/retrieval/utils/brute_force_triplet_search.py
+++ b/cognee/modules/retrieval/utils/brute_force_triplet_search.py
@@ -155,12 +155,14 @@ async def brute_force_search(
         logger.error("Failed to initialize vector engine: %s", e)
         raise RuntimeError("Initialization error") from e
 
+    query_vector = (await vector_engine.embedding_engine.embed_text([query]))[0]
+
     send_telemetry("cognee.brute_force_triplet_search EXECUTION STARTED", user.id)
 
     async def search_in_collection(collection_name: str):
         try:
             return await vector_engine.search(
-                collection_name=collection_name, query_text=query, limit=0
+                collection_name=collection_name, query_vector=query_vector, limit=0
             )
         except CollectionNotFoundError:
             return []
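
The effect of the change can be illustrated with a small self-contained sketch. It is not cognee code: `CountingEmbedder`, `ToyVectorEngine`, and the helper functions are hypothetical stand-ins that only mimic the shape of the calls in the diff (an `embed_text` method that takes a list of texts, and a `search` that accepts either `query_text` or `query_vector`). Under those assumptions, it counts how many embedding calls each strategy makes.

```python
import asyncio


class CountingEmbedder:
    """Hypothetical stand-in for an embedding engine; counts embed calls."""

    def __init__(self):
        self.calls = 0

    async def embed_text(self, texts):
        self.calls += 1
        # Toy embedding: one number per text, just enough to have a vector.
        return [[float(len(text))] for text in texts]


class ToyVectorEngine:
    """Hypothetical engine whose search accepts query_text or query_vector."""

    def __init__(self, embedder):
        self.embedding_engine = embedder

    async def search(self, collection_name, query_text=None, query_vector=None):
        if query_vector is None:
            # Text-only callers pay for one embedding call per search.
            query_vector = (await self.embedding_engine.embed_text([query_text]))[0]
        return [(collection_name, query_vector)]


async def search_per_text(engine, collections, query):
    # Before: every collection search re-embeds the same query text.
    return await asyncio.gather(
        *(engine.search(name, query_text=query) for name in collections)
    )


async def search_with_shared_vector(engine, collections, query):
    # After: embed once, then reuse the vector for every collection.
    query_vector = (await engine.embedding_engine.embed_text([query]))[0]
    return await asyncio.gather(
        *(engine.search(name, query_vector=query_vector) for name in collections)
    )


def count_embedding_calls(strategy, collections, query):
    # Run one strategy against a fresh engine and report the embed-call count.
    embedder = CountingEmbedder()
    asyncio.run(strategy(ToyVectorEngine(embedder), collections, query))
    return embedder.calls
```

With four collections, `count_embedding_calls(search_per_text, ...)` reports four embedding calls in this toy setup, while `count_embedding_calls(search_with_shared_vector, ...)` reports one regardless of how many collections are searched.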