From 115585ee9c5868dfca03773b3322edb4fa222f21 Mon Sep 17 00:00:00 2001
From: Pedro Thompson
Date: Tue, 22 Jul 2025 08:50:25 -0300
Subject: [PATCH] enhancement: Optimizing embedding calls in brute_force_search (#1101)
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

@Vasilije1990 - Use query_vector instead of query_text in brute_force_search

## Description
[Here](https://github.com/topoteretes/cognee/blob/ef1aecd835b1a2044eb724197bbef77f6dee5d3c/cognee/modules/retrieval/utils/brute_force_triplet_search.py#L163) brute_force_search uses the vector engine to perform the same search, with the same query text, across multiple collections, making the embedding calls unnecessarily proportional to the number of collections being searched. Since the [search](https://github.com/topoteretes/cognee/blob/ef1aecd835b1a2044eb724197bbef77f6dee5d3c/cognee/infrastructure/databases/vector/vector_db_interface.py#L85) interface is already designed to accept precomputed query vectors, I’m submitting an optimization to brute_force_search to take advantage of this.

If this is considered good practice, it might be worth implementing a direct query_vector argument in [map_vector_distances_to_graph_edges](https://github.com/topoteretes/cognee/blob/ef1aecd835b1a2044eb724197bbef77f6dee5d3c/cognee/modules/graph/cognee_graph/CogneeGraph.py#L135), and using it both [here](https://github.com/topoteretes/cognee/blob/ef1aecd835b1a2044eb724197bbef77f6dee5d3c/cognee/modules/retrieval/utils/brute_force_triplet_search.py#L179) and in any future uses of map_vector_distances_to_graph_edges.

## DCO Affirmation
I affirm that all code in every commit of this pull request conforms to the terms of the Topoteretes Developer Certificate of Origin.

---------

Co-authored-by: Pedro Henrique Thompson Furtado
Co-authored-by: Vasilije <8619304+Vasilije1990@users.noreply.github.com>
Co-authored-by: Daulet Amirkhanov
Co-authored-by: hajdul88 <52442977+hajdul88@users.noreply.github.com>
---
 CONTRIBUTING.md | 8 ++++++++
 .../modules/retrieval/utils/brute_force_triplet_search.py | 4 +++-
 2 files changed, 11 insertions(+), 1 deletion(-)

diff --git a/CONTRIBUTING.md b/CONTRIBUTING.md
index 9028e2dd1..6d8071b56 100644
--- a/CONTRIBUTING.md
+++ b/CONTRIBUTING.md
@@ -12,6 +12,7 @@ This guide will help you get started and ensure your contributions can be effici
 - [Code of Conduct](CODE_OF_CONDUCT.md)
 - [Discord Community](https://discord.gg/bcy8xFAtfd)
 - [Issue Tracker](https://github.com/topoteretes/cognee/issues)
+- [Cognee Docs](https://docs.cognee.ai)
 
 ## 1. 🚀 Ways to Contribute
 
@@ -69,6 +70,13 @@ Looking for a place to start? Try filtering for [good first issues](https://gith
 git clone https://github.com//cognee.git
 cd cognee
 ```
+In case you are working on Vector and Graph Adapters
+1. Fork the [**cognee**](https://github.com/topoteretes/cognee-community) repository
+2. Clone your fork:
+```shell
+git clone https://github.com//cognee-community.git
+cd cognee-community
+```
 
 ### Create a Branch
 
diff --git a/cognee/modules/retrieval/utils/brute_force_triplet_search.py b/cognee/modules/retrieval/utils/brute_force_triplet_search.py
index 381520737..4667f4738 100644
--- a/cognee/modules/retrieval/utils/brute_force_triplet_search.py
+++ b/cognee/modules/retrieval/utils/brute_force_triplet_search.py
@@ -155,12 +155,14 @@ async def brute_force_search(
         logger.error("Failed to initialize vector engine: %s", e)
         raise RuntimeError("Initialization error") from e
 
+    query_vector = (await vector_engine.embedding_engine.embed_text([query]))[0]
+
     send_telemetry("cognee.brute_force_triplet_search EXECUTION STARTED", user.id)
 
     async def search_in_collection(collection_name: str):
         try:
             return await vector_engine.search(
-                collection_name=collection_name, query_text=query, limit=0
+                collection_name=collection_name, query_vector=query_vector, limit=0
             )
         except CollectionNotFoundError:
             return []
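
The effect of the last hunk can be illustrated outside cognee with a small standalone sketch. It is not the project's code: `CountingEmbeddingEngine`, `ToyVectorEngine`, `search_with_text`, `search_with_vector` and `count_embedding_calls` are hypothetical stand-ins invented for this illustration. The only assumption carried over from the patch is the shape of the calls it shows: an `embed_text` method that takes a list of strings, and a `search` method that accepts either `query_text` or a precomputed `query_vector`.

```python
import asyncio


class CountingEmbeddingEngine:
    """Hypothetical stand-in for an embedding engine; counts embed calls."""

    def __init__(self) -> None:
        self.calls = 0

    async def embed_text(self, texts: list[str]) -> list[list[float]]:
        self.calls += 1
        # Toy embedding: a one-dimensional vector holding the text length.
        return [[float(len(text))] for text in texts]


class ToyVectorEngine:
    """Hypothetical engine whose search takes text or a precomputed vector."""

    def __init__(self) -> None:
        self.embedding_engine = CountingEmbeddingEngine()

    async def search(self, collection_name, query_text=None, query_vector=None, limit=0):
        if query_vector is None:
            if query_text is None:
                raise ValueError("Provide query_text or query_vector")
            # Embedding happens inside search only when no vector was passed.
            query_vector = (await self.embedding_engine.embed_text([query_text]))[0]
        # A real engine would run a similarity search; the toy echoes its input.
        return [(collection_name, query_vector)]


async def search_with_text(engine, collections, query):
    # Before the patch: every collection search re-embeds the same query text.
    return await asyncio.gather(
        *[engine.search(collection_name=name, query_text=query) for name in collections]
    )


async def search_with_vector(engine, collections, query):
    # After the patch: embed once, then reuse the vector for every collection.
    query_vector = (await engine.embedding_engine.embed_text([query]))[0]
    return await asyncio.gather(
        *[engine.search(collection_name=name, query_vector=query_vector) for name in collections]
    )


def count_embedding_calls(search_fn, collections, query) -> int:
    """Run one search strategy on a fresh toy engine and return its embed count."""
    engine = ToyVectorEngine()
    asyncio.run(search_fn(engine, collections, query))
    return engine.embedding_engine.calls


if __name__ == "__main__":
    names = ["collection_a", "collection_b", "collection_c"]
    print(count_embedding_calls(search_with_text, names, "example query"))
    print(count_embedding_calls(search_with_vector, names, "example query"))
```

With three collections, the text-based variant makes three embedding calls and the vector-based variant makes one, which is the saving the description refers to.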