* Revert "fix: Add metadata reflection fix to sqlite as well"
This reverts commit 394a0b2dfb.
* COG-810 Implement a top-down dependency graph builder tool (#268)
* feat: parse repo to call graph
* Update/repo_processor/top_down_repo_parse.py task
* fix: minor improvements
* feat: file parsing jedi script optimisation
---------
* Add type to DataPoint metadata (#364)
* Add missing index_fields
* Use DataPoint UUID type in pgvector create_data_points
* Make _metadata mandatory everywhere
* feat: Add search by dataset for cognee
Added ability to search by datasets for cognee users
Feature COG-912
* feat: outsources chunking parameters to extract chunk from documents … (#289)
* feat: outsources chunking parameters to extract chunk from documents task
* fix: Remove backend lock from UI
Removed lock that prevented using multiple datasets in cognify
Fix COG-912
* COG 870 Remove duplicate edges from the code graph (#293)
* feat: turn summarize_code into generator
* feat: extract run_code_graph_pipeline, update the pipeline
* feat: minimal code graph example
* refactor: update argument
* refactor: move run_code_graph_pipeline to cognify/code_graph_pipeline
* refactor: indentation and whitespace nits
* refactor: add deprecated use comments and warnings
---------
Co-authored-by: Vasilije <8619304+Vasilije1990@users.noreply.github.com>
Co-authored-by: Igor Ilic <30923996+dexters1@users.noreply.github.com>
Co-authored-by: Boris <boris@topoteretes.com>
* test: Added test for getting of documents for search
Added test to verify getting documents related to datasets intended for search
Test COG-912
* Structured code summarization (#375)
* feat: turn summarize_code into generator
* feat: extract run_code_graph_pipeline, update the pipeline
* feat: minimal code graph example
* refactor: update argument
* refactor: move run_code_graph_pipeline to cognify/code_graph_pipeline
* refactor: indentation and whitespace nits
* refactor: add deprecated use comments and warnings
* Structured code summarization
* add missing prompt file
* Remove summarization_model argument from summarize_code and fix typehinting
* minor refactors
---------
Co-authored-by: lxobr <122801072+lxobr@users.noreply.github.com>
Co-authored-by: Vasilije <8619304+Vasilije1990@users.noreply.github.com>
Co-authored-by: Igor Ilic <30923996+dexters1@users.noreply.github.com>
Co-authored-by: Boris <boris@topoteretes.com>
* fix: Resolve issue with cognify router graph model default value
Resolve issue with default value for graph model in cognify endpoint
Fix
* chore: Resolve typo in getting documents code
Resolve typo in code
chore COG-912
* Update .github/workflows/dockerhub.yml
Co-authored-by: coderabbitai[bot] <136622811+coderabbitai[bot]@users.noreply.github.com>
* Update .github/workflows/dockerhub.yml
* Update .github/workflows/dockerhub.yml
* Update .github/workflows/dockerhub.yml
* Update get_cognify_router.py
* fix: Resolve syntax issue with cognify router
Resolve syntax issue with cognify router
Fix
* feat: Add ruff pre-commit hook for linting and formatting
Added formatting and linting on pre-commit hook
Feature COG-650
* chore: Update ruff lint options in pyproject file
Update ruff lint options in pyproject file
Chore
* test: Add ruff linter github action
Added linting check with ruff in github actions
Test COG-650
* feat: deletes executor limit from get_repo_file_dependencies
* feat: implements mock feature in LiteLLM engine
* refactor: Remove changes to cognify router
Remove changes to cognify router
Refactor COG-650
* fix: fixing boolean env for github actions
* test: Add test for ruff format for cognee code
Test if code is formatted for cognee
Test COG-650
* refactor: Rename ruff gh actions
Rename ruff gh actions to be more understandable
Refactor COG-650
* chore: Remove checking of ruff lint and format on push
Remove checking of ruff lint and format on push
Chore COG-650
* feat: Add deletion of local files when deleting data
Delete local files when deleting data from cognee
Feature COG-475
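The local-file cleanup described by COG-475 can be sketched as follows. This is a minimal illustration, not cognee's actual deletion code: `delete_data_entry` and the `created_by_cognee` flag are hypothetical names standing in for the real ownership check.

```python
import os
import tempfile


def delete_data_entry(file_path: str, created_by_cognee: bool) -> bool:
    """Delete a data entry's backing file, but only if cognee created it.

    Returns True when the local file was removed. Hypothetical helper:
    cognee's real logic lives in its data deletion endpoint.
    """
    if created_by_cognee and os.path.exists(file_path):
        os.remove(file_path)
        return True
    return False


if __name__ == "__main__":
    # A file cognee created is removed; a user-supplied file is left alone.
    owned = tempfile.NamedTemporaryFile(delete=False)
    owned.close()
    external = tempfile.NamedTemporaryFile(delete=False)
    external.close()

    assert delete_data_entry(owned.name, created_by_cognee=True)
    assert not os.path.exists(owned.name)
    assert not delete_data_entry(external.name, created_by_cognee=False)
    assert os.path.exists(external.name)
    os.remove(external.name)  # clean up the demo file
```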
* fix: changes back the max workers to 12
* feat: Adds mock summary for codegraph pipeline
* refactor: Add current development status
Save current development status
Refactor
* Fix langfuse
* Fix langfuse
* Fix langfuse
* Add evaluation notebook
* Rename eval notebook
* chore: Add temporary state of development
Add temp development state to branch
Chore
* fix: Add poetry.lock file, make langfuse mandatory
Added langfuse as mandatory dependency, added poetry.lock file
Fix
* Fix: fixes langfuse config settings
* feat: Add deletion of local files made by cognee through data endpoint
Delete local files made by cognee when deleting data from database through endpoint
Feature COG-475
* test: Revert changes on test_pgvector
Revert changes on test_pgvector which were made to test deletion of local files
Test COG-475
* chore: deletes the old test for the codegraph pipeline
* test: Add test to verify deletion of local files
Added test that checks local files created by cognee will be deleted and those not created by cognee won't
Test COG-475
* chore: deletes unused old version of the codegraph
* chore: deletes unused imports from code_graph_pipeline
* Ingest non-code files
* Fixing review findings
* Ingest non-code files (#395)
* Ingest non-code files
* Fixing review findings
* test: Update test regarding message
Update assertion message, add verifying of file existence
* Handle retryerrors in code summary (#396)
* Handle retryerrors in code summary
* Log instead of print
* fix: updates the acreate_structured_output
* chore: Add logging to sentry when file which should exist can't be found
Log to sentry that a file which should exist can't be found
Chore COG-475
* Fix diagram
* fix: refactor mcp
* Add Smithery CLI installation instructions and badge
* Move readme
* Update README.md
* Update README.md
* Cog 813 source code chunks (#383)
* fix: pass the list of all CodeFiles to enrichment task
* feat: introduce SourceCodeChunk, update metadata
* feat: get_source_code_chunks code graph pipeline task
* feat: integrate get_source_code_chunks task, comment out summarize_code
* Fix code summarization (#387)
* feat: update data models
* feat: naive parse long strings in source code
* fix: get_non_py_files instead of get_non_code_files
* fix: limit recursion, add comment
* handle embedding empty input error (#398)
* feat: robustly handle CodeFile source code
* refactor: sort imports
* todo: add support for other embedding models
* feat: add custom logger
* feat: add robustness to get_source_code_chunks
Co-authored-by: coderabbitai[bot] <136622811+coderabbitai[bot]@users.noreply.github.com>
* feat: improve embedding exceptions
* refactor: format indents, rename module
---------
Co-authored-by: alekszievr <44192193+alekszievr@users.noreply.github.com>
Co-authored-by: coderabbitai[bot] <136622811+coderabbitai[bot]@users.noreply.github.com>
* Fix diagram
* Fix diagram
* Fix instructions
* Fix instructions
* adding and fixing files
* Update README.md
* ruff format
* Fix linter issues
* Fix linter issues
* Fix linter issues
* Fix linter issues
* Fix linter issues
* Fix linter issues
* Fix linter issues
* Fix linter issues
* Fix linter issues
* Fix linter issues
* Implement PR review
* Comment out profiling
* Comment out profiling
* Comment out profiling
* fix: add allowed extensions
* fix: make UnstructuredDocument.read() adhere to Document
* feat: time code graph run and add mock support
* Fix ollama, work on visualization
* fix: Fixes faulty logging format and sets up error logging in dynamic steps example
* Overcome ContextWindowExceededError by checking token count while chunking (#413)
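The idea behind PR #413 is to check the token count while chunking so no chunk can exceed the model's context window. A minimal sketch of that pattern, where the whitespace-split `count_tokens` default is only a stand-in for a real tokenizer:

```python
def chunk_by_token_limit(text, max_tokens, count_tokens=lambda s: len(s.split())):
    """Greedily pack words into chunks whose token count stays under max_tokens.

    `count_tokens` is a placeholder for a real tokenizer (e.g. a model-specific
    one); the whitespace split here is only an approximation for illustration.
    """
    chunks, current = [], []
    for word in text.split():
        # Close the current chunk before it would exceed the limit.
        if current and count_tokens(" ".join(current + [word])) > max_tokens:
            chunks.append(" ".join(current))
            current = []
        current.append(word)
    if current:
        chunks.append(" ".join(current))
    return chunks
```

Checking the count before appending, rather than after an API error, is what avoids the `ContextWindowExceededError` at request time.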
* fix: Fixes duplicated edges in cognify by limiting the recursion depth in add datapoints
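Limiting recursion depth when adding datapoints prevents the same edge from being emitted twice. A self-contained sketch of the technique (the `collect_edges` helper and dict-based nodes are illustrative, not cognee's `add_data_points` implementation):

```python
def collect_edges(node, max_depth=2, _depth=0, _seen=None):
    """Walk a nested datapoint structure, collecting (parent, child) edges.

    A depth limit plus a `seen` set stops revisited nodes from emitting
    duplicate edges, even when the structure contains cycles.
    """
    if _seen is None:
        _seen = set()
    edges = []
    if _depth >= max_depth or id(node) in _seen:
        return edges
    _seen.add(id(node))
    for child in node.get("children", []):
        edges.append((node["name"], child["name"]))
        edges.extend(collect_edges(child, max_depth, _depth + 1, _seen))
    return edges
```

Without the `seen` check, a cycle such as a → b → a would recurse until the depth limit and append the same edges repeatedly.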
* Adjust AudioDocument and handle None token limit
* Handle azure models as well
* Fix visualization
* Fix visualization
* Fix visualization
* Add clean logging to code graph example
* Remove setting envvars from arg
* fix: fixes create_cognee_style_network_with_logo unit test
* fix: removes accidentally left-in print
* Fix visualization
* Fix visualization
* Fix visualization
* Get embedding engine instead of passing it. Get it from vector engine instead of direct getter.
* Fix visualization
* Fix visualization
* Fix poetry issues
* Get embedding engine instead of passing it in code chunking.
* Fix poetry issues
* Fix poetry issues
* Fix poetry issues
* Fix poetry issues
* Fix poetry issues
* Fix poetry issues
* Fix poetry issues
* Fix poetry issues
* chore: Update version of poetry install action
* chore: Update action to trigger on pull request for any branch
* chore: Remove if in github action to allow triggering on push
* chore: Remove if condition to allow gh actions to trigger on push to PR
* chore: Update poetry version in github actions
* chore: Set fixed ubuntu version to 22.04
* chore: Update py lint to use ubuntu 22.04
* chore: update ubuntu version to 22.04
* feat: implements the first version of graph based completion in search
* chore: Update python 3.9 gh action to use 3.12 instead
* chore: Update formatting of utils.py
* Fix poetry issues
* Fix poetry issues
* Fix poetry issues
* Fix poetry issues
* Fix poetry issues
* Fix poetry issues
* Fix poetry issues
* Adjust integration tests
* fix: Fixes ruff formatting
* Handle circular import
* fix: Resolve profiler issue with partial and recursive logger imports
Resolve issue for profiler with partial and recursive logger imports
* fix: Remove logger from __init__.py file
* test: Test profiling on HEAD branch
* test: Return profiler to base branch
* Set max_tokens in config
* Adjust SWE-bench script to code graph pipeline call
* Adjust SWE-bench script to code graph pipeline call
* fix: Add fix for accessing dictionary elements that don't exist
Use get for the text key instead of direct access to handle the case where the text key doesn't exist
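The difference the fix relies on, shown with a hypothetical `extract_text` helper (the name is illustrative, not the actual cognee function):

```python
def extract_text(element: dict) -> str:
    """Read the optional "text" key safely.

    element["text"] raises KeyError when the key is missing;
    element.get("text", "") returns a default instead.
    """
    return element.get("text", "")
```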
* feat: Add ability to change graph database configuration through cognee
* feat: adds pydantic types to graph layer models
* test: Test ubuntu 24.04
* test: change all actions to ubuntu-latest
* feat: adds basic retriever for swe bench
* Match Ruff version in config to the one in github actions
* feat: implements code retriever
* Fix: fixes unit test for codepart search
* Format with Ruff 0.9.0
* Fix: deleting incorrect repo path
* docs: Add LlamaIndex Cognee integration notebook
Added LlamaIndex Cognee integration notebook
* test: Add github action for testing llama index cognee integration notebook
* fix: resolve issue with langfuse dependency installation when integrating cognee in different packages
* version: Increase version to 0.1.21
* fix: update dependencies of the mcp server
* Update README.md
* Fix: Fixes logging setup
* feat: deletes on-the-fly embeddings, uses edge collections instead
* fix: Change nbformat on llama index integration notebook
* fix: Resolve api key issue with llama index integration notebook
* fix: Attempt to resolve issue with Ubuntu 24.04 segmentation fault
* version: Increase version to 0.1.22
---------
Co-authored-by: vasilije <vas.markovic@gmail.com>
Co-authored-by: Igor Ilic <igorilic03@gmail.com>
Co-authored-by: Igor Ilic <30923996+dexters1@users.noreply.github.com>
Co-authored-by: lxobr <122801072+lxobr@users.noreply.github.com>
Co-authored-by: alekszievr <44192193+alekszievr@users.noreply.github.com>
Co-authored-by: hajdul88 <52442977+hajdul88@users.noreply.github.com>
Co-authored-by: Vasilije <8619304+Vasilije1990@users.noreply.github.com>
Co-authored-by: coderabbitai[bot] <136622811+coderabbitai[bot]@users.noreply.github.com>
Co-authored-by: Rita Aleksziev <alekszievr@gmail.com>
Co-authored-by: Henry Mao <1828968+calclavia@users.noreply.github.com>
import asyncio
import logging
from typing import List

from cognee.infrastructure.databases.graph import get_graph_engine
from cognee.infrastructure.databases.vector import get_vector_engine
from cognee.modules.graph.cognee_graph.CogneeGraph import CogneeGraph
from cognee.modules.users.methods import get_default_user
from cognee.modules.users.models import User
from cognee.shared.utils import send_telemetry


def format_triplets(edges):
    def filter_attributes(obj, attributes):
        """Helper function to filter out non-None properties, including nested dicts.

        (Currently unused by the loop below, which filters the attribute
        dicts directly.)
        """
        result = {}
        for attr in attributes:
            value = getattr(obj, attr, None)
            if value is not None:
                # If the value is a dict, extract relevant keys from it
                if isinstance(value, dict):
                    nested_values = {
                        k: v for k, v in value.items() if k in attributes and v is not None
                    }
                    result[attr] = nested_values
                else:
                    result[attr] = value
        return result

    triplets = []
    for edge in edges:
        node1 = edge.node1
        node2 = edge.node2
        edge_attributes = edge.attributes
        node1_attributes = node1.attributes
        node2_attributes = node2.attributes

        # Filter only non-None properties
        node1_info = {key: value for key, value in node1_attributes.items() if value is not None}
        node2_info = {key: value for key, value in node2_attributes.items() if value is not None}
        edge_info = {key: value for key, value in edge_attributes.items() if value is not None}

        # Create the formatted triplet
        triplet = f"Node1: {node1_info}\nEdge: {edge_info}\nNode2: {node2_info}\n\n\n"
        triplets.append(triplet)

    return "".join(triplets)


async def brute_force_triplet_search(
    query: str, user: User = None, top_k=5, collections=None
) -> list:
    if user is None:
        user = await get_default_user()

    if user is None:
        raise PermissionError("No user found in the system. Please create a user.")

    retrieved_results = await brute_force_search(query, user, top_k, collections=collections)
    return retrieved_results


async def brute_force_search(
    query: str, user: User, top_k: int, collections: List[str] = None
) -> list:
    """
    Performs a brute force search to retrieve the top triplets from the graph.

    Args:
        query (str): The search query.
        user (User): The user performing the search.
        top_k (int): The number of top results to retrieve.
        collections (Optional[List[str]]): List of collections to query. Defaults to predefined collections.

    Returns:
        list: The top triplet results.
    """
    if not query or not isinstance(query, str):
        raise ValueError("The query must be a non-empty string.")
    if top_k <= 0:
        raise ValueError("top_k must be a positive integer.")

    if collections is None:
        collections = [
            "entity_name",
            "text_summary_text",
            "entity_type_name",
            "document_chunk_text",
        ]

    try:
        vector_engine = get_vector_engine()
        graph_engine = await get_graph_engine()
    except Exception as e:
        logging.error("Failed to initialize engines: %s", e)
        raise RuntimeError("Initialization error") from e

    send_telemetry("cognee.brute_force_triplet_search EXECUTION STARTED", user.id)

    try:
        results = await asyncio.gather(
            *[
                vector_engine.get_distance_from_collection_elements(collection, query_text=query)
                for collection in collections
            ]
        )

        node_distances = {collection: result for collection, result in zip(collections, results)}

        memory_fragment = CogneeGraph()

        await memory_fragment.project_graph_from_db(
            graph_engine,
            node_properties_to_project=["id", "description", "name", "type", "text"],
            edge_properties_to_project=["relationship_name"],
        )

        await memory_fragment.map_vector_distances_to_graph_nodes(node_distances=node_distances)

        await memory_fragment.map_vector_distances_to_graph_edges(vector_engine, query)

        results = await memory_fragment.calculate_top_triplet_importances(k=top_k)

        # Report successful completion (not a second "STARTED" event).
        send_telemetry("cognee.brute_force_triplet_search EXECUTION COMPLETED", user.id)

        return results

    except Exception as e:
        logging.error(
            "Error during brute force search for user: %s, query: %s. Error: %s", user.id, query, e
        )
        send_telemetry("cognee.brute_force_triplet_search EXECUTION FAILED", user.id)
        raise RuntimeError("An error occurred during brute force search") from e
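The formatting step above boils down to dropping None-valued attributes and rendering each edge as a text block. A self-contained sketch of that pattern, using `SimpleNamespace` to stand in for the graph's node and edge objects:

```python
from types import SimpleNamespace


def format_triplet(edge) -> str:
    """Render one edge as Node1/Edge/Node2 text, dropping None attributes."""

    def non_none(attrs):
        # Keep only attributes that actually carry a value.
        return {k: v for k, v in attrs.items() if v is not None}

    return (
        f"Node1: {non_none(edge.node1.attributes)}\n"
        f"Edge: {non_none(edge.attributes)}\n"
        f"Node2: {non_none(edge.node2.attributes)}\n"
    )
```

Filtering before formatting keeps the prompt text fed to the model free of `None` noise, which is the point of the non-None filtering in `format_triplets`.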