Version 0.1.22 (#438)

* Revert "fix: Add metadata reflection fix to sqlite as well"

This reverts commit 394a0b2dfb.

* COG-810 Implement a top-down dependency graph builder tool (#268)

* feat: parse repo to call graph

* Update repo_processor/top_down_repo_parse.py task

* fix: minor improvements

* feat: file parsing jedi script optimisation

---------

* Add type to DataPoint metadata (#364)

* Add missing index_fields

* Use DataPoint UUID type in pgvector create_data_points

* Make _metadata mandatory everywhere

* feat: Add search by dataset for cognee

Added ability to search by datasets for cognee users

Feature COG-912
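
A rough sketch of what dataset-scoped search can look like from the caller's side. This is illustrative only: the `datasets` keyword and the `SearchType` import path are assumptions, not the verified API.

```python
import asyncio

import cognee
from cognee.api.v1.search import SearchType  # import path assumed


async def main():
    # Hypothetical call: limit the search to named datasets instead of
    # everything the user has cognified (the behavior COG-912 describes).
    results = await cognee.search(
        SearchType.CHUNKS,  # assumed search type
        query_text="What do the documents say about graphs?",
        datasets=["my_dataset"],  # assumed keyword added by COG-912
    )
    print(results)


asyncio.run(main())
```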

* feat: outsources chunking parameters to extract chunk from documents task (#289)

* feat: outsources chunking parameters to extract chunk from documents task

* fix: Remove backend lock from UI

Removed lock that prevented using multiple datasets in cognify

Fix COG-912

* COG-870 Remove duplicate edges from the code graph (#293)

* feat: turn summarize_code into generator

* feat: extract run_code_graph_pipeline, update the pipeline

* feat: minimal code graph example

* refactor: update argument

* refactor: move run_code_graph_pipeline to cognify/code_graph_pipeline

* refactor: indentation and whitespace nits

* refactor: add deprecated use comments and warnings

---------

Co-authored-by: Vasilije <8619304+Vasilije1990@users.noreply.github.com>
Co-authored-by: Igor Ilic <30923996+dexters1@users.noreply.github.com>
Co-authored-by: Boris <boris@topoteretes.com>

* test: Added test for getting of documents for search

Added test to verify getting documents related to datasets intended for search

Test COG-912

* Structured code summarization (#375)

* feat: turn summarize_code into generator

* feat: extract run_code_graph_pipeline, update the pipeline

* feat: minimal code graph example

* refactor: update argument

* refactor: move run_code_graph_pipeline to cognify/code_graph_pipeline

* refactor: indentation and whitespace nits

* refactor: add deprecated use comments and warnings

* Structured code summarization

* add missing prompt file

* Remove summarization_model argument from summarize_code and fix typehinting

* minor refactors

---------

Co-authored-by: lxobr <122801072+lxobr@users.noreply.github.com>
Co-authored-by: Vasilije <8619304+Vasilije1990@users.noreply.github.com>
Co-authored-by: Igor Ilic <30923996+dexters1@users.noreply.github.com>
Co-authored-by: Boris <boris@topoteretes.com>

* fix: Resolve issue with cognify router graph model default value

Resolve issue with default value for graph model in cognify endpoint

Fix

* chore: Resolve typo in getting documents code

Resolve typo in code

chore COG-912

* Update .github/workflows/dockerhub.yml

Co-authored-by: coderabbitai[bot] <136622811+coderabbitai[bot]@users.noreply.github.com>

* Update .github/workflows/dockerhub.yml

* Update .github/workflows/dockerhub.yml

* Update .github/workflows/dockerhub.yml

* Update get_cognify_router.py

* fix: Resolve syntax issue with cognify router

Resolve syntax issue with cognify router

Fix

* feat: Add ruff pre-commit hook for linting and formatting

Added formatting and linting on pre-commit hook

Feature COG-650

* chore: Update ruff lint options in pyproject file

Update ruff lint options in pyproject file

Chore

* test: Add ruff linter github action

Added linting check with ruff in github actions

Test COG-650

* feat: deletes executor limit from get_repo_file_dependencies

* feat: implements mock feature in LiteLLM engine
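
A minimal sketch of such a mock switch, assuming an environment flag; the flag name and response model are illustrative, not cognee's actual implementation.

```python
import os

from pydantic import BaseModel


class SummaryModel(BaseModel):  # illustrative structured-output model
    summary: str


async def acreate_structured_output(text: str, response_model=SummaryModel):
    # In CI or tests, skip the real LiteLLM call and return canned output.
    # The MOCK_LLM flag name is an assumption for this sketch.
    if os.getenv("MOCK_LLM", "false").lower() == "true":
        return response_model(summary="mocked summary")
    raise NotImplementedError("the real litellm.acompletion(...) call goes here")
```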

* refactor: Remove changes to cognify router

Remove changes to cognify router

Refactor COG-650

* fix: fixing boolean env for github actions

* test: Add test for ruff format for cognee code

Test if code is formatted for cognee

Test COG-650

* refactor: Rename ruff gh actions

Rename ruff gh actions to be more understandable

Refactor COG-650

* chore: Remove checking of ruff lint and format on push

Remove checking of ruff lint and format on push

Chore COG-650

* feat: Add deletion of local files when deleting data

Delete local files when deleting data from cognee

Feature COG-475
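
The safety rule such deletion implies, as a hedged sketch (helper and path names assumed): remove a file only if it lives under cognee's own storage root, never a user-owned file that was merely referenced.

```python
import os


def delete_local_file(file_path: str, storage_root: str) -> None:
    # Resolve symlinks so a crafted path cannot escape the storage root.
    real_path = os.path.realpath(file_path)
    real_root = os.path.realpath(storage_root)

    # Only delete files cognee itself created, i.e. inside its storage root.
    if real_path.startswith(real_root + os.sep) and os.path.isfile(real_path):
        os.remove(real_path)
```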

* fix: changes back the max workers to 12

* feat: Adds mock summary for codegraph pipeline

* refactor: Add current development status

Save current development status

Refactor

* Fix langfuse

* Fix langfuse

* Fix langfuse

* Add evaluation notebook

* Rename eval notebook

* chore: Add temporary state of development

Add temp development state to branch

Chore

* fix: Add poetry.lock file, make langfuse mandatory

Added langfuse as mandatory dependency, added poetry.lock file

Fix

* Fix: fixes langfuse config settings

* feat: Add deletion of local files made by cognee through data endpoint

Delete local files made by cognee when deleting data from database through endpoint

Feature COG-475

* test: Revert changes on test_pgvector

Revert changes on test_pgvector which were made to test deletion of local files

Test COG-475

* chore: deletes the old test for the codegraph pipeline

* test: Add test to verify deletion of local files

Added test that checks local files created by cognee will be deleted and those not created by cognee won't

Test COG-475

* chore: deletes unused old version of the codegraph

* chore: deletes unused imports from code_graph_pipeline

* Ingest non-code files

* Fixing review findings

* Ingest non-code files (#395)

* Ingest non-code files

* Fixing review findings

* test: Update test regarding message

Update assertion message, add verification of file existence

* Handle retryerrors in code summary (#396)

* Handle retryerrors in code summary

* Log instead of print

* fix: updates the acreate_structured_output

* chore: Add logging to sentry when file which should exist can't be found

Log to sentry that a file which should exist can't be found

Chore COG-475
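
That report plausibly takes the shape of a standard sentry_sdk call (the message text is illustrative):

```python
import os

import sentry_sdk


def report_if_missing(file_path: str) -> None:
    # A file recorded in the database should still exist on disk; if it
    # does not, surface the inconsistency in Sentry instead of failing silently.
    if not os.path.exists(file_path):
        sentry_sdk.capture_message(
            f"File expected to exist but was not found: {file_path}", level="error"
        )
```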

* Fix diagram

* fix: refactor mcp

* Add Smithery CLI installation instructions and badge

* Move readme

* Update README.md

* Update README.md

* Cog 813 source code chunks (#383)

* fix: pass the list of all CodeFiles to enrichment task

* feat: introduce SourceCodeChunk, update metadata

* feat: get_source_code_chunks code graph pipeline task

* feat: integrate get_source_code_chunks task, comment out summarize_code

* Fix code summarization (#387)

* feat: update data models

* feat: naive parse long strings in source code

* fix: get_non_py_files instead of get_non_code_files

* fix: limit recursion, add comment

* handle embedding empty input error (#398)

* feat: robustly handle CodeFile source code

* refactor: sort imports

* todo: add support for other embedding models

* feat: add custom logger

* feat: add robustness to get_source_code_chunks

Co-authored-by: coderabbitai[bot] <136622811+coderabbitai[bot]@users.noreply.github.com>

* feat: improve embedding exceptions

* refactor: format indents, rename module

---------

Co-authored-by: alekszievr <44192193+alekszievr@users.noreply.github.com>
Co-authored-by: coderabbitai[bot] <136622811+coderabbitai[bot]@users.noreply.github.com>

* Fix diagram

* Fix diagram

* Fix instructions

* Fix instructions

* adding and fixing files

* Update README.md

* ruff format

* Fix linter issues

* Fix linter issues

* Fix linter issues

* Fix linter issues

* Fix linter issues

* Fix linter issues

* Fix linter issues

* Fix linter issues

* Fix linter issues

* Fix linter issues

* Implement PR review

* Comment out profiling

* Comment out profiling

* Comment out profiling

* fix: add allowed extensions

* fix: make UnstructuredDocument.read() adhere to Document

* feat: time code graph run and add mock support

* Fix ollama, work on visualization

* fix: Fixes faulty logging format and sets up error logging in dynamic steps example

* Overcome ContextWindowExceededError by checking token count while chunking (#413)
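
The underlying idea, as a self-contained sketch: stop growing a chunk once the next piece would push it past the model's token budget. The real task uses the model's tokenizer; the whitespace counter below is a stand-in.

```python
from typing import Callable, Iterable, Iterator


def chunk_by_token_count(
    pieces: Iterable[str],
    max_tokens: int,
    count_tokens: Callable[[str], int] = lambda text: len(text.split()),  # stand-in
) -> Iterator[str]:
    current, current_tokens = [], 0
    for piece in pieces:
        piece_tokens = count_tokens(piece)
        # Flush the current chunk if adding this piece would exceed the budget.
        # (A single oversized piece still becomes its own chunk.)
        if current and current_tokens + piece_tokens > max_tokens:
            yield " ".join(current)
            current, current_tokens = [], 0
        current.append(piece)
        current_tokens += piece_tokens
    if current:
        yield " ".join(current)
```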

* fix: Fixes duplicated edges in cognify by limiting the recursion depth in add datapoints

* Adjust AudioDocument and handle None token limit

* Handle azure models as well

* Fix visualization

* Fix visualization

* Fix visualization

* Add clean logging to code graph example

* Remove setting envvars from arg

* fix: fixes create_cognee_style_network_with_logo unit test

* fix: removes an accidentally left-in print

* Fix visualization

* Fix visualization

* Fix visualization

* Get embedding engine instead of passing it. Get it from vector engine instead of direct getter.
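
In spirit, the refactor swaps parameter-threading for a lookup at the point of use, roughly as below; the module path and attribute name are assumptions, not the verified internals.

```python
from cognee.infrastructure.databases.vector import get_vector_engine  # path assumed


def get_embedding_engine():
    # The vector engine is already configured with an embedding engine,
    # so fetch it there instead of passing it through every call site.
    return get_vector_engine().embedding_engine  # attribute name assumed
```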

* Fix visualization

* Fix visualization

* Fix poetry issues

* Get embedding engine instead of passing it in code chunking.

* Fix poetry issues

* Fix poetry issues

* Fix poetry issues

* Fix poetry issues

* Fix poetry issues

* Fix poetry issues

* Fix poetry issues

* Fix poetry issues

* chore: Update version of poetry install action

* chore: Update action to trigger on pull request for any branch

* chore: Remove if in github action to allow triggering on push

* chore: Remove if condition to allow gh actions to trigger on push to PR

* chore: Update poetry version in github actions

* chore: Set fixed ubuntu version to 22.04

* chore: Update py lint to use ubuntu 22.04

* chore: update ubuntu version to 22.04

* feat: implements the first version of graph based completion in search
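
"Graph-based completion" can be read as: retrieve relevant triplets from the graph, serialize them as context, and let the LLM complete the answer. A hedged sketch with illustrative names, not cognee's actual internals:

```python
async def graph_based_completion(query: str, search_triplets, ask_llm) -> str:
    # 1. Retrieve graph triplets relevant to the query (callable assumed).
    triplets = await search_triplets(query, top_k=5)

    # 2. Serialize them as plain-text context the model can ground on.
    context = "\n".join(
        f"{source} --[{relation}]--> {target}" for source, relation, target in triplets
    )

    # 3. Ask the LLM to answer strictly from that context (callable assumed).
    prompt = f"Answer the question using only this context:\n{context}\n\nQuestion: {query}"
    return await ask_llm(prompt)
```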

* chore: Update python 3.9 gh action to use 3.12 instead

* chore: Update formatting of utils.py

* Fix poetry issues

* Fix poetry issues

* Fix poetry issues

* Fix poetry issues

* Fix poetry issues

* Fix poetry issues

* Fix poetry issues

* Adjust integration tests

* fix: Fixes ruff formatting

* Handle circular import

* fix: Resolve profiler issue with partial and recursive logger imports

Resolve issue for profiler with partial and recursive logger imports

* fix: Remove logger from __init__.py file

* test: Test profiling on HEAD branch

* test: Return profiler to base branch

* Set max_tokens in config

* Adjust SWE-bench script to code graph pipeline call

* Adjust SWE-bench script to code graph pipeline call

* fix: Add fix for accessing dictionary elements that don't exist

Use .get() for the text key instead of direct access, to handle the case where the text key doesn't exist
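
The pattern in miniature:

```python
payload = {"id": 123}  # the "text" key may be absent

# payload["text"] would raise KeyError here; .get() degrades gracefully:
text = payload.get("text")  # None when the key is missing
text_or_default = payload.get("text", "")  # or an explicit default
```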

* feat: Add ability to change graph database configuration through cognee

* feat: adds pydantic types to graph layer models

* test: Test ubuntu 24.04

* test: change all actions to ubuntu-latest

* feat: adds basic retriever for swe bench

* Match Ruff version in config to the one in github actions

* feat: implements code retriever

* Fix: fixes unit test for codepart search

* Format with Ruff 0.9.0

* Fix: deleting incorrect repo path

* docs: Add LlamaIndex Cognee integration notebook

Added LlamaIndex Cognee integration notebook

* test: Add github action for testing llama index cognee integration notebook

* fix: resolve issue with langfuse dependency installation when integrating cognee in different packages

* version: Increase version to 0.1.21

* fix: update dependencies of the mcp server

* Update README.md

* Fix: Fixes logging setup

* feat: deletes on-the-fly embeddings and uses edge collections instead

* fix: Change nbformat on llama index integration notebook

* fix: Resolve api key issue with llama index integration notebook

* fix: Attempt to resolve issue with Ubuntu 24.04 segmentation fault

* version: Increase version to 0.1.22

---------

Co-authored-by: vasilije <vas.markovic@gmail.com>
Co-authored-by: Igor Ilic <igorilic03@gmail.com>
Co-authored-by: Igor Ilic <30923996+dexters1@users.noreply.github.com>
Co-authored-by: lxobr <122801072+lxobr@users.noreply.github.com>
Co-authored-by: alekszievr <44192193+alekszievr@users.noreply.github.com>
Co-authored-by: hajdul88 <52442977+hajdul88@users.noreply.github.com>
Co-authored-by: Vasilije <8619304+Vasilije1990@users.noreply.github.com>
Co-authored-by: coderabbitai[bot] <136622811+coderabbitai[bot]@users.noreply.github.com>
Co-authored-by: Rita Aleksziev <alekszievr@gmail.com>
Co-authored-by: Henry Mao <1828968+calclavia@users.noreply.github.com>
Boris, 2025-01-13 22:36:42 +01:00, committed by GitHub
parent 886e9c7eb3
commit 0f97f8f71b
17 changed files with 336 additions and 81 deletions


@@ -7,7 +7,7 @@ on:
 jobs:
   docker-build-and-push:
-    runs-on: ubuntu-22.04
+    runs-on: ubuntu-latest
     steps:
       - name: Checkout repository


@@ -16,7 +16,7 @@ jobs:
       fail-fast: true
       matrix:
         os:
-          - ubuntu-22.04
+          - ubuntu-latest
         python-version: ["3.10.x", "3.11.x"]
     defaults:


@@ -51,6 +51,7 @@ jobs:
         env:
           ENV: 'dev'
           LLM_API_KEY: ${{ secrets.OPENAI_API_KEY }}
+          OPENAI_API_KEY: ${{ secrets.OPENAI_API_KEY }}
           GRAPHISTRY_USERNAME: ${{ secrets.GRAPHISTRY_USERNAME }}
           GRAPHISTRY_PASSWORD: ${{ secrets.GRAPHISTRY_PASSWORD }}
         run: |


@@ -3,7 +3,7 @@ on: [ pull_request ]
 jobs:
   ruff:
-    runs-on: ubuntu-22.04
+    runs-on: ubuntu-latest
     steps:
       - uses: actions/checkout@v4
       - uses: astral-sh/ruff-action@v2


@@ -3,7 +3,7 @@ on: [ pull_request ]
 jobs:
   ruff:
-    runs-on: ubuntu-22.04
+    runs-on: ubuntu-latest
     steps:
       - uses: actions/checkout@v4
      - uses: astral-sh/ruff-action@v2


@@ -16,7 +16,7 @@ env:
 jobs:
   run_deduplication_test:
     name: test
-    runs-on: ubuntu-22.04
+    runs-on: ubuntu-latest
     defaults:
       run:
         shell: bash


@@ -0,0 +1,20 @@
+name: test | llama index cognee integration notebook
+
+on:
+  workflow_dispatch:
+  pull_request:
+    types: [labeled, synchronize]
+
+concurrency:
+  group: ${{ github.workflow }}-${{ github.event.pull_request.number || github.ref }}
+  cancel-in-progress: true
+
+jobs:
+  run_notebook_test:
+    uses: ./.github/workflows/reusable_notebook.yml
+    with:
+      notebook-location: notebooks/llama_index_cognee_integration.ipynb
+    secrets:
+      OPENAI_API_KEY: ${{ secrets.OPENAI_API_KEY }}
+      GRAPHISTRY_USERNAME: ${{ secrets.GRAPHISTRY_USERNAME }}
+      GRAPHISTRY_PASSWORD: ${{ secrets.GRAPHISTRY_PASSWORD }}


@@ -17,7 +17,7 @@ jobs:
   run_qdrant_integration_test:
     name: test
-    runs-on: ubuntu-22.04
+    runs-on: ubuntu-latest
     defaults:
       run:
         shell: bash


@@ -17,7 +17,7 @@ jobs:
   run_weaviate_integration_test:
     name: test
-    runs-on: ubuntu-22.04
+    runs-on: ubuntu-latest
     defaults:
       run:
         shell: bash


@@ -101,15 +101,9 @@ cognee.config.set_graphistry_config({
 })
 ```
-(Optional) To run the UI, go to cognee-frontend directory and run:
-```
-npm run dev
-```
-or run everything in a docker container:
-```
-docker-compose up
-```
-Then navigate to localhost:3000
+(Optional) To run the with an UI, go to cognee-mcp directory and follow the instructions.
+You will be able to use cognee as mcp tool and create graphs and query them.
 
 If you want to use Cognee with PostgreSQL, make sure to set the following values in the .env file:
 ```


@@ -8,7 +8,7 @@ from cognee.infrastructure.databases.graph.graph_db_interface import GraphDBInte
 from cognee.modules.graph.cognee_graph.CogneeGraphElements import Node, Edge
 from cognee.modules.graph.cognee_graph.CogneeAbstractGraph import CogneeAbstractGraph
 import heapq
-from graphistry import edges
 import asyncio
 
 class CogneeGraph(CogneeAbstractGraph):
@@ -127,51 +127,25 @@ class CogneeGraph(CogneeAbstractGraph):
             else:
                 print(f"Node with id {node_id} not found in the graph.")
 
-    async def map_vector_distances_to_graph_edges(
-        self, vector_engine, query
-    ) -> None:  # :TODO: When we calculate edge embeddings in vector db change this similarly to node mapping
+    async def map_vector_distances_to_graph_edges(self, vector_engine, query) -> None:
         try:
-            # Step 1: Generate the query embedding
-            query_vector = await vector_engine.embed_data([query])
-            query_vector = query_vector[0]
-            if query_vector is None or len(query_vector) == 0:
-                raise ValueError("Failed to generate query embedding.")
-            # Step 2: Collect all unique relationship types
-            unique_relationship_types = set()
-            for edge in self.edges:
-                relationship_type = edge.attributes.get("relationship_type")
-                if relationship_type:
-                    unique_relationship_types.add(relationship_type)
+            edge_distances = await vector_engine.get_distance_from_collection_elements(
+                "edge_type_relationship_name", query_text=query
+            )
-            # Step 3: Embed all unique relationship types
-            unique_relationship_types = list(unique_relationship_types)
-            relationship_type_embeddings = await vector_engine.embed_data(unique_relationship_types)
+            embedding_map = {result.payload["text"]: result.score for result in edge_distances}
-            # Step 4: Map relationship types to their embeddings and calculate distances
-            embedding_map = {}
-            for relationship_type, embedding in zip(
-                unique_relationship_types, relationship_type_embeddings
-            ):
-                edge_vector = np.array(embedding)
-                # Calculate cosine similarity
-                similarity = np.dot(query_vector, edge_vector) / (
-                    np.linalg.norm(query_vector) * np.linalg.norm(edge_vector)
-                )
-                distance = 1 - similarity
-                # Round the distance to 4 decimal places and store it
-                embedding_map[relationship_type] = round(distance, 4)
-            # Step 4: Assign precomputed distances to edges
             for edge in self.edges:
                 relationship_type = edge.attributes.get("relationship_type")
                 if not relationship_type or relationship_type not in embedding_map:
                     print(f"Edge {edge} has an unknown or missing relationship type.")
                     continue
-                # Assign the precomputed distance
                 edge.attributes["vector_distance"] = embedding_map[relationship_type]
 
         except Exception as ex:


@@ -62,24 +62,6 @@ async def brute_force_triplet_search(
     return retrieved_results
 
-def delete_duplicated_vector_db_elements(
-    collections, results
-):  #:TODO: This is just for now to fix vector db duplicates
-    results_dict = {}
-    for collection, results in zip(collections, results):
-        seen_ids = set()
-        unique_results = []
-        for result in results:
-            if result.id not in seen_ids:
-                unique_results.append(result)
-                seen_ids.add(result.id)
-            else:
-                print(f"Duplicate found in collection '{collection}': {result.id}")
-        results_dict[collection] = unique_results
-    return results_dict
 
 async def brute_force_search(
     query: str, user: User, top_k: int, collections: List[str] = None
 ) -> list:
@@ -125,10 +107,7 @@ async def brute_force_search(
             ]
         )
 
-        ############################################# :TODO: Change when vector db does not contain duplicates
-        node_distances = delete_duplicated_vector_db_elements(collections, results)
-        # node_distances = {collection: result for collection, result in zip(collections, results)}
-        ##############################################
+        node_distances = {collection: result for collection, result in zip(collections, results)}
 
         memory_fragment = CogneeGraph()
@@ -140,14 +119,12 @@ async def brute_force_search(
         await memory_fragment.map_vector_distances_to_graph_nodes(node_distances=node_distances)
 
-        #:TODO: Change when vectordb contains edge embeddings
         await memory_fragment.map_vector_distances_to_graph_edges(vector_engine, query)
 
         results = await memory_fragment.calculate_top_triplet_importances(k=top_k)
 
         send_telemetry("cognee.brute_force_triplet_search EXECUTION STARTED", user.id)
 
-        #:TODO: Once we have Edge pydantic models we should retrieve the exact edge and node objects from graph db
         return results
 
     except Exception as e:


@@ -1,4 +1,4 @@
-from sqlalchemy.orm import joinedload
+from sqlalchemy.orm import selectinload
 from sqlalchemy.future import select
 from cognee.modules.users.models import User
 from cognee.infrastructure.databases.relational import get_relational_engine
@@ -11,7 +11,7 @@ async def get_default_user():
     async with db_engine.get_async_session() as session:
         query = (
             select(User)
-            .options(joinedload(User.groups))
+            .options(selectinload(User.groups))
             .where(User.email == "default_user@example.com")
         )


@@ -468,16 +468,20 @@ def graph_to_tuple(graph):
 
 def setup_logging(log_level=logging.INFO):
-    """This method sets up the logging configuration."""
+    """Sets up the logging configuration."""
     formatter = logging.Formatter("%(asctime)s - %(levelname)s - %(message)s\n")
 
     stream_handler = logging.StreamHandler(sys.stdout)
     stream_handler.setFormatter(formatter)
     stream_handler.setLevel(log_level)
 
-    logging.basicConfig(
-        level=log_level,
-        handlers=[stream_handler],
-    )
+    root_logger = logging.getLogger()
+    if root_logger.hasHandlers():
+        root_logger.handlers.clear()
+    root_logger.addHandler(stream_handler)
+    root_logger.setLevel(log_level)
 
 
 # ---------------- Example Usage ----------------


@@ -192,7 +192,7 @@ async def main(enable_steps):
 
 if __name__ == "__main__":
-    setup_logging(logging.INFO)
+    setup_logging(logging.ERROR)
 
     rebuild_kg = True
     retrieve = True

File diff suppressed because one or more lines are too long


@@ -1,6 +1,6 @@
 [tool.poetry]
 name = "cognee"
-version = "0.1.21"
+version = "0.1.22"
 description = "Cognee - is a library for enriching LLM context with a semantic layer for better understanding and reasoning."
 authors = ["Vasilije Markovic", "Boris Arzentar"]
 readme = "README.md"