Version 0.1.22 (#438)

* Revert "fix: Add metadata reflection fix to sqlite as well"

This reverts commit 394a0b2dfb.

* COG-810 Implement a top-down dependency graph builder tool (#268)

* feat: parse repo to call graph

* Update/repo_processor/top_down_repo_parse.py task

* fix: minor improvements

* feat: file parsing jedi script optimisation

---------

* Add type to DataPoint metadata (#364)

* Add missing index_fields

* Use DataPoint UUID type in pgvector create_data_points

* Make _metadata mandatory everywhere

* feat: Add search by dataset for cognee

Added ability to search by datasets for cognee users

Feature COG-912

* feat: outsources chunking parameters to extract chunk from documents … (#289)

* feat: outsources chunking parameters to extract chunk from documents task

* fix: Remove backend lock from UI

Removed lock that prevented using multiple datasets in cognify

Fix COG-912

* COG 870 Remove duplicate edges from the code graph (#293)

* feat: turn summarize_code into generator

* feat: extract run_code_graph_pipeline, update the pipeline

* feat: minimal code graph example

* refactor: update argument

* refactor: move run_code_graph_pipeline to cognify/code_graph_pipeline

* refactor: indentation and whitespace nits

* refactor: add deprecated use comments and warnings

---------

Co-authored-by: Vasilije <8619304+Vasilije1990@users.noreply.github.com>
Co-authored-by: Igor Ilic <30923996+dexters1@users.noreply.github.com>
Co-authored-by: Boris <boris@topoteretes.com>

* test: Added test for getting of documents for search

Added test to verify getting documents related to datasets intended for search

Test COG-912

* Structured code summarization (#375)

* feat: turn summarize_code into generator

* feat: extract run_code_graph_pipeline, update the pipeline

* feat: minimal code graph example

* refactor: update argument

* refactor: move run_code_graph_pipeline to cognify/code_graph_pipeline

* refactor: indentation and whitespace nits

* refactor: add deprecated use comments and warnings

* Structured code summarization

* add missing prompt file

* Remove summarization_model argument from summarize_code and fix typehinting

* minor refactors

---------

Co-authored-by: lxobr <122801072+lxobr@users.noreply.github.com>
Co-authored-by: Vasilije <8619304+Vasilije1990@users.noreply.github.com>
Co-authored-by: Igor Ilic <30923996+dexters1@users.noreply.github.com>
Co-authored-by: Boris <boris@topoteretes.com>

* fix: Resolve issue with cognify router graph model default value

Resolve issue with default value for graph model in cognify endpoint

Fix

* chore: Resolve typo in getting documents code

Resolve typo in code

chore COG-912

* Update .github/workflows/dockerhub.yml

Co-authored-by: coderabbitai[bot] <136622811+coderabbitai[bot]@users.noreply.github.com>

* Update .github/workflows/dockerhub.yml

* Update .github/workflows/dockerhub.yml

* Update .github/workflows/dockerhub.yml

* Update get_cognify_router.py

* fix: Resolve syntax issue with cognify router

Resolve syntax issue with cognify router

Fix

* feat: Add ruff pre-commit hook for linting and formatting

Added formatting and linting on pre-commit hook

Feature COG-650

* chore: Update ruff lint options in pyproject file

Update ruff lint options in pyproject file

Chore

* test: Add ruff linter github action

Added linting check with ruff in github actions

Test COG-650

* feat: deletes executor limit from get_repo_file_dependencies

* feat: implements mock feature in LiteLLM engine

* refactor: Remove changes to cognify router

Remove changes to cognify router

Refactor COG-650

* fix: fixing boolean env for github actions

* test: Add test for ruff format for cognee code

Test if code is formatted for cognee

Test COG-650

* refactor: Rename ruff gh actions

Rename ruff gh actions to be more understandable

Refactor COG-650

* chore: Remove checking of ruff lint and format on push

Remove checking of ruff lint and format on push

Chore COG-650

* feat: Add deletion of local files when deleting data

Delete local files when deleting data from cognee

Feature COG-475

* fix: changes back the max workers to 12

* feat: Adds mock summary for codegraph pipeline

* refacotr: Add current development status

Save current development status

Refactor

* Fix langfuse

* Fix langfuse

* Fix langfuse

* Add evaluation notebook

* Rename eval notebook

* chore: Add temporary state of development

Add temp development state to branch

Chore

* fix: Add poetry.lock file, make langfuse mandatory

Added langfuse as mandatory dependency, added poetry.lock file

Fix

* Fix: fixes langfuse config settings

* feat: Add deletion of local files made by cognee through data endpoint

Delete local files made by cognee when deleting data from database through endpoint

Feature COG-475

* test: Revert changes on test_pgvector

Revert changes on test_pgvector which were made to test deletion of local files

Test COG-475

* chore: deletes the old test for the codegraph pipeline

* test: Add test to verify deletion of local files

Added test that checks local files created by cognee will be deleted and those not created by cognee won't

Test COG-475

* chore: deletes unused old version of the codegraph

* chore: deletes unused imports from code_graph_pipeline

* Ingest non-code files

* Fixing review findings

* Ingest non-code files (#395)

* Ingest non-code files

* Fixing review findings

* test: Update test regarding message

Update assertion message, add veryfing of file existence

* Handle retryerrors in code summary (#396)

* Handle retryerrors in code summary

* Log instead of print

* fix: updates the acreate_structured_output

* chore: Add logging to sentry when file which should exist can't be found

Log to sentry that a file which should exist can't be found

Chore COG-475

* Fix diagram

* fix: refactor mcp

* Add Smithery CLI installation instructions and badge

* Move readme

* Update README.md

* Update README.md

* Cog 813 source code chunks (#383)

* fix: pass the list of all CodeFiles to enrichment task

* feat: introduce SourceCodeChunk, update metadata

* feat: get_source_code_chunks code graph pipeline task

* feat: integrate get_source_code_chunks task, comment out summarize_code

* Fix code summarization (#387)

* feat: update data models

* feat: naive parse long strings in source code

* fix: get_non_py_files instead of get_non_code_files

* fix: limit recursion, add comment

* handle embedding empty input error (#398)

* feat: robustly handle CodeFile source code

* refactor: sort imports

* todo: add support for other embedding models

* feat: add custom logger

* feat: add robustness to get_source_code_chunks

Co-authored-by: coderabbitai[bot] <136622811+coderabbitai[bot]@users.noreply.github.com>

* feat: improve embedding exceptions

* refactor: format indents, rename module

---------

Co-authored-by: alekszievr <44192193+alekszievr@users.noreply.github.com>
Co-authored-by: coderabbitai[bot] <136622811+coderabbitai[bot]@users.noreply.github.com>

* Fix diagram

* Fix diagram

* Fix instructions

* Fix instructions

* adding and fixing files

* Update README.md

* ruff format

* Fix linter issues

* Fix linter issues

* Fix linter issues

* Fix linter issues

* Fix linter issues

* Fix linter issues

* Fix linter issues

* Fix linter issues

* Fix linter issues

* Fix linter issues

* Implement PR review

* Comment out profiling

* Comment out profiling

* Comment out profiling

* fix: add allowed extensions

* fix: adhere UnstructuredDocument.read() to Document

* feat: time code graph run and add mock support

* Fix ollama, work on visualization

* fix: Fixes faulty logging format and sets up error logging in dynamic steps example

* Overcome ContextWindowExceededError by checking token count while chunking (#413)

* fix: Fixes duplicated edges in cognify by limiting the recursion depth in add datapoints

* Adjust AudioDocument and handle None token limit

* Handle azure models as well

* Fix visualization

* Fix visualization

* Fix visualization

* Add clean logging to code graph example

* Remove setting envvars from arg

* fix: fixes create_cognee_style_network_with_logo unit test

* fix: removes accidental remained print

* Fix visualization

* Fix visualization

* Fix visualization

* Get embedding engine instead of passing it. Get it from vector engine instead of direct getter.

* Fix visualization

* Fix visualization

* Fix poetry issues

* Get embedding engine instead of passing it in code chunking.

* Fix poetry issues

* Fix poetry issues

* Fix poetry issues

* Fix poetry issues

* Fix poetry issues

* Fix poetry issues

* Fix poetry issues

* Fix poetry issues

* chore: Update version of poetry install action

* chore: Update action to trigger on pull request for any branch

* chore: Remove if in github action to allow triggering on push

* chore: Remove if condition to allow gh actions to trigger on push to PR

* chore: Update poetry version in github actions

* chore: Set fixed ubuntu version to 22.04

* chore: Update py lint to use ubuntu 22.04

* chore: update ubuntu version to 22.04

* feat: implements the first version of graph based completion in search

* chore: Update python 3.9 gh action to use 3.12 instead

* chore: Update formatting of utils.py

* Fix poetry issues

* Fix poetry issues

* Fix poetry issues

* Fix poetry issues

* Fix poetry issues

* Fix poetry issues

* Fix poetry issues

* Adjust integration tests

* fix: Fixes ruff formatting

* Handle circular import

* fix: Resolve profiler issue with partial and recursive logger imports

Resolve issue for profiler with partial and recursive logger imports

* fix: Remove logger from __init__.py file

* test: Test profiling on HEAD branch

* test: Return profiler to base branch

* Set max_tokens in config

* Adjust SWE-bench script to code graph pipeline call

* Adjust SWE-bench script to code graph pipeline call

* fix: Add fix for accessing dictionary elements that don't exits

Using get for the text key instead of direct access to handle situation if the text key doesn't exist

* feat: Add ability to change graph database configuration through cognee

* feat: adds pydantic types to graph layer models

* test: Test ubuntu 24.04

* test: change all actions to ubuntu-latest

* feat: adds basic retriever for swe bench

* Match Ruff version in config to the one in github actions

* feat: implements code retreiver

* Fix: fixes unit test for codepart search

* Format with Ruff 0.9.0

* Fix: deleting incorrect repo path

* docs: Add LlamaIndex Cognee integration notebook

Added LlamaIndex Cognee integration notebook

* test: Add github action for testing llama index cognee integration notebook

* fix: resolve issue with langfuse dependency installation when integrating cognee in different packages

* version: Increase version to 0.1.21

* fix: update dependencies of the mcp server

* Update README.md

* Fix: Fixes logging setup

* feat: deletes on the fly embeddings as uses edge collections

* fix: Change nbformat on llama index integration notebook

* fix: Resolve api key issue with llama index integration notebook

* fix: Attempt to resolve issue with Ubuntu 24.04 segmentation fault

* version: Increase version to 0.1.22

---------

Co-authored-by: vasilije <vas.markovic@gmail.com>
Co-authored-by: Igor Ilic <igorilic03@gmail.com>
Co-authored-by: Igor Ilic <30923996+dexters1@users.noreply.github.com>
Co-authored-by: lxobr <122801072+lxobr@users.noreply.github.com>
Co-authored-by: alekszievr <44192193+alekszievr@users.noreply.github.com>
Co-authored-by: hajdul88 <52442977+hajdul88@users.noreply.github.com>
Co-authored-by: Vasilije <8619304+Vasilije1990@users.noreply.github.com>
Co-authored-by: coderabbitai[bot] <136622811+coderabbitai[bot]@users.noreply.github.com>
Co-authored-by: Rita Aleksziev <alekszievr@gmail.com>
Co-authored-by: Henry Mao <1828968+calclavia@users.noreply.github.com>
This commit is contained in:
Boris 2025-01-13 22:36:42 +01:00 committed by GitHub
parent 886e9c7eb3
commit 0f97f8f71b
No known key found for this signature in database
GPG key ID: B5690EEEBB952194
17 changed files with 336 additions and 81 deletions

View file

@ -7,7 +7,7 @@ on:
jobs: jobs:
docker-build-and-push: docker-build-and-push:
runs-on: ubuntu-22.04 runs-on: ubuntu-latest
steps: steps:
- name: Checkout repository - name: Checkout repository

View file

@ -16,7 +16,7 @@ jobs:
fail-fast: true fail-fast: true
matrix: matrix:
os: os:
- ubuntu-22.04 - ubuntu-latest
python-version: ["3.10.x", "3.11.x"] python-version: ["3.10.x", "3.11.x"]
defaults: defaults:

View file

@ -51,6 +51,7 @@ jobs:
env: env:
ENV: 'dev' ENV: 'dev'
LLM_API_KEY: ${{ secrets.OPENAI_API_KEY }} LLM_API_KEY: ${{ secrets.OPENAI_API_KEY }}
OPENAI_API_KEY: ${{ secrets.OPENAI_API_KEY }}
GRAPHISTRY_USERNAME: ${{ secrets.GRAPHISTRY_USERNAME }} GRAPHISTRY_USERNAME: ${{ secrets.GRAPHISTRY_USERNAME }}
GRAPHISTRY_PASSWORD: ${{ secrets.GRAPHISTRY_PASSWORD }} GRAPHISTRY_PASSWORD: ${{ secrets.GRAPHISTRY_PASSWORD }}
run: | run: |

View file

@ -3,7 +3,7 @@ on: [ pull_request ]
jobs: jobs:
ruff: ruff:
runs-on: ubuntu-22.04 runs-on: ubuntu-latest
steps: steps:
- uses: actions/checkout@v4 - uses: actions/checkout@v4
- uses: astral-sh/ruff-action@v2 - uses: astral-sh/ruff-action@v2

View file

@ -3,7 +3,7 @@ on: [ pull_request ]
jobs: jobs:
ruff: ruff:
runs-on: ubuntu-22.04 runs-on: ubuntu-latest
steps: steps:
- uses: actions/checkout@v4 - uses: actions/checkout@v4
- uses: astral-sh/ruff-action@v2 - uses: astral-sh/ruff-action@v2

View file

@ -16,7 +16,7 @@ env:
jobs: jobs:
run_deduplication_test: run_deduplication_test:
name: test name: test
runs-on: ubuntu-22.04 runs-on: ubuntu-latest
defaults: defaults:
run: run:
shell: bash shell: bash

View file

@ -0,0 +1,20 @@
name: test | llama index cognee integration notebook
on:
workflow_dispatch:
pull_request:
types: [labeled, synchronize]
concurrency:
group: ${{ github.workflow }}-${{ github.event.pull_request.number || github.ref }}
cancel-in-progress: true
jobs:
run_notebook_test:
uses: ./.github/workflows/reusable_notebook.yml
with:
notebook-location: notebooks/llama_index_cognee_integration.ipynb
secrets:
OPENAI_API_KEY: ${{ secrets.OPENAI_API_KEY }}
GRAPHISTRY_USERNAME: ${{ secrets.GRAPHISTRY_USERNAME }}
GRAPHISTRY_PASSWORD: ${{ secrets.GRAPHISTRY_PASSWORD }}

View file

@ -17,7 +17,7 @@ jobs:
run_qdrant_integration_test: run_qdrant_integration_test:
name: test name: test
runs-on: ubuntu-22.04 runs-on: ubuntu-latest
defaults: defaults:
run: run:
shell: bash shell: bash

View file

@ -17,7 +17,7 @@ jobs:
run_weaviate_integration_test: run_weaviate_integration_test:
name: test name: test
runs-on: ubuntu-22.04 runs-on: ubuntu-latest
defaults: defaults:
run: run:
shell: bash shell: bash

View file

@ -101,15 +101,9 @@ cognee.config.set_graphistry_config({
}) })
``` ```
(Optional) To run the UI, go to cognee-frontend directory and run: (Optional) To run the with an UI, go to cognee-mcp directory and follow the instructions.
``` You will be able to use cognee as mcp tool and create graphs and query them.
npm run dev
```
or run everything in a docker container:
```
docker-compose up
```
Then navigate to localhost:3000
If you want to use Cognee with PostgreSQL, make sure to set the following values in the .env file: If you want to use Cognee with PostgreSQL, make sure to set the following values in the .env file:
``` ```

View file

@ -8,7 +8,7 @@ from cognee.infrastructure.databases.graph.graph_db_interface import GraphDBInte
from cognee.modules.graph.cognee_graph.CogneeGraphElements import Node, Edge from cognee.modules.graph.cognee_graph.CogneeGraphElements import Node, Edge
from cognee.modules.graph.cognee_graph.CogneeAbstractGraph import CogneeAbstractGraph from cognee.modules.graph.cognee_graph.CogneeAbstractGraph import CogneeAbstractGraph
import heapq import heapq
from graphistry import edges import asyncio
class CogneeGraph(CogneeAbstractGraph): class CogneeGraph(CogneeAbstractGraph):
@ -127,51 +127,25 @@ class CogneeGraph(CogneeAbstractGraph):
else: else:
print(f"Node with id {node_id} not found in the graph.") print(f"Node with id {node_id} not found in the graph.")
async def map_vector_distances_to_graph_edges( async def map_vector_distances_to_graph_edges(self, vector_engine, query) -> None:
self, vector_engine, query
) -> None: # :TODO: When we calculate edge embeddings in vector db change this similarly to node mapping
try: try:
# Step 1: Generate the query embedding
query_vector = await vector_engine.embed_data([query]) query_vector = await vector_engine.embed_data([query])
query_vector = query_vector[0] query_vector = query_vector[0]
if query_vector is None or len(query_vector) == 0: if query_vector is None or len(query_vector) == 0:
raise ValueError("Failed to generate query embedding.") raise ValueError("Failed to generate query embedding.")
# Step 2: Collect all unique relationship types edge_distances = await vector_engine.get_distance_from_collection_elements(
unique_relationship_types = set() "edge_type_relationship_name", query_text=query
for edge in self.edges: )
relationship_type = edge.attributes.get("relationship_type")
if relationship_type:
unique_relationship_types.add(relationship_type)
# Step 3: Embed all unique relationship types embedding_map = {result.payload["text"]: result.score for result in edge_distances}
unique_relationship_types = list(unique_relationship_types)
relationship_type_embeddings = await vector_engine.embed_data(unique_relationship_types)
# Step 4: Map relationship types to their embeddings and calculate distances
embedding_map = {}
for relationship_type, embedding in zip(
unique_relationship_types, relationship_type_embeddings
):
edge_vector = np.array(embedding)
# Calculate cosine similarity
similarity = np.dot(query_vector, edge_vector) / (
np.linalg.norm(query_vector) * np.linalg.norm(edge_vector)
)
distance = 1 - similarity
# Round the distance to 4 decimal places and store it
embedding_map[relationship_type] = round(distance, 4)
# Step 4: Assign precomputed distances to edges
for edge in self.edges: for edge in self.edges:
relationship_type = edge.attributes.get("relationship_type") relationship_type = edge.attributes.get("relationship_type")
if not relationship_type or relationship_type not in embedding_map: if not relationship_type or relationship_type not in embedding_map:
print(f"Edge {edge} has an unknown or missing relationship type.") print(f"Edge {edge} has an unknown or missing relationship type.")
continue continue
# Assign the precomputed distance
edge.attributes["vector_distance"] = embedding_map[relationship_type] edge.attributes["vector_distance"] = embedding_map[relationship_type]
except Exception as ex: except Exception as ex:

View file

@ -62,24 +62,6 @@ async def brute_force_triplet_search(
return retrieved_results return retrieved_results
def delete_duplicated_vector_db_elements(
collections, results
): #:TODO: This is just for now to fix vector db duplicates
results_dict = {}
for collection, results in zip(collections, results):
seen_ids = set()
unique_results = []
for result in results:
if result.id not in seen_ids:
unique_results.append(result)
seen_ids.add(result.id)
else:
print(f"Duplicate found in collection '{collection}': {result.id}")
results_dict[collection] = unique_results
return results_dict
async def brute_force_search( async def brute_force_search(
query: str, user: User, top_k: int, collections: List[str] = None query: str, user: User, top_k: int, collections: List[str] = None
) -> list: ) -> list:
@ -125,10 +107,7 @@ async def brute_force_search(
] ]
) )
############################################# :TODO: Change when vector db does not contain duplicates node_distances = {collection: result for collection, result in zip(collections, results)}
node_distances = delete_duplicated_vector_db_elements(collections, results)
# node_distances = {collection: result for collection, result in zip(collections, results)}
##############################################
memory_fragment = CogneeGraph() memory_fragment = CogneeGraph()
@ -140,14 +119,12 @@ async def brute_force_search(
await memory_fragment.map_vector_distances_to_graph_nodes(node_distances=node_distances) await memory_fragment.map_vector_distances_to_graph_nodes(node_distances=node_distances)
#:TODO: Change when vectordb contains edge embeddings
await memory_fragment.map_vector_distances_to_graph_edges(vector_engine, query) await memory_fragment.map_vector_distances_to_graph_edges(vector_engine, query)
results = await memory_fragment.calculate_top_triplet_importances(k=top_k) results = await memory_fragment.calculate_top_triplet_importances(k=top_k)
send_telemetry("cognee.brute_force_triplet_search EXECUTION STARTED", user.id) send_telemetry("cognee.brute_force_triplet_search EXECUTION STARTED", user.id)
#:TODO: Once we have Edge pydantic models we should retrieve the exact edge and node objects from graph db
return results return results
except Exception as e: except Exception as e:

View file

@ -1,4 +1,4 @@
from sqlalchemy.orm import joinedload from sqlalchemy.orm import selectinload
from sqlalchemy.future import select from sqlalchemy.future import select
from cognee.modules.users.models import User from cognee.modules.users.models import User
from cognee.infrastructure.databases.relational import get_relational_engine from cognee.infrastructure.databases.relational import get_relational_engine
@ -11,7 +11,7 @@ async def get_default_user():
async with db_engine.get_async_session() as session: async with db_engine.get_async_session() as session:
query = ( query = (
select(User) select(User)
.options(joinedload(User.groups)) .options(selectinload(User.groups))
.where(User.email == "default_user@example.com") .where(User.email == "default_user@example.com")
) )

View file

@ -468,16 +468,20 @@ def graph_to_tuple(graph):
def setup_logging(log_level=logging.INFO): def setup_logging(log_level=logging.INFO):
"""This method sets up the logging configuration.""" """Sets up the logging configuration."""
formatter = logging.Formatter("%(asctime)s - %(levelname)s - %(message)s\n") formatter = logging.Formatter("%(asctime)s - %(levelname)s - %(message)s\n")
stream_handler = logging.StreamHandler(sys.stdout) stream_handler = logging.StreamHandler(sys.stdout)
stream_handler.setFormatter(formatter) stream_handler.setFormatter(formatter)
stream_handler.setLevel(log_level) stream_handler.setLevel(log_level)
logging.basicConfig( root_logger = logging.getLogger()
level=log_level,
handlers=[stream_handler], if root_logger.hasHandlers():
) root_logger.handlers.clear()
root_logger.addHandler(stream_handler)
root_logger.setLevel(log_level)
# ---------------- Example Usage ---------------- # ---------------- Example Usage ----------------

View file

@ -192,7 +192,7 @@ async def main(enable_steps):
if __name__ == "__main__": if __name__ == "__main__":
setup_logging(logging.INFO) setup_logging(logging.ERROR)
rebuild_kg = True rebuild_kg = True
retrieve = True retrieve = True

File diff suppressed because one or more lines are too long

View file

@ -1,6 +1,6 @@
[tool.poetry] [tool.poetry]
name = "cognee" name = "cognee"
version = "0.1.21" version = "0.1.22"
description = "Cognee - is a library for enriching LLM context with a semantic layer for better understanding and reasoning." description = "Cognee - is a library for enriching LLM context with a semantic layer for better understanding and reasoning."
authors = ["Vasilije Markovic", "Boris Arzentar"] authors = ["Vasilije Markovic", "Boris Arzentar"]
readme = "README.md" readme = "README.md"