Boris 0f97f8f71b Version 0.1.22 (#438) 2025-01-13 22:36:42 +01:00
.data	Remove files	2024-12-11 15:34:29 +01:00
.dlt	fix: remove obsolete code	2024-03-13 10:19:03 +01:00
.github	Version 0.1.22 (#438 )	2025-01-13 22:36:42 +01:00
alembic	Version 0.1.21 (#431 )	2025-01-10 19:37:50 +01:00
assets	Version 0.1.21 (#431 )	2025-01-10 19:37:50 +01:00
bin	chore: enable all origins in cors settings	2024-09-25 14:34:14 +02:00
cognee	Version 0.1.22 (#438 )	2025-01-13 22:36:42 +01:00
cognee-frontend	Switch to gpt-4o-mini by default (#233 )	2024-11-18 17:38:54 +01:00
cognee-mcp	fix: update dependencies of the mcp server	2025-01-12 18:52:58 +01:00
evals	Version 0.1.21 (#431 )	2025-01-10 19:37:50 +01:00
examples	Version 0.1.22 (#438 )	2025-01-13 22:36:42 +01:00
licenses	add NOTICE file, reference CoC in contribution guidelines, add licenses folder for external licenses	2024-12-06 13:27:55 +00:00
notebooks	Version 0.1.22 (#438 )	2025-01-13 22:36:42 +01:00
profiling	Version 0.1.21 (#431 )	2025-01-10 19:37:50 +01:00
tests	Version 0.1.21 (#431 )	2025-01-10 19:37:50 +01:00
tools	Version 0.1.21 (#431 )	2025-01-10 19:37:50 +01:00
.dockerignore	chore: add vanilla docker config	2024-06-23 00:36:34 +02:00
.env.template	chore: Make milvus an optional dependency	2024-12-03 10:37:50 +01:00
.gitignore	Merge remote-tracking branch 'origin/main' into code-graph	2024-12-03 21:14:19 +01:00
.pre-commit-config.yaml	Version 0.1.21 (#431 )	2025-01-10 19:37:50 +01:00
.pylintrc	fix: enable sqlalchemy adapter	2024-08-04 22:23:28 +02:00
.python-version	chore: update python version to 3.11	2024-03-29 14:10:20 +01:00
alembic.ini	feat: migrate search to tasks (#144 )	2024-10-07 14:41:35 +02:00
CODE_OF_CONDUCT.md	Version 0.1.21 (#431 )	2025-01-10 19:37:50 +01:00
CONTRIBUTING.md	Version 0.1.21 (#431 )	2025-01-10 19:37:50 +01:00
DCO.md	Version 0.1.21 (#431 )	2025-01-10 19:37:50 +01:00
docker-compose.yml	Merge remote-tracking branch 'origin/main' into feature/cog-537-implement-retrieval-algorithm-from-research-paper	2024-11-21 17:07:16 +01:00
Dockerfile	Version 0.1.21 (#431 )	2025-01-10 19:37:50 +01:00
entrypoint-old.sh	fix: run frontend in a container	2024-06-23 13:24:58 +02:00
entrypoint.sh	fix: various fixes for the deployment	2024-10-22 11:26:48 +02:00
LICENSE	Update LICENSE	2024-03-30 11:57:07 +01:00
mypy.ini	Improve processing, update networkx client, and Neo4j, and dspy (#69 )	2024-04-20 19:05:40 +02:00
NOTICE.md	add NOTICE file, reference CoC in contribution guidelines, add licenses folder for external licenses	2024-12-06 13:27:55 +00:00
poetry.lock	Version 0.1.21 (#431 )	2025-01-10 19:37:50 +01:00
pyproject.toml	Version 0.1.22 (#438 )	2025-01-13 22:36:42 +01:00
README.md	Version 0.1.22 (#438 )	2025-01-13 22:36:42 +01:00


cognee

We build for developers who need a reliable, production-ready data layer for AI applications

What is cognee?

Cognee implements scalable, modular ECL (Extract, Cognify, Load) pipelines that allow you to interconnect and retrieve past conversations, documents, and audio transcriptions while reducing hallucinations, developer effort, and cost. Try it in a Google Colab notebook or have a look at our documentation.

If you have questions, join our Discord community.

📦 Installation

You can install Cognee using either pip or poetry. Support for various databases and vector stores is available through extras.

With pip

pip install cognee

With poetry

poetry add cognee

With pip with specific database support

To install Cognee with support for specific databases, use the command below:

pip install 'cognee[<database>]'

Replace <database> with any of the following databases:

postgres
weaviate
qdrant
neo4j
milvus

Example: installing Cognee with PostgreSQL and Neo4j support:

pip install 'cognee[postgres, neo4j]'

With poetry with specific database support

To install Cognee with support for specific databases, use the command below:

poetry add cognee -E <database>

Replace <database> with any of the following databases:

postgres
weaviate
qdrant
neo4j
milvus

Example: installing Cognee with PostgreSQL and Neo4j support:

poetry add cognee -E postgres -E neo4j

💻 Basic Usage

Setup

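import os

# Option 1: provide the key through the LLM_API_KEY environment variable
os.environ["LLM_API_KEY"] = "YOUR_OPENAI_API_KEY"

import cognee

# Option 2: provide the key through cognee's config
cognee.config.set_llm_api_key("YOUR_OPENAI_API_KEY")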

You can also set these variables in a .env file; see our .env.template. For information on using other LLM providers, check out our documentation.
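A .env file is a list of KEY=VALUE lines. The sketch below is a hypothetical, stdlib-only helper named load_env_file (not part of cognee, which may load .env files differently) that shows how such lines map onto environment variables like LLM_API_KEY. As a design assumption, variables already set in the environment take precedence unless override=True.

```python
import os


def load_env_file(path: str, override: bool = False) -> dict:
    """Read KEY=VALUE lines from a .env-style file into os.environ.

    Hypothetical helper: a simplified stand-in for a real .env loader.
    Blank lines, comment lines starting with '#', and lines without '='
    are skipped; matching single or double quotes around a value are stripped.
    Returns every key/value pair parsed from the file.
    """
    parsed = {}
    with open(path, encoding="utf-8") as handle:
        for raw_line in handle:
            line = raw_line.strip()
            if not line or line.startswith("#") or "=" not in line:
                continue
            key, value = line.split("=", 1)
            key, value = key.strip(), value.strip()
            # Strip one pair of matching surrounding quotes, e.g. 'sk-...'.
            if len(value) >= 2 and value[0] == value[-1] and value[0] in "\"'":
                value = value[1:-1]
            # Values already in the environment win unless override=True.
            if override or key not in os.environ:
                os.environ[key] = value
            parsed[key] = value
    return parsed
```

For example, calling load_env_file(".env") sets os.environ["LLM_API_KEY"] from the file's LLM_API_KEY line, provided that line exists and the variable is not already set.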

If you are using NetworkX, create an account on Graphistry to visualize results:

cognee.config.set_graphistry_config({
    "username": "YOUR_USERNAME",
    "password": "YOUR_PASSWORD"
})

(Optional) To run cognee with a UI, go to the cognee-mcp directory and follow the instructions there. You will be able to use cognee as an MCP tool to create graphs and query them.

If you want to use Cognee with PostgreSQL, make sure to set the following values in the .env file:

DB_PROVIDER=postgres

DB_HOST=postgres
DB_PORT=5432

DB_NAME=cognee_db
DB_USERNAME=cognee
DB_PASSWORD=cognee
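To sanity-check these settings, the hypothetical helper below assembles them into a standard postgresql:// connection URL. cognee builds its own database connection internally; the function name build_postgres_url and the exact URL shape are illustrative assumptions, not cognee's API.

```python
import os
from typing import Mapping, Optional
from urllib.parse import quote


def build_postgres_url(env: Optional[Mapping[str, str]] = None) -> str:
    """Assemble a PostgreSQL URL from the DB_* settings (hypothetical helper)."""
    env = os.environ if env is None else env
    if env.get("DB_PROVIDER") != "postgres":
        raise ValueError("DB_PROVIDER must be 'postgres'")
    # Percent-encode credentials so characters such as '@' or ':' stay valid.
    user = quote(env["DB_USERNAME"], safe="")
    password = quote(env["DB_PASSWORD"], safe="")
    host = env.get("DB_HOST", "localhost")
    port = env.get("DB_PORT", "5432")
    name = env["DB_NAME"]
    return f"postgresql://{user}:{password}@{host}:{port}/{name}"
```

Note that DB_HOST=postgres only resolves where a host named postgres is reachable (for example, a Docker Compose service with that name); a database on your own machine would typically use localhost.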

Simple example

First, copy .env.template to .env and add your OpenAI API key to the LLM_API_KEY field.

This script will run the default pipeline:

import cognee
import asyncio
from cognee.api.v1.search import SearchType

async def main():
    # Create a clean slate for cognee -- reset data and system state
    print("Resetting cognee data...")
    await cognee.prune.prune_data()
    await cognee.prune.prune_system(metadata=True)
    print("Data reset complete.\n")

    # cognee knowledge graph will be created based on this text
    text = """
    Natural language processing (NLP) is an interdisciplinary
    subfield of computer science and information retrieval.
    """
    
    print("Adding text to cognee:")
    print(text.strip())  
    # Add the text, and make it available for cognify
    await cognee.add(text)
    print("Text added successfully.\n")

    
    print("Running cognify to create knowledge graph...\n")
    print("Cognify process steps:")
    print("1. Classifying the document: Determining the type and category of the input text.")
    print("2. Checking permissions: Ensuring the user has the necessary rights to process the text.")
    print("3. Extracting text chunks: Breaking down the text into sentences or phrases for analysis.")
    print("4. Adding data points: Storing the extracted chunks for processing.")
    print("5. Generating knowledge graph: Extracting entities and relationships to form a knowledge graph.")
    print("6. Summarizing text: Creating concise summaries of the content for quick insights.\n")
    
    # Use LLMs and cognee to create knowledge graph
    await cognee.cognify()
    print("Cognify process complete.\n")

    
    query_text = 'Tell me about NLP'
    print(f"Searching cognee for insights with query: '{query_text}'")
    # Query cognee for insights on the added text
    search_results = await cognee.search(
        SearchType.INSIGHTS, query_text=query_text
    )
    
    print("Search results:")
    # Display results
    for result_text in search_results:
        print(result_text)

    # Example output:
    # ({'id': UUID('bc338a39-64d6-549a-acec-da60846dd90d'), 'updated_at': datetime.datetime(2024, 11, 21, 12, 23, 1, 211808, tzinfo=datetime.timezone.utc), 'name': 'natural language processing', 'description': 'An interdisciplinary subfield of computer science and information retrieval.'}, {'relationship_name': 'is_a_subfield_of', 'source_node_id': UUID('bc338a39-64d6-549a-acec-da60846dd90d'), 'target_node_id': UUID('6218dbab-eb6a-5759-a864-b3419755ffe0'), 'updated_at': datetime.datetime(2024, 11, 21, 12, 23, 15, 473137, tzinfo=datetime.timezone.utc)}, {'id': UUID('6218dbab-eb6a-5759-a864-b3419755ffe0'), 'updated_at': datetime.datetime(2024, 11, 21, 12, 23, 1, 211808, tzinfo=datetime.timezone.utc), 'name': 'computer science', 'description': 'The study of computation and information processing.'})
    # (...)
    #
    # It represents nodes and relationships in the knowledge graph:
    # - The first element is the source node (e.g., 'natural language processing').
    # - The second element is the relationship between nodes (e.g., 'is_a_subfield_of').
    # - The third element is the target node (e.g., 'computer science').

if __name__ == '__main__':
    asyncio.run(main())

When you run this script, you will see step-by-step messages in the console that help you trace the execution flow and understand what the script is doing at each stage. A version of this example is here: examples/python/simple_example.py
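Since each INSIGHTS result is a (source node, relationship, target node) tuple, a small formatter can turn results into readable sentences. This is a sketch that assumes the dict keys shown in the example output ('name' on nodes, 'relationship_name' on edges); describe_insights is a hypothetical helper, not part of cognee.

```python
from typing import Iterable, List, Tuple

# (source node, relationship, target node), as in the example output.
InsightTriple = Tuple[dict, dict, dict]


def describe_insights(results: Iterable[InsightTriple]) -> List[str]:
    """Turn insight triples into short 'source relation target' sentences."""
    sentences = []
    for source, edge, target in results:
        # 'is_a_subfield_of' -> 'is a subfield of'
        relation = edge.get("relationship_name", "related_to").replace("_", " ")
        sentences.append(f"{source.get('name', '?')} {relation} {target.get('name', '?')}")
    return sentences


if __name__ == "__main__":
    sample = [(
        {"name": "natural language processing"},
        {"relationship_name": "is_a_subfield_of"},
        {"name": "computer science"},
    )]
    for sentence in describe_insights(sample):
        print(sentence)  # natural language processing is a subfield of computer science
```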

Understand our architecture

The cognee framework consists of tasks that can be grouped into pipelines. Each task can be an independent piece of business logic that can be tied to other tasks to form a pipeline. These tasks persist data into your memory store, enabling you to search for relevant context from past conversations, documents, or any other data you have stored.
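The task-and-pipeline idea can be illustrated with plain asyncio: each task is an async function whose output feeds the next one. This is a conceptual sketch only; run_pipeline, chunk_text and store_chunks are hypothetical names, and cognee's own task and pipeline classes have their own API.

```python
import asyncio
from typing import Any, Awaitable, Callable, Dict, List

# Hypothetical task type: an async function from the previous output to a new one.
# This is NOT cognee's Task API, just the underlying idea.
TaskFn = Callable[[Any], Awaitable[Any]]


async def run_pipeline(tasks: List[TaskFn], data: Any) -> Any:
    """Run tasks in order, feeding each task's output into the next."""
    for task in tasks:
        data = await task(data)
    return data


async def chunk_text(text: str) -> List[str]:
    # Naive sentence split standing in for a real chunking task.
    return [part.strip() for part in text.split(".") if part.strip()]


async def store_chunks(chunks: List[str]) -> Dict[int, str]:
    # In-memory dict standing in for a persistent memory store.
    return {index: chunk for index, chunk in enumerate(chunks)}


if __name__ == "__main__":
    store = asyncio.run(run_pipeline([chunk_text, store_chunks], "NLP is a field. It uses graphs."))
    print(store)  # {0: 'NLP is a field', 1: 'It uses graphs'}
```

Because every task only depends on its input, tasks can be reordered or reused across pipelines, which is the modularity the paragraph above describes.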

Vector retrieval, Graphs and LLMs

Cognee supports a variety of tools and services for different operations:

Modular: Cognee is modular by nature, using tasks grouped into pipelines.
Local Setup: By default, cognee runs locally with LanceDB as the vector store, NetworkX as the graph store, and OpenAI as the LLM provider.
Vector Stores: Cognee supports LanceDB, Qdrant, PGVector, Weaviate and Milvus for vector storage.
Language Models (LLMs): In addition to OpenAI, you can use Anyscale or Ollama as your LLM provider.
Graph Stores: In addition to NetworkX, Neo4j is also supported for graph storage.
User management: Create individual user graphs and manage permissions.
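One way vector retrieval and graphs complement each other: find the nodes whose embeddings are closest to the query, then pull in their graph neighbours as extra context. The toy below uses hand-written 2-D vectors and plain dicts in place of a real vector store (such as LanceDB) and graph store (such as NetworkX); the retrieve helper is hypothetical and only illustrates the idea, not how cognee implements search.

```python
import math
from typing import Dict, List, Sequence


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def retrieve(
    query_vec: Sequence[float],
    node_vectors: Dict[str, Sequence[float]],
    edges: Dict[str, List[str]],
    top_k: int = 1,
) -> List[str]:
    """Vector search for the top_k closest nodes, then add their graph neighbours."""
    ranked = sorted(node_vectors, key=lambda n: cosine(query_vec, node_vectors[n]), reverse=True)
    hits = ranked[:top_k]
    context = list(hits)
    for node in hits:
        for neighbour in edges.get(node, []):
            if neighbour not in context:
                context.append(neighbour)
    return context


if __name__ == "__main__":
    vectors = {
        "natural language processing": [1.0, 0.0],
        "computer science": [0.7, 0.7],
        "cooking": [0.0, 1.0],
    }
    edges = {"natural language processing": ["computer science"]}
    print(retrieve([1.0, 0.1], vectors, edges))
    # ['natural language processing', 'computer science']
```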

Demo

Check out our demo notebook.

Get Started

Install Server

Please see the cognee Quick Start Guide for important configuration information.

docker compose up

Install SDK

Please see the cognee Development Guide for important beta information and usage instructions.

pip install cognee


Vector & Graph Databases Implementation State

Name	Type	Current state
Qdrant	Vector	Stable ✅
Weaviate	Vector	Stable ✅
LanceDB	Vector	Stable ✅
Neo4j	Graph	Stable ✅
NetworkX	Graph	Stable ✅
FalkorDB	Vector/Graph	Unstable ❌
PGVector	Vector	Stable ✅
Milvus	Vector	Stable ✅