cognee

lxobr daf7d4ae26 feat: COG-1526 instance filter in eval (#627 )		2025-03-13 14:23:13 +01:00
.data	Remove files	2024-12-11 15:34:29 +01:00
.dlt	fix: remove obsolete code	2024-03-13 10:19:03 +01:00
.github	feat: user authorization [COG-1189] (#593 )	2025-03-13 13:33:42 +01:00
alembic	Fix linter issues	2025-01-05 19:48:35 +01:00
assets	fix: simplify installation in readme (#577 )	2025-02-24 20:36:22 +01:00
bin	chore: enable all origins in cors settings	2024-09-25 14:34:14 +02:00
cognee	feat: COG-1526 instance filter in eval (#627 )	2025-03-13 14:23:13 +01:00
cognee-frontend	Switch to gpt-4o-mini by default (#233 )	2024-11-18 17:38:54 +01:00
cognee-mcp	fix: update mcp dependency	2025-03-11 22:17:07 +01:00
evals	Feature/cog 1312 integrating evaluation framework into dreamify (#562 )	2025-03-03 19:55:47 +01:00
examples	feat: productionizing ontology solution [COG-1401] (#623 )	2025-03-12 14:31:19 +01:00
helm	feat: added helm clean push (#606 )	2025-03-08 08:51:57 -08:00
licenses	Delete licenses/DCO.md	2024-12-13 11:29:17 +01:00
notebooks	feat: Eliminate the use of max_chunk_tokens and use a unified max_chunk_size instead [cog-1381] (#626 )	2025-03-12 14:03:41 +01:00
profiling	Fix linter issues	2025-01-05 19:48:35 +01:00
tests	ruff format	2025-01-05 19:09:08 +01:00
tools	ruff format	2025-01-05 19:09:08 +01:00
.dockerignore	chore: add vanilla docker config	2024-06-23 00:36:34 +02:00
.env.template	Comment out the postgres configuration from .env.template (#502 )	2025-02-06 21:35:40 +01:00
.gitignore	fix: custom model pipeline (#508 )	2025-02-08 02:00:15 +01:00
.pre-commit-config.yaml	Feat: log pipeline status and pass it through pipeline [COG-1214] (#501 )	2025-02-11 16:41:40 +01:00
.pylintrc	fix: enable sqlalchemy adapter	2024-08-04 22:23:28 +02:00
.python-version	chore: update python version to 3.11	2024-03-29 14:10:20 +01:00
alembic.ini	feat: migrate search to tasks (#144 )	2024-10-07 14:41:35 +02:00
CODE_OF_CONDUCT.md	Update CODE_OF_CONDUCT.md	2024-12-13 11:30:16 +01:00
cognee-gui.py	Cognee gui (#554 )	2025-02-19 03:06:50 +01:00
CONTRIBUTING.md	Update CONTRIBUTING.md	2025-03-11 03:07:25 +01:00
DCO.md	Create DCO.md	2024-12-13 11:28:44 +01:00
docker-compose.yml	Comment out the postgres configuration from docker-compose.yml (#504 )	2025-02-10 13:45:45 +01:00
Dockerfile	chore: Be explicit on extras to install in Docker (#598 )	2025-03-04 17:18:57 +01:00
Dockerfile_modal	feat: implements modal wrapper + dockerfile for modal containers	2025-01-23 18:06:09 +01:00
entrypoint.sh	feat: codegraph improvements and new CODE search [COG-1351] (#581 )	2025-02-26 20:15:02 +01:00
LICENSE	Update LICENSE	2024-03-30 11:57:07 +01:00
modal_deployment.py	refactor: Refactor search so graph completion is used by default (#505 )	2025-02-07 17:16:34 +01:00
mypy.ini	Improve processing, update networkx client, and Neo4j, and dspy (#69 )	2024-04-20 19:05:40 +02:00
NOTICE.md	add NOTICE file, reference CoC in contribution guidelines, add licenses folder for external licenses	2024-12-06 13:27:55 +00:00
poetry.lock	feat: productionizing ontology solution [COG-1401] (#623 )	2025-03-12 14:31:19 +01:00
pyproject.toml	feat: productionizing ontology solution [COG-1401] (#623 )	2025-03-12 14:31:19 +01:00
README.md	Small clarifications. (#624 )	2025-03-10 16:07:36 +01:00

README.md

cognee - memory layer for AI apps and Agents

Demo · Learn more · Join Discord

AI Agent responses you can rely on.

Build dynamic Agent memory using scalable, modular ECL (Extract, Cognify, Load) pipelines.
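To make the Extract, Cognify, Load idea concrete, here is a toy pipeline in plain Python. It only illustrates the three stages under simplifying assumptions: a naive "X is a Y" regex stands in for the LLM-based cognify step, and an in-memory dict stands in for the graph store. It is not cognee's implementation, and every function name in it is hypothetical.

```python
import re

Triple = tuple[str, str, str]


def extract(raw: str) -> list[str]:
    """Extract: split raw text into sentence-sized chunks."""
    return [s.strip() for s in re.split(r"[.!?]", raw) if s.strip()]


def cognify(chunks: list[str]) -> list[Triple]:
    """Cognify (toy stand-in): turn 'X is a Y' sentences into (X, 'is_a', Y) triples.

    cognee uses LLMs for this step; a regex is used here only for illustration.
    """
    triples: list[Triple] = []
    for chunk in chunks:
        match = re.match(r"(.+?) is an? (.+)", chunk)
        if match:
            triples.append((match.group(1).strip(), "is_a", match.group(2).strip()))
    return triples


def load(triples: list[Triple]) -> dict[str, list[tuple[str, str]]]:
    """Load: store triples in an in-memory adjacency list (stand-in for a graph store)."""
    graph: dict[str, list[tuple[str, str]]] = {}
    for source, relation, target in triples:
        graph.setdefault(source, []).append((relation, target))
    return graph


graph = load(cognify(extract("NLP is a subfield of computer science. Python is a language.")))
print(graph)
```

Keeping each stage a plain function means the stages can be swapped or chained independently, which is the property the ECL framing emphasizes.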

Features

Interconnect and retrieve your past conversations, documents, images, and audio transcriptions.
Reduce hallucinations, developer effort, and cost.
Load data into graph and vector databases using only Pydantic.
Manipulate your data while ingesting it from 30+ data sources.

Get Started

Get started quickly with a Google Colab notebook or a starter repo.

Contributing

Your contributions are at the core of making this a true open source project. Any contributions you make are greatly appreciated. See CONTRIBUTING.md for more information.

📦 Installation

You can install Cognee with pip, Poetry, uv, or any other Python package manager.

With pip

pip install cognee
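After installing, you can confirm which version ended up in your active environment. The sketch below uses only the standard library's importlib.metadata; the helper name installed_version is hypothetical and not part of cognee.

```python
from __future__ import annotations

from importlib.metadata import PackageNotFoundError, version


def installed_version(package: str) -> str | None:
    """Return the installed version of `package`, or None if it is not installed."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


print("cognee version:", installed_version("cognee") or "not installed")
```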

💻 Basic Usage

Setup

import os
os.environ["LLM_API_KEY"] = "YOUR_OPENAI_API_KEY"

You can also set the variables by creating a .env file based on our template. To use a different LLM provider, see our documentation for more information.
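If you want a clear error before calling cognee.add or cognee.cognify, a small guard can check that LLM_API_KEY is set. This is a hypothetical helper, not part of cognee's API; it only checks that the value is non-empty and is not one of the placeholder strings used in this README, not that the key is valid for your provider.

```python
from __future__ import annotations

import os
from collections.abc import Mapping

# Placeholder strings used in this README's examples; treat them as "not set".
_PLACEHOLDERS = {"YOUR_OPENAI_API_KEY", "YOUR OPENAI_API_KEY"}


def require_llm_api_key(env: Mapping[str, str] | None = None) -> str:
    """Return LLM_API_KEY from `env` (default: os.environ) or raise a clear error."""
    source = os.environ if env is None else env
    key = source.get("LLM_API_KEY", "").strip()
    if not key or key in _PLACEHOLDERS:
        raise RuntimeError(
            "LLM_API_KEY is missing or still a placeholder; "
            "set it in os.environ or in your .env file."
        )
    return key
```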

Simple example

Add LLM_API_KEY to .env using the command below (note that > overwrites an existing .env file; use >> to append instead).

echo "LLM_API_KEY=YOUR_OPENAI_API_KEY" > .env
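A .env file is just KEY=VALUE lines whose values end up in the process environment. The hand-rolled parser below only illustrates that format; parse_env_text and load_env_file are hypothetical names, not cognee functions, they handle only simple lines (no multi-line values or variable expansion), and in practice a library such as python-dotenv does this more robustly.

```python
from __future__ import annotations

import os
from pathlib import Path


def parse_env_text(text: str) -> dict[str, str]:
    """Parse simple KEY=VALUE lines; skip blanks and '#' comments, strip matching quotes."""
    parsed: dict[str, str] = {}
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = (part.strip() for part in line.split("=", 1))
        # Remove one pair of matching surrounding quotes, e.g. "sk-abc" -> sk-abc.
        if len(value) >= 2 and value[0] == value[-1] and value[0] in "\"'":
            value = value[1:-1]
        parsed[key] = value
    return parsed


def load_env_file(path: str | Path = ".env", override: bool = False) -> dict[str, str]:
    """Copy pairs from a .env file into os.environ, keeping existing values unless override=True."""
    parsed = parse_env_text(Path(path).read_text(encoding="utf-8"))
    for key, value in parsed.items():
        if override or key not in os.environ:
            os.environ[key] = value
    return parsed
```

Not overriding existing variables by default mirrors the usual convention that values exported in the shell take precedence over the file.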

The available environment variables are listed in the repository's .env.template file. Unless you configure other components (this example does not), cognee uses SQLite (relational database), LanceDB (vector database), and NetworkX (graph store) as its defaults.
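The fallback-to-defaults behavior can be pictured as a lookup that prefers an explicitly set variable and otherwise uses the built-in choice. In this sketch the variable names DB_PROVIDER, VECTOR_DB_PROVIDER, and GRAPH_DATABASE_PROVIDER are our assumption (verify them against .env.template), and resolve_providers is a hypothetical helper, not how cognee is implemented internally.

```python
from __future__ import annotations

from collections.abc import Mapping

# ASSUMED variable names; check the repository's .env.template for the real ones.
DEFAULT_PROVIDERS = {
    "DB_PROVIDER": "sqlite",                # relational database
    "VECTOR_DB_PROVIDER": "lancedb",        # vector database
    "GRAPH_DATABASE_PROVIDER": "networkx",  # graph store
}


def resolve_providers(env: Mapping[str, str]) -> dict[str, str]:
    """Use each variable from `env` when set and non-blank; otherwise fall back to the default."""
    return {
        name: env.get(name, "").strip().lower() or default
        for name, default in DEFAULT_PROVIDERS.items()
    }
```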

This script will run the default pipeline:

import cognee
import asyncio
from cognee.modules.search.types import SearchType

async def main():
    # Create a clean slate for cognee -- reset data and system state
    await cognee.prune.prune_data()
    await cognee.prune.prune_system(metadata=True)
    # cognee knowledge graph will be created based on this text
    text = """
    Natural language processing (NLP) is an interdisciplinary
    subfield of computer science and information retrieval.
    """

    print("Adding text to cognee:")
    print(text.strip())
    # Add the text, and make it available for cognify
    await cognee.add(text)

    # Use LLMs and cognee to create knowledge graph
    await cognee.cognify()
    print("Cognify process complete.\n")

    query_text = "Tell me about NLP"
    print(f"Searching cognee for insights with query: '{query_text}'")
    # Query cognee for insights on the added text
    search_results = await cognee.search(
        query_text=query_text, query_type=SearchType.INSIGHTS
    )

    print("Search results:")
    # Display results
    for result_text in search_results:
        print(result_text)

    # Example output:
    # ({'id': UUID('bc338a39-64d6-549a-acec-da60846dd90d'), 'updated_at': datetime.datetime(2024, 11, 21, 12, 23, 1, 211808, tzinfo=datetime.timezone.utc), 'name': 'natural language processing', 'description': 'An interdisciplinary subfield of computer science and information retrieval.'}, {'relationship_name': 'is_a_subfield_of', 'source_node_id': UUID('bc338a39-64d6-549a-acec-da60846dd90d'), 'target_node_id': UUID('6218dbab-eb6a-5759-a864-b3419755ffe0'), 'updated_at': datetime.datetime(2024, 11, 21, 12, 23, 15, 473137, tzinfo=datetime.timezone.utc)}, {'id': UUID('6218dbab-eb6a-5759-a864-b3419755ffe0'), 'updated_at': datetime.datetime(2024, 11, 21, 12, 23, 1, 211808, tzinfo=datetime.timezone.utc), 'name': 'computer science', 'description': 'The study of computation and information processing.'})
    # (...)
    #
    # Each result is a tuple describing one relationship in the knowledge graph:
    # - The first element is the source node (e.g., 'natural language processing').
    # - The second element is the relationship between the nodes (e.g., 'is_a_subfield_of').
    # - The third element is the target node (e.g., 'computer science').

if __name__ == '__main__':
    asyncio.run(main())
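Each INSIGHTS result in the example output is a (source node, relationship, target node) tuple of dicts. For a more readable view, a small formatter can flatten each tuple into one line. It assumes only the keys visible in that example output ('name' on nodes, 'relationship_name' on the edge) and is a hypothetical helper, not part of cognee.

```python
from typing import Any


def format_insight(result: tuple[dict[str, Any], dict[str, Any], dict[str, Any]]) -> str:
    """Render a (source, relationship, target) triple as 'source --relationship--> target'."""
    source, edge, target = result
    return (
        f"{source.get('name', '?')} "
        f"--{edge.get('relationship_name', '?')}--> "
        f"{target.get('name', '?')}"
    )


example = (
    {"name": "natural language processing"},
    {"relationship_name": "is_a_subfield_of"},
    {"name": "computer science"},
)
print(format_insight(example))
# natural language processing --is_a_subfield_of--> computer science
```

Inside main(), you could call print(format_insight(result_text)) in the results loop instead of printing the raw tuple.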