cognee

Make data processing for LLMs easy

Open-source framework for creating knowledge graphs and data models for LLMs.

cognee makes it easy to reliably enrich data for Large Language Models (LLMs) like GPT-3.5, GPT-4, GPT-4-Vision, including in the future the open source models like Mistral/Mixtral from Together, Anyscale, Ollama, and llama-cpp-python.

By leveraging various tools like graph databases, function calling, tool calling and Pydantic; cognee stands out for its aim to emulate human memory for LLM apps and frameworks.

We leverage Neo4j to do the heavy lifting and dlt to load the data, and we've built a simple, easy-to-use API on top of it by helping you manage your context

Getting Started

Setup

Create .env file in your project root directory in order to store environment variables such as API keys.

Note: Don't push .env file to git repo as it will expose those keys to others.

If cognee is installed with Weaviate as a vector database provider, add Weaviate environment variables:

WEAVIATE_URL = "YOUR_WEAVIATE_URL"
WEAVIATE_API_KEY = "YOUR_WEAVIATE_API_KEY"

Otherwise if cognee is installed with a default (Qdrant) vector database provider, add Qdrant environment variables:

QDRANT_URL = "YOUR_QDRANT_URL"
QDRANT_API_KEY = "YOUR_QDRANT_API_KEY"

Add OpenAI API Key environment variable:

OPENAI_API_KEY = "YOUR_OPENAI_API_KEY"

Cognee stores data and system files inside the library directory, which is lost if the library folder is removed. You can change the directories where cognee will store data and system files by calling config functions:

import cognee

cognee.config.system_root_directory(absolute_path_to_directory)

cognee.config.data_root_directory(absolute_path_to_directory)

Without .env file

import os

os.environ["WEAVIATE_URL"] = "YOUR_WEAVIATE_URL"
os.environ["WEAVIATE_API_KEY"] = "YOUR_WEAVIATE_API_KEY"

os.environ["OPENAI_API_KEY"] = "YOUR_OPENAI_API_KEY"

Run

Add a new piece of information to the storage:

import cognee

cognee.add("some_text", dataset_name)

cognee.add([
    "some_text_1",
    "some_text_2",
    "some_text_3",
    ...
])

cognee.add("file://{absolute_path_to_file}", dataset_name)

cognee.add(
    [
        "file://{absolute_path_to_file_1}",
        "file://{absolute_path_to_file_2}",
        "file://{absolute_path_to_file_3}",
        ...
    ],
    dataset_name
)

cognee.add("data://{absolute_path_to_directory}", dataset_name)

# This is useful if you have a directory with files organized in subdirectories.
# You can target which directory to add by providing dataset_name.
# Example:
#            root
#           /    \
#      reports  bills
#     /       \
#   2024     2023
#
# cognee.add("data://{absolute_path_to_root}", "reports.2024")
# This will add just directory 2024 under reports.

Use LLMs and cognee to create graphs:

cognee.cognify(dataset_name)

Render the graph with our util function:

from cognee.utils import render_graph

graph_url = await render_graph(graph)

print(graph_url)

Query the graph for a piece of information:

search_results = cognee.search('SIMILARITY', "query_search")

print(search_results)

Why use cognee?

The question of using cognee is fundamentally a question of why to structure data inputs and outputs for your llm workflows.

Cost effective — cognee extends the capabilities of your LLMs without the need for expensive data processing tools.
Self contained — cognee runs as a library and is simple to use
Interpretable — Navigate graphs instead of embeddings to understand your data.
User Guided cognee lets you control your input and provide your own Pydantic data models

License

This project is licensed under the terms of the Apache License 2.0.

4 KiB Raw Blame History