```
fix(embeddings): handle empty API key in LiteLLMEmbeddingEngine

- Add conditional check for empty API key to prevent authentication errors
- Set default API key to "EMPTY" when no valid key is provided
- This ensures proper fallback behavior when API key is not configured
```
## Description
This PR fixes an issue where the `LiteLLMEmbeddingEngine` throws an authentication error when the `EMBEDDING_API_KEY` environment variable is empty or not set. The error message indicated `"api_key client option must be set either by passing api_key to the client or by setting the OPENAI_API_KEY environment variable"`.
Error log:

`2025-12-23T11:36:58.220908 [error ] Error embedding text: litellm.AuthenticationError: AuthenticationError: OpenAIException - The api_key client option must be set either by passing api_key to the client or by setting the OPENAI_API_KEY environment variable [LiteLLMEmbeddingEngine]`
**Root Cause**: When the embedding engine is initialized with an empty string as the `api_key` parameter, the underlying LiteLLM client does not treat it as "no key provided"; it forwards the empty string with API requests, which triggers the authentication failure.
**Solution**: Added a conditional check where the `LiteLLMEmbeddingEngine` instance is created. If the `EMBEDDING_API_KEY` read from configuration is empty (`None` or an empty string), the `api_key` parameter passed to the engine constructor is explicitly set to the non-empty placeholder string `"EMPTY"`. This aligns with LiteLLM's handling of optional authentication and prevents exceptions in scenarios where keys are not required or are obtained from other sources.
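A minimal sketch of the guard, assuming a `config` object and these constructor parameter names (illustrative only; the exact names in the codebase may differ):

```python
# Hypothetical sketch of the fix; real variable and parameter names may differ.
api_key = config.embedding_api_key  # None or "" when no key is configured

# An empty string would be forwarded verbatim to the OpenAI client and
# rejected, so substitute a non-empty placeholder before building the engine.
if not api_key:
    api_key = "EMPTY"

embedding_engine = LiteLLMEmbeddingEngine(
    model=config.embedding_model,
    endpoint=config.embedding_endpoint,
    api_key=api_key,
)
```

Note that `if not api_key` covers both `None` and the empty string, so either form of missing configuration falls back to the placeholder.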
**How to Reproduce**:
Configure the application with the following settings (as shown in the error log); a standalone reproduction sketch follows the list:

- `EMBEDDING_PROVIDER="custom"`
- `EMBEDDING_MODEL="openai/Qwen/Qwen3-Embedding-xxx"`
- `EMBEDDING_ENDPOINT="xxxxx"`
- `EMBEDDING_API_VERSION=""`
- `EMBEDDING_DIMENSIONS=1024`
- `EMBEDDING_MAX_TOKENS=16384`
- `EMBEDDING_BATCH_SIZE=10`
- `EMBEDDING_API_KEY=""` (if no embedding key is provided, the key set for `LLM_API_KEY` is used)
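The failure can also be reproduced outside Cognee, directly through LiteLLM. This is an illustrative sketch, not code from this PR; the endpoint is a placeholder:

```python
import litellm

# LiteLLM forwards an empty api_key verbatim to the OpenAI client, which
# raises litellm.AuthenticationError. Passing the placeholder "EMPTY"
# instead allows requests to endpoints that do not enforce authentication.
litellm.embedding(
    model="openai/Qwen/Qwen3-Embedding-xxx",
    input=["hello world"],
    api_base="http://localhost:8000/v1",  # placeholder OpenAI-compatible endpoint
    api_key="",  # reproduces the AuthenticationError above
)
```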
## Acceptance Criteria
<!--
* Key requirements to the new feature or modification;
* Proof that the changes work and meet the requirements;
* Include instructions on how to verify the changes. Describe how to test it locally;
* Proof that it's sufficiently tested.
-->
## Type of Change
<!-- Please check the relevant option -->
- [x] Bug fix (non-breaking change that fixes an issue)
- [ ] New feature (non-breaking change that adds functionality)
- [ ] Breaking change (fix or feature that would cause existing functionality to change)
- [ ] Documentation update
- [ ] Code refactoring
- [ ] Performance improvement
- [ ] Other (please specify):
## Screenshots/Videos (if applicable)
<!-- Add screenshots or videos to help explain your changes -->
## Pre-submission Checklist
<!-- Please check all boxes that apply before submitting your PR -->
- [x] I have tested my changes thoroughly before submitting this PR
- [x] This PR contains minimal changes necessary to address the issue/feature
- [ ] My code follows the project's coding standards and style guidelines
- [ ] I have added tests that prove my fix is effective or that my feature works
- [ ] I have added necessary documentation (if applicable)
- [ ] All new and existing tests pass
- [x] I have searched existing PRs to ensure this change hasn't been submitted already
- [ ] I have linked any relevant issues in the description
- [ ] My commits have clear and descriptive messages
## DCO Affirmation
I affirm that all code in every commit of this pull request conforms to the terms of the Topoteretes Developer Certificate of Origin.
## Summary by CodeRabbit
* **Bug Fixes**
* Improved API key validation for the embedding service to properly handle blank or missing API keys, ensuring more reliable embedding generation and preventing potential service errors.
# Cognee - Accurate and Persistent AI Memory

Demo · Docs · Learn More · Join Discord · Join r/AIMemory · Community Plugins & Add-ons
Use your data to build personalized and dynamic memory for AI Agents. Cognee lets you replace RAG with scalable and modular ECL (Extract, Cognify, Load) pipelines.
🌐 Available Languages: Deutsch | Español | Français | 日本語 | 한국어 | Português | Русский | 中文
## About Cognee
Cognee is an open-source tool and platform that transforms your raw data into persistent and dynamic AI memory for Agents. It combines vector search with graph databases to make your documents both searchable by meaning and connected by relationships.
You can use Cognee in two ways:
- Self-host Cognee Open Source, which stores all data locally by default.
- Connect to Cognee Cloud, and get the same OSS stack on managed infrastructure for easier development and productionization.
Cognee Open Source (self-hosted):
- Interconnects any type of data — including past conversations, files, images, and audio transcriptions
- Replaces traditional RAG systems with a unified memory layer built on graphs and vectors
- Reduces developer effort and infrastructure cost while improving quality and precision
- Provides Pythonic data pipelines for ingestion from 30+ data sources
- Offers high customizability through user-defined tasks, modular pipelines, and built-in search endpoints
Cognee Cloud (managed):
- Hosted web UI dashboard
- Automatic version updates
- Resource usage analytics
- GDPR compliant, enterprise-grade security
## Basic Usage & Feature Guide
To learn more, check out this short, end-to-end Colab walkthrough of Cognee's core features.
## Quickstart
Let’s try Cognee in just a few lines of code. For detailed setup and configuration, see the Cognee Docs.
### Prerequisites
- Python 3.10 to 3.13
### Step 1: Install Cognee

You can install Cognee with pip, poetry, uv, or your preferred Python package manager.

```bash
uv pip install cognee
```
### Step 2: Configure the LLM

```python
import os

os.environ["LLM_API_KEY"] = "YOUR_OPENAI_API_KEY"
```
Alternatively, create a .env file using our template.
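For example, a minimal `.env` could contain just the LLM key (the variable name follows the template; the value is a placeholder):

```
LLM_API_KEY="your-openai-api-key"
```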
To integrate other LLM providers, see our LLM Provider Documentation.
### Step 3: Run the Pipeline

Cognee takes your documents, generates a knowledge graph from them, and then answers queries using the relationships in that graph.

Now, run a minimal pipeline:
```python
import cognee
import asyncio


async def main():
    # Add text to cognee
    await cognee.add("Cognee turns documents into AI memory.")

    # Generate the knowledge graph
    await cognee.cognify()

    # Add memory algorithms to the graph
    await cognee.memify()

    # Query the knowledge graph
    results = await cognee.search("What does Cognee do?")

    # Display the results
    for result in results:
        print(result)


if __name__ == '__main__':
    asyncio.run(main())
```
As you can see, the output is generated from the document we previously stored in Cognee:

```
Cognee turns documents into AI memory.
```
### Use the Cognee CLI

As an alternative, you can get started with these essential commands:

```bash
cognee-cli add "Cognee turns documents into AI memory."
cognee-cli cognify
cognee-cli search "What does Cognee do?"
cognee-cli delete --all
```
To open the local UI, run:

```bash
cognee-cli -ui
```
## Demos & Examples

See Cognee in action:

- Persistent Agent Memory
- Cognee Memory for LangGraph Agents
- Simple GraphRAG
- Cognee with Ollama
## Community & Support

### Contributing

We welcome contributions from the community! Your input helps make Cognee better for everyone. See CONTRIBUTING.md to get started.

### Code of Conduct

We're committed to fostering an inclusive and respectful community. Read our Code of Conduct for guidelines.

## Research & Citation

We recently published a research paper on optimizing knowledge graphs for LLM reasoning:
```bibtex
@misc{markovic2025optimizinginterfaceknowledgegraphs,
      title={Optimizing the Interface Between Knowledge Graphs and LLMs for Complex Reasoning},
      author={Vasilije Markovic and Lazar Obradovic and Laszlo Hajdu and Jovan Pavlovic},
      year={2025},
      eprint={2505.24478},
      archivePrefix={arXiv},
      primaryClass={cs.AI},
      url={https://arxiv.org/abs/2505.24478},
}
```