Merge branch 'dev' of github.com:topoteretes/cognee into dev

vasilije 2025-03-11 10:47:34 -07:00
commit 816d99309d
32 changed files with 1264 additions and 132 deletions


@@ -1,97 +1,128 @@
# 🚀 How to Contribute to **cognee**
Thank you for investing time in contributing to our project! Here's a guide to get you started.
# 🎉 Welcome to **cognee**!
We're excited that you're interested in contributing to our project!
We want to ensure that every user and contributor feels welcome, included, and supported as they participate in the cognee community.
This guide will help you get started and ensure your contributions can be efficiently integrated into the project.
## 1. 🚀 Getting Started
### 🍴 Fork the Repository
To start your journey, you'll need your very own copy of **cognee**. Think of it as your own innovation lab. 🧪
1. Navigate to the [**cognee**](https://github.com/topoteretes/cognee) repository on GitHub.
2. In the upper-right corner, click the **'Fork'** button.
### 🚀 Clone the Repository
Next, let's bring your newly forked repository to your local machine.
## 🌟 Quick Links
- [Code of Conduct](CODE_OF_CONDUCT.md)
- [Discord Community](https://discord.gg/bcy8xFAtfd)
- [Issue Tracker](https://github.com/topoteretes/cognee/issues)
## 1. 🚀 Ways to Contribute
You can contribute to **cognee** in many ways:
- 📝 Submitting bug reports or feature requests
- 💡 Improving documentation
- 🔍 Reviewing pull requests
- 🛠️ Contributing code or tests
- 🌐 Helping other users
## 📫 Get in Touch
There are several ways to connect with the **cognee** team and community:
### GitHub Collaboration
- [Open an issue](https://github.com/topoteretes/cognee/issues) for bug reports, feature requests, or discussions
- Submit pull requests to contribute code or documentation
- Join ongoing discussions in existing issues and PRs
### Community Channels
- Join our [Discord community](https://discord.gg/bcy8xFAtfd) for real-time discussions
- Participate in community events and discussions
- Get help from other community members
### Direct Contact
- Email: vasilije@cognee.ai
- For business inquiries or sensitive matters, please reach out via email
- For general questions, prefer public channels like GitHub issues or Discord
We aim to respond to all communications within 2 business days. For faster responses, consider using our Discord channel where the whole community can help!
## Issue Labels
To help you find the most appropriate issues to work on, we use the following labels:
- `good first issue` - Perfect for newcomers to the project
- `bug` - Something isn't working as expected
- `documentation` - Improvements or additions to documentation
- `enhancement` - New features or improvements
- `help wanted` - Extra attention or assistance needed
- `question` - Further information is requested
- `wontfix` - This will not be worked on
Looking for a place to start? Try filtering for [good first issues](https://github.com/topoteretes/cognee/labels/good%20first%20issue)!
## 2. 🛠️ Development Setup
### Fork and Clone
1. Fork the [**cognee**](https://github.com/topoteretes/cognee) repository
2. Clone your fork:
```shell
git clone https://github.com/<your-github-username>/cognee.git
cd cognee
```
## 2. 🛠️ Making Changes
### 🌟 Create a Branch
Get ready to channel your creativity. Begin by creating a new branch for your incredible features. 🧞‍♂️
### Create a Branch
Create a new branch for your work:
```shell
git checkout -b feature/your-feature-name
```
### ✏️ Make Your Changes
Now's your chance to shine! Dive in and make your contributions. 🌠
## 3. 🚀 Submitting Changes
After making your changes, follow these steps:
### ✅ Run the Tests
Ensure your changes do not break the existing codebase:
## 3. 🎯 Making Changes
1. **Code Style**: Follow the project's coding standards
2. **Documentation**: Update relevant documentation
3. **Tests**: Add tests for new features
4. **Commits**: Write clear commit messages
### Running Tests
```shell
python cognee/cognee/tests/test_library.py
```
### 🚢 Push Your Feature Branch
## 4. 📤 Submitting Changes
1. Push your changes:
```shell
# Add your changes to the staging area:
git add .
# Commit changes with an adequate description:
git commit -m "Describe your changes here"
# Or commit with a sign-off in one step (required by our DCO, see below):
git commit -s -m "Description of your changes"
# Push your feature branch to your forked repository:
git push origin feature/your-feature-name
```
2. Create a Pull Request:
- Go to the [**cognee** repository](https://github.com/topoteretes/cognee)
- Click "Compare & Pull Request"
- Fill in the PR template with details about your changes
### 🚀 Create a Pull Request
You're on the verge of completion! It's time to showcase your hard work. 🌐
1. Go to [**cognee**](https://github.com/topoteretes/cognee) on GitHub.
2. Hit the **"Compare & Pull Request"** button.
3. Select the base branch (main) and the compare branch (the one with your features).
4. Craft a **compelling title** and provide a **detailed description** of your contributions. 🎩
## 4. 🔍 Review and Approval
The project maintainers will review your work, possibly suggest improvements, or request further details. Once you receive approval, your contributions will become part of **cognee**!
## 5. 📜 Developer Certificate of Origin (DCO)
All contributions must be signed-off to indicate agreement with our DCO:
```shell
git config alias.cos "commit -s" # Create alias for signed commits
```
When your PR is ready, please include:
> "I affirm that all code in every commit of this pull request conforms to the terms of the Topoteretes Developer Certificate of Origin"
## 5. Developer Certificate of Origin
All contributions to the topoteretes codebase must be signed-off to indicate that you have read and agreed to the Developer Certificate of Origin (DCO), which you can find in the root directory under the name DCO. To sign off, add -s to every commit you make; to do this easily, you can create a git alias from the command line, for example:
$ git config alias.cos "commit -s"
This allows you to write `git cos`, which automatically signs off your commit. By signing a commit you are agreeing to the DCO, and you accept that you will be banned from the topoteretes GitHub organisation and Discord server if you violate it.
When a commit is ready to be merged, please use the following template to agree to our Developer Certificate of Origin:
'I affirm that all code in every commit of this pull request conforms to the terms of the Topoteretes Developer Certificate of Origin'
We consider the following to be violations of the DCO:
- Signing the DCO with a fake name or pseudonym; if you are registered on GitHub or another platform under a fake name, you will not be able to contribute to topoteretes before updating your name.
- Submitting a contribution that you did not have the right to submit, whether due to licensing, copyright, or any other restrictions.
## 6. 🤝 Community Guidelines
- Be respectful and inclusive
- Help others learn and grow
- Follow our [Code of Conduct](CODE_OF_CONDUCT.md)
- Provide constructive feedback
- Ask questions when unsure
## 6. 📜 Code of Conduct
Ensure you adhere to the project's [Code of Conduct](https://github.com/topoteretes/cognee/blob/main/CODE_OF_CONDUCT.md) throughout your participation.
## 7. 📫 Getting Help
- Open an [issue](https://github.com/topoteretes/cognee/issues)
- Join our Discord community
- Check existing documentation
## 7. 📫 Contact
If you need assistance or simply wish to connect, we're here for you. Contact us by filing an issue on the GitHub repository or by messaging us on our Discord server.
Thanks for helping to evolve **cognee**!
Thank you for contributing to **cognee**! 🌟


@@ -8,11 +8,11 @@
cognee - memory layer for AI apps and Agents
<p align="center">
<a href="https://www.youtube.com/watch?v=1bezuvLwJmw&t=2s">Demo</a>
.
<a href="https://cognee.ai">Learn more</a>
·
<a href="https://discord.gg/NQPKmU5CCg">Join Discord</a>
·
<a href="https://www.youtube.com/watch?v=1bezuvLwJmw&t=2s">Demo</a>
</p>
@@ -89,7 +89,7 @@ Add LLM_API_KEY to .env using the command below.
```
echo "LLM_API_KEY=YOUR_OPENAI_API_KEY" > .env
```
You can see the available env variables in the repository `.env.template` file. Unless you specify otherwise, as in this example, SQLite (relational database), LanceDB (vector database), and NetworkX (graph store) will be used as the default components.
This script will run the default pipeline:


@@ -1,34 +1,35 @@
# Use a Python image with uv pre-installed
FROM ghcr.io/astral-sh/uv:python3.12-bookworm-slim AS uv
# Set build argument
ARG DEBUG
# Set environment variable based on the build argument
ENV DEBUG=${DEBUG}
ENV PIP_NO_CACHE_DIR=true
# Install the project into `/app`
WORKDIR /app
# Enable bytecode compilation
ENV UV_COMPILE_BYTECODE=1
# ENV UV_COMPILE_BYTECODE=1
# Copy from the cache instead of linking since it's a mounted volume
ENV UV_LINK_MODE=copy
RUN apt-get update && apt-get install -y \
gcc \
libpq-dev
# Install the project's dependencies using the lockfile and settings
RUN --mount=type=cache,target=/root/.cache/uv \
--mount=type=bind,source=uv.lock,target=uv.lock \
--mount=type=bind,source=pyproject.toml,target=pyproject.toml \
uv sync --frozen --no-install-project --no-dev --no-editable
RUN apt-get install -y \
gcc \
libpq-dev
# Then, add the rest of the project source code and install it
# Installing separately from its dependencies allows optimal layer caching
ADD . /app
RUN --mount=type=cache,target=/root/.cache/uv \
uv sync --frozen --no-dev --no-editable
COPY . /app
FROM python:3.12-slim-bookworm
RUN uv sync --reinstall
WORKDIR /app
COPY --from=uv /root/.local /root/.local
COPY --from=uv --chown=app:app /app /app
# Place executables in the environment at the front of the path
ENV PATH="/app/:/app/.venv/bin:$PATH"
ENV PATH="/app/.venv/bin:$PATH"
ENTRYPOINT ["cognee"]


@@ -82,5 +82,5 @@ http://localhost:5173?timeout=120000
To apply new changes while developing cognee you need to do:
1. `poetry lock` in cognee folder
2. `uv sync --dev --all-extras --reinstall`
3. `mcp dev src/server.py`


@@ -1,12 +1,12 @@
[project]
name = "cognee-mcp"
version = "0.1.0"
version = "0.2.0"
description = "A MCP server project"
readme = "README.md"
requires-python = ">=3.10"
dependencies = [
"cognee[codegraph,gemini,huggingface]",
"cognee[postgres,codegraph,gemini,huggingface]",
"mcp==1.2.1",
"uv>=0.6.3",
]

cognee-mcp/uv.lock

@@ -547,7 +547,7 @@ huggingface = [
[[package]]
name = "cognee-mcp"
version = "0.1.0"
version = "0.2.0"
source = { editable = "." }
dependencies = [
{ name = "cognee", extra = ["codegraph", "gemini", "huggingface"] },


@@ -29,13 +29,16 @@ class AnswerGeneratorExecutor:
retrieval_context = await retriever.get_context(query_text)
search_results = await retriever.get_completion(query_text, retrieval_context)
answers.append(
{
"question": query_text,
"answer": search_results[0],
"golden_answer": correct_answer,
"retrieval_context": retrieval_context,
}
)
answer = {
"question": query_text,
"answer": search_results[0],
"golden_answer": correct_answer,
"retrieval_context": retrieval_context,
}
if "golden_context" in instance:
answer["golden_context"] = instance["golden_context"]
answers.append(answer)
return answers
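For reference, one record in the returned answers list now looks like the sketch below; the "golden_context" key appears only when the benchmark instance provides one (all values are illustrative, not taken from a real run):
```python
# Illustrative answer record produced by AnswerGeneratorExecutor (made-up values).
answer = {
    "question": "Is Neo4j supported by cognee?",
    "answer": "Yes",
    "golden_answer": "Yes",
    "retrieval_context": "Neo4j is a graph database supported by cognee",
    "golden_context": "Cognee supports Neo4j and NetworkX",  # present only if the instance has one
}
```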


@@ -1,6 +1,6 @@
import logging
import json
from typing import List
from typing import List, Optional
from cognee.eval_framework.answer_generation.answer_generation_executor import (
AnswerGeneratorExecutor,
retriever_options,
@@ -32,7 +32,7 @@ async def create_and_insert_answers_table(questions_payload):
async def run_question_answering(
params: dict, system_prompt="answer_simple_question.txt"
params: dict, system_prompt="answer_simple_question.txt", top_k: Optional[int] = None
) -> List[dict]:
if params.get("answering_questions"):
logging.info("Question answering started...")
@@ -48,7 +48,9 @@ async def run_question_answering(
answer_generator = AnswerGeneratorExecutor()
answers = await answer_generator.question_answering_non_parallel(
questions=questions,
retriever=retriever_options[params["qa_engine"]](system_prompt_path=system_prompt),
retriever=retriever_options[params["qa_engine"]](
system_prompt_path=system_prompt, top_k=top_k
),
)
with open(params["answers_path"], "w", encoding="utf-8") as f:
json.dump(answers, f, ensure_ascii=False, indent=4)
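A minimal usage sketch for the new top_k parameter; the params values below are placeholders, and "completion" is a hypothetical retriever_options key not confirmed by this diff:
```python
# Hedged sketch: forwarding top_k through run_question_answering.
import asyncio

params = {
    "answering_questions": True,       # gate checked by the function
    "qa_engine": "completion",         # hypothetical retriever_options key
    "questions_path": "questions.json",
    "answers_path": "answers.json",    # where the answers JSON is written
}

# top_k=None keeps each retriever's own default (see the retriever diffs below).
answers = asyncio.run(run_question_answering(params, top_k=3))
```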


@@ -5,18 +5,21 @@ from cognee.eval_framework.benchmark_adapters.base_benchmark_adapter import Base
class DummyAdapter(BaseBenchmarkAdapter):
def load_corpus(
self, limit: Optional[int] = None, seed: int = 42
self, limit: Optional[int] = None, seed: int = 42, load_golden_context: bool = False
) -> tuple[list[str], list[dict[str, Any]]]:
corpus_list = [
"The cognee is an AI memory engine that supports different vector and graph databases",
"Neo4j is a graph database supported by cognee",
]
question_answer_pairs = [
{
"answer": "Yes",
"question": "Is Neo4j supported by cognee?",
"type": "dummy",
}
]
qa_pair = {
"answer": "Yes",
"question": "Is Neo4j supported by cognee?",
"type": "dummy",
}
if load_golden_context:
qa_pair["golden_context"] = "Cognee supports Neo4j and NetworkX"
question_answer_pairs = [qa_pair]
return corpus_list, question_answer_pairs
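A quick sketch of the new flag in use; the dummy_adapter module path is an assumption inferred from the import style above:
```python
# Hedged usage sketch; the module path is assumed, not confirmed by this diff.
from cognee.eval_framework.benchmark_adapters.dummy_adapter import DummyAdapter

adapter = DummyAdapter()
corpus, qa_pairs = adapter.load_corpus(load_golden_context=True)
# The golden context is attached to the single dummy QA pair.
assert qa_pairs[0]["golden_context"] == "Cognee supports Neo4j and NetworkX"
```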


@@ -28,14 +28,22 @@ class CorpusBuilderExecutor:
self.questions = None
self.task_getter = task_getter
def load_corpus(self, limit: Optional[int] = None) -> Tuple[List[Dict], List[str]]:
self.raw_corpus, self.questions = self.adapter.load_corpus(limit=limit)
def load_corpus(
self, limit: Optional[int] = None, load_golden_context: bool = False
) -> Tuple[List[Dict], List[str]]:
self.raw_corpus, self.questions = self.adapter.load_corpus(
limit=limit, load_golden_context=load_golden_context
)
return self.raw_corpus, self.questions
async def build_corpus(
self, limit: Optional[int] = None, chunk_size=1024, chunker=TextChunker
self,
limit: Optional[int] = None,
chunk_size=1024,
chunker=TextChunker,
load_golden_context: bool = False,
) -> List[str]:
self.load_corpus(limit=limit)
self.load_corpus(limit=limit, load_golden_context=load_golden_context)
await self.run_cognee(chunk_size=chunk_size, chunker=chunker)
return self.questions


@@ -47,7 +47,10 @@ async def run_corpus_builder(params: dict, chunk_size=1024, chunker=TextChunker)
task_getter=task_getter,
)
questions = await corpus_builder.build_corpus(
limit=params.get("number_of_samples_in_corpus"), chunk_size=chunk_size, chunker=chunker
limit=params.get("number_of_samples_in_corpus"),
chunk_size=chunk_size,
chunker=chunker,
load_golden_context=params.get("evaluating_contexts"),
)
with open(params["questions_path"], "w", encoding="utf-8") as f:
json.dump(questions, f, ensure_ascii=False, indent=4)
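For orientation, a sketch of the params keys this function reads; values are placeholders, and evaluating_contexts is what toggles golden-context loading:
```python
# Placeholder params consumed by run_corpus_builder (sketch, not exhaustive).
params = {
    "number_of_samples_in_corpus": 10,   # forwarded as limit
    "evaluating_contexts": True,         # forwarded as load_golden_context
    "questions_path": "questions.json",  # where the questions are dumped
}
```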


@@ -4,6 +4,7 @@ from cognee.eval_framework.eval_config import EvalConfig
from cognee.eval_framework.evaluation.base_eval_adapter import BaseEvalAdapter
from cognee.eval_framework.evaluation.metrics.exact_match import ExactMatchMetric
from cognee.eval_framework.evaluation.metrics.f1 import F1ScoreMetric
from cognee.eval_framework.evaluation.metrics.context_coverage import ContextCoverageMetric
from typing import Any, Dict, List
from deepeval.metrics import ContextualRelevancyMetric
@@ -15,6 +16,7 @@ class DeepEvalAdapter(BaseEvalAdapter):
"EM": ExactMatchMetric(),
"f1": F1ScoreMetric(),
"contextual_relevancy": ContextualRelevancyMetric(),
"context_coverage": ContextCoverageMetric(),
}
async def evaluate_answers(
@@ -32,6 +34,7 @@ class DeepEvalAdapter(BaseEvalAdapter):
actual_output=answer["answer"],
expected_output=answer["golden_answer"],
retrieval_context=[answer["retrieval_context"]],
context=[answer["golden_context"]] if "golden_context" in answer else None,
)
metric_results = {}
for metric in evaluator_metrics:


@@ -23,5 +23,6 @@ class EvaluationExecutor:
async def execute(self, answers: List[Dict[str, str]], evaluator_metrics: Any) -> Any:
if self.evaluate_contexts:
evaluator_metrics.append("contextual_relevancy")
evaluator_metrics.append("context_coverage")
metrics = await self.eval_adapter.evaluate_answers(answers, evaluator_metrics)
return metrics
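In effect, enabling context evaluation now carries both context metrics along; a sketch, assuming an EvaluationExecutor constructed with evaluate_contexts=True (its constructor is not shown in this hunk):
```python
# Hedged sketch; executor and answers are assumed to exist as described above.
async def evaluate_with_contexts(executor, answers):
    # "contextual_relevancy" and "context_coverage" are appended internally.
    return await executor.execute(answers, evaluator_metrics=["EM", "f1"])
```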


@@ -0,0 +1,50 @@
from deepeval.metrics import SummarizationMetric
from deepeval.test_case import LLMTestCase
from deepeval.metrics.summarization.schema import ScoreType
from deepeval.metrics.indicator import metric_progress_indicator
from deepeval.utils import get_or_create_event_loop
class ContextCoverageMetric(SummarizationMetric):
def measure(
self,
test_case,
_show_indicator: bool = True,
) -> float:
mapped_test_case = LLMTestCase(
input=test_case.context[0],
actual_output=test_case.retrieval_context[0],
)
self.assessment_questions = None
self.evaluation_cost = 0 if self.using_native_model else None
with metric_progress_indicator(self, _show_indicator=_show_indicator):
if self.async_mode:
loop = get_or_create_event_loop()
return loop.run_until_complete(
self.a_measure(mapped_test_case, _show_indicator=False)
)
else:
self.coverage_verdicts = self._generate_coverage_verdicts(mapped_test_case)
self.alignment_verdicts = []
self.score = self._calculate_score(ScoreType.COVERAGE)
self.reason = self._generate_reason()
self.success = self.score >= self.threshold
return self.score
async def a_measure(
self,
test_case,
_show_indicator: bool = True,
) -> float:
self.evaluation_cost = 0 if self.using_native_model else None
with metric_progress_indicator(
self,
async_mode=True,
_show_indicator=_show_indicator,
):
self.coverage_verdicts = await self._a_generate_coverage_verdicts(test_case)
self.alignment_verdicts = []
self.score = self._calculate_score(ScoreType.COVERAGE)
self.reason = await self._a_generate_reason()
self.success = self.score >= self.threshold
return self.score
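A hedged usage sketch for the new metric: it feeds the golden context in as the summarization "input" and the retrieved context as the "output", so the coverage score measures how well retrieval covers the golden context (a configured deepeval evaluation model is required; all values are illustrative):
```python
# Hedged usage sketch; requires an LLM configured for deepeval,
# and the values below are illustrative.
from deepeval.test_case import LLMTestCase

metric = ContextCoverageMetric()
test_case = LLMTestCase(
    input="Is Neo4j supported by cognee?",
    actual_output="Yes",
    retrieval_context=["Neo4j is a graph database supported by cognee"],
    context=["Cognee supports Neo4j and NetworkX"],  # the golden context
)
score = metric.measure(test_case)  # approaches 1.0 as retrieval covers the golden context
```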


@@ -3,6 +3,12 @@ import plotly.graph_objects as go
from typing import Dict, List, Tuple
from collections import defaultdict
metrics_fields = {
"contextual_relevancy": ["question", "retrieval_context"],
"context_coverage": ["question", "retrieval_context", "golden_context"],
}
default_metrics_fields = ["question", "answer", "golden_answer"]
def create_distribution_plots(metrics_data: Dict[str, List[float]]) -> List[str]:
"""Create distribution histogram plots for each metric."""
@@ -59,38 +65,30 @@ def generate_details_html(metrics_data: List[Dict]) -> List[str]:
for metric, values in entry["metrics"].items():
if metric not in metric_details:
metric_details[metric] = []
current_metrics_fields = metrics_fields.get(metric, default_metrics_fields)
metric_details[metric].append(
{
"question": entry["question"],
"answer": entry["answer"],
"golden_answer": entry["golden_answer"],
{key: entry[key] for key in current_metrics_fields}
| {
"reason": values.get("reason", ""),
"score": values["score"],
}
)
for metric, details in metric_details.items():
formatted_column_names = [key.replace("_", " ").title() for key in details[0].keys()]
details_html.append(f"<h3>{metric} Details</h3>")
details_html.append("""
details_html.append(f"""
<table class="metric-table">
<tr>
<th>Question</th>
<th>Answer</th>
<th>Golden Answer</th>
<th>Reason</th>
<th>Score</th>
{"".join(f"<th>{col}</th>" for col in formatted_column_names)}
</tr>
""")
for item in details:
details_html.append(
f"<tr>"
f"<td>{item['question']}</td>"
f"<td>{item['answer']}</td>"
f"<td>{item['golden_answer']}</td>"
f"<td>{item['reason']}</td>"
f"<td>{item['score']}</td>"
f"</tr>"
)
details_html.append(f"""
<tr>
{"".join(f"<td>{value}</td>" for value in item.values())}
</tr>
""")
details_html.append("</table>")
return details_html
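Note that the `|` dict-merge operator requires Python 3.9+. A self-contained sketch of how one table row is now assembled for the context_coverage metric (entry values are illustrative):
```python
# Sketch: per-metric field selection merged with reason/score (Python 3.9+).
metrics_fields = {
    "contextual_relevancy": ["question", "retrieval_context"],
    "context_coverage": ["question", "retrieval_context", "golden_context"],
}
default_metrics_fields = ["question", "answer", "golden_answer"]

entry = {
    "question": "Is Neo4j supported by cognee?",
    "retrieval_context": "Neo4j is a graph database supported by cognee",
    "golden_context": "Cognee supports Neo4j and NetworkX",
}
fields = metrics_fields.get("context_coverage", default_metrics_fields)
row = {key: entry[key] for key in fields} | {"reason": "fully covered", "score": 1.0}
print(row)
```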


@@ -13,15 +13,17 @@ class CompletionRetriever(BaseRetriever):
self,
user_prompt_path: str = "context_for_question.txt",
system_prompt_path: str = "answer_simple_question.txt",
top_k: Optional[int] = 1,
):
"""Initialize retriever with optional custom prompt paths."""
self.user_prompt_path = user_prompt_path
self.system_prompt_path = system_prompt_path
self.top_k = top_k if top_k is not None else 1
async def get_context(self, query: str) -> Any:
"""Retrieves relevant document chunks as context."""
vector_engine = get_vector_engine()
found_chunks = await vector_engine.search("DocumentChunk_text", query, limit=1)
found_chunks = await vector_engine.search("DocumentChunk_text", query, limit=self.top_k)
if len(found_chunks) == 0:
raise NoRelevantDataFound
return found_chunks[0].payload["text"]
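A usage sketch for the widened search; as shown in this hunk, get_context still returns the first chunk's text, so top_k widens the underlying vector search (the query text is illustrative, and a running vector engine is assumed):
```python
# Hedged sketch: constructing the retriever with a wider search limit.
import asyncio

async def main():
    retriever = CompletionRetriever(top_k=3)  # falls back to 1 when top_k is None
    context = await retriever.get_context("Is Neo4j supported by cognee?")
    print(context)

asyncio.run(main())
```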


@@ -15,12 +15,12 @@ class GraphCompletionRetriever(BaseRetriever):
self,
user_prompt_path: str = "graph_context_for_question.txt",
system_prompt_path: str = "answer_simple_question.txt",
top_k: int = 5,
top_k: Optional[int] = 5,
):
"""Initialize retriever with prompt paths and search parameters."""
self.user_prompt_path = user_prompt_path
self.system_prompt_path = system_prompt_path
self.top_k = top_k
self.top_k = top_k if top_k is not None else 5
async def resolve_edges_to_text(self, retrieved_edges: list) -> str:
"""Converts retrieved graph edges into a human-readable string format."""


@@ -12,7 +12,7 @@ class GraphSummaryCompletionRetriever(GraphCompletionRetriever):
user_prompt_path: str = "graph_context_for_question.txt",
system_prompt_path: str = "answer_simple_question.txt",
summarize_prompt_path: str = "summarize_search_results.txt",
top_k: int = 5,
top_k: Optional[int] = 5,
):
"""Initialize retriever with default prompt paths and search parameters."""
super().__init__(


@@ -5,7 +5,12 @@ import sys
with patch.dict(
sys.modules,
{"deepeval": MagicMock(), "deepeval.metrics": MagicMock(), "deepeval.test_case": MagicMock()},
{
"deepeval": MagicMock(),
"deepeval.metrics": MagicMock(),
"deepeval.test_case": MagicMock(),
"cognee.eval_framework.evaluation.metrics.context_coverage": MagicMock(),
},
):
from cognee.eval_framework.evaluation.deep_eval_adapter import DeepEvalAdapter


@@ -0,0 +1,206 @@
# Standard library imports
import os
import json
import asyncio
import pathlib
from uuid import uuid5, NAMESPACE_OID
from typing import List, Optional
from pathlib import Path
import dlt
import requests
import cognee
from cognee.low_level import DataPoint, setup as cognee_setup
from cognee.api.v1.search import SearchType
from cognee.tasks.storage import add_data_points
from cognee.modules.pipelines.tasks.Task import Task
from cognee.modules.pipelines import run_tasks
BASE_URL = "https://pokeapi.co/api/v2/"
os.environ["BUCKET_URL"] = "./.data_storage"
os.environ["DATA_WRITER__DISABLE_COMPRESSION"] = "true"
# Data Models
class Abilities(DataPoint):
name: str = "Abilities"
metadata: dict = {"index_fields": ["name"]}
class PokemonAbility(DataPoint):
name: str
ability__name: str
ability__url: str
is_hidden: bool
slot: int
_dlt_load_id: str
_dlt_id: str
_dlt_parent_id: str
_dlt_list_idx: str
is_type: Abilities
metadata: dict = {"index_fields": ["ability__name"]}
class Pokemons(DataPoint):
name: str = "Pokemons"
have: Abilities
metadata: dict = {"index_fields": ["name"]}
class Pokemon(DataPoint):
name: str
base_experience: int
height: int
weight: int
is_default: bool
order: int
location_area_encounters: str
species__name: str
species__url: str
cries__latest: str
cries__legacy: str
sprites__front_default: str
sprites__front_shiny: str
sprites__back_default: Optional[str]
sprites__back_shiny: Optional[str]
_dlt_load_id: str
_dlt_id: str
is_type: Pokemons
abilities: List[PokemonAbility]
metadata: dict = {"index_fields": ["name"]}
# Data Collection Functions
@dlt.resource(write_disposition="replace")
def pokemon_list(limit: int = 50):
response = requests.get(f"{BASE_URL}pokemon", params={"limit": limit})
response.raise_for_status()
yield response.json()["results"]
@dlt.transformer(data_from=pokemon_list)
def pokemon_details(pokemons):
"""Fetches detailed info for each Pokémon"""
for pokemon in pokemons:
response = requests.get(pokemon["url"])
response.raise_for_status()
yield response.json()
# Data Loading Functions
def load_abilities_data(jsonl_abilities):
abilities_root = Abilities()
pokemon_abilities = []
for jsonl_ability in jsonl_abilities:
with open(jsonl_ability, "r") as f:
for line in f:
ability = json.loads(line)
ability["id"] = uuid5(NAMESPACE_OID, ability["_dlt_id"])
ability["name"] = ability["ability__name"]
ability["is_type"] = abilities_root
pokemon_abilities.append(ability)
return abilities_root, pokemon_abilities
def load_pokemon_data(jsonl_pokemons, pokemon_abilities, pokemon_root):
pokemons = []
for jsonl_pokemon in jsonl_pokemons:
with open(jsonl_pokemon, "r") as f:
for line in f:
pokemon_data = json.loads(line)
abilities = [
ability
for ability in pokemon_abilities
if ability["_dlt_parent_id"] == pokemon_data["_dlt_id"]
]
pokemon_data["external_id"] = pokemon_data["id"]
pokemon_data["id"] = uuid5(NAMESPACE_OID, str(pokemon_data["id"]))
pokemon_data["abilities"] = [PokemonAbility(**ability) for ability in abilities]
pokemon_data["is_type"] = pokemon_root
pokemons.append(Pokemon(**pokemon_data))
return pokemons
# Main Application Logic
async def setup_and_process_data():
"""Setup configuration and process Pokemon data"""
# Setup configuration
data_directory_path = str(
pathlib.Path(os.path.join(pathlib.Path(__file__).parent, ".data_storage")).resolve()
)
cognee_directory_path = str(
pathlib.Path(os.path.join(pathlib.Path(__file__).parent, ".cognee_system")).resolve()
)
cognee.config.data_root_directory(data_directory_path)
cognee.config.system_root_directory(cognee_directory_path)
# Initialize pipeline and collect data
pipeline = dlt.pipeline(
pipeline_name="pokemon_pipeline",
destination="filesystem",
dataset_name="pokemon_data",
)
info = pipeline.run([pokemon_list, pokemon_details])
print(info)
# Load and process data
STORAGE_PATH = Path(".data_storage/pokemon_data/pokemon_details")
jsonl_pokemons = sorted(STORAGE_PATH.glob("*.jsonl"))
if not jsonl_pokemons:
raise FileNotFoundError("No JSONL files found in the storage directory.")
ABILITIES_PATH = Path(".data_storage/pokemon_data/pokemon_details__abilities")
jsonl_abilities = sorted(ABILITIES_PATH.glob("*.jsonl"))
if not jsonl_abilities:
raise FileNotFoundError("No JSONL files found in the storage directory.")
# Process data
abilities_root, pokemon_abilities = load_abilities_data(jsonl_abilities)
pokemon_root = Pokemons(have=abilities_root)
pokemons = load_pokemon_data(jsonl_pokemons, pokemon_abilities, pokemon_root)
return pokemons
async def pokemon_cognify(pokemons):
"""Process Pokemon data with Cognee and perform search"""
# Setup and run Cognee tasks
await cognee.prune.prune_data()
await cognee.prune.prune_system(metadata=True)
await cognee_setup()
tasks = [Task(add_data_points, task_config={"batch_size": 50})]
results = run_tasks(
tasks=tasks,
data=pokemons,
dataset_id=uuid5(NAMESPACE_OID, "Pokemon"),
pipeline_name="pokemon_pipeline",
)
async for result in results:
print(result)
print("Done")
# Perform search
search_results = await cognee.search(
query_type=SearchType.GRAPH_COMPLETION, query_text="pokemons?"
)
print("Search results:")
for result_text in search_results:
print(result_text)
async def main():
pokemons = await setup_and_process_data()
await pokemon_cognify(pokemons)
if __name__ == "__main__":
asyncio.run(main())

helm/Chart.yaml

@@ -0,0 +1,24 @@
apiVersion: v2
name: cognee-chart
description: A helm chart of the cognee backend deployment on Kubernetes environment
# A chart can be either an 'application' or a 'library' chart.
#
# Application charts are a collection of templates that can be packaged into versioned archives
# to be deployed.
#
# Library charts provide useful utilities or functions for the chart developer. They're included as
# a dependency of application charts to inject those utilities and functions into the rendering
# pipeline. Library charts do not define any templates and therefore cannot be deployed.
type: application
# This is the chart version. This version number should be incremented each time you make changes
# to the chart and its templates, including the app version.
# Versions are expected to follow Semantic Versioning (https://semver.org/)
version: 0.1.0
# This is the version number of the application being deployed. This version number should be
# incremented each time you make changes to the application. Versions are not expected to
# follow Semantic Versioning. They should reflect the version the application is using.
# It is recommended to use it with quotes.
appVersion: "1.16.0"

helm/Dockerfile

@@ -0,0 +1,59 @@
FROM python:3.11-slim
# Define Poetry extras to install
ARG POETRY_EXTRAS="\
# Storage & Databases \
filesystem postgres weaviate qdrant neo4j falkordb milvus \
# Notebooks & Interactive Environments \
notebook \
# LLM & AI Frameworks \
langchain llama-index gemini huggingface ollama mistral groq \
# Evaluation & Monitoring \
deepeval evals posthog \
# Graph Processing & Code Analysis \
codegraph graphiti \
# Document Processing \
docs"
# Set build argument
ARG DEBUG
# Set environment variable based on the build argument
ENV DEBUG=${DEBUG}
ENV PIP_NO_CACHE_DIR=true
ENV PATH="${PATH}:/root/.poetry/bin"
RUN apt-get update && apt-get install -y \
gcc \
libpq-dev
WORKDIR /app
COPY pyproject.toml poetry.lock /app/
RUN pip install poetry
# Don't create virtualenv since docker is already isolated
RUN poetry config virtualenvs.create false
# Install the dependencies
RUN poetry install --extras "${POETRY_EXTRAS}" --no-root --without dev
# Set the PYTHONPATH environment variable to include the /app directory
ENV PYTHONPATH=/app
COPY cognee/ /app/cognee
# Copy Alembic configuration
COPY alembic.ini /app/alembic.ini
COPY alembic/ /app/alembic
COPY entrypoint.sh /app/entrypoint.sh
RUN chmod +x /app/entrypoint.sh
RUN sed -i 's/\r$//' /app/entrypoint.sh
ENTRYPOINT ["/app/entrypoint.sh"]

helm/README.md

@@ -0,0 +1,25 @@
# cognee-infra-helm
General infrastructure setup for Cognee on Kubernetes using a Helm chart.
## Prerequisites
Before deploying the Helm chart, ensure the following prerequisites are met:
- **Kubernetes Cluster**: A running Kubernetes cluster (e.g., Minikube, GKE, EKS).
- **Helm**: Installed and configured for your Kubernetes cluster. You can install Helm by following the [official guide](https://helm.sh/docs/intro/install/).
- **kubectl**: Installed and configured to interact with your cluster. Follow the instructions [here](https://kubernetes.io/docs/tasks/tools/install-kubectl/).
## Clone the Repository
Clone this repository to your local machine and navigate to the directory.
## Deploy Helm Chart:
```bash
helm install cognee ./cognee-chart
```
**Uninstall Helm Release**:
```bash
helm uninstall cognee
```


@@ -0,0 +1,46 @@
services:
cognee:
image: cognee-backend:latest
container_name: cognee-backend
networks:
- cognee-network
build:
context: .
dockerfile: Dockerfile
volumes:
- .:/app
- /app/cognee-frontend/ # Ignore frontend code
environment:
- HOST=0.0.0.0
- ENVIRONMENT=local
- PYTHONPATH=.
ports:
- 8000:8000
# - 5678:5678 # Debugging
deploy:
resources:
limits:
cpus: '4.0'
memory: 8GB
postgres:
image: pgvector/pgvector:pg17
container_name: postgres
environment:
POSTGRES_USER: cognee
POSTGRES_PASSWORD: cognee
POSTGRES_DB: cognee_db
volumes:
- postgres_data:/var/lib/postgresql/data
ports:
- 5432:5432
networks:
- cognee-network
networks:
cognee-network:
name: cognee-network
volumes:
postgres_data:


@@ -0,0 +1,32 @@
apiVersion: apps/v1
kind: Deployment
metadata:
name: {{ .Release.Name }}-cognee
labels:
app: {{ .Release.Name }}-cognee
spec:
replicas: 1
selector:
matchLabels:
app: {{ .Release.Name }}-cognee
template:
metadata:
labels:
app: {{ .Release.Name }}-cognee
spec:
containers:
- name: cognee
image: {{ .Values.cognee.image }}
ports:
- containerPort: {{ .Values.cognee.port }}
env:
- name: HOST
value: {{ .Values.cognee.env.HOST }}
- name: ENVIRONMENT
value: {{ .Values.cognee.env.ENVIRONMENT }}
- name: PYTHONPATH
value: {{ .Values.cognee.env.PYTHONPATH }}
resources:
limits:
cpu: {{ .Values.cognee.resources.cpu }}
memory: {{ .Values.cognee.resources.memory }}


@@ -0,0 +1,13 @@
apiVersion: v1
kind: Service
metadata:
name: {{ .Release.Name }}-cognee
labels:
app: {{ .Release.Name }}-cognee
spec:
type: NodePort
ports:
- port: {{ .Values.cognee.port }}
targetPort: {{ .Values.cognee.port }}
selector:
app: {{ .Release.Name }}-cognee


@@ -0,0 +1,35 @@
apiVersion: apps/v1
kind: Deployment
metadata:
name: {{ .Release.Name }}-postgres
labels:
app: {{ .Release.Name }}-postgres
spec:
replicas: 1
selector:
matchLabels:
app: {{ .Release.Name }}-postgres
template:
metadata:
labels:
app: {{ .Release.Name }}-postgres
spec:
containers:
- name: postgres
image: {{ .Values.postgres.image }}
ports:
- containerPort: {{ .Values.postgres.port }}
env:
- name: POSTGRES_USER
value: {{ .Values.postgres.env.POSTGRES_USER }}
- name: POSTGRES_PASSWORD
value: {{ .Values.postgres.env.POSTGRES_PASSWORD }}
- name: POSTGRES_DB
value: {{ .Values.postgres.env.POSTGRES_DB }}
volumeMounts:
- name: postgres-storage
mountPath: /var/lib/postgresql/data
volumes:
- name: postgres-storage
persistentVolumeClaim:
claimName: {{ .Release.Name }}-postgres-pvc


@@ -0,0 +1,10 @@
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
name: {{ .Release.Name }}-postgres-pvc
spec:
accessModes:
- ReadWriteOnce
resources:
requests:
storage: {{ .Values.postgres.storage }}


@@ -0,0 +1,14 @@
apiVersion: v1
kind: Service
metadata:
name: {{ .Release.Name }}-postgres
labels:
app: {{ .Release.Name }}-postgres
spec:
type: ClusterIP
ports:
- port: {{ .Values.postgres.port }}
targetPort: {{ .Values.postgres.port }}
selector:
app: {{ .Release.Name }}-postgres

helm/values.yaml

@@ -0,0 +1,22 @@
# Configuration for the 'cognee' application service
cognee:
# Image name (using the local image we'll build in Minikube)
image: "hajdul1988/cognee-backend:latest"
port: 8000
env:
HOST: "0.0.0.0"
ENVIRONMENT: "local"
PYTHONPATH: "."
resources:
cpu: "4.0"
memory: "8Gi"
# Configuration for the 'postgres' database service
postgres:
image: "pgvector/pgvector:pg17"
port: 5432
env:
POSTGRES_USER: "cognee"
POSTGRES_PASSWORD: "cognee"
POSTGRES_DB: "cognee_db"
storage: "8Gi"


@@ -0,0 +1,536 @@
{
"cells": [
{
"metadata": {
"ExecuteTime": {
"end_time": "2025-03-04T11:58:00.193158Z",
"start_time": "2025-03-04T11:58:00.190238Z"
}
},
"cell_type": "code",
"source": [
"import nest_asyncio\n",
"nest_asyncio.apply()"
],
"id": "2efba278d106bb5f",
"outputs": [],
"execution_count": 2
},
{
"metadata": {},
"cell_type": "markdown",
"source": [
"### Environment Configuration\n",
"#### Setup required directories and environment variables.\n"
],
"id": "ccbb2bc23aa456ee"
},
{
"metadata": {
"ExecuteTime": {
"end_time": "2025-03-04T11:59:33.879188Z",
"start_time": "2025-03-04T11:59:33.873682Z"
}
},
"cell_type": "code",
"source": [
"import pathlib\n",
"import os\n",
"import cognee\n",
"\n",
"notebook_dir = pathlib.Path().resolve()\n",
"data_directory_path = str(notebook_dir / \".data_storage\")\n",
"cognee_directory_path = str(notebook_dir / \".cognee_system\")\n",
"\n",
"cognee.config.data_root_directory(data_directory_path)\n",
"cognee.config.system_root_directory(cognee_directory_path)\n",
"\n",
"BASE_URL = \"https://pokeapi.co/api/v2/\"\n",
"os.environ[\"BUCKET_URL\"] = data_directory_path\n",
"os.environ[\"DATA_WRITER__DISABLE_COMPRESSION\"] = \"true\"\n"
],
"id": "662d554f96f211d9",
"outputs": [],
"execution_count": 8
},
{
"metadata": {},
"cell_type": "markdown",
"source": [
"## Initialize DLT Pipeline\n",
"### Create the DLT pipeline to fetch Pokémon data.\n"
],
"id": "36ae0be71f6e9167"
},
{
"metadata": {
"ExecuteTime": {
"end_time": "2025-03-04T11:58:03.982939Z",
"start_time": "2025-03-04T11:58:03.819676Z"
}
},
"cell_type": "code",
"source": [
"import dlt\n",
"from pathlib import Path\n",
"\n",
"pipeline = dlt.pipeline(\n",
" pipeline_name=\"pokemon_pipeline\",\n",
" destination=\"filesystem\",\n",
" dataset_name=\"pokemon_data\",\n",
")\n"
],
"id": "25101ae5f016ce0c",
"outputs": [],
"execution_count": 4
},
{
"metadata": {},
"cell_type": "markdown",
"source": [
"## Fetch Pokémon List\n",
"### Retrieve a list of Pokémon from the API.\n"
],
"id": "9a87ce05a072c48b"
},
{
"metadata": {
"ExecuteTime": {
"end_time": "2025-03-04T11:58:03.990076Z",
"start_time": "2025-03-04T11:58:03.987199Z"
}
},
"cell_type": "code",
"source": [
"@dlt.resource(write_disposition=\"replace\")\n",
"def pokemon_list(limit: int = 50):\n",
" import requests\n",
" response = requests.get(f\"{BASE_URL}pokemon\", params={\"limit\": limit})\n",
" response.raise_for_status()\n",
" yield response.json()[\"results\"]\n"
],
"id": "3b6e60778c61e24a",
"outputs": [],
"execution_count": 5
},
{
"metadata": {},
"cell_type": "markdown",
"source": [
"## Fetch Pokémon Details\n",
"### Fetch detailed information about each Pokémon.\n"
],
"id": "9952767846194e97"
},
{
"metadata": {
"ExecuteTime": {
"end_time": "2025-03-04T11:58:03.996394Z",
"start_time": "2025-03-04T11:58:03.994122Z"
}
},
"cell_type": "code",
"source": [
"@dlt.transformer(data_from=pokemon_list)\n",
"def pokemon_details(pokemons):\n",
" \"\"\"Fetches detailed info for each Pokémon\"\"\"\n",
" import requests\n",
" for pokemon in pokemons:\n",
" response = requests.get(pokemon[\"url\"])\n",
" response.raise_for_status()\n",
" yield response.json()\n"
],
"id": "79ec9fef12267485",
"outputs": [],
"execution_count": 6
},
{
"metadata": {},
"cell_type": "markdown",
"source": [
"## Run Data Pipeline\n",
"### Execute the pipeline and store Pokémon data.\n"
],
"id": "41e05f660bf9e9d2"
},
{
"metadata": {
"ExecuteTime": {
"end_time": "2025-03-04T11:59:41.571015Z",
"start_time": "2025-03-04T11:59:36.840744Z"
}
},
"cell_type": "code",
"source": [
"info = pipeline.run([pokemon_list, pokemon_details])\n",
"print(info)\n"
],
"id": "20a3b2c7f404677f",
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Pipeline pokemon_pipeline load step completed in 0.06 seconds\n",
"1 load package(s) were loaded to destination filesystem and into dataset pokemon_data\n",
"The filesystem destination used file:///Users/lazar/PycharmProjects/cognee/.data_storage location to store data\n",
"Load package 1741089576.860229 is LOADED and contains no failed jobs\n"
]
}
],
"execution_count": 9
},
{
"metadata": {},
"cell_type": "markdown",
"source": [
"## Load Pokémon Abilities\n",
"### Load Pokémon ability data from stored files.\n"
],
"id": "937f10b8d1037743"
},
{
"metadata": {
"ExecuteTime": {
"end_time": "2025-03-04T11:59:44.377719Z",
"start_time": "2025-03-04T11:59:44.363718Z"
}
},
"cell_type": "code",
"source": [
"import json\n",
"from cognee.low_level import DataPoint\n",
"from uuid import uuid5, NAMESPACE_OID\n",
"\n",
"class Abilities(DataPoint):\n",
" name: str = \"Abilities\"\n",
" metadata: dict = {\"index_fields\": [\"name\"]}\n",
"\n",
"def load_abilities_data(jsonl_abilities):\n",
" abilities_root = Abilities()\n",
" pokemon_abilities = []\n",
"\n",
" for jsonl_ability in jsonl_abilities:\n",
" with open(jsonl_ability, \"r\") as f:\n",
" for line in f:\n",
" ability = json.loads(line)\n",
" ability[\"id\"] = uuid5(NAMESPACE_OID, ability[\"_dlt_id\"])\n",
" ability[\"name\"] = ability[\"ability__name\"]\n",
" ability[\"is_type\"] = abilities_root\n",
" pokemon_abilities.append(ability)\n",
"\n",
" return abilities_root, pokemon_abilities\n"
],
"id": "be73050036439ea1",
"outputs": [],
"execution_count": 10
},
{
"metadata": {},
"cell_type": "markdown",
"source": [
"## Load Pokémon Data\n",
"### Load Pokémon details and associate them with abilities.\n"
],
"id": "98c97f799f73df77"
},
{
"metadata": {
"ExecuteTime": {
"end_time": "2025-03-04T11:59:46.251306Z",
"start_time": "2025-03-04T11:59:46.238283Z"
}
},
"cell_type": "code",
"source": [
"from typing import List, Optional\n",
"\n",
"class Pokemons(DataPoint):\n",
" name: str = \"Pokemons\"\n",
" have: Abilities\n",
" metadata: dict = {\"index_fields\": [\"name\"]}\n",
"\n",
"class PokemonAbility(DataPoint):\n",
" name: str\n",
" ability__name: str\n",
" ability__url: str\n",
" is_hidden: bool\n",
" slot: int\n",
" _dlt_load_id: str\n",
" _dlt_id: str\n",
" _dlt_parent_id: str\n",
" _dlt_list_idx: str\n",
" is_type: Abilities\n",
" metadata: dict = {\"index_fields\": [\"ability__name\"]}\n",
"\n",
"class Pokemon(DataPoint):\n",
" name: str\n",
" base_experience: int\n",
" height: int\n",
" weight: int\n",
" is_default: bool\n",
" order: int\n",
" location_area_encounters: str\n",
" species__name: str\n",
" species__url: str\n",
" cries__latest: str\n",
" cries__legacy: str\n",
" sprites__front_default: str\n",
" sprites__front_shiny: str\n",
" sprites__back_default: Optional[str]\n",
" sprites__back_shiny: Optional[str]\n",
" _dlt_load_id: str\n",
" _dlt_id: str\n",
" is_type: Pokemons\n",
" abilities: List[PokemonAbility]\n",
" metadata: dict = {\"index_fields\": [\"name\"]}\n",
"\n",
"def load_pokemon_data(jsonl_pokemons, pokemon_abilities, pokemon_root):\n",
" pokemons = []\n",
"\n",
" for jsonl_pokemon in jsonl_pokemons:\n",
" with open(jsonl_pokemon, \"r\") as f:\n",
" for line in f:\n",
" pokemon_data = json.loads(line)\n",
" abilities = [\n",
" ability for ability in pokemon_abilities\n",
" if ability[\"_dlt_parent_id\"] == pokemon_data[\"_dlt_id\"]\n",
" ]\n",
" pokemon_data[\"external_id\"] = pokemon_data[\"id\"]\n",
" pokemon_data[\"id\"] = uuid5(NAMESPACE_OID, str(pokemon_data[\"id\"]))\n",
" pokemon_data[\"abilities\"] = [PokemonAbility(**ability) for ability in abilities]\n",
" pokemon_data[\"is_type\"] = pokemon_root\n",
" pokemons.append(Pokemon(**pokemon_data))\n",
"\n",
" return pokemons\n"
],
"id": "7862951248df0bf5",
"outputs": [],
"execution_count": 11
},
{
"metadata": {},
"cell_type": "markdown",
"source": [
"## Process Pokémon Data\n",
"### Load and associate Pokémon abilities.\n"
],
"id": "676fa5a2b61c2107"
},
{
"metadata": {
"ExecuteTime": {
"end_time": "2025-03-04T11:59:47.365226Z",
"start_time": "2025-03-04T11:59:47.356722Z"
}
},
"cell_type": "code",
"source": [
"STORAGE_PATH = Path(\".data_storage/pokemon_data/pokemon_details\")\n",
"jsonl_pokemons = sorted(STORAGE_PATH.glob(\"*.jsonl\"))\n",
"\n",
"ABILITIES_PATH = Path(\".data_storage/pokemon_data/pokemon_details__abilities\")\n",
"jsonl_abilities = sorted(ABILITIES_PATH.glob(\"*.jsonl\"))\n",
"\n",
"abilities_root, pokemon_abilities = load_abilities_data(jsonl_abilities)\n",
"pokemon_root = Pokemons(have=abilities_root)\n",
"pokemons = load_pokemon_data(jsonl_pokemons, pokemon_abilities, pokemon_root)\n"
],
"id": "ad14cdecdccd71bb",
"outputs": [],
"execution_count": 12
},
{
"metadata": {},
"cell_type": "markdown",
"source": [
"## Initialize Cognee\n",
"### Setup Cognee for data processing.\n"
],
"id": "59dec67b2ae50f0f"
},
{
"metadata": {
"ExecuteTime": {
"end_time": "2025-03-04T11:59:49.244577Z",
"start_time": "2025-03-04T11:59:48.618261Z"
}
},
"cell_type": "code",
"source": [
"import asyncio\n",
"from cognee.low_level import setup as cognee_setup\n",
"\n",
"async def initialize_cognee():\n",
" await cognee.prune.prune_data()\n",
" await cognee.prune.prune_system(metadata=True)\n",
" await cognee_setup()\n",
"\n",
"await initialize_cognee()\n"
],
"id": "d2e095ae576a02c1",
"outputs": [
{
"name": "stderr",
"output_type": "stream",
"text": [
"INFO:cognee.infrastructure.databases.relational.sqlalchemy.SqlAlchemyAdapter:Database deleted successfully.INFO:cognee.infrastructure.databases.relational.sqlalchemy.SqlAlchemyAdapter:Database deleted successfully."
]
}
],
"execution_count": 13
},
{
"metadata": {},
"cell_type": "markdown",
"source": [
"## Process Pokémon Data\n",
"### Add Pokémon data points to Cognee.\n"
],
"id": "5f0b8090bc7b1fe6"
},
{
"metadata": {
"ExecuteTime": {
"end_time": "2025-03-04T11:59:57.744035Z",
"start_time": "2025-03-04T11:59:50.574033Z"
}
},
"cell_type": "code",
"source": [
"from cognee.modules.pipelines.tasks.Task import Task\n",
"from cognee.tasks.storage import add_data_points\n",
"from cognee.modules.pipelines import run_tasks\n",
"\n",
"tasks = [Task(add_data_points, task_config={\"batch_size\": 50})]\n",
"results = run_tasks(\n",
" tasks=tasks,\n",
" data=pokemons,\n",
" dataset_id=uuid5(NAMESPACE_OID, \"Pokemon\"),\n",
" pipeline_name='pokemon_pipeline',\n",
")\n",
"\n",
"async for result in results:\n",
" print(result)\n",
"print(\"Done\")\n"
],
"id": "ffa12fc1f5350d95",
"outputs": [
{
"name": "stderr",
"output_type": "stream",
"text": [
"INFO:run_tasks(tasks: [Task], data):Pipeline run started: `fd2ed59d-b550-5b05-bbe6-7b708fe12483`INFO:run_tasks(tasks: [Task], data):Coroutine task started: `add_data_points`"
]
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"<cognee.modules.pipelines.models.PipelineRun.PipelineRun object at 0x300bb3950>\n",
"User d347ea85-e512-4cae-b9d7-496fe1745424 has registered.\n"
]
},
{
"name": "stderr",
"output_type": "stream",
"text": [
"/Users/lazar/PycharmProjects/cognee/cognee/infrastructure/databases/vector/pgvector/PGVectorAdapter.py:79: SAWarning: This declarative base already contains a class with the same class name and module name as cognee.infrastructure.databases.vector.pgvector.PGVectorAdapter.PGVectorDataPoint, and will be replaced in the string-lookup table.\n",
" class PGVectorDataPoint(Base):\n",
"INFO:httpx:HTTP Request: POST https://api.openai.com/v1/embeddings \"HTTP/1.1 200 OK\"INFO:httpx:HTTP Request: POST https://api.openai.com/v1/embeddings \"HTTP/1.1 200 OK\"/Users/lazar/PycharmProjects/cognee/cognee/infrastructure/databases/vector/pgvector/PGVectorAdapter.py:113: SAWarning: This declarative base already contains a class with the same class name and module name as cognee.infrastructure.databases.vector.pgvector.PGVectorAdapter.PGVectorDataPoint, and will be replaced in the string-lookup table.\n",
" class PGVectorDataPoint(Base):\n",
"INFO:httpx:HTTP Request: POST https://api.openai.com/v1/embeddings \"HTTP/1.1 200 OK\"INFO:httpx:HTTP Request: POST https://api.openai.com/v1/embeddings \"HTTP/1.1 200 OK\"WARNING:neo4j.notifications:Received notification from DBMS server: {severity: WARNING} {code: Neo.ClientNotification.Statement.FeatureDeprecationWarning} {category: DEPRECATION} {title: This feature is deprecated and will be removed in future versions.} {description: The query used a deprecated function: `id`.} {position: line: 8, column: 16, offset: 335} for query: '\\n UNWIND $nodes AS node\\n MERGE (n {id: node.node_id})\\n ON CREATE SET n += node.properties, n.updated_at = timestamp()\\n ON MATCH SET n += node.properties, n.updated_at = timestamp()\\n WITH n, node.node_id AS label\\n CALL apoc.create.addLabels(n, [label]) YIELD node AS labeledNode\\n RETURN ID(labeledNode) AS internal_id, labeledNode.id AS nodeId\\n 'WARNING:neo4j.notifications:Received notification from DBMS server: {severity: WARNING} {code: Neo.ClientNotification.Statement.FeatureDeprecationWarning} {category: DEPRECATION} {title: This feature is deprecated and will be removed in future versions.} {description: The query used a deprecated function: `id`.} {position: line: 1, column: 18, offset: 17} for query: 'MATCH (n) RETURN ID(n) AS id, labels(n) AS labels, properties(n) AS properties'WARNING:neo4j.notifications:Received notification from DBMS server: {severity: WARNING} {code: Neo.ClientNotification.Statement.FeatureDeprecationWarning} {category: DEPRECATION} {title: This feature is deprecated and will be removed in future versions.} {description: The query used a deprecated function: `id`.} {position: line: 3, column: 16, offset: 43} for query: '\\n MATCH (n)-[r]->(m)\\n RETURN ID(n) AS source, ID(m) AS target, TYPE(r) AS type, properties(r) AS properties\\n 'WARNING:neo4j.notifications:Received notification from DBMS server: {severity: WARNING} {code: Neo.ClientNotification.Statement.FeatureDeprecationWarning} {category: DEPRECATION} {title: This feature is deprecated and will be removed in future versions.} {description: The query used a deprecated function: `id`.} {position: line: 3, column: 33, offset: 60} for query: '\\n MATCH (n)-[r]->(m)\\n RETURN ID(n) AS source, ID(m) AS target, TYPE(r) AS type, properties(r) AS properties\\n 'INFO:httpx:HTTP Request: POST https://api.openai.com/v1/embeddings \"HTTP/1.1 200 OK\"INFO:run_tasks(tasks: [Task], data):Coroutine task completed: `add_data_points`INFO:run_tasks(tasks: [Task], data):Pipeline run completed: `fd2ed59d-b550-5b05-bbe6-7b708fe12483`"
]
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"<cognee.modules.pipelines.models.PipelineRun.PipelineRun object at 0x30016fd40>\n",
"Done\n"
]
}
],
"execution_count": 14
},
{
"metadata": {},
"cell_type": "markdown",
"source": [
"## Search Pokémon Data\n",
"### Execute a search query using Cognee.\n"
],
"id": "e0d98d9832a2797a"
},
{
"metadata": {
"ExecuteTime": {
"end_time": "2025-03-04T12:00:02.878871Z",
"start_time": "2025-03-04T11:59:59.571965Z"
}
},
"cell_type": "code",
"source": [
"from cognee.api.v1.search import SearchType\n",
"\n",
"search_results = await cognee.search(\n",
" query_type=SearchType.GRAPH_COMPLETION,\n",
" query_text=\"pokemons?\"\n",
")\n",
"\n",
"print(\"Search results:\")\n",
"for result_text in search_results:\n",
" print(result_text)"
],
"id": "bb2476b6b0c2aff",
"outputs": [
{
"name": "stderr",
"output_type": "stream",
"text": [
"INFO:httpx:HTTP Request: POST https://api.openai.com/v1/embeddings \"HTTP/1.1 200 OK\"INFO:httpx:HTTP Request: POST https://api.openai.com/v1/embeddings \"HTTP/1.1 200 OK\"INFO:httpx:HTTP Request: POST https://api.openai.com/v1/embeddings \"HTTP/1.1 200 OK\"INFO:httpx:HTTP Request: POST https://api.openai.com/v1/embeddings \"HTTP/1.1 200 OK\"INFO:httpx:HTTP Request: POST https://api.openai.com/v1/embeddings \"HTTP/1.1 200 OK\"INFO:httpx:HTTP Request: POST https://api.openai.com/v1/embeddings \"HTTP/1.1 200 OK\"INFO:httpx:HTTP Request: POST https://api.openai.com/v1/embeddings \"HTTP/1.1 200 OK\"INFO:httpx:HTTP Request: POST https://api.openai.com/v1/embeddings \"HTTP/1.1 200 OK\"INFO:httpx:HTTP Request: POST https://api.openai.com/v1/embeddings \"HTTP/1.1 200 OK\"INFO:httpx:HTTP Request: POST https://api.openai.com/v1/embeddings \"HTTP/1.1 200 OK\"INFO:httpx:HTTP Request: POST https://api.openai.com/v1/embeddings \"HTTP/1.1 200 OK\"INFO:httpx:HTTP Request: POST https://api.openai.com/v1/embeddings \"HTTP/1.1 200 OK\"INFO:httpx:HTTP Request: POST https://api.openai.com/v1/embeddings \"HTTP/1.1 200 OK\"INFO:httpx:HTTP Request: POST https://api.openai.com/v1/embeddings \"HTTP/1.1 200 OK\"INFO:httpx:HTTP Request: POST https://api.openai.com/v1/embeddings \"HTTP/1.1 200 OK\"INFO:httpx:HTTP Request: POST https://api.openai.com/v1/embeddings \"HTTP/1.1 200 OK\"INFO:httpx:HTTP Request: POST https://api.openai.com/v1/embeddings \"HTTP/1.1 200 OK\"INFO:httpx:HTTP Request: POST https://api.openai.com/v1/embeddings \"HTTP/1.1 200 OK\"INFO:httpx:HTTP Request: POST https://api.openai.com/v1/embeddings \"HTTP/1.1 200 OK\"INFO:httpx:HTTP Request: POST https://api.openai.com/v1/embeddings \"HTTP/1.1 200 OK\"INFO:httpx:HTTP Request: POST https://api.openai.com/v1/embeddings \"HTTP/1.1 200 OK\"WARNING:neo4j.notifications:Received notification from DBMS server: {severity: WARNING} {code: Neo.ClientNotification.Statement.FeatureDeprecationWarning} {category: DEPRECATION} {title: This feature is deprecated and will be removed in future versions.} {description: The query used a deprecated function: `id`.} {position: line: 1, column: 18, offset: 17} for query: 'MATCH (n) RETURN ID(n) AS id, labels(n) AS labels, properties(n) AS properties'WARNING:neo4j.notifications:Received notification from DBMS server: {severity: WARNING} {code: Neo.ClientNotification.Statement.FeatureDeprecationWarning} {category: DEPRECATION} {title: This feature is deprecated and will be removed in future versions.} {description: The query used a deprecated function: `id`.} {position: line: 3, column: 16, offset: 43} for query: '\\n MATCH (n)-[r]->(m)\\n RETURN ID(n) AS source, ID(m) AS target, TYPE(r) AS type, properties(r) AS properties\\n 'WARNING:neo4j.notifications:Received notification from DBMS server: {severity: WARNING} {code: Neo.ClientNotification.Statement.FeatureDeprecationWarning} {category: DEPRECATION} {title: This feature is deprecated and will be removed in future versions.} {description: The query used a deprecated function: `id`.} {position: line: 3, column: 33, offset: 60} for query: '\\n MATCH (n)-[r]->(m)\\n RETURN ID(n) AS source, ID(m) AS target, TYPE(r) AS type, properties(r) AS properties\\n 'INFO:httpx:HTTP Request: POST https://api.openai.com/v1/embeddings \"HTTP/1.1 200 OK\"INFO:httpx:HTTP Request: POST https://api.openai.com/v1/embeddings \"HTTP/1.1 200 OK\"\u001B[92m13:00:02 - LiteLLM:INFO\u001B[0m: utils.py:2784 - \n",
"LiteLLM completion() model= gpt-4o-mini; provider = openaiINFO:LiteLLM:\n",
"LiteLLM completion() model= gpt-4o-mini; provider = openai"
]
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"Search results:\n",
"The Pokemons mentioned are: golbat, jigglypuff, raichu, vulpix, and pikachu.\n"
]
}
],
"execution_count": 15
},
{
"metadata": {},
"cell_type": "code",
"outputs": [],
"execution_count": null,
"source": "",
"id": "a4c2d3e9c15b017"
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"version": "3.8.5"
}
},
"nbformat": 4,
"nbformat_minor": 5
}


@@ -1,6 +1,6 @@
[tool.poetry]
name = "cognee"
version = "0.1.32"
version = "0.1.33"
description = "Cognee - is a library for enriching LLM context with a semantic layer for better understanding and reasoning."
authors = ["Vasilije Markovic", "Boris Arzentar"]
readme = "README.md"