Merge branch 'dev' of github.com:topoteretes/cognee into dev

commit 816d99309d

32 changed files with 1264 additions and 132 deletions

CONTRIBUTING.md (149 changes)
@@ -1,97 +1,128 @@
-# 🚀 How to Contribute to **cognee**
+# 🎉 Welcome to **cognee**!

-Thank you for investing time in contributing to our project! Here's a guide to get you started.
+We're excited that you're interested in contributing to our project!
+
+We want to ensure that every user and contributor feels welcome, included and supported to participate in cognee community.
+
+This guide will help you get started and ensure your contributions can be efficiently integrated into the project.

-## 1. 🚀 Getting Started
+## 🌟 Quick Links

-### 🍴 Fork the Repository
+- [Code of Conduct](CODE_OF_CONDUCT.md)
+- [Discord Community](https://discord.gg/bcy8xFAtfd)
+- [Issue Tracker](https://github.com/topoteretes/cognee/issues)

-To start your journey, you'll need your very own copy of **cognee**. Think of it as your own innovation lab. 🧪
+## 1. 🚀 Ways to Contribute

-1. Navigate to the [**cognee**](https://github.com/topoteretes/cognee) repository on GitHub.
-2. In the upper-right corner, click the **'Fork'** button.
+You can contribute to **cognee** in many ways:

-### 🚀 Clone the Repository
+- 📝 Submitting bug reports or feature requests
+- 💡 Improving documentation
+- 🔍 Reviewing pull requests
+- 🛠️ Contributing code or tests
+- 🌐 Helping other users

-Next, let's bring your newly forked repository to your local machine.
+## 📫 Get in Touch
+
+There are several ways to connect with the **cognee** team and community:
+
+### GitHub Collaboration
+
+- [Open an issue](https://github.com/topoteretes/cognee/issues) for bug reports, feature requests, or discussions
+- Submit pull requests to contribute code or documentation
+- Join ongoing discussions in existing issues and PRs
+
+### Community Channels
+
+- Join our [Discord community](https://discord.gg/bcy8xFAtfd) for real-time discussions
+- Participate in community events and discussions
+- Get help from other community members
+
+### Direct Contact
+
+- Email: vasilije@cognee.ai
+- For business inquiries or sensitive matters, please reach out via email
+- For general questions, prefer public channels like GitHub issues or Discord
+
+We aim to respond to all communications within 2 business days. For faster responses, consider using our Discord channel where the whole community can help!
+
+## Issue Labels
+
+To help you find the most appropriate issues to work on, we use the following labels:
+
+- `good first issue` - Perfect for newcomers to the project
+- `bug` - Something isn't working as expected
+- `documentation` - Improvements or additions to documentation
+- `enhancement` - New features or improvements
+- `help wanted` - Extra attention or assistance needed
+- `question` - Further information is requested
+- `wontfix` - This will not be worked on
+
+Looking for a place to start? Try filtering for [good first issues](https://github.com/topoteretes/cognee/labels/good%20first%20issue)!
+
+## 2. 🛠️ Development Setup
+
+### Fork and Clone
+
+1. Fork the [**cognee**](https://github.com/topoteretes/cognee) repository
+2. Clone your fork:
 ```shell
 git clone https://github.com/<your-github-username>/cognee.git
+cd cognee
 ```

-## 2. 🛠️ Making Changes
-
-### 🌟 Create a Branch
-
-Get ready to channel your creativity. Begin by creating a new branch for your incredible features. 🧞♂️
+### Create a Branch
+
+Create a new branch for your work:
 ```shell
 git checkout -b feature/your-feature-name
 ```

-### ✏️ Make Your Changes
-
-Now's your chance to shine! Dive in and make your contributions. 🌠
-
-## 3. 🚀 Submitting Changes
-
-After making your changes, follow these steps:
-
-### ✅ Run the Tests
-
-Ensure your changes do not break the existing codebase:
+## 3. 🎯 Making Changes
+
+1. **Code Style**: Follow the project's coding standards
+2. **Documentation**: Update relevant documentation
+3. **Tests**: Add tests for new features
+4. **Commits**: Write clear commit messages
+
+### Running Tests
 ```shell
 python cognee/cognee/tests/test_library.py
 ```

-### 🚢 Push Your Feature Branch
+## 4. 📤 Submitting Changes
+
+1. Push your changes:
 ```shell
-# Add your changes to the staging area:
 git add .
-
-# Commit changes with an adequate description:
-git commit -m "Describe your changes here"
-
-# Push your feature branch to your forked repository:
+git commit -s -m "Description of your changes"
 git push origin feature/your-feature-name
 ```

-### 🚀 Create a Pull Request
-
-You're on the verge of completion! It's time to showcase your hard work. 🌐
-
-1. Go to [**cognee**](https://github.com/topoteretes/cognee) on GitHub.
-2. Hit the **"Compare & Pull Request"** button.
-3. Select the base branch (main) and the compare branch (the one with your features).
-4. Craft a **compelling title** and provide a **detailed description** of your contributions. 🎩
-
-## 4. 🔍 Review and Approval
-
-The project maintainers will review your work, possibly suggest improvements, or request further details. Once you receive approval, your contributions will become part of **cognee**!
+2. Create a Pull Request:
+- Go to the [**cognee** repository](https://github.com/topoteretes/cognee)
+- Click "Compare & Pull Request"
+- Fill in the PR template with details about your changes
+
+## 5. 📜 Developer Certificate of Origin (DCO)
+
+All contributions must be signed-off to indicate agreement with our DCO:
+
+```shell
+git config alias.cos "commit -s" # Create alias for signed commits
+```
+
+When your PR is ready, please include:
+
+> "I affirm that all code in every commit of this pull request conforms to the terms of the Topoteretes Developer Certificate of Origin"
+
+## 6. 🤝 Community Guidelines
+
+- Be respectful and inclusive
+- Help others learn and grow
+- Follow our [Code of Conduct](CODE_OF_CONDUCT.md)
+- Provide constructive feedback
+- Ask questions when unsure
+
+## 7. 📫 Getting Help
+
+- Open an [issue](https://github.com/topoteretes/cognee/issues)
+- Join our Discord community
+- Check existing documentation
+
+Thank you for contributing to **cognee**! 🌟

-## 5. Developer Certificate of Origin
-
-All contributions to the topoteretes codebase must be signed-off to indicate you have read and agreed to the Developer Certificate of Origin (DCO), which is in the root directory under name DCO. To sign the DCO, simply add -s after all commits that you make, to do this easily you can make a git alias from the command line, for example:
-
-$ git config alias.cos "commit -s"
-
-Will allow you to write git cos which will automatically sign-off your commit. By signing a commit you are agreeing to the DCO and agree that you will be banned from the topoteretes GitHub organisation and Discord server if you violate the DCO.
-
-"When a commit is ready to be merged please use the following template to agree to our developer certificate of origin:
-
-'I affirm that all code in every commit of this pull request conforms to the terms of the Topoteretes Developer Certificate of Origin'
-
-We consider the following as violations to the DCO:
-
-Signing the DCO with a fake name or pseudonym, if you are registered on GitHub or another platform with a fake name then you will not be able to contribute to topoteretes before updating your name;
-Submitting a contribution that you did not have the right to submit whether due to licensing, copyright, or any other restrictions.
-
-## 6. 📜 Code of Conduct
-
-Ensure you adhere to the project's [Code of Conduct](https://github.com/topoteretes/cognee/blob/main/CODE_OF_CONDUCT.md) throughout your participation.
-
-## 7. 📫 Contact
-
-If you need assistance or simply wish to connect, we're here for you. Contact us by filing an issue on the GitHub repository or by messaging us on our Discord server.
-
-Thanks for helping to evolve **cognee**!
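Reviewer note: the new `alias.cos` shown above can be exercised end to end. A minimal sketch, assuming a throwaway repository and an illustrative user identity (everything here is hypothetical scaffolding, not project configuration):

```shell
# Sketch: verify in a scratch repo that the `cos` alias from the new
# CONTRIBUTING text adds a Signed-off-by trailer to the commit message.
set -e
tmp=$(mktemp -d)
cd "$tmp"
git init -q .
git config user.name "Example User"
git config user.email "user@example.com"
git config alias.cos "commit -s"
echo hello > file.txt
git add file.txt
git cos -m "Add file"   # equivalent to: git commit -s -m "Add file"
git log -1 --format=%B | grep "Signed-off-by: Example User <user@example.com>"
```

The final `grep` fails (non-zero exit) if the sign-off trailer is missing, which makes the sketch usable as a quick self-check before pushing.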
@@ -8,11 +8,11 @@
   cognee - memory layer for AI apps and Agents

 <p align="center">
+  <a href="https://www.youtube.com/watch?v=1bezuvLwJmw&t=2s">Demo</a>
+  .
   <a href="https://cognee.ai">Learn more</a>
   ·
   <a href="https://discord.gg/NQPKmU5CCg">Join Discord</a>
-  ·
-  <a href="https://www.youtube.com/watch?v=1bezuvLwJmw&t=2s">Demo</a>
 </p>
@@ -89,7 +89,7 @@ Add LLM_API_KEY to .env using the command bellow.
 ```
 echo "LLM_API_KEY=YOUR_OPENAI_API_KEY" > .env
 ```
-You can see available env variables in the repository `.env.template` file.
+You can see available env variables in the repository `.env.template` file. If you don't specify it otherwise, like in this example, SQLite (relational database), LanceDB (vector database) and NetworkX (graph store) will be used as default components.

 This script will run the default pipeline:
@@ -1,34 +1,35 @@
 # Use a Python image with uv pre-installed
 FROM ghcr.io/astral-sh/uv:python3.12-bookworm-slim AS uv

-# Set build argument
-ARG DEBUG
-
-# Set environment variable based on the build argument
-ENV DEBUG=${DEBUG}
-ENV PIP_NO_CACHE_DIR=true
-
+# Install the project into `/app`
 WORKDIR /app

 # Enable bytecode compilation
-ENV UV_COMPILE_BYTECODE=1
+# ENV UV_COMPILE_BYTECODE=1

 # Copy from the cache instead of linking since it's a mounted volume
 ENV UV_LINK_MODE=copy

-RUN apt-get update && apt-get install -y \
-    gcc \
-    libpq-dev
-
-RUN apt-get install -y \
-    gcc \
-    libpq-dev
-
-COPY . /app
-
-RUN uv sync --reinstall
+# Install the project's dependencies using the lockfile and settings
+RUN --mount=type=cache,target=/root/.cache/uv \
+    --mount=type=bind,source=uv.lock,target=uv.lock \
+    --mount=type=bind,source=pyproject.toml,target=pyproject.toml \
+    uv sync --frozen --no-install-project --no-dev --no-editable
+
+# Then, add the rest of the project source code and install it
+# Installing separately from its dependencies allows optimal layer caching
+ADD . /app
+RUN --mount=type=cache,target=/root/.cache/uv \
+    uv sync --frozen --no-dev --no-editable
+
+FROM python:3.12-slim-bookworm
+
+WORKDIR /app
+
+COPY --from=uv /root/.local /root/.local
+COPY --from=uv --chown=app:app /app /app

 # Place executables in the environment at the front of the path
-ENV PATH="/app/:/app/.venv/bin:$PATH"
+ENV PATH="/app/.venv/bin:$PATH"

 ENTRYPOINT ["cognee"]
@@ -82,5 +82,5 @@ http://localhost:5173?timeout=120000
 To apply new changes while developing cognee you need to do:

 1. `poetry lock` in cognee folder
-2. `uv sync --dev --all-extras --reinstall `
+2. `uv sync --dev --all-extras --reinstall`
 3. `mcp dev src/server.py`
@@ -1,12 +1,12 @@
 [project]
 name = "cognee-mcp"
-version = "0.1.0"
+version = "0.2.0"
 description = "A MCP server project"
 readme = "README.md"
 requires-python = ">=3.10"

 dependencies = [
-    "cognee[codegraph,gemini,huggingface]",
+    "cognee[postgres,codegraph,gemini,huggingface]",
     "mcp==1.2.1",
     "uv>=0.6.3",
 ]
cognee-mcp/uv.lock (2 changes, generated)

@@ -547,7 +547,7 @@ huggingface = [

 [[package]]
 name = "cognee-mcp"
-version = "0.1.0"
+version = "0.2.0"
 source = { editable = "." }
 dependencies = [
     { name = "cognee", extra = ["codegraph", "gemini", "huggingface"] },
@@ -29,13 +29,16 @@ class AnswerGeneratorExecutor:
             retrieval_context = await retriever.get_context(query_text)
             search_results = await retriever.get_completion(query_text, retrieval_context)

-            answers.append(
-                {
-                    "question": query_text,
-                    "answer": search_results[0],
-                    "golden_answer": correct_answer,
-                    "retrieval_context": retrieval_context,
-                }
-            )
+            answer = {
+                "question": query_text,
+                "answer": search_results[0],
+                "golden_answer": correct_answer,
+                "retrieval_context": retrieval_context,
+            }
+
+            if "golden_context" in instance:
+                answer["golden_context"] = instance["golden_context"]
+
+            answers.append(answer)

         return answers
@@ -1,6 +1,6 @@
 import logging
 import json
-from typing import List
+from typing import List, Optional
 from cognee.eval_framework.answer_generation.answer_generation_executor import (
     AnswerGeneratorExecutor,
     retriever_options,
@@ -32,7 +32,7 @@ async def create_and_insert_answers_table(questions_payload):


 async def run_question_answering(
-    params: dict, system_prompt="answer_simple_question.txt"
+    params: dict, system_prompt="answer_simple_question.txt", top_k: Optional[int] = None
 ) -> List[dict]:
     if params.get("answering_questions"):
         logging.info("Question answering started...")
@@ -48,7 +48,9 @@ async def run_question_answering(
         answer_generator = AnswerGeneratorExecutor()
         answers = await answer_generator.question_answering_non_parallel(
             questions=questions,
-            retriever=retriever_options[params["qa_engine"]](system_prompt_path=system_prompt),
+            retriever=retriever_options[params["qa_engine"]](
+                system_prompt_path=system_prompt, top_k=top_k
+            ),
         )
         with open(params["answers_path"], "w", encoding="utf-8") as f:
             json.dump(answers, f, ensure_ascii=False, indent=4)
@@ -5,18 +5,21 @@ from cognee.eval_framework.benchmark_adapters.base_benchmark_adapter import Base

 class DummyAdapter(BaseBenchmarkAdapter):
     def load_corpus(
-        self, limit: Optional[int] = None, seed: int = 42
+        self, limit: Optional[int] = None, seed: int = 42, load_golden_context: bool = False
     ) -> tuple[list[str], list[dict[str, Any]]]:
         corpus_list = [
             "The cognee is an AI memory engine that supports different vector and graph databases",
             "Neo4j is a graph database supported by cognee",
         ]
-        question_answer_pairs = [
-            {
-                "answer": "Yes",
-                "question": "Is Neo4j supported by cognee?",
-                "type": "dummy",
-            }
-        ]
+        qa_pair = {
+            "answer": "Yes",
+            "question": "Is Neo4j supported by cognee?",
+            "type": "dummy",
+        }
+
+        if load_golden_context:
+            qa_pair["golden_context"] = "Cognee supports Neo4j and NetworkX"
+
+        question_answer_pairs = [qa_pair]

         return corpus_list, question_answer_pairs
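Reviewer note: the `load_golden_context` change follows a common pattern worth calling out — build the base record, then attach the optional field only when requested, so downstream code can test membership with `in`. A minimal standalone sketch (the function below mirrors the diff's `DummyAdapter.load_corpus` but is an illustrative copy, not the real cognee code):

```python
from typing import Any, Optional


def load_corpus(
    limit: Optional[int] = None, seed: int = 42, load_golden_context: bool = False
) -> tuple[list[str], list[dict[str, Any]]]:
    # Fixed two-document corpus, as in the DummyAdapter change.
    corpus_list = [
        "The cognee is an AI memory engine that supports different vector and graph databases",
        "Neo4j is a graph database supported by cognee",
    ]
    qa_pair = {
        "answer": "Yes",
        "question": "Is Neo4j supported by cognee?",
        "type": "dummy",
    }
    # Attach the golden context only when asked, so that
    # '"golden_context" in qa_pair' stays a meaningful test downstream.
    if load_golden_context:
        qa_pair["golden_context"] = "Cognee supports Neo4j and NetworkX"
    return corpus_list, [qa_pair]


docs, pairs = load_corpus(load_golden_context=True)
print("golden_context" in pairs[0])  # → True
```

This is the same membership test the `AnswerGeneratorExecutor` and `DeepEvalAdapter` changes rely on when deciding whether to carry `golden_context` through to evaluation.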
@@ -28,14 +28,22 @@ class CorpusBuilderExecutor:
         self.questions = None
         self.task_getter = task_getter

-    def load_corpus(self, limit: Optional[int] = None) -> Tuple[List[Dict], List[str]]:
-        self.raw_corpus, self.questions = self.adapter.load_corpus(limit=limit)
+    def load_corpus(
+        self, limit: Optional[int] = None, load_golden_context: bool = False
+    ) -> Tuple[List[Dict], List[str]]:
+        self.raw_corpus, self.questions = self.adapter.load_corpus(
+            limit=limit, load_golden_context=load_golden_context
+        )
         return self.raw_corpus, self.questions

     async def build_corpus(
-        self, limit: Optional[int] = None, chunk_size=1024, chunker=TextChunker
+        self,
+        limit: Optional[int] = None,
+        chunk_size=1024,
+        chunker=TextChunker,
+        load_golden_context: bool = False,
     ) -> List[str]:
-        self.load_corpus(limit=limit)
+        self.load_corpus(limit=limit, load_golden_context=load_golden_context)
         await self.run_cognee(chunk_size=chunk_size, chunker=chunker)
         return self.questions
@@ -47,7 +47,10 @@ async def run_corpus_builder(params: dict, chunk_size=1024, chunker=TextChunker)
         task_getter=task_getter,
     )
     questions = await corpus_builder.build_corpus(
-        limit=params.get("number_of_samples_in_corpus"), chunk_size=chunk_size, chunker=chunker
+        limit=params.get("number_of_samples_in_corpus"),
+        chunk_size=chunk_size,
+        chunker=chunker,
+        load_golden_context=params.get("evaluating_contexts"),
     )
     with open(params["questions_path"], "w", encoding="utf-8") as f:
         json.dump(questions, f, ensure_ascii=False, indent=4)
@@ -4,6 +4,7 @@ from cognee.eval_framework.eval_config import EvalConfig
 from cognee.eval_framework.evaluation.base_eval_adapter import BaseEvalAdapter
 from cognee.eval_framework.evaluation.metrics.exact_match import ExactMatchMetric
 from cognee.eval_framework.evaluation.metrics.f1 import F1ScoreMetric
+from cognee.eval_framework.evaluation.metrics.context_coverage import ContextCoverageMetric
 from typing import Any, Dict, List
 from deepeval.metrics import ContextualRelevancyMetric

@@ -15,6 +16,7 @@ class DeepEvalAdapter(BaseEvalAdapter):
             "EM": ExactMatchMetric(),
             "f1": F1ScoreMetric(),
             "contextual_relevancy": ContextualRelevancyMetric(),
+            "context_coverage": ContextCoverageMetric(),
         }

     async def evaluate_answers(

@@ -32,6 +34,7 @@ class DeepEvalAdapter(BaseEvalAdapter):
                 actual_output=answer["answer"],
                 expected_output=answer["golden_answer"],
                 retrieval_context=[answer["retrieval_context"]],
+                context=[answer["golden_context"]] if "golden_context" in answer else None,
             )
             metric_results = {}
             for metric in evaluator_metrics:
@@ -23,5 +23,6 @@ class EvaluationExecutor:
     async def execute(self, answers: List[Dict[str, str]], evaluator_metrics: Any) -> Any:
         if self.evaluate_contexts:
             evaluator_metrics.append("contextual_relevancy")
+            evaluator_metrics.append("context_coverage")
         metrics = await self.eval_adapter.evaluate_answers(answers, evaluator_metrics)
         return metrics
cognee/eval_framework/evaluation/metrics/context_coverage.py (new file, 50 lines)

@@ -0,0 +1,50 @@
+from deepeval.metrics import SummarizationMetric
+from deepeval.test_case import LLMTestCase
+from deepeval.metrics.summarization.schema import ScoreType
+from deepeval.metrics.indicator import metric_progress_indicator
+from deepeval.utils import get_or_create_event_loop
+
+
+class ContextCoverageMetric(SummarizationMetric):
+    def measure(
+        self,
+        test_case,
+        _show_indicator: bool = True,
+    ) -> float:
+        mapped_test_case = LLMTestCase(
+            input=test_case.context[0],
+            actual_output=test_case.retrieval_context[0],
+        )
+        self.assessment_questions = None
+        self.evaluation_cost = 0 if self.using_native_model else None
+        with metric_progress_indicator(self, _show_indicator=_show_indicator):
+            if self.async_mode:
+                loop = get_or_create_event_loop()
+                return loop.run_until_complete(
+                    self.a_measure(mapped_test_case, _show_indicator=False)
+                )
+            else:
+                self.coverage_verdicts = self._generate_coverage_verdicts(mapped_test_case)
+                self.alignment_verdicts = []
+                self.score = self._calculate_score(ScoreType.COVERAGE)
+                self.reason = self._generate_reason()
+                self.success = self.score >= self.threshold
+                return self.score
+
+    async def a_measure(
+        self,
+        test_case,
+        _show_indicator: bool = True,
+    ) -> float:
+        self.evaluation_cost = 0 if self.using_native_model else None
+        with metric_progress_indicator(
+            self,
+            async_mode=True,
+            _show_indicator=_show_indicator,
+        ):
+            self.coverage_verdicts = await self._a_generate_coverage_verdicts(test_case)
+            self.alignment_verdicts = []
+            self.score = self._calculate_score(ScoreType.COVERAGE)
+            self.reason = await self._a_generate_reason()
+            self.success = self.score >= self.threshold
+            return self.score
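Reviewer note: the `measure`/`a_measure` split in the new metric follows a sync-wraps-async dispatch: the synchronous entry point detects `async_mode` and drives the coroutine on an event loop itself, so sync callers still get a plain float back. A minimal standalone sketch of just that dispatch, with the deepeval machinery replaced by a stub (all names below are illustrative, not the real cognee or deepeval API):

```python
import asyncio


class MiniMetric:
    """Stub showing the sync-wraps-async dispatch used by the new metric."""

    def __init__(self, async_mode: bool = True, threshold: float = 0.5):
        self.async_mode = async_mode
        self.threshold = threshold

    def measure(self, value: float) -> float:
        if self.async_mode:
            # Sync callers still get a plain float: run the coroutine here.
            # (The real code uses deepeval's get_or_create_event_loop.)
            return asyncio.new_event_loop().run_until_complete(self.a_measure(value))
        self.score = value
        self.success = self.score >= self.threshold
        return self.score

    async def a_measure(self, value: float) -> float:
        # The async path does the real scoring work.
        self.score = value
        self.success = self.score >= self.threshold
        return self.score


print(MiniMetric().measure(0.8))  # → 0.8
```

The design keeps one scoring implementation (`a_measure`) while preserving a synchronous public interface, which is why both branches in the real metric end by setting `score`, `reason`, and `success` identically.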
@@ -3,6 +3,12 @@ import plotly.graph_objects as go
 from typing import Dict, List, Tuple
 from collections import defaultdict

+metrics_fields = {
+    "contextual_relevancy": ["question", "retrieval_context"],
+    "context_coverage": ["question", "retrieval_context", "golden_context"],
+}
+default_metrics_fields = ["question", "answer", "golden_answer"]
+

 def create_distribution_plots(metrics_data: Dict[str, List[float]]) -> List[str]:
     """Create distribution histogram plots for each metric."""

@@ -59,38 +65,30 @@ def generate_details_html(metrics_data: List[Dict]) -> List[str]:
         for metric, values in entry["metrics"].items():
             if metric not in metric_details:
                 metric_details[metric] = []
+            current_metrics_fields = metrics_fields.get(metric, default_metrics_fields)
             metric_details[metric].append(
-                {
-                    "question": entry["question"],
-                    "answer": entry["answer"],
-                    "golden_answer": entry["golden_answer"],
+                {key: entry[key] for key in current_metrics_fields}
+                | {
                     "reason": values.get("reason", ""),
                     "score": values["score"],
                 }
             )

     for metric, details in metric_details.items():
+        formatted_column_names = [key.replace("_", " ").title() for key in details[0].keys()]
         details_html.append(f"<h3>{metric} Details</h3>")
-        details_html.append("""
+        details_html.append(f"""
             <table class="metric-table">
                 <tr>
-                    <th>Question</th>
-                    <th>Answer</th>
-                    <th>Golden Answer</th>
-                    <th>Reason</th>
-                    <th>Score</th>
+                    {"".join(f"<th>{col}</th>" for col in formatted_column_names)}
                 </tr>
         """)
         for item in details:
-            details_html.append(
-                f"<tr>"
-                f"<td>{item['question']}</td>"
-                f"<td>{item['answer']}</td>"
-                f"<td>{item['golden_answer']}</td>"
-                f"<td>{item['reason']}</td>"
-                f"<td>{item['score']}</td>"
-                f"</tr>"
-            )
+            details_html.append(f"""
+                <tr>
+                    {"".join(f"<td>{value}</td>" for value in item.values())}
+                </tr>
+            """)
         details_html.append("</table>")
     return details_html
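Reviewer note: the dashboard rewrite leans on the dict union operator (`|`, Python 3.9+) to splice per-metric columns together with the shared `reason`/`score` columns, and union preserves insertion order (left operand's keys first). A self-contained sketch of that row-building step, with made-up `entry` data (the real entries come from the eval results file):

```python
# Per-metric column selection, as in the dashboard change.
metrics_fields = {
    "contextual_relevancy": ["question", "retrieval_context"],
}
default_metrics_fields = ["question", "answer", "golden_answer"]

# Hypothetical eval-result entry, for illustration only.
entry = {
    "question": "Is Neo4j supported?",
    "answer": "Yes",
    "golden_answer": "Yes",
    "retrieval_context": "Neo4j is a graph database supported by cognee",
}
values = {"reason": "exact match", "score": 1.0}

# Pick the metric's columns, then merge in the shared ones with `|`.
fields = metrics_fields.get("contextual_relevancy", default_metrics_fields)
row = {key: entry[key] for key in fields} | {
    "reason": values.get("reason", ""),
    "score": values["score"],
}
print(list(row))  # → ['question', 'retrieval_context', 'reason', 'score']
```

Because key order is stable, the table header can later be derived from `details[0].keys()` exactly as the new `formatted_column_names` line does.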
@@ -13,15 +13,17 @@ class CompletionRetriever(BaseRetriever):
         self,
         user_prompt_path: str = "context_for_question.txt",
         system_prompt_path: str = "answer_simple_question.txt",
+        top_k: Optional[int] = 1,
     ):
         """Initialize retriever with optional custom prompt paths."""
         self.user_prompt_path = user_prompt_path
         self.system_prompt_path = system_prompt_path
+        self.top_k = top_k if top_k is not None else 1

     async def get_context(self, query: str) -> Any:
         """Retrieves relevant document chunks as context."""
         vector_engine = get_vector_engine()
-        found_chunks = await vector_engine.search("DocumentChunk_text", query, limit=1)
+        found_chunks = await vector_engine.search("DocumentChunk_text", query, limit=self.top_k)
         if len(found_chunks) == 0:
             raise NoRelevantDataFound
         return found_chunks[0].payload["text"]
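Reviewer note: the `top_k = top_k if top_k is not None else 1` idiom repeated across the retriever changes lets callers pass `None` to mean "use this retriever's own default", which is what the optional `top_k` threaded through `run_question_answering` relies on. A small illustrative sketch (the class is a simplified stand-in, not the real cognee retriever):

```python
from typing import Optional


class SimpleRetriever:
    """Illustrative stand-in: None means 'use my default top_k'."""

    DEFAULT_TOP_K = 1

    def __init__(self, top_k: Optional[int] = None):
        # Mirrors the diff: a None passed down from optional plumbing
        # falls back to the retriever's own default instead of erroring.
        self.top_k = top_k if top_k is not None else self.DEFAULT_TOP_K


print(SimpleRetriever().top_k)         # → 1
print(SimpleRetriever(top_k=5).top_k)  # → 5
```

Each retriever keeps its own fallback (1 for chunk completion, 5 for the graph retrievers), so a single `top_k=None` default in the shared entry point doesn't flatten their distinct behaviors.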
@@ -15,12 +15,12 @@ class GraphCompletionRetriever(BaseRetriever):
         self,
         user_prompt_path: str = "graph_context_for_question.txt",
         system_prompt_path: str = "answer_simple_question.txt",
-        top_k: int = 5,
+        top_k: Optional[int] = 5,
     ):
         """Initialize retriever with prompt paths and search parameters."""
         self.user_prompt_path = user_prompt_path
         self.system_prompt_path = system_prompt_path
-        self.top_k = top_k
+        self.top_k = top_k if top_k is not None else 5

     async def resolve_edges_to_text(self, retrieved_edges: list) -> str:
         """Converts retrieved graph edges into a human-readable string format."""
```diff
@@ -12,7 +12,7 @@ class GraphSummaryCompletionRetriever(GraphCompletionRetriever):
         user_prompt_path: str = "graph_context_for_question.txt",
         system_prompt_path: str = "answer_simple_question.txt",
         summarize_prompt_path: str = "summarize_search_results.txt",
-        top_k: int = 5,
+        top_k: Optional[int] = 5,
     ):
         """Initialize retriever with default prompt paths and search parameters."""
         super().__init__(
```
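The retriever changes above switch `top_k` to `Optional[int]` and coalesce an explicit `None` back to a default in the constructor. A minimal self-contained sketch of that pattern (the `Retriever` class here is illustrative, not cognee's actual code):

```python
from typing import Optional


class Retriever:
    def __init__(self, top_k: Optional[int] = 5):
        # A caller may pass top_k=None explicitly; fall back to 5 then,
        # instead of storing None and breaking a later `limit=self.top_k`.
        self.top_k = top_k if top_k is not None else 5


print(Retriever().top_k)            # → 5 (default)
print(Retriever(top_k=None).top_k)  # → 5 (explicit None still coalesces)
print(Retriever(top_k=10).top_k)    # → 10
```

The point of the guard is that a plain default (`top_k: int = 5`) only protects against the argument being omitted, not against `None` being passed through from a caller's own optional parameter.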
```diff
@@ -5,7 +5,12 @@ import sys

 with patch.dict(
     sys.modules,
-    {"deepeval": MagicMock(), "deepeval.metrics": MagicMock(), "deepeval.test_case": MagicMock()},
+    {
+        "deepeval": MagicMock(),
+        "deepeval.metrics": MagicMock(),
+        "deepeval.test_case": MagicMock(),
+        "cognee.eval_framework.evaluation.metrics.context_coverage": MagicMock(),
+    },
 ):
     from cognee.eval_framework.evaluation.deep_eval_adapter import DeepEvalAdapter
```
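The test above stubs out optional dependencies before importing the adapter. The underlying technique is `unittest.mock.patch.dict` over `sys.modules`: any module name registered there is returned directly by `import` without touching the filesystem. A small sketch (`fake_optional_dep` is a made-up module name for illustration):

```python
import sys
from unittest.mock import MagicMock, patch

# Temporarily register a stub module so an import of a dependency that
# may not be installed still succeeds inside the `with` block.
with patch.dict(sys.modules, {"fake_optional_dep": MagicMock()}):
    import fake_optional_dep  # resolved from the patched sys.modules
    fake_optional_dep.SomeMetric()  # a MagicMock accepts any attribute/call

# patch.dict restores sys.modules on exit, so the stub disappears again.
print("fake_optional_dep" in sys.modules)  # → False
```

This keeps the import-time stubbing scoped: code imported inside the block binds to the mock, and the interpreter's module table is restored afterwards.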
**examples/python/pokemon_datapoints_example.py** (new file, 206 lines)

```python
# Standard library imports
import os
import json
import asyncio
import pathlib
from uuid import uuid5, NAMESPACE_OID
from typing import List, Optional
from pathlib import Path

import dlt
import requests
import cognee
from cognee.low_level import DataPoint, setup as cognee_setup
from cognee.api.v1.search import SearchType
from cognee.tasks.storage import add_data_points
from cognee.modules.pipelines.tasks.Task import Task
from cognee.modules.pipelines import run_tasks


BASE_URL = "https://pokeapi.co/api/v2/"
os.environ["BUCKET_URL"] = "./.data_storage"
os.environ["DATA_WRITER__DISABLE_COMPRESSION"] = "true"


# Data Models
class Abilities(DataPoint):
    name: str = "Abilities"
    metadata: dict = {"index_fields": ["name"]}


class PokemonAbility(DataPoint):
    name: str
    ability__name: str
    ability__url: str
    is_hidden: bool
    slot: int
    _dlt_load_id: str
    _dlt_id: str
    _dlt_parent_id: str
    _dlt_list_idx: str
    is_type: Abilities
    metadata: dict = {"index_fields": ["ability__name"]}


class Pokemons(DataPoint):
    name: str = "Pokemons"
    have: Abilities
    metadata: dict = {"index_fields": ["name"]}


class Pokemon(DataPoint):
    name: str
    base_experience: int
    height: int
    weight: int
    is_default: bool
    order: int
    location_area_encounters: str
    species__name: str
    species__url: str
    cries__latest: str
    cries__legacy: str
    sprites__front_default: str
    sprites__front_shiny: str
    sprites__back_default: Optional[str]
    sprites__back_shiny: Optional[str]
    _dlt_load_id: str
    _dlt_id: str
    is_type: Pokemons
    abilities: List[PokemonAbility]
    metadata: dict = {"index_fields": ["name"]}


# Data Collection Functions
@dlt.resource(write_disposition="replace")
def pokemon_list(limit: int = 50):
    response = requests.get(f"{BASE_URL}pokemon", params={"limit": limit})
    response.raise_for_status()
    yield response.json()["results"]


@dlt.transformer(data_from=pokemon_list)
def pokemon_details(pokemons):
    """Fetches detailed info for each Pokémon"""
    for pokemon in pokemons:
        response = requests.get(pokemon["url"])
        response.raise_for_status()
        yield response.json()


# Data Loading Functions
def load_abilities_data(jsonl_abilities):
    abilities_root = Abilities()
    pokemon_abilities = []

    for jsonl_ability in jsonl_abilities:
        with open(jsonl_ability, "r") as f:
            for line in f:
                ability = json.loads(line)
                ability["id"] = uuid5(NAMESPACE_OID, ability["_dlt_id"])
                ability["name"] = ability["ability__name"]
                ability["is_type"] = abilities_root
                pokemon_abilities.append(ability)

    return abilities_root, pokemon_abilities


def load_pokemon_data(jsonl_pokemons, pokemon_abilities, pokemon_root):
    pokemons = []

    for jsonl_pokemon in jsonl_pokemons:
        with open(jsonl_pokemon, "r") as f:
            for line in f:
                pokemon_data = json.loads(line)
                abilities = [
                    ability
                    for ability in pokemon_abilities
                    if ability["_dlt_parent_id"] == pokemon_data["_dlt_id"]
                ]
                pokemon_data["external_id"] = pokemon_data["id"]
                pokemon_data["id"] = uuid5(NAMESPACE_OID, str(pokemon_data["id"]))
                pokemon_data["abilities"] = [PokemonAbility(**ability) for ability in abilities]
                pokemon_data["is_type"] = pokemon_root
                pokemons.append(Pokemon(**pokemon_data))

    return pokemons


# Main Application Logic
async def setup_and_process_data():
    """Setup configuration and process Pokemon data"""
    # Setup configuration
    data_directory_path = str(
        pathlib.Path(os.path.join(pathlib.Path(__file__).parent, ".data_storage")).resolve()
    )
    cognee_directory_path = str(
        pathlib.Path(os.path.join(pathlib.Path(__file__).parent, ".cognee_system")).resolve()
    )

    cognee.config.data_root_directory(data_directory_path)
    cognee.config.system_root_directory(cognee_directory_path)

    # Initialize pipeline and collect data
    pipeline = dlt.pipeline(
        pipeline_name="pokemon_pipeline",
        destination="filesystem",
        dataset_name="pokemon_data",
    )
    info = pipeline.run([pokemon_list, pokemon_details])
    print(info)

    # Load and process data
    STORAGE_PATH = Path(".data_storage/pokemon_data/pokemon_details")
    jsonl_pokemons = sorted(STORAGE_PATH.glob("*.jsonl"))
    if not jsonl_pokemons:
        raise FileNotFoundError("No JSONL files found in the storage directory.")

    ABILITIES_PATH = Path(".data_storage/pokemon_data/pokemon_details__abilities")
    jsonl_abilities = sorted(ABILITIES_PATH.glob("*.jsonl"))
    if not jsonl_abilities:
        raise FileNotFoundError("No JSONL files found in the storage directory.")

    # Process data
    abilities_root, pokemon_abilities = load_abilities_data(jsonl_abilities)
    pokemon_root = Pokemons(have=abilities_root)
    pokemons = load_pokemon_data(jsonl_pokemons, pokemon_abilities, pokemon_root)

    return pokemons


async def pokemon_cognify(pokemons):
    """Process Pokemon data with Cognee and perform search"""
    # Setup and run Cognee tasks
    await cognee.prune.prune_data()
    await cognee.prune.prune_system(metadata=True)
    await cognee_setup()

    tasks = [Task(add_data_points, task_config={"batch_size": 50})]
    results = run_tasks(
        tasks=tasks,
        data=pokemons,
        dataset_id=uuid5(NAMESPACE_OID, "Pokemon"),
        pipeline_name="pokemon_pipeline",
    )

    async for result in results:
        print(result)
    print("Done")

    # Perform search
    search_results = await cognee.search(
        query_type=SearchType.GRAPH_COMPLETION, query_text="pokemons?"
    )

    print("Search results:")
    for result_text in search_results:
        print(result_text)


async def main():
    pokemons = await setup_and_process_data()
    await pokemon_cognify(pokemons)


if __name__ == "__main__":
    asyncio.run(main())
```
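The example above derives `DataPoint` ids with `uuid5(NAMESPACE_OID, ...)`. Unlike `uuid4`, `uuid5` is deterministic: the same namespace and name always hash to the same UUID, so re-running the pipeline over the same records produces stable ids (idempotent loads). A quick sketch:

```python
from uuid import uuid5, NAMESPACE_OID, UUID

# Name-based UUIDs: derived from (namespace, name), not from randomness.
a = uuid5(NAMESPACE_OID, "pikachu")
b = uuid5(NAMESPACE_OID, "pikachu")
c = uuid5(NAMESPACE_OID, "bulbasaur")

print(a == b)             # → True: same inputs, same UUID, across runs
print(a == c)             # → False: a different name yields a different UUID
print(isinstance(a, UUID))  # → True
```

This is why the script keeps the original API id under `external_id` and replaces `id` with the derived UUID: the graph store gets a collision-free, reproducible key while the source identifier is preserved.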
**helm/Chart.yaml** (new file, 24 lines)

```yaml
apiVersion: v2
name: cognee-chart
description: A helm chart of the cognee backend deployment on Kubernetes environment

# A chart can be either an 'application' or a 'library' chart.
#
# Application charts are a collection of templates that can be packaged into versioned archives
# to be deployed.
#
# Library charts provide useful utilities or functions for the chart developer. They're included as
# a dependency of application charts to inject those utilities and functions into the rendering
# pipeline. Library charts do not define any templates and therefore cannot be deployed.
type: application

# This is the chart version. This version number should be incremented each time you make changes
# to the chart and its templates, including the app version.
# Versions are expected to follow Semantic Versioning (https://semver.org/)
version: 0.1.0

# This is the version number of the application being deployed. This version number should be
# incremented each time you make changes to the application. Versions are not expected to
# follow Semantic Versioning. They should reflect the version the application is using.
# It is recommended to use it with quotes.
appVersion: "1.16.0"
```
**helm/Dockerfile** (new file, 59 lines)

```dockerfile
FROM python:3.11-slim

# Define Poetry extras to install
ARG POETRY_EXTRAS="\
    # Storage & Databases \
    filesystem postgres weaviate qdrant neo4j falkordb milvus \
    # Notebooks & Interactive Environments \
    notebook \
    # LLM & AI Frameworks \
    langchain llama-index gemini huggingface ollama mistral groq \
    # Evaluation & Monitoring \
    deepeval evals posthog \
    # Graph Processing & Code Analysis \
    codegraph graphiti \
    # Document Processing \
    docs"

# Set build argument
ARG DEBUG

# Set environment variable based on the build argument
ENV DEBUG=${DEBUG}
ENV PIP_NO_CACHE_DIR=true
ENV PATH="${PATH}:/root/.poetry/bin"

RUN apt-get install -y \
    gcc \
    libpq-dev

WORKDIR /app
COPY pyproject.toml poetry.lock /app/

RUN pip install poetry

# Don't create virtualenv since docker is already isolated
RUN poetry config virtualenvs.create false

# Install the dependencies
RUN poetry install --extras "${POETRY_EXTRAS}" --no-root --without dev

# Set the PYTHONPATH environment variable to include the /app directory
ENV PYTHONPATH=/app

COPY cognee/ /app/cognee

# Copy Alembic configuration
COPY alembic.ini /app/alembic.ini
COPY alembic/ /app/alembic

COPY entrypoint.sh /app/entrypoint.sh
RUN chmod +x /app/entrypoint.sh

RUN sed -i 's/\r$//' /app/entrypoint.sh

ENTRYPOINT ["/app/entrypoint.sh"]
```
**helm/README.md** (new file, 25 lines)

# cognee-infra-helm

General infrastructure setup for Cognee on Kubernetes using a Helm chart.

## Prerequisites

Before deploying the Helm chart, ensure the following prerequisites are met:

- **Kubernetes Cluster**: A running Kubernetes cluster (e.g., Minikube, GKE, EKS).
- **Helm**: Installed and configured for your Kubernetes cluster. You can install Helm by following the [official guide](https://helm.sh/docs/intro/install/).
- **kubectl**: Installed and configured to interact with your cluster. Follow the instructions [here](https://kubernetes.io/docs/tasks/tools/install-kubectl/).
- **Clone the Repository**: Clone this repository to your local machine and navigate to the directory.

## Deploy the Helm Chart

```bash
helm install cognee ./cognee-chart
```

**Uninstall the Helm Release**:

```bash
helm uninstall cognee
```
**helm/docker-compose-helm.yml** (new file, 46 lines)

```yaml
services:
  cognee:
    image: cognee-backend:latest
    container_name: cognee-backend
    networks:
      - cognee-network
    build:
      context: .
      dockerfile: Dockerfile
    volumes:
      - .:/app
      - /app/cognee-frontend/ # Ignore frontend code
    environment:
      - HOST=0.0.0.0
      - ENVIRONMENT=local
      - PYTHONPATH=.
    ports:
      - 8000:8000
      # - 5678:5678 # Debugging
    deploy:
      resources:
        limits:
          cpus: '4.0'
          memory: 8GB

  postgres:
    image: pgvector/pgvector:pg17
    container_name: postgres
    environment:
      POSTGRES_USER: cognee
      POSTGRES_PASSWORD: cognee
      POSTGRES_DB: cognee_db
    volumes:
      - postgres_data:/var/lib/postgresql/data
    ports:
      - 5432:5432
    networks:
      - cognee-network

networks:
  cognee-network:
    name: cognee-network

volumes:
  postgres_data:
```
**helm/templates/cognee_deployment.yaml** (new file, 32 lines)

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: {{ .Release.Name }}-cognee
  labels:
    app: {{ .Release.Name }}-cognee
spec:
  replicas: 1
  selector:
    matchLabels:
      app: {{ .Release.Name }}-cognee
  template:
    metadata:
      labels:
        app: {{ .Release.Name }}-cognee
    spec:
      containers:
        - name: cognee
          image: {{ .Values.cognee.image }}
          ports:
            - containerPort: {{ .Values.cognee.port }}
          env:
            - name: HOST
              value: {{ .Values.cognee.env.HOST }}
            - name: ENVIRONMENT
              value: {{ .Values.cognee.env.ENVIRONMENT }}
            - name: PYTHONPATH
              value: {{ .Values.cognee.env.PYTHONPATH }}
          resources:
            limits:
              cpu: {{ .Values.cognee.resources.cpu }}
              memory: {{ .Values.cognee.resources.memory }}
```
**helm/templates/cognee_service.yaml** (new file, 13 lines)

```yaml
apiVersion: v1
kind: Service
metadata:
  name: {{ .Release.Name }}-cognee
  labels:
    app: {{ .Release.Name }}-cognee
spec:
  type: NodePort
  ports:
    - port: {{ .Values.cognee.port }}
      targetPort: {{ .Values.cognee.port }}
  selector:
    app: {{ .Release.Name }}-cognee
```
**helm/templates/postgres_deployment.yaml** (new file, 35 lines)

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: {{ .Release.Name }}-postgres
  labels:
    app: {{ .Release.Name }}-postgres
spec:
  replicas: 1
  selector:
    matchLabels:
      app: {{ .Release.Name }}-postgres
  template:
    metadata:
      labels:
        app: {{ .Release.Name }}-postgres
    spec:
      containers:
        - name: postgres
          image: {{ .Values.postgres.image }}
          ports:
            - containerPort: {{ .Values.postgres.port }}
          env:
            - name: POSTGRES_USER
              value: {{ .Values.postgres.env.POSTGRES_USER }}
            - name: POSTGRES_PASSWORD
              value: {{ .Values.postgres.env.POSTGRES_PASSWORD }}
            - name: POSTGRES_DB
              value: {{ .Values.postgres.env.POSTGRES_DB }}
          volumeMounts:
            - name: postgres-storage
              mountPath: /var/lib/postgresql/data
      volumes:
        - name: postgres-storage
          persistentVolumeClaim:
            claimName: {{ .Release.Name }}-postgres-pvc
```
**helm/templates/postgres_pvc.yaml** (new file, 10 lines)

```yaml
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: {{ .Release.Name }}-postgres-pvc
spec:
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: {{ .Values.postgres.storage }}
```
**helm/templates/postgres_service.yaml** (new file, 14 lines)

```yaml
apiVersion: v1
kind: Service
metadata:
  name: {{ .Release.Name }}-postgres
  labels:
    app: {{ .Release.Name }}-postgres
spec:
  type: ClusterIP
  ports:
    - port: {{ .Values.postgres.port }}
      targetPort: {{ .Values.postgres.port }}
  selector:
    app: {{ .Release.Name }}-postgres
```
**helm/values.yaml** (new file, 22 lines)

```yaml
# Configuration for the 'cognee' application service
cognee:
  # Image name (using the local image we’ll build in Minikube)
  image: "hajdul1988/cognee-backend:latest"
  port: 8000
  env:
    HOST: "0.0.0.0"
    ENVIRONMENT: "local"
    PYTHONPATH: "."
  resources:
    cpu: "4.0"
    memory: "8Gi"

# Configuration for the 'postgres' database service
postgres:
  image: "pgvector/pgvector:pg17"
  port: 5432
  env:
    POSTGRES_USER: "cognee"
    POSTGRES_PASSWORD: "cognee"
    POSTGRES_DB: "cognee_db"
  storage: "8Gi"
```
536
notebooks/pokemon_datapoints_notebook.ipynb
Normal file
536
notebooks/pokemon_datapoints_notebook.ipynb
Normal file
|
|
@ -0,0 +1,536 @@
|
||||||
|
{
|
||||||
|
"cells": [
|
||||||
|
{
|
||||||
|
"metadata": {
|
||||||
|
"ExecuteTime": {
|
||||||
|
"end_time": "2025-03-04T11:58:00.193158Z",
|
||||||
|
"start_time": "2025-03-04T11:58:00.190238Z"
|
||||||
|
}
|
||||||
|
},
|
||||||
|
"cell_type": "code",
|
||||||
|
"source": [
|
||||||
|
"import nest_asyncio\n",
|
||||||
|
"nest_asyncio.apply()"
|
||||||
|
],
|
||||||
|
"id": "2efba278d106bb5f",
|
||||||
|
"outputs": [],
|
||||||
|
"execution_count": 2
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"metadata": {},
|
||||||
|
"cell_type": "markdown",
|
||||||
|
"source": [
|
||||||
|
"### Environment Configuration\n",
|
||||||
|
"#### Setup required directories and environment variables.\n"
|
||||||
|
],
|
||||||
|
"id": "ccbb2bc23aa456ee"
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"metadata": {
|
||||||
|
"ExecuteTime": {
|
||||||
|
"end_time": "2025-03-04T11:59:33.879188Z",
|
||||||
|
"start_time": "2025-03-04T11:59:33.873682Z"
|
||||||
|
}
|
||||||
|
},
|
||||||
|
"cell_type": "code",
|
||||||
|
"source": [
|
||||||
|
"import pathlib\n",
|
||||||
|
"import os\n",
|
||||||
|
"import cognee\n",
|
||||||
|
"\n",
|
||||||
|
"notebook_dir = pathlib.Path().resolve()\n",
|
||||||
|
"data_directory_path = str(notebook_dir / \".data_storage\")\n",
|
||||||
|
"cognee_directory_path = str(notebook_dir / \".cognee_system\")\n",
|
||||||
|
"\n",
|
||||||
|
"cognee.config.data_root_directory(data_directory_path)\n",
|
||||||
|
"cognee.config.system_root_directory(cognee_directory_path)\n",
|
||||||
|
"\n",
|
||||||
|
"BASE_URL = \"https://pokeapi.co/api/v2/\"\n",
|
||||||
|
"os.environ[\"BUCKET_URL\"] = data_directory_path\n",
|
||||||
|
"os.environ[\"DATA_WRITER__DISABLE_COMPRESSION\"] = \"true\"\n"
|
||||||
|
],
|
||||||
|
"id": "662d554f96f211d9",
|
||||||
|
"outputs": [],
|
||||||
|
"execution_count": 8
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"metadata": {},
|
||||||
|
"cell_type": "markdown",
|
||||||
|
"source": [
|
||||||
|
"## Initialize DLT Pipeline\n",
|
||||||
|
"### Create the DLT pipeline to fetch Pokémon data.\n"
|
||||||
|
],
|
||||||
|
"id": "36ae0be71f6e9167"
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"metadata": {
|
||||||
|
"ExecuteTime": {
|
||||||
|
"end_time": "2025-03-04T11:58:03.982939Z",
|
||||||
|
"start_time": "2025-03-04T11:58:03.819676Z"
|
||||||
|
}
|
||||||
|
},
|
||||||
|
"cell_type": "code",
|
||||||
|
"source": [
|
||||||
|
"import dlt\n",
|
||||||
|
"from pathlib import Path\n",
|
||||||
|
"\n",
|
||||||
|
"pipeline = dlt.pipeline(\n",
|
||||||
|
" pipeline_name=\"pokemon_pipeline\",\n",
|
||||||
|
" destination=\"filesystem\",\n",
|
||||||
|
" dataset_name=\"pokemon_data\",\n",
|
||||||
|
")\n"
|
||||||
|
],
|
||||||
|
"id": "25101ae5f016ce0c",
|
||||||
|
"outputs": [],
|
||||||
|
"execution_count": 4
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"metadata": {},
|
||||||
|
"cell_type": "markdown",
|
||||||
|
"source": [
|
||||||
|
"## Fetch Pokémon List\n",
|
||||||
|
"### Retrieve a list of Pokémon from the API.\n"
|
||||||
|
],
|
||||||
|
"id": "9a87ce05a072c48b"
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"metadata": {
|
||||||
|
"ExecuteTime": {
|
||||||
|
"end_time": "2025-03-04T11:58:03.990076Z",
|
||||||
|
"start_time": "2025-03-04T11:58:03.987199Z"
|
||||||
|
}
|
||||||
|
},
|
||||||
|
"cell_type": "code",
|
||||||
|
"source": [
|
||||||
|
"@dlt.resource(write_disposition=\"replace\")\n",
|
||||||
|
"def pokemon_list(limit: int = 50):\n",
|
||||||
|
" import requests\n",
|
||||||
|
" response = requests.get(f\"{BASE_URL}pokemon\", params={\"limit\": limit})\n",
|
||||||
|
" response.raise_for_status()\n",
|
||||||
|
" yield response.json()[\"results\"]\n"
|
||||||
|
],
|
||||||
|
"id": "3b6e60778c61e24a",
|
||||||
|
"outputs": [],
|
||||||
|
"execution_count": 5
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"metadata": {},
|
||||||
|
"cell_type": "markdown",
|
||||||
|
"source": [
|
||||||
|
"## Fetch Pokémon Details\n",
|
||||||
|
"### Fetch detailed information about each Pokémon.\n"
|
||||||
|
],
|
||||||
|
"id": "9952767846194e97"
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"metadata": {
|
||||||
|
"ExecuteTime": {
|
||||||
|
"end_time": "2025-03-04T11:58:03.996394Z",
|
||||||
|
"start_time": "2025-03-04T11:58:03.994122Z"
|
||||||
|
}
|
||||||
|
},
|
||||||
|
"cell_type": "code",
|
||||||
|
"source": [
|
||||||
|
"@dlt.transformer(data_from=pokemon_list)\n",
|
||||||
|
"def pokemon_details(pokemons):\n",
|
||||||
|
" \"\"\"Fetches detailed info for each Pokémon\"\"\"\n",
|
||||||
|
" import requests\n",
|
||||||
|
" for pokemon in pokemons:\n",
|
||||||
|
" response = requests.get(pokemon[\"url\"])\n",
|
||||||
|
" response.raise_for_status()\n",
|
||||||
|
" yield response.json()\n"
|
||||||
|
],
|
||||||
|
"id": "79ec9fef12267485",
|
||||||
|
"outputs": [],
|
||||||
|
"execution_count": 6
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"metadata": {},
|
||||||
|
"cell_type": "markdown",
|
||||||
|
"source": [
|
||||||
|
"## Run Data Pipeline\n",
|
||||||
|
"### Execute the pipeline and store Pokémon data.\n"
|
||||||
|
],
|
||||||
|
"id": "41e05f660bf9e9d2"
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"metadata": {
|
||||||
|
"ExecuteTime": {
|
||||||
|
"end_time": "2025-03-04T11:59:41.571015Z",
|
||||||
|
"start_time": "2025-03-04T11:59:36.840744Z"
|
||||||
|
}
|
||||||
|
},
|
||||||
|
"cell_type": "code",
|
||||||
|
"source": [
|
||||||
|
"info = pipeline.run([pokemon_list, pokemon_details])\n",
|
||||||
|
"print(info)\n"
|
||||||
|
],
|
||||||
|
"id": "20a3b2c7f404677f",
|
||||||
|
"outputs": [
|
||||||
|
{
|
||||||
|
"name": "stdout",
|
||||||
|
"output_type": "stream",
|
||||||
|
"text": [
|
||||||
|
"Pipeline pokemon_pipeline load step completed in 0.06 seconds\n",
|
||||||
|
"1 load package(s) were loaded to destination filesystem and into dataset pokemon_data\n",
|
||||||
|
"The filesystem destination used file:///Users/lazar/PycharmProjects/cognee/.data_storage location to store data\n",
|
||||||
|
"Load package 1741089576.860229 is LOADED and contains no failed jobs\n"
|
||||||
|
]
|
||||||
|
}
|
||||||
|
],
|
||||||
|
"execution_count": 9
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"metadata": {},
|
||||||
|
"cell_type": "markdown",
|
||||||
|
"source": [
|
||||||
|
"## Load Pokémon Abilities\n",
|
||||||
|
"### Load Pokémon ability data from stored files.\n"
|
||||||
|
],
|
||||||
|
"id": "937f10b8d1037743"
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"metadata": {
|
||||||
|
"ExecuteTime": {
|
||||||
|
"end_time": "2025-03-04T11:59:44.377719Z",
|
||||||
|
"start_time": "2025-03-04T11:59:44.363718Z"
|
||||||
|
}
|
||||||
|
},
|
||||||
|
"cell_type": "code",
|
||||||
|
"source": [
|
||||||
|
"import json\n",
|
||||||
|
"from cognee.low_level import DataPoint\n",
|
||||||
|
"from uuid import uuid5, NAMESPACE_OID\n",
|
||||||
|
"\n",
|
||||||
|
"class Abilities(DataPoint):\n",
|
||||||
|
" name: str = \"Abilities\"\n",
|
||||||
|
" metadata: dict = {\"index_fields\": [\"name\"]}\n",
|
||||||
|
"\n",
|
||||||
|
"def load_abilities_data(jsonl_abilities):\n",
|
||||||
|
" abilities_root = Abilities()\n",
|
||||||
|
" pokemon_abilities = []\n",
|
||||||
|
"\n",
|
||||||
|
" for jsonl_ability in jsonl_abilities:\n",
|
||||||
|
" with open(jsonl_ability, \"r\") as f:\n",
|
||||||
|
" for line in f:\n",
|
||||||
|
" ability = json.loads(line)\n",
|
||||||
|
" ability[\"id\"] = uuid5(NAMESPACE_OID, ability[\"_dlt_id\"])\n",
|
||||||
|
" ability[\"name\"] = ability[\"ability__name\"]\n",
|
||||||
|
" ability[\"is_type\"] = abilities_root\n",
|
||||||
|
" pokemon_abilities.append(ability)\n",
|
||||||
|
"\n",
|
||||||
|
" return abilities_root, pokemon_abilities\n"
|
||||||
|
],
|
||||||
|
"id": "be73050036439ea1",
|
||||||
|
"outputs": [],
|
||||||
|
"execution_count": 10
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"metadata": {},
|
||||||
|
"cell_type": "markdown",
|
||||||
|
"source": [
|
||||||
|
"## Load Pokémon Data\n",
|
||||||
|
"### Load Pokémon details and associate them with abilities.\n"
|
||||||
|
],
|
||||||
|
"id": "98c97f799f73df77"
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"metadata": {
|
||||||
|
"ExecuteTime": {
|
||||||
|
"end_time": "2025-03-04T11:59:46.251306Z",
|
||||||
|
"start_time": "2025-03-04T11:59:46.238283Z"
|
||||||
|
}
|
||||||
|
},
|
||||||
|
"cell_type": "code",
|
||||||
|
"source": [
|
||||||
|
"from typing import List, Optional\n",
|
||||||
|
"\n",
|
||||||
|
"class Pokemons(DataPoint):\n",
|
||||||
|
" name: str = \"Pokemons\"\n",
|
||||||
|
" have: Abilities\n",
|
||||||
|
" metadata: dict = {\"index_fields\": [\"name\"]}\n",
|
||||||
|
"\n",
|
||||||
|
"class PokemonAbility(DataPoint):\n",
|
||||||
|
" name: str\n",
|
||||||
|
" ability__name: str\n",
|
||||||
|
" ability__url: str\n",
|
||||||
|
" is_hidden: bool\n",
|
||||||
|
" slot: int\n",
|
||||||
|
" _dlt_load_id: str\n",
|
||||||
|
" _dlt_id: str\n",
|
||||||
|
" _dlt_parent_id: str\n",
|
||||||
|
" _dlt_list_idx: str\n",
|
||||||
|
" is_type: Abilities\n",
|
||||||
|
" metadata: dict = {\"index_fields\": [\"ability__name\"]}\n",
|
||||||
|
"\n",
|
||||||
|
"class Pokemon(DataPoint):\n",
|
||||||
|
" name: str\n",
|
||||||
|
" base_experience: int\n",
|
||||||
|
" height: int\n",
|
||||||
|
" weight: int\n",
|
||||||
|
" is_default: bool\n",
|
||||||
|
" order: int\n",
|
||||||
|
" location_area_encounters: str\n",
|
||||||
|
" species__name: str\n",
|
||||||
|
" species__url: str\n",
|
||||||
|
" cries__latest: str\n",
|
||||||
|
" cries__legacy: str\n",
|
||||||
|
" sprites__front_default: str\n",
|
||||||
|
" sprites__front_shiny: str\n",
|
||||||
|
" sprites__back_default: Optional[str]\n",
|
||||||
|
" sprites__back_shiny: Optional[str]\n",
|
||||||
|
" _dlt_load_id: str\n",
|
||||||
|
" _dlt_id: str\n",
|
||||||
|
" is_type: Pokemons\n",
|
||||||
|
" abilities: List[PokemonAbility]\n",
|
||||||
|
" metadata: dict = {\"index_fields\": [\"name\"]}\n",
|
||||||
|
"\n",
|
||||||
|
"def load_pokemon_data(jsonl_pokemons, pokemon_abilities, pokemon_root):\n",
|
||||||
|
" pokemons = []\n",
|
||||||
|
"\n",
|
||||||
|
" for jsonl_pokemon in jsonl_pokemons:\n",
|
||||||
|
" with open(jsonl_pokemon, \"r\") as f:\n",
|
||||||
|
" for line in f:\n",
|
||||||
|
" pokemon_data = json.loads(line)\n",
|
||||||
|
" abilities = [\n",
|
||||||
|
" ability for ability in pokemon_abilities\n",
|
||||||
|
" if ability[\"_dlt_parent_id\"] == pokemon_data[\"_dlt_id\"]\n",
|
||||||
|
" ]\n",
|
||||||
|
" pokemon_data[\"external_id\"] = pokemon_data[\"id\"]\n",
|
||||||
|
" pokemon_data[\"id\"] = uuid5(NAMESPACE_OID, str(pokemon_data[\"id\"]))\n",
|
||||||
|
" pokemon_data[\"abilities\"] = [PokemonAbility(**ability) for ability in abilities]\n",
|
||||||
|
" pokemon_data[\"is_type\"] = pokemon_root\n",
|
||||||
|
" pokemons.append(Pokemon(**pokemon_data))\n",
|
||||||
|
"\n",
|
||||||
|
" return pokemons\n"
|
||||||
|
],
|
||||||
|
"id": "7862951248df0bf5",
|
||||||
|
"outputs": [],
|
||||||
|
"execution_count": 11
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"metadata": {},
|
||||||
|
"cell_type": "markdown",
|
||||||
|
"source": [
|
||||||
|
"## Process Pokémon Data\n",
|
||||||
|
"### Load and associate Pokémon abilities.\n"
|
||||||
|
],
|
||||||
|
"id": "676fa5a2b61c2107"
|
||||||
|
},
|
||||||
|
  {
   "metadata": {
    "ExecuteTime": {
     "end_time": "2025-03-04T11:59:47.365226Z",
     "start_time": "2025-03-04T11:59:47.356722Z"
    }
   },
   "cell_type": "code",
   "source": [
    "STORAGE_PATH = Path(\".data_storage/pokemon_data/pokemon_details\")\n",
    "jsonl_pokemons = sorted(STORAGE_PATH.glob(\"*.jsonl\"))\n",
    "\n",
    "ABILITIES_PATH = Path(\".data_storage/pokemon_data/pokemon_details__abilities\")\n",
    "jsonl_abilities = sorted(ABILITIES_PATH.glob(\"*.jsonl\"))\n",
    "\n",
    "abilities_root, pokemon_abilities = load_abilities_data(jsonl_abilities)\n",
    "pokemon_root = Pokemons(have=abilities_root)\n",
    "pokemons = load_pokemon_data(jsonl_pokemons, pokemon_abilities, pokemon_root)\n"
   ],
   "id": "ad14cdecdccd71bb",
   "outputs": [],
   "execution_count": 12
  },
  {
   "metadata": {},
   "cell_type": "markdown",
   "source": [
    "## Initialize Cognee\n",
    "### Set up Cognee for data processing.\n"
   ],
   "id": "59dec67b2ae50f0f"
  },
  {
   "metadata": {
    "ExecuteTime": {
     "end_time": "2025-03-04T11:59:49.244577Z",
     "start_time": "2025-03-04T11:59:48.618261Z"
    }
   },
   "cell_type": "code",
   "source": [
    "import asyncio\n",
    "from cognee.low_level import setup as cognee_setup\n",
    "\n",
    "async def initialize_cognee():\n",
    "    await cognee.prune.prune_data()\n",
    "    await cognee.prune.prune_system(metadata=True)\n",
    "    await cognee_setup()\n",
    "\n",
    "await initialize_cognee()\n"
   ],
   "id": "d2e095ae576a02c1",
   "outputs": [
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      "INFO:cognee.infrastructure.databases.relational.sqlalchemy.SqlAlchemyAdapter:Database deleted successfully.INFO:cognee.infrastructure.databases.relational.sqlalchemy.SqlAlchemyAdapter:Database deleted successfully."
     ]
    }
   ],
   "execution_count": 13
  },
  {
   "metadata": {},
   "cell_type": "markdown",
   "source": [
    "## Process Pokémon Data\n",
    "### Add Pokémon data points to Cognee.\n"
   ],
   "id": "5f0b8090bc7b1fe6"
  },
  {
   "metadata": {
    "ExecuteTime": {
     "end_time": "2025-03-04T11:59:57.744035Z",
     "start_time": "2025-03-04T11:59:50.574033Z"
    }
   },
   "cell_type": "code",
   "source": [
    "from cognee.modules.pipelines.tasks.Task import Task\n",
    "from cognee.tasks.storage import add_data_points\n",
    "from cognee.modules.pipelines import run_tasks\n",
    "\n",
    "tasks = [Task(add_data_points, task_config={\"batch_size\": 50})]\n",
    "results = run_tasks(\n",
    "    tasks=tasks,\n",
    "    data=pokemons,\n",
    "    dataset_id=uuid5(NAMESPACE_OID, \"Pokemon\"),\n",
    "    pipeline_name='pokemon_pipeline',\n",
    ")\n",
    "\n",
    "async for result in results:\n",
    "    print(result)\n",
    "print(\"Done\")\n"
   ],
   "id": "ffa12fc1f5350d95",
   "outputs": [
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      "INFO:run_tasks(tasks: [Task], data):Pipeline run started: `fd2ed59d-b550-5b05-bbe6-7b708fe12483`INFO:run_tasks(tasks: [Task], data):Coroutine task started: `add_data_points`"
     ]
    },
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "<cognee.modules.pipelines.models.PipelineRun.PipelineRun object at 0x300bb3950>\n",
      "User d347ea85-e512-4cae-b9d7-496fe1745424 has registered.\n"
     ]
    },
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      "/Users/lazar/PycharmProjects/cognee/cognee/infrastructure/databases/vector/pgvector/PGVectorAdapter.py:79: SAWarning: This declarative base already contains a class with the same class name and module name as cognee.infrastructure.databases.vector.pgvector.PGVectorAdapter.PGVectorDataPoint, and will be replaced in the string-lookup table.\n",
      "  class PGVectorDataPoint(Base):\n",
      "INFO:httpx:HTTP Request: POST https://api.openai.com/v1/embeddings \"HTTP/1.1 200 OK\"INFO:httpx:HTTP Request: POST https://api.openai.com/v1/embeddings \"HTTP/1.1 200 OK\"/Users/lazar/PycharmProjects/cognee/cognee/infrastructure/databases/vector/pgvector/PGVectorAdapter.py:113: SAWarning: This declarative base already contains a class with the same class name and module name as cognee.infrastructure.databases.vector.pgvector.PGVectorAdapter.PGVectorDataPoint, and will be replaced in the string-lookup table.\n",
      "  class PGVectorDataPoint(Base):\n",
      "INFO:httpx:HTTP Request: POST https://api.openai.com/v1/embeddings \"HTTP/1.1 200 OK\"INFO:httpx:HTTP Request: POST https://api.openai.com/v1/embeddings \"HTTP/1.1 200 OK\"WARNING:neo4j.notifications:Received notification from DBMS server: {severity: WARNING} {code: Neo.ClientNotification.Statement.FeatureDeprecationWarning} {category: DEPRECATION} {title: This feature is deprecated and will be removed in future versions.} {description: The query used a deprecated function: `id`.} {position: line: 8, column: 16, offset: 335} for query: '\\n UNWIND $nodes AS node\\n MERGE (n {id: node.node_id})\\n ON CREATE SET n += node.properties, n.updated_at = timestamp()\\n ON MATCH SET n += node.properties, n.updated_at = timestamp()\\n WITH n, node.node_id AS label\\n CALL apoc.create.addLabels(n, [label]) YIELD node AS labeledNode\\n RETURN ID(labeledNode) AS internal_id, labeledNode.id AS nodeId\\n 'WARNING:neo4j.notifications:Received notification from DBMS server: {severity: WARNING} {code: Neo.ClientNotification.Statement.FeatureDeprecationWarning} {category: DEPRECATION} {title: This feature is deprecated and will be removed in future versions.} {description: The query used a deprecated function: `id`.} {position: line: 1, column: 18, offset: 17} for query: 'MATCH (n) RETURN ID(n) AS id, labels(n) AS labels, properties(n) AS properties'WARNING:neo4j.notifications:Received notification from DBMS server: {severity: WARNING} {code: Neo.ClientNotification.Statement.FeatureDeprecationWarning} {category: DEPRECATION} {title: This feature is deprecated and will be removed in future versions.} {description: The query used a deprecated function: `id`.} {position: line: 3, column: 16, offset: 43} for query: '\\n MATCH (n)-[r]->(m)\\n RETURN ID(n) AS source, ID(m) AS target, TYPE(r) AS type, properties(r) AS properties\\n 'WARNING:neo4j.notifications:Received notification from DBMS server: {severity: WARNING} {code: Neo.ClientNotification.Statement.FeatureDeprecationWarning} {category: DEPRECATION} {title: This feature is deprecated and will be removed in future versions.} {description: The query used a deprecated function: `id`.} {position: line: 3, column: 33, offset: 60} for query: '\\n MATCH (n)-[r]->(m)\\n RETURN ID(n) AS source, ID(m) AS target, TYPE(r) AS type, properties(r) AS properties\\n 'INFO:httpx:HTTP Request: POST https://api.openai.com/v1/embeddings \"HTTP/1.1 200 OK\"INFO:run_tasks(tasks: [Task], data):Coroutine task completed: `add_data_points`INFO:run_tasks(tasks: [Task], data):Pipeline run completed: `fd2ed59d-b550-5b05-bbe6-7b708fe12483`"
     ]
    },
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "<cognee.modules.pipelines.models.PipelineRun.PipelineRun object at 0x30016fd40>\n",
      "Done\n"
     ]
    }
   ],
   "execution_count": 14
  },
  {
   "metadata": {},
   "cell_type": "markdown",
   "source": [
    "## Search Pokémon Data\n",
    "### Execute a search query using Cognee.\n"
   ],
   "id": "e0d98d9832a2797a"
  },
  {
   "metadata": {
    "ExecuteTime": {
     "end_time": "2025-03-04T12:00:02.878871Z",
     "start_time": "2025-03-04T11:59:59.571965Z"
    }
   },
   "cell_type": "code",
   "source": [
    "from cognee.api.v1.search import SearchType\n",
    "\n",
    "search_results = await cognee.search(\n",
    "    query_type=SearchType.GRAPH_COMPLETION,\n",
    "    query_text=\"pokemons?\"\n",
    ")\n",
    "\n",
    "print(\"Search results:\")\n",
    "for result_text in search_results:\n",
    "    print(result_text)"
   ],
   "id": "bb2476b6b0c2aff",
   "outputs": [
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      "INFO:httpx:HTTP Request: POST https://api.openai.com/v1/embeddings \"HTTP/1.1 200 OK\"INFO:httpx:HTTP Request: POST https://api.openai.com/v1/embeddings \"HTTP/1.1 200 OK\"INFO:httpx:HTTP Request: POST https://api.openai.com/v1/embeddings \"HTTP/1.1 200 OK\"INFO:httpx:HTTP Request: POST https://api.openai.com/v1/embeddings \"HTTP/1.1 200 OK\"INFO:httpx:HTTP Request: POST https://api.openai.com/v1/embeddings \"HTTP/1.1 200 OK\"INFO:httpx:HTTP Request: POST https://api.openai.com/v1/embeddings \"HTTP/1.1 200 OK\"INFO:httpx:HTTP Request: POST https://api.openai.com/v1/embeddings \"HTTP/1.1 200 OK\"INFO:httpx:HTTP Request: POST https://api.openai.com/v1/embeddings \"HTTP/1.1 200 OK\"INFO:httpx:HTTP Request: POST https://api.openai.com/v1/embeddings \"HTTP/1.1 200 OK\"INFO:httpx:HTTP Request: POST https://api.openai.com/v1/embeddings \"HTTP/1.1 200 OK\"INFO:httpx:HTTP Request: POST https://api.openai.com/v1/embeddings \"HTTP/1.1 200 OK\"INFO:httpx:HTTP Request: POST https://api.openai.com/v1/embeddings \"HTTP/1.1 200 OK\"INFO:httpx:HTTP Request: POST https://api.openai.com/v1/embeddings \"HTTP/1.1 200 OK\"INFO:httpx:HTTP Request: POST https://api.openai.com/v1/embeddings \"HTTP/1.1 200 OK\"INFO:httpx:HTTP Request: POST https://api.openai.com/v1/embeddings \"HTTP/1.1 200 OK\"INFO:httpx:HTTP Request: POST https://api.openai.com/v1/embeddings \"HTTP/1.1 200 OK\"INFO:httpx:HTTP Request: POST https://api.openai.com/v1/embeddings \"HTTP/1.1 200 OK\"INFO:httpx:HTTP Request: POST https://api.openai.com/v1/embeddings \"HTTP/1.1 200 OK\"INFO:httpx:HTTP Request: POST https://api.openai.com/v1/embeddings \"HTTP/1.1 200 OK\"INFO:httpx:HTTP Request: POST https://api.openai.com/v1/embeddings \"HTTP/1.1 200 OK\"INFO:httpx:HTTP Request: POST https://api.openai.com/v1/embeddings \"HTTP/1.1 200 OK\"WARNING:neo4j.notifications:Received notification from DBMS server: {severity: WARNING} {code: Neo.ClientNotification.Statement.FeatureDeprecationWarning} {category: DEPRECATION} {title: This feature is deprecated and will be removed in future versions.} {description: The query used a deprecated function: `id`.} {position: line: 1, column: 18, offset: 17} for query: 'MATCH (n) RETURN ID(n) AS id, labels(n) AS labels, properties(n) AS properties'WARNING:neo4j.notifications:Received notification from DBMS server: {severity: WARNING} {code: Neo.ClientNotification.Statement.FeatureDeprecationWarning} {category: DEPRECATION} {title: This feature is deprecated and will be removed in future versions.} {description: The query used a deprecated function: `id`.} {position: line: 3, column: 16, offset: 43} for query: '\\n MATCH (n)-[r]->(m)\\n RETURN ID(n) AS source, ID(m) AS target, TYPE(r) AS type, properties(r) AS properties\\n 'WARNING:neo4j.notifications:Received notification from DBMS server: {severity: WARNING} {code: Neo.ClientNotification.Statement.FeatureDeprecationWarning} {category: DEPRECATION} {title: This feature is deprecated and will be removed in future versions.} {description: The query used a deprecated function: `id`.} {position: line: 3, column: 33, offset: 60} for query: '\\n MATCH (n)-[r]->(m)\\n RETURN ID(n) AS source, ID(m) AS target, TYPE(r) AS type, properties(r) AS properties\\n 'INFO:httpx:HTTP Request: POST https://api.openai.com/v1/embeddings \"HTTP/1.1 200 OK\"INFO:httpx:HTTP Request: POST https://api.openai.com/v1/embeddings \"HTTP/1.1 200 OK\"\u001B[92m13:00:02 - LiteLLM:INFO\u001B[0m: utils.py:2784 - \n",
      "LiteLLM completion() model= gpt-4o-mini; provider = openaiINFO:LiteLLM:\n",
      "LiteLLM completion() model= gpt-4o-mini; provider = openai"
     ]
    },
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Search results:\n",
      "The Pokemons mentioned are: golbat, jigglypuff, raichu, vulpix, and pikachu.\n"
     ]
    }
   ],
   "execution_count": 15
  },
  {
   "metadata": {},
   "cell_type": "code",
   "outputs": [],
   "execution_count": null,
   "source": "",
   "id": "a4c2d3e9c15b017"
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "version": "3.8.5"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 5
}

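The notebook's `load_pokemon_data` cell derives each graph node's id with `uuid5(NAMESPACE_OID, str(external_id))`, so re-running the pipeline maps the same source record to the same node instead of creating duplicates. A minimal stdlib-only sketch of that idea (the `record` dict and `stable_id` helper are illustrative, not part of cognee's API):

```python
from uuid import NAMESPACE_OID, uuid5

def stable_id(external_id: int) -> str:
    # uuid5 is deterministic: the same namespace + name always yields
    # the same UUID, so repeated runs upsert rather than duplicate nodes.
    return str(uuid5(NAMESPACE_OID, str(external_id)))

record = {"id": 25}  # illustrative source record
record["external_id"] = record["id"]   # keep the original id around
record["id"] = stable_id(record["external_id"])  # replace with a stable UUID

assert record["id"] == stable_id(25)  # identical on every run
print(record["id"])
```

Using a random `uuid4` here would break idempotency: each run would insert a fresh node for the same Pokémon.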
@@ -1,6 +1,6 @@
 [tool.poetry]
 name = "cognee"
-version = "0.1.32"
+version = "0.1.33"
 description = "Cognee - is a library for enriching LLM context with a semantic layer for better understanding and reasoning."
 authors = ["Vasilije Markovic", "Boris Arzentar"]
 readme = "README.md"
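Aside from the version bump above, the interesting pattern in the notebook diff is the ability join: `load_pokemon_data` scans the whole `pokemon_abilities` list once per Pokémon to match rows on `_dlt_parent_id`. For larger dlt exports, grouping child rows by parent id first turns that O(n·m) scan into a dict lookup. A hedged, stdlib-only sketch (field names follow the notebook; the sample rows are invented):

```python
from collections import defaultdict

def group_children(children):
    # Index dlt child rows by their parent id once, instead of scanning
    # the full child list for every parent row.
    by_parent = defaultdict(list)
    for child in children:
        by_parent[child["_dlt_parent_id"]].append(child)
    return by_parent

abilities = [
    {"_dlt_parent_id": "p1", "name": "static"},
    {"_dlt_parent_id": "p1", "name": "lightning-rod"},
    {"_dlt_parent_id": "p2", "name": "blaze"},
]
grouped = group_children(abilities)
print([a["name"] for a in grouped["p1"]])  # → ['static', 'lightning-rod']
```

Inside the notebook's loop this would replace the list comprehension with `grouped[pokemon_data["_dlt_id"]]`.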