Merge branch 'dev' of github.com:topoteretes/cognee into dev

commit 816d99309d

32 changed files with 1264 additions and 132 deletions

CONTRIBUTING.md (149 changes)
@@ -1,97 +1,128 @@
-# 🚀 How to Contribute to **cognee**
+# 🎉 Welcome to **cognee**!

-Thank you for investing time in contributing to our project! Here's a guide to get you started.
+We're excited that you're interested in contributing to our project!
+
+We want to ensure that every user and contributor feels welcome, included and supported to participate in cognee community.
+
+This guide will help you get started and ensure your contributions can be efficiently integrated into the project.

-## 1. 🚀 Getting Started
+## 🌟 Quick Links

-### 🍴 Fork the Repository
+- [Code of Conduct](CODE_OF_CONDUCT.md)
+- [Discord Community](https://discord.gg/bcy8xFAtfd)
+- [Issue Tracker](https://github.com/topoteretes/cognee/issues)

-To start your journey, you'll need your very own copy of **cognee**. Think of it as your own innovation lab. 🧪
+## 1. 🚀 Ways to Contribute

-1. Navigate to the [**cognee**](https://github.com/topoteretes/cognee) repository on GitHub.
-2. In the upper-right corner, click the **'Fork'** button.
+You can contribute to **cognee** in many ways:

-### 🚀 Clone the Repository
+- 📝 Submitting bug reports or feature requests
+- 💡 Improving documentation
+- 🔍 Reviewing pull requests
+- 🛠️ Contributing code or tests
+- 🌐 Helping other users

-Next, let's bring your newly forked repository to your local machine.
+## 📫 Get in Touch
+
+There are several ways to connect with the **cognee** team and community:
+
+### GitHub Collaboration
+
+- [Open an issue](https://github.com/topoteretes/cognee/issues) for bug reports, feature requests, or discussions
+- Submit pull requests to contribute code or documentation
+- Join ongoing discussions in existing issues and PRs
+
+### Community Channels
+
+- Join our [Discord community](https://discord.gg/bcy8xFAtfd) for real-time discussions
+- Participate in community events and discussions
+- Get help from other community members
+
+### Direct Contact
+
+- Email: vasilije@cognee.ai
+- For business inquiries or sensitive matters, please reach out via email
+- For general questions, prefer public channels like GitHub issues or Discord
+
+We aim to respond to all communications within 2 business days. For faster responses, consider using our Discord channel where the whole community can help!
+
+## Issue Labels
+
+To help you find the most appropriate issues to work on, we use the following labels:
+
+- `good first issue` - Perfect for newcomers to the project
+- `bug` - Something isn't working as expected
+- `documentation` - Improvements or additions to documentation
+- `enhancement` - New features or improvements
+- `help wanted` - Extra attention or assistance needed
+- `question` - Further information is requested
+- `wontfix` - This will not be worked on
+
+Looking for a place to start? Try filtering for [good first issues](https://github.com/topoteretes/cognee/labels/good%20first%20issue)!
+
+## 2. 🛠️ Development Setup
+
+### Fork and Clone
+
+1. Fork the [**cognee**](https://github.com/topoteretes/cognee) repository
+2. Clone your fork:
 ```shell
 git clone https://github.com/<your-github-username>/cognee.git
+cd cognee
 ```

-## 2. 🛠️ Making Changes
-
-### 🌟 Create a Branch
-
-Get ready to channel your creativity. Begin by creating a new branch for your incredible features. 🧞♂️
+### Create a Branch
+
+Create a new branch for your work:
 ```shell
 git checkout -b feature/your-feature-name
 ```

-### ✏️ Make Your Changes
-
-Now's your chance to shine! Dive in and make your contributions. 🌠
-
-## 3. 🚀 Submitting Changes
-
-After making your changes, follow these steps:
-
-### ✅ Run the Tests
-
-Ensure your changes do not break the existing codebase:
+## 3. 🎯 Making Changes
+
+1. **Code Style**: Follow the project's coding standards
+2. **Documentation**: Update relevant documentation
+3. **Tests**: Add tests for new features
+4. **Commits**: Write clear commit messages
+
+### Running Tests
 ```shell
 python cognee/cognee/tests/test_library.py
 ```

-### 🚢 Push Your Feature Branch
+## 4. 📤 Submitting Changes
+
+1. Push your changes:
 ```shell
-# Add your changes to the staging area:
 git add .
-
-# Commit changes with an adequate description:
-git commit -m "Describe your changes here"
-
-# Push your feature branch to your forked repository:
+git commit -s -m "Description of your changes"
 git push origin feature/your-feature-name
 ```

-### 🚀 Create a Pull Request
-
-You're on the verge of completion! It's time to showcase your hard work. 🌐
-
-1. Go to [**cognee**](https://github.com/topoteretes/cognee) on GitHub.
-2. Hit the **"Compare & Pull Request"** button.
-3. Select the base branch (main) and the compare branch (the one with your features).
-4. Craft a **compelling title** and provide a **detailed description** of your contributions. 🎩
-
-## 4. 🔍 Review and Approval
-
-The project maintainers will review your work, possibly suggest improvements, or request further details. Once you receive approval, your contributions will become part of **cognee**!
+2. Create a Pull Request:
+- Go to the [**cognee** repository](https://github.com/topoteretes/cognee)
+- Click "Compare & Pull Request"
+- Fill in the PR template with details about your changes
+
+## 5. 📜 Developer Certificate of Origin (DCO)
+
+All contributions must be signed-off to indicate agreement with our DCO:
+
+```shell
+git config alias.cos "commit -s" # Create alias for signed commits
+```
+
+When your PR is ready, please include:
+
+> "I affirm that all code in every commit of this pull request conforms to the terms of the Topoteretes Developer Certificate of Origin"
+
+## 6. 🤝 Community Guidelines
+
+- Be respectful and inclusive
+- Help others learn and grow
+- Follow our [Code of Conduct](CODE_OF_CONDUCT.md)
+- Provide constructive feedback
+- Ask questions when unsure
+
+## 7. 📫 Getting Help
+
+- Open an [issue](https://github.com/topoteretes/cognee/issues)
+- Join our Discord community
+- Check existing documentation
+
+Thank you for contributing to **cognee**! 🌟

-## 5. Developer Certificate of Origin
-
-All contributions to the topoteretes codebase must be signed-off to indicate you have read and agreed to the Developer Certificate of Origin (DCO), which is in the root directory under name DCO. To sign the DCO, simply add -s after all commits that you make, to do this easily you can make a git alias from the command line, for example:
-
-$ git config alias.cos "commit -s"
-
-Will allow you to write git cos which will automatically sign-off your commit. By signing a commit you are agreeing to the DCO and agree that you will be banned from the topoteretes GitHub organisation and Discord server if you violate the DCO.
-
-"When a commit is ready to be merged please use the following template to agree to our developer certificate of origin:
-
-'I affirm that all code in every commit of this pull request conforms to the terms of the Topoteretes Developer Certificate of Origin'
-
-We consider the following as violations to the DCO:
-
-Signing the DCO with a fake name or pseudonym, if you are registered on GitHub or another platform with a fake name then you will not be able to contribute to topoteretes before updating your name;
-Submitting a contribution that you did not have the right to submit whether due to licensing, copyright, or any other restrictions.
-
-## 6. 📜 Code of Conduct
-
-Ensure you adhere to the project's [Code of Conduct](https://github.com/topoteretes/cognee/blob/main/CODE_OF_CONDUCT.md) throughout your participation.
-
-## 7. 📫 Contact
-
-If you need assistance or simply wish to connect, we're here for you. Contact us by filing an issue on the GitHub repository or by messaging us on our Discord server.
-
-Thanks for helping to evolve **cognee**!
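Reviewer note: the new `alias.cos` shown above can be exercised end to end. A minimal sketch, assuming a throwaway repository and an illustrative user identity (everything here is hypothetical scaffolding, not project configuration):

```shell
# Sketch: verify in a scratch repo that the `cos` alias from the new
# CONTRIBUTING text adds a Signed-off-by trailer to the commit message.
set -e
tmp=$(mktemp -d)
cd "$tmp"
git init -q .
git config user.name "Example User"
git config user.email "user@example.com"
git config alias.cos "commit -s"
echo hello > file.txt
git add file.txt
git cos -m "Add file"   # equivalent to: git commit -s -m "Add file"
git log -1 --format=%B | grep "Signed-off-by: Example User <user@example.com>"
```

The final `grep` fails (non-zero exit) if the sign-off trailer is missing, which makes the sketch usable as a quick self-check before pushing.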
@@ -8,11 +8,11 @@
   cognee - memory layer for AI apps and Agents

 <p align="center">
+  <a href="https://www.youtube.com/watch?v=1bezuvLwJmw&t=2s">Demo</a>
+  .
   <a href="https://cognee.ai">Learn more</a>
   ·
   <a href="https://discord.gg/NQPKmU5CCg">Join Discord</a>
-  ·
-  <a href="https://www.youtube.com/watch?v=1bezuvLwJmw&t=2s">Demo</a>
 </p>
@@ -89,7 +89,7 @@ Add LLM_API_KEY to .env using the command bellow.
 ```
 echo "LLM_API_KEY=YOUR_OPENAI_API_KEY" > .env
 ```
-You can see available env variables in the repository `.env.template` file.
+You can see available env variables in the repository `.env.template` file. If you don't specify it otherwise, like in this example, SQLite (relational database), LanceDB (vector database) and NetworkX (graph store) will be used as default components.

 This script will run the default pipeline:
@@ -1,34 +1,35 @@
 # Use a Python image with uv pre-installed
 FROM ghcr.io/astral-sh/uv:python3.12-bookworm-slim AS uv

-# Set build argument
-ARG DEBUG
-
-# Set environment variable based on the build argument
-ENV DEBUG=${DEBUG}
-ENV PIP_NO_CACHE_DIR=true
-
+# Install the project into `/app`
 WORKDIR /app

 # Enable bytecode compilation
-ENV UV_COMPILE_BYTECODE=1
+# ENV UV_COMPILE_BYTECODE=1

 # Copy from the cache instead of linking since it's a mounted volume
 ENV UV_LINK_MODE=copy

-RUN apt-get update && apt-get install -y \
-    gcc \
-    libpq-dev
-
-RUN apt-get install -y \
-    gcc \
-    libpq-dev
-
-COPY . /app
-
-RUN uv sync --reinstall
+# Install the project's dependencies using the lockfile and settings
+RUN --mount=type=cache,target=/root/.cache/uv \
+    --mount=type=bind,source=uv.lock,target=uv.lock \
+    --mount=type=bind,source=pyproject.toml,target=pyproject.toml \
+    uv sync --frozen --no-install-project --no-dev --no-editable
+
+# Then, add the rest of the project source code and install it
+# Installing separately from its dependencies allows optimal layer caching
+ADD . /app
+RUN --mount=type=cache,target=/root/.cache/uv \
+    uv sync --frozen --no-dev --no-editable
+
+FROM python:3.12-slim-bookworm
+
+WORKDIR /app
+
+COPY --from=uv /root/.local /root/.local
+COPY --from=uv --chown=app:app /app /app

 # Place executables in the environment at the front of the path
-ENV PATH="/app/:/app/.venv/bin:$PATH"
+ENV PATH="/app/.venv/bin:$PATH"

 ENTRYPOINT ["cognee"]
@@ -82,5 +82,5 @@ http://localhost:5173?timeout=120000
 To apply new changes while developing cognee you need to do:

 1. `poetry lock` in cognee folder
-2. `uv sync --dev --all-extras --reinstall `
+2. `uv sync --dev --all-extras --reinstall`
 3. `mcp dev src/server.py`
@@ -1,12 +1,12 @@
 [project]
 name = "cognee-mcp"
-version = "0.1.0"
+version = "0.2.0"
 description = "A MCP server project"
 readme = "README.md"
 requires-python = ">=3.10"

 dependencies = [
-    "cognee[codegraph,gemini,huggingface]",
+    "cognee[postgres,codegraph,gemini,huggingface]",
     "mcp==1.2.1",
     "uv>=0.6.3",
 ]
cognee-mcp/uv.lock (2 changes, generated)

@@ -547,7 +547,7 @@ huggingface = [

 [[package]]
 name = "cognee-mcp"
-version = "0.1.0"
+version = "0.2.0"
 source = { editable = "." }
 dependencies = [
     { name = "cognee", extra = ["codegraph", "gemini", "huggingface"] },
@@ -29,13 +29,16 @@ class AnswerGeneratorExecutor:
             retrieval_context = await retriever.get_context(query_text)
             search_results = await retriever.get_completion(query_text, retrieval_context)

-            answers.append(
-                {
-                    "question": query_text,
-                    "answer": search_results[0],
-                    "golden_answer": correct_answer,
-                    "retrieval_context": retrieval_context,
-                }
-            )
+            answer = {
+                "question": query_text,
+                "answer": search_results[0],
+                "golden_answer": correct_answer,
+                "retrieval_context": retrieval_context,
+            }
+
+            if "golden_context" in instance:
+                answer["golden_context"] = instance["golden_context"]
+
+            answers.append(answer)

         return answers
@@ -1,6 +1,6 @@
 import logging
 import json
-from typing import List
+from typing import List, Optional
 from cognee.eval_framework.answer_generation.answer_generation_executor import (
     AnswerGeneratorExecutor,
     retriever_options,
@@ -32,7 +32,7 @@ async def create_and_insert_answers_table(questions_payload):


 async def run_question_answering(
-    params: dict, system_prompt="answer_simple_question.txt"
+    params: dict, system_prompt="answer_simple_question.txt", top_k: Optional[int] = None
 ) -> List[dict]:
     if params.get("answering_questions"):
         logging.info("Question answering started...")
@@ -48,7 +48,9 @@ async def run_question_answering(
         answer_generator = AnswerGeneratorExecutor()
         answers = await answer_generator.question_answering_non_parallel(
             questions=questions,
-            retriever=retriever_options[params["qa_engine"]](system_prompt_path=system_prompt),
+            retriever=retriever_options[params["qa_engine"]](
+                system_prompt_path=system_prompt, top_k=top_k
+            ),
         )
         with open(params["answers_path"], "w", encoding="utf-8") as f:
             json.dump(answers, f, ensure_ascii=False, indent=4)
@@ -5,18 +5,21 @@ from cognee.eval_framework.benchmark_adapters.base_benchmark_adapter import Base

 class DummyAdapter(BaseBenchmarkAdapter):
     def load_corpus(
-        self, limit: Optional[int] = None, seed: int = 42
+        self, limit: Optional[int] = None, seed: int = 42, load_golden_context: bool = False
     ) -> tuple[list[str], list[dict[str, Any]]]:
         corpus_list = [
             "The cognee is an AI memory engine that supports different vector and graph databases",
             "Neo4j is a graph database supported by cognee",
         ]
-        question_answer_pairs = [
-            {
-                "answer": "Yes",
-                "question": "Is Neo4j supported by cognee?",
-                "type": "dummy",
-            }
-        ]
+        qa_pair = {
+            "answer": "Yes",
+            "question": "Is Neo4j supported by cognee?",
+            "type": "dummy",
+        }
+
+        if load_golden_context:
+            qa_pair["golden_context"] = "Cognee supports Neo4j and NetworkX"
+
+        question_answer_pairs = [qa_pair]

         return corpus_list, question_answer_pairs
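Reviewer note: the `load_golden_context` change follows a common pattern worth calling out — build the base record, then attach the optional field only when requested, so downstream code can test membership with `in`. A minimal standalone sketch (the function below mirrors the diff's `DummyAdapter.load_corpus` but is an illustrative copy, not the real cognee code):

```python
from typing import Any, Optional


def load_corpus(
    limit: Optional[int] = None, seed: int = 42, load_golden_context: bool = False
) -> tuple[list[str], list[dict[str, Any]]]:
    # Fixed two-document corpus, as in the DummyAdapter change.
    corpus_list = [
        "The cognee is an AI memory engine that supports different vector and graph databases",
        "Neo4j is a graph database supported by cognee",
    ]
    qa_pair = {
        "answer": "Yes",
        "question": "Is Neo4j supported by cognee?",
        "type": "dummy",
    }
    # Attach the golden context only when asked, so that
    # '"golden_context" in qa_pair' stays a meaningful test downstream.
    if load_golden_context:
        qa_pair["golden_context"] = "Cognee supports Neo4j and NetworkX"
    return corpus_list, [qa_pair]


docs, pairs = load_corpus(load_golden_context=True)
print("golden_context" in pairs[0])  # → True
```

This is the same membership test the `AnswerGeneratorExecutor` and `DeepEvalAdapter` changes rely on when deciding whether to carry `golden_context` through to evaluation.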
@@ -28,14 +28,22 @@ class CorpusBuilderExecutor:
         self.questions = None
         self.task_getter = task_getter

-    def load_corpus(self, limit: Optional[int] = None) -> Tuple[List[Dict], List[str]]:
-        self.raw_corpus, self.questions = self.adapter.load_corpus(limit=limit)
+    def load_corpus(
+        self, limit: Optional[int] = None, load_golden_context: bool = False
+    ) -> Tuple[List[Dict], List[str]]:
+        self.raw_corpus, self.questions = self.adapter.load_corpus(
+            limit=limit, load_golden_context=load_golden_context
+        )
         return self.raw_corpus, self.questions

     async def build_corpus(
-        self, limit: Optional[int] = None, chunk_size=1024, chunker=TextChunker
+        self,
+        limit: Optional[int] = None,
+        chunk_size=1024,
+        chunker=TextChunker,
+        load_golden_context: bool = False,
     ) -> List[str]:
-        self.load_corpus(limit=limit)
+        self.load_corpus(limit=limit, load_golden_context=load_golden_context)
         await self.run_cognee(chunk_size=chunk_size, chunker=chunker)
         return self.questions
@@ -47,7 +47,10 @@ async def run_corpus_builder(params: dict, chunk_size=1024, chunker=TextChunker)
         task_getter=task_getter,
     )
     questions = await corpus_builder.build_corpus(
-        limit=params.get("number_of_samples_in_corpus"), chunk_size=chunk_size, chunker=chunker
+        limit=params.get("number_of_samples_in_corpus"),
+        chunk_size=chunk_size,
+        chunker=chunker,
+        load_golden_context=params.get("evaluating_contexts"),
     )
     with open(params["questions_path"], "w", encoding="utf-8") as f:
         json.dump(questions, f, ensure_ascii=False, indent=4)
@@ -4,6 +4,7 @@ from cognee.eval_framework.eval_config import EvalConfig
 from cognee.eval_framework.evaluation.base_eval_adapter import BaseEvalAdapter
 from cognee.eval_framework.evaluation.metrics.exact_match import ExactMatchMetric
 from cognee.eval_framework.evaluation.metrics.f1 import F1ScoreMetric
+from cognee.eval_framework.evaluation.metrics.context_coverage import ContextCoverageMetric
 from typing import Any, Dict, List
 from deepeval.metrics import ContextualRelevancyMetric

@@ -15,6 +16,7 @@ class DeepEvalAdapter(BaseEvalAdapter):
             "EM": ExactMatchMetric(),
             "f1": F1ScoreMetric(),
             "contextual_relevancy": ContextualRelevancyMetric(),
+            "context_coverage": ContextCoverageMetric(),
         }

     async def evaluate_answers(

@@ -32,6 +34,7 @@ class DeepEvalAdapter(BaseEvalAdapter):
                 actual_output=answer["answer"],
                 expected_output=answer["golden_answer"],
                 retrieval_context=[answer["retrieval_context"]],
+                context=[answer["golden_context"]] if "golden_context" in answer else None,
             )
             metric_results = {}
             for metric in evaluator_metrics:
@@ -23,5 +23,6 @@ class EvaluationExecutor:
     async def execute(self, answers: List[Dict[str, str]], evaluator_metrics: Any) -> Any:
         if self.evaluate_contexts:
             evaluator_metrics.append("contextual_relevancy")
+            evaluator_metrics.append("context_coverage")
         metrics = await self.eval_adapter.evaluate_answers(answers, evaluator_metrics)
         return metrics
cognee/eval_framework/evaluation/metrics/context_coverage.py (new file, 50 lines)

@@ -0,0 +1,50 @@
+from deepeval.metrics import SummarizationMetric
+from deepeval.test_case import LLMTestCase
+from deepeval.metrics.summarization.schema import ScoreType
+from deepeval.metrics.indicator import metric_progress_indicator
+from deepeval.utils import get_or_create_event_loop
+
+
+class ContextCoverageMetric(SummarizationMetric):
+    def measure(
+        self,
+        test_case,
+        _show_indicator: bool = True,
+    ) -> float:
+        mapped_test_case = LLMTestCase(
+            input=test_case.context[0],
+            actual_output=test_case.retrieval_context[0],
+        )
+        self.assessment_questions = None
+        self.evaluation_cost = 0 if self.using_native_model else None
+        with metric_progress_indicator(self, _show_indicator=_show_indicator):
+            if self.async_mode:
+                loop = get_or_create_event_loop()
+                return loop.run_until_complete(
+                    self.a_measure(mapped_test_case, _show_indicator=False)
+                )
+            else:
+                self.coverage_verdicts = self._generate_coverage_verdicts(mapped_test_case)
+                self.alignment_verdicts = []
+                self.score = self._calculate_score(ScoreType.COVERAGE)
+                self.reason = self._generate_reason()
+                self.success = self.score >= self.threshold
+                return self.score
+
+    async def a_measure(
+        self,
+        test_case,
+        _show_indicator: bool = True,
+    ) -> float:
+        self.evaluation_cost = 0 if self.using_native_model else None
+        with metric_progress_indicator(
+            self,
+            async_mode=True,
+            _show_indicator=_show_indicator,
+        ):
+            self.coverage_verdicts = await self._a_generate_coverage_verdicts(test_case)
+            self.alignment_verdicts = []
+            self.score = self._calculate_score(ScoreType.COVERAGE)
+            self.reason = await self._a_generate_reason()
+            self.success = self.score >= self.threshold
+            return self.score
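Reviewer note: the `measure`/`a_measure` split in the new metric follows a sync-wraps-async dispatch: the synchronous entry point detects `async_mode` and drives the coroutine on an event loop itself, so sync callers still get a plain float back. A minimal standalone sketch of just that dispatch, with the deepeval machinery replaced by a stub (all names below are illustrative, not the real cognee or deepeval API):

```python
import asyncio


class MiniMetric:
    """Stub showing the sync-wraps-async dispatch used by the new metric."""

    def __init__(self, async_mode: bool = True, threshold: float = 0.5):
        self.async_mode = async_mode
        self.threshold = threshold

    def measure(self, value: float) -> float:
        if self.async_mode:
            # Sync callers still get a plain float: run the coroutine here.
            # (The real code uses deepeval's get_or_create_event_loop.)
            return asyncio.new_event_loop().run_until_complete(self.a_measure(value))
        self.score = value
        self.success = self.score >= self.threshold
        return self.score

    async def a_measure(self, value: float) -> float:
        # The async path does the real scoring work.
        self.score = value
        self.success = self.score >= self.threshold
        return self.score


print(MiniMetric().measure(0.8))  # → 0.8
```

The design keeps one scoring implementation (`a_measure`) while preserving a synchronous public interface, which is why both branches in the real metric end by setting `score`, `reason`, and `success` identically.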
@@ -3,6 +3,12 @@ import plotly.graph_objects as go
 from typing import Dict, List, Tuple
 from collections import defaultdict

+metrics_fields = {
+    "contextual_relevancy": ["question", "retrieval_context"],
+    "context_coverage": ["question", "retrieval_context", "golden_context"],
+}
+default_metrics_fields = ["question", "answer", "golden_answer"]
+

 def create_distribution_plots(metrics_data: Dict[str, List[float]]) -> List[str]:
     """Create distribution histogram plots for each metric."""

@@ -59,38 +65,30 @@ def generate_details_html(metrics_data: List[Dict]) -> List[str]:
         for metric, values in entry["metrics"].items():
             if metric not in metric_details:
                 metric_details[metric] = []
+            current_metrics_fields = metrics_fields.get(metric, default_metrics_fields)
             metric_details[metric].append(
-                {
-                    "question": entry["question"],
-                    "answer": entry["answer"],
-                    "golden_answer": entry["golden_answer"],
+                {key: entry[key] for key in current_metrics_fields}
+                | {
                     "reason": values.get("reason", ""),
                     "score": values["score"],
                 }
             )

     for metric, details in metric_details.items():
+        formatted_column_names = [key.replace("_", " ").title() for key in details[0].keys()]
         details_html.append(f"<h3>{metric} Details</h3>")
-        details_html.append("""
+        details_html.append(f"""
             <table class="metric-table">
                 <tr>
-                    <th>Question</th>
-                    <th>Answer</th>
-                    <th>Golden Answer</th>
-                    <th>Reason</th>
-                    <th>Score</th>
+                    {"".join(f"<th>{col}</th>" for col in formatted_column_names)}
                 </tr>
         """)
         for item in details:
-            details_html.append(
-                f"<tr>"
-                f"<td>{item['question']}</td>"
-                f"<td>{item['answer']}</td>"
-                f"<td>{item['golden_answer']}</td>"
-                f"<td>{item['reason']}</td>"
-                f"<td>{item['score']}</td>"
-                f"</tr>"
-            )
+            details_html.append(f"""
+                <tr>
+                    {"".join(f"<td>{value}</td>" for value in item.values())}
+                </tr>
+            """)
         details_html.append("</table>")
     return details_html
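Reviewer note: the dashboard rewrite leans on the dict union operator (`|`, Python 3.9+) to splice per-metric columns together with the shared `reason`/`score` columns, and union preserves insertion order (left operand's keys first). A self-contained sketch of that row-building step, with made-up `entry` data (the real entries come from the eval results file):

```python
# Per-metric column selection, as in the dashboard change.
metrics_fields = {
    "contextual_relevancy": ["question", "retrieval_context"],
}
default_metrics_fields = ["question", "answer", "golden_answer"]

# Hypothetical eval-result entry, for illustration only.
entry = {
    "question": "Is Neo4j supported?",
    "answer": "Yes",
    "golden_answer": "Yes",
    "retrieval_context": "Neo4j is a graph database supported by cognee",
}
values = {"reason": "exact match", "score": 1.0}

# Pick the metric's columns, then merge in the shared ones with `|`.
fields = metrics_fields.get("contextual_relevancy", default_metrics_fields)
row = {key: entry[key] for key in fields} | {
    "reason": values.get("reason", ""),
    "score": values["score"],
}
print(list(row))  # → ['question', 'retrieval_context', 'reason', 'score']
```

Because key order is stable, the table header can later be derived from `details[0].keys()` exactly as the new `formatted_column_names` line does.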
@@ -13,15 +13,17 @@ class CompletionRetriever(BaseRetriever):
         self,
         user_prompt_path: str = "context_for_question.txt",
         system_prompt_path: str = "answer_simple_question.txt",
+        top_k: Optional[int] = 1,
     ):
         """Initialize retriever with optional custom prompt paths."""
         self.user_prompt_path = user_prompt_path
         self.system_prompt_path = system_prompt_path
+        self.top_k = top_k if top_k is not None else 1

     async def get_context(self, query: str) -> Any:
         """Retrieves relevant document chunks as context."""
         vector_engine = get_vector_engine()
-        found_chunks = await vector_engine.search("DocumentChunk_text", query, limit=1)
+        found_chunks = await vector_engine.search("DocumentChunk_text", query, limit=self.top_k)
         if len(found_chunks) == 0:
             raise NoRelevantDataFound
         return found_chunks[0].payload["text"]
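Reviewer note: the `top_k = top_k if top_k is not None else 1` idiom repeated across the retriever changes lets callers pass `None` to mean "use this retriever's own default", which is what the optional `top_k` threaded through `run_question_answering` relies on. A small illustrative sketch (the class is a simplified stand-in, not the real cognee retriever):

```python
from typing import Optional


class SimpleRetriever:
    """Illustrative stand-in: None means 'use my default top_k'."""

    DEFAULT_TOP_K = 1

    def __init__(self, top_k: Optional[int] = None):
        # Mirrors the diff: a None passed down from optional plumbing
        # falls back to the retriever's own default instead of erroring.
        self.top_k = top_k if top_k is not None else self.DEFAULT_TOP_K


print(SimpleRetriever().top_k)         # → 1
print(SimpleRetriever(top_k=5).top_k)  # → 5
```

Each retriever keeps its own fallback (1 for chunk completion, 5 for the graph retrievers), so a single `top_k=None` default in the shared entry point doesn't flatten their distinct behaviors.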
@@ -15,12 +15,12 @@ class GraphCompletionRetriever(BaseRetriever):
         self,
         user_prompt_path: str = "graph_context_for_question.txt",
         system_prompt_path: str = "answer_simple_question.txt",
-        top_k: int = 5,
+        top_k: Optional[int] = 5,
     ):
         """Initialize retriever with prompt paths and search parameters."""
         self.user_prompt_path = user_prompt_path
         self.system_prompt_path = system_prompt_path
-        self.top_k = top_k
+        self.top_k = top_k if top_k is not None else 5

     async def resolve_edges_to_text(self, retrieved_edges: list) -> str:
         """Converts retrieved graph edges into a human-readable string format."""
```diff
@@ -12,7 +12,7 @@ class GraphSummaryCompletionRetriever(GraphCompletionRetriever):
         user_prompt_path: str = "graph_context_for_question.txt",
         system_prompt_path: str = "answer_simple_question.txt",
         summarize_prompt_path: str = "summarize_search_results.txt",
-        top_k: int = 5,
+        top_k: Optional[int] = 5,
     ):
         """Initialize retriever with default prompt paths and search parameters."""
         super().__init__(
```
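The retriever changes above switch `top_k` to `Optional[int]` and coalesce an explicit `None` back to a default in the constructor. A minimal self-contained sketch of that pattern (the `Retriever` class here is illustrative, not cognee's actual code):

```python
from typing import Optional


class Retriever:
    def __init__(self, top_k: Optional[int] = 5):
        # A caller may pass top_k=None explicitly; fall back to 5 then,
        # instead of storing None and breaking a later `limit=self.top_k`.
        self.top_k = top_k if top_k is not None else 5


print(Retriever().top_k)            # → 5 (default)
print(Retriever(top_k=None).top_k)  # → 5 (explicit None still coalesces)
print(Retriever(top_k=10).top_k)    # → 10
```

The point of the guard is that a plain default (`top_k: int = 5`) only protects against the argument being omitted, not against `None` being passed through from a caller's own optional parameter.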
```diff
@@ -5,7 +5,12 @@ import sys

 with patch.dict(
     sys.modules,
-    {"deepeval": MagicMock(), "deepeval.metrics": MagicMock(), "deepeval.test_case": MagicMock()},
+    {
+        "deepeval": MagicMock(),
+        "deepeval.metrics": MagicMock(),
+        "deepeval.test_case": MagicMock(),
+        "cognee.eval_framework.evaluation.metrics.context_coverage": MagicMock(),
+    },
 ):
     from cognee.eval_framework.evaluation.deep_eval_adapter import DeepEvalAdapter
```
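The test above stubs out optional dependencies before importing the adapter. The underlying technique is `unittest.mock.patch.dict` over `sys.modules`: any module name registered there is returned directly by `import` without touching the filesystem. A small sketch (`fake_optional_dep` is a made-up module name for illustration):

```python
import sys
from unittest.mock import MagicMock, patch

# Temporarily register a stub module so an import of a dependency that
# may not be installed still succeeds inside the `with` block.
with patch.dict(sys.modules, {"fake_optional_dep": MagicMock()}):
    import fake_optional_dep  # resolved from the patched sys.modules
    fake_optional_dep.SomeMetric()  # a MagicMock accepts any attribute/call

# patch.dict restores sys.modules on exit, so the stub disappears again.
print("fake_optional_dep" in sys.modules)  # → False
```

This keeps the import-time stubbing scoped: code imported inside the block binds to the mock, and the interpreter's module table is restored afterwards.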
**examples/python/pokemon_datapoints_example.py** (new file, 206 lines)

```python
# Standard library imports
import os
import json
import asyncio
import pathlib
from uuid import uuid5, NAMESPACE_OID
from typing import List, Optional
from pathlib import Path

import dlt
import requests
import cognee
from cognee.low_level import DataPoint, setup as cognee_setup
from cognee.api.v1.search import SearchType
from cognee.tasks.storage import add_data_points
from cognee.modules.pipelines.tasks.Task import Task
from cognee.modules.pipelines import run_tasks


BASE_URL = "https://pokeapi.co/api/v2/"
os.environ["BUCKET_URL"] = "./.data_storage"
os.environ["DATA_WRITER__DISABLE_COMPRESSION"] = "true"


# Data Models
class Abilities(DataPoint):
    name: str = "Abilities"
    metadata: dict = {"index_fields": ["name"]}


class PokemonAbility(DataPoint):
    name: str
    ability__name: str
    ability__url: str
    is_hidden: bool
    slot: int
    _dlt_load_id: str
    _dlt_id: str
    _dlt_parent_id: str
    _dlt_list_idx: str
    is_type: Abilities
    metadata: dict = {"index_fields": ["ability__name"]}


class Pokemons(DataPoint):
    name: str = "Pokemons"
    have: Abilities
    metadata: dict = {"index_fields": ["name"]}


class Pokemon(DataPoint):
    name: str
    base_experience: int
    height: int
    weight: int
    is_default: bool
    order: int
    location_area_encounters: str
    species__name: str
    species__url: str
    cries__latest: str
    cries__legacy: str
    sprites__front_default: str
    sprites__front_shiny: str
    sprites__back_default: Optional[str]
    sprites__back_shiny: Optional[str]
    _dlt_load_id: str
    _dlt_id: str
    is_type: Pokemons
    abilities: List[PokemonAbility]
    metadata: dict = {"index_fields": ["name"]}


# Data Collection Functions
@dlt.resource(write_disposition="replace")
def pokemon_list(limit: int = 50):
    response = requests.get(f"{BASE_URL}pokemon", params={"limit": limit})
    response.raise_for_status()
    yield response.json()["results"]


@dlt.transformer(data_from=pokemon_list)
def pokemon_details(pokemons):
    """Fetches detailed info for each Pokémon"""
    for pokemon in pokemons:
        response = requests.get(pokemon["url"])
        response.raise_for_status()
        yield response.json()


# Data Loading Functions
def load_abilities_data(jsonl_abilities):
    abilities_root = Abilities()
    pokemon_abilities = []

    for jsonl_ability in jsonl_abilities:
        with open(jsonl_ability, "r") as f:
            for line in f:
                ability = json.loads(line)
                ability["id"] = uuid5(NAMESPACE_OID, ability["_dlt_id"])
                ability["name"] = ability["ability__name"]
                ability["is_type"] = abilities_root
                pokemon_abilities.append(ability)

    return abilities_root, pokemon_abilities


def load_pokemon_data(jsonl_pokemons, pokemon_abilities, pokemon_root):
    pokemons = []

    for jsonl_pokemon in jsonl_pokemons:
        with open(jsonl_pokemon, "r") as f:
            for line in f:
                pokemon_data = json.loads(line)
                abilities = [
                    ability
                    for ability in pokemon_abilities
                    if ability["_dlt_parent_id"] == pokemon_data["_dlt_id"]
                ]
                pokemon_data["external_id"] = pokemon_data["id"]
                pokemon_data["id"] = uuid5(NAMESPACE_OID, str(pokemon_data["id"]))
                pokemon_data["abilities"] = [PokemonAbility(**ability) for ability in abilities]
                pokemon_data["is_type"] = pokemon_root
                pokemons.append(Pokemon(**pokemon_data))

    return pokemons


# Main Application Logic
async def setup_and_process_data():
    """Setup configuration and process Pokemon data"""
    # Setup configuration
    data_directory_path = str(
        pathlib.Path(os.path.join(pathlib.Path(__file__).parent, ".data_storage")).resolve()
    )
    cognee_directory_path = str(
        pathlib.Path(os.path.join(pathlib.Path(__file__).parent, ".cognee_system")).resolve()
    )

    cognee.config.data_root_directory(data_directory_path)
    cognee.config.system_root_directory(cognee_directory_path)

    # Initialize pipeline and collect data
    pipeline = dlt.pipeline(
        pipeline_name="pokemon_pipeline",
        destination="filesystem",
        dataset_name="pokemon_data",
    )
    info = pipeline.run([pokemon_list, pokemon_details])
    print(info)

    # Load and process data
    STORAGE_PATH = Path(".data_storage/pokemon_data/pokemon_details")
    jsonl_pokemons = sorted(STORAGE_PATH.glob("*.jsonl"))
    if not jsonl_pokemons:
        raise FileNotFoundError("No JSONL files found in the storage directory.")

    ABILITIES_PATH = Path(".data_storage/pokemon_data/pokemon_details__abilities")
    jsonl_abilities = sorted(ABILITIES_PATH.glob("*.jsonl"))
    if not jsonl_abilities:
        raise FileNotFoundError("No JSONL files found in the storage directory.")

    # Process data
    abilities_root, pokemon_abilities = load_abilities_data(jsonl_abilities)
    pokemon_root = Pokemons(have=abilities_root)
    pokemons = load_pokemon_data(jsonl_pokemons, pokemon_abilities, pokemon_root)

    return pokemons


async def pokemon_cognify(pokemons):
    """Process Pokemon data with Cognee and perform search"""
    # Setup and run Cognee tasks
    await cognee.prune.prune_data()
    await cognee.prune.prune_system(metadata=True)
    await cognee_setup()

    tasks = [Task(add_data_points, task_config={"batch_size": 50})]
    results = run_tasks(
        tasks=tasks,
        data=pokemons,
        dataset_id=uuid5(NAMESPACE_OID, "Pokemon"),
        pipeline_name="pokemon_pipeline",
    )

    async for result in results:
        print(result)
    print("Done")

    # Perform search
    search_results = await cognee.search(
        query_type=SearchType.GRAPH_COMPLETION, query_text="pokemons?"
    )

    print("Search results:")
    for result_text in search_results:
        print(result_text)


async def main():
    pokemons = await setup_and_process_data()
    await pokemon_cognify(pokemons)


if __name__ == "__main__":
    asyncio.run(main())
```
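The example above derives `DataPoint` ids with `uuid5(NAMESPACE_OID, ...)`. Unlike `uuid4`, `uuid5` is deterministic: the same namespace and name always hash to the same UUID, so re-running the pipeline over the same records produces stable ids (idempotent loads). A quick sketch:

```python
from uuid import uuid5, NAMESPACE_OID, UUID

# Name-based UUIDs: derived from (namespace, name), not from randomness.
a = uuid5(NAMESPACE_OID, "pikachu")
b = uuid5(NAMESPACE_OID, "pikachu")
c = uuid5(NAMESPACE_OID, "bulbasaur")

print(a == b)             # → True: same inputs, same UUID, across runs
print(a == c)             # → False: a different name yields a different UUID
print(isinstance(a, UUID))  # → True
```

This is why the script keeps the original API id under `external_id` and replaces `id` with the derived UUID: the graph store gets a collision-free, reproducible key while the source identifier is preserved.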
**helm/Chart.yaml** (new file, 24 lines)

```yaml
apiVersion: v2
name: cognee-chart
description: A helm chart of the cognee backend deployment on Kubernetes environment

# A chart can be either an 'application' or a 'library' chart.
#
# Application charts are a collection of templates that can be packaged into versioned archives
# to be deployed.
#
# Library charts provide useful utilities or functions for the chart developer. They're included as
# a dependency of application charts to inject those utilities and functions into the rendering
# pipeline. Library charts do not define any templates and therefore cannot be deployed.
type: application

# This is the chart version. This version number should be incremented each time you make changes
# to the chart and its templates, including the app version.
# Versions are expected to follow Semantic Versioning (https://semver.org/)
version: 0.1.0

# This is the version number of the application being deployed. This version number should be
# incremented each time you make changes to the application. Versions are not expected to
# follow Semantic Versioning. They should reflect the version the application is using.
# It is recommended to use it with quotes.
appVersion: "1.16.0"
```
**helm/Dockerfile** (new file, 59 lines)

```dockerfile
FROM python:3.11-slim

# Define Poetry extras to install
ARG POETRY_EXTRAS="\
    # Storage & Databases \
    filesystem postgres weaviate qdrant neo4j falkordb milvus \
    # Notebooks & Interactive Environments \
    notebook \
    # LLM & AI Frameworks \
    langchain llama-index gemini huggingface ollama mistral groq \
    # Evaluation & Monitoring \
    deepeval evals posthog \
    # Graph Processing & Code Analysis \
    codegraph graphiti \
    # Document Processing \
    docs"

# Set build argument
ARG DEBUG

# Set environment variable based on the build argument
ENV DEBUG=${DEBUG}
ENV PIP_NO_CACHE_DIR=true
ENV PATH="${PATH}:/root/.poetry/bin"

RUN apt-get install -y \
    gcc \
    libpq-dev

WORKDIR /app
COPY pyproject.toml poetry.lock /app/

RUN pip install poetry

# Don't create virtualenv since docker is already isolated
RUN poetry config virtualenvs.create false

# Install the dependencies
RUN poetry install --extras "${POETRY_EXTRAS}" --no-root --without dev

# Set the PYTHONPATH environment variable to include the /app directory
ENV PYTHONPATH=/app

COPY cognee/ /app/cognee

# Copy Alembic configuration
COPY alembic.ini /app/alembic.ini
COPY alembic/ /app/alembic

COPY entrypoint.sh /app/entrypoint.sh
RUN chmod +x /app/entrypoint.sh

RUN sed -i 's/\r$//' /app/entrypoint.sh

ENTRYPOINT ["/app/entrypoint.sh"]
```
**helm/README.md** (new file, 25 lines)

# cognee-infra-helm

General infrastructure setup for Cognee on Kubernetes using a Helm chart.

## Prerequisites

Before deploying the Helm chart, ensure the following prerequisites are met:

- **Kubernetes Cluster**: A running Kubernetes cluster (e.g., Minikube, GKE, EKS).
- **Helm**: Installed and configured for your Kubernetes cluster. You can install Helm by following the [official guide](https://helm.sh/docs/intro/install/).
- **kubectl**: Installed and configured to interact with your cluster. Follow the instructions [here](https://kubernetes.io/docs/tasks/tools/install-kubectl/).
- **Clone the Repository**: Clone this repository to your local machine and navigate to the directory.

## Deploy the Helm Chart

```bash
helm install cognee ./cognee-chart
```

**Uninstall the Helm Release**:

```bash
helm uninstall cognee
```
**helm/docker-compose-helm.yml** (new file, 46 lines)

```yaml
services:
  cognee:
    image: cognee-backend:latest
    container_name: cognee-backend
    networks:
      - cognee-network
    build:
      context: .
      dockerfile: Dockerfile
    volumes:
      - .:/app
      - /app/cognee-frontend/ # Ignore frontend code
    environment:
      - HOST=0.0.0.0
      - ENVIRONMENT=local
      - PYTHONPATH=.
    ports:
      - 8000:8000
      # - 5678:5678 # Debugging
    deploy:
      resources:
        limits:
          cpus: '4.0'
          memory: 8GB

  postgres:
    image: pgvector/pgvector:pg17
    container_name: postgres
    environment:
      POSTGRES_USER: cognee
      POSTGRES_PASSWORD: cognee
      POSTGRES_DB: cognee_db
    volumes:
      - postgres_data:/var/lib/postgresql/data
    ports:
      - 5432:5432
    networks:
      - cognee-network

networks:
  cognee-network:
    name: cognee-network

volumes:
  postgres_data:
```
**helm/templates/cognee_deployment.yaml** (new file, 32 lines)

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: {{ .Release.Name }}-cognee
  labels:
    app: {{ .Release.Name }}-cognee
spec:
  replicas: 1
  selector:
    matchLabels:
      app: {{ .Release.Name }}-cognee
  template:
    metadata:
      labels:
        app: {{ .Release.Name }}-cognee
    spec:
      containers:
        - name: cognee
          image: {{ .Values.cognee.image }}
          ports:
            - containerPort: {{ .Values.cognee.port }}
          env:
            - name: HOST
              value: {{ .Values.cognee.env.HOST }}
            - name: ENVIRONMENT
              value: {{ .Values.cognee.env.ENVIRONMENT }}
            - name: PYTHONPATH
              value: {{ .Values.cognee.env.PYTHONPATH }}
          resources:
            limits:
              cpu: {{ .Values.cognee.resources.cpu }}
              memory: {{ .Values.cognee.resources.memory }}
```
**helm/templates/cognee_service.yaml** (new file, 13 lines)

```yaml
apiVersion: v1
kind: Service
metadata:
  name: {{ .Release.Name }}-cognee
  labels:
    app: {{ .Release.Name }}-cognee
spec:
  type: NodePort
  ports:
    - port: {{ .Values.cognee.port }}
      targetPort: {{ .Values.cognee.port }}
  selector:
    app: {{ .Release.Name }}-cognee
```
**helm/templates/postgres_deployment.yaml** (new file, 35 lines)

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: {{ .Release.Name }}-postgres
  labels:
    app: {{ .Release.Name }}-postgres
spec:
  replicas: 1
  selector:
    matchLabels:
      app: {{ .Release.Name }}-postgres
  template:
    metadata:
      labels:
        app: {{ .Release.Name }}-postgres
    spec:
      containers:
        - name: postgres
          image: {{ .Values.postgres.image }}
          ports:
            - containerPort: {{ .Values.postgres.port }}
          env:
            - name: POSTGRES_USER
              value: {{ .Values.postgres.env.POSTGRES_USER }}
            - name: POSTGRES_PASSWORD
              value: {{ .Values.postgres.env.POSTGRES_PASSWORD }}
            - name: POSTGRES_DB
              value: {{ .Values.postgres.env.POSTGRES_DB }}
          volumeMounts:
            - name: postgres-storage
              mountPath: /var/lib/postgresql/data
      volumes:
        - name: postgres-storage
          persistentVolumeClaim:
            claimName: {{ .Release.Name }}-postgres-pvc
```
**helm/templates/postgres_pvc.yaml** (new file, 10 lines)

```yaml
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: {{ .Release.Name }}-postgres-pvc
spec:
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: {{ .Values.postgres.storage }}
```
**helm/templates/postgres_service.yaml** (new file, 14 lines)

```yaml
apiVersion: v1
kind: Service
metadata:
  name: {{ .Release.Name }}-postgres
  labels:
    app: {{ .Release.Name }}-postgres
spec:
  type: ClusterIP
  ports:
    - port: {{ .Values.postgres.port }}
      targetPort: {{ .Values.postgres.port }}
  selector:
    app: {{ .Release.Name }}-postgres
```
**helm/values.yaml** (new file, 22 lines)

```yaml
# Configuration for the 'cognee' application service
cognee:
  # Image name (using the local image we’ll build in Minikube)
  image: "hajdul1988/cognee-backend:latest"
  port: 8000
  env:
    HOST: "0.0.0.0"
    ENVIRONMENT: "local"
    PYTHONPATH: "."
  resources:
    cpu: "4.0"
    memory: "8Gi"

# Configuration for the 'postgres' database service
postgres:
  image: "pgvector/pgvector:pg17"
  port: 5432
  env:
    POSTGRES_USER: "cognee"
    POSTGRES_PASSWORD: "cognee"
    POSTGRES_DB: "cognee_db"
  storage: "8Gi"
```
536
notebooks/pokemon_datapoints_notebook.ipynb
Normal file
536
notebooks/pokemon_datapoints_notebook.ipynb
Normal file
|
|
@ -0,0 +1,536 @@
|
||||||
|
{
|
||||||
|
"cells": [
|
||||||
|
{
|
||||||
|
"metadata": {
|
||||||
|
"ExecuteTime": {
|
||||||
|
"end_time": "2025-03-04T11:58:00.193158Z",
|
||||||
|
"start_time": "2025-03-04T11:58:00.190238Z"
|
||||||
|
}
|
||||||
|
},
|
||||||
|
"cell_type": "code",
|
||||||
|
"source": [
|
||||||
|
"import nest_asyncio\n",
|
||||||
|
"nest_asyncio.apply()"
|
||||||
|
],
|
||||||
|
"id": "2efba278d106bb5f",
|
||||||
|
"outputs": [],
|
||||||
|
"execution_count": 2
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"metadata": {},
|
||||||
|
"cell_type": "markdown",
|
||||||
|
"source": [
|
||||||
|
"### Environment Configuration\n",
|
||||||
|
"#### Setup required directories and environment variables.\n"
|
||||||
|
],
|
||||||
|
"id": "ccbb2bc23aa456ee"
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"metadata": {
|
||||||
|
"ExecuteTime": {
|
||||||
|
"end_time": "2025-03-04T11:59:33.879188Z",
|
||||||
|
"start_time": "2025-03-04T11:59:33.873682Z"
|
||||||
|
}
|
||||||
|
},
|
||||||
|
"cell_type": "code",
|
||||||
|
"source": [
|
||||||
|
"import pathlib\n",
|
||||||
|
"import os\n",
|
||||||
|
"import cognee\n",
|
||||||
|
"\n",
|
||||||
|
"notebook_dir = pathlib.Path().resolve()\n",
|
||||||
|
"data_directory_path = str(notebook_dir / \".data_storage\")\n",
|
||||||
|
"cognee_directory_path = str(notebook_dir / \".cognee_system\")\n",
|
||||||
|
"\n",
|
||||||
|
"cognee.config.data_root_directory(data_directory_path)\n",
|
||||||
|
"cognee.config.system_root_directory(cognee_directory_path)\n",
|
||||||
|
"\n",
|
||||||
|
"BASE_URL = \"https://pokeapi.co/api/v2/\"\n",
|
||||||
|
"os.environ[\"BUCKET_URL\"] = data_directory_path\n",
|
||||||
|
"os.environ[\"DATA_WRITER__DISABLE_COMPRESSION\"] = \"true\"\n"
|
||||||
|
],
|
||||||
|
"id": "662d554f96f211d9",
|
||||||
|
"outputs": [],
|
||||||
|
"execution_count": 8
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"metadata": {},
|
||||||
|
"cell_type": "markdown",
|
||||||
|
"source": [
|
||||||
|
"## Initialize DLT Pipeline\n",
|
||||||
|
"### Create the DLT pipeline to fetch Pokémon data.\n"
|
||||||
|
],
|
||||||
|
"id": "36ae0be71f6e9167"
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"metadata": {
|
||||||
|
"ExecuteTime": {
|
||||||
|
"end_time": "2025-03-04T11:58:03.982939Z",
|
||||||
|
"start_time": "2025-03-04T11:58:03.819676Z"
|
||||||
|
}
|
||||||
|
},
|
||||||
|
"cell_type": "code",
|
||||||
|
"source": [
|
||||||
|
"import dlt\n",
|
||||||
|
"from pathlib import Path\n",
|
||||||
|
"\n",
|
||||||
|
"pipeline = dlt.pipeline(\n",
|
||||||
|
" pipeline_name=\"pokemon_pipeline\",\n",
|
||||||
|
" destination=\"filesystem\",\n",
|
||||||
|
" dataset_name=\"pokemon_data\",\n",
|
||||||
|
")\n"
|
||||||
|
],
|
||||||
|
"id": "25101ae5f016ce0c",
|
||||||
|
"outputs": [],
|
||||||
|
"execution_count": 4
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"metadata": {},
|
||||||
|
"cell_type": "markdown",
|
||||||
|
"source": [
|
||||||
|
"## Fetch Pokémon List\n",
|
||||||
|
"### Retrieve a list of Pokémon from the API.\n"
|
||||||
|
],
|
||||||
|
"id": "9a87ce05a072c48b"
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"metadata": {
|
||||||
|
"ExecuteTime": {
|
||||||
|
"end_time": "2025-03-04T11:58:03.990076Z",
|
||||||
|
"start_time": "2025-03-04T11:58:03.987199Z"
|
||||||
|
}
|
||||||
|
},
|
||||||
|
"cell_type": "code",
|
||||||
|
"source": [
|
||||||
|
"@dlt.resource(write_disposition=\"replace\")\n",
|
||||||
|
"def pokemon_list(limit: int = 50):\n",
|
||||||
|
" import requests\n",
|
||||||
|
" response = requests.get(f\"{BASE_URL}pokemon\", params={\"limit\": limit})\n",
|
||||||
|
" response.raise_for_status()\n",
|
||||||
|
" yield response.json()[\"results\"]\n"
|
||||||
|
],
|
||||||
|
"id": "3b6e60778c61e24a",
|
||||||
|
"outputs": [],
|
||||||
|
"execution_count": 5
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"metadata": {},
|
||||||
|
"cell_type": "markdown",
|
||||||
|
"source": [
|
||||||
|
"## Fetch Pokémon Details\n",
|
||||||
|
"### Fetch detailed information about each Pokémon.\n"
|
||||||
|
],
|
||||||
|
"id": "9952767846194e97"
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"metadata": {
|
||||||
|
"ExecuteTime": {
|
||||||
|
"end_time": "2025-03-04T11:58:03.996394Z",
|
||||||
|
"start_time": "2025-03-04T11:58:03.994122Z"
|
||||||
|
}
|
||||||
|
},
|
||||||
|
"cell_type": "code",
|
||||||
|
"source": [
|
||||||
|
"@dlt.transformer(data_from=pokemon_list)\n",
|
||||||
|
"def pokemon_details(pokemons):\n",
|
||||||
|
" \"\"\"Fetches detailed info for each Pokémon\"\"\"\n",
|
||||||
|
" import requests\n",
|
||||||
|
" for pokemon in pokemons:\n",
|
||||||
|
" response = requests.get(pokemon[\"url\"])\n",
|
||||||
|
" response.raise_for_status()\n",
|
||||||
|
" yield response.json()\n"
|
||||||
|
],
|
||||||
|
"id": "79ec9fef12267485",
|
||||||
|
"outputs": [],
|
||||||
|
"execution_count": 6
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"metadata": {},
|
||||||
|
"cell_type": "markdown",
|
||||||
|
"source": [
|
||||||
|
"## Run Data Pipeline\n",
|
||||||
|
"### Execute the pipeline and store Pokémon data.\n"
|
||||||
|
],
|
||||||
|
"id": "41e05f660bf9e9d2"
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"metadata": {
|
||||||
|
"ExecuteTime": {
|
||||||
|
"end_time": "2025-03-04T11:59:41.571015Z",
|
||||||
|
"start_time": "2025-03-04T11:59:36.840744Z"
|
||||||
|
}
|
||||||
|
},
|
||||||
|
"cell_type": "code",
|
||||||
|
"source": [
|
||||||
|
"info = pipeline.run([pokemon_list, pokemon_details])\n",
|
||||||
|
"print(info)\n"
|
||||||
|
],
|
||||||
|
"id": "20a3b2c7f404677f",
|
||||||
|
"outputs": [
|
||||||
|
{
|
||||||
|
"name": "stdout",
|
||||||
|
"output_type": "stream",
|
||||||
|
"text": [
|
||||||
|
"Pipeline pokemon_pipeline load step completed in 0.06 seconds\n",
|
||||||
|
"1 load package(s) were loaded to destination filesystem and into dataset pokemon_data\n",
|
||||||
|
"The filesystem destination used file:///Users/lazar/PycharmProjects/cognee/.data_storage location to store data\n",
|
||||||
|
"Load package 1741089576.860229 is LOADED and contains no failed jobs\n"
|
||||||
|
]
|
||||||
|
}
|
||||||
|
],
|
||||||
|
"execution_count": 9
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"metadata": {},
|
||||||
|
"cell_type": "markdown",
|
||||||
|
"source": [
|
||||||
|
"## Load Pokémon Abilities\n",
|
||||||
|
"### Load Pokémon ability data from stored files.\n"
|
||||||
|
],
|
||||||
|
"id": "937f10b8d1037743"
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"metadata": {
|
||||||
|
"ExecuteTime": {
|
||||||
|
"end_time": "2025-03-04T11:59:44.377719Z",
|
||||||
|
"start_time": "2025-03-04T11:59:44.363718Z"
|
||||||
|
}
|
||||||
|
},
|
||||||
|
"cell_type": "code",
|
||||||
|
"source": [
|
||||||
|
"import json\n",
|
||||||
|
"from cognee.low_level import DataPoint\n",
|
||||||
|
"from uuid import uuid5, NAMESPACE_OID\n",
|
||||||
|
"\n",
|
||||||
|
"class Abilities(DataPoint):\n",
|
||||||
|
" name: str = \"Abilities\"\n",
|
||||||
|
" metadata: dict = {\"index_fields\": [\"name\"]}\n",
|
||||||
|
"\n",
|
||||||
|
"def load_abilities_data(jsonl_abilities):\n",
|
||||||
|
" abilities_root = Abilities()\n",
|
||||||
|
" pokemon_abilities = []\n",
|
||||||
|
"\n",
|
||||||
|
" for jsonl_ability in jsonl_abilities:\n",
|
||||||
|
" with open(jsonl_ability, \"r\") as f:\n",
|
||||||
|
" for line in f:\n",
|
||||||
|
" ability = json.loads(line)\n",
|
||||||
|
" ability[\"id\"] = uuid5(NAMESPACE_OID, ability[\"_dlt_id\"])\n",
|
||||||
|
" ability[\"name\"] = ability[\"ability__name\"]\n",
|
||||||
|
" ability[\"is_type\"] = abilities_root\n",
|
||||||
|
" pokemon_abilities.append(ability)\n",
|
||||||
|
"\n",
|
||||||
|
" return abilities_root, pokemon_abilities\n"
|
||||||
|
],
|
||||||
|
"id": "be73050036439ea1",
|
||||||
|
"outputs": [],
|
||||||
|
"execution_count": 10
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"metadata": {},
|
||||||
|
"cell_type": "markdown",
|
||||||
|
"source": [
|
||||||
|
"## Load Pokémon Data\n",
|
||||||
|
"### Load Pokémon details and associate them with abilities.\n"
|
||||||
|
],
|
||||||
|
"id": "98c97f799f73df77"
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"metadata": {
|
||||||
|
"ExecuteTime": {
|
||||||
|
"end_time": "2025-03-04T11:59:46.251306Z",
|
||||||
|
"start_time": "2025-03-04T11:59:46.238283Z"
|
||||||
|
}
|
||||||
|
},
|
||||||
|
"cell_type": "code",
|
||||||
|
"source": [
|
||||||
|
"from typing import List, Optional\n",
|
||||||
|
"\n",
|
||||||
|
"class Pokemons(DataPoint):\n",
|
||||||
|
" name: str = \"Pokemons\"\n",
|
||||||
|
" have: Abilities\n",
|
||||||
|
" metadata: dict = {\"index_fields\": [\"name\"]}\n",
|
||||||
|
"\n",
|
||||||
|
"class PokemonAbility(DataPoint):\n",
|
||||||
|
" name: str\n",
|
||||||
|
" ability__name: str\n",
|
||||||
|
" ability__url: str\n",
|
||||||
|
" is_hidden: bool\n",
|
||||||
|
" slot: int\n",
|
||||||
|
" _dlt_load_id: str\n",
|
||||||
|
" _dlt_id: str\n",
|
||||||
|
" _dlt_parent_id: str\n",
|
||||||
|
" _dlt_list_idx: str\n",
|
||||||
|
" is_type: Abilities\n",
|
||||||
|
" metadata: dict = {\"index_fields\": [\"ability__name\"]}\n",
|
||||||
|
"\n",
|
||||||
|
"class Pokemon(DataPoint):\n",
|
||||||
|
" name: str\n",
|
||||||
|
" base_experience: int\n",
|
||||||
|
" height: int\n",
|
||||||
|
" weight: int\n",
|
||||||
|
" is_default: bool\n",
|
||||||
|
" order: int\n",
|
||||||
|
" location_area_encounters: str\n",
|
||||||
|
" species__name: str\n",
|
||||||
|
" species__url: str\n",
|
||||||
|
" cries__latest: str\n",
|
||||||
|
" cries__legacy: str\n",
|
||||||
|
" sprites__front_default: str\n",
|
||||||
|
" sprites__front_shiny: str\n",
|
||||||
|
" sprites__back_default: Optional[str]\n",
|
||||||
|
" sprites__back_shiny: Optional[str]\n",
|
||||||
|
" _dlt_load_id: str\n",
|
||||||
|
" _dlt_id: str\n",
|
||||||
|
" is_type: Pokemons\n",
|
||||||
|
" abilities: List[PokemonAbility]\n",
|
||||||
|
" metadata: dict = {\"index_fields\": [\"name\"]}\n",
|
||||||
|
"\n",
|
||||||
|
"def load_pokemon_data(jsonl_pokemons, pokemon_abilities, pokemon_root):\n",
|
||||||
|
" pokemons = []\n",
|
||||||
|
"\n",
|
||||||
|
" for jsonl_pokemon in jsonl_pokemons:\n",
|
||||||
|
" with open(jsonl_pokemon, \"r\") as f:\n",
|
||||||
|
" for line in f:\n",
|
||||||
|
" pokemon_data = json.loads(line)\n",
|
||||||
|
" abilities = [\n",
|
||||||
|
" ability for ability in pokemon_abilities\n",
|
||||||
|
" if ability[\"_dlt_parent_id\"] == pokemon_data[\"_dlt_id\"]\n",
|
||||||
|
" ]\n",
|
||||||
|
" pokemon_data[\"external_id\"] = pokemon_data[\"id\"]\n",
|
||||||
|
" pokemon_data[\"id\"] = uuid5(NAMESPACE_OID, str(pokemon_data[\"id\"]))\n",
|
||||||
|
" pokemon_data[\"abilities\"] = [PokemonAbility(**ability) for ability in abilities]\n",
|
||||||
|
" pokemon_data[\"is_type\"] = pokemon_root\n",
|
||||||
|
" pokemons.append(Pokemon(**pokemon_data))\n",
|
||||||
|
"\n",
|
||||||
|
" return pokemons\n"
|
||||||
|
],
|
||||||
|
"id": "7862951248df0bf5",
|
||||||
|
"outputs": [],
|
||||||
|
"execution_count": 11
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"metadata": {},
|
||||||
|
"cell_type": "markdown",
|
||||||
|
"source": [
|
||||||
|
"## Process Pokémon Data\n",
|
||||||
|
"### Load and associate Pokémon abilities.\n"
|
||||||
|
],
|
||||||
|
"id": "676fa5a2b61c2107"
|
||||||
|
},
|
||||||
|
  {
   "metadata": {
    "ExecuteTime": {
     "end_time": "2025-03-04T11:59:47.365226Z",
     "start_time": "2025-03-04T11:59:47.356722Z"
    }
   },
   "cell_type": "code",
   "source": [
    "STORAGE_PATH = Path(\".data_storage/pokemon_data/pokemon_details\")\n",
    "jsonl_pokemons = sorted(STORAGE_PATH.glob(\"*.jsonl\"))\n",
    "\n",
    "ABILITIES_PATH = Path(\".data_storage/pokemon_data/pokemon_details__abilities\")\n",
    "jsonl_abilities = sorted(ABILITIES_PATH.glob(\"*.jsonl\"))\n",
    "\n",
    "abilities_root, pokemon_abilities = load_abilities_data(jsonl_abilities)\n",
    "pokemon_root = Pokemons(have=abilities_root)\n",
    "pokemons = load_pokemon_data(jsonl_pokemons, pokemon_abilities, pokemon_root)\n"
   ],
   "id": "ad14cdecdccd71bb",
   "outputs": [],
   "execution_count": 12
  },
  {
   "metadata": {},
   "cell_type": "markdown",
   "source": [
    "## Initialize Cognee\n",
    "### Set up Cognee for data processing.\n"
   ],
   "id": "59dec67b2ae50f0f"
  },
  {
   "metadata": {
    "ExecuteTime": {
     "end_time": "2025-03-04T11:59:49.244577Z",
     "start_time": "2025-03-04T11:59:48.618261Z"
    }
   },
   "cell_type": "code",
   "source": [
    "import asyncio\n",
    "from cognee.low_level import setup as cognee_setup\n",
    "\n",
    "async def initialize_cognee():\n",
    "    await cognee.prune.prune_data()\n",
    "    await cognee.prune.prune_system(metadata=True)\n",
    "    await cognee_setup()\n",
    "\n",
    "await initialize_cognee()\n"
   ],
   "id": "d2e095ae576a02c1",
   "outputs": [
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      "INFO:cognee.infrastructure.databases.relational.sqlalchemy.SqlAlchemyAdapter:Database deleted successfully.INFO:cognee.infrastructure.databases.relational.sqlalchemy.SqlAlchemyAdapter:Database deleted successfully."
     ]
    }
   ],
   "execution_count": 13
  },
  {
   "metadata": {},
   "cell_type": "markdown",
   "source": [
    "## Process Pokémon Data\n",
    "### Add Pokémon data points to Cognee.\n"
   ],
   "id": "5f0b8090bc7b1fe6"
  },
  {
   "metadata": {
    "ExecuteTime": {
     "end_time": "2025-03-04T11:59:57.744035Z",
     "start_time": "2025-03-04T11:59:50.574033Z"
    }
   },
   "cell_type": "code",
   "source": [
    "from cognee.modules.pipelines.tasks.Task import Task\n",
    "from cognee.tasks.storage import add_data_points\n",
    "from cognee.modules.pipelines import run_tasks\n",
    "\n",
    "tasks = [Task(add_data_points, task_config={\"batch_size\": 50})]\n",
    "results = run_tasks(\n",
    "    tasks=tasks,\n",
    "    data=pokemons,\n",
    "    dataset_id=uuid5(NAMESPACE_OID, \"Pokemon\"),\n",
    "    pipeline_name='pokemon_pipeline',\n",
    ")\n",
    "\n",
    "async for result in results:\n",
    "    print(result)\n",
    "print(\"Done\")\n"
   ],
   "id": "ffa12fc1f5350d95",
   "outputs": [
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      "INFO:run_tasks(tasks: [Task], data):Pipeline run started: `fd2ed59d-b550-5b05-bbe6-7b708fe12483`INFO:run_tasks(tasks: [Task], data):Coroutine task started: `add_data_points`"
     ]
    },
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "<cognee.modules.pipelines.models.PipelineRun.PipelineRun object at 0x300bb3950>\n",
      "User d347ea85-e512-4cae-b9d7-496fe1745424 has registered.\n"
     ]
    },
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      "/Users/lazar/PycharmProjects/cognee/cognee/infrastructure/databases/vector/pgvector/PGVectorAdapter.py:79: SAWarning: This declarative base already contains a class with the same class name and module name as cognee.infrastructure.databases.vector.pgvector.PGVectorAdapter.PGVectorDataPoint, and will be replaced in the string-lookup table.\n",
      "  class PGVectorDataPoint(Base):\n",
      "INFO:httpx:HTTP Request: POST https://api.openai.com/v1/embeddings \"HTTP/1.1 200 OK\"INFO:httpx:HTTP Request: POST https://api.openai.com/v1/embeddings \"HTTP/1.1 200 OK\"/Users/lazar/PycharmProjects/cognee/cognee/infrastructure/databases/vector/pgvector/PGVectorAdapter.py:113: SAWarning: This declarative base already contains a class with the same class name and module name as cognee.infrastructure.databases.vector.pgvector.PGVectorAdapter.PGVectorDataPoint, and will be replaced in the string-lookup table.\n",
      "  class PGVectorDataPoint(Base):\n",
      "INFO:httpx:HTTP Request: POST https://api.openai.com/v1/embeddings \"HTTP/1.1 200 OK\"INFO:httpx:HTTP Request: POST https://api.openai.com/v1/embeddings \"HTTP/1.1 200 OK\"WARNING:neo4j.notifications:Received notification from DBMS server: {severity: WARNING} {code: Neo.ClientNotification.Statement.FeatureDeprecationWarning} {category: DEPRECATION} {title: This feature is deprecated and will be removed in future versions.} {description: The query used a deprecated function: `id`.} {position: line: 8, column: 16, offset: 335} for query: '\\n UNWIND $nodes AS node\\n MERGE (n {id: node.node_id})\\n ON CREATE SET n += node.properties, n.updated_at = timestamp()\\n ON MATCH SET n += node.properties, n.updated_at = timestamp()\\n WITH n, node.node_id AS label\\n CALL apoc.create.addLabels(n, [label]) YIELD node AS labeledNode\\n RETURN ID(labeledNode) AS internal_id, labeledNode.id AS nodeId\\n 'WARNING:neo4j.notifications:Received notification from DBMS server: {severity: WARNING} {code: Neo.ClientNotification.Statement.FeatureDeprecationWarning} {category: DEPRECATION} {title: This feature is deprecated and will be removed in future versions.} {description: The query used a deprecated function: `id`.} {position: line: 1, column: 18, offset: 17} for query: 'MATCH (n) RETURN ID(n) AS id, labels(n) AS labels, properties(n) AS properties'WARNING:neo4j.notifications:Received notification from DBMS server: {severity: WARNING} {code: Neo.ClientNotification.Statement.FeatureDeprecationWarning} {category: DEPRECATION} {title: This feature is deprecated and will be removed in future versions.} {description: The query used a deprecated function: `id`.} {position: line: 3, column: 16, offset: 43} for query: '\\n MATCH (n)-[r]->(m)\\n RETURN ID(n) AS source, ID(m) AS target, TYPE(r) AS type, properties(r) AS properties\\n 'WARNING:neo4j.notifications:Received notification from DBMS server: {severity: WARNING} {code: Neo.ClientNotification.Statement.FeatureDeprecationWarning} {category: DEPRECATION} {title: This feature is deprecated and will be removed in future versions.} {description: The query used a deprecated function: `id`.} {position: line: 3, column: 33, offset: 60} for query: '\\n MATCH (n)-[r]->(m)\\n RETURN ID(n) AS source, ID(m) AS target, TYPE(r) AS type, properties(r) AS properties\\n 'INFO:httpx:HTTP Request: POST https://api.openai.com/v1/embeddings \"HTTP/1.1 200 OK\"INFO:run_tasks(tasks: [Task], data):Coroutine task completed: `add_data_points`INFO:run_tasks(tasks: [Task], data):Pipeline run completed: `fd2ed59d-b550-5b05-bbe6-7b708fe12483`"
     ]
    },
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "<cognee.modules.pipelines.models.PipelineRun.PipelineRun object at 0x30016fd40>\n",
      "Done\n"
     ]
    }
   ],
   "execution_count": 14
  },
  {
   "metadata": {},
   "cell_type": "markdown",
   "source": [
    "## Search Pokémon Data\n",
    "### Execute a search query using Cognee.\n"
   ],
   "id": "e0d98d9832a2797a"
  },
  {
   "metadata": {
    "ExecuteTime": {
     "end_time": "2025-03-04T12:00:02.878871Z",
     "start_time": "2025-03-04T11:59:59.571965Z"
    }
   },
   "cell_type": "code",
   "source": [
    "from cognee.api.v1.search import SearchType\n",
    "\n",
    "search_results = await cognee.search(\n",
    "    query_type=SearchType.GRAPH_COMPLETION,\n",
    "    query_text=\"pokemons?\"\n",
    ")\n",
    "\n",
    "print(\"Search results:\")\n",
    "for result_text in search_results:\n",
    "    print(result_text)"
   ],
   "id": "bb2476b6b0c2aff",
   "outputs": [
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      "INFO:httpx:HTTP Request: POST https://api.openai.com/v1/embeddings \"HTTP/1.1 200 OK\"INFO:httpx:HTTP Request: POST https://api.openai.com/v1/embeddings \"HTTP/1.1 200 OK\"INFO:httpx:HTTP Request: POST https://api.openai.com/v1/embeddings \"HTTP/1.1 200 OK\"INFO:httpx:HTTP Request: POST https://api.openai.com/v1/embeddings \"HTTP/1.1 200 OK\"INFO:httpx:HTTP Request: POST https://api.openai.com/v1/embeddings \"HTTP/1.1 200 OK\"INFO:httpx:HTTP Request: POST https://api.openai.com/v1/embeddings \"HTTP/1.1 200 OK\"INFO:httpx:HTTP Request: POST https://api.openai.com/v1/embeddings \"HTTP/1.1 200 OK\"INFO:httpx:HTTP Request: POST https://api.openai.com/v1/embeddings \"HTTP/1.1 200 OK\"INFO:httpx:HTTP Request: POST https://api.openai.com/v1/embeddings \"HTTP/1.1 200 OK\"INFO:httpx:HTTP Request: POST https://api.openai.com/v1/embeddings \"HTTP/1.1 200 OK\"INFO:httpx:HTTP Request: POST https://api.openai.com/v1/embeddings \"HTTP/1.1 200 OK\"INFO:httpx:HTTP Request: POST https://api.openai.com/v1/embeddings \"HTTP/1.1 200 OK\"INFO:httpx:HTTP Request: POST https://api.openai.com/v1/embeddings \"HTTP/1.1 200 OK\"INFO:httpx:HTTP Request: POST https://api.openai.com/v1/embeddings \"HTTP/1.1 200 OK\"INFO:httpx:HTTP Request: POST https://api.openai.com/v1/embeddings \"HTTP/1.1 200 OK\"INFO:httpx:HTTP Request: POST https://api.openai.com/v1/embeddings \"HTTP/1.1 200 OK\"INFO:httpx:HTTP Request: POST https://api.openai.com/v1/embeddings \"HTTP/1.1 200 OK\"INFO:httpx:HTTP Request: POST https://api.openai.com/v1/embeddings \"HTTP/1.1 200 OK\"INFO:httpx:HTTP Request: POST https://api.openai.com/v1/embeddings \"HTTP/1.1 200 OK\"INFO:httpx:HTTP Request: POST https://api.openai.com/v1/embeddings \"HTTP/1.1 200 OK\"INFO:httpx:HTTP Request: POST https://api.openai.com/v1/embeddings \"HTTP/1.1 200 OK\"WARNING:neo4j.notifications:Received notification from DBMS server: {severity: WARNING} {code: Neo.ClientNotification.Statement.FeatureDeprecationWarning} {category: DEPRECATION} {title: This feature is deprecated and will be removed in future versions.} {description: The query used a deprecated function: `id`.} {position: line: 1, column: 18, offset: 17} for query: 'MATCH (n) RETURN ID(n) AS id, labels(n) AS labels, properties(n) AS properties'WARNING:neo4j.notifications:Received notification from DBMS server: {severity: WARNING} {code: Neo.ClientNotification.Statement.FeatureDeprecationWarning} {category: DEPRECATION} {title: This feature is deprecated and will be removed in future versions.} {description: The query used a deprecated function: `id`.} {position: line: 3, column: 16, offset: 43} for query: '\\n MATCH (n)-[r]->(m)\\n RETURN ID(n) AS source, ID(m) AS target, TYPE(r) AS type, properties(r) AS properties\\n 'WARNING:neo4j.notifications:Received notification from DBMS server: {severity: WARNING} {code: Neo.ClientNotification.Statement.FeatureDeprecationWarning} {category: DEPRECATION} {title: This feature is deprecated and will be removed in future versions.} {description: The query used a deprecated function: `id`.} {position: line: 3, column: 33, offset: 60} for query: '\\n MATCH (n)-[r]->(m)\\n RETURN ID(n) AS source, ID(m) AS target, TYPE(r) AS type, properties(r) AS properties\\n 'INFO:httpx:HTTP Request: POST https://api.openai.com/v1/embeddings \"HTTP/1.1 200 OK\"INFO:httpx:HTTP Request: POST https://api.openai.com/v1/embeddings \"HTTP/1.1 200 OK\"\u001B[92m13:00:02 - LiteLLM:INFO\u001B[0m: utils.py:2784 - \n",
      "LiteLLM completion() model= gpt-4o-mini; provider = openaiINFO:LiteLLM:\n",
      "LiteLLM completion() model= gpt-4o-mini; provider = openai"
     ]
    },
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Search results:\n",
      "The Pokemons mentioned are: golbat, jigglypuff, raichu, vulpix, and pikachu.\n"
     ]
    }
   ],
   "execution_count": 15
  },
  {
   "metadata": {},
   "cell_type": "code",
   "outputs": [],
   "execution_count": null,
   "source": "",
   "id": "a4c2d3e9c15b017"
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "version": "3.8.5"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 5
}

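The notebook's `load_pokemon_data` cell derives each graph node's id with `uuid5(NAMESPACE_OID, str(external_id))`, so re-running the pipeline maps the same source record to the same node instead of creating duplicates. A minimal stdlib-only sketch of that idea (the `record` dict and `stable_id` helper are illustrative, not part of cognee's API):

```python
from uuid import NAMESPACE_OID, uuid5

def stable_id(external_id: int) -> str:
    # uuid5 is deterministic: the same namespace + name always yields
    # the same UUID, so repeated runs upsert rather than duplicate nodes.
    return str(uuid5(NAMESPACE_OID, str(external_id)))

record = {"id": 25}  # illustrative source record
record["external_id"] = record["id"]   # keep the original id around
record["id"] = stable_id(record["external_id"])  # replace with a stable UUID

assert record["id"] == stable_id(25)  # identical on every run
print(record["id"])
```

Using a random `uuid4` here would break idempotency: each run would insert a fresh node for the same Pokémon.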
@@ -1,6 +1,6 @@
 [tool.poetry]
 name = "cognee"
-version = "0.1.32"
+version = "0.1.33"
 description = "Cognee - is a library for enriching LLM context with a semantic layer for better understanding and reasoning."
 authors = ["Vasilije Markovic", "Boris Arzentar"]
 readme = "README.md"
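Aside from the version bump above, the interesting pattern in the notebook diff is the ability join: `load_pokemon_data` scans the whole `pokemon_abilities` list once per Pokémon to match rows on `_dlt_parent_id`. For larger dlt exports, grouping child rows by parent id first turns that O(n·m) scan into a dict lookup. A hedged, stdlib-only sketch (field names follow the notebook; the sample rows are invented):

```python
from collections import defaultdict

def group_children(children):
    # Index dlt child rows by their parent id once, instead of scanning
    # the full child list for every parent row.
    by_parent = defaultdict(list)
    for child in children:
        by_parent[child["_dlt_parent_id"]].append(child)
    return by_parent

abilities = [
    {"_dlt_parent_id": "p1", "name": "static"},
    {"_dlt_parent_id": "p1", "name": "lightning-rod"},
    {"_dlt_parent_id": "p2", "name": "blaze"},
]
grouped = group_children(abilities)
print([a["name"] for a in grouped["p1"]])  # → ['static', 'lightning-rod']
```

Inside the notebook's loop this would replace the list comprehension with `grouped[pokemon_data["_dlt_id"]]`.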