Compare commits
24 commits
main
...
feature/co
| Author | SHA1 | Date | |
|---|---|---|---|
|
|
c9590ef760 | ||
|
|
b4b55b820d | ||
|
|
d7d626698d | ||
|
|
aecdff0503 | ||
|
|
4e373cfee7 | ||
|
|
f1e254f357 | ||
|
|
8f5d5b9ac2 | ||
|
|
ce14a441af | ||
|
|
f825732eb2 | ||
|
|
ecdf624bda | ||
|
|
1267f6c1e7 | ||
|
|
d8fde4c527 | ||
|
|
96d1dd772c | ||
|
|
cc52df94b7 | ||
|
|
ad9abb8b76 | ||
|
|
5d4f82fdd4 | ||
|
|
8aae9f8dd8 | ||
|
|
cd813c5732 | ||
|
|
7456567597 | ||
|
|
b29ab72c50 | ||
|
|
5cbdbf3abf | ||
|
|
cc4fab9e75 | ||
|
|
0c1e515c8f | ||
|
|
fe83a25576 |
37 changed files with 2238 additions and 28 deletions
|
|
@ -1,7 +1,9 @@
|
||||||
from typing import Union
|
from typing import Union, Optional, Type, List
|
||||||
|
|
||||||
from cognee.modules.users.models import User
|
from cognee.infrastructure.engine.models.DataPoint import DataPoint
|
||||||
from cognee.modules.search.types import SearchType
|
from cognee.modules.search.types import SearchType
|
||||||
|
from cognee.modules.users.exceptions import UserNotFoundError
|
||||||
|
from cognee.modules.users.models import User
|
||||||
from cognee.modules.users.methods import get_default_user
|
from cognee.modules.users.methods import get_default_user
|
||||||
from cognee.modules.search.methods import search as search_function
|
from cognee.modules.search.methods import search as search_function
|
||||||
|
|
||||||
|
|
@ -13,6 +15,8 @@ async def search(
|
||||||
datasets: Union[list[str], str, None] = None,
|
datasets: Union[list[str], str, None] = None,
|
||||||
system_prompt_path: str = "answer_simple_question.txt",
|
system_prompt_path: str = "answer_simple_question.txt",
|
||||||
top_k: int = 10,
|
top_k: int = 10,
|
||||||
|
node_type: Optional[Type] = None,
|
||||||
|
node_name: List[Optional[str]] = None,
|
||||||
) -> list:
|
) -> list:
|
||||||
# We use lists from now on for datasets
|
# We use lists from now on for datasets
|
||||||
if isinstance(datasets, str):
|
if isinstance(datasets, str):
|
||||||
|
|
@ -28,6 +32,8 @@ async def search(
|
||||||
user,
|
user,
|
||||||
system_prompt_path=system_prompt_path,
|
system_prompt_path=system_prompt_path,
|
||||||
top_k=top_k,
|
top_k=top_k,
|
||||||
|
node_type=node_type,
|
||||||
|
node_name=node_name,
|
||||||
)
|
)
|
||||||
|
|
||||||
return filtered_search_results
|
return filtered_search_results
|
||||||
|
|
|
||||||
0
cognee/complex_demos/__init__.py
Normal file
0
cognee/complex_demos/__init__.py
Normal file
42
cognee/complex_demos/crewai_demo/README
Normal file
42
cognee/complex_demos/crewai_demo/README
Normal file
|
|
@ -0,0 +1,42 @@
|
||||||
|
README:
|
||||||
|
|
||||||
|
This is a demo project to showcase and test how cognee and CrewAI can work together:
|
||||||
|
|
||||||
|
Short description:
|
||||||
|
|
||||||
|
We simulate the hiring process for a technical role. These are the steps of the pipeline:
|
||||||
|
|
||||||
|
1. First we ingest github data including:
|
||||||
|
-commits, comments and other soft skill related information for each of the candidates.
|
||||||
|
-source code and other technical skill related information for each of the candidates.
|
||||||
|
|
||||||
|
2. We hire 3 agents to make the decision using cognee's memory engine
|
||||||
|
|
||||||
|
1 - HR Expert Agent focusing on soft skills:
|
||||||
|
- Analyzes the communication skills, clarity, engagement and kindness based on the commits, comments and github communication of the candidates.
|
||||||
|
-To analyze the soft skills of the candidates, the agent performs multiple searches using cognee.search
|
||||||
|
-The subgraph that the agent can use is limited to the "soft" nodeset subgraph
|
||||||
|
- Scores each candidate from 0 to 1 and gives reasoning
|
||||||
|
|
||||||
|
2 - Technical Expert Agent focusing on technical skills:
|
||||||
|
- Analyzes strictly code related and technical skills based on github commits and pull requests of the candidates.
|
||||||
|
- To analyze the technical skills of the candidates, the agent performs multiple searches using cognee.search
|
||||||
|
- The subgraph that the agent can use is limited to the "technical" nodeset subgraph
|
||||||
|
- Scores each candidate from 0 to 1 and gives reasoning
|
||||||
|
|
||||||
|
3 - CEO/CTO agent who makes the final decision:
|
||||||
|
- Given the output of the HR expert and Technical expert agents, the decision maker agent makes the final decision about the hiring procedure.
|
||||||
|
- The agent will choose the best candidate to hire, and will give reasoning for each of the candidates (why hire/no_hire).
|
||||||
|
|
||||||
|
|
||||||
|
The following tools were implemented:
|
||||||
|
- Cognee build: cognifies the added data (Preliminary task, therefore it is not performed by agents.)
|
||||||
|
- Cognee search: searches the cognee memory, limiting the subgraph using the nodeset subgraph retriever (Used by many agents)
|
||||||
|
- In the case of technical and soft skills agents the tool gets instantiated with the restricted nodeset search capability
|
||||||
|
|
||||||
|
|
||||||
|
The three agents are working together to simulate a hiring process, evaluating soft and technical skills, while the CEO/CTO agent
|
||||||
|
makes the final decision (HIRE/NOHIRE) based on the outputs of the evaluation agents.
|
||||||
|
|
||||||
|
|
||||||
|
Works from IDE and not from CLI for now.
|
||||||
0
cognee/complex_demos/crewai_demo/__init__.py
Normal file
0
cognee/complex_demos/crewai_demo/__init__.py
Normal file
|
|
@ -0,0 +1,4 @@
|
||||||
|
User name is John Doe.
|
||||||
|
User is an AI Engineer.
|
||||||
|
User is interested in AI Agents.
|
||||||
|
User is based in San Francisco, California.
|
||||||
19
cognee/complex_demos/crewai_demo/pyproject.toml
Normal file
19
cognee/complex_demos/crewai_demo/pyproject.toml
Normal file
|
|
@ -0,0 +1,19 @@
|
||||||
|
[project]
|
||||||
|
name = "crewai_demo"
|
||||||
|
version = "0.1.0"
|
||||||
|
description = "Cognee crewAI demo"
|
||||||
|
authors = [{ name = "Laszlo Hajdu", email = "laszlo@topoteretes.com" }]
|
||||||
|
requires-python = ">=3.10,<3.13"
|
||||||
|
dependencies = [
|
||||||
|
"crewai[tools]>=0.114.0,<1.0.0"
|
||||||
|
]
|
||||||
|
|
||||||
|
[project.scripts]
|
||||||
|
run_crew = "association_layer_demo.main:run"
|
||||||
|
|
||||||
|
[build-system]
|
||||||
|
requires = ["hatchling"]
|
||||||
|
build-backend = "hatchling.build"
|
||||||
|
|
||||||
|
[tool.crewai]
|
||||||
|
type = "crew"
|
||||||
0
cognee/complex_demos/crewai_demo/src/__init__.py
Normal file
0
cognee/complex_demos/crewai_demo/src/__init__.py
Normal file
|
|
@ -0,0 +1,5 @@
|
||||||
|
from .github_dev_profile import GitHubDevProfile
|
||||||
|
from .github_dev_comments import GitHubDevComments
|
||||||
|
from .github_dev_commits import GitHubDevCommits
|
||||||
|
|
||||||
|
__all__ = ["GitHubDevProfile", "GitHubDevComments", "GitHubDevCommits"]
|
||||||
|
|
@ -0,0 +1,32 @@
|
||||||
|
soft_skills_expert_agent:
|
||||||
|
role: >
|
||||||
|
Focused on communication, collaboration, and documentation excellence.
|
||||||
|
goal: >
|
||||||
|
Evaluate README clarity, issue discussions, and community engagement to score
|
||||||
|
communication clarity and open-source culture participation.
|
||||||
|
backstory: >
|
||||||
|
You are an active OSS community manager who values clear writing, inclusive
|
||||||
|
discussion, and strong documentation. You look for evidence of empathy,
|
||||||
|
responsiveness, and collaborative spirit.
|
||||||
|
|
||||||
|
technical_expert_agent:
|
||||||
|
role: >
|
||||||
|
Specialized in evaluating technical skills and code quality.
|
||||||
|
goal: >
|
||||||
|
Analyze repository metadata and commit histories to score coding diversity,
|
||||||
|
depth of contributions, and commit quality.
|
||||||
|
backstory: >
|
||||||
|
You are a seasoned software architect and open-source maintainer. You deeply
|
||||||
|
understand python code structure, language ecosystems, and best practices.
|
||||||
|
Your mission is to objectively rate each candidate’s technical excellence.
|
||||||
|
|
||||||
|
decision_maker_agent:
|
||||||
|
role: >
|
||||||
|
CTO/CEO-level decision maker who integrates expert feedback.
|
||||||
|
goal: >
|
||||||
|
Read the technical and soft-skills evaluations and decide whether to hire
|
||||||
|
each candidate, justifying the decision.
|
||||||
|
backstory: >
|
||||||
|
You are the company’s CTO. You balance technical requirements, team culture,
|
||||||
|
and long-term vision. You weigh both skill scores and communication ratings
|
||||||
|
to make a final hire/no-hire call.
|
||||||
|
|
@ -0,0 +1,153 @@
|
||||||
|
soft_skills_assessment_applicant1_task:
|
||||||
|
description: >
|
||||||
|
Search cognee for comments authored by '{applicant_1}'.
|
||||||
|
Use the "search_from_cognee" tool to collect information.
|
||||||
|
Evaluate their communication clarity, community engagement, and kindness.
|
||||||
|
Ask multiple questions if needed to uncover diverse interactions.
|
||||||
|
Return a complete and reasoned assessment of their soft skills.
|
||||||
|
|
||||||
|
--- Example Output ---
|
||||||
|
Input:
|
||||||
|
applicant_1: Sarah Jennings
|
||||||
|
|
||||||
|
Output:
|
||||||
|
- Name: Sarah Jennings
|
||||||
|
- communication_clarity: 0.92
|
||||||
|
- community_engagement: 0.88
|
||||||
|
- kindness: 0.95
|
||||||
|
- reasoning: >
|
||||||
|
Sarah consistently communicates with clarity and structure. In several threads, her responses broke down complex issues into actionable steps,
|
||||||
|
showing strong explanatory skills. She uses inclusive language like “let’s”, “we should”, and frequently thanks others for their input,
|
||||||
|
which indicates a high degree of kindness. Sarah also initiates or joins collaborative threads, offering feedback or connecting people with
|
||||||
|
relevant documentation. Her tone is encouraging and non-defensive, even when correcting misunderstandings. These patterns were observed across
|
||||||
|
over 8 threads involving different team members over a 3-week span.
|
||||||
|
|
||||||
|
expected_output: >
|
||||||
|
- Name: {applicant_1}
|
||||||
|
- communication_clarity (0–1)
|
||||||
|
- community_engagement (0–1)
|
||||||
|
- kindness (0–1)
|
||||||
|
- reasoning: (string)
|
||||||
|
agent: soft_skills_expert_agent
|
||||||
|
|
||||||
|
soft_skills_assessment_applicant2_task:
|
||||||
|
description: >
|
||||||
|
Search cognee for comments authored by '{applicant_2}'.
|
||||||
|
Use the "search_from_cognee" tool to collect information.
|
||||||
|
Evaluate their communication clarity, community engagement, and kindness.
|
||||||
|
Ask multiple questions if needed to uncover diverse interactions.
|
||||||
|
Return a complete and reasoned assessment of their soft skills.
|
||||||
|
|
||||||
|
--- Example Output ---
|
||||||
|
Input:
|
||||||
|
applicant_2: Sarah Jennings
|
||||||
|
|
||||||
|
Output:
|
||||||
|
- Name: Sarah Jennings
|
||||||
|
- communication_clarity: 0.92
|
||||||
|
- community_engagement: 0.88
|
||||||
|
- kindness: 0.95
|
||||||
|
- reasoning: >
|
||||||
|
Sarah consistently communicates with clarity and structure. In several threads, her responses broke down complex issues into actionable steps,
|
||||||
|
showing strong explanatory skills. She uses inclusive language like “let’s”, “we should”, and frequently thanks others for their input,
|
||||||
|
which indicates a high degree of kindness. Sarah also initiates or joins collaborative threads, offering feedback or connecting people with
|
||||||
|
relevant documentation. Her tone is encouraging and non-defensive, even when correcting misunderstandings. These patterns were observed across
|
||||||
|
over 8 threads involving different team members over a 3-week span.
|
||||||
|
|
||||||
|
expected_output: >
|
||||||
|
- Name: {applicant_2}
|
||||||
|
- communication_clarity (0–1)
|
||||||
|
- community_engagement (0–1)
|
||||||
|
- kindness (0–1)
|
||||||
|
- reasoning: (string)
|
||||||
|
agent: soft_skills_expert_agent
|
||||||
|
|
||||||
|
technical_assessment_applicant1_task:
|
||||||
|
description: >
|
||||||
|
Analyze the repository metadata and commit history associated with '{applicant_1}'.
|
||||||
|
Use the "search_from_cognee" tool to collect information.
|
||||||
|
Score their code_diversity, depth_of_contribution, and commit_quality.
|
||||||
|
Base your assessment strictly on technical input—ignore soft skills.
|
||||||
|
|
||||||
|
--- Example Output ---
|
||||||
|
Input:
|
||||||
|
applicant_1: Daniel Murphy
|
||||||
|
|
||||||
|
Output:
|
||||||
|
- Name: Daniel Murphy
|
||||||
|
- code_diversity: 0.87
|
||||||
|
- depth_of_contribution: 0.91
|
||||||
|
- commit_quality: 0.83
|
||||||
|
- reasoning: >
|
||||||
|
Daniel contributed to multiple areas of the codebase including frontend UI components, backend API endpoints, test coverage,
|
||||||
|
and CI/CD configuration. His commit history spans over 6 weeks with consistent activity and includes thoughtful messages
|
||||||
|
(e.g., “refactor auth flow to support multi-tenant login” or “add unit tests for pricing logic edge cases”).
|
||||||
|
His pull requests often include both implementation and tests, showing technical completeness.
|
||||||
|
Several commits show iterative problem-solving and cleanup after peer feedback, indicating thoughtful collaboration
|
||||||
|
and improvement over time.
|
||||||
|
expected_output: >
|
||||||
|
- Name: {applicant_1}
|
||||||
|
- code_diversity (0–1)
|
||||||
|
- depth_of_contribution (0–1)
|
||||||
|
- commit_quality (0–1)
|
||||||
|
- reasoning: (string)
|
||||||
|
agent: technical_expert_agent
|
||||||
|
|
||||||
|
technical_assessment_applicant2_task:
|
||||||
|
description: >
|
||||||
|
Analyze the repository metadata and commit history associated with '{applicant_2}'.
|
||||||
|
Use the "search_from_cognee" tool to collect information.
|
||||||
|
Score their code_diversity, depth_of_contribution, and commit_quality.
|
||||||
|
Base your assessment strictly on technical input—ignore soft skills.
|
||||||
|
|
||||||
|
--- Example Output ---
|
||||||
|
Input:
|
||||||
|
applicant_2: Daniel Murphy
|
||||||
|
|
||||||
|
Output:
|
||||||
|
- Name: Daniel Murphy
|
||||||
|
- code_diversity: 0.87
|
||||||
|
- depth_of_contribution: 0.91
|
||||||
|
- commit_quality: 0.83
|
||||||
|
- reasoning: >
|
||||||
|
Daniel contributed to multiple areas of the codebase including frontend UI components, backend API endpoints, test coverage,
|
||||||
|
and CI/CD configuration. His commit history spans over 6 weeks with consistent activity and includes thoughtful messages
|
||||||
|
(e.g., “refactor auth flow to support multi-tenant login” or “add unit tests for pricing logic edge cases”).
|
||||||
|
His pull requests often include both implementation and tests, showing technical completeness.
|
||||||
|
Several commits show iterative problem-solving and cleanup after peer feedback, indicating thoughtful collaboration
|
||||||
|
and improvement over time.
|
||||||
|
|
||||||
|
expected_output: >
|
||||||
|
- Name: {applicant_2}
|
||||||
|
- code_diversity (0–1)
|
||||||
|
- depth_of_contribution (0–1)
|
||||||
|
- commit_quality (0–1)
|
||||||
|
- reasoning: (string)
|
||||||
|
agent: technical_expert_agent
|
||||||
|
|
||||||
|
hiring_decision_task:
|
||||||
|
description: >
|
||||||
|
Review the technical and soft skill assessment task outputs for candidates: -{applicant_1} and -{applicant_2},
|
||||||
|
then decide HIRE or NO_HIRE for each candidate with a detailed reasoning.
|
||||||
|
The people to evaluate are:
|
||||||
|
-{applicant_1}
|
||||||
|
-{applicant_2}
|
||||||
|
We have to hire one of them.
|
||||||
|
|
||||||
|
Prepare the final output for the ingest_hiring_decision_task.
|
||||||
|
|
||||||
|
|
||||||
|
expected_output: >
|
||||||
|
A string strictly containing the following for each person:
|
||||||
|
- Person
|
||||||
|
- decision: "HIRE" or "NO_HIRE",
|
||||||
|
- reasoning: (string)
|
||||||
|
agent: decision_maker_agent
|
||||||
|
|
||||||
|
ingest_hiring_decision_task:
|
||||||
|
description: >
|
||||||
|
Take the final hiring decision from the hiring_decision_task report and ingest it into Cognee using the "ingest_report_to_cognee" tool.
|
||||||
|
Do not re-evaluate—just save the result using the tool you have.
|
||||||
|
expected_output: >
|
||||||
|
- confirmation: string message confirming successful ingestion into Cognee
|
||||||
|
agent: decision_maker_agent
|
||||||
|
|
@ -0,0 +1,36 @@
|
||||||
|
from crewai.tools import BaseTool
|
||||||
|
|
||||||
|
|
||||||
|
class CogneeBuild(BaseTool):
    """CrewAI tool that resets cognee state and builds a knowledge graph.

    ``inputs`` is assumed to be a mapping whose values are dicts with keys
    ``file_content`` (text to ingest) and ``nodeset`` (node-set label) —
    inferred from the access pattern below; confirm against callers.
    """

    name: str = "Cognee Build"
    description: str = "Creates a memory and builds a knowledge graph using cognee."

    def _run(self, inputs) -> str:
        """Prune existing cognee data, ingest every entry of *inputs*, then cognify.

        Returns a status string; failures are reported as strings rather than
        raised, matching the tool's best-effort contract.
        """
        import cognee
        import asyncio

        async def main():
            try:
                await cognee.prune.prune_data()
                await cognee.prune.prune_system(metadata=True)

                for meta in inputs.values():
                    text = meta["file_content"]
                    node_set = meta["nodeset"]
                    await cognee.add(text, node_set=node_set)

                await cognee.cognify()

                return "Knowledge Graph is done."
            except Exception as e:
                return f"Error: {str(e)}"

        try:
            try:
                asyncio.get_running_loop()
            except RuntimeError:
                # No loop in this thread: asyncio.run creates and closes one.
                # (The original used deprecated get_event_loop() and leaked a
                # manually created loop.)
                return asyncio.run(main())

            # A loop is already running in this thread, so run_until_complete
            # would raise "Cannot run the event loop while another loop is
            # running". Execute the coroutine on a worker thread instead.
            from concurrent.futures import ThreadPoolExecutor

            with ThreadPoolExecutor(max_workers=1) as pool:
                return pool.submit(asyncio.run, main()).result()
        except Exception as e:
            return f"Tool execution error: {str(e)}"
|
||||||
|
|
@ -0,0 +1,47 @@
|
||||||
|
from crewai.tools import BaseTool
|
||||||
|
from typing import Type, List
|
||||||
|
from pydantic import BaseModel, Field, PrivateAttr
|
||||||
|
from cognee.modules.engine.models import NodeSet
|
||||||
|
import asyncio
|
||||||
|
|
||||||
|
|
||||||
|
class CogneeIngestionInput(BaseModel):
    """Schema for the ingestion tool's single ``text`` argument."""

    text: str = Field(
        "",
        description="The text of the report The format you should follow is {'text': 'your report'}",
    )


class CogneeIngestion(BaseTool):
    """CrewAI tool that stores the final hiring report in cognee memory."""

    name: str = "ingest_report_to_cognee"
    description: str = "This tool can be used to ingest the final hiring report into cognee"
    args_schema: Type[BaseModel] = CogneeIngestionInput
    # Declared via PrivateAttr() for pydantic-v2 private state, consistent
    # with CogneeSearch (a bare underscore annotation is not a model field).
    _nodeset_name: str = PrivateAttr()

    def __init__(self, nodeset_name: str, **kwargs):
        """Bind the node-set label this tool ingests under."""
        super().__init__(**kwargs)
        self._nodeset_name = nodeset_name

    def _run(self, text: str) -> str:
        """Ingest *text* into cognee under a random 6-character dataset name.

        Returns a confirmation string, or an error string on failure.
        """
        import cognee
        from secrets import choice
        from string import ascii_letters, digits

        async def main():
            try:
                # Random dataset name so repeated ingestions do not collide.
                hash6 = "".join(choice(ascii_letters + digits) for _ in range(6))
                await cognee.add(text, node_set=[self._nodeset_name], dataset_name=hash6)
                await cognee.cognify(datasets=hash6)

                return "Report ingested successfully into Cognee memory."
            except Exception as e:
                return f"Error during ingestion: {str(e)}"

        try:
            try:
                asyncio.get_running_loop()
            except RuntimeError:
                # No running loop: let asyncio manage (and close) one for us.
                # (The original used deprecated get_event_loop() and leaked a
                # manually created loop.)
                return asyncio.run(main())

            # run_until_complete cannot be nested inside a running loop; run
            # the coroutine on a worker thread so async callers also work.
            from concurrent.futures import ThreadPoolExecutor

            with ThreadPoolExecutor(max_workers=1) as pool:
                return pool.submit(asyncio.run, main()).result()
        except Exception as e:
            return f"Tool execution error: {str(e)}"
|
||||||
|
|
@ -0,0 +1,56 @@
|
||||||
|
from crewai.tools import BaseTool
|
||||||
|
from typing import Type
|
||||||
|
from pydantic import BaseModel, Field, PrivateAttr
|
||||||
|
|
||||||
|
from cognee.modules.engine.models import NodeSet
|
||||||
|
|
||||||
|
|
||||||
|
class CogneeSearchInput(BaseModel):
    """Schema for the search tool's single ``query`` argument."""

    query: str = Field(
        "",
        description="The natural language question to ask the memory engine."
        "The format you should follow is {'query': 'your query'}",
    )


class CogneeSearch(BaseTool):
    """CrewAI tool that answers a query from the cognee graph, scoped to one node set."""

    name: str = "search_from_cognee"
    description: str = (
        "Use this tool to search the Cognee memory graph. "
        "Provide a natural language query that describes the information you want to retrieve, "
        "such as comments authored or files changed by a specific person."
    )
    args_schema: Type[BaseModel] = CogneeSearchInput
    _nodeset_name: str = PrivateAttr()

    def __init__(self, nodeset_name: str, **kwargs):
        """Bind the node-set label this tool is restricted to."""
        super().__init__(**kwargs)
        self._nodeset_name = nodeset_name

    def _run(self, query: str) -> str:
        """Run *query* through the graph-completion retriever, limited to the tool's node set.

        Returns the retrieved context, or an error string on failure.
        """
        import asyncio
        from cognee.modules.retrieval.graph_completion_retriever import GraphCompletionRetriever

        async def main():
            try:
                # Restrict retrieval to the configured NodeSet subgraph.
                search_results = await GraphCompletionRetriever(
                    top_k=5,
                    node_type=NodeSet,
                    node_name=[self._nodeset_name],
                ).get_context(query=query)

                return search_results
            except Exception as e:
                return f"Error: {str(e)}"

        try:
            try:
                asyncio.get_running_loop()
            except RuntimeError:
                # No running loop: asyncio.run creates and closes one. (The
                # original used deprecated get_event_loop() and leaked a
                # manually created loop; it also print()-ed the query as a
                # debug leftover, removed here.)
                return asyncio.run(main())

            # run_until_complete cannot be nested inside a running loop; run
            # the coroutine on a worker thread instead.
            from concurrent.futures import ThreadPoolExecutor

            with ThreadPoolExecutor(max_workers=1) as pool:
                return pool.submit(asyncio.run, main()).result()
        except Exception as e:
            return f"Tool execution error: {str(e)}"
|
||||||
|
|
@ -0,0 +1,40 @@
|
||||||
|
from crewai.tools import BaseTool
|
||||||
|
|
||||||
|
from cognee.modules.engine.models import NodeSet
|
||||||
|
from cognee.modules.retrieval.graph_completion_retriever import GraphCompletionRetriever
|
||||||
|
from ..github_ingest_datapoints import cognify_github_data_from_username
|
||||||
|
|
||||||
|
|
||||||
|
class GithubIngestion(BaseTool):
    """CrewAI tool that ingests two applicants' GitHub activity into cognee."""

    name: str = "Github graph builder"
    description: str = "Ingests the github graph of a person into Cognee"

    def _run(self, applicant_1, applicant_2) -> str:
        """Prune cognee, then cognify both applicants' GitHub data.

        Returns a status string; failures are reported as strings rather than
        raised, matching the tool's best-effort contract.
        """
        import asyncio
        import cognee
        import os
        # NOTE: the original also imported DataPoint here but never used it.
        from cognee.low_level import setup as cognee_setup

        async def main():
            try:
                await cognee.prune.prune_data()
                await cognee.prune.prune_system(metadata=True)
                await cognee_setup()
                # GITHUB_TOKEN is read from the environment; None is passed
                # through to the ingestion helper when unset.
                token = os.getenv("GITHUB_TOKEN")

                await cognify_github_data_from_username(applicant_1, token)
                await cognify_github_data_from_username(applicant_2, token)

                return "Github ingestion finished"
            except Exception as e:
                return f"Error: {str(e)}"

        try:
            try:
                asyncio.get_running_loop()
            except RuntimeError:
                # No loop running in this thread: asyncio.run creates and
                # closes one. (The original used deprecated get_event_loop()
                # and leaked a manually created loop.)
                return asyncio.run(main())

            # run_until_complete cannot be nested inside a running loop; run
            # the coroutine on a worker thread instead.
            from concurrent.futures import ThreadPoolExecutor

            with ThreadPoolExecutor(max_workers=1) as pool:
                return pool.submit(asyncio.run, main()).result()
        except Exception as e:
            return f"Tool execution error: {str(e)}"
|
||||||
|
|
@ -0,0 +1,55 @@
|
||||||
|
from abc import ABC, abstractmethod
|
||||||
|
import requests
|
||||||
|
from cognee.shared.logging_utils import get_logger
|
||||||
|
|
||||||
|
GITHUB_API_URL = "https://api.github.com/graphql"
|
||||||
|
|
||||||
|
logger = get_logger("github_comments")
|
||||||
|
|
||||||
|
|
||||||
|
class GitHubCommentBase(ABC):
    """Template-method base for fetching a user's GitHub comments via GraphQL.

    Subclasses supply the query text, the extraction of raw nodes from the
    response, the per-item formatting, and a label used in error logs.
    """

    def __init__(self, token, username, limit=10):
        # token: GitHub API bearer token; username: login to query;
        # limit: cap on the number of comments returned by get_comments().
        self.token = token
        self.username = username
        self.limit = limit

    def _run_query(self, query: str) -> dict:
        """POST *query* to GitHub's GraphQL endpoint and return the ``data`` payload."""
        headers = {"Authorization": f"Bearer {self.token}"}
        response = requests.post(GITHUB_API_URL, json={"query": query}, headers=headers)
        if response.status_code != 200:
            raise Exception(f"Query failed: {response.status_code} - {response.text}")
        return response.json()["data"]

    def get_comments(self):
        """Build the query, run it, and return up to ``limit`` formatted comments.

        Any failure is logged and turned into an empty list (best-effort).
        """
        try:
            data = self._run_query(self._build_query())
            raw_comments = self._extract_comments(data)
            formatted = []
            for item in raw_comments[: self.limit]:
                formatted.append(self._format_comment(item))
            return formatted
        except Exception as e:
            logger.error(f"Error fetching {self._get_comment_type()} comments: {e}")
            return []

    @abstractmethod
    def _build_query(self) -> str:
        """Return the GraphQL query string for this provider."""
        pass

    @abstractmethod
    def _extract_comments(self, data) -> list:
        """Return the raw comment nodes from the GraphQL response payload."""
        pass

    @abstractmethod
    def _format_comment(self, item) -> dict:
        """Return a single raw comment normalized to the flat dict shape."""
        pass

    @abstractmethod
    def _get_comment_type(self) -> str:
        """Return the human-readable comment type used in error messages."""
        pass
|
||||||
|
|
@ -0,0 +1,298 @@
|
||||||
|
from datetime import datetime, timedelta
|
||||||
|
from cognee.complex_demos.crewai_demo.src.crewai_demo.github_comment_base import (
|
||||||
|
GitHubCommentBase,
|
||||||
|
logger,
|
||||||
|
)
|
||||||
|
|
||||||
|
|
||||||
|
class IssueCommentsProvider(GitHubCommentBase):
    """Provider for GitHub issue comments."""

    QUERY_TEMPLATE = """
    {{
      user(login: "{username}") {{
        issueComments(first: {limit}, orderBy: {{field: UPDATED_AT, direction: DESC}}) {{
          nodes {{
            body
            createdAt
            updatedAt
            url
            issue {{
              number
              title
              url
              repository {{
                nameWithOwner
              }}
              state
            }}
          }}
        }}
      }}
    }}
    """

    def _build_query(self) -> str:
        """Fill the issue-comment query with this provider's user and limit."""
        return self.QUERY_TEMPLATE.format(username=self.username, limit=self.limit)

    def _extract_comments(self, data) -> list:
        """Pull the raw issue-comment nodes out of the GraphQL payload."""
        return data["user"]["issueComments"]["nodes"]

    def _format_comment(self, comment) -> dict:
        """Normalize one GraphQL issue comment into the flat dict shape."""
        url = comment["url"]
        # The numeric comment id is the last path segment of the comment URL.
        comment_id = url.rsplit("/", 1)[-1] if url else None
        issue = comment["issue"]

        return {
            "repo": issue["repository"]["nameWithOwner"],
            "issue_number": issue["number"],
            "comment_id": comment_id,
            "body": comment["body"],
            "text": comment["body"],
            "created_at": comment["createdAt"],
            "updated_at": comment["updatedAt"],
            "html_url": comment["url"],
            "issue_url": issue["url"],
            "author_association": "COMMENTER",
            "issue_title": issue["title"],
            "issue_state": issue["state"],
            "login": self.username,
            "type": "issue_comment",
        }

    def _get_comment_type(self) -> str:
        """Label used by the base class's error log."""
        return "issue"
|
||||||
|
|
||||||
|
|
||||||
|
class PrReviewsProvider(GitHubCommentBase):
    """Provider for GitHub PR reviews."""

    QUERY_TEMPLATE = """
    {{
      user(login: "{username}") {{
        contributionsCollection {{
          pullRequestReviewContributions(first: {fetch_limit}) {{
            nodes {{
              pullRequestReview {{
                body
                createdAt
                updatedAt
                url
                state
                pullRequest {{
                  number
                  title
                  url
                  repository {{
                    nameWithOwner
                  }}
                  state
                }}
              }}
            }}
          }}
        }}
      }}
    }}
    """

    def __init__(self, token, username, limit=10, fetch_limit=None):
        """Initialize with token, username, and optional limits.

        ``fetch_limit`` controls how many review contributions are requested;
        it defaults to 10x ``limit`` because empty-bodied reviews are dropped
        during extraction.
        """
        super().__init__(token, username, limit)
        self.fetch_limit = 10 * limit if fetch_limit is None else fetch_limit

    def _build_query(self) -> str:
        """Fill the PR-review query with this provider's user and fetch limit."""
        return self.QUERY_TEMPLATE.format(username=self.username, fetch_limit=self.fetch_limit)

    def _extract_comments(self, data) -> list:
        """Return PR reviews with non-empty bodies from the GraphQL payload."""
        nodes = data["user"]["contributionsCollection"]["pullRequestReviewContributions"]["nodes"]
        reviews = []
        for node in nodes:
            review = node["pullRequestReview"]
            if review["body"]:
                reviews.append(review)
        return reviews

    def _format_comment(self, review) -> dict:
        """Normalize one GraphQL PR review into the flat dict shape."""
        url = review["url"]
        # The numeric review id is the last path segment of the review URL.
        review_id = url.rsplit("/", 1)[-1] if url else None
        pull_request = review["pullRequest"]

        return {
            "repo": pull_request["repository"]["nameWithOwner"],
            "issue_number": pull_request["number"],
            "comment_id": review_id,
            "body": review["body"],
            "text": review["body"],
            "created_at": review["createdAt"],
            "updated_at": review["updatedAt"],
            "html_url": review["url"],
            "issue_url": pull_request["url"],
            "author_association": "COMMENTER",
            "issue_title": pull_request["title"],
            "issue_state": pull_request["state"],
            "login": self.username,
            "review_state": review["state"],
            "type": "pr_review",
        }

    def _get_comment_type(self) -> str:
        """Label used by the base class's error log."""
        return "PR review"
|
||||||
|
|
||||||
|
|
||||||
|
class PrReviewCommentsProvider(GitHubCommentBase):
    """Provider for GitHub PR review comments (inline code comments).

    Uses a two-step GraphQL approach: first collect the PRs the user has
    reviewed, then fetch the user's inline comments for each of those PRs.
    """

    # GraphQL template: recent PRs the user has contributed reviews to.
    PR_CONTRIBUTIONS_TEMPLATE = """
    {{
      user(login: "{username}") {{
        contributionsCollection {{
          pullRequestReviewContributions(first: {fetch_limit}) {{
            nodes {{
              pullRequestReview {{
                pullRequest {{
                  number
                  title
                  url
                  repository {{
                    nameWithOwner
                  }}
                  state
                }}
              }}
            }}
          }}
        }}
      }}
    }}
    """

    # GraphQL template: the user's review comments on one specific PR.
    PR_COMMENTS_TEMPLATE = """
    {{
      repository(owner: "{owner}", name: "{repo}") {{
        pullRequest(number: {pr_number}) {{
          reviews(first: {reviews_limit}, author: "{username}") {{
            nodes {{
              comments(first: {comments_limit}) {{
                nodes {{
                  body
                  createdAt
                  updatedAt
                  url
                }}
              }}
            }}
          }}
        }}
      }}
    }}
    """

    def __init__(
        self,
        token,
        username,
        limit=10,
        fetch_limit=None,
        reviews_limit=None,
        comments_limit=None,
        pr_limit=None,
    ):
        """Initialize with token, username, and optional limits.

        The per-stage limits default to multiples of the base ``limit`` so
        that after deduplication/truncation enough results remain:
        fetch_limit=4x, reviews_limit=2x, comments_limit=3x, pr_limit=2x.
        """
        super().__init__(token, username, limit)
        self.fetch_limit = fetch_limit if fetch_limit is not None else 4 * limit
        self.reviews_limit = reviews_limit if reviews_limit is not None else 2 * limit
        self.comments_limit = comments_limit if comments_limit is not None else 3 * limit
        self.pr_limit = pr_limit if pr_limit is not None else 2 * limit

    def _build_query(self) -> str:
        """Builds the GraphQL query for PR contributions."""
        return self.PR_CONTRIBUTIONS_TEMPLATE.format(
            username=self.username, fetch_limit=self.fetch_limit
        )

    def _extract_comments(self, data) -> list:
        """Extracts PR review comments using a two-step approach."""
        prs = self._get_reviewed_prs(data)
        return self._fetch_comments_for_prs(prs)

    def _get_reviewed_prs(self, data) -> list:
        """Gets a deduplicated list of PRs the user has reviewed.

        A user may have several review contributions on the same PR, so PRs
        are deduplicated by URL before truncating to ``pr_limit``.
        """
        contributions = data["user"]["contributionsCollection"]["pullRequestReviewContributions"][
            "nodes"
        ]
        unique_prs = []

        for node in contributions:
            pr = node["pullRequestReview"]["pullRequest"]
            if not any(existing_pr["url"] == pr["url"] for existing_pr in unique_prs):
                unique_prs.append(pr)

        return unique_prs[: min(self.pr_limit, len(unique_prs))]

    def _fetch_comments_for_prs(self, prs) -> list:
        """Fetches inline comments for each PR in the list."""
        all_comments = []

        for pr in prs:
            comments = self._get_comments_for_pr(pr)
            all_comments.extend(comments)

        return all_comments

    def _get_comments_for_pr(self, pr) -> list:
        """Fetches the inline comments for a specific PR.

        Returns an empty list (and logs the error) when the per-PR query
        fails, so one broken PR does not abort the whole fetch.
        """
        owner, repo = pr["repository"]["nameWithOwner"].split("/")

        pr_query = self.PR_COMMENTS_TEMPLATE.format(
            owner=owner,
            repo=repo,
            pr_number=pr["number"],
            username=self.username,
            reviews_limit=self.reviews_limit,
            comments_limit=self.comments_limit,
        )

        try:
            pr_comments = []
            pr_data = self._run_query(pr_query)
            reviews = pr_data["repository"]["pullRequest"]["reviews"]["nodes"]

            for review in reviews:
                for comment in review["comments"]["nodes"]:
                    # Attach the parent PR so _format_comment can reference it.
                    comment["_pr_data"] = pr
                    pr_comments.append(comment)

            return pr_comments
        except Exception as e:
            logger.error(f"Error fetching comments for PR #{pr['number']}: {e}")
            return []

    def _format_comment(self, comment) -> dict:
        """Formats a PR review comment from GraphQL.

        Produces the same flat dict shape the other comment providers emit so
        downstream consumers can treat all comment types uniformly.
        """
        pr = comment["_pr_data"]
        # The numeric comment id is the last path segment of the comment URL.
        comment_id = comment["url"].split("/")[-1] if comment["url"] else None

        return {
            "repo": pr["repository"]["nameWithOwner"],
            "issue_number": pr["number"],
            "comment_id": comment_id,
            "body": comment["body"],
            "text": comment["body"],
            "created_at": comment["createdAt"],
            "updated_at": comment["updatedAt"],
            "html_url": comment["url"],
            "issue_url": pr["url"],
            "author_association": "COMMENTER",
            "issue_title": pr["title"],
            "issue_state": pr["state"],
            "login": self.username,
            "type": "pr_review_comment",
        }

    def _get_comment_type(self) -> str:
        """Returns the comment type for error messages."""
        return "PR review comment"
|
||||||
|
|
@ -0,0 +1,164 @@
|
||||||
|
from uuid import uuid5, NAMESPACE_OID
|
||||||
|
from typing import Dict, Any, List, Tuple, Optional
|
||||||
|
|
||||||
|
from cognee.low_level import DataPoint
|
||||||
|
from cognee.modules.engine.models.node_set import NodeSet
|
||||||
|
from cognee.shared.logging_utils import get_logger
|
||||||
|
from cognee.complex_demos.crewai_demo.src.crewai_demo.github_datapoints import (
|
||||||
|
GitHubUser,
|
||||||
|
Repository,
|
||||||
|
File,
|
||||||
|
FileChange,
|
||||||
|
Comment,
|
||||||
|
Issue,
|
||||||
|
Commit,
|
||||||
|
)
|
||||||
|
|
||||||
|
logger = get_logger("github_datapoints")
|
||||||
|
|
||||||
|
|
||||||
|
def create_github_user_datapoint(user_data, nodesets: List[NodeSet]):
    """Creates just the GitHubUser DataPoint object from the user data, with node sets.

    Returns None when no user data is available; otherwise returns the user
    DataPoint followed by its node sets, ready for batch ingestion.
    """
    if not user_data:
        return None

    login = user_data.get("login", "")
    # Deterministic ID derived from the login so repeated ingests dedupe.
    user_id = uuid5(NAMESPACE_OID, login)

    user = GitHubUser(
        id=user_id,
        name=login,
        bio=user_data.get("bio"),
        company=user_data.get("company"),
        location=user_data.get("location"),
        public_repos=user_data.get("public_repos", 0),
        followers=user_data.get("followers", 0),
        following=user_data.get("following", 0),
        interacts_with=[],
        belongs_to_set=nodesets,
    )

    logger.debug(f"Created GitHubUser with ID: {user_id}")

    return [user, *nodesets]
|
||||||
|
|
||||||
|
|
||||||
|
def create_repository_datapoint(repo_name: str, nodesets: List[NodeSet]) -> Repository:
    """Creates a Repository DataPoint with a consistent ID.

    The ID is a uuid5 of the full repository name, so the same repo always
    maps to the same node.
    """
    repo_id = uuid5(NAMESPACE_OID, repo_name)
    repository = Repository(
        id=repo_id,
        name=repo_name,
        # Relationship lists start empty and are filled in by the caller.
        has_issue=[],
        has_commit=[],
        belongs_to_set=nodesets,
    )
    logger.debug(f"Created Repository with ID: {repo_id} for {repo_name}")
    return repository
|
||||||
|
|
||||||
|
|
||||||
|
def create_file_datapoint(filename: str, repo_name: str, nodesets: List[NodeSet]) -> File:
    """Creates a File DataPoint with a consistent ID.

    Args:
        filename: Path of the file within the repository.
        repo_name: Full repository name (``owner/repo``).
        nodesets: Node sets the file node should belong to.

    Returns:
        A File DataPoint whose ID is a uuid5 of ``repo:filename``.
    """
    # The key must include the filename: keying on the repo alone would give
    # every file in a repository the same deterministic UUID.
    file_key = f"{repo_name}:{filename}"
    file_id = uuid5(NAMESPACE_OID, file_key)
    file = File(id=file_id, filename=filename, repo=repo_name, belongs_to_set=nodesets)
    logger.debug(f"Created File with ID: {file_id} for {filename}")
    return file
|
||||||
|
|
||||||
|
|
||||||
|
def create_commit_datapoint(
    commit_data: Dict[str, Any], user: GitHubUser, nodesets: List[NodeSet]
) -> Commit:
    """Creates a Commit DataPoint with a consistent ID and connection to user.

    Args:
        commit_data: Raw commit dict; reads "commit_sha", "commit_message",
            "commit_date", "commit_url", "login", and "repo" keys.
        user: The GitHubUser DataPoint — currently unused in the body;
            kept for interface symmetry with the other creator functions.
        nodesets: Node sets the commit node should belong to.

    Returns:
        A Commit DataPoint whose ID is a uuid5 of the commit SHA.
    """
    commit_id = uuid5(NAMESPACE_OID, commit_data.get("commit_sha", ""))
    commit = Commit(
        id=commit_id,
        commit_sha=commit_data.get("commit_sha", ""),
        # str() guards against non-string message payloads (idiomatic call
        # form, replacing the C-style "(str)(...)" cast).
        text="Commit message:" + str(commit_data.get("commit_message", "")),
        commit_date=commit_data.get("commit_date", ""),
        commit_url=commit_data.get("commit_url", ""),
        author_name=commit_data.get("login", ""),
        repo=commit_data.get("repo", ""),
        has_change=[],
        belongs_to_set=nodesets,
    )
    logger.debug(f"Created Commit with ID: {commit_id} for {commit_data.get('commit_sha', '')}")
    return commit
|
||||||
|
|
||||||
|
|
||||||
|
def create_file_change_datapoint(
    fc_data: Dict[str, Any], user: GitHubUser, file: File, nodesets: List[NodeSet]
) -> FileChange:
    """Creates a FileChange DataPoint with a consistent ID.

    The ID key combines repo, commit SHA, and filename so each change to a
    file within a commit maps to exactly one node.
    """
    key_parts = (
        fc_data.get("repo", ""),
        fc_data.get("commit_sha", ""),
        fc_data.get("filename", ""),
    )
    fc_id = uuid5(NAMESPACE_OID, "{}:{}:{}".format(*key_parts))

    file_change = FileChange(
        id=fc_id,
        filename=fc_data.get("filename", ""),
        status=fc_data.get("status", ""),
        additions=fc_data.get("additions", 0),
        deletions=fc_data.get("deletions", 0),
        changes=fc_data.get("changes", 0),
        text=fc_data.get("diff", ""),
        commit_sha=fc_data.get("commit_sha", ""),
        repo=fc_data.get("repo", ""),
        modifies=file.filename,
        changed_by=user,
        belongs_to_set=nodesets,
    )
    logger.debug(f"Created FileChange with ID: {fc_id} for {fc_data.get('filename', '')}")
    return file_change
|
||||||
|
|
||||||
|
|
||||||
|
def create_issue_datapoint(
    issue_data: Dict[str, Any], repo_name: str, nodesets: List[NodeSet]
) -> Issue:
    """Creates an Issue DataPoint with a consistent ID.

    The ID key is ``repo:issue_number``, so re-ingesting the same issue
    yields the same node.
    """
    number = issue_data.get("issue_number", "")
    issue_id = uuid5(NAMESPACE_OID, f"{repo_name}:{number}")

    issue = Issue(
        id=issue_id,
        number=issue_data.get("issue_number", 0),
        text=issue_data.get("issue_title", ""),
        state=issue_data.get("issue_state", ""),
        repository=repo_name,
        # Issues created here are plain issues, never pull requests.
        is_pr=False,
        has_comment=[],
        belongs_to_set=nodesets,
    )
    logger.debug(f"Created Issue with ID: {issue_id} for {issue_data.get('issue_title', '')}")
    return issue
|
||||||
|
|
||||||
|
|
||||||
|
def create_comment_datapoint(
    comment_data: Dict[str, Any], user: GitHubUser, nodesets: List[NodeSet]
) -> Comment:
    """Creates a Comment DataPoint with a consistent ID and connection to user.

    The ID key combines repo, issue number, and comment id so each comment
    maps to exactly one node across ingests.
    """
    key_fields = ("repo", "issue_number", "comment_id")
    comment_key = ":".join(str(comment_data.get(field, "")) for field in key_fields)
    comment_id = uuid5(NAMESPACE_OID, comment_key)

    comment = Comment(
        id=comment_id,
        comment_id=str(comment_data.get("comment_id", "")),
        text=comment_data.get("body", ""),
        created_at=comment_data.get("created_at", ""),
        updated_at=comment_data.get("updated_at", ""),
        author_name=comment_data.get("login", ""),
        issue_number=comment_data.get("issue_number", 0),
        repo=comment_data.get("repo", ""),
        authored_by=user,
        belongs_to_set=nodesets,
    )
    logger.debug(f"Created Comment with ID: {comment_id}")
    return comment
|
||||||
|
|
||||||
|
|
||||||
|
def create_github_datapoints(github_data, nodesets: List[NodeSet]):
    """Creates DataPoint objects from GitHub data - simplified to just create user for now.

    Returns None when no data was fetched; otherwise delegates to the
    user-datapoint creator.
    """
    if github_data:
        return create_github_user_datapoint(github_data["user"], nodesets)
    return None
|
||||||
|
|
@ -0,0 +1,74 @@
|
||||||
|
from uuid import uuid5, NAMESPACE_OID
|
||||||
|
from typing import Optional, List
|
||||||
|
from cognee.infrastructure.engine import DataPoint
|
||||||
|
|
||||||
|
|
||||||
|
class File(DataPoint):
    """File is now a leaf node without any lists of other DataPoints"""

    filename: str  # path of the file within the repository
    repo: str  # full repository name, e.g. "owner/repo"
    # Vector-index the filename for retrieval.
    metadata: dict = {"index_fields": ["filename"]}
|
||||||
|
|
||||||
|
|
||||||
|
class GitHubUser(DataPoint):
    """A GitHub account profile node."""

    name: Optional[str]  # the GitHub login (creators pass user_data["login"] here)
    bio: Optional[str]
    company: Optional[str]
    location: Optional[str]
    public_repos: int
    followers: int
    following: int
    # Forward reference: Repository is declared later in this module.
    interacts_with: List["Repository"] = []
    # Vector-index the name for retrieval.
    metadata: dict = {"index_fields": ["name"]}
|
||||||
|
|
||||||
|
|
||||||
|
class FileChange(DataPoint):
    """A single file modification within a commit."""

    filename: str
    status: str  # e.g. added / modified / removed (as reported by GitHub)
    additions: int
    deletions: int
    changes: int
    text: str  # the diff/patch text for this file
    commit_sha: str
    repo: str
    modifies: str  # filename of the File node this change touches
    changed_by: GitHubUser
    # Vector-index the diff text for retrieval.
    metadata: dict = {"index_fields": ["text"]}
|
||||||
|
|
||||||
|
|
||||||
|
class Comment(DataPoint):
    """A user's comment on an issue or pull request."""

    comment_id: str  # GitHub's comment id, stored as a string
    text: str  # the comment body
    created_at: str
    updated_at: str
    author_name: str  # GitHub login of the author
    issue_number: int
    repo: str
    authored_by: GitHubUser
    # Vector-index the comment body for retrieval.
    metadata: dict = {"index_fields": ["text"]}
|
||||||
|
|
||||||
|
|
||||||
|
class Issue(DataPoint):
    """A GitHub issue (or pull request, when is_pr is True)."""

    number: int
    text: str  # the issue title
    state: str  # e.g. OPEN / CLOSED
    repository: str  # full repository name
    is_pr: bool
    has_comment: List[Comment] = []
    # NOTE(review): no metadata/index_fields here, unlike File/Comment/FileChange —
    # confirm this node is intentionally excluded from vector indexing.
|
||||||
|
|
||||||
|
|
||||||
|
class Commit(DataPoint):
    """A commit authored by the user, taken from a pull request."""

    commit_sha: str
    text: str  # "Commit message:" + the commit message (see creator)
    commit_date: str
    commit_url: str
    author_name: str  # GitHub login of the author
    repo: str
    has_change: List[FileChange] = []
    # NOTE(review): no metadata/index_fields here, unlike File/Comment/FileChange —
    # confirm this node is intentionally excluded from vector indexing.
|
||||||
|
|
||||||
|
|
||||||
|
class Repository(DataPoint):
    """A GitHub repository node linking issues and commits."""

    name: str  # full repository name, e.g. "owner/repo"
    has_issue: List[Issue] = []
    has_commit: List[Commit] = []
|
||||||
|
|
@ -0,0 +1,57 @@
|
||||||
|
from github import Github
|
||||||
|
from datetime import datetime
|
||||||
|
from cognee.complex_demos.crewai_demo.src.crewai_demo.github_comment_providers import (
|
||||||
|
IssueCommentsProvider,
|
||||||
|
PrReviewsProvider,
|
||||||
|
PrReviewCommentsProvider,
|
||||||
|
)
|
||||||
|
from cognee.complex_demos.crewai_demo.src.crewai_demo.github_comment_base import logger
|
||||||
|
|
||||||
|
|
||||||
|
class GitHubDevComments:
    """Facade class for working with a GitHub developer's comments."""

    def __init__(self, profile, limit=10, include_issue_details=True):
        """Initialize with a GitHubDevProfile instance and default parameters."""
        self.profile = profile
        self.limit = limit
        self.include_issue_details = include_issue_details

    def get_issue_comments(self):
        """Fetches the most recent comments made by the user on issues and PRs across repositories."""
        if not self.profile.user:
            logger.warning(f"No user found for profile {self.profile.username}")
            return None

        logger.debug(f"Fetching comments for {self.profile.username} with limit={self.limit}")

        # Each provider applies its own internal multipliers on top of the
        # base limit, so they all receive the same constructor arguments.
        provider_args = (self.profile.token, self.profile.username, self.limit)
        issue_comments = IssueCommentsProvider(*provider_args).get_comments()
        pr_reviews = PrReviewsProvider(*provider_args).get_comments()
        pr_review_comments = PrReviewCommentsProvider(*provider_args).get_comments()

        total_comments = issue_comments + pr_reviews + pr_review_comments
        logger.info(
            f"Retrieved {len(total_comments)} comments for {self.profile.username} "
            f"({len(issue_comments)} issue, {len(pr_reviews)} PR reviews, "
            f"{len(pr_review_comments)} PR review comments)"
        )

        return total_comments

    def set_limit(self, limit=None, include_issue_details=None):
        """Sets the limit for comments to retrieve."""
        if limit is not None:
            self.limit = limit
        if include_issue_details is not None:
            self.include_issue_details = include_issue_details
|
||||||
|
|
@ -0,0 +1,195 @@
|
||||||
|
from github import Github
|
||||||
|
from datetime import datetime, timedelta
|
||||||
|
|
||||||
|
|
||||||
|
class GitHubDevCommits:
    """Class for working with a GitHub developer's commits in pull requests."""

    def __init__(
        self,
        profile,
        days=30,
        prs_limit=10,
        commits_per_pr=5,
        include_files=False,
        skip_no_diff=False,
    ):
        """Initialize with a GitHubDevProfile instance and default parameters.

        Args:
            profile: GitHubDevProfile supplying the API client and username.
            days: Look-back window (in days) for the PR search.
            prs_limit: Maximum number of PRs to inspect.
            commits_per_pr: Maximum commits to keep per PR.
            include_files: Whether to attach per-commit file data.
            skip_no_diff: Skip files without a patch when extracting file data.
        """
        self.profile = profile
        self.days = days
        self.prs_limit = prs_limit
        self.commits_per_pr = commits_per_pr
        self.include_files = include_files
        self.skip_no_diff = skip_no_diff
        # Per-file attributes copied from each commit-file object.
        self.file_keys = ["filename", "status", "additions", "deletions", "changes", "diff"]

    def get_user_commits(self):
        """Fetches user's most recent commits from pull requests."""
        if not self.profile.user:
            return None

        commits = self._collect_user_pr_commits()
        return {"user": self.profile.get_user_info(), "commits": commits}

    def get_user_file_changes(self):
        """Returns a flat list of file changes with associated commit information from PRs."""
        if not self.profile.user:
            return None

        all_files = []
        commits = self._collect_user_pr_commits(include_files=True)

        for commit in commits:
            # Commits without file data (e.g. when extraction failed) are skipped.
            if "files" not in commit:
                continue

            # Commit-level context repeated onto every file row.
            commit_info = {
                "repo": commit["repo"],
                "commit_sha": commit["sha"],
                "commit_message": commit["message"],
                "commit_date": commit["date"],
                "commit_url": commit["url"],
                "pr_number": commit.get("pr_number"),
                "pr_title": commit.get("pr_title"),
            }

            file_changes = []
            for file in commit["files"]:
                file_data = {key: file.get(key) for key in self.file_keys}
                file_changes.append({**file_data, **commit_info})

            all_files.extend(file_changes)

        return all_files

    def set_options(
        self, days=None, prs_limit=None, commits_per_pr=None, include_files=None, skip_no_diff=None
    ):
        """Sets commit search parameters. Only non-None arguments are applied."""
        if days is not None:
            self.days = days
        if prs_limit is not None:
            self.prs_limit = prs_limit
        if commits_per_pr is not None:
            self.commits_per_pr = commits_per_pr
        if include_files is not None:
            self.include_files = include_files
        if skip_no_diff is not None:
            self.skip_no_diff = skip_no_diff

    def _get_date_filter(self, days):
        """Creates a date filter string for GitHub search queries.

        Returns "" when days is falsy, otherwise a " created:>=YYYY-MM-DD"
        suffix (note the leading space, required by the query builder).
        """
        if not days:
            return ""

        date_limit = (datetime.now() - timedelta(days=days)).strftime("%Y-%m-%d")
        return f" created:>={date_limit}"

    def _collect_user_pr_commits(self, include_files=None):
        """Collects and sorts a user's recent commits from pull requests they authored."""
        # Per-call override falls back to the instance-level setting.
        include_files = include_files if include_files is not None else self.include_files

        prs = self._get_user_prs()

        if not prs:
            return []

        all_commits = []
        for pr in prs[: self.prs_limit]:
            pr_commits = self._get_commits_from_pr(pr, include_files)
            all_commits.extend(pr_commits)

        # Newest first across all PRs.
        sorted_commits = sorted(all_commits, key=lambda x: x["date"], reverse=True)
        return sorted_commits

    def _get_user_prs(self):
        """Gets pull requests authored by the user (merged PRs only)."""
        date_filter = self._get_date_filter(self.days)
        query = f"author:{self.profile.username} is:pr is:merged{date_filter}"

        try:
            return list(self.profile.github.search_issues(query))
        except Exception as e:
            print(f"Error searching for PRs: {e}")
            return []

    def _get_commits_from_pr(self, pr_issue, include_files=None):
        """Gets commits by the user from a specific PR."""
        include_files = include_files if include_files is not None else self.include_files

        pr_info = self._get_pull_request_object(pr_issue)
        if not pr_info:
            return []

        repo_name, pr = pr_info

        all_commits = self._get_all_pr_commits(pr, pr_issue.number)
        if not all_commits:
            return []

        # Keep only commits actually authored by the profile's user; PRs can
        # contain commits from other contributors.
        user_commits = [
            c
            for c in all_commits
            if c.author and hasattr(c.author, "login") and c.author.login == self.profile.username
        ]

        commit_data = [
            self._extract_commit_data(commit, repo_name, pr_issue, include_files)
            for commit in user_commits[: self.commits_per_pr]
        ]

        return commit_data

    def _get_pull_request_object(self, pr_issue):
        """Gets repository and pull request objects from an issue.

        Returns a (repo_name, pull_request) tuple, or None on API failure.
        """
        try:
            repo_name = pr_issue.repository.full_name
            repo = self.profile.github.get_repo(repo_name)
            pr = repo.get_pull(pr_issue.number)
            return (repo_name, pr)
        except Exception as e:
            print(f"Error accessing PR #{pr_issue.number}: {e}")
            return None

    def _get_all_pr_commits(self, pr, pr_number):
        """Gets all commits from a pull request, or None on API failure."""
        try:
            return list(pr.get_commits())
        except Exception as e:
            print(f"Error retrieving commits from PR #{pr_number}: {e}")
            return None

    def _extract_commit_data(self, commit, repo_name, pr_issue, include_files=None):
        """Extracts relevant data from a commit object within a PR context."""
        commit_data = {
            "repo": repo_name,
            "sha": commit.sha,
            "message": commit.commit.message,
            "date": commit.commit.author.date,
            "url": commit.html_url,
            "pr_number": pr_issue.number,
            "pr_title": pr_issue.title,
            "pr_url": pr_issue.html_url,
        }

        include_files = include_files if include_files is not None else self.include_files

        if include_files:
            commit_data["files"] = self._extract_commit_files(commit)

        return commit_data

    def _extract_commit_files(self, commit):
        """Extracts files changed in a commit, including diffs."""
        files = []
        for file in commit.files:
            # Optionally skip files with no patch text (e.g. binary files).
            if self.skip_no_diff and not file.patch:
                continue

            # "diff" is not a native attribute of the file object, so getattr
            # presumably yields None here; it is populated below — TODO confirm.
            file_data = {key: getattr(file, key, None) for key in self.file_keys}

            if "diff" in self.file_keys:
                file_data["diff"] = file.patch if file.patch else "No diff available for this file"

            files.append(file_data)
        return files
|
||||||
|
|
@ -0,0 +1,96 @@
|
||||||
|
from github import Github
|
||||||
|
from datetime import datetime
|
||||||
|
import json
|
||||||
|
import os
|
||||||
|
from cognee.complex_demos.crewai_demo.src.crewai_demo.github_dev_comments import GitHubDevComments
|
||||||
|
from cognee.complex_demos.crewai_demo.src.crewai_demo.github_dev_commits import GitHubDevCommits
|
||||||
|
|
||||||
|
|
||||||
|
class GitHubDevProfile:
    """Class for working with a GitHub developer's profile, commits, and activity."""

    def __init__(self, username, token):
        """Initialize with a username and GitHub API token.

        Falls back to an unauthenticated client when no token is supplied
        (subject to much lower API rate limits). When the user lookup fails,
        user_info, comments, and commits all stay None.
        """
        self.github = Github(token) if token else Github()
        self.token = token
        self.username = username
        self.user = self._get_user(username)
        self.user_info = self._extract_user_info() if self.user else None
        self.comments = GitHubDevComments(self) if self.user else None
        self.commits = GitHubDevCommits(self) if self.user else None

    def get_user_info(self):
        """Returns the cached user information."""
        return self.user_info

    def get_user_repos(self, limit=None):
        """Returns a list of user's repositories with limit.

        Returns [] when the user lookup failed; limit=None returns all repos.
        """
        if not self.user:
            return []

        repos = list(self.user.get_repos())
        if limit:
            repos = repos[:limit]
        return repos

    def get_user_commits(self, days=30, prs_limit=5, commits_per_pr=3, include_files=False):
        """Fetches user's most recent commits from pull requests.

        Returns None when the user lookup failed.
        """
        if not self.commits:
            return None

        # Push the per-call parameters into the commits collaborator before fetching.
        self.commits.set_options(
            days=days,
            prs_limit=prs_limit,
            commits_per_pr=commits_per_pr,
            include_files=include_files,
        )

        return self.commits.get_user_commits()

    def get_user_file_changes(self, days=30, prs_limit=5, commits_per_pr=3, skip_no_diff=True):
        """Returns a flat list of file changes from PRs with associated commit information.

        Returns None when the user lookup failed.
        """
        if not self.commits:
            return None

        # File changes always need file data, hence include_files=True.
        self.commits.set_options(
            days=days,
            prs_limit=prs_limit,
            commits_per_pr=commits_per_pr,
            include_files=True,
            skip_no_diff=skip_no_diff,
        )

        return self.commits.get_user_file_changes()

    def get_issue_comments(self, limit=10, include_issue_details=True):
        """Fetches the most recent comments made by the user on issues and PRs across repositories.

        Returns None when the user lookup failed.
        """
        if not self.comments:
            return None

        self.comments.set_limit(
            limit=limit,
            include_issue_details=include_issue_details,
        )

        return self.comments.get_issue_comments()

    def _get_user(self, username):
        """Fetches a GitHub user object, or None when the API call fails."""
        try:
            return self.github.get_user(username)
        except Exception as e:
            print(f"Error connecting to GitHub API: {e}")
            return None

    def _extract_user_info(self):
        """Extracts basic information from a GitHub user object into a plain dict."""
        return {
            "login": self.user.login,
            "name": self.user.name,
            "bio": self.user.bio,
            "company": self.user.company,
            "location": self.user.location,
            "public_repos": self.user.public_repos,
            "followers": self.user.followers,
            "following": self.user.following,
        }
|
||||||
|
|
@ -0,0 +1,137 @@
|
||||||
|
import json
|
||||||
|
import asyncio
|
||||||
|
import cognee
|
||||||
|
from cognee.complex_demos.crewai_demo.src.crewai_demo.github_dev_profile import GitHubDevProfile
|
||||||
|
|
||||||
|
|
||||||
|
def get_github_profile_data(
    username, token=None, days=30, prs_limit=5, commits_per_pr=3, issues_limit=5, max_comments=3
):
    """Fetches comprehensive GitHub profile data including user info, commits from PRs, and comments.

    Returns None when the user cannot be resolved. Note: issues_limit is
    currently unused; it is kept in the signature for caller compatibility.
    """
    profile = GitHubDevProfile(username, token or "")

    if not profile.user:
        return None

    commits_result = profile.get_user_commits(
        days=days, prs_limit=prs_limit, commits_per_pr=commits_per_pr, include_files=True
    )
    commits = commits_result["commits"] if commits_result else []
    comments = profile.get_issue_comments(limit=max_comments, include_issue_details=True)

    return {
        "user": profile.get_user_info(),
        "commits": commits,
        "comments": comments or [],
    }
|
||||||
|
|
||||||
|
|
||||||
|
def get_github_file_changes(
    username, token=None, days=30, prs_limit=5, commits_per_pr=3, skip_no_diff=True
):
    """Fetches a flat list of file changes from PRs with associated commit information for a GitHub user.

    Returns None when the user cannot be resolved.
    """
    profile = GitHubDevProfile(username, token or "")

    if not profile.user:
        return None

    changes = profile.get_user_file_changes(
        days=days, prs_limit=prs_limit, commits_per_pr=commits_per_pr, skip_no_diff=skip_no_diff
    )

    return {"user": profile.get_user_info(), "file_changes": changes or []}
|
||||||
|
|
||||||
|
|
||||||
|
def get_github_data_for_cognee(
    username,
    token=None,
    days=30,
    prs_limit=3,
    commits_per_pr=3,
    issues_limit=3,
    max_comments=3,
    skip_no_diff=True,
):
    """Fetches enriched GitHub data for a user with PR file changes and comments combined with user data.

    Args:
        username: GitHub login to fetch.
        token: GitHub API token; falls back to unauthenticated access when empty.
        days: Look-back window (in days) for the PR search.
        prs_limit: Maximum PRs to scan for commits.
        commits_per_pr: Maximum commits to take per PR.
        issues_limit: Currently unused; kept for backward compatibility.
        max_comments: Maximum comments to fetch.
        skip_no_diff: Skip file changes that have no diff text.

    Returns:
        Dict with "user", "file_changes", and "comments" keys, or None when
        the user cannot be resolved.
    """
    token = token or ""
    profile = GitHubDevProfile(username, token)

    if not profile.user:
        return None

    user_info = profile.get_user_info()

    file_changes = profile.get_user_file_changes(
        days=days, prs_limit=prs_limit, commits_per_pr=commits_per_pr, skip_no_diff=skip_no_diff
    )

    # NOTE(review): for file changes user_info values win over per-change keys,
    # while for comments below the comment's own keys win — confirm the
    # asymmetry is intended.
    enriched_file_changes = []
    if file_changes:
        enriched_file_changes = [item | user_info for item in file_changes]

    comments = profile.get_issue_comments(limit=max_comments, include_issue_details=True)

    enriched_comments = []
    if comments:
        for comment in comments:
            # Only merge in user fields the comment does not already define.
            safe_user_info = {k: v for k, v in user_info.items() if k not in comment}
            enriched_comments.append(comment | safe_user_info)

    return {"user": user_info, "file_changes": enriched_file_changes, "comments": enriched_comments}
|
||||||
|
|
||||||
|
|
||||||
|
async def cognify_github_profile(username, token=None):
    """Ingests GitHub data into Cognee with soft and technical node sets.

    Returns True on success, False when no data could be fetched.
    """
    data = get_github_data_for_cognee(username=username, token=token)
    if not data:
        return False

    def dump(obj):
        # default=str keeps non-JSON-native values (e.g. datetimes) serializable.
        return json.dumps(obj, default=str)

    # The user profile participates in both the soft and technical node sets.
    await cognee.add(dump(data["user"]), node_set=["soft", "technical", username])

    for item in data["comments"]:
        await cognee.add("Comment: " + dump(item), node_set=["soft", username])

    for item in data["file_changes"]:
        await cognee.add("File Change: " + dump(item), node_set=["technical", username])

    await cognee.cognify()
    return True
|
||||||
|
|
||||||
|
|
||||||
|
async def main(username):
    """Main function for testing Cognee ingest.

    Loads the GitHub token from the environment (.env via dotenv) and runs
    the full ingest pipeline for the given username.
    """
    import os
    import dotenv
    from cognee.api.v1.visualize.visualize import visualize_graph

    dotenv.load_dotenv()
    token = os.getenv("GITHUB_TOKEN")

    await cognify_github_profile(username, token)

    # Optional: render the resulting graph to an HTML artifact.
    # success = await cognify_github_profile(username, token)

    # if success:
    #     visualization_path = os.path.join(os.path.dirname(__file__), "./.artifacts/github_graph.html")
    #     await visualize_graph(visualization_path)
|
||||||
|
|
||||||
|
|
||||||
|
if __name__ == "__main__":
    import os
    import dotenv

    dotenv.load_dotenv()

    # NOTE(review): username is empty here — fill in a real GitHub login
    # before running this script.
    username = ""
    asyncio.run(main(username))
    # Manual debugging alternative: dump the raw fetched data instead of ingesting.
    # token = os.getenv("GITHUB_TOKEN")
    # github_data = get_github_data_for_cognee(username=username, token=token)
    # print(json.dumps(github_data, indent=2, default=str))
|
||||||
|
|
@ -0,0 +1,300 @@
|
||||||
|
import json
|
||||||
|
import asyncio
|
||||||
|
from uuid import uuid5, NAMESPACE_OID
|
||||||
|
from typing import Optional, List, Dict, Any
|
||||||
|
from pathlib import Path
|
||||||
|
from cognee.api.v1.search import SearchType
|
||||||
|
import cognee
|
||||||
|
from cognee.low_level import DataPoint, setup as cognee_setup
|
||||||
|
from cognee.modules.retrieval.graph_completion_retriever import GraphCompletionRetriever
|
||||||
|
from cognee.tasks.storage import add_data_points
|
||||||
|
from cognee.modules.pipelines.tasks.task import Task
|
||||||
|
from cognee.modules.pipelines import run_tasks
|
||||||
|
from cognee.modules.users.methods import get_default_user
|
||||||
|
from cognee.modules.engine.models.node_set import NodeSet
|
||||||
|
from cognee.shared.logging_utils import get_logger
|
||||||
|
from cognee.complex_demos.crewai_demo.src.crewai_demo.github_ingest import (
|
||||||
|
get_github_data_for_cognee,
|
||||||
|
)
|
||||||
|
|
||||||
|
# Import DataPoint classes from github_datapoints.py
|
||||||
|
from cognee.complex_demos.crewai_demo.src.crewai_demo.github_datapoints import (
|
||||||
|
GitHubUser,
|
||||||
|
Repository,
|
||||||
|
File,
|
||||||
|
Commit,
|
||||||
|
)
|
||||||
|
|
||||||
|
# Import creator functions from github_datapoint_creators.py
|
||||||
|
from cognee.complex_demos.crewai_demo.src.crewai_demo.github_datapoint_creators import (
|
||||||
|
create_github_user_datapoint,
|
||||||
|
create_repository_datapoint,
|
||||||
|
create_file_datapoint,
|
||||||
|
create_commit_datapoint,
|
||||||
|
create_file_change_datapoint,
|
||||||
|
create_issue_datapoint,
|
||||||
|
create_comment_datapoint,
|
||||||
|
)
|
||||||
|
|
||||||
|
logger = get_logger("github_ingest")
|
||||||
|
|
||||||
|
|
||||||
|
def collect_repositories(
    section: List[Dict[str, Any]],
    repositories: Dict[str, Repository],
    user: GitHubUser,
    nodesets: List[NodeSet],
) -> None:
    """Register every repository mentioned in `section` exactly once.

    New Repository datapoints are stored in the shared `repositories` cache
    and appended to the user's `interacts_with` list; entries without a
    "repo" field and repos already cached are skipped.
    """
    for entry in section:
        name = entry.get("repo", "")
        if name and name not in repositories:
            repository = create_repository_datapoint(name, nodesets)
            repositories[name] = repository
            user.interacts_with.append(repository)
|
||||||
|
|
||||||
|
|
||||||
|
def get_or_create_repository(
    repo_name: str, repositories: Dict[str, Repository], user: GitHubUser, nodesets: List[NodeSet]
) -> Repository:
    """Return the cached Repository for `repo_name`, creating and registering it if absent.

    A newly created repository is cached and linked to the user via
    `interacts_with`, mirroring collect_repositories.
    """
    cached = repositories.get(repo_name)
    if cached is not None:
        return cached

    repository = create_repository_datapoint(repo_name, nodesets)
    repositories[repo_name] = repository
    user.interacts_with.append(repository)
    return repository
|
||||||
|
|
||||||
|
|
||||||
|
def get_or_create_file(
    filename: str,
    repo_name: str,
    files: Dict[str, File],
    technical_nodeset: NodeSet,
) -> File:
    """Return the cached File datapoint for (repo_name, filename), creating it if absent.

    Bug fix: the cache key previously omitted the filename, so every file in
    the same repository collapsed onto a single File datapoint. The key now
    includes both the repository and the file name.
    """
    file_key = f"{repo_name}:{filename}"
    if file_key in files:
        return files[file_key]
    file = create_file_datapoint(filename, repo_name, [technical_nodeset])
    files[file_key] = file
    return file
|
||||||
|
|
||||||
|
|
||||||
|
def get_or_create_commit(
    commit_data: Dict[str, Any],
    user: GitHubUser,
    commits: Dict[str, Commit],
    repository: Repository,
    technical_nodeset: NodeSet,
) -> Commit:
    """Return the cached Commit for `commit_data["commit_sha"]`, creating it if absent.

    A newly created commit is cached by its sha and linked onto the
    repository's `has_commit` list.
    """
    sha = commit_data.get("commit_sha", "")
    if sha not in commits:
        new_commit = create_commit_datapoint(commit_data, user, [technical_nodeset])
        commits[sha] = new_commit
        link_commit_to_repo(new_commit, repository)
    return commits[sha]
|
||||||
|
|
||||||
|
|
||||||
|
def link_file_to_repo(file: File, repository: Repository):
    """Attach `file` to `repository.contains`, skipping duplicates."""
    already_linked = file in repository.contains
    if not already_linked:
        repository.contains.append(file)
|
||||||
|
|
||||||
|
|
||||||
|
def link_commit_to_repo(commit: Commit, repository: Repository):
    """Attach `commit` to `repository.has_commit`, skipping duplicates."""
    already_linked = commit in repository.has_commit
    if not already_linked:
        repository.has_commit.append(commit)
|
||||||
|
|
||||||
|
|
||||||
|
def process_file_changes_data(
    github_data: Dict[str, Any],
    user: GitHubUser,
    repositories: Dict[str, Repository],
    technical_nodeset: NodeSet,
) -> List[DataPoint]:
    """Turn raw file-change records into Commit/FileChange datapoints wired to repos.

    Returns the new Commit and FileChange datapoints. Repositories are
    accumulated into the shared `repositories` dict rather than returned;
    File datapoints are cached locally and reachable via the graph links
    but are not part of the returned list (matching the original behavior).
    """
    changes = github_data.get("file_changes", [])
    if not changes:
        return []

    collect_repositories(changes, repositories, user, [technical_nodeset])

    file_cache: Dict[str, File] = {}
    commit_cache: Dict[str, Commit] = {}
    change_points: List[DataPoint] = []

    for record in changes:
        repo_name = record.get("repo", "")
        filename = record.get("filename", "")
        commit_sha = record.get("commit_sha", "")
        # A record is only usable when all three identifying fields are present.
        if not (repo_name and filename and commit_sha):
            continue

        repository = get_or_create_repository(repo_name, repositories, user, [technical_nodeset])
        file = get_or_create_file(filename, repo_name, file_cache, technical_nodeset)
        commit = get_or_create_commit(record, user, commit_cache, repository, technical_nodeset)

        change = create_file_change_datapoint(record, user, file, [technical_nodeset])
        change_points.append(change)
        if change not in commit.has_change:
            commit.has_change.append(change)

    return list(commit_cache.values()) + change_points
|
||||||
|
|
||||||
|
|
||||||
|
def process_comments_data(
    github_data: Dict[str, Any],
    user: GitHubUser,
    repositories: Dict[str, Repository],
    technical_nodeset: NodeSet,
    soft_nodeset: NodeSet,
) -> List[DataPoint]:
    """Turn raw comment records into Issue/Comment datapoints wired to repositories.

    Returns the new Issue and Comment datapoints; repositories accumulate in
    the shared `repositories` dict.
    NOTE(review): `technical_nodeset` is accepted but unused here — confirm
    whether comments were ever meant to join the technical nodeset.
    """
    records = github_data.get("comments", [])
    if not records:
        return []

    collect_repositories(records, repositories, user, [soft_nodeset])

    issue_cache: Dict[str, Any] = {}
    comment_points: List[DataPoint] = []

    for record in records:
        repo_name = record.get("repo", "")
        issue_number = record.get("issue_number", 0)
        # Need both a repo and a non-zero issue number to anchor the comment.
        if not (repo_name and issue_number):
            continue

        repository = get_or_create_repository(repo_name, repositories, user, [soft_nodeset])

        issue_key = f"{repo_name}:{issue_number}"
        issue = issue_cache.get(issue_key)
        if issue is None:
            issue = create_issue_datapoint(record, repo_name, [soft_nodeset])
            issue_cache[issue_key] = issue
            if issue not in repository.has_issue:
                repository.has_issue.append(issue)

        comment = create_comment_datapoint(record, user, [soft_nodeset])
        comment_points.append(comment)
        if comment not in issue.has_comment:
            issue.has_comment.append(comment)

    return list(issue_cache.values()) + comment_points
|
||||||
|
|
||||||
|
|
||||||
|
def build_github_datapoints_from_dict(github_data: Dict[str, Any]):
    """Build the full DataPoint collection (user, repos, commits, issues, comments).

    Returns None when the input dict is empty or lacks a "user" entry, or
    when no user datapoints could be created; otherwise a flat list of all
    datapoints ready for ingestion.
    """
    if not github_data or "user" not in github_data:
        return None

    # Deterministic nodeset ids so repeated runs reuse the same nodes.
    soft_nodeset = NodeSet(id=uuid5(NAMESPACE_OID, "NodeSet:soft"), name="soft")
    technical_nodeset = NodeSet(id=uuid5(NAMESPACE_OID, "NodeSet:technical"), name="technical")

    user_points = create_github_user_datapoint(
        github_data["user"], [soft_nodeset, technical_nodeset]
    )
    if not user_points:
        return None

    # The first element is used as the GitHubUser root the rest link to.
    github_user = user_points[0]
    repositories: Dict[str, Repository] = {}

    change_points = process_file_changes_data(
        github_data, github_user, repositories, technical_nodeset
    )
    comment_points = process_comments_data(
        github_data, github_user, repositories, technical_nodeset, soft_nodeset
    )

    return user_points + list(repositories.values()) + change_points + comment_points
|
||||||
|
|
||||||
|
|
||||||
|
async def cognify_github_data(github_data: dict):
    """Build datapoints from `github_data` and push them through the ingestion pipeline.

    Returns True on success, False when no datapoints could be built from the
    input dictionary.
    """
    datapoints = build_github_datapoints_from_dict(github_data)
    if not datapoints:
        logger.error("Failed to create datapoints")
        return False

    pipeline_user = await get_default_user()
    pipeline = run_tasks(
        tasks=[Task(add_data_points, task_config={"batch_size": 50})],
        data=datapoints,
        dataset_id=uuid5(NAMESPACE_OID, "GitHub"),
        pipeline_name="github_pipeline",
        user=pipeline_user,
    )

    # Drain the async pipeline, echoing each status for visibility.
    async for status in pipeline:
        print(status)

    logger.info(f"Done processing {len(datapoints)} datapoints")
    return True
|
||||||
|
|
||||||
|
|
||||||
|
async def cognify_github_data_from_username(
    username: str,
    token: Optional[str] = None,
    days: int = 30,
    prs_limit: int = 3,
    commits_per_pr: int = 3,
    issues_limit: int = 3,
    max_comments: int = 3,
    skip_no_diff: bool = True,
):
    """Fetch GitHub data for `username` and run it through the DataPoint pipeline.

    Returns False when fetching fails; otherwise the result of
    cognify_github_data (True on success, False if no datapoints were built).
    Previously this returned None on success, which made the outcome
    indistinguishable from failure for callers and inconsistent with
    process_github_from_file.
    """
    logger.info(f"Fetching GitHub data for user: {username}")

    github_data = get_github_data_for_cognee(
        username=username,
        token=token,
        days=days,
        prs_limit=prs_limit,
        commits_per_pr=commits_per_pr,
        issues_limit=issues_limit,
        max_comments=max_comments,
        skip_no_diff=skip_no_diff,
    )

    if not github_data:
        logger.error(f"Failed to fetch GitHub data for user: {username}")
        return False

    # Round-trip through JSON (default=str) to coerce non-serializable values
    # such as datetimes into strings before building datapoints.
    github_data = json.loads(json.dumps(github_data, default=str))

    return await cognify_github_data(github_data)
|
||||||
|
|
||||||
|
|
||||||
|
async def process_github_from_file(json_file_path: str):
    """Load GitHub data from a JSON file and cognify it.

    Returns False when the file cannot be read or parsed; otherwise the
    result of cognify_github_data.
    """
    logger.info(f"Processing GitHub data from file: {json_file_path}")

    try:
        with open(json_file_path, "r") as json_file:
            payload = json.load(json_file)
    except Exception as error:
        logger.error(f"Error loading JSON file: {error}")
        return False

    return await cognify_github_data(payload)
|
||||||
|
|
||||||
|
|
||||||
|
if __name__ == "__main__":
    import os

    import dotenv

    dotenv.load_dotenv()
    token = os.getenv("GITHUB_TOKEN")

    # Fill in a GitHub username before running.
    username = ""

    async def _reset_and_cognify(username, token):
        """Wipe prior state, recreate tables, then ingest the user's data."""
        from cognee.infrastructure.databases.relational import create_db_and_tables

        await cognee.prune.prune_data()
        await cognee.prune.prune_system(metadata=True)
        await create_db_and_tables()
        await cognify_github_data_from_username(username, token)

    asyncio.run(_reset_and_cognify(username, token))
||||||
179
cognee/complex_demos/crewai_demo/src/crewai_demo/hiring_crew.py
Normal file
179
cognee/complex_demos/crewai_demo/src/crewai_demo/hiring_crew.py
Normal file
|
|
@ -0,0 +1,179 @@
|
||||||
|
import os
|
||||||
|
from crewai import Agent, Crew, Process, Task
|
||||||
|
from crewai.project import CrewBase, agent, crew, task, before_kickoff
|
||||||
|
from pydantic import BaseModel
|
||||||
|
|
||||||
|
from cognee.complex_demos.crewai_demo.src.crewai_demo.custom_tools.cognee_ingestion import (
|
||||||
|
CogneeIngestion,
|
||||||
|
)
|
||||||
|
from cognee.complex_demos.crewai_demo.src.crewai_demo.custom_tools.cognee_search import CogneeSearch
|
||||||
|
from cognee.infrastructure.llm.get_llm_client import get_llm_client
|
||||||
|
|
||||||
|
|
||||||
|
class AgentConfig(BaseModel):
    """Structured-output schema for an LLM-refined agent definition.

    Used as the response_model in HiringCrew.refine_agent_configs so the LLM
    returns exactly these three fields.
    """

    # Short title of the agent's function within the crew.
    role: str
    # What the agent is trying to achieve.
    goal: str
    # Persona/background text that conditions the agent's behavior.
    backstory: str
|
||||||
|
|
||||||
|
@CrewBase
class HiringCrew:
    """CrewAI crew that assesses two applicants and records a hiring decision.

    Agents and tasks are declared in YAML config files; task descriptions and
    expected outputs are interpolated with ``self.inputs`` before each task is
    built. Tasks run sequentially: soft-skills and technical assessments per
    applicant, then a hiring decision, then ingestion of that decision.
    """

    agents_config = "config/agents.yaml"
    tasks_config = "config/tasks.yaml"

    def __init__(self, inputs):
        # Template values (e.g. applicant identifiers) formatted into the
        # task config strings. (A stray no-op `self` statement was removed.)
        self.inputs = inputs

    def _format_task_config(self, task_name: str) -> dict:
        """Interpolate self.inputs into a task's description/expected_output (in place)."""
        config = self.tasks_config[task_name]
        config["description"] = config["description"].format(**self.inputs)
        config["expected_output"] = config["expected_output"].format(**self.inputs)
        return config

    @agent
    def soft_skills_expert_agent(self) -> Agent:
        """Agent searching the 'soft' nodeset for soft-skill evidence."""
        return Agent(
            config=self.agents_config["soft_skills_expert_agent"],
            tools=[CogneeSearch(nodeset_name="soft")],
            verbose=True,
        )

    @agent
    def technical_expert_agent(self) -> Agent:
        """Agent searching the 'technical' nodeset for technical evidence."""
        return Agent(
            config=self.agents_config["technical_expert_agent"],
            tools=[CogneeSearch(nodeset_name="technical")],
            verbose=True,
        )

    @agent
    def decision_maker_agent(self) -> Agent:
        """Agent that can ingest its output into the 'final_report' nodeset."""
        return Agent(
            config=self.agents_config["decision_maker_agent"],
            tools=[CogneeIngestion(nodeset_name="final_report")],
            verbose=True,
        )

    @task
    def soft_skills_assessment_applicant1_task(self) -> Task:
        return Task(
            config=self._format_task_config("soft_skills_assessment_applicant1_task"),
            async_execution=False,
        )

    @task
    def soft_skills_assessment_applicant2_task(self) -> Task:
        return Task(
            config=self._format_task_config("soft_skills_assessment_applicant2_task"),
            async_execution=False,
        )

    @task
    def technical_assessment_applicant1_task(self) -> Task:
        return Task(
            config=self._format_task_config("technical_assessment_applicant1_task"),
            async_execution=False,
        )

    @task
    def technical_assessment_applicant2_task(self) -> Task:
        return Task(
            config=self._format_task_config("technical_assessment_applicant2_task"),
            async_execution=False,
        )

    @task
    def hiring_decision_task(self) -> Task:
        return Task(
            config=self._format_task_config("hiring_decision_task"),
            async_execution=False,
        )

    @task
    def ingest_hiring_decision_task(self) -> Task:
        return Task(
            config=self._format_task_config("ingest_hiring_decision_task"),
            async_execution=False,
        )

    def refine_agent_configs(self, agent_name: str = None):
        """Ask the LLM to rewrite agent role/goal/backstory definitions.

        When `agent_name` is None every agent in agents_config is refined;
        otherwise only the named one. Refined definitions overwrite
        self.agents_config in place.
        """
        system_prompt = (
            "You are an expert in improving agent definitions for autonomous AI systems. "
            "Given an agent's role, goal, and backstory, refine them to be:\n"
            "- Concise and well-written\n"
            "- Aligned with the agent’s function\n"
            "- Clear and professional\n"
            "- Consistent with multi-agent teamwork\n\n"
            "Return the updated definition as a JSON object with keys: role, goal, backstory."
        )

        agent_keys = [agent_name] if agent_name else self.agents_config.keys()

        for name in agent_keys:
            agent_def = self.agents_config[name]

            user_prompt = f"""Here is the current agent definition:
role: {agent_def["role"]}
goal: {agent_def["goal"]}
backstory: {agent_def["backstory"]}

Please improve it."""
            llm_client = get_llm_client()
            improved = llm_client.create_structured_output(
                text_input=user_prompt, system_prompt=system_prompt, response_model=AgentConfig
            )

            self.agents_config[name] = improved.dict()

    @crew
    def crew(self) -> Crew:
        """Assemble all declared agents and tasks into a sequential crew."""
        return Crew(
            agents=self.agents,
            tasks=self.tasks,
            process=Process.sequential,
            verbose=True,
            share_crew=True,
            output_log_file="hiring_crew_log.txt",
        )
|
||||||
53
cognee/complex_demos/crewai_demo/src/crewai_demo/main.py
Normal file
53
cognee/complex_demos/crewai_demo/src/crewai_demo/main.py
Normal file
|
|
@ -0,0 +1,53 @@
|
||||||
|
import warnings
|
||||||
|
import os
|
||||||
|
from hiring_crew import HiringCrew
|
||||||
|
from cognee.complex_demos.crewai_demo.src.crewai_demo.custom_tools.github_ingestion import (
|
||||||
|
GithubIngestion,
|
||||||
|
)
|
||||||
|
|
||||||
|
warnings.filterwarnings("ignore", category=SyntaxWarning, module="pysbd")
|
||||||
|
|
||||||
|
|
||||||
|
def print_environment():
    """Print every environment variable as KEY=VALUE, sorted by key."""
    for name, value in sorted(os.environ.items()):
        print(f"{name}={value}")
|
||||||
|
|
||||||
|
|
||||||
|
def run_github_ingestion(applicant_1, applicant_2):
    """Ingest both applicants' GitHub activity via the GithubIngestion tool."""
    ingestion = GithubIngestion()
    ingestion.run(applicant_1=applicant_1, applicant_2=applicant_2)
|
||||||
|
|
||||||
|
|
||||||
|
def run_hiring_crew(applicants: dict, number_of_rounds: int = 1, llm_client=None):
    """Run the hiring crew for `number_of_rounds` rounds.

    From the second round onward, agent prompts are LLM-refined before
    kickoff. NOTE(review): `llm_client` is accepted but unused — confirm
    whether it was meant to be passed to refine_agent_configs.
    """
    for hiring_round in range(number_of_rounds):
        print(f"\nStarting hiring round {hiring_round + 1}...\n")
        crew = HiringCrew(inputs=applicants)

        if hiring_round > 0:
            print("Refining agent prompts for this round...")
            # Same three refinements, in the original order.
            for agent_key in (
                "soft_skills_expert_agent",
                "technical_expert_agent",
                "decision_maker_agent",
            ):
                crew.refine_agent_configs(agent_name=agent_key)

        crew.crew().kickoff()
|
||||||
|
|
||||||
|
|
||||||
|
def run(enable_ingestion=True, enable_crew=True):
    """Drive the demo: optionally ingest GitHub data, then run the hiring crew.

    Raises Exception wrapping any failure; the original exception is chained
    (`from e`) so the underlying traceback is preserved — previously it was
    discarded, leaving only the stringified message.
    """
    try:
        print_environment()

        applicants = {"applicant_1": "hajdul88", "applicant_2": "lxobr"}

        if enable_ingestion:
            run_github_ingestion(applicants["applicant_1"], applicants["applicant_2"])

        if enable_crew:
            run_hiring_crew(applicants=applicants, number_of_rounds=5)

    except Exception as e:
        raise Exception(f"An error occurred while running the process: {e}") from e
|
||||||
|
|
||||||
|
|
||||||
|
if __name__ == "__main__":
    # Toggle the two pipeline stages here when debugging locally.
    ingestion_enabled = True
    crew_enabled = True

    run(enable_ingestion=ingestion_enabled, enable_crew=crew_enabled)
|
||||||
|
|
@ -2,7 +2,7 @@ import inspect
|
||||||
from functools import wraps
|
from functools import wraps
|
||||||
from abc import abstractmethod, ABC
|
from abc import abstractmethod, ABC
|
||||||
from datetime import datetime, timezone
|
from datetime import datetime, timezone
|
||||||
from typing import Optional, Dict, Any, List, Tuple
|
from typing import Protocol, Optional, Dict, Any, List, Type, Tuple
|
||||||
from uuid import NAMESPACE_OID, UUID, uuid5
|
from uuid import NAMESPACE_OID, UUID, uuid5
|
||||||
from cognee.shared.logging_utils import get_logger
|
from cognee.shared.logging_utils import get_logger
|
||||||
from cognee.infrastructure.engine import DataPoint
|
from cognee.infrastructure.engine import DataPoint
|
||||||
|
|
@ -189,3 +189,6 @@ class GraphDBInterface(ABC):
|
||||||
) -> List[Tuple[NodeData, Dict[str, Any], NodeData]]:
|
) -> List[Tuple[NodeData, Dict[str, Any], NodeData]]:
|
||||||
"""Get all nodes connected to a given node with their relationships."""
|
"""Get all nodes connected to a given node with their relationships."""
|
||||||
raise NotImplementedError
|
raise NotImplementedError
|
||||||
|
|
||||||
|
    async def get_nodeset_subgraph(self, node_type, node_name):
        """Return the (nodes, edges) subgraph for nodes of `node_type` whose names are in `node_name`.

        Base implementation is a no-op placeholder; graph adapters override it.
        NOTE(review): the sibling interface method above raises
        NotImplementedError — consider matching that so a missing adapter
        override fails loudly instead of silently returning None.
        """
        pass
|
||||||
|
|
|
||||||
|
|
@ -5,7 +5,7 @@ import json
|
||||||
import os
|
import os
|
||||||
import shutil
|
import shutil
|
||||||
import asyncio
|
import asyncio
|
||||||
from typing import Dict, Any, List, Union, Optional, Tuple
|
from typing import Dict, Any, List, Union, Optional, Tuple, Type
|
||||||
from datetime import datetime, timezone
|
from datetime import datetime, timezone
|
||||||
from uuid import UUID
|
from uuid import UUID
|
||||||
from contextlib import asynccontextmanager
|
from contextlib import asynccontextmanager
|
||||||
|
|
@ -728,6 +728,66 @@ class KuzuAdapter(GraphDBInterface):
|
||||||
logger.error(f"Failed to get graph data: {e}")
|
logger.error(f"Failed to get graph data: {e}")
|
||||||
raise
|
raise
|
||||||
|
|
||||||
|
    async def get_nodeset_subgraph(
        self, node_type: Type[Any], node_name: List[str]
    ) -> Tuple[List[Tuple[str, dict]], List[Tuple[str, str, str, dict]]]:
        """Return the subgraph around nodes of `node_type` named in `node_name`.

        Collects the matching "primary" nodes, their one-hop neighbors, and
        every edge whose endpoints are both in that collected set.

        Returns:
            (nodes, edges) where nodes is a list of (id, properties_dict) and
            edges is a list of (from_id, to_id, relationship_name, properties_dict).
            Both lists are empty when no primary node matches.
        """
        # Kuzu stores everything under a single `Node` table; the Python class
        # name is matched against the stored `type` property, not a DB label.
        label = node_type.__name__
        primary_query = """
        UNWIND $names AS wantedName
        MATCH (n:Node)
        WHERE n.type = $label AND n.name = wantedName
        RETURN DISTINCT n.id
        """
        primary_rows = await self.query(primary_query, {"names": node_name, "label": label})
        primary_ids = [row[0] for row in primary_rows]
        if not primary_ids:
            return [], []

        # One-hop neighborhood of the primary nodes (undirected match).
        neighbor_query = """
        MATCH (n:Node)-[:EDGE]-(nbr:Node)
        WHERE n.id IN $ids
        RETURN DISTINCT nbr.id
        """
        nbr_rows = await self.query(neighbor_query, {"ids": primary_ids})
        neighbor_ids = [row[0] for row in nbr_rows]

        # Set union removes ids that are both primary and neighbor.
        all_ids = list({*primary_ids, *neighbor_ids})

        # Node payloads; `properties` is stored as a JSON string and merged
        # into the returned dict (bad JSON is logged and skipped).
        nodes_query = """
        MATCH (n:Node)
        WHERE n.id IN $ids
        RETURN n.id, n.name, n.type, n.properties
        """
        node_rows = await self.query(nodes_query, {"ids": all_ids})
        nodes: List[Tuple[str, dict]] = []
        for node_id, name, typ, props in node_rows:
            data = {"id": node_id, "name": name, "type": typ}
            if props:
                try:
                    data.update(json.loads(props))
                except json.JSONDecodeError:
                    logger.warning(f"Failed to parse JSON props for node {node_id}")
            nodes.append((node_id, data))

        # Edges with both endpoints inside the collected id set. The match is
        # undirected, so each relationship may appear in both orientations.
        edges_query = """
        MATCH (a:Node)-[r:EDGE]-(b:Node)
        WHERE a.id IN $ids AND b.id IN $ids
        RETURN a.id, b.id, r.relationship_name, r.properties
        """
        edge_rows = await self.query(edges_query, {"ids": all_ids})
        edges: List[Tuple[str, str, str, dict]] = []
        for from_id, to_id, rel_type, props in edge_rows:
            data = {}
            if props:
                try:
                    data = json.loads(props)
                except json.JSONDecodeError:
                    logger.warning(f"Failed to parse JSON props for edge {from_id}->{to_id}")

            edges.append((from_id, to_id, rel_type, data))

        return nodes, edges
|
||||||
|
|
||||||
async def get_filtered_graph_data(
|
async def get_filtered_graph_data(
|
||||||
self, attribute_filters: List[Dict[str, List[Union[str, int]]]]
|
self, attribute_filters: List[Dict[str, List[Union[str, int]]]]
|
||||||
):
|
):
|
||||||
|
|
|
||||||
|
|
@ -6,7 +6,7 @@ import json
|
||||||
from cognee.shared.logging_utils import get_logger, ERROR
|
from cognee.shared.logging_utils import get_logger, ERROR
|
||||||
import asyncio
|
import asyncio
|
||||||
from textwrap import dedent
|
from textwrap import dedent
|
||||||
from typing import Optional, Any, List, Dict
|
from typing import Optional, Any, List, Dict, Type, Tuple
|
||||||
from contextlib import asynccontextmanager
|
from contextlib import asynccontextmanager
|
||||||
from uuid import UUID
|
from uuid import UUID
|
||||||
from neo4j import AsyncSession
|
from neo4j import AsyncSession
|
||||||
|
|
@ -517,6 +517,58 @@ class Neo4jAdapter(GraphDBInterface):
|
||||||
|
|
||||||
return (nodes, edges)
|
return (nodes, edges)
|
||||||
|
|
||||||
|
    async def get_nodeset_subgraph(
        self, node_type: Type[Any], node_name: List[str]
    ) -> Tuple[List[Tuple[int, dict]], List[Tuple[int, int, str, dict]]]:
        """Return the subgraph around nodes labeled `node_type.__name__` with the given names.

        A single Cypher query gathers the matching primary nodes, their
        one-hop neighbors, and every relationship whose endpoints are both in
        that set.

        Returns:
            (nodes, edges) where nodes is a list of (id, properties_dict) and
            edges is (source_id, target_id, relationship_type, properties_dict).
            Empty lists when nothing matches.
        """
        # The class name is interpolated as a backticked node label.
        # NOTE(review): this is string interpolation into Cypher — safe only
        # while node_type comes from trusted code, not user input.
        label = node_type.__name__

        # `{{` / `}}` render as literal `{` / `}` map braces inside this f-string.
        query = f"""
        UNWIND $names AS wantedName
        MATCH (n:`{label}`)
        WHERE n.name = wantedName
        WITH collect(DISTINCT n) AS primary

        UNWIND primary AS p
        OPTIONAL MATCH (p)--(nbr)
        WITH primary, collect(DISTINCT nbr) AS nbrs
        WITH primary + nbrs AS nodelist

        UNWIND nodelist AS node
        WITH collect(DISTINCT node) AS nodes

        MATCH (a)-[r]-(b)
        WHERE a IN nodes AND b IN nodes
        WITH nodes, collect(DISTINCT r) AS rels

        RETURN
          [n IN nodes |
             {{ id: n.id,
                properties: properties(n) }}] AS rawNodes,
          [r IN rels |
             {{ type: type(r),
                properties: properties(r) }}] AS rawRels
        """

        result = await self.query(query, {"names": node_name})
        if not result:
            return [], []

        raw_nodes = result[0]["rawNodes"]
        raw_rels = result[0]["rawRels"]

        # Edge endpoints come from the relationship's own stored
        # source_node_id/target_node_id properties, not from Neo4j's internal ids.
        nodes = [(n["properties"]["id"], n["properties"]) for n in raw_nodes]
        edges = [
            (
                r["properties"]["source_node_id"],
                r["properties"]["target_node_id"],
                r["type"],
                r["properties"],
            )
            for r in raw_rels
        ]

        return nodes, edges
|
||||||
|
|
||||||
async def get_filtered_graph_data(self, attribute_filters):
|
async def get_filtered_graph_data(self, attribute_filters):
|
||||||
"""
|
"""
|
||||||
Fetches nodes and relationships filtered by specified attribute values.
|
Fetches nodes and relationships filtered by specified attribute values.
|
||||||
|
|
|
||||||
|
|
@ -250,14 +250,9 @@ class PGVectorAdapter(SQLAlchemyAdapter, VectorDBInterface):
|
||||||
if len(vector_list) == 0:
|
if len(vector_list) == 0:
|
||||||
return []
|
return []
|
||||||
|
|
||||||
# Normalize vector distance and add this as score information to vector_list
|
|
||||||
normalized_values = normalize_distances(vector_list)
|
|
||||||
for i in range(0, len(normalized_values)):
|
|
||||||
vector_list[i]["score"] = normalized_values[i]
|
|
||||||
|
|
||||||
# Create and return ScoredResult objects
|
# Create and return ScoredResult objects
|
||||||
return [
|
return [
|
||||||
ScoredResult(id=row.get("id"), payload=row.get("payload"), score=row.get("score"))
|
ScoredResult(id=row.get("id"), payload=row.get("payload"), score=row.get("_distance"))
|
||||||
for row in vector_list
|
for row in vector_list
|
||||||
]
|
]
|
||||||
|
|
||||||
|
|
|
||||||
|
|
@ -63,8 +63,7 @@ class OpenAIAdapter(LLMInterface):
|
||||||
messages=[
|
messages=[
|
||||||
{
|
{
|
||||||
"role": "user",
|
"role": "user",
|
||||||
"content": f"""Use the given format to
|
"content": f"""{text_input}""",
|
||||||
extract information from the following input: {text_input}. """,
|
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
"role": "system",
|
"role": "system",
|
||||||
|
|
@ -91,8 +90,7 @@ class OpenAIAdapter(LLMInterface):
|
||||||
messages=[
|
messages=[
|
||||||
{
|
{
|
||||||
"role": "user",
|
"role": "user",
|
||||||
"content": f"""Use the given format to
|
"content": f"""{text_input}""",
|
||||||
extract information from the following input: {text_input}. """,
|
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
"role": "system",
|
"role": "system",
|
||||||
|
|
|
||||||
|
|
@ -5,4 +5,3 @@ class NodeSet(DataPoint):
|
||||||
"""NodeSet data point."""
|
"""NodeSet data point."""
|
||||||
|
|
||||||
name: str
|
name: str
|
||||||
metadata: dict = {"index_fields": ["name"]}
|
|
||||||
|
|
|
||||||
|
|
@ -1,5 +1,5 @@
|
||||||
from cognee.shared.logging_utils import get_logger
|
from cognee.shared.logging_utils import get_logger
|
||||||
from typing import List, Dict, Union
|
from typing import List, Dict, Union, Optional, Type
|
||||||
|
|
||||||
from cognee.exceptions import InvalidValueError
|
from cognee.exceptions import InvalidValueError
|
||||||
from cognee.modules.graph.exceptions import EntityNotFoundError, EntityAlreadyExistsError
|
from cognee.modules.graph.exceptions import EntityNotFoundError, EntityAlreadyExistsError
|
||||||
|
|
@ -61,22 +61,27 @@ class CogneeGraph(CogneeAbstractGraph):
|
||||||
node_dimension=1,
|
node_dimension=1,
|
||||||
edge_dimension=1,
|
edge_dimension=1,
|
||||||
memory_fragment_filter=[],
|
memory_fragment_filter=[],
|
||||||
|
node_type: Optional[Type] = None,
|
||||||
|
node_name: List[Optional[str]] = None,
|
||||||
) -> None:
|
) -> None:
|
||||||
if node_dimension < 1 or edge_dimension < 1:
|
if node_dimension < 1 or edge_dimension < 1:
|
||||||
raise InvalidValueError(message="Dimensions must be positive integers")
|
raise InvalidValueError(message="Dimensions must be positive integers")
|
||||||
|
|
||||||
try:
|
try:
|
||||||
if len(memory_fragment_filter) == 0:
|
if node_type is not None and node_name is not None:
|
||||||
|
nodes_data, edges_data = await adapter.get_nodeset_subgraph(
|
||||||
|
node_type=node_type, node_name=node_name
|
||||||
|
)
|
||||||
|
elif len(memory_fragment_filter) == 0:
|
||||||
nodes_data, edges_data = await adapter.get_graph_data()
|
nodes_data, edges_data = await adapter.get_graph_data()
|
||||||
else:
|
else:
|
||||||
nodes_data, edges_data = await adapter.get_filtered_graph_data(
|
nodes_data, edges_data = await adapter.get_filtered_graph_data(
|
||||||
attribute_filters=memory_fragment_filter
|
attribute_filters=memory_fragment_filter
|
||||||
)
|
)
|
||||||
|
|
||||||
if not nodes_data:
|
if not nodes_data or not edges_data:
|
||||||
raise EntityNotFoundError(message="No node data retrieved from the database.")
|
logger.warning("Empty projected graph.")
|
||||||
if not edges_data:
|
return None
|
||||||
raise EntityNotFoundError(message="No edge data retrieved from the database.")
|
|
||||||
|
|
||||||
for node_id, properties in nodes_data:
|
for node_id, properties in nodes_data:
|
||||||
node_attributes = {key: properties.get(key) for key in node_properties_to_project}
|
node_attributes = {key: properties.get(key) for key in node_properties_to_project}
|
||||||
|
|
|
||||||
|
|
@ -144,6 +144,7 @@ def expand_with_nodes_and_edges(
|
||||||
is_a=type_node,
|
is_a=type_node,
|
||||||
description=node.description,
|
description=node.description,
|
||||||
ontology_valid=ontology_validated_source_ent,
|
ontology_valid=ontology_validated_source_ent,
|
||||||
|
belongs_to_set=data_chunk.belongs_to_set,
|
||||||
)
|
)
|
||||||
|
|
||||||
added_nodes_map[entity_node_key] = entity_node
|
added_nodes_map[entity_node_key] = entity_node
|
||||||
|
|
|
||||||
|
|
@ -24,3 +24,13 @@ class CypherSearchError(CogneeApiError):
|
||||||
|
|
||||||
class NoDataError(CriticalError):
|
class NoDataError(CriticalError):
|
||||||
message: str = "No data found in the system, please add data first."
|
message: str = "No data found in the system, please add data first."
|
||||||
|
|
||||||
|
|
||||||
|
class CollectionDistancesNotFoundError(CogneeApiError):
|
||||||
|
def __init__(
|
||||||
|
self,
|
||||||
|
message: str = "No collection distances found for the given query.",
|
||||||
|
name: str = "CollectionDistancesNotFoundError",
|
||||||
|
status_code: int = status.HTTP_404_NOT_FOUND,
|
||||||
|
):
|
||||||
|
super().__init__(message, name, status_code)
|
||||||
|
|
|
||||||
|
|
@ -1,4 +1,4 @@
|
||||||
from typing import Any, Optional
|
from typing import Any, Optional, Type, List
|
||||||
from collections import Counter
|
from collections import Counter
|
||||||
import string
|
import string
|
||||||
|
|
||||||
|
|
@ -8,6 +8,9 @@ from cognee.modules.retrieval.base_retriever import BaseRetriever
|
||||||
from cognee.modules.retrieval.utils.brute_force_triplet_search import brute_force_triplet_search
|
from cognee.modules.retrieval.utils.brute_force_triplet_search import brute_force_triplet_search
|
||||||
from cognee.modules.retrieval.utils.completion import generate_completion
|
from cognee.modules.retrieval.utils.completion import generate_completion
|
||||||
from cognee.modules.retrieval.utils.stop_words import DEFAULT_STOP_WORDS
|
from cognee.modules.retrieval.utils.stop_words import DEFAULT_STOP_WORDS
|
||||||
|
from cognee.shared.logging_utils import get_logger
|
||||||
|
|
||||||
|
logger = get_logger()
|
||||||
|
|
||||||
|
|
||||||
class GraphCompletionRetriever(BaseRetriever):
|
class GraphCompletionRetriever(BaseRetriever):
|
||||||
|
|
@ -18,11 +21,15 @@ class GraphCompletionRetriever(BaseRetriever):
|
||||||
user_prompt_path: str = "graph_context_for_question.txt",
|
user_prompt_path: str = "graph_context_for_question.txt",
|
||||||
system_prompt_path: str = "answer_simple_question.txt",
|
system_prompt_path: str = "answer_simple_question.txt",
|
||||||
top_k: Optional[int] = 5,
|
top_k: Optional[int] = 5,
|
||||||
|
node_type: Optional[Type] = None,
|
||||||
|
node_name: List[Optional[str]] = None,
|
||||||
):
|
):
|
||||||
"""Initialize retriever with prompt paths and search parameters."""
|
"""Initialize retriever with prompt paths and search parameters."""
|
||||||
self.user_prompt_path = user_prompt_path
|
self.user_prompt_path = user_prompt_path
|
||||||
self.system_prompt_path = system_prompt_path
|
self.system_prompt_path = system_prompt_path
|
||||||
self.top_k = top_k if top_k is not None else 5
|
self.top_k = top_k if top_k is not None else 5
|
||||||
|
self.node_type = node_type
|
||||||
|
self.node_name = node_name
|
||||||
|
|
||||||
def _get_nodes(self, retrieved_edges: list) -> dict:
|
def _get_nodes(self, retrieved_edges: list) -> dict:
|
||||||
"""Creates a dictionary of nodes with their names and content."""
|
"""Creates a dictionary of nodes with their names and content."""
|
||||||
|
|
@ -68,7 +75,11 @@ class GraphCompletionRetriever(BaseRetriever):
|
||||||
vector_index_collections.append(f"{subclass.__name__}_{field_name}")
|
vector_index_collections.append(f"{subclass.__name__}_{field_name}")
|
||||||
|
|
||||||
found_triplets = await brute_force_triplet_search(
|
found_triplets = await brute_force_triplet_search(
|
||||||
query, top_k=self.top_k, collections=vector_index_collections or None
|
query,
|
||||||
|
top_k=self.top_k,
|
||||||
|
collections=vector_index_collections or None,
|
||||||
|
node_type=self.node_type,
|
||||||
|
node_name=self.node_name,
|
||||||
)
|
)
|
||||||
|
|
||||||
return found_triplets
|
return found_triplets
|
||||||
|
|
@ -78,6 +89,7 @@ class GraphCompletionRetriever(BaseRetriever):
|
||||||
triplets = await self.get_triplets(query)
|
triplets = await self.get_triplets(query)
|
||||||
|
|
||||||
if len(triplets) == 0:
|
if len(triplets) == 0:
|
||||||
|
logger.warning("Empty context was provided to the completion")
|
||||||
return ""
|
return ""
|
||||||
|
|
||||||
return await self.resolve_edges_to_text(triplets)
|
return await self.resolve_edges_to_text(triplets)
|
||||||
|
|
|
||||||
|
|
@ -1,5 +1,5 @@
|
||||||
import asyncio
|
import asyncio
|
||||||
from typing import List, Optional
|
from typing import List, Optional, Type
|
||||||
|
|
||||||
from cognee.shared.logging_utils import get_logger, ERROR
|
from cognee.shared.logging_utils import get_logger, ERROR
|
||||||
from cognee.modules.graph.exceptions.exceptions import EntityNotFoundError
|
from cognee.modules.graph.exceptions.exceptions import EntityNotFoundError
|
||||||
|
|
@ -55,6 +55,8 @@ def format_triplets(edges):
|
||||||
|
|
||||||
async def get_memory_fragment(
|
async def get_memory_fragment(
|
||||||
properties_to_project: Optional[List[str]] = None,
|
properties_to_project: Optional[List[str]] = None,
|
||||||
|
node_type: Optional[Type] = None,
|
||||||
|
node_name: List[Optional[str]] = None,
|
||||||
) -> CogneeGraph:
|
) -> CogneeGraph:
|
||||||
"""Creates and initializes a CogneeGraph memory fragment with optional property projections."""
|
"""Creates and initializes a CogneeGraph memory fragment with optional property projections."""
|
||||||
graph_engine = await get_graph_engine()
|
graph_engine = await get_graph_engine()
|
||||||
|
|
@ -68,6 +70,8 @@ async def get_memory_fragment(
|
||||||
graph_engine,
|
graph_engine,
|
||||||
node_properties_to_project=properties_to_project,
|
node_properties_to_project=properties_to_project,
|
||||||
edge_properties_to_project=["relationship_name"],
|
edge_properties_to_project=["relationship_name"],
|
||||||
|
node_type=node_type,
|
||||||
|
node_name=node_name,
|
||||||
)
|
)
|
||||||
except EntityNotFoundError:
|
except EntityNotFoundError:
|
||||||
pass
|
pass
|
||||||
|
|
@ -82,6 +86,8 @@ async def brute_force_triplet_search(
|
||||||
collections: List[str] = None,
|
collections: List[str] = None,
|
||||||
properties_to_project: List[str] = None,
|
properties_to_project: List[str] = None,
|
||||||
memory_fragment: Optional[CogneeGraph] = None,
|
memory_fragment: Optional[CogneeGraph] = None,
|
||||||
|
node_type: Optional[Type] = None,
|
||||||
|
node_name: List[Optional[str]] = None,
|
||||||
) -> list:
|
) -> list:
|
||||||
if user is None:
|
if user is None:
|
||||||
user = await get_default_user()
|
user = await get_default_user()
|
||||||
|
|
@ -93,6 +99,8 @@ async def brute_force_triplet_search(
|
||||||
collections=collections,
|
collections=collections,
|
||||||
properties_to_project=properties_to_project,
|
properties_to_project=properties_to_project,
|
||||||
memory_fragment=memory_fragment,
|
memory_fragment=memory_fragment,
|
||||||
|
node_type=node_type,
|
||||||
|
node_name=node_name,
|
||||||
)
|
)
|
||||||
return retrieved_results
|
return retrieved_results
|
||||||
|
|
||||||
|
|
@ -104,6 +112,8 @@ async def brute_force_search(
|
||||||
collections: List[str] = None,
|
collections: List[str] = None,
|
||||||
properties_to_project: List[str] = None,
|
properties_to_project: List[str] = None,
|
||||||
memory_fragment: Optional[CogneeGraph] = None,
|
memory_fragment: Optional[CogneeGraph] = None,
|
||||||
|
node_type: Optional[Type] = None,
|
||||||
|
node_name: List[Optional[str]] = None,
|
||||||
) -> list:
|
) -> list:
|
||||||
"""
|
"""
|
||||||
Performs a brute force search to retrieve the top triplets from the graph.
|
Performs a brute force search to retrieve the top triplets from the graph.
|
||||||
|
|
@ -115,6 +125,8 @@ async def brute_force_search(
|
||||||
collections (Optional[List[str]]): List of collections to query.
|
collections (Optional[List[str]]): List of collections to query.
|
||||||
properties_to_project (Optional[List[str]]): List of properties to project.
|
properties_to_project (Optional[List[str]]): List of properties to project.
|
||||||
memory_fragment (Optional[CogneeGraph]): Existing memory fragment to reuse.
|
memory_fragment (Optional[CogneeGraph]): Existing memory fragment to reuse.
|
||||||
|
node_type: node type to filter
|
||||||
|
node_name: node name to filter
|
||||||
|
|
||||||
Returns:
|
Returns:
|
||||||
list: The top triplet results.
|
list: The top triplet results.
|
||||||
|
|
@ -125,7 +137,9 @@ async def brute_force_search(
|
||||||
raise ValueError("top_k must be a positive integer.")
|
raise ValueError("top_k must be a positive integer.")
|
||||||
|
|
||||||
if memory_fragment is None:
|
if memory_fragment is None:
|
||||||
memory_fragment = await get_memory_fragment(properties_to_project)
|
memory_fragment = await get_memory_fragment(
|
||||||
|
properties_to_project=properties_to_project, node_type=node_type, node_name=node_name
|
||||||
|
)
|
||||||
|
|
||||||
if collections is None:
|
if collections is None:
|
||||||
collections = [
|
collections = [
|
||||||
|
|
|
||||||
|
|
@ -1,5 +1,5 @@
|
||||||
import json
|
import json
|
||||||
from typing import Callable
|
from typing import Callable, Optional, Type, List
|
||||||
|
|
||||||
from cognee.exceptions import InvalidValueError
|
from cognee.exceptions import InvalidValueError
|
||||||
from cognee.infrastructure.engine.utils import parse_id
|
from cognee.infrastructure.engine.utils import parse_id
|
||||||
|
|
@ -11,6 +11,7 @@ from cognee.modules.retrieval.graph_completion_retriever import GraphCompletionR
|
||||||
from cognee.modules.retrieval.graph_summary_completion_retriever import (
|
from cognee.modules.retrieval.graph_summary_completion_retriever import (
|
||||||
GraphSummaryCompletionRetriever,
|
GraphSummaryCompletionRetriever,
|
||||||
)
|
)
|
||||||
|
from cognee.infrastructure.engine.models.DataPoint import DataPoint
|
||||||
from cognee.modules.retrieval.code_retriever import CodeRetriever
|
from cognee.modules.retrieval.code_retriever import CodeRetriever
|
||||||
from cognee.modules.retrieval.cypher_search_retriever import CypherSearchRetriever
|
from cognee.modules.retrieval.cypher_search_retriever import CypherSearchRetriever
|
||||||
from cognee.modules.retrieval.natural_language_retriever import NaturalLanguageRetriever
|
from cognee.modules.retrieval.natural_language_retriever import NaturalLanguageRetriever
|
||||||
|
|
@ -29,12 +30,20 @@ async def search(
|
||||||
user: User,
|
user: User,
|
||||||
system_prompt_path="answer_simple_question.txt",
|
system_prompt_path="answer_simple_question.txt",
|
||||||
top_k: int = 10,
|
top_k: int = 10,
|
||||||
|
node_type: Optional[Type] = None,
|
||||||
|
node_name: List[Optional[str]] = None,
|
||||||
):
|
):
|
||||||
query = await log_query(query_text, query_type.value, user.id)
|
query = await log_query(query_text, query_type.value, user.id)
|
||||||
|
|
||||||
own_document_ids = await get_document_ids_for_user(user.id, datasets)
|
own_document_ids = await get_document_ids_for_user(user.id, datasets)
|
||||||
search_results = await specific_search(
|
search_results = await specific_search(
|
||||||
query_type, query_text, user, system_prompt_path=system_prompt_path, top_k=top_k
|
query_type,
|
||||||
|
query_text,
|
||||||
|
user,
|
||||||
|
system_prompt_path=system_prompt_path,
|
||||||
|
top_k=top_k,
|
||||||
|
node_type=node_type,
|
||||||
|
node_name=node_name,
|
||||||
)
|
)
|
||||||
|
|
||||||
filtered_search_results = []
|
filtered_search_results = []
|
||||||
|
|
@ -57,6 +66,8 @@ async def specific_search(
|
||||||
user: User,
|
user: User,
|
||||||
system_prompt_path="answer_simple_question.txt",
|
system_prompt_path="answer_simple_question.txt",
|
||||||
top_k: int = 10,
|
top_k: int = 10,
|
||||||
|
node_type: Optional[Type] = None,
|
||||||
|
node_name: List[Optional[str]] = None,
|
||||||
) -> list:
|
) -> list:
|
||||||
search_tasks: dict[SearchType, Callable] = {
|
search_tasks: dict[SearchType, Callable] = {
|
||||||
SearchType.SUMMARIES: SummariesRetriever(top_k=top_k).get_completion,
|
SearchType.SUMMARIES: SummariesRetriever(top_k=top_k).get_completion,
|
||||||
|
|
@ -69,6 +80,8 @@ async def specific_search(
|
||||||
SearchType.GRAPH_COMPLETION: GraphCompletionRetriever(
|
SearchType.GRAPH_COMPLETION: GraphCompletionRetriever(
|
||||||
system_prompt_path=system_prompt_path,
|
system_prompt_path=system_prompt_path,
|
||||||
top_k=top_k,
|
top_k=top_k,
|
||||||
|
node_type=node_type,
|
||||||
|
node_name=node_name,
|
||||||
).get_completion,
|
).get_completion,
|
||||||
SearchType.GRAPH_SUMMARY_COMPLETION: GraphSummaryCompletionRetriever(
|
SearchType.GRAPH_SUMMARY_COMPLETION: GraphSummaryCompletionRetriever(
|
||||||
system_prompt_path=system_prompt_path, top_k=top_k
|
system_prompt_path=system_prompt_path, top_k=top_k
|
||||||
|
|
|
||||||
Loading…
Add table
Reference in a new issue