Compare commits
10 commits
main...feature/co
| Author | SHA1 | Date |
|---|---|---|
|  | 1ab57707cc |  |
|  | 1fc381e51b |  |
|  | dca58ff97b |  |
|  | 199e997f93 |  |
|  | a22759e260 |  |
|  | d901f0a43a |  |
|  | ba0ad38863 |  |
|  | fa5c0b8e75 |  |
|  | 987b03b895 |  |
|  | e69ab1fe1d |  |
19 changed files with 510 additions and 24 deletions
@@ -0,0 +1,6 @@
You are an expert in relationship identification and knowledge graph building, focusing on relationships. Your task is to perform a detailed extraction of relationship names from the text.
• Extract all relationship names from explicit phrases, verbs, and implied context that could help form edge triplets.
• Use the potential nodes and reassign them to relationship names if they correspond to a relation, verb, action, or similar.
• Ensure completeness by working in multiple rounds, capturing overlooked connections and refining the nodes list.
• Focus on meaningful entities and relationships, whether directly stated or implied.
• Return two lists: refined nodes and potential relationship names (for forming edges).
@@ -0,0 +1,15 @@
Analyze the following text to identify relationships between entities in the knowledge graph.
Build upon previously extracted edges, ensuring completeness and consistency.
Return all the previously extracted edges **together** with the new ones that you extracted.
This is round {{ round_number }} of {{ total_rounds }}.

**Text:**
{{ text }}

**Previously Extracted Nodes:**
{{ nodes }}

**Relationships Identified in Previous Rounds:**
{{ relationships }}

Extract both explicit and implicit relationships between the nodes, building upon previous findings while ensuring completeness and consistency.
@@ -0,0 +1,22 @@
You are a top-tier edge-extraction algorithm. Every user prompt will contain two clearly marked sections:

<TEXT>
<the source text to analyze>
</TEXT>

and

<ENTITIES>
<Entities with their id, name and description>
</ENTITIES>

# 1. Reference Provided Entities
- Only extract edges between the IDs listed under <ENTITIES>.
- Do not invent new nodes: every edge's subject and object must match one of the provided IDs.

# 2. Relation Identification
- Inspect the TEXT to find explicit or implicit relationships between the provided entities.
- Use snake_case for relation names (e.g. works_for, located_in, married_to).
- Only create an edge when the text clearly signals a connection.
- The two endpoints of an edge cannot be the same entity.
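The constraints in rule 2 are mechanical enough to check in code. A minimal sketch (the function name and regex are illustrative, not part of the prompt files):

```python
import re

# Hypothetical validator mirroring the prompt's rules: snake_case relation
# names, and no self-loops (source and target must differ).
SNAKE_CASE = re.compile(r"^[a-z]+(_[a-z]+)*$")


def is_valid_edge(source_id: str, relation: str, target_id: str) -> bool:
    if source_id == target_id:  # rule: endpoints cannot be the same entity
        return False
    return bool(SNAKE_CASE.match(relation))  # rule: snake_case relation names
```

Such a check could run on the structured output as a cheap post-filter before edges enter the graph.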
@@ -0,0 +1,7 @@
<TEXT>
`{{text}}`
</TEXT>

<ENTITIES>
`{{final_nodes}}`
</ENTITIES>
@@ -0,0 +1,41 @@
You are a top-tier algorithm designed for extracting information in structured formats to build a knowledge graph.
**Nodes** represent entities and concepts. They're akin to Wikipedia nodes.
**Edges** represent relationships between concepts. They're akin to Wikipedia links.

You get the text and an already identified knowledge graph (which can be empty) in the following format:

<TEXT>
<text to extract the graph from>
</TEXT>

and

<KNOWLEDGEGRAPH>
'nodes': <list of nodes>
'edges': <list of edges>
</KNOWLEDGEGRAPH>

Your task is to extract additional nodes and edges and return the new knowledge graph including the already identified nodes and edges.

The aim is to achieve simplicity and clarity in the knowledge graph.
# 1. Labeling Nodes
**Consistency**: Ensure you use basic or elementary types for node labels.
- For example, when you identify an entity representing a person, always label it as **"Person"**.
- Avoid more specific terms like "Mathematician" or "Scientist"; keep those as a "profession" property.
- Don't use overly generic terms like "Entity".
**Node IDs**: Never use integers as node IDs.
- Node IDs should be names or human-readable identifiers found in the text.
# 2. Handling Numerical Data and Dates
- For example, when you identify an entity representing a date, make sure it has type **"Date"**.
- Extract the date in the format "YYYY-MM-DD".
- If it is not possible to extract the whole date, extract the month or year, or both if available.
- **Property Format**: Properties must be in a key-value format.
- **Quotation Marks**: Never use escaped single or double quotes within property values.
- **Naming Convention**: Use snake_case for relationship names, e.g., `acted_in`.
# 3. Coreference Resolution
- **Maintain Entity Consistency**: When extracting entities, it's vital to ensure consistency.
If an entity, such as "John Doe", is mentioned multiple times in the text but is referred to by different names or pronouns (e.g., "Joe", "he"),
always use the most complete identifier for that entity throughout the knowledge graph. In this example, use "John Doe" as the Person's ID.
Remember, the knowledge graph should be coherent and easily understandable, so maintaining consistency in entity references is crucial.
# 4. Strict Compliance
Adhere to the rules strictly. Non-compliance will result in termination.
@@ -0,0 +1,9 @@
<TEXT>
`{{ text }}`
</TEXT>

and

<KNOWLEDGEGRAPH>
`{{ graph }}`
</KNOWLEDGEGRAPH>
@@ -0,0 +1,15 @@
You are an assistant who *merges duplicate entities and their types* in a knowledge graph.

You will receive the list of extracted entities from a text.
Some of these refer to the same real-world entity but differ only in casing, minor typos, or partial information (for example, `"John Doe"` vs `"john_doe"` vs `"John_Doe"`).
There can also be synonyms present in the list.
Entities are duplicates only if they represent the same concept or object, or they are synonyms of each other.

**Task**
- Detect duplicates.
- Deduplicate them, creating the final list of entities in which there are no duplicates anymore.
- Merge type information among the entities. It is not allowed to have duplicated entity types.
- Each type must be singular (for example, "skill" instead of "skills"). Also merge synonyms in the case of types.
- Map synonym entity types to the most general type, so that multiple formats of the same type are reduced in the global knowledge graph.
- Filter out entities that represent more than one real-world concept (for example: car, motorbike).
- Return the final list of nodes.
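The "same entity up to casing, underscores, or minor variation" criterion can be sketched as a normalization key (illustrative only; real synonym detection is left to the LLM in this prompt):

```python
def canonical_key(name: str) -> str:
    # Collapse casing and underscore/space variants so "John Doe",
    # "john_doe" and "John_Doe" all map to the same key.
    return name.lower().replace("_", " ").strip()


def group_duplicates(names: list[str]) -> dict[str, list[str]]:
    # Bucket entity names by their canonical key; each bucket is one
    # candidate merge group for the dedup prompt.
    groups: dict[str, list[str]] = {}
    for name in names:
        groups.setdefault(canonical_key(name), []).append(name)
    return groups
```

Keys like this catch mechanical duplicates cheaply; the LLM pass then handles typos and synonyms that no string rule covers.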
@@ -0,0 +1,4 @@

<ENTITIES>
`{{nodes_to_deduplicate}}`
</ENTITIES>
26
cognee/infrastructure/llm/prompts/node_extraction_prompt.txt
Normal file
@@ -0,0 +1,26 @@
You are a top-tier algorithm designed for extracting information in structured formats to build a knowledge graph.
**Nodes** represent entities and concepts. They're akin to Wikipedia nodes.

The aim is to achieve simplicity and clarity in the knowledge graph.
# 1. Labeling Nodes
**Consistency**: Ensure you use basic or elementary types for node labels.
- For example, when you identify an entity representing a person, always label it as **"Person"**.
- Avoid more specific terms like "Mathematician" or "Scientist"; keep those as a "profession" property.
- Don't use overly generic terms like "Entity".
**Node IDs**: Never use integers as node IDs.
- Node IDs should be names or human-readable identifiers found in the text.
# 2. Handling Numerical Data and Dates
- For example, when you identify an entity representing a date, make sure it has type **"Date"**.
- Allowed formats are "YYYY", "YYYY-MM" or "YYYY-MM-DD". Extract each date in the format in which it appears in the text, and extract each date only once.
- If a date in the text represents a period, extract the start and end dates of the period separately.
- If it is not possible to extract the whole date, extract the month or year, or both if available.
- **Property Format**: Properties must be in a key-value format.
- **Quotation Marks**: Never use escaped single or double quotes within property values.
- **Naming Convention**: Use snake_case for relationship names, e.g., `acted_in`.
# 3. Coreference Resolution
- **Maintain Entity Consistency**: When extracting entities, it's vital to ensure consistency.
If an entity, such as "John Doe", is mentioned multiple times in the text but is referred to by different names or pronouns (e.g., "Joe", "he"),
always use the most complete identifier for that entity throughout the knowledge graph. In this example, use "John Doe" as the Person's ID.
Remember, the knowledge graph should be coherent and easily understandable, so maintaining consistency in entity references is crucial.
# 4. Strict Compliance
Adhere to the rules strictly. Non-compliance will result in termination.
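The three allowed date shapes in rule 2 reduce to a small check (a sketch; the regex and function name are mine, and it validates shape only, not calendar validity):

```python
import re

# Matches exactly the three allowed shapes: "YYYY", "YYYY-MM", "YYYY-MM-DD".
DATE_SHAPES = re.compile(r"^\d{4}(-\d{2}(-\d{2})?)?$")


def is_allowed_date_format(value: str) -> bool:
    return bool(DATE_SHAPES.match(value))
```

A validator like this could flag Date-typed nodes whose IDs drift outside the prompt's contract before they reach the graph store.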
@@ -0,0 +1,9 @@
You are an expert in entity extraction and knowledge graph building, focusing on node identification.
Your task is to perform a detailed entity and concept extraction from text to generate a list of potential nodes for a knowledge graph.
• Node IDs should be names or human-readable identifiers found in the text.
• Extract clear, distinct entities and concepts as individual strings.
• Be exhaustive; ensure completeness by capturing all entities, names, nouns, noun parts, and implied or implicit mentions.
• Also extract potential entity-type nodes, whether directly mentioned or implied.
• Avoid duplicates and overly generic terms.
• Consider different perspectives and indirect references.
• Return only a list of unique node strings with all the entities.
@@ -0,0 +1,10 @@
Extract distinct entities and concepts from the following text to expand the knowledge graph.
Build upon previously extracted entities, ensuring completeness and consistency.
Return all the previously extracted entities **together** with the new ones that you extracted.
This is round {{ round_number }} of {{ total_rounds }}.

**Text:**
{{ text }}

**Previously Extracted Entities:**
{{ nodes }}
@@ -1,9 +1,17 @@
import os
import asyncio
import json
from typing import Type, List, Tuple, Set

from pydantic import BaseModel

from cognee.infrastructure.llm.get_llm_client import get_llm_client
from cognee.infrastructure.llm.prompts import render_prompt
from cognee.infrastructure.llm.config import get_llm_config
from cognee.shared.data_models import KnowledgeGraph, NodeList, EdgeList, Node, Edge


async def extract_content_graph(content: str, response_model: Type[BaseModel]):
@@ -21,10 +29,124 @@ async def extract_content_graph(content: str, response_model: Type[BaseModel]):
    else:
        base_directory = None

    system_prompt_graph = render_prompt(prompt_path, {}, base_directory=base_directory)

    content_graph = await llm_client.acreate_structured_output(
        content, system_prompt_graph, response_model
    )

    return content_graph


def dedupe_and_normalize_nodes(nodes: List[Node]) -> List[Node]:
    """Lower-case node names/types, replace underscores with spaces, and drop duplicates."""
    seen: Set[Tuple[str, str]] = set()
    out: List[Node] = []

    for node in nodes:
        # Normalize casing and underscores so "John_Doe" and "john doe" collide.
        node.name = node.name.lower().replace("_", " ")
        node.type = node.type.lower().replace("_", " ")

        key = (node.name, node.type)
        if key not in seen:
            seen.add(key)
            out.append(node)

    return out
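To see what the normalization buys, here is a self-contained run with a dataclass stand-in for the pydantic `Node` model (only `name` and `type` matter to the dedupe key; `FakeNode` and `dedupe_and_normalize` are illustrative names, not the repo's):

```python
from dataclasses import dataclass
from typing import List, Set, Tuple


@dataclass
class FakeNode:  # stand-in for cognee's Node; the real model is pydantic
    name: str
    type: str


def dedupe_and_normalize(nodes: List[FakeNode]) -> List[FakeNode]:
    # Same logic as dedupe_and_normalize_nodes above: lower-case,
    # underscores -> spaces, then keep the first of each (name, type) pair.
    seen: Set[Tuple[str, str]] = set()
    out: List[FakeNode] = []
    for node in nodes:
        node.name = node.name.lower().replace("_", " ")
        node.type = node.type.lower().replace("_", " ")
        key = (node.name, node.type)
        if key not in seen:
            seen.add(key)
            out.append(node)
    return out
```

With this, `FakeNode("John_Doe", "Person")` and `FakeNode("john doe", "person")` collapse to a single entry before the LLM merge prompt ever runs.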
def dedupe_and_normalize_edges(edges: List[Edge]) -> List[Edge]:
    """Lower-case relationship names and drop duplicate (source, relation, target) triples."""
    seen: Set[Tuple[str, str, str]] = set()
    out: List[Edge] = []

    for edge in edges:
        edge.relationship_name = edge.relationship_name.lower()

        key = (edge.source_node_id, edge.relationship_name, edge.target_node_id)
        if key not in seen:
            seen.add(key)
            out.append(edge)

    return out


async def extract_content_graph2(
    content: str, response_model: Type[BaseModel], node_rounds: int = 1, edge_rounds: int = 1
):
    llm_client = get_llm_client()

    ###### NODE EXTRACTION
    node_prompt_path = "node_extraction_prompt.txt"

    node_system = render_prompt(node_prompt_path, {})

    node_tasks = [
        llm_client.acreate_structured_output(content, node_system, NodeList)
        for _ in range(node_rounds)
    ]

    node_results = await asyncio.gather(*node_tasks)

    all_nodes: List[Node] = [node for nl in node_results for node in nl.nodes]

    ###### NODE DEDUPLICATION
    all_nodes = dedupe_and_normalize_nodes(all_nodes)

    all_nodes_merged = {
        "nodes_to_deduplicate": json.dumps([n.model_dump() for n in all_nodes], ensure_ascii=False)
    }

    merge_system_prompt = "merge_nodes_system_prompt.txt"
    merge_user_prompt = "merge_nodes_user_prompt.txt"

    merge_system = render_prompt(filename=merge_system_prompt, context={})
    merge_user = render_prompt(filename=merge_user_prompt, context=all_nodes_merged)

    final_nodes_list = await llm_client.acreate_structured_output(
        text_input=merge_user, system_prompt=merge_system, response_model=NodeList
    )

    ###### EDGE EXTRACTION
    edge_system_prompt = "edge_extraction_system_prompt.txt"
    edge_user_prompt = "edge_extraction_user_prompt.txt"

    edge_system = render_prompt(edge_system_prompt, {})
    nodes_for_edge_extraction = {
        "final_nodes": json.dumps(
            [n.model_dump() for n in final_nodes_list.nodes], ensure_ascii=False
        ),
        "text": content,
    }

    edge_user = render_prompt(edge_user_prompt, context=nodes_for_edge_extraction)

    edge_tasks = [
        llm_client.acreate_structured_output(
            text_input=edge_user, system_prompt=edge_system, response_model=EdgeList
        )
        for _ in range(edge_rounds)
    ]

    edge_results = await asyncio.gather(*edge_tasks)

    all_edges: List[Edge] = [edge for el in edge_results for edge in el.edges]

    ###### EDGE DEDUPLICATION
    all_edges = dedupe_and_normalize_edges(all_edges)

    all_edges_merged = {
        "edges_to_deduplicate": json.dumps([e.model_dump() for e in all_edges], ensure_ascii=False)
    }

    merge_system_prompt = "merge_edges_system_prompt.txt"
    merge_user_prompt = "merge_edges_user_prompt.txt"

    merge_system = render_prompt(filename=merge_system_prompt, context={})
    merge_user = render_prompt(filename=merge_user_prompt, context=all_edges_merged)

    final_edges_list = await llm_client.acreate_structured_output(
        text_input=merge_user, system_prompt=merge_system, response_model=EdgeList
    )

    return KnowledgeGraph(nodes=final_nodes_list.nodes, edges=final_edges_list.edges)
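The round structure here (N identical extraction calls fanned out with `asyncio.gather`, then one dedupe/merge step) can be exercised with a stub in place of the LLM client (all names below are illustrative):

```python
import asyncio
from typing import List


async def fake_extract_round(content: str, round_id: int) -> List[str]:
    # Stand-in for acreate_structured_output: each round may return an
    # overlapping subset of entities.
    await asyncio.sleep(0)  # yield control, as a real network call would
    return ["alice", "acme"] if round_id % 2 == 0 else ["acme", "bob"]


async def fan_out_and_merge(content: str, rounds: int) -> List[str]:
    results = await asyncio.gather(
        *[fake_extract_round(content, i) for i in range(rounds)]
    )
    # Flatten and dedupe while preserving first-seen order, like the
    # node-deduplication step before the LLM merge prompt.
    seen, merged = set(), []
    for batch in results:
        for name in batch:
            if name not in seen:
                seen.add(name)
                merged.append(name)
    return merged
```

The fan-out trades extra tokens for recall: each round is an independent sample, and the union plus dedupe recovers entities any single call might miss.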
@@ -0,0 +1,46 @@
import json
from typing import Type

from pydantic import BaseModel

from cognee.infrastructure.llm.get_llm_client import get_llm_client
from cognee.infrastructure.llm.prompts import render_prompt
from cognee.shared.data_models import KnowledgeGraph


async def extract_content_graph_sequential(
    content: str, response_model: Type[BaseModel], graph_extraction_rounds: int = 2
):
    llm_client = get_llm_client()

    graph_system_prompt_path = "generate_graph_prompt_sequential.txt"
    graph_user_prompt_path = "generate_graph_prompt_sequential_user.txt"
    graph_system = render_prompt(graph_system_prompt_path, {})

    current_nodes = []
    current_edges = []

    knowledge_graph = KnowledgeGraph(nodes=[], edges=[])

    for round_idx in range(graph_extraction_rounds):
        nodes_json = json.dumps([n.model_dump() for n in current_nodes], ensure_ascii=False)
        edges_json = json.dumps([e.model_dump() for e in current_edges], ensure_ascii=False)

        graph_user = render_prompt(
            graph_user_prompt_path,  # TODO: this could use some formatting due to HTML escape codes (e.g. &#34;).
            {
                "text": content,
                "graph": f"nodes: {nodes_json}, edges: {edges_json}",
            },
        )

        knowledge_graph = await llm_client.acreate_structured_output(
            text_input=graph_user,
            system_prompt=graph_system,
            response_model=response_model,
        )

        current_nodes = knowledge_graph.nodes
        current_edges = knowledge_graph.edges

    return knowledge_graph
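The sequential variant's accumulate-and-refeed loop reduces to a simple pattern: each round's output becomes the next round's input, so the graph can only grow or be refined. A stubbed sketch (the stand-in "LLM" and its behavior are invented for illustration):

```python
from typing import List


def fake_llm_round(text: str, previous: List[str]) -> List[str]:
    # Stand-in for one structured-output call: the model is asked to return
    # everything it was given plus anything new it finds.
    pool = ["alice", "acme", "bob"]
    found = pool[: len(previous) + 1]  # each round "notices" one more item
    return sorted(set(previous) | set(found))


def refine_over_rounds(text: str, rounds: int) -> List[str]:
    current: List[str] = []
    for _ in range(rounds):
        # Output of one round becomes the input of the next.
        current = fake_llm_round(text, current)
    return current
```

Compared with the parallel fan-out, this spends the same budget on depth rather than breadth: later rounds see earlier findings and can build on them.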
@@ -0,0 +1,81 @@
import asyncio
import json
from typing import List, Tuple, Set

from cognee.infrastructure.llm.get_llm_client import get_llm_client
from cognee.infrastructure.llm.prompts import render_prompt
from cognee.shared.data_models import KnowledgeGraph, NodeList, EdgeList, Node, Edge


def dedupe_and_normalize_nodes(nodes: List[Node]) -> List[Node]:
    """Lower-case node names/types, replace underscores with spaces, and drop duplicates."""
    seen: Set[Tuple[str, str]] = set()
    out: List[Node] = []

    for node in nodes:
        # Normalize casing and underscores so "John_Doe" and "john doe" collide.
        node.name = node.name.lower().replace("_", " ")
        node.type = node.type.lower().replace("_", " ")

        key = (node.name, node.type)
        if key not in seen:
            seen.add(key)
            out.append(node)

    return out


async def extract_content_node_edge_multi_parallel(content: str, node_rounds: int = 1):
    llm_client = get_llm_client()

    ###### NODE EXTRACTION
    node_prompt_path = "node_extraction_prompt.txt"

    node_system = render_prompt(node_prompt_path, {})

    node_tasks = [
        llm_client.acreate_structured_output(content, node_system, NodeList)
        for _ in range(node_rounds)
    ]

    node_results = await asyncio.gather(*node_tasks)

    all_nodes: List[Node] = [node for nl in node_results for node in nl.nodes]

    ###### NODE DEDUPLICATION
    all_nodes = dedupe_and_normalize_nodes(all_nodes)

    all_nodes_merged = {
        "nodes_to_deduplicate": json.dumps([n.model_dump() for n in all_nodes], ensure_ascii=False)
    }

    merge_system_prompt = "merge_nodes_system_prompt.txt"
    merge_user_prompt = "merge_nodes_user_prompt.txt"

    merge_system = render_prompt(filename=merge_system_prompt, context={})
    merge_user = render_prompt(filename=merge_user_prompt, context=all_nodes_merged)

    final_nodes_list = await llm_client.acreate_structured_output(
        text_input=merge_user, system_prompt=merge_system, response_model=NodeList
    )

    ###### EDGE EXTRACTION
    edge_system_prompt = "edge_extraction_system_prompt.txt"
    edge_user_prompt = "edge_extraction_user_prompt.txt"

    edge_system = render_prompt(edge_system_prompt, {})
    nodes_for_edge_extraction = {
        "final_nodes": json.dumps(
            [n.model_dump() for n in final_nodes_list.nodes], ensure_ascii=False
        ),
        "text": content,
    }

    edge_user = render_prompt(edge_user_prompt, context=nodes_for_edge_extraction)

    final_edges_list = await llm_client.acreate_structured_output(
        text_input=edge_user, system_prompt=edge_system, response_model=EdgeList
    )

    return KnowledgeGraph(nodes=final_nodes_list.nodes, edges=final_edges_list.edges)
@@ -0,0 +1,57 @@
import json

from cognee.infrastructure.llm.get_llm_client import get_llm_client
from cognee.infrastructure.llm.prompts import render_prompt
from cognee.shared.data_models import KnowledgeGraph, NodeList, EdgeList


async def extract_content_node_edge_multi_sequential(
    content: str, node_rounds: int = 2, edge_rounds: int = 2
):
    llm_client = get_llm_client()

    current_nodes = NodeList(nodes=[])

    for pass_idx in range(node_rounds):
        nodes_json = json.dumps([n.model_dump() for n in current_nodes.nodes], ensure_ascii=False)

        node_system = render_prompt("node_extraction_prompt_sequential.txt", {})
        node_user = render_prompt(
            "node_extraction_prompt_sequential_user.txt",
            {
                "text": content,
                "nodes": nodes_json,
                "total_rounds": node_rounds,
                "round_number": pass_idx + 1,  # 1-based, for "round X of Y" in the prompt
            },
        )

        current_nodes = await llm_client.acreate_structured_output(node_user, node_system, NodeList)

    final_nodes = current_nodes
    final_nodes_json = json.dumps([n.model_dump() for n in final_nodes.nodes], ensure_ascii=False)

    current_edges = EdgeList(edges=[])

    for pass_idx in range(edge_rounds):
        edges_json = json.dumps([e.model_dump() for e in current_edges.edges], ensure_ascii=False)

        edges_system = render_prompt("edge_extraction_prompt_sequential.txt", {})
        edges_user = render_prompt(
            "edge_extraction_prompt_sequential_user.txt",
            {
                "text": content,
                "nodes": final_nodes_json,
                "relationships": edges_json,  # key matches {{ relationships }} in the template
                "total_rounds": edge_rounds,  # was node_rounds; the edge loop runs edge_rounds times
                "round_number": pass_idx + 1,
            },
        )

        current_edges = await llm_client.acreate_structured_output(
            edges_user, edges_system, EdgeList
        )

    final_edges = current_edges

    return KnowledgeGraph(nodes=final_nodes.nodes, edges=final_edges.edges)
@@ -46,9 +46,6 @@ else:
    name: str
    type: str
    description: str
    properties: Optional[Dict[str, Any]] = Field(
        None, description="A dictionary of properties associated with the node."
    )


class Edge(BaseModel):
    """Edge in a knowledge graph."""
@@ -56,9 +53,16 @@ else:
    source_node_id: str
    target_node_id: str
    relationship_name: str
    properties: Optional[Dict[str, Any]] = Field(
        None, description="A dictionary of properties associated with the edge."
    )


class NodeList(BaseModel):
    """Nodes"""

    nodes: List[Node] = Field(default_factory=list)


class EdgeList(BaseModel):
    """Edges"""

    edges: List[Edge] = Field(default_factory=list)


class KnowledgeGraph(BaseModel):
    """Knowledge graph."""
@@ -6,7 +6,22 @@ from pydantic import BaseModel
from cognee.infrastructure.databases.graph import get_graph_engine
from cognee.modules.ontology.rdf_xml.OntologyResolver import OntologyResolver
from cognee.modules.chunking.models.DocumentChunk import DocumentChunk

from cognee.modules.data.extraction.knowledge_graph.extract_content_graph import (
    extract_content_graph,
)
from cognee.modules.data.extraction.knowledge_graph.extract_content_node_edge_multi_parallel import (
    extract_content_node_edge_multi_parallel,
)
from cognee.modules.data.extraction.knowledge_graph.extract_content_graph_sequential import (
    extract_content_graph_sequential,
)
from cognee.modules.data.extraction.knowledge_graph.extract_content_node_edge_multi_sequential import (
    extract_content_node_edge_multi_sequential,
)

from cognee.modules.graph.utils import (
    expand_with_nodes_and_edges,
    retrieve_existing_edges,
@@ -59,10 +74,17 @@ async def extract_graph_from_data(
    Extracts and integrates a knowledge graph from the text content of document chunks using a specified graph model.
    """
    chunk_graphs = await asyncio.gather(
        # *[extract_content_graph(chunk.text, graph_model) for chunk in data_chunks]
        # *[extract_content_node_edge_multi_parallel(content=chunk.text, node_rounds=2) for chunk in data_chunks]
        # *[extract_content_graph_sequential(content=chunk.text, response_model=graph_model, graph_extraction_rounds=2) for chunk in data_chunks]
        *[
            extract_content_node_edge_multi_sequential(
                content=chunk.text, node_rounds=1, edge_rounds=1
            )
            for chunk in data_chunks
        ]
    )

    # Note: Filter edges with missing source or target nodes
    if graph_model == KnowledgeGraph:
        for graph in chunk_graphs:
            valid_node_ids = {node.id for node in graph.nodes}
@@ -71,7 +93,6 @@ async def extract_graph_from_data(
                for edge in graph.edges
                if edge.source_node_id in valid_node_ids and edge.target_node_id in valid_node_ids
            ]

    return await integrate_chunk_graphs(
        data_chunks, chunk_graphs, graph_model, ontology_adapter or OntologyResolver()
    )
@@ -180,14 +180,9 @@ async def main(enable_steps):

    # Step 3: Create knowledge graph
    if enable_steps.get("cognify"):
        await cognee.cognify()
        print("Knowledge graph created.")

    # Step 5: Query insights
    if enable_steps.get("retriever"):
        search_results = await cognee.search(
@@ -62,13 +62,9 @@ async def main():
        os.path.dirname(os.path.abspath(__file__)), "ontology_input_example/basic_ontology.owl"
    )

    await cognee.cognify(ontology_file_path=ontology_path)
    print("Knowledge with ontology created.")

    # Step 5: Query insights
    search_results = await cognee.search(
        query_type=SearchType.GRAPH_COMPLETION,