Compare commits
10 commits
main...feature/co
| Author | SHA1 | Date |
|---|---|---|
|  | 1ab57707cc |  |
|  | 1fc381e51b |  |
|  | dca58ff97b |  |
|  | 199e997f93 |  |
|  | a22759e260 |  |
|  | d901f0a43a |  |
|  | ba0ad38863 |  |
|  | fa5c0b8e75 |  |
|  | 987b03b895 |  |
|  | e69ab1fe1d |  |
19 changed files with 510 additions and 24 deletions
@@ -0,0 +1,6 @@
You are an expert in relationship identification and knowledge graph building, focusing on relationships. Your task is to perform a detailed extraction of relationship names from the text.
• Extract all relationship names from explicit phrases, verbs, and implied context that could help form edge triplets.
• Use the potential nodes and reassign them to relationship names if they correspond to a relation, verb, action, or similar.
• Ensure completeness by working in multiple rounds, capturing overlooked connections and refining the nodes list.
• Focus on meaningful entities and relationships, whether directly stated or implied.
• Return two lists: refined nodes and potential relationship names (for forming edges).
@@ -0,0 +1,15 @@
Analyze the following text to identify relationships between entities in the knowledge graph.
Build upon previously extracted edges, ensuring completeness and consistency.
Return all the previously extracted edges **together** with the new ones that you extracted.
This is round {{ round_number }} of {{ total_rounds }}.

**Text:**
{{ text }}

**Previously Extracted Nodes:**
{{ nodes }}

**Relationships Identified in Previous Rounds:**
{{ relationships }}

Extract both explicit and implicit relationships between the nodes, building upon previous findings while ensuring completeness and consistency.
@@ -0,0 +1,22 @@
You are a top-tier edge-extraction algorithm. Every user prompt will contain two clearly marked sections:

<TEXT>
<the source text to analyze>
</TEXT>

and

<ENTITIES>
<Entities with their id, name and description>
</ENTITIES>

# 1. Reference Provided Entities
- Only extract edges between the IDs listed under <ENTITIES>.
- Do not invent new nodes: every edge's subject and object must match one of the provided IDs.

# 2. Relation Identification
- Inspect the TEXT to find explicit or implicit relationships between the provided entities.
- Use snake_case for relation names (e.g. works_for, located_in, married_to).
- Only create an edge when the text clearly signals a connection.
- The two endpoints of an edge cannot be the same entity.
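The constraints in rule 2 are mechanical enough to check in code. A minimal sketch (the function name and regex are illustrative, not part of the prompt files):

```python
import re

# Hypothetical validator mirroring the prompt's rules: snake_case relation
# names, and no self-loops (source and target must differ).
SNAKE_CASE = re.compile(r"^[a-z]+(_[a-z]+)*$")


def is_valid_edge(source_id: str, relation: str, target_id: str) -> bool:
    if source_id == target_id:  # rule: endpoints cannot be the same entity
        return False
    return bool(SNAKE_CASE.match(relation))  # rule: snake_case relation names
```

Such a check could run on the structured output as a cheap post-filter before edges enter the graph.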
@@ -0,0 +1,7 @@
<TEXT>
`{{text}}`
</TEXT>

<ENTITIES>
`{{final_nodes}}`
</ENTITIES>
@@ -0,0 +1,41 @@
You are a top-tier algorithm designed for extracting information in structured formats to build a knowledge graph.
**Nodes** represent entities and concepts. They're akin to Wikipedia nodes.
**Edges** represent relationships between concepts. They're akin to Wikipedia links.

You get the text and an already identified knowledge graph (which can be empty) in the following format:

<TEXT>
<text to extract the graph from>
</TEXT>

and

<KNOWLEDGEGRAPH>
'nodes': <list of nodes>
'edges': <list of edges>
</KNOWLEDGEGRAPH>

Your task is to extract additional nodes and edges and return the new knowledge graph including the already identified nodes and edges.

The aim is to achieve simplicity and clarity in the knowledge graph.
# 1. Labeling Nodes
**Consistency**: Ensure you use basic or elementary types for node labels.
- For example, when you identify an entity representing a person, always label it as **"Person"**.
- Avoid more specific terms like "Mathematician" or "Scientist"; keep those as a "profession" property.
- Don't use overly generic terms like "Entity".
**Node IDs**: Never use integers as node IDs.
- Node IDs should be names or human-readable identifiers found in the text.
# 2. Handling Numerical Data and Dates
- For example, when you identify an entity representing a date, make sure it has type **"Date"**.
- Extract the date in the format "YYYY-MM-DD".
- If it is not possible to extract the whole date, extract the month or year, or both if available.
- **Property Format**: Properties must be in a key-value format.
- **Quotation Marks**: Never use escaped single or double quotes within property values.
- **Naming Convention**: Use snake_case for relationship names, e.g., `acted_in`.
# 3. Coreference Resolution
- **Maintain Entity Consistency**: When extracting entities, it's vital to ensure consistency.
If an entity, such as "John Doe", is mentioned multiple times in the text but is referred to by different names or pronouns (e.g., "Joe", "he"),
always use the most complete identifier for that entity throughout the knowledge graph. In this example, use "John Doe" as the Person's ID.
Remember, the knowledge graph should be coherent and easily understandable, so maintaining consistency in entity references is crucial.
# 4. Strict Compliance
Adhere to the rules strictly. Non-compliance will result in termination.
@@ -0,0 +1,9 @@
<TEXT>
`{{ text }}`
</TEXT>

and

<KNOWLEDGEGRAPH>
`{{ graph }}`
</KNOWLEDGEGRAPH>
@@ -0,0 +1,15 @@
You are an assistant who *merges duplicate entities and their types* in a knowledge graph.

You will receive the list of extracted entities from a text.
Some of these refer to the same real-world entity but differ only in casing, minor typos, or partial information (for example, `"John Doe"` vs `"john_doe"` vs `"John_Doe"`).
There can also be synonyms present in the list.
Entities are duplicates only if they represent the same concept or object, or they are synonyms of each other.

**Task**
- Detect duplicates.
- Deduplicate them, creating the final list of entities in which there are no duplicates anymore.
- Merge type information among the entities. It is not allowed to have duplicated entity types.
- Each type must be singular (for example, "skill" instead of "skills"). Also merge synonyms in the case of types.
- Map synonym entity types to the most general type, so that multiple formats of the same type are reduced in the global knowledge graph.
- Filter out entities that represent more than one real-world concept (for example: car, motorbike).
- Return the final list of nodes.
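The "same entity up to casing, underscores, or minor variation" criterion can be sketched as a normalization key (illustrative only; real synonym detection is left to the LLM in this prompt):

```python
def canonical_key(name: str) -> str:
    # Collapse casing and underscore/space variants so "John Doe",
    # "john_doe" and "John_Doe" all map to the same key.
    return name.lower().replace("_", " ").strip()


def group_duplicates(names: list[str]) -> dict[str, list[str]]:
    # Bucket entity names by their canonical key; each bucket is one
    # candidate merge group for the dedup prompt.
    groups: dict[str, list[str]] = {}
    for name in names:
        groups.setdefault(canonical_key(name), []).append(name)
    return groups
```

Keys like this catch mechanical duplicates cheaply; the LLM pass then handles typos and synonyms that no string rule covers.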
@@ -0,0 +1,4 @@

<ENTITIES>
`{{nodes_to_deduplicate}}`
</ENTITIES>
26
cognee/infrastructure/llm/prompts/node_extraction_prompt.txt
Normal file
@@ -0,0 +1,26 @@
You are a top-tier algorithm designed for extracting information in structured formats to build a knowledge graph.
**Nodes** represent entities and concepts. They're akin to Wikipedia nodes.

The aim is to achieve simplicity and clarity in the knowledge graph.
# 1. Labeling Nodes
**Consistency**: Ensure you use basic or elementary types for node labels.
- For example, when you identify an entity representing a person, always label it as **"Person"**.
- Avoid more specific terms like "Mathematician" or "Scientist"; keep those as a "profession" property.
- Don't use overly generic terms like "Entity".
**Node IDs**: Never use integers as node IDs.
- Node IDs should be names or human-readable identifiers found in the text.
# 2. Handling Numerical Data and Dates
- For example, when you identify an entity representing a date, make sure it has type **"Date"**.
- Allowed formats are "YYYY", "YYYY-MM" or "YYYY-MM-DD". Extract each date in the format in which it appears in the text, and extract each date only once.
- If a date in the text represents a period, extract the start and end dates of the period separately.
- If it is not possible to extract the whole date, extract the month or year, or both if available.
- **Property Format**: Properties must be in a key-value format.
- **Quotation Marks**: Never use escaped single or double quotes within property values.
- **Naming Convention**: Use snake_case for relationship names, e.g., `acted_in`.
# 3. Coreference Resolution
- **Maintain Entity Consistency**: When extracting entities, it's vital to ensure consistency.
If an entity, such as "John Doe", is mentioned multiple times in the text but is referred to by different names or pronouns (e.g., "Joe", "he"),
always use the most complete identifier for that entity throughout the knowledge graph. In this example, use "John Doe" as the Person's ID.
Remember, the knowledge graph should be coherent and easily understandable, so maintaining consistency in entity references is crucial.
# 4. Strict Compliance
Adhere to the rules strictly. Non-compliance will result in termination.
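The three allowed date shapes in rule 2 reduce to a small check (a sketch; the regex and function name are mine, and it validates shape only, not calendar validity):

```python
import re

# Matches exactly the three allowed shapes: "YYYY", "YYYY-MM", "YYYY-MM-DD".
DATE_SHAPES = re.compile(r"^\d{4}(-\d{2}(-\d{2})?)?$")


def is_allowed_date_format(value: str) -> bool:
    return bool(DATE_SHAPES.match(value))
```

A validator like this could flag Date-typed nodes whose IDs drift outside the prompt's contract before they reach the graph store.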
@@ -0,0 +1,9 @@
You are an expert in entity extraction and knowledge graph building, focusing on node identification.
Your task is to perform a detailed entity and concept extraction from text to generate a list of potential nodes for a knowledge graph.
• Node IDs should be names or human-readable identifiers found in the text.
• Extract clear, distinct entities and concepts as individual strings.
• Be exhaustive; ensure completeness by capturing all entities, names, nouns, noun parts, and implied or implicit mentions.
• Also extract potential entity-type nodes, whether directly mentioned or implied.
• Avoid duplicates and overly generic terms.
• Consider different perspectives and indirect references.
• Return only a list of unique node strings with all the entities.
@@ -0,0 +1,10 @@
Extract distinct entities and concepts from the following text to expand the knowledge graph.
Build upon previously extracted entities, ensuring completeness and consistency.
Return all the previously extracted entities **together** with the new ones that you extracted.
This is round {{ round_number }} of {{ total_rounds }}.

**Text:**
{{ text }}

**Previously Extracted Entities:**
{{ nodes }}
@@ -1,9 +1,17 @@
import os
import asyncio
import json
from typing import Type, List, Tuple, Set

from pydantic import BaseModel

from cognee.infrastructure.llm.get_llm_client import get_llm_client
from cognee.infrastructure.llm.prompts import render_prompt
from cognee.infrastructure.llm.config import get_llm_config
from cognee.shared.data_models import KnowledgeGraph, NodeList, EdgeList, Node, Edge


async def extract_content_graph(content: str, response_model: Type[BaseModel]):
@@ -21,10 +29,124 @@ async def extract_content_graph(content: str, response_model: Type[BaseModel]):
    else:
        base_directory = None

    system_prompt_graph = render_prompt(prompt_path, {}, base_directory=base_directory)

    content_graph = await llm_client.acreate_structured_output(
        content, system_prompt_graph, response_model
    )

    return content_graph


def dedupe_and_normalize_nodes(nodes: List[Node]) -> List[Node]:
    """Lower-case node names/types, replace underscores with spaces, and drop duplicates."""
    seen: Set[Tuple[str, str]] = set()
    out: List[Node] = []

    for node in nodes:
        # Normalize casing and underscores so "John_Doe" and "john doe" collide.
        node.name = node.name.lower().replace("_", " ")
        node.type = node.type.lower().replace("_", " ")

        key = (node.name, node.type)
        if key not in seen:
            seen.add(key)
            out.append(node)

    return out
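To see what the normalization buys, here is a self-contained run with a dataclass stand-in for the pydantic `Node` model (only `name` and `type` matter to the dedupe key; `FakeNode` and `dedupe_and_normalize` are illustrative names, not the repo's):

```python
from dataclasses import dataclass
from typing import List, Set, Tuple


@dataclass
class FakeNode:  # stand-in for cognee's Node; the real model is pydantic
    name: str
    type: str


def dedupe_and_normalize(nodes: List[FakeNode]) -> List[FakeNode]:
    # Same logic as dedupe_and_normalize_nodes above: lower-case,
    # underscores -> spaces, then keep the first of each (name, type) pair.
    seen: Set[Tuple[str, str]] = set()
    out: List[FakeNode] = []
    for node in nodes:
        node.name = node.name.lower().replace("_", " ")
        node.type = node.type.lower().replace("_", " ")
        key = (node.name, node.type)
        if key not in seen:
            seen.add(key)
            out.append(node)
    return out
```

With this, `FakeNode("John_Doe", "Person")` and `FakeNode("john doe", "person")` collapse to a single entry before the LLM merge prompt ever runs.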
def dedupe_and_normalize_edges(edges: List[Edge]) -> List[Edge]:
    """Lower-case relationship names and drop duplicate (source, relation, target) triples."""
    seen: Set[Tuple[str, str, str]] = set()
    out: List[Edge] = []

    for edge in edges:
        edge.relationship_name = edge.relationship_name.lower()

        key = (edge.source_node_id, edge.relationship_name, edge.target_node_id)
        if key not in seen:
            seen.add(key)
            out.append(edge)

    return out


async def extract_content_graph2(
    content: str, response_model: Type[BaseModel], node_rounds: int = 1, edge_rounds: int = 1
):
    llm_client = get_llm_client()

    ###### NODE EXTRACTION
    node_prompt_path = "node_extraction_prompt.txt"

    node_system = render_prompt(node_prompt_path, {})

    node_tasks = [
        llm_client.acreate_structured_output(content, node_system, NodeList)
        for _ in range(node_rounds)
    ]

    node_results = await asyncio.gather(*node_tasks)

    all_nodes: List[Node] = [node for nl in node_results for node in nl.nodes]

    ###### NODE DEDUPLICATION
    all_nodes = dedupe_and_normalize_nodes(all_nodes)

    all_nodes_merged = {
        "nodes_to_deduplicate": json.dumps([n.model_dump() for n in all_nodes], ensure_ascii=False)
    }

    merge_system_prompt = "merge_nodes_system_prompt.txt"
    merge_user_prompt = "merge_nodes_user_prompt.txt"

    merge_system = render_prompt(filename=merge_system_prompt, context={})
    merge_user = render_prompt(filename=merge_user_prompt, context=all_nodes_merged)

    final_nodes_list = await llm_client.acreate_structured_output(
        text_input=merge_user, system_prompt=merge_system, response_model=NodeList
    )

    ###### EDGE EXTRACTION
    edge_system_prompt = "edge_extraction_system_prompt.txt"
    edge_user_prompt = "edge_extraction_user_prompt.txt"

    edge_system = render_prompt(edge_system_prompt, {})
    nodes_for_edge_extraction = {
        "final_nodes": json.dumps(
            [n.model_dump() for n in final_nodes_list.nodes], ensure_ascii=False
        ),
        "text": content,
    }

    edge_user = render_prompt(edge_user_prompt, context=nodes_for_edge_extraction)

    edge_tasks = [
        llm_client.acreate_structured_output(
            text_input=edge_user, system_prompt=edge_system, response_model=EdgeList
        )
        for _ in range(edge_rounds)
    ]

    edge_results = await asyncio.gather(*edge_tasks)

    all_edges: List[Edge] = [edge for el in edge_results for edge in el.edges]

    ###### EDGE DEDUPLICATION
    all_edges = dedupe_and_normalize_edges(all_edges)

    all_edges_merged = {
        "edges_to_deduplicate": json.dumps([e.model_dump() for e in all_edges], ensure_ascii=False)
    }

    merge_system_prompt = "merge_edges_system_prompt.txt"
    merge_user_prompt = "merge_edges_user_prompt.txt"

    merge_system = render_prompt(filename=merge_system_prompt, context={})
    merge_user = render_prompt(filename=merge_user_prompt, context=all_edges_merged)

    final_edges_list = await llm_client.acreate_structured_output(
        text_input=merge_user, system_prompt=merge_system, response_model=EdgeList
    )

    return KnowledgeGraph(nodes=final_nodes_list.nodes, edges=final_edges_list.edges)
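The round structure here (N identical extraction calls fanned out with `asyncio.gather`, then one dedupe/merge step) can be exercised with a stub in place of the LLM client (all names below are illustrative):

```python
import asyncio
from typing import List


async def fake_extract_round(content: str, round_id: int) -> List[str]:
    # Stand-in for acreate_structured_output: each round may return an
    # overlapping subset of entities.
    await asyncio.sleep(0)  # yield control, as a real network call would
    return ["alice", "acme"] if round_id % 2 == 0 else ["acme", "bob"]


async def fan_out_and_merge(content: str, rounds: int) -> List[str]:
    results = await asyncio.gather(
        *[fake_extract_round(content, i) for i in range(rounds)]
    )
    # Flatten and dedupe while preserving first-seen order, like the
    # node-deduplication step before the LLM merge prompt.
    seen, merged = set(), []
    for batch in results:
        for name in batch:
            if name not in seen:
                seen.add(name)
                merged.append(name)
    return merged
```

The fan-out trades extra tokens for recall: each round is an independent sample, and the union plus dedupe recovers entities any single call might miss.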
@@ -0,0 +1,46 @@
import json
from typing import Type

from pydantic import BaseModel

from cognee.infrastructure.llm.get_llm_client import get_llm_client
from cognee.infrastructure.llm.prompts import render_prompt
from cognee.shared.data_models import KnowledgeGraph


async def extract_content_graph_sequential(
    content: str, response_model: Type[BaseModel], graph_extraction_rounds: int = 2
):
    llm_client = get_llm_client()

    graph_system_prompt_path = "generate_graph_prompt_sequential.txt"
    graph_user_prompt_path = "generate_graph_prompt_sequential_user.txt"
    graph_system = render_prompt(graph_system_prompt_path, {})

    current_nodes = []
    current_edges = []

    knowledge_graph = KnowledgeGraph(nodes=[], edges=[])

    for round_idx in range(graph_extraction_rounds):
        nodes_json = json.dumps([n.model_dump() for n in current_nodes], ensure_ascii=False)
        edges_json = json.dumps([e.model_dump() for e in current_edges], ensure_ascii=False)

        graph_user = render_prompt(
            graph_user_prompt_path,  # TODO: this could use some formatting due to HTML escape codes (e.g. &#34;).
            {
                "text": content,
                "graph": f"nodes: {nodes_json}, edges: {edges_json}",
            },
        )

        knowledge_graph = await llm_client.acreate_structured_output(
            text_input=graph_user,
            system_prompt=graph_system,
            response_model=response_model,
        )

        current_nodes = knowledge_graph.nodes
        current_edges = knowledge_graph.edges

    return knowledge_graph
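The sequential variant's accumulate-and-refeed loop reduces to a simple pattern: each round's output becomes the next round's input, so the graph can only grow or be refined. A stubbed sketch (the stand-in "LLM" and its behavior are invented for illustration):

```python
from typing import List


def fake_llm_round(text: str, previous: List[str]) -> List[str]:
    # Stand-in for one structured-output call: the model is asked to return
    # everything it was given plus anything new it finds.
    pool = ["alice", "acme", "bob"]
    found = pool[: len(previous) + 1]  # each round "notices" one more item
    return sorted(set(previous) | set(found))


def refine_over_rounds(text: str, rounds: int) -> List[str]:
    current: List[str] = []
    for _ in range(rounds):
        # Output of one round becomes the input of the next.
        current = fake_llm_round(text, current)
    return current
```

Compared with the parallel fan-out, this spends the same budget on depth rather than breadth: later rounds see earlier findings and can build on them.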
@@ -0,0 +1,81 @@
import asyncio
import json
from typing import List, Tuple, Set

from cognee.infrastructure.llm.get_llm_client import get_llm_client
from cognee.infrastructure.llm.prompts import render_prompt
from cognee.shared.data_models import KnowledgeGraph, NodeList, EdgeList, Node, Edge


def dedupe_and_normalize_nodes(nodes: List[Node]) -> List[Node]:
    """Lower-case node names/types, replace underscores with spaces, and drop duplicates."""
    seen: Set[Tuple[str, str]] = set()
    out: List[Node] = []

    for node in nodes:
        # Normalize casing and underscores so "John_Doe" and "john doe" collide.
        node.name = node.name.lower().replace("_", " ")
        node.type = node.type.lower().replace("_", " ")

        key = (node.name, node.type)
        if key not in seen:
            seen.add(key)
            out.append(node)

    return out


async def extract_content_node_edge_multi_parallel(content: str, node_rounds: int = 1):
    llm_client = get_llm_client()

    ###### NODE EXTRACTION
    node_prompt_path = "node_extraction_prompt.txt"

    node_system = render_prompt(node_prompt_path, {})

    node_tasks = [
        llm_client.acreate_structured_output(content, node_system, NodeList)
        for _ in range(node_rounds)
    ]

    node_results = await asyncio.gather(*node_tasks)

    all_nodes: List[Node] = [node for nl in node_results for node in nl.nodes]

    ###### NODE DEDUPLICATION
    all_nodes = dedupe_and_normalize_nodes(all_nodes)

    all_nodes_merged = {
        "nodes_to_deduplicate": json.dumps([n.model_dump() for n in all_nodes], ensure_ascii=False)
    }

    merge_system_prompt = "merge_nodes_system_prompt.txt"
    merge_user_prompt = "merge_nodes_user_prompt.txt"

    merge_system = render_prompt(filename=merge_system_prompt, context={})
    merge_user = render_prompt(filename=merge_user_prompt, context=all_nodes_merged)

    final_nodes_list = await llm_client.acreate_structured_output(
        text_input=merge_user, system_prompt=merge_system, response_model=NodeList
    )

    ###### EDGE EXTRACTION
    edge_system_prompt = "edge_extraction_system_prompt.txt"
    edge_user_prompt = "edge_extraction_user_prompt.txt"

    edge_system = render_prompt(edge_system_prompt, {})
    nodes_for_edge_extraction = {
        "final_nodes": json.dumps(
            [n.model_dump() for n in final_nodes_list.nodes], ensure_ascii=False
        ),
        "text": content,
    }

    edge_user = render_prompt(edge_user_prompt, context=nodes_for_edge_extraction)

    final_edges_list = await llm_client.acreate_structured_output(
        text_input=edge_user, system_prompt=edge_system, response_model=EdgeList
    )

    return KnowledgeGraph(nodes=final_nodes_list.nodes, edges=final_edges_list.edges)
@@ -0,0 +1,57 @@
import json

from cognee.infrastructure.llm.get_llm_client import get_llm_client
from cognee.infrastructure.llm.prompts import render_prompt
from cognee.shared.data_models import KnowledgeGraph, NodeList, EdgeList


async def extract_content_node_edge_multi_sequential(
    content: str, node_rounds: int = 2, edge_rounds: int = 2
):
    llm_client = get_llm_client()

    current_nodes = NodeList(nodes=[])

    for pass_idx in range(node_rounds):
        nodes_json = json.dumps([n.model_dump() for n in current_nodes.nodes], ensure_ascii=False)

        node_system = render_prompt("node_extraction_prompt_sequential.txt", {})
        node_user = render_prompt(
            "node_extraction_prompt_sequential_user.txt",
            {
                "text": content,
                "nodes": nodes_json,
                "total_rounds": node_rounds,
                "round_number": pass_idx + 1,  # 1-based, for "round X of Y" in the prompt
            },
        )

        current_nodes = await llm_client.acreate_structured_output(node_user, node_system, NodeList)

    final_nodes = current_nodes
    final_nodes_json = json.dumps([n.model_dump() for n in final_nodes.nodes], ensure_ascii=False)

    current_edges = EdgeList(edges=[])

    for pass_idx in range(edge_rounds):
        edges_json = json.dumps([e.model_dump() for e in current_edges.edges], ensure_ascii=False)

        edges_system = render_prompt("edge_extraction_prompt_sequential.txt", {})
        edges_user = render_prompt(
            "edge_extraction_prompt_sequential_user.txt",
            {
                "text": content,
                "nodes": final_nodes_json,
                "relationships": edges_json,  # key matches {{ relationships }} in the template
                "total_rounds": edge_rounds,  # was node_rounds; the edge loop runs edge_rounds times
                "round_number": pass_idx + 1,
            },
        )

        current_edges = await llm_client.acreate_structured_output(
            edges_user, edges_system, EdgeList
        )

    final_edges = current_edges

    return KnowledgeGraph(nodes=final_nodes.nodes, edges=final_edges.edges)
@@ -46,9 +46,6 @@ else:
    name: str
    type: str
    description: str
    properties: Optional[Dict[str, Any]] = Field(
        None, description="A dictionary of properties associated with the node."
    )


class Edge(BaseModel):
    """Edge in a knowledge graph."""
@@ -56,9 +53,16 @@ else:
    source_node_id: str
    target_node_id: str
    relationship_name: str
    properties: Optional[Dict[str, Any]] = Field(
        None, description="A dictionary of properties associated with the edge."
    )


class NodeList(BaseModel):
    """Nodes"""

    nodes: List[Node] = Field(default_factory=list)


class EdgeList(BaseModel):
    """Edges"""

    edges: List[Edge] = Field(default_factory=list)


class KnowledgeGraph(BaseModel):
    """Knowledge graph."""
@@ -6,7 +6,22 @@ from pydantic import BaseModel
from cognee.infrastructure.databases.graph import get_graph_engine
from cognee.modules.ontology.rdf_xml.OntologyResolver import OntologyResolver
from cognee.modules.chunking.models.DocumentChunk import DocumentChunk

from cognee.modules.data.extraction.knowledge_graph.extract_content_graph import (
    extract_content_graph,
)
from cognee.modules.data.extraction.knowledge_graph.extract_content_node_edge_multi_parallel import (
    extract_content_node_edge_multi_parallel,
)
from cognee.modules.data.extraction.knowledge_graph.extract_content_graph_sequential import (
    extract_content_graph_sequential,
)
from cognee.modules.data.extraction.knowledge_graph.extract_content_node_edge_multi_sequential import (
    extract_content_node_edge_multi_sequential,
)

from cognee.modules.graph.utils import (
    expand_with_nodes_and_edges,
    retrieve_existing_edges,
@@ -59,10 +74,17 @@ async def extract_graph_from_data(
    Extracts and integrates a knowledge graph from the text content of document chunks using a specified graph model.
    """
    chunk_graphs = await asyncio.gather(
        # *[extract_content_graph(chunk.text, graph_model) for chunk in data_chunks]
        # *[extract_content_node_edge_multi_parallel(content=chunk.text, node_rounds=2) for chunk in data_chunks]
        # *[extract_content_graph_sequential(content=chunk.text, response_model=graph_model, graph_extraction_rounds=2) for chunk in data_chunks]
        *[
            extract_content_node_edge_multi_sequential(
                content=chunk.text, node_rounds=1, edge_rounds=1
            )
            for chunk in data_chunks
        ]
    )

    # Note: Filter edges with missing source or target nodes
    if graph_model == KnowledgeGraph:
        for graph in chunk_graphs:
            valid_node_ids = {node.id for node in graph.nodes}
@@ -71,7 +93,6 @@ async def extract_graph_from_data(
                for edge in graph.edges
                if edge.source_node_id in valid_node_ids and edge.target_node_id in valid_node_ids
            ]

    return await integrate_chunk_graphs(
        data_chunks, chunk_graphs, graph_model, ontology_adapter or OntologyResolver()
    )
@@ -180,14 +180,9 @@ async def main(enable_steps):

    # Step 3: Create knowledge graph
    if enable_steps.get("cognify"):
        await cognee.cognify()
        print("Knowledge graph created.")

    # Step 5: Query insights
    if enable_steps.get("retriever"):
        search_results = await cognee.search(
@@ -62,13 +62,9 @@ async def main():
        os.path.dirname(os.path.abspath(__file__)), "ontology_input_example/basic_ontology.owl"
    )

    await cognee.cognify(ontology_file_path=ontology_path)
    print("Knowledge with ontology created.")

    # Step 5: Query insights
    search_results = await cognee.search(
        query_type=SearchType.GRAPH_COMPLETION,