<!-- .github/pull_request_template.md -->
## Description
<!-- Provide a clear description of the changes in this PR -->

## DCO Affirmation
I affirm that all code in every commit of this pull request conforms to the terms of the Topoteretes Developer Certificate of Origin.
253 lines
123 KiB
Text
{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Cognee GraphRAG\n",
"\n",
"Efficiently connecting external knowledge to an LLM, and retrieving it when needed, is a key challenge for developers. For developers and data scientists, integrating structured and unstructured data into AI workflows often involves multiple tools, complex pipelines, and time-consuming processes.\n",
"\n",
"Enter **cognee**, a powerful framework for knowledge and memory management. Cognee streamlines the path from raw data to actionable insights.\n",
"\n",
"In this notebook, we’ll walk through a demo that uses cognee to build a knowledge graph from a small set of documents, organize it into a meaningful structure, and extract useful insights. By the end, you’ll see how cognee can give you new insights into your data by connecting multiple data sources into a single semantic layer you can analyze.\n",
"\n",
"## RAG: Retrieval-Augmented Generation Recap\n",
"\n",
"RAG enhances LLMs by integrating external knowledge sources during inference. It typically does so by splitting the data into chunks, turning each chunk into a vector representation (an embedding), and storing it in a vector store. At query time, the chunks most similar to the question are retrieved and passed to the LLM as context.\n",
"\n",
"### Key Benefits of RAG:\n",
"\n",
"1. Connecting domain-specific data to LLMs\n",
"2. Cost savings\n",
"3. Higher accuracy than the base LLM alone\n",
"\n",
"However, building a RAG system presents challenges: handling diverse data formats, keeping up with data updates, creating a robust metadata layer, and overcoming mediocre accuracy.\n",
"\n",
"## Introducing cognee\n",
"\n",
"cognee simplifies knowledge and memory management for LLMs.\n",
"\n",
"cognee is inspired by the human mind and its higher cognitive functions. It mimics the way we construct a mental map of the world and build a semantic understanding of the objects, terms, and issues in our everyday lives.\n",
"\n",
"cognee brings this approach to code by letting developers create semantic layers in which users can store their ontologies, **a formalised depiction of knowledge**, as graphs.\n",
"\n",
"This lets you take what you know about a system and connect it to LLMs in a modular way, following data engineering best practices, with a wide choice of vector stores, graph stores, and LLMs.\n",
"\n",
"Together, these capabilities:\n",
"\n",
"- Turn unstructured and semi-structured data into a graph/vector representation.\n",
"- Enable ontology generation for particular domains, creating a distinct graph for every vertical.\n",
"- Provide a deterministic layer for LLM outputs, ensuring consistency and reliability.\n",
"\n",
"## Step-by-Step Demo: Building a RAG System with Cognee\n",
"\n",
"### 1. Setting Up the Environment\n",
"\n",
"Start by installing cognee, then import the required libraries and configure the environment:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"!pip install cognee==0.1.39"
]
},
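{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import os\n",
"\n",
"# Set the API key first, so it is already in the environment when cognee is imported and used.\n",
"# Replace the empty string with your own key, or export OPENAI_API_KEY before starting Jupyter.\n",
"if \"OPENAI_API_KEY\" not in os.environ:\n",
"    os.environ[\"OPENAI_API_KEY\"] = \"\"\n",
"\n",
"import cognee\n",
"\n",
"# Start from a clean state: remove previously added data and reset cognee's system metadata.\n",
"await cognee.prune.prune_data()\n",
"await cognee.prune.prune_system(metadata=True)"
]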
},
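{
"cell_type": "markdown",
"metadata": {},
"source": [
"Optionally, you can check your setup before running the slower cognee steps. The cell below is a minimal sketch that uses only the Python standard library. `check_environment` is a hypothetical helper, not part of cognee. It reports a missing or empty `OPENAI_API_KEY` and whether the `cognee` package is installed."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import os\n",
"from importlib.metadata import PackageNotFoundError, version\n",
"\n",
"\n",
"def check_environment(env=os.environ, package=\"cognee\"):\n",
"    \"\"\"Hypothetical helper (not part of cognee): return a list of setup problems, empty if none.\"\"\"\n",
"    problems = []\n",
"    # An empty string counts as missing, which catches the placeholder left in the setup cell.\n",
"    if not env.get(\"OPENAI_API_KEY\"):\n",
"        problems.append(\"OPENAI_API_KEY is missing or empty\")\n",
"    try:\n",
"        # Look up the installed distribution's version; raises if the package is not installed.\n",
"        version(package)\n",
"    except PackageNotFoundError:\n",
"        problems.append(f\"package '{package}' is not installed\")\n",
"    return problems\n",
"\n",
"\n",
"for problem in check_environment():\n",
"    print(\"Setup problem:\", problem)"
]
},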
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Make sure you have set your API key and installed the necessary dependencies.\n",
"\n",
"### 2. Preparing the Dataset\n",
"\n",
"We’ll use two brief professional profiles as our sample dataset:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Each string is added to cognee as a separate document.\n",
"documents = [\n",
"    \"Jessica Miller, Experienced Sales Manager with a strong track record in building high-performing teams.\",\n",
"    \"David Thompson, Creative Graphic Designer with over 8 years of experience in visual design and branding.\",\n",
"]"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### 3. Adding Data to Cognee\n",
"\n",
"Load the dataset into the cognee framework:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"await cognee.add(documents)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"This step prepares the data for graph-based processing.\n",
"\n",
"### 4. Processing Data into a Knowledge Graph\n",
"\n",
"Transform the data into a structured knowledge graph:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Extract entities and relationships from the added documents and store them as a knowledge graph.\n",
"await cognee.cognify()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"The graph now contains nodes and relationships derived from the dataset, creating a powerful structure for exploration.\n",
"\n",
"### 5. Performing Searches\n",
"\n",
"### Answer the prompt using the knowledge graph approach:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from cognee.api.v1.search import SearchType\n",
"\n",
"# GRAPH_COMPLETION answers the question using context retrieved from the knowledge graph.\n",
"search_results = await cognee.search(query_type=SearchType.GRAPH_COMPLETION, query_text=\"Tell me who are the people mentioned?\")\n",
"\n",
"print(\"\\n\\nAnswer based on knowledge graph:\\n\")\n",
"for result in search_results:\n",
"    print(f\"{result}\\n\")"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Answer the prompt using the RAG approach:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# RAG_COMPLETION answers the same question using retrieved text chunks instead of the graph.\n",
"search_results = await cognee.search(query_type=SearchType.RAG_COMPLETION, query_text=\"Tell me who are the people mentioned?\")\n",
"\n",
"print(\"\\n\\nAnswer based on RAG:\\n\")\n",
"for result in search_results:\n",
"    print(f\"{result}\\n\")"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"In conclusion, the results demonstrate a significant advantage of the knowledge graph-based approach (GraphRAG) over the RAG approach. GraphRAG identified all the people mentioned across both documents, showing its ability to aggregate and infer information from a global context. In contrast, the RAG approach identified only the person in a single document, because its answer is limited to the chunks retrieved for the query. This highlights GraphRAG's stronger ability to resolve queries that span a broader corpus of interconnected data."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### 6. Finding Related Nodes\n",
"\n",
"Explore relationships in the knowledge graph:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# INSIGHTS searches the knowledge graph for nodes and relationships related to the query text.\n",
"related_nodes = await cognee.search(query_type=SearchType.INSIGHTS, query_text=\"person\")\n",
"\n",
"print(\"\\n\\nRelated nodes are:\\n\")\n",
"for node in related_nodes:\n",
"    print(f\"{node}\\n\")"
]
},
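{
"cell_type": "markdown",
"metadata": {},
"source": [
"The gap between the graph-based and RAG answers compared earlier can be illustrated with a toy example. The cell below is a simplified sketch under stated assumptions, not how cognee implements either search type. It scores the two sample profiles with a crude word-overlap function (`tokens`, standing in for vector similarity) and keeps only the top `k`, mimicking top-k chunk retrieval in a typical RAG pipeline. `PEOPLE_BY_DOC` is a hypothetical, hand-written entity index, not output from cognee. With `k=1`, only one profile reaches the LLM, so only one person can be named. A graph-style lookup over all person entities finds both. Raising `k` helps in this tiny example, but with many documents a fixed `k` can still leave people out."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import re\n",
"\n",
"# Toy copies of the two sample profiles, so this cell runs on its own.\n",
"toy_docs = [\n",
"    \"Jessica Miller, Experienced Sales Manager with a strong track record in building high-performing teams.\",\n",
"    \"David Thompson, Creative Graphic Designer with over 8 years of experience in visual design and branding.\",\n",
"]\n",
"\n",
"# Hypothetical, hand-written entity index (not produced by cognee): people mentioned in each document.\n",
"PEOPLE_BY_DOC = {0: [\"Jessica Miller\"], 1: [\"David Thompson\"]}\n",
"\n",
"\n",
"def tokens(text):\n",
"    \"\"\"Lowercase word set: a crude stand-in for an embedding.\"\"\"\n",
"    return set(re.findall(r\"[a-z0-9]+\", text.lower()))\n",
"\n",
"\n",
"def top_k_retrieve(query, docs, k=1):\n",
"    \"\"\"Toy retrieval: rank documents by word overlap with the query and keep the top k indices.\"\"\"\n",
"    query_tokens = tokens(query)\n",
"    scores = [len(query_tokens & tokens(doc)) for doc in docs]\n",
"    # sorted() is stable, so ties keep their original document order.\n",
"    ranked = sorted(range(len(docs)), key=lambda i: scores[i], reverse=True)\n",
"    return ranked[:k]\n",
"\n",
"\n",
"def people_via_rag(query, docs, k=1):\n",
"    # Only people in the retrieved documents are visible to the LLM.\n",
"    return [person for i in top_k_retrieve(query, docs, k) for person in PEOPLE_BY_DOC[i]]\n",
"\n",
"\n",
"def people_via_graph():\n",
"    # Graph-style lookup: every person entity, whichever document it came from.\n",
"    return sorted(person for people in PEOPLE_BY_DOC.values() for person in people)\n",
"\n",
"\n",
"question = \"Tell me who are the people mentioned?\"\n",
"print(\"Toy RAG, k=1:\", people_via_rag(question, toy_docs))\n",
"print(\"Toy RAG, k=2:\", people_via_rag(question, toy_docs, k=2))\n",
"print(\"Toy graph lookup:\", people_via_graph())"
]
},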
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Why Choose Cognee?\n",
"\n",
"### 1. Agentic Framework and Memory Tied Together\n",
"\n",
"Your agents can get long-term memory, short-term memory, and memory specific to their domains.\n",
"\n",
"### 2. Enhanced Querying and Insights\n",
"\n",
"Your memory can optimize itself automatically, helping it answer questions better.\n",
"\n",
"### 3. Simplified Deployment\n",
"\n",
"You can use standard tools out of the box and get things done with little effort.\n",
"\n",
"## Visualizing the Knowledge Graph\n",
"\n",
"Imagine a graph structure where each node represents a document or entity, and edges indicate relationships.\n",
"\n",
"Here’s the visualized knowledge graph from the simple example above:\n",
"\n",
"## Conclusion\n",
"\n",
"Try running the notebook yourself, then [join the cognee community](https://discord.gg/tV7pr5XSj7)."
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3 (ipykernel)",
"language": "python",
"name": "python3"
}
},
"nbformat": 4,
"nbformat_minor": 2
}