cognee/notebooks/cognee_graphiti_demo.ipynb
lxobr 9cc357ac1c
Feat/cog 1365 unify retrievers (#572)
<!-- .github/pull_request_template.md -->

## Description
<!-- Provide a clear description of the changes in this PR -->
- Created the `BaseRetriever` class to unify all the retrievers and
searches.
- Implemented seven specialized retrievers (summaries, chunks,
completions, graph, graph-summary, insights, code) with consistent
get_context/get_completion interfaces.
- Added json context dumping feature in the current completion
implementations to enable context comparisons.
- Built a comparison framework to validate old vs new implementations.
## DCO Affirmation
I affirm that all code in every commit of this pull request conforms to
the terms of the Topoteretes Developer Certificate of Origin


<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->
## Summary by CodeRabbit

- **New Features**
- Introduced multiple retrieval classes for enhanced search
capabilities, including `BaseRetriever`, `ChunksRetriever`,
`CodeRetriever`, `CompletionRetriever`, `GraphCompletionRetriever`,
`GraphSummaryCompletionRetriever`, `InsightsRetriever`, and
`SummariesRetriever`.
- Enhanced query completions with optional context saving for improved
data persistence.
- Implemented advanced tools to compare retrieval outcomes across
different implementations.

- **Refactor**
- Streamlined internal module organization and updated references for
increased maintainability and consistency.
- Added comments indicating future maintenance tasks related to code
merging.
<!-- end of auto-generated comment: release notes by coderabbit.ai -->

---------

Co-authored-by: coderabbitai[bot] <136622811+coderabbitai[bot]@users.noreply.github.com>
2025-02-27 12:13:21 +01:00

231 lines
6.7 KiB
Text
Raw Blame History

This file contains ambiguous Unicode characters

This file contains Unicode characters that might be confused with other characters. If you think that this is intentional, you can safely ignore this warning. Use the Escape button to reveal them.

{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Cognee Graphiti integration demo"
]
},
{
"cell_type": "markdown",
"metadata": {
"vscode": {
"languageId": "plaintext"
}
},
"source": [
"First we import the necessary libaries"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import cognee\n",
"import logging\n",
"import warnings\n",
"from cognee.modules.pipelines import Task, run_tasks\n",
"from cognee.shared.utils import setup_logging\n",
"from cognee.tasks.temporal_awareness import build_graph_with_temporal_awareness\n",
"from cognee.infrastructure.databases.relational import (\n",
" create_db_and_tables as create_relational_db_and_tables,\n",
")\n",
"from cognee.tasks.temporal_awareness.index_graphiti_objects import (\n",
" index_and_transform_graphiti_nodes_and_edges,\n",
")\n",
"from cognee.modules.retrieval.utils.brute_force_triplet_search import brute_force_triplet_search\n",
"from cognee.tasks.completion.graph_query_completion import retrieved_edges_to_string\n",
"from cognee.infrastructure.llm.prompts import read_query_prompt, render_prompt\n",
"from cognee.infrastructure.llm.get_llm_client import get_llm_client"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Set environment variables"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"ExecuteTime": {
"end_time": "2025-01-15T10:43:57.893763Z",
"start_time": "2025-01-15T10:43:57.891332Z"
}
},
"outputs": [],
"source": [
"import os\n",
"\n",
"# We ignore warnigns for now\n",
"warnings.filterwarnings(\"ignore\")\n",
"\n",
"# API key for cognee\n",
"if \"LLM_API_KEY\" not in os.environ:\n",
" os.environ[\"LLM_API_KEY\"] = \"\"\n",
"\n",
"# API key for graphiti\n",
"if \"OPENAI_API_KEY\" not in os.environ:\n",
" os.environ[\"OPENAI_API_KEY\"] = \"\"\n",
"\n",
"GRAPH_DATABASE_PROVIDER = \"neo4j\"\n",
"GRAPH_DATABASE_USERNAME = \"neo4j\"\n",
"GRAPH_DATABASE_PASSWORD = \"pleaseletmein\"\n",
"GRAPH_DATABASE_URL = \"bolt://localhost:7687\"\n",
"\n",
"os.environ[\"GRAPH_DATABASE_PROVIDER\"] = GRAPH_DATABASE_PROVIDER\n",
"os.environ[\"GRAPH_DATABASE_USERNAME\"] = GRAPH_DATABASE_USERNAME\n",
"os.environ[\"GRAPH_DATABASE_PASSWORD\"] = GRAPH_DATABASE_PASSWORD\n",
"os.environ[\"GRAPH_DATABASE_URL\"] = GRAPH_DATABASE_URL\n"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Input texts with temporal information"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"ExecuteTime": {
"end_time": "2025-01-15T10:43:57.928664Z",
"start_time": "2025-01-15T10:43:57.927105Z"
}
},
"outputs": [],
"source": [
"text_list = [\n",
" \"Kamala Harris is the Attorney General of California. She was previously \"\n",
" \"the district attorney for San Francisco.\",\n",
" \"As AG, Harris was in office from January 3, 2011 January 3, 2017\",\n",
"]"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Running graphiti + transforming its graph into cognee's core system (graph transformation + vector embeddings)"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"ExecuteTime": {
"end_time": "2025-01-15T10:44:25.008501Z",
"start_time": "2025-01-15T10:43:57.932240Z"
}
},
"outputs": [],
"source": [
"# 🔧 Setting Up Logging to Suppress Errors\n",
"setup_logging(logging.ERROR) # Keeping logs clean and focused\n",
"\n",
"# 🧹 Pruning Old Data and Metadata\n",
"await cognee.prune.prune_data() # Removing outdated data\n",
"await cognee.prune.prune_system(metadata=True)\n",
"\n",
"# 🏗️ Creating Relational Database and Tables\n",
"await create_relational_db_and_tables()\n",
"\n",
"# 📚 Adding Text Data to Cognee\n",
"for text in text_list:\n",
" await cognee.add(text)\n",
"\n",
"# 🕰️ Building Temporal-Aware Graphs\n",
"tasks = [\n",
" Task(build_graph_with_temporal_awareness, text_list=text_list),\n",
"]\n",
"\n",
"# 🚀 Running the Task Pipeline\n",
"pipeline = run_tasks(tasks)\n",
"\n",
"# 🌟 Processing Pipeline Results\n",
"async for result in pipeline:\n",
" print(f\"✅ Result Processed: {result}\")\n",
"\n",
"# 🔄 Indexing and Transforming Graph Data\n",
"await index_and_transform_graphiti_nodes_and_edges()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Retrieving and generating answer from graphiti graph with cognee retriever"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"ExecuteTime": {
"end_time": "2025-01-15T10:44:27.844438Z",
"start_time": "2025-01-15T10:44:25.013325Z"
}
},
"outputs": [],
"source": [
"# Step 1: Formulating the Query 🔍\n",
"query = \"When was Kamala Harris in office?\"\n",
"\n",
"# Step 2: Searching for Relevant Triplets 📊\n",
"triplets = await brute_force_triplet_search(\n",
" query=query,\n",
" top_k=3,\n",
" collections=[\"graphitinode_content\", \"graphitinode_name\", \"graphitinode_summary\"],\n",
")\n",
"\n",
"# Step 3: Preparing the Context for the LLM\n",
"context = await retrieved_edges_to_string(triplets)\n",
"\n",
"args = {\"question\": query, \"context\": context}\n",
"\n",
"# Step 4: Generating Prompts ✍️\n",
"user_prompt = render_prompt(\"graph_context_for_question.txt\", args)\n",
"system_prompt = read_query_prompt(\"answer_simple_question_restricted.txt\")\n",
"\n",
"# Step 5: Interacting with the LLM 🤖\n",
"llm_client = get_llm_client()\n",
"computed_answer = await llm_client.acreate_structured_output(\n",
" text_input=user_prompt, # Input prompt for the user context\n",
" system_prompt=system_prompt, # System-level instructions for the model\n",
" response_model=str,\n",
")\n",
"\n",
"# Step 6: Displaying the Computed Answer ✨\n",
"print(f\"💡 Answer: {computed_answer}\")"
]
}
],
"metadata": {
"kernelspec": {
"display_name": ".venv",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.11.8"
}
},
"nbformat": 4,
"nbformat_minor": 2
}