{ "cells": [ { "cell_type": "markdown", "id": "6f22c8fe6d92cfcc", "metadata": {}, "source": [ "# Using Cognee with Python Development Data\n", "\n", "Unite authoritative Python practice (Guido van Rossum's own contributions!), normative guidance (Zen/PEP 8), and your lived context (rules + conversations) into one *AI memory* that produces answers that are relevant, explainable, and consistent." ] }, { "cell_type": "markdown", "id": "fe69acbf9ab1a22b", "metadata": {}, "source": [ "## What You'll Learn\n", "\n", "In this tutorial, you'll transform scattered development data into a knowledge system that supports your coding workflow. By the end, you'll have:\n", "\n", "- **Connected disparate data sources** (Guido's CPython contributions, mypy development, PEP discussions, your Python projects) into a unified AI memory graph\n", "- **Built a memory layer** that captures Python's design philosophy, best-practice coding patterns, and your own preferences and experience\n", "- **Used intelligent search** that combines this diverse context\n", "- **Integrated everything with your coding environment** through MCP (Model Context Protocol)\n", "\n", "This tutorial demonstrates how **knowledge graphs** and **retrieval-augmented generation (RAG)** can be applied to software development, showing you how to build systems that learn from Python's creator and help improve your own Python code."
] }, { "cell_type": "markdown", "id": "b03b59c064213dd4", "metadata": {}, "source": [ "## Cognee and its core operations\n", "\n", "Before we dive in, let's understand the core Cognee operations we'll be working with:\n", "\n", "- **`cognee.add()`** - Ingests raw data (files, text, APIs) into the system\n", "- **`cognee.cognify()`** - Processes and structures data into a knowledge graph using AI\n", "- **`cognee.search()`** - Queries the knowledge graph with natural language or Cypher\n", "- **`cognee.memify()`** - Cognee's \"secret sauce\" that infers implicit connections and rules from your data" ] }, { "cell_type": "markdown", "id": "6a7669fbb6a3e6c7", "metadata": {}, "source": [ "## Data used in this tutorial\n", "\n", "Cognee can ingest many types of sources. In this tutorial, we use a small, concrete set of files that cover different perspectives:\n", "\n", "- **`guido_contributions.json` — Authoritative exemplars.** Real PRs and commits from Guido van Rossum (mypy, CPython). These show how Python’s creator solved problems and provide concrete anchors for patterns.\n", "- **`pep_style_guide.md` — Norms.** Encodes community style and typing conventions (PEP 8 and related). Ensures that search results and inferred rules align with widely accepted standards.\n", "- **`zen_principles.md` — Philosophy.** The Zen of Python. Grounds design trade‑offs (simplicity, explicitness, readability) beyond syntax or mechanics.\n", "- **`my_developer_rules.md` — Local constraints.** Your house rules, conventions, and project‑specific requirements (scope, privacy, Spec.md). Keeps recommendations relevant to your actual workflow.\n", "- **`copilot_conversations.json` — Personal history.** Transcripts of real assistant conversations, including your questions, code snippets, and discussion topics. 
Captures “how you code” and connects it to “how Guido codes.”" ] }, { "cell_type": "markdown", "id": "2a5dac2c6fdc7ca7", "metadata": {}, "source": [ "# Preliminaries\n", "\n", "Cognee's API is asynchronous, so we `await` its functions directly in notebook cells.\n", "We also apply `nest_asyncio`, which allows nested event-loop calls to run inside the notebook's already-running event loop." ] }, { "cell_type": "code", "execution_count": null, "id": "20cb02b49e3c53e2", "metadata": { "ExecuteTime": { "end_time": "2025-09-07T14:35:00.836706Z", "start_time": "2025-09-07T14:35:00.832646Z" } }, "outputs": [], "source": [ "import nest_asyncio\n", "nest_asyncio.apply()" ] }, { "cell_type": "markdown", "id": "30e66c894fb4cfd5", "metadata": {}, "source": [ "To strike a balance between speed, cost, and quality, we recommend OpenAI's `gpt-4o-mini` model; make sure your `.env` file contains this line:\n", "\n", "```LLM_MODEL=\"gpt-4o-mini\"```" ] }, { "cell_type": "markdown", "id": "45e1caaec20c9518", "metadata": {}, "source": [ "We will do a quick import check." ] }, { "cell_type": "code", "execution_count": null, "id": "9386ecb596860399", "metadata": { "ExecuteTime": { "end_time": "2025-09-07T14:35:03.910260Z", "start_time": "2025-09-07T14:35:00.938966Z" } }, "outputs": [ { "name": "stderr", "output_type": "stream", "text": [ "\n", "\u001b[2m2025-09-07T14:35:01.883464\u001b[0m [\u001b[32m\u001b[1minfo \u001b[0m] \u001b[1mDeleted old log file: /Users/lazar/PycharmProjects/cognee/logs/2025-09-07_14-54-27.log\u001b[0m [\u001b[0m\u001b[1m\u001b[34mcognee.shared.logging_utils\u001b[0m]\u001b[0m\n", "/Users/lazar/PycharmProjects/cognee/.venv/lib/python3.12/site-packages/tqdm/auto.py:21: TqdmWarning: IProgress not found. Please update jupyter and ipywidgets. 
See https://ipywidgets.readthedocs.io/en/stable/user_install.html\n", " from .autonotebook import tqdm as notebook_tqdm\n", "\n", "\u001b[2m2025-09-07T14:35:02.487548\u001b[0m [\u001b[32m\u001b[1minfo \u001b[0m] \u001b[1mLogging initialized \u001b[0m [\u001b[0m\u001b[1m\u001b[34mcognee.shared.logging_utils\u001b[0m]\u001b[0m \u001b[36mcognee_version\u001b[0m=\u001b[35m0.2.4-local\u001b[0m \u001b[36mdatabase_path\u001b[0m=\u001b[35m/Users/lazar/PycharmProjects/cognee/cognee/.cognee_system/databases\u001b[0m \u001b[36mgraph_database_name\u001b[0m=\u001b[35m\u001b[0m \u001b[36mos_info\u001b[0m=\u001b[35m'Darwin 24.5.0 (Darwin Kernel Version 24.5.0: Tue Apr 22 19:54:29 PDT 2025; root:xnu-11417.121.6~2/RELEASE_ARM64_T6030)'\u001b[0m \u001b[36mpython_version\u001b[0m=\u001b[35m3.12.8\u001b[0m \u001b[36mrelational_config\u001b[0m=\u001b[35mcognee_db\u001b[0m \u001b[36mstructlog_version\u001b[0m=\u001b[35m25.4.0\u001b[0m \u001b[36mvector_config\u001b[0m=\u001b[35mlancedb\u001b[0m\n", "\n", "\u001b[2m2025-09-07T14:35:02.487958\u001b[0m [\u001b[32m\u001b[1minfo \u001b[0m] \u001b[1mDatabase storage: /Users/lazar/PycharmProjects/cognee/cognee/.cognee_system/databases\u001b[0m [\u001b[0m\u001b[1m\u001b[34mcognee.shared.logging_utils\u001b[0m]\u001b[0m\n" ] }, { "name": "stdout", "output_type": "stream", "text": [ "🔍 Quick Cognee Import Check\n", "==============================\n", "📍 Cognee location: /Users/lazar/PycharmProjects/cognee/cognee/__init__.py\n", "📁 Package directory: /Users/lazar/PycharmProjects/cognee/cognee\n", "📦 Status: INSTALLED PACKAGE\n" ] } ], "source": [ "import cognee\n", "import os\n", "from pathlib import Path\n", "\n", "print('🔍 Quick Cognee Import Check')\n", "print('=' * 30)\n", "print(f'📍 Cognee location: {cognee.__file__}')\n", "print(f'📁 Package directory: {os.path.dirname(cognee.__file__)}')\n", "\n", "# Check if it's local or installed\n", "current_dir = Path.cwd()\n", "cognee_path = Path(cognee.__file__)\n", "if current_dir in 
cognee_path.parents:\n", " print('🏠 Status: LOCAL DEVELOPMENT VERSION')\n", "else:\n", " print('📦 Status: INSTALLED PACKAGE')" ] }, { "cell_type": "code", "execution_count": null, "id": "19e74e6b691020db", "metadata": { "ExecuteTime": { "end_time": "2025-09-07T14:35:03.921217Z", "start_time": "2025-09-07T14:35:03.918659Z" } }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "📁 Project root: /Users/lazar/PycharmProjects/cognee\n" ] } ], "source": [ "# Path setup for notebook environment (self-contained)\n", "import sys\n", "from pathlib import Path\n", "\n", "notebook_dir = Path.cwd()\n", "if notebook_dir.name == 'notebooks':\n", " project_root = notebook_dir.parent\n", "else:\n", " project_root = Path.cwd()\n", "\n", "project_root_str = str(project_root.absolute())\n", "if project_root_str not in sys.path:\n", " sys.path.insert(0, project_root_str)\n", "\n", "print(f\"📁 Project root: {project_root_str}\")" ] }, { "cell_type": "markdown", "id": "af584b935cbdc8d", "metadata": {}, "source": [ "Finally, we start from a clean slate by removing any previous Cognee data:" ] }, { "cell_type": "code", "execution_count": null, "id": "dd47383aa9519465", "metadata": { "ExecuteTime": { "end_time": "2025-09-07T14:35:06.194073Z", "start_time": "2025-09-07T14:35:03.929446Z" } }, "outputs": [ { "name": "stderr", "output_type": "stream", "text": [ "\n", "\u001b[2m2025-09-07T14:35:06.190189\u001b[0m [\u001b[32m\u001b[1minfo \u001b[0m] \u001b[1mDatabase deleted successfully.\u001b[0m [\u001b[0m\u001b[1m\u001b[34mcognee.shared.logging_utils\u001b[0m]\u001b[0m\n" ] } ], "source": [ "import cognee\n", "\n", "await cognee.prune.prune_data()\n", "await cognee.prune.prune_system(metadata=True)" ] }, { "cell_type": "markdown", "id": "93c9783037715026", "metadata": {}, "source": [ "### First data ingestion: Exploring Guido's Python Contributions\n", "\n", "We'll begin with a document that contains detailed PRs and commits from Guido van Rossum's work on mypy and 
CPython, showing real-world examples of Python's creator solving type system and language design challenges.\n", "\n", "We'll use Cognee's `add()` and `cognify()` functions to ingest this data and build a knowledge graph that connects Guido's development patterns with Python best practices." ] }, { "cell_type": "code", "execution_count": null, "id": "b8743ed520b4de37", "metadata": { "ExecuteTime": { "end_time": "2025-09-07T14:39:53.511862Z", "start_time": "2025-09-07T14:35:06.228778Z" } }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "User 666b4a6d-34ef-4221-aba2-68a64a7b1eaa has registered.\n" ] }, { "name": "stderr", "output_type": "stream", "text": [ "\n", "\u001b[1mEmbeddingRateLimiter initialized: enabled=False, requests_limit=60, interval_seconds=60\u001b[0m\n", "\n", "\u001b[2m2025-09-07T14:35:09.623496\u001b[0m [\u001b[32m\u001b[1minfo \u001b[0m] \u001b[1mPipeline run started: `576f15b1-6366-5079-b586-01bf92a45a1d`\u001b[0m [\u001b[0m\u001b[1m\u001b[34mrun_tasks_with_telemetry()\u001b[0m]\u001b[0m\n", "\n", "\u001b[2m2025-09-07T14:35:09.624579\u001b[0m [\u001b[32m\u001b[1minfo \u001b[0m] \u001b[1mCoroutine task started: `resolve_data_directories`\u001b[0m [\u001b[0m\u001b[1m\u001b[34mrun_tasks_base\u001b[0m]\u001b[0m\n", "\n", "\u001b[2m2025-09-07T14:35:09.625619\u001b[0m [\u001b[32m\u001b[1minfo \u001b[0m] \u001b[1mCoroutine task started: `ingest_data`\u001b[0m [\u001b[0m\u001b[1m\u001b[34mrun_tasks_base\u001b[0m]\u001b[0m\n", "\n", "\u001b[2m2025-09-07T14:35:09.646868\u001b[0m [\u001b[32m\u001b[1minfo \u001b[0m] \u001b[1mRegistered loader: pypdf_loader\u001b[0m [\u001b[0m\u001b[1m\u001b[34mcognee.infrastructure.loaders.LoaderEngine\u001b[0m]\u001b[0m\n", "\n", "\u001b[2m2025-09-07T14:35:09.647515\u001b[0m [\u001b[32m\u001b[1minfo \u001b[0m] \u001b[1mRegistered loader: text_loader\u001b[0m [\u001b[0m\u001b[1m\u001b[34mcognee.infrastructure.loaders.LoaderEngine\u001b[0m]\u001b[0m\n", "\n", "\u001b[2m2025-09-07T14:35:09.647982\u001b[0m 
[\u001b[32m\u001b[1minfo \u001b[0m] \u001b[1mRegistered loader: image_loader\u001b[0m [\u001b[0m\u001b[1m\u001b[34mcognee.infrastructure.loaders.LoaderEngine\u001b[0m]\u001b[0m\n", "\n", "\u001b[2m2025-09-07T14:35:09.648557\u001b[0m [\u001b[32m\u001b[1minfo \u001b[0m] \u001b[1mRegistered loader: audio_loader\u001b[0m [\u001b[0m\u001b[1m\u001b[34mcognee.infrastructure.loaders.LoaderEngine\u001b[0m]\u001b[0m\n", "\n", "\u001b[2m2025-09-07T14:35:09.649816\u001b[0m [\u001b[32m\u001b[1minfo \u001b[0m] \u001b[1mRegistered loader: unstructured_loader\u001b[0m [\u001b[0m\u001b[1m\u001b[34mcognee.infrastructure.loaders.LoaderEngine\u001b[0m]\u001b[0m\n", "\n", "\u001b[2m2025-09-07T14:35:09.660104\u001b[0m [\u001b[32m\u001b[1minfo \u001b[0m] \u001b[1mCoroutine task completed: `ingest_data`\u001b[0m [\u001b[0m\u001b[1m\u001b[34mrun_tasks_base\u001b[0m]\u001b[0m\n", "\n", "\u001b[2m2025-09-07T14:35:09.660527\u001b[0m [\u001b[32m\u001b[1minfo \u001b[0m] \u001b[1mCoroutine task completed: `resolve_data_directories`\u001b[0m [\u001b[0m\u001b[1m\u001b[34mrun_tasks_base\u001b[0m]\u001b[0m\n", "\n", "\u001b[2m2025-09-07T14:35:09.661133\u001b[0m [\u001b[32m\u001b[1minfo \u001b[0m] \u001b[1mPipeline run completed: `576f15b1-6366-5079-b586-01bf92a45a1d`\u001b[0m [\u001b[0m\u001b[1m\u001b[34mrun_tasks_with_telemetry()\u001b[0m]\u001b[0m\n", "\n", "\u001b[2m2025-09-07T14:35:09.685106\u001b[0m [\u001b[32m\u001b[1minfo \u001b[0m] \u001b[1mPipeline run started: `be77ae78-61ae-5066-8df8-04ba903dbe6d`\u001b[0m [\u001b[0m\u001b[1m\u001b[34mrun_tasks_with_telemetry()\u001b[0m]\u001b[0m\n", "\n", "\u001b[2m2025-09-07T14:35:09.685571\u001b[0m [\u001b[32m\u001b[1minfo \u001b[0m] \u001b[1mCoroutine task started: `classify_documents`\u001b[0m [\u001b[0m\u001b[1m\u001b[34mrun_tasks_base\u001b[0m]\u001b[0m\n", "\n", "\u001b[2m2025-09-07T14:35:09.685911\u001b[0m [\u001b[32m\u001b[1minfo \u001b[0m] \u001b[1mCoroutine task started: `check_permissions_on_dataset`\u001b[0m 
[\u001b[0m\u001b[1m\u001b[34mrun_tasks_base\u001b[0m]\u001b[0m\n", "\n", "\u001b[2m2025-09-07T14:35:09.693221\u001b[0m [\u001b[32m\u001b[1minfo \u001b[0m] \u001b[1mAsync Generator task started: `extract_chunks_from_documents`\u001b[0m [\u001b[0m\u001b[1m\u001b[34mrun_tasks_base\u001b[0m]\u001b[0m\n", "\n", "\u001b[2m2025-09-07T14:35:09.721431\u001b[0m [\u001b[32m\u001b[1minfo \u001b[0m] \u001b[1mCoroutine task started: `extract_events_and_timestamps`\u001b[0m [\u001b[0m\u001b[1m\u001b[34mrun_tasks_base\u001b[0m]\u001b[0m\n", "\n", "\u001b[2m2025-09-07T14:35:24.808308\u001b[0m [\u001b[32m\u001b[1minfo \u001b[0m] \u001b[1mCoroutine task started: `extract_knowledge_graph_from_events`\u001b[0m [\u001b[0m\u001b[1m\u001b[34mrun_tasks_base\u001b[0m]\u001b[0m\n", "\n", "\u001b[2m2025-09-07T14:36:10.235598\u001b[0m [\u001b[32m\u001b[1minfo \u001b[0m] \u001b[1mCoroutine task started: `add_data_points`\u001b[0m [\u001b[0m\u001b[1m\u001b[34mrun_tasks_base\u001b[0m]\u001b[0m\n", "\n", "\u001b[2m2025-09-07T14:36:18.963698\u001b[0m [\u001b[32m\u001b[1minfo \u001b[0m] \u001b[1mRetrieved 202 nodes and 335 edges in 0.04 seconds\u001b[0m [\u001b[0m\u001b[1m\u001b[34mNeo4jAdapter\u001b[0m]\u001b[0m\n", "\n", "\u001b[2m2025-09-07T14:36:20.167605\u001b[0m [\u001b[32m\u001b[1minfo \u001b[0m] \u001b[1mCoroutine task completed: `add_data_points`\u001b[0m [\u001b[0m\u001b[1m\u001b[34mrun_tasks_base\u001b[0m]\u001b[0m\n", "\n", "\u001b[2m2025-09-07T14:36:20.168080\u001b[0m [\u001b[32m\u001b[1minfo \u001b[0m] \u001b[1mCoroutine task completed: `extract_knowledge_graph_from_events`\u001b[0m [\u001b[0m\u001b[1m\u001b[34mrun_tasks_base\u001b[0m]\u001b[0m\n", "\n", "\u001b[2m2025-09-07T14:36:20.168371\u001b[0m [\u001b[32m\u001b[1minfo \u001b[0m] \u001b[1mCoroutine task completed: `extract_events_and_timestamps`\u001b[0m [\u001b[0m\u001b[1m\u001b[34mrun_tasks_base\u001b[0m]\u001b[0m\n", "\n", "\u001b[2m2025-09-07T14:36:20.185430\u001b[0m [\u001b[32m\u001b[1minfo \u001b[0m] \u001b[1mCoroutine task 
started: `extract_events_and_timestamps`\u001b[0m [\u001b[0m\u001b[1m\u001b[34mrun_tasks_base\u001b[0m]\u001b[0m\n", "\n", "\u001b[2m2025-09-07T14:36:22.765687\u001b[0m [\u001b[32m\u001b[1minfo \u001b[0m] \u001b[1mCoroutine task started: `extract_knowledge_graph_from_events`\u001b[0m [\u001b[0m\u001b[1m\u001b[34mrun_tasks_base\u001b[0m]\u001b[0m\n", "\n", "\u001b[2m2025-09-07T14:36:22.767063\u001b[0m [\u001b[32m\u001b[1minfo \u001b[0m] \u001b[1mCoroutine task started: `add_data_points`\u001b[0m [\u001b[0m\u001b[1m\u001b[34mrun_tasks_base\u001b[0m]\u001b[0m\n", "\n", "\u001b[2m2025-09-07T14:36:24.039208\u001b[0m [\u001b[32m\u001b[1minfo \u001b[0m] \u001b[1mRetrieved 203 nodes and 337 edges in 0.02 seconds\u001b[0m [\u001b[0m\u001b[1m\u001b[34mNeo4jAdapter\u001b[0m]\u001b[0m\n", "\n", "\u001b[2m2025-09-07T14:36:25.499226\u001b[0m [\u001b[32m\u001b[1minfo \u001b[0m] \u001b[1mCoroutine task completed: `add_data_points`\u001b[0m [\u001b[0m\u001b[1m\u001b[34mrun_tasks_base\u001b[0m]\u001b[0m\n", "\n", "\u001b[2m2025-09-07T14:36:25.500201\u001b[0m [\u001b[32m\u001b[1minfo \u001b[0m] \u001b[1mCoroutine task completed: `extract_knowledge_graph_from_events`\u001b[0m [\u001b[0m\u001b[1m\u001b[34mrun_tasks_base\u001b[0m]\u001b[0m\n", "\n", "\u001b[2m2025-09-07T14:36:25.500492\u001b[0m [\u001b[32m\u001b[1minfo \u001b[0m] \u001b[1mCoroutine task completed: `extract_events_and_timestamps`\u001b[0m [\u001b[0m\u001b[1m\u001b[34mrun_tasks_base\u001b[0m]\u001b[0m\n", "\n", "\u001b[2m2025-09-07T14:36:25.514153\u001b[0m [\u001b[32m\u001b[1minfo \u001b[0m] \u001b[1mCoroutine task started: `extract_events_and_timestamps`\u001b[0m [\u001b[0m\u001b[1m\u001b[34mrun_tasks_base\u001b[0m]\u001b[0m\n", "\n", "\u001b[2m2025-09-07T14:36:27.574764\u001b[0m [\u001b[32m\u001b[1minfo \u001b[0m] \u001b[1mCoroutine task started: `extract_knowledge_graph_from_events`\u001b[0m [\u001b[0m\u001b[1m\u001b[34mrun_tasks_base\u001b[0m]\u001b[0m\n", "\n", "\u001b[2m2025-09-07T14:36:27.576271\u001b[0m 
[\u001b[32m\u001b[1minfo \u001b[0m] \u001b[1mCoroutine task started: `add_data_points`\u001b[0m [\u001b[0m\u001b[1m\u001b[34mrun_tasks_base\u001b[0m]\u001b[0m\n", "\n", "\u001b[2m2025-09-07T14:36:28.552555\u001b[0m [\u001b[32m\u001b[1minfo \u001b[0m] \u001b[1mRetrieved 204 nodes and 339 edges in 0.04 seconds\u001b[0m [\u001b[0m\u001b[1m\u001b[34mNeo4jAdapter\u001b[0m]\u001b[0m\n", "\n", "\u001b[2m2025-09-07T14:36:29.917238\u001b[0m [\u001b[32m\u001b[1minfo \u001b[0m] \u001b[1mCoroutine task completed: `add_data_points`\u001b[0m [\u001b[0m\u001b[1m\u001b[34mrun_tasks_base\u001b[0m]\u001b[0m\n", "\n", "\u001b[2m2025-09-07T14:36:29.917690\u001b[0m [\u001b[32m\u001b[1minfo \u001b[0m] \u001b[1mCoroutine task completed: `extract_knowledge_graph_from_events`\u001b[0m [\u001b[0m\u001b[1m\u001b[34mrun_tasks_base\u001b[0m]\u001b[0m\n", "\n", "\u001b[2m2025-09-07T14:36:29.918038\u001b[0m [\u001b[32m\u001b[1minfo \u001b[0m] \u001b[1mCoroutine task completed: `extract_events_and_timestamps`\u001b[0m [\u001b[0m\u001b[1m\u001b[34mrun_tasks_base\u001b[0m]\u001b[0m\n", "\n", "\u001b[2m2025-09-07T14:36:29.931887\u001b[0m [\u001b[32m\u001b[1minfo \u001b[0m] \u001b[1mCoroutine task started: `extract_events_and_timestamps`\u001b[0m [\u001b[0m\u001b[1m\u001b[34mrun_tasks_base\u001b[0m]\u001b[0m\n", "\n", "\u001b[2m2025-09-07T14:36:50.418281\u001b[0m [\u001b[32m\u001b[1minfo \u001b[0m] \u001b[1mCoroutine task started: `extract_knowledge_graph_from_events`\u001b[0m [\u001b[0m\u001b[1m\u001b[34mrun_tasks_base\u001b[0m]\u001b[0m\n", "\n", "\u001b[2m2025-09-07T14:37:31.380478\u001b[0m [\u001b[32m\u001b[1minfo \u001b[0m] \u001b[1mCoroutine task started: `add_data_points`\u001b[0m [\u001b[0m\u001b[1m\u001b[34mrun_tasks_base\u001b[0m]\u001b[0m\n", "\n", "\u001b[2m2025-09-07T14:37:36.217182\u001b[0m [\u001b[32m\u001b[1minfo \u001b[0m] \u001b[1mRetrieved 332 nodes and 535 edges in 0.04 seconds\u001b[0m [\u001b[0m\u001b[1m\u001b[34mNeo4jAdapter\u001b[0m]\u001b[0m\n", "\n", 
"\u001b[2m2025-09-07T14:37:37.608817\u001b[0m [\u001b[32m\u001b[1minfo \u001b[0m] \u001b[1mCoroutine task completed: `add_data_points`\u001b[0m [\u001b[0m\u001b[1m\u001b[34mrun_tasks_base\u001b[0m]\u001b[0m\n", "\n", "\u001b[2m2025-09-07T14:37:37.610610\u001b[0m [\u001b[32m\u001b[1minfo \u001b[0m] \u001b[1mCoroutine task completed: `extract_knowledge_graph_from_events`\u001b[0m [\u001b[0m\u001b[1m\u001b[34mrun_tasks_base\u001b[0m]\u001b[0m\n", "\n", "\u001b[2m2025-09-07T14:37:37.610882\u001b[0m [\u001b[32m\u001b[1minfo \u001b[0m] \u001b[1mCoroutine task completed: `extract_events_and_timestamps`\u001b[0m [\u001b[0m\u001b[1m\u001b[34mrun_tasks_base\u001b[0m]\u001b[0m\n", "\n", "\u001b[2m2025-09-07T14:37:37.623203\u001b[0m [\u001b[32m\u001b[1minfo \u001b[0m] \u001b[1mCoroutine task started: `extract_events_and_timestamps`\u001b[0m [\u001b[0m\u001b[1m\u001b[34mrun_tasks_base\u001b[0m]\u001b[0m\n", "\n", "\u001b[2m2025-09-07T14:37:59.333307\u001b[0m [\u001b[32m\u001b[1minfo \u001b[0m] \u001b[1mCoroutine task started: `extract_knowledge_graph_from_events`\u001b[0m [\u001b[0m\u001b[1m\u001b[34mrun_tasks_base\u001b[0m]\u001b[0m\n", "\n", "\u001b[2m2025-09-07T14:38:30.317584\u001b[0m [\u001b[32m\u001b[1minfo \u001b[0m] \u001b[1mCoroutine task started: `add_data_points`\u001b[0m [\u001b[0m\u001b[1m\u001b[34mrun_tasks_base\u001b[0m]\u001b[0m\n", "\n", "\u001b[2m2025-09-07T14:38:36.143492\u001b[0m [\u001b[32m\u001b[1minfo \u001b[0m] \u001b[1mRetrieved 415 nodes and 697 edges in 0.05 seconds\u001b[0m [\u001b[0m\u001b[1m\u001b[34mNeo4jAdapter\u001b[0m]\u001b[0m\n", "\n", "\u001b[2m2025-09-07T14:38:37.703572\u001b[0m [\u001b[32m\u001b[1minfo \u001b[0m] \u001b[1mCoroutine task completed: `add_data_points`\u001b[0m [\u001b[0m\u001b[1m\u001b[34mrun_tasks_base\u001b[0m]\u001b[0m\n", "\n", "\u001b[2m2025-09-07T14:38:37.704043\u001b[0m [\u001b[32m\u001b[1minfo \u001b[0m] \u001b[1mCoroutine task completed: `extract_knowledge_graph_from_events`\u001b[0m 
[\u001b[0m\u001b[1m\u001b[34mrun_tasks_base\u001b[0m]\u001b[0m\n", "\n", "\u001b[2m2025-09-07T14:38:37.704434\u001b[0m [\u001b[32m\u001b[1minfo \u001b[0m] \u001b[1mCoroutine task completed: `extract_events_and_timestamps`\u001b[0m [\u001b[0m\u001b[1m\u001b[34mrun_tasks_base\u001b[0m]\u001b[0m\n", "\n", "\u001b[2m2025-09-07T14:38:37.705143\u001b[0m [\u001b[32m\u001b[1minfo \u001b[0m] \u001b[1mCoroutine task started: `extract_events_and_timestamps`\u001b[0m [\u001b[0m\u001b[1m\u001b[34mrun_tasks_base\u001b[0m]\u001b[0m\n", "\n", "\u001b[2m2025-09-07T14:39:07.687322\u001b[0m [\u001b[32m\u001b[1minfo \u001b[0m] \u001b[1mCoroutine task started: `extract_knowledge_graph_from_events`\u001b[0m [\u001b[0m\u001b[1m\u001b[34mrun_tasks_base\u001b[0m]\u001b[0m\n", "\n", "\u001b[2m2025-09-07T14:39:41.229696\u001b[0m [\u001b[32m\u001b[1minfo \u001b[0m] \u001b[1mCoroutine task started: `add_data_points`\u001b[0m [\u001b[0m\u001b[1m\u001b[34mrun_tasks_base\u001b[0m]\u001b[0m\n", "\n", "\u001b[2m2025-09-07T14:39:45.991597\u001b[0m [\u001b[32m\u001b[1minfo \u001b[0m] \u001b[1mRetrieved 513 nodes and 872 edges in 0.06 seconds\u001b[0m [\u001b[0m\u001b[1m\u001b[34mNeo4jAdapter\u001b[0m]\u001b[0m\n", "\n", "\u001b[2m2025-09-07T14:39:47.641833\u001b[0m [\u001b[32m\u001b[1minfo \u001b[0m] \u001b[1mCoroutine task completed: `add_data_points`\u001b[0m [\u001b[0m\u001b[1m\u001b[34mrun_tasks_base\u001b[0m]\u001b[0m\n", "\n", "\u001b[2m2025-09-07T14:39:47.642446\u001b[0m [\u001b[32m\u001b[1minfo \u001b[0m] \u001b[1mCoroutine task completed: `extract_knowledge_graph_from_events`\u001b[0m [\u001b[0m\u001b[1m\u001b[34mrun_tasks_base\u001b[0m]\u001b[0m\n", "\n", "\u001b[2m2025-09-07T14:39:47.642764\u001b[0m [\u001b[32m\u001b[1minfo \u001b[0m] \u001b[1mCoroutine task completed: `extract_events_and_timestamps`\u001b[0m [\u001b[0m\u001b[1m\u001b[34mrun_tasks_base\u001b[0m]\u001b[0m\n", "\n", "\u001b[2m2025-09-07T14:39:47.646206\u001b[0m [\u001b[32m\u001b[1minfo \u001b[0m] \u001b[1mAsync Generator 
task completed: `extract_chunks_from_documents`\u001b[0m [\u001b[0m\u001b[1m\u001b[34mrun_tasks_base\u001b[0m]\u001b[0m\n", "\n", "\u001b[2m2025-09-07T14:39:47.646490\u001b[0m [\u001b[32m\u001b[1minfo \u001b[0m] \u001b[1mCoroutine task completed: `check_permissions_on_dataset`\u001b[0m [\u001b[0m\u001b[1m\u001b[34mrun_tasks_base\u001b[0m]\u001b[0m\n", "\n", "\u001b[2m2025-09-07T14:39:47.646738\u001b[0m [\u001b[32m\u001b[1minfo \u001b[0m] \u001b[1mCoroutine task completed: `classify_documents`\u001b[0m [\u001b[0m\u001b[1m\u001b[34mrun_tasks_base\u001b[0m]\u001b[0m\n", "\n", "\u001b[2m2025-09-07T14:39:47.647065\u001b[0m [\u001b[32m\u001b[1minfo \u001b[0m] \u001b[1mPipeline run completed: `be77ae78-61ae-5066-8df8-04ba903dbe6d`\u001b[0m [\u001b[0m\u001b[1m\u001b[34mrun_tasks_with_telemetry()\u001b[0m]\u001b[0m\n", "\n", "\u001b[2m2025-09-07T14:39:47.722293\u001b[0m [\u001b[32m\u001b[1minfo \u001b[0m] \u001b[1mRetrieved 513 nodes and 872 edges in 0.05 seconds\u001b[0m [\u001b[0m\u001b[1m\u001b[34mNeo4jAdapter\u001b[0m]\u001b[0m\n", "\n", "\u001b[2m2025-09-07T14:39:47.727445\u001b[0m [\u001b[32m\u001b[1minfo \u001b[0m] \u001b[1mGraph projection completed: 513 nodes, 872 edges in 0.05s\u001b[0m [\u001b[0m\u001b[1m\u001b[34mCogneeGraph\u001b[0m]\u001b[0m\n", "\n", "\u001b[2m2025-09-07T14:39:48.085263\u001b[0m [\u001b[32m\u001b[1minfo \u001b[0m] \u001b[1mVector collection retrieval completed: Retrieved distances from 6 collections in 0.04s\u001b[0m [\u001b[0m\u001b[1m\u001b[34mcognee.shared.logging_utils\u001b[0m]\u001b[0m\n" ] } ], "source": [ "import cognee\n", "\n", "result = await cognee.add(\n", " \"file://data/guido_contributions.json\",\n", " node_set=[\"guido_data\"]\n", ")\n", "await cognee.cognify(temporal_cognify=True)\n", "results = await cognee.search(\"Show me commits\")\n", "print(results[0])" ] }, { "cell_type": "code", "execution_count": null, "id": "f08b362cbf12b398", "metadata": { "ExecuteTime": { "end_time": "2025-09-07T14:39:53.561679Z", "start_time": 
"2025-09-07T14:39:53.559528Z" } }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Showing commits from the provided context.\n" ] } ], "source": [ "# This cell has been merged with the previous cell for self-containment" ] }, { "cell_type": "markdown", "id": "10d582d02ead905e", "metadata": {}, "source": [ "### What just happened?\n", "The `search()` function queried, in natural language, the knowledge graph that `cognify()` built from Guido's development history.\n", "Because the data is stored as a graph rather than as isolated records, search can draw on the relationships between commits, language features, design decisions, and how they evolved over time.\n", "\n", "Cognee can also visualize the graph it created:" ] }, { "cell_type": "code", "execution_count": 7, "id": "1fb068f422bda6cf", "metadata": { "ExecuteTime": { "end_time": "2025-09-07T14:39:53.688017Z", "start_time": "2025-09-07T14:39:53.598467Z" } }, "outputs": [ { "name": "stderr", "output_type": "stream", "text": [ "\n", "\u001b[2m2025-09-07T14:39:53.671009\u001b[0m [\u001b[32m\u001b[1minfo \u001b[0m] \u001b[1mRetrieved 513 nodes and 872 edges in 0.06 seconds\u001b[0m [\u001b[0m\u001b[1m\u001b[34mNeo4jAdapter\u001b[0m]\u001b[0m\n", "\n", "\u001b[2m2025-09-07T14:39:53.676478\u001b[0m [\u001b[32m\u001b[1minfo \u001b[0m] \u001b[1mGraph visualization saved as ./guido_contributions.html\u001b[0m [\u001b[0m\u001b[1m\u001b[34mcognee.shared.logging_utils\u001b[0m]\u001b[0m\n", "\n", "\u001b[2m2025-09-07T14:39:53.677322\u001b[0m [\u001b[32m\u001b[1minfo \u001b[0m] \u001b[1mThe HTML file has been stored at path: ./guido_contributions.html\u001b[0m [\u001b[0m\u001b[1m\u001b[34mcognee.shared.logging_utils\u001b[0m]\u001b[0m\n" ] }, { "data": { "text/plain": [ "'\\n \\n \\n 
\\n \\n \\n \\n \\n \\n \\n \\n \\n\\n \\n \\n \\n '" ] }, "execution_count": 7, "metadata": {}, "output_type": "execute_result" } ], "source": [ "from cognee import visualize_graph\n", "await visualize_graph('./guido_contributions.html')" ] }, { "cell_type": "code", "execution_count": 8, "id": "f24341c97d6eaccb", "metadata": { "ExecuteTime": { "end_time": "2025-09-07T14:39:53.733197Z", "start_time": "2025-09-07T14:39:53.729922Z" } }, "outputs": [ { "data": { "text/html": [ "\n", " \n", " " ], "text/plain": [ "