{ "cells": [ { "metadata": {}, "cell_type": "markdown", "source": [ "# Turn expert Python know-how into an AI memory that makes your own code better\n", "\n", "\n", "### Learning from Guido van Rossum, the creator of Python" ], "id": "6f22c8fe6d92cfcc" }, { "metadata": {}, "cell_type": "markdown", "source": [ "## What You'll Learn\n", "\n", "In this comprehensive tutorial, you'll discover how to transform scattered development data into an intelligent knowledge system that enhances your coding workflow. By the end, you'll have:\n", "\n", "- **Connected disparate data sources** (Guido's CPython contributions, mypy development, PEP discussions, your Python projects) into a unified AI memory graph\n", "- **Built a memory layer** that understands Python design philosophy and coding patterns\n", "- **Learned to use intelligent search capabilities** that surface Pythonic solutions when you need them most\n", "- **Integrated everything with your coding environment** through MCP (Model Context Protocol)\n", "\n", "This tutorial demonstrates the power of **knowledge graphs** and **retrieval-augmented generation (RAG)** for software development, showing you how to build systems that learn from Python's creator and improve your own Python development."
], "id": "fe69acbf9ab1a22b" }, { "metadata": {}, "cell_type": "markdown", "source": [ "## Understanding the Cognee Ecosystem\n", "\n", "Before we dive in, let's understand the key components we'll be working with:\n", "\n", "### Core Cognee Functions\n", "\n", "- **`cognee.add()`** - Ingests raw data (files, text, APIs) into the system\n", "- **`cognee.cognify()`** - Processes and structures data into a knowledge graph using AI\n", "- **`cognee.search()`** - Queries the knowledge graph with natural language or Cypher\n", "- **`cognee.visualize()`** - Creates interactive graph visualizations\n", "- **`cognee.memify()`** - Cognee's \"secret sauce\" that infers implicit connections and rules from your data" ], "id": "b03b59c064213dd4" }, { "metadata": {}, "cell_type": "markdown", "source": [ "### Why This Approach Works\n", "\n", "Traditional coding assistants lack context about your specific projects, rules, and past decisions. By building a knowledge graph of development history from Python's creator, we create an AI that understands:\n", "\n", "- Guido's coding patterns and Python design philosophy\n", "- How language features evolved and why\n", "- Best practices derived from decades of Python development\n", "- Connections between different aspects of Python programming" ], "id": "6a7669fbb6a3e6c7" }, { "metadata": {}, "cell_type": "markdown", "source": [ "### Exploring Guido's Python Contributions\n", "\n", "We'll begin with a pre-loaded dataset containing Guido van Rossum's contributions to the Python ecosystem. 
This demonstrates how cognee structures and connects development activity from Python's creator.\n", "\n", "Take a moment to explore the repositories Guido contributed to in the graph visualization." ], "id": "93c9783037715026" }, { "metadata": { "ExecuteTime": { "end_time": "2025-09-07T09:57:30.827241Z", "start_time": "2025-09-07T09:57:27.209811Z" } }, "cell_type": "code", "source": [ "import cognee\n", "import os\n", "from pathlib import Path\n", "\n", "print('🔍 Quick Cognee Import Check')\n", "print('=' * 30)\n", "print(f'📍 Cognee location: {cognee.__file__}')\n", "print(f'📁 Package directory: {os.path.dirname(cognee.__file__)}')\n", "\n", "# Check if it's local or installed\n", "current_dir = Path.cwd()\n", "cognee_path = Path(cognee.__file__)\n", "if current_dir in cognee_path.parents:\n", "    print('🏠 Status: LOCAL DEVELOPMENT VERSION')\n", "else:\n", "    print('📦 Status: INSTALLED PACKAGE')" ], "id": "9386ecb596860399", "outputs": [ { "name": "stderr", "output_type": "stream", "text": [ "\n", "\u001B[2m2025-09-07T09:57:28.419783\u001B[0m [\u001B[32m\u001B[1minfo \u001B[0m] \u001B[1mDeleted old log file: /Users/lazar/PycharmProjects/cognee/logs/2025-09-04_09-37-20.log\u001B[0m [\u001B[0m\u001B[1m\u001B[34mcognee.shared.logging_utils\u001B[0m]\u001B[0m\n", "/Users/lazar/PycharmProjects/cognee/.venv/lib/python3.12/site-packages/tqdm/auto.py:21: TqdmWarning: IProgress not found. Please update jupyter and ipywidgets.
See https://ipywidgets.readthedocs.io/en/stable/user_install.html\n", " from .autonotebook import tqdm as notebook_tqdm\n", "\n", "\u001B[2m2025-09-07T09:57:29.163761\u001B[0m [\u001B[32m\u001B[1minfo \u001B[0m] \u001B[1mLogging initialized \u001B[0m [\u001B[0m\u001B[1m\u001B[34mcognee.shared.logging_utils\u001B[0m]\u001B[0m \u001B[36mcognee_version\u001B[0m=\u001B[35m0.2.4-local\u001B[0m \u001B[36mdatabase_path\u001B[0m=\u001B[35m/Users/lazar/PycharmProjects/cognee/cognee/.cognee_system/databases\u001B[0m \u001B[36mgraph_database_name\u001B[0m=\u001B[35m\u001B[0m \u001B[36mos_info\u001B[0m=\u001B[35m'Darwin 24.5.0 (Darwin Kernel Version 24.5.0: Tue Apr 22 19:54:29 PDT 2025; root:xnu-11417.121.6~2/RELEASE_ARM64_T6030)'\u001B[0m \u001B[36mpython_version\u001B[0m=\u001B[35m3.12.8\u001B[0m \u001B[36mrelational_config\u001B[0m=\u001B[35mcognee_db\u001B[0m \u001B[36mstructlog_version\u001B[0m=\u001B[35m25.4.0\u001B[0m \u001B[36mvector_config\u001B[0m=\u001B[35mlancedb\u001B[0m\n", "\n", "\u001B[2m2025-09-07T09:57:29.164208\u001B[0m [\u001B[32m\u001B[1minfo \u001B[0m] \u001B[1mDatabase storage: /Users/lazar/PycharmProjects/cognee/cognee/.cognee_system/databases\u001B[0m [\u001B[0m\u001B[1m\u001B[34mcognee.shared.logging_utils\u001B[0m]\u001B[0m\n" ] }, { "name": "stdout", "output_type": "stream", "text": [ "🔍 Quick Cognee Import Check\n", "==============================\n", "📍 Cognee location: /Users/lazar/PycharmProjects/cognee/cognee/__init__.py\n", "📁 Package directory: /Users/lazar/PycharmProjects/cognee/cognee\n", "📦 Status: INSTALLED PACKAGE\n" ] } ], "execution_count": 1 }, { "metadata": { "ExecuteTime": { "end_time": "2025-09-07T09:59:43.525416Z", "start_time": "2025-09-07T09:59:43.522480Z" } }, "cell_type": "code", "source": [ "import sys\n", "from pathlib import Path\n", "notebook_dir = Path.cwd()\n", "if notebook_dir.name == 'notebooks':\n", " project_root = notebook_dir.parent\n", "else:\n", " project_root = Path.cwd()\n", "\n", "# Add project root to the 
beginning of sys.path\n", "project_root_str = str(project_root.absolute())\n", "if project_root_str not in sys.path:\n", " sys.path.insert(0, project_root_str)\n", "\n", "print(f\"📁 Project root: {project_root_str}\")" ], "id": "19e74e6b691020db", "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "📁 Project root: /Users/lazar/PycharmProjects/cognee\n" ] } ], "execution_count": 2 }, { "metadata": { "ExecuteTime": { "end_time": "2025-09-07T10:00:14.685409Z", "start_time": "2025-09-07T10:00:12.419145Z" } }, "cell_type": "code", "source": [ "import cognee\n", "\n", "\n", "\n", "result = await cognee.add(\"file://data/guido_contributions.json\", node_set=[\"guido\"])\n", "await cognee.cognify()\n", "results = await cognee.search(\"Show me commits\")" ], "id": "b8743ed520b4de37", "outputs": [ { "name": "stderr", "output_type": "stream", "text": [ "\n", "\u001B[2m2025-09-07T10:00:12.505721\u001B[0m [\u001B[32m\u001B[1minfo \u001B[0m] \u001B[1mOntology file 'None' not found. No owl ontology will be attached to the graph.\u001B[0m [\u001B[0m\u001B[1m\u001B[34mOntologyAdapter\u001B[0m]\u001B[0m\n", "\n", "\u001B[2m2025-09-07T10:00:12.578388\u001B[0m [\u001B[32m\u001B[1minfo \u001B[0m] \u001B[1mRetrieved 85 nodes and 182 edges in 0.02 seconds\u001B[0m [\u001B[0m\u001B[1m\u001B[34mNeo4jAdapter\u001B[0m]\u001B[0m\n", "\n", "\u001B[2m2025-09-07T10:00:12.580139\u001B[0m [\u001B[32m\u001B[1minfo \u001B[0m] \u001B[1mGraph projection completed: 85 nodes, 182 edges in 0.03s\u001B[0m [\u001B[0m\u001B[1m\u001B[34mCogneeGraph\u001B[0m]\u001B[0m\n", "\n", "\u001B[2m2025-09-07T10:00:12.958731\u001B[0m [\u001B[32m\u001B[1minfo \u001B[0m] \u001B[1mVector collection retrieval completed: Retrieved distances from 6 collections in 0.03s\u001B[0m [\u001B[0m\u001B[1m\u001B[34mcognee.shared.logging_utils\u001B[0m]\u001B[0m\n" ] } ], "execution_count": 6 }, { "metadata": { "ExecuteTime": { "end_time": "2025-09-07T10:00:16.283950Z", "start_time": "2025-09-07T10:00:16.280509Z" } }, 
"cell_type": "code", "source": "results[0]", "id": "f08b362cbf12b398", "outputs": [ { "data": { "text/plain": [ "'Here are the commits from the provided context:'" ] }, "execution_count": 7, "metadata": {}, "output_type": "execute_result" } ], "execution_count": 7 }, { "metadata": {}, "cell_type": "markdown", "source": "What's happening here? The search() function uses natural language to query a knowledge graph containing Guido's development history. Unlike traditional databases, this understands the relationships between commits, language features, design decisions, and evolution over time.", "id": "10d582d02ead905e" }, { "metadata": { "ExecuteTime": { "end_time": "2025-09-07T10:04:42.969912Z", "start_time": "2025-09-07T10:04:42.898473Z" } }, "cell_type": "code", "source": [ "from cognee import visualize_graph\n", "\n", "await visualize_graph('./guido_contributions.html')" ], "id": "1fb068f422bda6cf", "outputs": [ { "name": "stderr", "output_type": "stream", "text": [ "\n", "\u001B[2m2025-09-07T10:04:42.959421\u001B[0m [\u001B[32m\u001B[1minfo \u001B[0m] \u001B[1mRetrieved 85 nodes and 182 edges in 0.03 seconds\u001B[0m [\u001B[0m\u001B[1m\u001B[34mNeo4jAdapter\u001B[0m]\u001B[0m\n", "\n", "\u001B[2m2025-09-07T10:04:42.963711\u001B[0m [\u001B[32m\u001B[1minfo \u001B[0m] \u001B[1mGraph visualization saved as ./guido_contributions.html\u001B[0m [\u001B[0m\u001B[1m\u001B[34mcognee.shared.logging_utils\u001B[0m]\u001B[0m\n", "\n", "\u001B[2m2025-09-07T10:04:42.964638\u001B[0m [\u001B[32m\u001B[1minfo \u001B[0m] \u001B[1mThe HTML file has been stored at path: ./guido_contributions.html\u001B[0m [\u001B[0m\u001B[1m\u001B[34mcognee.shared.logging_utils\u001B[0m]\u001B[0m\n" ] }, { "data": { "text/plain": [ "'\\n \\n \\n \\n \\n \\n \\n \\n \\n \\n
\\n \\n\\n \\n \\n \\n \\n \\n '" ] }, "execution_count": 12, "metadata": {}, "output_type": "execute_result" } ], "execution_count": 12 }, { "metadata": { "ExecuteTime": { "end_time": "2025-09-07T10:04:24.449814Z", "start_time": "2025-09-07T10:04:24.443869Z" } }, "cell_type": "code", "source": [ "from IPython.display import IFrame, display\n", "display(IFrame(\"./guido_contributions.html\", width=\"100%\", height=\"500\"))" ], "id": "f24341c97d6eaccb", "outputs": [ { "data": { "text/plain": [ "" ], "text/html": [ "\n", " \n", " " ] }, "metadata": {}, "output_type": "display_data" } ], "execution_count": 11 }, { "metadata": {}, "cell_type": "markdown", "source": [ "**Why visualization matters:** Knowledge graphs reveal hidden patterns in Python's development. The interactive visualization shows how different projects (CPython, mypy, PEPs), features, and time periods connect - insights that show Python's thoughtful evolution.\n", "\n", "Take a moment to explore the graph. Notice how:\n", "\n", "- CPython core development clusters around 2020\n", "- Mypy contributions focus on fixtures and run classes\n", "- PEP discussions mention Thomas Grainiger and Adam Turner\n", "- Time-based connections show how ideas evolved into features" ], "id": "3418aa17bf35e3bb" }, { "metadata": {}, "cell_type": "markdown", "source": "Now we'll add the remaining data and see how connections emerge between Guido's contributions, Python best practices, and user conversations.", "id": "5e8d9094a09ae05d" }, { "metadata": { "ExecuteTime": { "end_time": "2025-09-07T11:29:01.564485Z", "start_time": "2025-09-07T11:29:01.371325Z" } }, "cell_type": "code", "source": [ "await cognee.add(\"file://data/copilot_conversations.json\", node_set=[\"conversation_logs\"])\n", "await cognee.add(\"file://data/my_developer_rules.md\", node_set=[\"repository_data\"])\n", "await cognee.add(\"file://data/zen_principles.md\", node_set=[\"repository_data\"])\n", "await
cognee.add(\"file://data/pep_style_guide.md\", node_set=[\"repository_data\"])" ], "id": "5315318324968f0f", "outputs": [ { "data": { "text/plain": [ "PipelineRunCompleted(status='PipelineRunCompleted', pipeline_run_id=UUID('525400dd-b28e-59bf-aee9-ff7338a16159'), dataset_id=UUID('7cbac52a-507c-5a7d-aea8-91cc6a696792'), dataset_name='main_dataset', payload=None, data_ingestion_info=[{'run_info': PipelineRunAlreadyCompleted(status='PipelineRunAlreadyCompleted', pipeline_run_id=UUID('525400dd-b28e-59bf-aee9-ff7338a16159'), dataset_id=UUID('7cbac52a-507c-5a7d-aea8-91cc6a696792'), dataset_name='main_dataset', payload=None, data_ingestion_info=None), 'data_id': UUID('c2edae28-6c2f-5673-b354-b5376e88c6b1')}])" ] }, "execution_count": 19, "metadata": {}, "output_type": "execute_result" } ], "execution_count": 19 }, { "metadata": {}, "cell_type": "markdown", "source": "", "id": "98b9b613d6d11bc5" }, { "metadata": { "ExecuteTime": { "end_time": "2025-09-07T10:07:59.212671Z", "start_time": "2025-09-07T10:07:56.314960Z" } }, "cell_type": "code", "source": [ "# Process the newly added data into the graph before querying it\n", "await cognee.cognify()\n", "\n", "# search() is a coroutine, so it must be awaited\n", "results = await cognee.search(\"What Python type hinting challenges did I face, and how does Guido approach similar problems in mypy?\")\n", "print(results)" ], "id": "98b69c45db2fca3", "outputs": [ { "name": "stderr", "output_type": "stream", "text": [ "\n", "\u001B[2m2025-09-07T10:07:56.373046\u001B[0m [\u001B[32m\u001B[1minfo \u001B[0m] \u001B[1mRetrieved 147 nodes and 322 edges in 0.03 seconds\u001B[0m [\u001B[0m\u001B[1m\u001B[34mNeo4jAdapter\u001B[0m]\u001B[0m\n", "\n", "\u001B[2m2025-09-07T10:07:56.375530\u001B[0m [\u001B[32m\u001B[1minfo \u001B[0m] \u001B[1mGraph projection completed: 147 nodes, 322 edges in 0.04s\u001B[0m [\u001B[0m\u001B[1m\u001B[34mCogneeGraph\u001B[0m]\u001B[0m\n", "\n", "\u001B[2m2025-09-07T10:07:56.739044\u001B[0m [\u001B[32m\u001B[1minfo \u001B[0m] \u001B[1mVector collection retrieval completed: Retrieved distances from 6 collections in 0.02s\u001B[0m
[\u001B[0m\u001B[1m\u001B[34mcognee.shared.logging_utils\u001B[0m]\u001B[0m\n" ] }, { "name": "stdout", "output_type": "stream", "text": [ "[\"Using the provided context, I'll answer briefly about your type-hinting challenges and how Guido (in mypy) approaches similar problems.\"]\n" ] }, { "name": "stderr", "output_type": "stream", "text": [ "/var/folders/b6/fdg52thn3h309cbg6kv5nryc0000gn/T/ipykernel_75109/564786944.py:1: RuntimeWarning: coroutine 'search' was never awaited\n", " results = await cognee.search(\"What Python type hinting challenges did I face, and how does Guido approach similar problems in mypy?\")\n", "RuntimeWarning: Enable tracemalloc to get the object allocation traceback\n" ] } ], "execution_count": 18 }, { "metadata": {}, "cell_type": "markdown", "source": [ "You'll see that cognee has connected your Python development challenges with Guido's approaches, revealing patterns like:\n", "\n", "- \"Type hint implementation failed due to circular imports - similar to issue Guido solved in mypy PR #1234\"\n", "- \"Performance bottleneck in list comprehension matches pattern Guido optimized in CPython commit abc123\"" ], "id": "6c49c4c252036fa1" }, { "metadata": {}, "cell_type": "markdown", "source": [ "Let's now introduce the memory functions. These algorithms run on top of your semantic layer, connecting the dots and improving the search.\n", "\n", "Memify is customizable and can use any transformation you'd like to write. 
But it also requires" ], "id": "a1f4606bfed8fc45" }, { "metadata": {}, "cell_type": "code", "outputs": [], "execution_count": null, "source": "await cognee.memify()", "id": "20234960f7566b15" }, { "metadata": {}, "cell_type": "markdown", "source": [ "**What `memify()` does for Python:** This advanced function uses AI to:\n", "\n", "- **Infer rule patterns** from your code (e.g., \"When implementing iterators, always follow the protocol Guido established\")\n", "- **Connect design philosophy to practice** (e.g., linking \"explicit is better than implicit\" to your type hinting decisions)\n" ], "id": "58d3ccec16f67c24" }, { "metadata": {}, "cell_type": "markdown", "source": "Now let's see how the system has connected your Python development patterns with established best practices:\n", "id": "a304033f9f0f5dcf" }, { "metadata": {}, "cell_type": "code", "outputs": [], "execution_count": null, "source": [ "# Search for connections between your async patterns and Python philosophy\n", "# Assumption: this cognee version exposes cognee.SearchType and search() accepts\n", "# query_type; adjust the keyword if your installed version differs\n", "results = await cognee.search(\n", "    \"How does my AsyncWebScraper implementation align with Python's design principles?\",\n", "    query_type=cognee.SearchType.GRAPH_COMPLETION,\n", ")\n", "print(\"Python Pattern Analysis:\", results)" ], "id": "518fa9b17a604657" }, { "metadata": {}, "cell_type": "markdown", "source": "Now let's use cognee's time-awareness feature to see all the events that happened between X and Y.", "id": "c641b8b7e50dd2ae" }, { "metadata": {}, "cell_type": "code", "outputs": [], "execution_count": null, "source": "", "id": "28e7d5a75e076b8f" }, { "metadata": {}, "cell_type": "markdown", "source": "If we are unhappy with a result, we can give feedback to the system so that it doesn't return the same bad answer again.", "id": "ec6cf074a6c272ab" }, { "metadata": {}, "cell_type": "code", "outputs": [], "execution_count": null, "source": "", "id": "67dec85a658aad76" } ], "metadata": { "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": {
"codemirror_mode": { "name": "ipython", "version": 2 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython2", "version": "2.7.6" } }, "nbformat": 4, "nbformat_minor": 5 }