{
"cells": [
{
"cell_type": "markdown",
"id": "d35ac8ce-0f92-46f5-9ba4-a46970f0ce19",
"metadata": {},
"source": [
"# Cognee - Get Started"
]
},
{
"cell_type": "markdown",
"id": "d8e606b1-94d3-43ce-bb4b-dbadff7f4ca6",
"metadata": {},
"source": [
"## How to enable LLM to connect to your data\n",
"\n",
"\n",
"\n",
"\n",
"#### Let's try and convert what is a large amount of unorganized data into structured graph that we can give to LLMs\n",
"#### Here is an example of some data chunks you sent to LLM to get a better answer to a question when sending LLM prompts"
]
},
{
"cell_type": "markdown",
"id": "cb74c44c58f052f1",
"metadata": {},
"source": [
""
]
},
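{
"cell_type": "markdown",
"id": "3f9a1c2b7d8e4a10",
"metadata": {},
"source": [
"Below is a minimal, illustrative sketch (not part of the original demo) of that naive approach: pasting raw text chunks straight into the prompt. The `chunks` list and the commented-out `ask_llm` helper are hypothetical placeholders; they only show how the unstructured context gets stitched together.\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "7c2e5b4f9a1d3e62",
"metadata": {},
"outputs": [],
"source": [
"# A minimal sketch of prompt stuffing with raw, unorganized chunks.\n",
"# `chunks` and `ask_llm` are hypothetical placeholders, not part of cognee.\n",
"chunks = [\n",
"    \"TechNova Solutions is hiring a Senior Data Scientist in San Francisco...\",\n",
"    \"Dr. Emily Carter: 8+ years in ML, Ph.D. from Stanford, TensorFlow/Keras...\",\n",
"    \"David Thompson: Graphic Designer, Adobe Creative Suite, branding...\",\n",
"]\n",
"\n",
"question = \"Which candidates fit the Senior Data Scientist role?\"\n",
"prompt = \"Context:\\n\" + \"\\n\\n\".join(chunks) + \"\\n\\nQuestion: \" + question\n",
"\n",
"# answer = ask_llm(prompt)  # every fact must be rediscovered from flat text on each call\n",
"print(prompt)"
]
},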
{
"cell_type": "markdown",
"id": "fd7d5dab855e3234",
"metadata": {},
"source": [
"#### What if we could let LLM interact in natural language with a structured system like this:"
]
},
{
"cell_type": "markdown",
"id": "683c557088782e47",
"metadata": {},
"source": [
""
]
},
{
"cell_type": "markdown",
"id": "1bf1fa3631dc03ed",
"metadata": {},
"source": [
"#### How we do it\n",
"We use Knowledge Graphs. Knowledge graphs simply map out knowledge, linking specific facts and their connections. When Large Language Models (LLMs) process text, they infer these links, leading to occasional inaccuracies due to their probabilistic nature. Clearly defined relationships enhance their accuracy. This structured approach can extend beyond concepts to document layouts, pages, or other organizational schemas.\n",
"\n"
]
},
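{
"cell_type": "markdown",
"id": "5e8d2a6c1b4f9e73",
"metadata": {},
"source": [
"To make that concrete, here is a tiny illustrative sketch in plain Python (not cognee's API) of what \"clearly defined relationships\" look like: facts stored as explicit subject-relation-object triples instead of free text. During `cognify`, cognee extracts and stores a graph like this for you.\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "9b3c7f1e5d2a8c41",
"metadata": {},
"outputs": [],
"source": [
"# A minimal knowledge-graph sketch using plain Python tuples (illustration only).\n",
"# Each fact is an explicit (subject, relation, object) edge rather than free text.\n",
"triples = [\n",
"    (\"Emily Carter\", \"has_title\", \"Senior Data Scientist\"),\n",
"    (\"Emily Carter\", \"works_at\", \"InnovateAI Labs\"),\n",
"    (\"Emily Carter\", \"skilled_in\", \"TensorFlow\"),\n",
"    (\"TensorFlow\", \"is_a\", \"deep learning framework\"),\n",
"]\n",
"\n",
"# With explicit edges, \"who is skilled in TensorFlow?\" becomes a lookup,\n",
"# not a probabilistic guess over raw text.\n",
"print([s for s, r, o in triples if r == \"skilled_in\" and o == \"TensorFlow\"])"
]
},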
{
"cell_type": "markdown",
"id": "074f0ea8-c659-4736-be26-be4b0e5ac665",
"metadata": {},
"source": [
"### Demo time"
]
},
{
"cell_type": "markdown",
"id": "0587d91d",
"metadata": {},
"source": [
"First let's define some data that we will cognify and perform a search on"
]
},
{
"cell_type": "code",
"execution_count": 2,
"id": "df16431d0f48b006",
"metadata": {
"ExecuteTime": {
"end_time": "2025-06-30T11:38:49.122448Z",
"start_time": "2025-06-30T11:38:49.120056Z"
}
},
"outputs": [],
"source": [
"job_position = \"\"\"Senior Data Scientist (Machine Learning)\n",
"\n",
"Company: TechNova Solutions\n",
"Location: San Francisco, CA\n",
"\n",
"Job Description:\n",
"\n",
"TechNova Solutions is seeking a Senior Data Scientist specializing in Machine Learning to join our dynamic analytics team. The ideal candidate will have a strong background in developing and deploying machine learning models, working with large datasets, and translating complex data into actionable insights.\n",
"\n",
"Responsibilities:\n",
"\n",
"Develop and implement advanced machine learning algorithms and models.\n",
"Analyze large, complex datasets to extract meaningful patterns and insights.\n",
"Collaborate with cross-functional teams to integrate predictive models into products.\n",
"Stay updated with the latest advancements in machine learning and data science.\n",
"Mentor junior data scientists and provide technical guidance.\n",
"Qualifications:\n",
"\n",
"Master’s or Ph.D. in Data Science, Computer Science, Statistics, or a related field.\n",
"5+ years of experience in data science and machine learning.\n",
"Proficient in Python, R, and SQL.\n",
"Experience with deep learning frameworks (e.g., TensorFlow, PyTorch).\n",
"Strong problem-solving skills and attention to detail.\n",
"Candidate CVs\n",
"\"\"\""
]
},
{
"cell_type": "code",
"execution_count": 3,
"id": "9086abf3af077ab4",
"metadata": {
"ExecuteTime": {
"end_time": "2025-06-30T11:38:49.646981Z",
"start_time": "2025-06-30T11:38:49.645059Z"
}
},
"outputs": [],
"source": [
"job_1 = \"\"\"\n",
"CV 1: Relevant\n",
"Name: Dr. Emily Carter\n",
"Contact Information:\n",
"\n",
"Email: emily.carter@example.com\n",
"Phone: (555) 123-4567\n",
"Summary:\n",
"\n",
"Senior Data Scientist with over 8 years of experience in machine learning and predictive analytics. Expertise in developing advanced algorithms and deploying scalable models in production environments.\n",
"\n",
"Education:\n",
"\n",
"Ph.D. in Computer Science, Stanford University (2014)\n",
"B.S. in Mathematics, University of California, Berkeley (2010)\n",
"Experience:\n",
"\n",
"Senior Data Scientist, InnovateAI Labs (2016 – Present)\n",
"Led a team in developing machine learning models for natural language processing applications.\n",
"Implemented deep learning algorithms that improved prediction accuracy by 25%.\n",
"Collaborated with cross-functional teams to integrate models into cloud-based platforms.\n",
"Data Scientist, DataWave Analytics (2014 – 2016)\n",
"Developed predictive models for customer segmentation and churn analysis.\n",
"Analyzed large datasets using Hadoop and Spark frameworks.\n",
"Skills:\n",
"\n",
"Programming Languages: Python, R, SQL\n",
"Machine Learning: TensorFlow, Keras, Scikit-Learn\n",
"Big Data Technologies: Hadoop, Spark\n",
"Data Visualization: Tableau, Matplotlib\n",
"\"\"\""
]
},
{
"cell_type": "code",
"execution_count": 4,
"id": "a9de0cc07f798b7f",
"metadata": {
"ExecuteTime": {
"end_time": "2025-06-30T11:38:50.081581Z",
"start_time": "2025-06-30T11:38:50.079653Z"
}
},
"outputs": [],
"source": [
"job_2 = \"\"\"\n",
"CV 2: Relevant\n",
"Name: Michael Rodriguez\n",
"Contact Information:\n",
"\n",
"Email: michael.rodriguez@example.com\n",
"Phone: (555) 234-5678\n",
"Summary:\n",
"\n",
"Data Scientist with a strong background in machine learning and statistical modeling. Skilled in handling large datasets and translating data into actionable business insights.\n",
"\n",
"Education:\n",
"\n",
"M.S. in Data Science, Carnegie Mellon University (2013)\n",
"B.S. in Computer Science, University of Michigan (2011)\n",
"Experience:\n",
"\n",
"Senior Data Scientist, Alpha Analytics (2017 – Present)\n",
"Developed machine learning models to optimize marketing strategies.\n",
"Reduced customer acquisition cost by 15% through predictive modeling.\n",
"Data Scientist, TechInsights (2013 – 2017)\n",
"Analyzed user behavior data to improve product features.\n",
"Implemented A/B testing frameworks to evaluate product changes.\n",
"Skills:\n",
"\n",
"Programming Languages: Python, Java, SQL\n",
"Machine Learning: Scikit-Learn, XGBoost\n",
"Data Visualization: Seaborn, Plotly\n",
"Databases: MySQL, MongoDB\n",
"\"\"\""
]
},
{
"cell_type": "code",
"execution_count": 5,
"id": "185ff1c102d06111",
"metadata": {
"ExecuteTime": {
"end_time": "2025-06-30T11:38:50.492793Z",
"start_time": "2025-06-30T11:38:50.490913Z"
}
},
"outputs": [],
"source": [
"job_3 = \"\"\"\n",
"CV 3: Relevant\n",
"Name: Sarah Nguyen\n",
"Contact Information:\n",
"\n",
"Email: sarah.nguyen@example.com\n",
"Phone: (555) 345-6789\n",
"Summary:\n",
"\n",
"Data Scientist specializing in machine learning with 6 years of experience. Passionate about leveraging data to drive business solutions and improve product performance.\n",
"\n",
"Education:\n",
"\n",
"M.S. in Statistics, University of Washington (2014)\n",
"B.S. in Applied Mathematics, University of Texas at Austin (2012)\n",
"Experience:\n",
"\n",
"Data Scientist, QuantumTech (2016 – Present)\n",
"Designed and implemented machine learning algorithms for financial forecasting.\n",
"Improved model efficiency by 20% through algorithm optimization.\n",
"Junior Data Scientist, DataCore Solutions (2014 – 2016)\n",
"Assisted in developing predictive models for supply chain optimization.\n",
"Conducted data cleaning and preprocessing on large datasets.\n",
"Skills:\n",
"\n",
"Programming Languages: Python, R\n",
"Machine Learning Frameworks: PyTorch, Scikit-Learn\n",
"Statistical Analysis: SAS, SPSS\n",
"Cloud Platforms: AWS, Azure\n",
"\"\"\""
]
},
{
"cell_type": "code",
"execution_count": 6,
"id": "d55ce4c58f8efb67",
"metadata": {
"ExecuteTime": {
"end_time": "2025-06-30T11:38:50.900140Z",
"start_time": "2025-06-30T11:38:50.898061Z"
}
},
"outputs": [],
"source": [
"job_4 = \"\"\"\n",
"CV 4: Not Relevant\n",
"Name: David Thompson\n",
"Contact Information:\n",
"\n",
"Email: david.thompson@example.com\n",
"Phone: (555) 456-7890\n",
"Summary:\n",
"\n",
"Creative Graphic Designer with over 8 years of experience in visual design and branding. Proficient in Adobe Creative Suite and passionate about creating compelling visuals.\n",
"\n",
"Education:\n",
"\n",
"B.F.A. in Graphic Design, Rhode Island School of Design (2012)\n",
"Experience:\n",
"\n",
"Senior Graphic Designer, CreativeWorks Agency (2015 – Present)\n",
"Led design projects for clients in various industries.\n",
"Created branding materials that increased client engagement by 30%.\n",
"Graphic Designer, Visual Innovations (2012 – 2015)\n",
"Designed marketing collateral, including brochures, logos, and websites.\n",
"Collaborated with the marketing team to develop cohesive brand strategies.\n",
"Skills:\n",
"\n",
"Design Software: Adobe Photoshop, Illustrator, InDesign\n",
"Web Design: HTML, CSS\n",
"Specialties: Branding and Identity, Typography\n",
"\"\"\""
]
},
{
"cell_type": "code",
"execution_count": 7,
"id": "ca4ecc32721ad332",
"metadata": {
"ExecuteTime": {
"end_time": "2025-06-30T11:38:51.360603Z",
"start_time": "2025-06-30T11:38:51.357573Z"
}
},
"outputs": [],
"source": [
"job_5 = \"\"\"\n",
"CV 5: Not Relevant\n",
"Name: Jessica Miller\n",
"Contact Information:\n",
"\n",
"Email: jessica.miller@example.com\n",
"Phone: (555) 567-8901\n",
"Summary:\n",
"\n",
"Experienced Sales Manager with a strong track record in driving sales growth and building high-performing teams. Excellent communication and leadership skills.\n",
"\n",
"Education:\n",
"\n",
"B.A. in Business Administration, University of Southern California (2010)\n",
"Experience:\n",
"\n",
"Sales Manager, Global Enterprises (2015 – Present)\n",
"Managed a sales team of 15 members, achieving a 20% increase in annual revenue.\n",
"Developed sales strategies that expanded customer base by 25%.\n",
"Sales Representative, Market Leaders Inc. (2010 – 2015)\n",
"Consistently exceeded sales targets and received the 'Top Salesperson' award in 2013.\n",
"Skills:\n",
"\n",
"Sales Strategy and Planning\n",
"Team Leadership and Development\n",
"CRM Software: Salesforce, Zoho\n",
"Negotiation and Relationship Building\n",
"\"\"\""
]
},
{
"cell_type": "markdown",
"id": "4415446a",
"metadata": {},
"source": [
" Please add the necessary environment information bellow:"
]
},
{
"cell_type": "code",
"execution_count": 8,
"id": "bce39dc6",
"metadata": {
"ExecuteTime": {
"end_time": "2025-06-30T11:39:26.596920Z",
"start_time": "2025-06-30T11:39:26.592336Z"
}
},
"outputs": [],
"source": [
"import os\n",
"\n",
"# Setting environment variables\n",
"\n",
"if \"LLM_API_KEY\" not in os.environ:\n",
" os.environ[\"LLM_API_KEY\"] = \"YOUR KEY\"\n",
"\n",
"# \"neo4j\" or \"networkx\"\n",
"os.environ[\"GRAPH_DATABASE_PROVIDER\"] = \"kuzu\"\n",
"# Not needed if using kuzu\n",
"# os.environ[\"GRAPH_DATABASE_URL\"]=\"\"\n",
"# os.environ[\"GRAPH_DATABASE_USERNAME\"]=\"\"\n",
"# os.environ[\"GRAPH_DATABASE_PASSWORD\"]=\"\"\n",
"\n",
"# \"pgvector\", \"qdrant\", \"weaviate\" or \"lancedb\"\n",
"os.environ[\"VECTOR_DB_PROVIDER\"] = \"lancedb\"\n",
"# Not needed if using \"lancedb\" or \"pgvector\"\n",
"# os.environ[\"VECTOR_DB_URL\"]=\"\"\n",
"# os.environ[\"VECTOR_DB_KEY\"]=\"\"\n",
"\n",
"# Relational Database provider \"sqlite\" or \"postgres\"\n",
"os.environ[\"DB_PROVIDER\"] = \"sqlite\"\n",
"\n",
"# Database name\n",
"os.environ[\"DB_NAME\"] = \"cognee_db\"\n",
"\n",
"# Postgres specific parameters (Only if Postgres or PGVector is used)\n",
"# os.environ[\"DB_HOST\"]=\"127.0.0.1\"\n",
"# os.environ[\"DB_PORT\"]=\"5432\"\n",
"# os.environ[\"DB_USERNAME\"]=\"cognee\"\n",
"# os.environ[\"DB_PASSWORD\"]=\"cognee\""
]
},
{
"cell_type": "code",
"execution_count": 9,
"id": "9f1a1dbd",
"metadata": {
"ExecuteTime": {
"end_time": "2025-06-30T11:39:29.907387Z",
"start_time": "2025-06-30T11:39:28.664364Z"
}
},
"outputs": [
{
"name": "stderr",
"output_type": "stream",
"text": [
"\n",
"\u001b[2m2025-10-22T17:32:21.564940\u001b[0m [\u001b[32m\u001b[1minfo \u001b[0m] \u001b[1mDeleted old log file: /Users/daulet/Desktop/dev/cognee-claude/logs/2025-10-22_18-20-09.log\u001b[0m [\u001b[0m\u001b[1m\u001b[34mcognee.shared.logging_utils\u001b[0m]\u001b[0m\n",
"\n",
"\u001b[2m2025-10-22T17:32:23.434353\u001b[0m [\u001b[32m\u001b[1minfo \u001b[0m] \u001b[1mLogging initialized \u001b[0m [\u001b[0m\u001b[1m\u001b[34mcognee.shared.logging_utils\u001b[0m]\u001b[0m \u001b[36mcognee_version\u001b[0m=\u001b[35m0.3.6-local\u001b[0m \u001b[36mdatabase_path\u001b[0m=\u001b[35m/Users/daulet/Desktop/dev/cognee-claude/cognee/.cognee_system/databases\u001b[0m \u001b[36mgraph_database_name\u001b[0m=\u001b[35m\u001b[0m \u001b[36mos_info\u001b[0m=\u001b[35m'Darwin 24.5.0 (Darwin Kernel Version 24.5.0: Tue Apr 22 19:54:43 PDT 2025; root:xnu-11417.121.6~2/RELEASE_ARM64_T8132)'\u001b[0m \u001b[36mpython_version\u001b[0m=\u001b[35m3.10.11\u001b[0m \u001b[36mrelational_config\u001b[0m=\u001b[35mcognee_db\u001b[0m \u001b[36mstructlog_version\u001b[0m=\u001b[35m25.4.0\u001b[0m \u001b[36mvector_config\u001b[0m=\u001b[35mlancedb\u001b[0m\n",
"\n",
"\u001b[2m2025-10-22T17:32:23.434807\u001b[0m [\u001b[32m\u001b[1minfo \u001b[0m] \u001b[1mDatabase storage: /Users/daulet/Desktop/dev/cognee-claude/cognee/.cognee_system/databases\u001b[0m [\u001b[0m\u001b[1m\u001b[34mcognee.shared.logging_utils\u001b[0m]\u001b[0m\n",
"\n",
"\u001b[2m2025-10-22T17:32:25.196928\u001b[0m [\u001b[32m\u001b[1minfo \u001b[0m] \u001b[1mLoaded JSON extension \u001b[0m [\u001b[0m\u001b[1m\u001b[34mcognee.shared.logging_utils\u001b[0m]\u001b[0m\n",
"\n",
"\u001b[2m2025-10-22T17:32:25.224692\u001b[0m [\u001b[32m\u001b[1minfo \u001b[0m] \u001b[1mDeleted Kuzu database files at /Users/daulet/Desktop/dev/cognee-claude/cognee/.cognee_system/databases/cognee_graph_kuzu\u001b[0m [\u001b[0m\u001b[1m\u001b[34mcognee.shared.logging_utils\u001b[0m]\u001b[0m\n",
"\n",
"\u001b[2m2025-10-22T17:32:28.977925\u001b[0m [\u001b[32m\u001b[1minfo \u001b[0m] \u001b[1mDatabase deleted successfully.\u001b[0m [\u001b[0m\u001b[1m\u001b[34mcognee.shared.logging_utils\u001b[0m]\u001b[0m\n",
"\n",
"\u001b[1mStorage manager absolute path: /Users/daulet/Desktop/dev/cognee-claude/cognee/.cognee_cache\u001b[0m\n",
"\n",
"\u001b[1mDeleting cache... \u001b[0m\n",
"\n",
"\u001b[1m✓ Cache deleted successfully! \u001b[0m\n"
]
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"0.3.6-local\n"
]
}
],
"source": [
"# Reset the cognee system with the following command:\n",
"\n",
"import cognee\n",
"\n",
"await cognee.prune.prune_data()\n",
"await cognee.prune.prune_system(metadata=True)\n",
"print(cognee.__version__)"
]
},
{
"cell_type": "markdown",
"id": "383d6971",
"metadata": {},
"source": [
"#### After we have defined and gathered our data let's add it to cognee "
]
},
{
"cell_type": "code",
"execution_count": 10,
"id": "904df61ba484a8e5",
"metadata": {
"ExecuteTime": {
"end_time": "2025-06-30T11:39:34.803971Z",
"start_time": "2025-06-30T11:39:32.388973Z"
}
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"User 01092867-dbe8-41f8-8098-8ba40720b147 has registered.\n"
]
},
{
"name": "stderr",
"output_type": "stream",
"text": [
"\n",
"\u001b[2m2025-10-22T17:32:31.127995\u001b[0m [\u001b[32m\u001b[1minfo \u001b[0m] \u001b[1mPipeline run started: `217e19e1-e107-5d16-98c3-7c17b8fb56d8`\u001b[0m [\u001b[0m\u001b[1m\u001b[34mrun_tasks_with_telemetry()\u001b[0m]\u001b[0m\n",
"\n",
"\u001b[2m2025-10-22T17:32:31.128723\u001b[0m [\u001b[32m\u001b[1minfo \u001b[0m] \u001b[1mCoroutine task started: `resolve_data_directories`\u001b[0m [\u001b[0m\u001b[1m\u001b[34mrun_tasks_base\u001b[0m]\u001b[0m\n",
"\n",
"\u001b[2m2025-10-22T17:32:31.129049\u001b[0m [\u001b[32m\u001b[1minfo \u001b[0m] \u001b[1mCoroutine task started: `ingest_data`\u001b[0m [\u001b[0m\u001b[1m\u001b[34mrun_tasks_base\u001b[0m]\u001b[0m\n",
"\n",
"\u001b[2m2025-10-22T17:32:31.130109\u001b[0m [\u001b[32m\u001b[1minfo \u001b[0m] \u001b[1mPipeline run started: `217e19e1-e107-5d16-98c3-7c17b8fb56d8`\u001b[0m [\u001b[0m\u001b[1m\u001b[34mrun_tasks_with_telemetry()\u001b[0m]\u001b[0m\n",
"\n",
"\u001b[2m2025-10-22T17:32:31.130379\u001b[0m [\u001b[32m\u001b[1minfo \u001b[0m] \u001b[1mCoroutine task started: `resolve_data_directories`\u001b[0m [\u001b[0m\u001b[1m\u001b[34mrun_tasks_base\u001b[0m]\u001b[0m\n",
"\n",
"\u001b[2m2025-10-22T17:32:31.130648\u001b[0m [\u001b[32m\u001b[1minfo \u001b[0m] \u001b[1mCoroutine task started: `ingest_data`\u001b[0m [\u001b[0m\u001b[1m\u001b[34mrun_tasks_base\u001b[0m]\u001b[0m\n",
"\n",
"\u001b[2m2025-10-22T17:32:31.131309\u001b[0m [\u001b[32m\u001b[1minfo \u001b[0m] \u001b[1mPipeline run started: `217e19e1-e107-5d16-98c3-7c17b8fb56d8`\u001b[0m [\u001b[0m\u001b[1m\u001b[34mrun_tasks_with_telemetry()\u001b[0m]\u001b[0m\n",
"\n",
"\u001b[2m2025-10-22T17:32:31.131641\u001b[0m [\u001b[32m\u001b[1minfo \u001b[0m] \u001b[1mCoroutine task started: `resolve_data_directories`\u001b[0m [\u001b[0m\u001b[1m\u001b[34mrun_tasks_base\u001b[0m]\u001b[0m\n",
"\n",
"\u001b[2m2025-10-22T17:32:31.131923\u001b[0m [\u001b[32m\u001b[1minfo \u001b[0m] \u001b[1mCoroutine task started: `ingest_data`\u001b[0m [\u001b[0m\u001b[1m\u001b[34mrun_tasks_base\u001b[0m]\u001b[0m\n",
"\n",
"\u001b[2m2025-10-22T17:32:31.132751\u001b[0m [\u001b[32m\u001b[1minfo \u001b[0m] \u001b[1mPipeline run started: `217e19e1-e107-5d16-98c3-7c17b8fb56d8`\u001b[0m [\u001b[0m\u001b[1m\u001b[34mrun_tasks_with_telemetry()\u001b[0m]\u001b[0m\n",
"\n",
"\u001b[2m2025-10-22T17:32:31.132977\u001b[0m [\u001b[32m\u001b[1minfo \u001b[0m] \u001b[1mCoroutine task started: `resolve_data_directories`\u001b[0m [\u001b[0m\u001b[1m\u001b[34mrun_tasks_base\u001b[0m]\u001b[0m\n",
"\n",
"\u001b[2m2025-10-22T17:32:31.133278\u001b[0m [\u001b[32m\u001b[1minfo \u001b[0m] \u001b[1mCoroutine task started: `ingest_data`\u001b[0m [\u001b[0m\u001b[1m\u001b[34mrun_tasks_base\u001b[0m]\u001b[0m\n",
"\n",
"\u001b[2m2025-10-22T17:32:31.133850\u001b[0m [\u001b[32m\u001b[1minfo \u001b[0m] \u001b[1mPipeline run started: `217e19e1-e107-5d16-98c3-7c17b8fb56d8`\u001b[0m [\u001b[0m\u001b[1m\u001b[34mrun_tasks_with_telemetry()\u001b[0m]\u001b[0m\n",
"\n",
"\u001b[2m2025-10-22T17:32:31.134324\u001b[0m [\u001b[32m\u001b[1minfo \u001b[0m] \u001b[1mCoroutine task started: `resolve_data_directories`\u001b[0m [\u001b[0m\u001b[1m\u001b[34mrun_tasks_base\u001b[0m]\u001b[0m\n",
"\n",
"\u001b[2m2025-10-22T17:32:31.134638\u001b[0m [\u001b[32m\u001b[1minfo \u001b[0m] \u001b[1mCoroutine task started: `ingest_data`\u001b[0m [\u001b[0m\u001b[1m\u001b[34mrun_tasks_base\u001b[0m]\u001b[0m\n",
"\n",
"\u001b[2m2025-10-22T17:32:31.136293\u001b[0m [\u001b[32m\u001b[1minfo \u001b[0m] \u001b[1mPipeline run started: `217e19e1-e107-5d16-98c3-7c17b8fb56d8`\u001b[0m [\u001b[0m\u001b[1m\u001b[34mrun_tasks_with_telemetry()\u001b[0m]\u001b[0m\n",
"\n",
"\u001b[2m2025-10-22T17:32:31.136674\u001b[0m [\u001b[32m\u001b[1minfo \u001b[0m] \u001b[1mCoroutine task started: `resolve_data_directories`\u001b[0m [\u001b[0m\u001b[1m\u001b[34mrun_tasks_base\u001b[0m]\u001b[0m\n",
"\n",
"\u001b[2m2025-10-22T17:32:31.136981\u001b[0m [\u001b[32m\u001b[1minfo \u001b[0m] \u001b[1mCoroutine task started: `ingest_data`\u001b[0m [\u001b[0m\u001b[1m\u001b[34mrun_tasks_base\u001b[0m]\u001b[0m\n",
"\n",
"\u001b[2m2025-10-22T17:32:31.167700\u001b[0m [\u001b[32m\u001b[1minfo \u001b[0m] \u001b[1mRegistered loader: pypdf_loader\u001b[0m [\u001b[0m\u001b[1m\u001b[34mcognee.infrastructure.loaders.LoaderEngine\u001b[0m]\u001b[0m\n",
"\n",
"\u001b[2m2025-10-22T17:32:31.168441\u001b[0m [\u001b[32m\u001b[1minfo \u001b[0m] \u001b[1mRegistered loader: text_loader\u001b[0m [\u001b[0m\u001b[1m\u001b[34mcognee.infrastructure.loaders.LoaderEngine\u001b[0m]\u001b[0m\n",
"\n",
"\u001b[2m2025-10-22T17:32:31.168736\u001b[0m [\u001b[32m\u001b[1minfo \u001b[0m] \u001b[1mRegistered loader: image_loader\u001b[0m [\u001b[0m\u001b[1m\u001b[34mcognee.infrastructure.loaders.LoaderEngine\u001b[0m]\u001b[0m\n",
"\n",
"\u001b[2m2025-10-22T17:32:31.168970\u001b[0m [\u001b[32m\u001b[1minfo \u001b[0m] \u001b[1mRegistered loader: audio_loader\u001b[0m [\u001b[0m\u001b[1m\u001b[34mcognee.infrastructure.loaders.LoaderEngine\u001b[0m]\u001b[0m\n",
"\n",
"\u001b[2m2025-10-22T17:32:31.169218\u001b[0m [\u001b[32m\u001b[1minfo \u001b[0m] \u001b[1mRegistered loader: unstructured_loader\u001b[0m [\u001b[0m\u001b[1m\u001b[34mcognee.infrastructure.loaders.LoaderEngine\u001b[0m]\u001b[0m\n",
"\n",
"\u001b[2m2025-10-22T17:32:31.169493\u001b[0m [\u001b[32m\u001b[1minfo \u001b[0m] \u001b[1mRegistered loader: advanced_pdf_loader\u001b[0m [\u001b[0m\u001b[1m\u001b[34mcognee.infrastructure.loaders.LoaderEngine\u001b[0m]\u001b[0m\n",
"\n",
"\u001b[2m2025-10-22T17:32:31.169791\u001b[0m [\u001b[32m\u001b[1minfo \u001b[0m] \u001b[1mRegistered loader: beautiful_soup_loader\u001b[0m [\u001b[0m\u001b[1m\u001b[34mcognee.infrastructure.loaders.LoaderEngine\u001b[0m]\u001b[0m\n",
"\n",
"\u001b[2m2025-10-22T17:32:31.187717\u001b[0m [\u001b[32m\u001b[1minfo \u001b[0m] \u001b[1mCoroutine task completed: `ingest_data`\u001b[0m [\u001b[0m\u001b[1m\u001b[34mrun_tasks_base\u001b[0m]\u001b[0m\n",
"\n",
"\u001b[2m2025-10-22T17:32:31.188164\u001b[0m [\u001b[32m\u001b[1minfo \u001b[0m] \u001b[1mCoroutine task completed: `resolve_data_directories`\u001b[0m [\u001b[0m\u001b[1m\u001b[34mrun_tasks_base\u001b[0m]\u001b[0m\n",
"\n",
"\u001b[2m2025-10-22T17:32:31.188468\u001b[0m [\u001b[32m\u001b[1minfo \u001b[0m] \u001b[1mPipeline run completed: `217e19e1-e107-5d16-98c3-7c17b8fb56d8`\u001b[0m [\u001b[0m\u001b[1m\u001b[34mrun_tasks_with_telemetry()\u001b[0m]\u001b[0m\n",
"\n",
"\u001b[2m2025-10-22T17:32:31.191678\u001b[0m [\u001b[32m\u001b[1minfo \u001b[0m] \u001b[1mCoroutine task completed: `ingest_data`\u001b[0m [\u001b[0m\u001b[1m\u001b[34mrun_tasks_base\u001b[0m]\u001b[0m\n",
"\n",
"\u001b[2m2025-10-22T17:32:31.191974\u001b[0m [\u001b[32m\u001b[1minfo \u001b[0m] \u001b[1mCoroutine task completed: `resolve_data_directories`\u001b[0m [\u001b[0m\u001b[1m\u001b[34mrun_tasks_base\u001b[0m]\u001b[0m\n",
"\n",
"\u001b[2m2025-10-22T17:32:31.192412\u001b[0m [\u001b[32m\u001b[1minfo \u001b[0m] \u001b[1mPipeline run completed: `217e19e1-e107-5d16-98c3-7c17b8fb56d8`\u001b[0m [\u001b[0m\u001b[1m\u001b[34mrun_tasks_with_telemetry()\u001b[0m]\u001b[0m\n",
"\n",
"\u001b[2m2025-10-22T17:32:31.196317\u001b[0m [\u001b[32m\u001b[1minfo \u001b[0m] \u001b[1mCoroutine task completed: `ingest_data`\u001b[0m [\u001b[0m\u001b[1m\u001b[34mrun_tasks_base\u001b[0m]\u001b[0m\n",
"\n",
"\u001b[2m2025-10-22T17:32:31.196710\u001b[0m [\u001b[32m\u001b[1minfo \u001b[0m] \u001b[1mCoroutine task completed: `resolve_data_directories`\u001b[0m [\u001b[0m\u001b[1m\u001b[34mrun_tasks_base\u001b[0m]\u001b[0m\n",
"\n",
"\u001b[2m2025-10-22T17:32:31.196942\u001b[0m [\u001b[32m\u001b[1minfo \u001b[0m] \u001b[1mPipeline run completed: `217e19e1-e107-5d16-98c3-7c17b8fb56d8`\u001b[0m [\u001b[0m\u001b[1m\u001b[34mrun_tasks_with_telemetry()\u001b[0m]\u001b[0m\n",
"\n",
"\u001b[2m2025-10-22T17:32:31.199817\u001b[0m [\u001b[32m\u001b[1minfo \u001b[0m] \u001b[1mCoroutine task completed: `ingest_data`\u001b[0m [\u001b[0m\u001b[1m\u001b[34mrun_tasks_base\u001b[0m]\u001b[0m\n",
"\n",
"\u001b[2m2025-10-22T17:32:31.200558\u001b[0m [\u001b[32m\u001b[1minfo \u001b[0m] \u001b[1mCoroutine task completed: `resolve_data_directories`\u001b[0m [\u001b[0m\u001b[1m\u001b[34mrun_tasks_base\u001b[0m]\u001b[0m\n",
"\n",
"\u001b[2m2025-10-22T17:32:31.201307\u001b[0m [\u001b[32m\u001b[1minfo \u001b[0m] \u001b[1mPipeline run completed: `217e19e1-e107-5d16-98c3-7c17b8fb56d8`\u001b[0m [\u001b[0m\u001b[1m\u001b[34mrun_tasks_with_telemetry()\u001b[0m]\u001b[0m\n",
"\n",
"\u001b[2m2025-10-22T17:32:31.211986\u001b[0m [\u001b[32m\u001b[1minfo \u001b[0m] \u001b[1mCoroutine task completed: `ingest_data`\u001b[0m [\u001b[0m\u001b[1m\u001b[34mrun_tasks_base\u001b[0m]\u001b[0m\n",
"\n",
"\u001b[2m2025-10-22T17:32:31.212368\u001b[0m [\u001b[32m\u001b[1minfo \u001b[0m] \u001b[1mCoroutine task completed: `resolve_data_directories`\u001b[0m [\u001b[0m\u001b[1m\u001b[34mrun_tasks_base\u001b[0m]\u001b[0m\n",
"\n",
"\u001b[2m2025-10-22T17:32:31.212638\u001b[0m [\u001b[32m\u001b[1minfo \u001b[0m] \u001b[1mPipeline run completed: `217e19e1-e107-5d16-98c3-7c17b8fb56d8`\u001b[0m [\u001b[0m\u001b[1m\u001b[34mrun_tasks_with_telemetry()\u001b[0m]\u001b[0m\n",
"\n",
"\u001b[2m2025-10-22T17:32:31.223791\u001b[0m [\u001b[32m\u001b[1minfo \u001b[0m] \u001b[1mCoroutine task completed: `ingest_data`\u001b[0m [\u001b[0m\u001b[1m\u001b[34mrun_tasks_base\u001b[0m]\u001b[0m\n",
"\n",
"\u001b[2m2025-10-22T17:32:31.224334\u001b[0m [\u001b[32m\u001b[1minfo \u001b[0m] \u001b[1mCoroutine task completed: `resolve_data_directories`\u001b[0m [\u001b[0m\u001b[1m\u001b[34mrun_tasks_base\u001b[0m]\u001b[0m\n",
"\n",
"\u001b[2m2025-10-22T17:32:31.224642\u001b[0m [\u001b[32m\u001b[1minfo \u001b[0m] \u001b[1mPipeline run completed: `217e19e1-e107-5d16-98c3-7c17b8fb56d8`\u001b[0m [\u001b[0m\u001b[1m\u001b[34mrun_tasks_with_telemetry()\u001b[0m]\u001b[0m\n"
]
},
{
"data": {
"text/plain": [
"PipelineRunCompleted(status='PipelineRunCompleted', pipeline_run_id=UUID('998f97df-1f02-56cf-9092-4b58933b6e43'), dataset_id=UUID('700b0f29-6447-5feb-8567-427b5f9ce6e7'), dataset_name='example', payload=None, data_ingestion_info=[{'run_info': PipelineRunCompleted(status='PipelineRunCompleted', pipeline_run_id=UUID('998f97df-1f02-56cf-9092-4b58933b6e43'), dataset_id=UUID('700b0f29-6447-5feb-8567-427b5f9ce6e7'), dataset_name='example', payload=None, data_ingestion_info=None), 'data_id': UUID('844a73d2-2d9a-5c09-9ef6-4409b39231cc')}, {'run_info': PipelineRunCompleted(status='PipelineRunCompleted', pipeline_run_id=UUID('998f97df-1f02-56cf-9092-4b58933b6e43'), dataset_id=UUID('700b0f29-6447-5feb-8567-427b5f9ce6e7'), dataset_name='example', payload=None, data_ingestion_info=None), 'data_id': UUID('02a041d5-8200-5b22-8f19-467f2f02474e')}, {'run_info': PipelineRunCompleted(status='PipelineRunCompleted', pipeline_run_id=UUID('998f97df-1f02-56cf-9092-4b58933b6e43'), dataset_id=UUID('700b0f29-6447-5feb-8567-427b5f9ce6e7'), dataset_name='example', payload=None, data_ingestion_info=None), 'data_id': UUID('38724ad4-312f-52c1-a21e-1b7d05ffc6e6')}, {'run_info': PipelineRunCompleted(status='PipelineRunCompleted', pipeline_run_id=UUID('998f97df-1f02-56cf-9092-4b58933b6e43'), dataset_id=UUID('700b0f29-6447-5feb-8567-427b5f9ce6e7'), dataset_name='example', payload=None, data_ingestion_info=None), 'data_id': UUID('dd9ac80a-d229-5064-8d1f-35f9e59c273b')}, {'run_info': PipelineRunCompleted(status='PipelineRunCompleted', pipeline_run_id=UUID('998f97df-1f02-56cf-9092-4b58933b6e43'), dataset_id=UUID('700b0f29-6447-5feb-8567-427b5f9ce6e7'), dataset_name='example', payload=None, data_ingestion_info=None), 'data_id': UUID('13b89415-5188-5fea-9f3f-a1aa558c96bf')}, {'run_info': PipelineRunCompleted(status='PipelineRunCompleted', pipeline_run_id=UUID('998f97df-1f02-56cf-9092-4b58933b6e43'), dataset_id=UUID('700b0f29-6447-5feb-8567-427b5f9ce6e7'), dataset_name='example', payload=None, data_ingestion_info=None), 'data_id': UUID('f587a773-ed86-5823-8ea4-1647bdd283c1')}])"
]
},
"execution_count": 10,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"import cognee\n",
"\n",
"await cognee.add([job_1, job_2, job_3, job_4, job_5, job_position], \"example\")"
]
},
{
"cell_type": "markdown",
"id": "0f15c5b1",
"metadata": {},
"source": [
"All good, let's cognify it."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "7c431fdef4921ae0",
"metadata": {
"ExecuteTime": {
"end_time": "2025-06-30T11:39:41.213655Z",
"start_time": "2025-06-30T11:39:41.209787Z"
}
},
"outputs": [],
"source": [
"from cognee.shared.data_models import KnowledgeGraph\n",
"from cognee.modules.data.models import Dataset, Data\n",
"from cognee.modules.data.methods.get_dataset_data import get_dataset_data\n",
"from cognee.modules.cognify.config import get_cognify_config\n",
"from cognee.modules.pipelines.tasks.task import Task\n",
"from cognee.modules.pipelines import run_tasks\n",
"from cognee.modules.users.models import User\n",
"from cognee.tasks.documents import (\n",
" classify_documents,\n",
" extract_chunks_from_documents,\n",
")\n",
"from cognee.infrastructure.llm import get_max_chunk_tokens\n",
"from cognee.tasks.graph import extract_graph_from_data\n",
"from cognee.tasks.storage import add_data_points\n",
"from cognee.tasks.summarization import summarize_text\n",
"\n",
"\n",
"async def run_cognify_pipeline(dataset: Dataset, user: User = None):\n",
" data_documents: list[Data] = await get_dataset_data(dataset_id=dataset.id)\n",
"\n",
" try:\n",
" cognee_config = get_cognify_config()\n",
"\n",
" tasks = [\n",
" Task(classify_documents),\n",
" Task(\n",
" extract_chunks_from_documents, max_chunk_size=get_max_chunk_tokens()\n",
" ), # Extract text chunks based on the document type.\n",
" Task(\n",
" extract_graph_from_data, graph_model=KnowledgeGraph,\n",
" task_config={\"batch_size\": 10}\n",
" ), # Generate knowledge graphs from the document chunks.\n",
" Task(\n",
" summarize_text,\n",
" summarization_model=cognee_config.summarization_model,\n",
" task_config={\"batch_size\": 10},\n",
" ),\n",
" Task(add_data_points, task_config={\"batch_size\": 10}),\n",
" ]\n",
"\n",
" pipeline_run = run_tasks(tasks, dataset.id, data_documents, user, \"cognify_pipeline\", context={\"dataset\": dataset})\n",
" pipeline_run_status = None\n",
"\n",
" async for run_status in pipeline_run:\n",
" pipeline_run_status = run_status\n",
"\n",
" except Exception as error:\n",
" raise error"
]
},
{
"cell_type": "code",
"execution_count": 12,
"id": "f0a91b99c6215e09",
"metadata": {
"ExecuteTime": {
"end_time": "2025-06-30T11:40:17.060626Z",
"start_time": "2025-06-30T11:39:42.388114Z"
}
},
"outputs": [
{
"name": "stderr",
"output_type": "stream",
"text": [
"\n",
"\u001b[2m2025-10-22T17:32:31.255377\u001b[0m [\u001b[32m\u001b[1minfo \u001b[0m] \u001b[1mPipeline run started: `b5e94561-ce91-54f8-90cb-65ced43a80c8`\u001b[0m [\u001b[0m\u001b[1m\u001b[34mrun_tasks_with_telemetry()\u001b[0m]\u001b[0m\n",
"\n",
"\u001b[2m2025-10-22T17:32:31.255817\u001b[0m [\u001b[32m\u001b[1minfo \u001b[0m] \u001b[1mCoroutine task started: `classify_documents`\u001b[0m [\u001b[0m\u001b[1m\u001b[34mrun_tasks_base\u001b[0m]\u001b[0m\n",
"\n",
"\u001b[2m2025-10-22T17:32:31.256159\u001b[0m [\u001b[32m\u001b[1minfo \u001b[0m] \u001b[1mCoroutine task started: `check_permissions_on_dataset`\u001b[0m [\u001b[0m\u001b[1m\u001b[34mrun_tasks_base\u001b[0m]\u001b[0m\n",
"\n",
"\u001b[2m2025-10-22T17:32:31.256747\u001b[0m [\u001b[32m\u001b[1minfo \u001b[0m] \u001b[1mPipeline run started: `b5e94561-ce91-54f8-90cb-65ced43a80c8`\u001b[0m [\u001b[0m\u001b[1m\u001b[34mrun_tasks_with_telemetry()\u001b[0m]\u001b[0m\n",
"\n",
"\u001b[2m2025-10-22T17:32:31.256995\u001b[0m [\u001b[32m\u001b[1minfo \u001b[0m] \u001b[1mCoroutine task started: `classify_documents`\u001b[0m [\u001b[0m\u001b[1m\u001b[34mrun_tasks_base\u001b[0m]\u001b[0m\n",
"\n",
"\u001b[2m2025-10-22T17:32:31.257279\u001b[0m [\u001b[32m\u001b[1minfo \u001b[0m] \u001b[1mCoroutine task started: `check_permissions_on_dataset`\u001b[0m [\u001b[0m\u001b[1m\u001b[34mrun_tasks_base\u001b[0m]\u001b[0m\n",
"\n",
"\u001b[2m2025-10-22T17:32:31.257912\u001b[0m [\u001b[32m\u001b[1minfo \u001b[0m] \u001b[1mPipeline run started: `b5e94561-ce91-54f8-90cb-65ced43a80c8`\u001b[0m [\u001b[0m\u001b[1m\u001b[34mrun_tasks_with_telemetry()\u001b[0m]\u001b[0m\n",
"\n",
"\u001b[2m2025-10-22T17:32:31.258299\u001b[0m [\u001b[32m\u001b[1minfo \u001b[0m] \u001b[1mCoroutine task started: `classify_documents`\u001b[0m [\u001b[0m\u001b[1m\u001b[34mrun_tasks_base\u001b[0m]\u001b[0m\n",
"\n",
"\u001b[2m2025-10-22T17:32:31.258583\u001b[0m [\u001b[32m\u001b[1minfo \u001b[0m] \u001b[1mCoroutine task started: `check_permissions_on_dataset`\u001b[0m [\u001b[0m\u001b[1m\u001b[34mrun_tasks_base\u001b[0m]\u001b[0m\n",
"\n",
"\u001b[2m2025-10-22T17:32:31.259054\u001b[0m [\u001b[32m\u001b[1minfo \u001b[0m] \u001b[1mPipeline run started: `b5e94561-ce91-54f8-90cb-65ced43a80c8`\u001b[0m [\u001b[0m\u001b[1m\u001b[34mrun_tasks_with_telemetry()\u001b[0m]\u001b[0m\n",
"\n",
"\u001b[2m2025-10-22T17:32:31.259324\u001b[0m [\u001b[32m\u001b[1minfo \u001b[0m] \u001b[1mCoroutine task started: `classify_documents`\u001b[0m [\u001b[0m\u001b[1m\u001b[34mrun_tasks_base\u001b[0m]\u001b[0m\n",
"\n",
"\u001b[2m2025-10-22T17:32:31.259597\u001b[0m [\u001b[32m\u001b[1minfo \u001b[0m] \u001b[1mCoroutine task started: `check_permissions_on_dataset`\u001b[0m [\u001b[0m\u001b[1m\u001b[34mrun_tasks_base\u001b[0m]\u001b[0m\n",
"\n",
"\u001b[2m2025-10-22T17:32:31.260022\u001b[0m [\u001b[32m\u001b[1minfo \u001b[0m] \u001b[1mPipeline run started: `b5e94561-ce91-54f8-90cb-65ced43a80c8`\u001b[0m [\u001b[0m\u001b[1m\u001b[34mrun_tasks_with_telemetry()\u001b[0m]\u001b[0m\n",
"\n",
"\u001b[2m2025-10-22T17:32:31.260188\u001b[0m [\u001b[32m\u001b[1minfo \u001b[0m] \u001b[1mCoroutine task started: `classify_documents`\u001b[0m [\u001b[0m\u001b[1m\u001b[34mrun_tasks_base\u001b[0m]\u001b[0m\n",
"\n",
"\u001b[2m2025-10-22T17:32:31.260491\u001b[0m [\u001b[32m\u001b[1minfo \u001b[0m] \u001b[1mCoroutine task started: `check_permissions_on_dataset`\u001b[0m [\u001b[0m\u001b[1m\u001b[34mrun_tasks_base\u001b[0m]\u001b[0m\n",
"\n",
"\u001b[2m2025-10-22T17:32:31.260947\u001b[0m [\u001b[32m\u001b[1minfo \u001b[0m] \u001b[1mPipeline run started: `b5e94561-ce91-54f8-90cb-65ced43a80c8`\u001b[0m [\u001b[0m\u001b[1m\u001b[34mrun_tasks_with_telemetry()\u001b[0m]\u001b[0m\n",
"\n",
"\u001b[2m2025-10-22T17:32:31.261146\u001b[0m [\u001b[32m\u001b[1minfo \u001b[0m] \u001b[1mCoroutine task started: `classify_documents`\u001b[0m [\u001b[0m\u001b[1m\u001b[34mrun_tasks_base\u001b[0m]\u001b[0m\n",
"\n",
"\u001b[2m2025-10-22T17:32:31.261418\u001b[0m [\u001b[32m\u001b[1minfo \u001b[0m] \u001b[1mCoroutine task started: `check_permissions_on_dataset`\u001b[0m [\u001b[0m\u001b[1m\u001b[34mrun_tasks_base\u001b[0m]\u001b[0m\n",
"\n",
"\u001b[2m2025-10-22T17:32:31.274433\u001b[0m [\u001b[32m\u001b[1minfo \u001b[0m] \u001b[1mAsync Generator task started: `extract_chunks_from_documents`\u001b[0m [\u001b[0m\u001b[1m\u001b[34mrun_tasks_base\u001b[0m]\u001b[0m\n",
"\n",
"\u001b[2m2025-10-22T17:32:31.278010\u001b[0m [\u001b[32m\u001b[1minfo \u001b[0m] \u001b[1mAsync Generator task started: `extract_chunks_from_documents`\u001b[0m [\u001b[0m\u001b[1m\u001b[34mrun_tasks_base\u001b[0m]\u001b[0m\n",
"\n",
"\u001b[2m2025-10-22T17:32:31.281207\u001b[0m [\u001b[32m\u001b[1minfo \u001b[0m] \u001b[1mAsync Generator task started: `extract_chunks_from_documents`\u001b[0m [\u001b[0m\u001b[1m\u001b[34mrun_tasks_base\u001b[0m]\u001b[0m\n",
"\n",
"\u001b[2m2025-10-22T17:32:31.283453\u001b[0m [\u001b[32m\u001b[1minfo \u001b[0m] \u001b[1mAsync Generator task started: `extract_chunks_from_documents`\u001b[0m [\u001b[0m\u001b[1m\u001b[34mrun_tasks_base\u001b[0m]\u001b[0m\n",
"\n",
"\u001b[2m2025-10-22T17:32:31.284792\u001b[0m [\u001b[32m\u001b[1minfo \u001b[0m] \u001b[1mAsync Generator task started: `extract_chunks_from_documents`\u001b[0m [\u001b[0m\u001b[1m\u001b[34mrun_tasks_base\u001b[0m]\u001b[0m\n",
"\n",
"\u001b[2m2025-10-22T17:32:31.286257\u001b[0m [\u001b[32m\u001b[1minfo \u001b[0m] \u001b[1mAsync Generator task started: `extract_chunks_from_documents`\u001b[0m [\u001b[0m\u001b[1m\u001b[34mrun_tasks_base\u001b[0m]\u001b[0m\n",
"\n",
"\u001b[2m2025-10-22T17:32:31.290895\u001b[0m [\u001b[32m\u001b[1minfo \u001b[0m] \u001b[1mCoroutine task started: `extract_graph_from_data`\u001b[0m [\u001b[0m\u001b[1m\u001b[34mrun_tasks_base\u001b[0m]\u001b[0m\n",
"\n",
"\u001b[2m2025-10-22T17:32:31.297876\u001b[0m [\u001b[32m\u001b[1minfo \u001b[0m] \u001b[1mCoroutine task started: `extract_graph_from_data`\u001b[0m [\u001b[0m\u001b[1m\u001b[34mrun_tasks_base\u001b[0m]\u001b[0m\n",
"\n",
"\u001b[2m2025-10-22T17:32:31.302397\u001b[0m [\u001b[32m\u001b[1minfo \u001b[0m] \u001b[1mCoroutine task started: `extract_graph_from_data`\u001b[0m [\u001b[0m\u001b[1m\u001b[34mrun_tasks_base\u001b[0m]\u001b[0m\n",
"\n",
"\u001b[2m2025-10-22T17:32:31.305646\u001b[0m [\u001b[32m\u001b[1minfo \u001b[0m] \u001b[1mCoroutine task started: `extract_graph_from_data`\u001b[0m [\u001b[0m\u001b[1m\u001b[34mrun_tasks_base\u001b[0m]\u001b[0m\n",
"\n",
"\u001b[2m2025-10-22T17:32:31.310398\u001b[0m [\u001b[32m\u001b[1minfo \u001b[0m] \u001b[1mCoroutine task started: `extract_graph_from_data`\u001b[0m [\u001b[0m\u001b[1m\u001b[34mrun_tasks_base\u001b[0m]\u001b[0m\n",
"\n",
"\u001b[2m2025-10-22T17:32:31.315796\u001b[0m [\u001b[32m\u001b[1minfo \u001b[0m] \u001b[1mCoroutine task started: `extract_graph_from_data`\u001b[0m [\u001b[0m\u001b[1m\u001b[34mrun_tasks_base\u001b[0m]\u001b[0m\n",
"\n",
"\u001b[2m2025-10-22T17:32:43.143015\u001b[0m [\u001b[32m\u001b[1minfo \u001b[0m] \u001b[1mOntology file 'None' not found. No owl ontology will be attached to the graph.\u001b[0m [\u001b[0m\u001b[1m\u001b[34mOntologyAdapter\u001b[0m]\u001b[0m\n",
"\n",
"\u001b[2m2025-10-22T17:32:43.146010\u001b[0m [\u001b[32m\u001b[1minfo \u001b[0m] \u001b[1mReconnecting to Kuzu database...\u001b[0m [\u001b[0m\u001b[1m\u001b[34mcognee.shared.logging_utils\u001b[0m]\u001b[0m\n",
"\n",
"\u001b[2m2025-10-22T17:32:43.209446\u001b[0m [\u001b[32m\u001b[1minfo \u001b[0m] \u001b[1mLoaded JSON extension \u001b[0m [\u001b[0m\u001b[1m\u001b[34mcognee.shared.logging_utils\u001b[0m]\u001b[0m\n",
"\n",
"\u001b[2m2025-10-22T17:32:43.250606\u001b[0m [\u001b[32m\u001b[1minfo \u001b[0m] \u001b[1mNo close match found for 'company' in category 'classes'\u001b[0m [\u001b[0m\u001b[1m\u001b[34mOntologyAdapter\u001b[0m]\u001b[0m\n",
"\n",
"\u001b[2m2025-10-22T17:32:43.251247\u001b[0m [\u001b[32m\u001b[1minfo \u001b[0m] \u001b[1mNo close match found for 'technova solutions' in category 'individuals'\u001b[0m [\u001b[0m\u001b[1m\u001b[34mOntologyAdapter\u001b[0m]\u001b[0m\n",
"\n",
"\u001b[2m2025-10-22T17:32:43.251605\u001b[0m [\u001b[32m\u001b[1minfo \u001b[0m] \u001b[1mNo close match found for 'job_title' in category 'classes'\u001b[0m [\u001b[0m\u001b[1m\u001b[34mOntologyAdapter\u001b[0m]\u001b[0m\n",
"\n",
"\u001b[2m2025-10-22T17:32:43.251917\u001b[0m [\u001b[32m\u001b[1minfo \u001b[0m] \u001b[1mNo close match found for 'senior data scientist' in category 'individuals'\u001b[0m [\u001b[0m\u001b[1m\u001b[34mOntologyAdapter\u001b[0m]\u001b[0m\n",
"\n",
"\u001b[2m2025-10-22T17:32:43.252205\u001b[0m [\u001b[32m\u001b[1minfo \u001b[0m] \u001b[1mNo close match found for 'location' in category 'classes'\u001b[0m [\u001b[0m\u001b[1m\u001b[34mOntologyAdapter\u001b[0m]\u001b[0m\n",
"\n",
"\u001b[2m2025-10-22T17:32:43.252619\u001b[0m [\u001b[32m\u001b[1minfo \u001b[0m] \u001b[1mNo close match found for 'san francisco, ca' in category 'individuals'\u001b[0m [\u001b[0m\u001b[1m\u001b[34mOntologyAdapter\u001b[0m]\u001b[0m\n",
"\n",
"\u001b[2m2025-10-22T17:32:43.253028\u001b[0m [\u001b[32m\u001b[1minfo \u001b[0m] \u001b[1mNo close match found for 'field' in category 'classes'\u001b[0m [\u001b[0m\u001b[1m\u001b[34mOntologyAdapter\u001b[0m]\u001b[0m\n",
"\n",
"\u001b[2m2025-10-22T17:32:43.253352\u001b[0m [\u001b[32m\u001b[1minfo \u001b[0m] \u001b[1mNo close match found for 'machine learning' in category 'individuals'\u001b[0m [\u001b[0m\u001b[1m\u001b[34mOntologyAdapter\u001b[0m]\u001b[0m\n",
"\n",
"\u001b[2m2025-10-22T17:32:43.253728\u001b[0m [\u001b[32m\u001b[1minfo \u001b[0m] \u001b[1mNo close match found for 'programming_language' in category 'classes'\u001b[0m [\u001b[0m\u001b[1m\u001b[34mOntologyAdapter\u001b[0m]\u001b[0m\n",
"\n",
"\u001b[2m2025-10-22T17:32:43.254041\u001b[0m [\u001b[32m\u001b[1minfo \u001b[0m] \u001b[1mNo close match found for 'python' in category 'individuals'\u001b[0m [\u001b[0m\u001b[1m\u001b[34mOntologyAdapter\u001b[0m]\u001b[0m\n",
"\n",
"\u001b[2m2025-10-22T17:32:43.254391\u001b[0m [\u001b[32m\u001b[1minfo \u001b[0m] \u001b[1mNo close match found for 'r' in category 'individuals'\u001b[0m [\u001b[0m\u001b[1m\u001b[34mOntologyAdapter\u001b[0m]\u001b[0m\n",
"\n",
"\u001b[2m2025-10-22T17:32:43.254717\u001b[0m [\u001b[32m\u001b[1minfo \u001b[0m] \u001b[1mNo close match found for 'sql' in category 'individuals'\u001b[0m [\u001b[0m\u001b[1m\u001b[34mOntologyAdapter\u001b[0m]\u001b[0m\n",
"\n",
"\u001b[2m2025-10-22T17:32:43.255024\u001b[0m [\u001b[32m\u001b[1minfo \u001b[0m] \u001b[1mNo close match found for 'framework' in category 'classes'\u001b[0m [\u001b[0m\u001b[1m\u001b[34mOntologyAdapter\u001b[0m]\u001b[0m\n",
"\n",
"\u001b[2m2025-10-22T17:32:43.255353\u001b[0m [\u001b[32m\u001b[1minfo \u001b[0m] \u001b[1mNo close match found for 'tensorflow' in category 'individuals'\u001b[0m [\u001b[0m\u001b[1m\u001b[34mOntologyAdapter\u001b[0m]\u001b[0m\n",
"\n",
"\u001b[2m2025-10-22T17:32:43.255677\u001b[0m [\u001b[32m\u001b[1minfo \u001b[0m] \u001b[1mNo close match found for 'pytorch' in category 'individuals'\u001b[0m [\u001b[0m\u001b[1m\u001b[34mOntologyAdapter\u001b[0m]\u001b[0m\n",
"\n",
"\u001b[2m2025-10-22T17:32:44.085017\u001b[0m [\u001b[32m\u001b[1minfo \u001b[0m] \u001b[1mOntology file 'None' not found. No owl ontology will be attached to the graph.\u001b[0m [\u001b[0m\u001b[1m\u001b[34mOntologyAdapter\u001b[0m]\u001b[0m\n",
"\n",
"\u001b[2m2025-10-22T17:32:44.087950\u001b[0m [\u001b[32m\u001b[1minfo \u001b[0m] \u001b[1mNo close match found for 'person' in category 'classes'\u001b[0m [\u001b[0m\u001b[1m\u001b[34mOntologyAdapter\u001b[0m]\u001b[0m\n",
"\n",
"\u001b[2m2025-10-22T17:32:44.088400\u001b[0m [\u001b[32m\u001b[1minfo \u001b[0m] \u001b[1mNo close match found for 'sarah nguyen' in category 'individuals'\u001b[0m [\u001b[0m\u001b[1m\u001b[34mOntologyAdapter\u001b[0m]\u001b[0m\n",
"\n",
"\u001b[2m2025-10-22T17:32:44.088770\u001b[0m [\u001b[32m\u001b[1minfo \u001b[0m] \u001b[1mNo close match found for 'company' in category 'classes'\u001b[0m [\u001b[0m\u001b[1m\u001b[34mOntologyAdapter\u001b[0m]\u001b[0m\n",
"\n",
"\u001b[2m2025-10-22T17:32:44.089112\u001b[0m [\u001b[32m\u001b[1minfo \u001b[0m] \u001b[1mNo close match found for 'quantumtech' in category 'individuals'\u001b[0m [\u001b[0m\u001b[1m\u001b[34mOntologyAdapter\u001b[0m]\u001b[0m\n",
"\n",
"\u001b[2m2025-10-22T17:32:44.089456\u001b[0m [\u001b[32m\u001b[1minfo \u001b[0m] \u001b[1mNo close match found for 'datacore solutions' in category 'individuals'\u001b[0m [\u001b[0m\u001b[1m\u001b[34mOntologyAdapter\u001b[0m]\u001b[0m\n",
"\n",
"\u001b[2m2025-10-22T17:32:44.089778\u001b[0m [\u001b[32m\u001b[1minfo \u001b[0m] \u001b[1mNo close match found for 'educationalinstitution' in category 'classes'\u001b[0m [\u001b[0m\u001b[1m\u001b[34mOntologyAdapter\u001b[0m]\u001b[0m\n",
"\n",
"\u001b[2m2025-10-22T17:32:44.090099\u001b[0m [\u001b[32m\u001b[1minfo \u001b[0m] \u001b[1mNo close match found for 'university of washington' in category 'individuals'\u001b[0m [\u001b[0m\u001b[1m\u001b[34mOntologyAdapter\u001b[0m]\u001b[0m\n",
"\n",
"\u001b[2m2025-10-22T17:32:44.090407\u001b[0m [\u001b[32m\u001b[1minfo \u001b[0m] \u001b[1mNo close match found for 'university of texas at austin' in category 'individuals'\u001b[0m [\u001b[0m\u001b[1m\u001b[34mOntologyAdapter\u001b[0m]\u001b[0m\n",
"\n",
"\u001b[2m2025-10-22T17:32:44.090633\u001b[0m [\u001b[32m\u001b[1minfo \u001b[0m] \u001b[1mNo close match found for 'date' in category 'classes'\u001b[0m [\u001b[0m\u001b[1m\u001b[34mOntologyAdapter\u001b[0m]\u001b[0m\n",
"\n",
"\u001b[2m2025-10-22T17:32:44.090944\u001b[0m [\u001b[32m\u001b[1minfo \u001b[0m] \u001b[1mNo close match found for '2014' in category 'individuals'\u001b[0m [\u001b[0m\u001b[1m\u001b[34mOntologyAdapter\u001b[0m]\u001b[0m\n",
"\n",
"\u001b[2m2025-10-22T17:32:44.091278\u001b[0m [\u001b[32m\u001b[1minfo \u001b[0m] \u001b[1mNo close match found for '2012' in category 'individuals'\u001b[0m [\u001b[0m\u001b[1m\u001b[34mOntologyAdapter\u001b[0m]\u001b[0m\n",
"\n",
"\u001b[2m2025-10-22T17:32:45.378854\u001b[0m [\u001b[32m\u001b[1minfo \u001b[0m] \u001b[1mOntology file 'None' not found. No owl ontology will be attached to the graph.\u001b[0m [\u001b[0m\u001b[1m\u001b[34mOntologyAdapter\u001b[0m]\u001b[0m\n",
"\n",
"\u001b[2m2025-10-22T17:32:45.382934\u001b[0m [\u001b[32m\u001b[1minfo \u001b[0m] \u001b[1mNo close match found for 'person' in category 'classes'\u001b[0m [\u001b[0m\u001b[1m\u001b[34mOntologyAdapter\u001b[0m]\u001b[0m\n",
"\n",
"\u001b[2m2025-10-22T17:32:45.383452\u001b[0m [\u001b[32m\u001b[1minfo \u001b[0m] \u001b[1mNo close match found for 'michael rodriguez' in category 'individuals'\u001b[0m [\u001b[0m\u001b[1m\u001b[34mOntologyAdapter\u001b[0m]\u001b[0m\n",
"\n",
"\u001b[2m2025-10-22T17:32:45.383915\u001b[0m [\u001b[32m\u001b[1minfo \u001b[0m] \u001b[1mNo close match found for 'company' in category 'classes'\u001b[0m [\u001b[0m\u001b[1m\u001b[34mOntologyAdapter\u001b[0m]\u001b[0m\n",
"\n",
"\u001b[2m2025-10-22T17:32:45.384258\u001b[0m [\u001b[32m\u001b[1minfo \u001b[0m] \u001b[1mNo close match found for 'alpha analytics' in category 'individuals'\u001b[0m [\u001b[0m\u001b[1m\u001b[34mOntologyAdapter\u001b[0m]\u001b[0m\n",
"\n",
"\u001b[2m2025-10-22T17:32:45.384574\u001b[0m [\u001b[32m\u001b[1minfo \u001b[0m] \u001b[1mNo close match found for 'techinsights' in category 'individuals'\u001b[0m [\u001b[0m\u001b[1m\u001b[34mOntologyAdapter\u001b[0m]\u001b[0m\n",
"\n",
"\u001b[2m2025-10-22T17:32:45.384974\u001b[0m [\u001b[32m\u001b[1minfo \u001b[0m] \u001b[1mNo close match found for 'degree' in category 'classes'\u001b[0m [\u001b[0m\u001b[1m\u001b[34mOntologyAdapter\u001b[0m]\u001b[0m\n",
"\n",
"\u001b[2m2025-10-22T17:32:45.385387\u001b[0m [\u001b[32m\u001b[1minfo \u001b[0m] \u001b[1mNo close match found for 'm.s. in data science' in category 'individuals'\u001b[0m [\u001b[0m\u001b[1m\u001b[34mOntologyAdapter\u001b[0m]\u001b[0m\n",
"\n",
"\u001b[2m2025-10-22T17:32:45.385698\u001b[0m [\u001b[32m\u001b[1minfo \u001b[0m] \u001b[1mNo close match found for 'b.s. in computer science' in category 'individuals'\u001b[0m [\u001b[0m\u001b[1m\u001b[34mOntologyAdapter\u001b[0m]\u001b[0m\n",
"\n",
"\u001b[2m2025-10-22T17:32:45.385967\u001b[0m [\u001b[32m\u001b[1minfo \u001b[0m] \u001b[1mNo close match found for 'date' in category 'classes'\u001b[0m [\u001b[0m\u001b[1m\u001b[34mOntologyAdapter\u001b[0m]\u001b[0m\n",
"\n",
"\u001b[2m2025-10-22T17:32:45.386237\u001b[0m [\u001b[32m\u001b[1minfo \u001b[0m] \u001b[1mNo close match found for '2013' in category 'individuals'\u001b[0m [\u001b[0m\u001b[1m\u001b[34mOntologyAdapter\u001b[0m]\u001b[0m\n",
"\n",
"\u001b[2m2025-10-22T17:32:45.386666\u001b[0m [\u001b[32m\u001b[1minfo \u001b[0m] \u001b[1mNo close match found for '2011' in category 'individuals'\u001b[0m [\u001b[0m\u001b[1m\u001b[34mOntologyAdapter\u001b[0m]\u001b[0m\n",
"\n",
"\u001b[2m2025-10-22T17:32:45.708394\u001b[0m [\u001b[32m\u001b[1minfo \u001b[0m] \u001b[1mCoroutine task started: `summarize_text`\u001b[0m [\u001b[0m\u001b[1m\u001b[34mrun_tasks_base\u001b[0m]\u001b[0m\n",
"\n",
"\u001b[2m2025-10-22T17:32:46.501125\u001b[0m [\u001b[32m\u001b[1minfo \u001b[0m] \u001b[1mOntology file 'None' not found. No owl ontology will be attached to the graph.\u001b[0m [\u001b[0m\u001b[1m\u001b[34mOntologyAdapter\u001b[0m]\u001b[0m\n",
"\n",
"\u001b[2m2025-10-22T17:32:46.506693\u001b[0m [\u001b[32m\u001b[1minfo \u001b[0m] \u001b[1mNo close match found for 'person' in category 'classes'\u001b[0m [\u001b[0m\u001b[1m\u001b[34mOntologyAdapter\u001b[0m]\u001b[0m\n",
"\n",
"\u001b[2m2025-10-22T17:32:46.507371\u001b[0m [\u001b[32m\u001b[1minfo \u001b[0m] \u001b[1mNo close match found for 'david thompson' in category 'individuals'\u001b[0m [\u001b[0m\u001b[1m\u001b[34mOntologyAdapter\u001b[0m]\u001b[0m\n",
"\n",
"\u001b[2m2025-10-22T17:32:46.507948\u001b[0m [\u001b[32m\u001b[1minfo \u001b[0m] \u001b[1mNo close match found for 'educational_institution' in category 'classes'\u001b[0m [\u001b[0m\u001b[1m\u001b[34mOntologyAdapter\u001b[0m]\u001b[0m\n",
"\n",
"\u001b[2m2025-10-22T17:32:46.508407\u001b[0m [\u001b[32m\u001b[1minfo \u001b[0m] \u001b[1mNo close match found for 'rhode island school of design' in category 'individuals'\u001b[0m [\u001b[0m\u001b[1m\u001b[34mOntologyAdapter\u001b[0m]\u001b[0m\n",
"\n",
"\u001b[2m2025-10-22T17:32:46.508825\u001b[0m [\u001b[32m\u001b[1minfo \u001b[0m] \u001b[1mNo close match found for 'company' in category 'classes'\u001b[0m [\u001b[0m\u001b[1m\u001b[34mOntologyAdapter\u001b[0m]\u001b[0m\n",
"\n",
"\u001b[2m2025-10-22T17:32:46.509329\u001b[0m [\u001b[32m\u001b[1minfo \u001b[0m] \u001b[1mNo close match found for 'creativeworks agency' in category 'individuals'\u001b[0m [\u001b[0m\u001b[1m\u001b[34mOntologyAdapter\u001b[0m]\u001b[0m\n",
"\n",
"\u001b[2m2025-10-22T17:32:46.509869\u001b[0m [\u001b[32m\u001b[1minfo \u001b[0m] \u001b[1mNo close match found for 'visual innovations' in category 'individuals'\u001b[0m [\u001b[0m\u001b[1m\u001b[34mOntologyAdapter\u001b[0m]\u001b[0m\n",
"\n",
"\u001b[2m2025-10-22T17:32:46.510325\u001b[0m [\u001b[32m\u001b[1minfo \u001b[0m] \u001b[1mNo close match found for 'degree' in category 'classes'\u001b[0m [\u001b[0m\u001b[1m\u001b[34mOntologyAdapter\u001b[0m]\u001b[0m\n",
"\n",
"\u001b[2m2025-10-22T17:32:46.510770\u001b[0m [\u001b[32m\u001b[1minfo \u001b[0m] \u001b[1mNo close match found for 'b.f.a. in graphic design' in category 'individuals'\u001b[0m [\u001b[0m\u001b[1m\u001b[34mOntologyAdapter\u001b[0m]\u001b[0m\n",
"\n",
"\u001b[2m2025-10-22T17:32:46.511233\u001b[0m [\u001b[32m\u001b[1minfo \u001b[0m] \u001b[1mNo close match found for 'software' in category 'classes'\u001b[0m [\u001b[0m\u001b[1m\u001b[34mOntologyAdapter\u001b[0m]\u001b[0m\n",
"\n",
"\u001b[2m2025-10-22T17:32:46.511647\u001b[0m [\u001b[32m\u001b[1minfo \u001b[0m] \u001b[1mNo close match found for 'adobe creative suite' in category 'individuals'\u001b[0m [\u001b[0m\u001b[1m\u001b[34mOntologyAdapter\u001b[0m]\u001b[0m\n",
"\n",
"\u001b[2m2025-10-22T17:32:46.512106\u001b[0m [\u001b[32m\u001b[1minfo \u001b[0m] \u001b[1mNo close match found for 'programming_language' in category 'classes'\u001b[0m [\u001b[0m\u001b[1m\u001b[34mOntologyAdapter\u001b[0m]\u001b[0m\n",
"\n",
"\u001b[2m2025-10-22T17:32:46.512512\u001b[0m [\u001b[32m\u001b[1minfo \u001b[0m] \u001b[1mNo close match found for 'html' in category 'individuals'\u001b[0m [\u001b[0m\u001b[1m\u001b[34mOntologyAdapter\u001b[0m]\u001b[0m\n",
"\n",
"\u001b[2m2025-10-22T17:32:46.512869\u001b[0m [\u001b[32m\u001b[1minfo \u001b[0m] \u001b[1mNo close match found for 'css' in category 'individuals'\u001b[0m [\u001b[0m\u001b[1m\u001b[34mOntologyAdapter\u001b[0m]\u001b[0m\n",
"\n",
"\u001b[2m2025-10-22T17:32:46.513266\u001b[0m [\u001b[32m\u001b[1minfo \u001b[0m] \u001b[1mNo close match found for 'date' in category 'classes'\u001b[0m [\u001b[0m\u001b[1m\u001b[34mOntologyAdapter\u001b[0m]\u001b[0m\n",
"\n",
"\u001b[2m2025-10-22T17:32:46.513711\u001b[0m [\u001b[32m\u001b[1minfo \u001b[0m] \u001b[1mNo close match found for '2012' in category 'individuals'\u001b[0m [\u001b[0m\u001b[1m\u001b[34mOntologyAdapter\u001b[0m]\u001b[0m\n",
"\n",
"\u001b[2m2025-10-22T17:32:46.514473\u001b[0m [\u001b[32m\u001b[1minfo \u001b[0m] \u001b[1mNo close match found for '2015' in category 'individuals'\u001b[0m [\u001b[0m\u001b[1m\u001b[34mOntologyAdapter\u001b[0m]\u001b[0m\n",
"\n",
"\u001b[2m2025-10-22T17:32:46.514806\u001b[0m [\u001b[32m\u001b[1minfo \u001b[0m] \u001b[1mNo close match found for '2023' in category 'individuals'\u001b[0m [\u001b[0m\u001b[1m\u001b[34mOntologyAdapter\u001b[0m]\u001b[0m\n",
"\n",
"\u001b[2m2025-10-22T17:32:46.547201\u001b[0m [\u001b[32m\u001b[1minfo \u001b[0m] \u001b[1mOntology file 'None' not found. No owl ontology will be attached to the graph.\u001b[0m [\u001b[0m\u001b[1m\u001b[34mOntologyAdapter\u001b[0m]\u001b[0m\n",
"\n",
"\u001b[2m2025-10-22T17:32:46.550368\u001b[0m [\u001b[32m\u001b[1minfo \u001b[0m] \u001b[1mNo close match found for 'person' in category 'classes'\u001b[0m [\u001b[0m\u001b[1m\u001b[34mOntologyAdapter\u001b[0m]\u001b[0m\n",
"\n",
"\u001b[2m2025-10-22T17:32:46.550711\u001b[0m [\u001b[32m\u001b[1minfo \u001b[0m] \u001b[1mNo close match found for 'dr. emily carter' in category 'individuals'\u001b[0m [\u001b[0m\u001b[1m\u001b[34mOntologyAdapter\u001b[0m]\u001b[0m\n",
"\n",
"\u001b[2m2025-10-22T17:32:46.551128\u001b[0m [\u001b[32m\u001b[1minfo \u001b[0m] \u001b[1mNo close match found for 'organization' in category 'classes'\u001b[0m [\u001b[0m\u001b[1m\u001b[34mOntologyAdapter\u001b[0m]\u001b[0m\n",
"\n",
"\u001b[2m2025-10-22T17:32:46.551361\u001b[0m [\u001b[32m\u001b[1minfo \u001b[0m] \u001b[1mNo close match found for 'innovateai labs' in category 'individuals'\u001b[0m [\u001b[0m\u001b[1m\u001b[34mOntologyAdapter\u001b[0m]\u001b[0m\n",
"\n",
"\u001b[2m2025-10-22T17:32:46.551653\u001b[0m [\u001b[32m\u001b[1minfo \u001b[0m] \u001b[1mNo close match found for 'datawave analytics' in category 'individuals'\u001b[0m [\u001b[0m\u001b[1m\u001b[34mOntologyAdapter\u001b[0m]\u001b[0m\n",
"\n",
"\u001b[2m2025-10-22T17:32:46.551930\u001b[0m [\u001b[32m\u001b[1minfo \u001b[0m] \u001b[1mNo close match found for 'stanford university' in category 'individuals'\u001b[0m [\u001b[0m\u001b[1m\u001b[34mOntologyAdapter\u001b[0m]\u001b[0m\n",
"\n",
"\u001b[2m2025-10-22T17:32:46.552277\u001b[0m [\u001b[32m\u001b[1minfo \u001b[0m] \u001b[1mNo close match found for 'university of california, berkeley' in category 'individuals'\u001b[0m [\u001b[0m\u001b[1m\u001b[34mOntologyAdapter\u001b[0m]\u001b[0m\n",
"\n",
"\u001b[2m2025-10-22T17:32:46.552539\u001b[0m [\u001b[32m\u001b[1minfo \u001b[0m] \u001b[1mNo close match found for 'field' in category 'classes'\u001b[0m [\u001b[0m\u001b[1m\u001b[34mOntologyAdapter\u001b[0m]\u001b[0m\n",
"\n",
"\u001b[2m2025-10-22T17:32:46.552830\u001b[0m [\u001b[32m\u001b[1minfo \u001b[0m] \u001b[1mNo close match found for 'computer science' in category 'individuals'\u001b[0m [\u001b[0m\u001b[1m\u001b[34mOntologyAdapter\u001b[0m]\u001b[0m\n",
"\n",
"\u001b[2m2025-10-22T17:32:46.553091\u001b[0m [\u001b[32m\u001b[1minfo \u001b[0m] \u001b[1mNo close match found for 'mathematics' in category 'individuals'\u001b[0m [\u001b[0m\u001b[1m\u001b[34mOntologyAdapter\u001b[0m]\u001b[0m\n",
"\n",
"\u001b[2m2025-10-22T17:32:46.553319\u001b[0m [\u001b[32m\u001b[1minfo \u001b[0m] \u001b[1mNo close match found for 'date' in category 'classes'\u001b[0m [\u001b[0m\u001b[1m\u001b[34mOntologyAdapter\u001b[0m]\u001b[0m\n",
"\n",
"\u001b[2m2025-10-22T17:32:46.553629\u001b[0m [\u001b[32m\u001b[1minfo \u001b[0m] \u001b[1mNo close match found for '2014' in category 'individuals'\u001b[0m [\u001b[0m\u001b[1m\u001b[34mOntologyAdapter\u001b[0m]\u001b[0m\n",
"\n",
"\u001b[2m2025-10-22T17:32:46.554157\u001b[0m [\u001b[32m\u001b[1minfo \u001b[0m] \u001b[1mNo close match found for '2010' in category 'individuals'\u001b[0m [\u001b[0m\u001b[1m\u001b[34mOntologyAdapter\u001b[0m]\u001b[0m\n",
"\n",
"\u001b[2m2025-10-22T17:32:46.616823\u001b[0m [\u001b[32m\u001b[1minfo \u001b[0m] \u001b[1mCoroutine task started: `summarize_text`\u001b[0m [\u001b[0m\u001b[1m\u001b[34mrun_tasks_base\u001b[0m]\u001b[0m\n",
"\n",
"\u001b[2m2025-10-22T17:32:46.751735\u001b[0m [\u001b[32m\u001b[1minfo \u001b[0m] \u001b[1mOntology file 'None' not found. No owl ontology will be attached to the graph.\u001b[0m [\u001b[0m\u001b[1m\u001b[34mOntologyAdapter\u001b[0m]\u001b[0m\n",
"\n",
"\u001b[2m2025-10-22T17:32:46.755512\u001b[0m [\u001b[32m\u001b[1minfo \u001b[0m] \u001b[1mNo close match found for 'person' in category 'classes'\u001b[0m [\u001b[0m\u001b[1m\u001b[34mOntologyAdapter\u001b[0m]\u001b[0m\n",
"\n",
"\u001b[2m2025-10-22T17:32:46.756051\u001b[0m [\u001b[32m\u001b[1minfo \u001b[0m] \u001b[1mNo close match found for 'jessica miller' in category 'individuals'\u001b[0m [\u001b[0m\u001b[1m\u001b[34mOntologyAdapter\u001b[0m]\u001b[0m\n",
"\n",
"\u001b[2m2025-10-22T17:32:46.756473\u001b[0m [\u001b[32m\u001b[1minfo \u001b[0m] \u001b[1mNo close match found for 'company' in category 'classes'\u001b[0m [\u001b[0m\u001b[1m\u001b[34mOntologyAdapter\u001b[0m]\u001b[0m\n",
"\n",
"\u001b[2m2025-10-22T17:32:46.756833\u001b[0m [\u001b[32m\u001b[1minfo \u001b[0m] \u001b[1mNo close match found for 'global enterprises' in category 'individuals'\u001b[0m [\u001b[0m\u001b[1m\u001b[34mOntologyAdapter\u001b[0m]\u001b[0m\n",
"\n",
"\u001b[2m2025-10-22T17:32:46.757205\u001b[0m [\u001b[32m\u001b[1minfo \u001b[0m] \u001b[1mNo close match found for 'market leaders inc' in category 'individuals'\u001b[0m [\u001b[0m\u001b[1m\u001b[34mOntologyAdapter\u001b[0m]\u001b[0m\n",
"\n",
"\u001b[2m2025-10-22T17:32:46.757563\u001b[0m [\u001b[32m\u001b[1minfo \u001b[0m] \u001b[1mNo close match found for 'educational_institution' in category 'classes'\u001b[0m [\u001b[0m\u001b[1m\u001b[34mOntologyAdapter\u001b[0m]\u001b[0m\n",
"\n",
"\u001b[2m2025-10-22T17:32:46.757889\u001b[0m [\u001b[32m\u001b[1minfo \u001b[0m] \u001b[1mNo close match found for 'university of southern california' in category 'individuals'\u001b[0m [\u001b[0m\u001b[1m\u001b[34mOntologyAdapter\u001b[0m]\u001b[0m\n",
"\n",
"\u001b[2m2025-10-22T17:32:46.758206\u001b[0m [\u001b[32m\u001b[1minfo \u001b[0m] \u001b[1mNo close match found for 'degree' in category 'classes'\u001b[0m [\u001b[0m\u001b[1m\u001b[34mOntologyAdapter\u001b[0m]\u001b[0m\n",
"\n",
"\u001b[2m2025-10-22T17:32:46.758521\u001b[0m [\u001b[32m\u001b[1minfo \u001b[0m] \u001b[1mNo close match found for 'business administration' in category 'individuals'\u001b[0m [\u001b[0m\u001b[1m\u001b[34mOntologyAdapter\u001b[0m]\u001b[0m\n",
"\n",
"\u001b[2m2025-10-22T17:32:46.758826\u001b[0m [\u001b[32m\u001b[1minfo \u001b[0m] \u001b[1mNo close match found for 'date' in category 'classes'\u001b[0m [\u001b[0m\u001b[1m\u001b[34mOntologyAdapter\u001b[0m]\u001b[0m\n",
"\n",
"\u001b[2m2025-10-22T17:32:46.759101\u001b[0m [\u001b[32m\u001b[1minfo \u001b[0m] \u001b[1mNo close match found for '2010' in category 'individuals'\u001b[0m [\u001b[0m\u001b[1m\u001b[34mOntologyAdapter\u001b[0m]\u001b[0m\n",
"\n",
"\u001b[2m2025-10-22T17:32:46.759371\u001b[0m [\u001b[32m\u001b[1minfo \u001b[0m] \u001b[1mNo close match found for '2015' in category 'individuals'\u001b[0m [\u001b[0m\u001b[1m\u001b[34mOntologyAdapter\u001b[0m]\u001b[0m\n",
"\n",
"\u001b[2m2025-10-22T17:32:46.759642\u001b[0m [\u001b[32m\u001b[1minfo \u001b[0m] \u001b[1mNo close match found for '2013' in category 'individuals'\u001b[0m [\u001b[0m\u001b[1m\u001b[34mOntologyAdapter\u001b[0m]\u001b[0m\n",
"\n",
"\u001b[2m2025-10-22T17:32:46.962570\u001b[0m [\u001b[32m\u001b[1minfo \u001b[0m] \u001b[1mCoroutine task started: `summarize_text`\u001b[0m [\u001b[0m\u001b[1m\u001b[34mrun_tasks_base\u001b[0m]\u001b[0m\n",
"\n",
"\u001b[2m2025-10-22T17:32:47.741024\u001b[0m [\u001b[32m\u001b[1minfo \u001b[0m] \u001b[1mCoroutine task started: `summarize_text`\u001b[0m [\u001b[0m\u001b[1m\u001b[34mrun_tasks_base\u001b[0m]\u001b[0m\n",
"\n",
"\u001b[2m2025-10-22T17:32:47.791834\u001b[0m [\u001b[32m\u001b[1minfo \u001b[0m] \u001b[1mCoroutine task started: `summarize_text`\u001b[0m [\u001b[0m\u001b[1m\u001b[34mrun_tasks_base\u001b[0m]\u001b[0m\n",
"\n",
"\u001b[2m2025-10-22T17:32:47.936640\u001b[0m [\u001b[32m\u001b[1minfo \u001b[0m] \u001b[1mCoroutine task started: `summarize_text`\u001b[0m [\u001b[0m\u001b[1m\u001b[34mrun_tasks_base\u001b[0m]\u001b[0m\n",
"\n",
"\u001b[2m2025-10-22T17:32:50.174043\u001b[0m [\u001b[32m\u001b[1minfo \u001b[0m] \u001b[1mCoroutine task started: `add_data_points`\u001b[0m [\u001b[0m\u001b[1m\u001b[34mrun_tasks_base\u001b[0m]\u001b[0m\n",
"\n",
"\u001b[2m2025-10-22T17:32:50.373978\u001b[0m [\u001b[32m\u001b[1minfo \u001b[0m] \u001b[1mCoroutine task started: `add_data_points`\u001b[0m [\u001b[0m\u001b[1m\u001b[34mrun_tasks_base\u001b[0m]\u001b[0m\n",
"\n",
"\u001b[2m2025-10-22T17:32:50.892224\u001b[0m [\u001b[32m\u001b[1minfo \u001b[0m] \u001b[1mCoroutine task completed: `add_data_points`\u001b[0m [\u001b[0m\u001b[1m\u001b[34mrun_tasks_base\u001b[0m]\u001b[0m\n",
"\n",
"\u001b[2m2025-10-22T17:32:50.892950\u001b[0m [\u001b[32m\u001b[1minfo \u001b[0m] \u001b[1mCoroutine task completed: `summarize_text`\u001b[0m [\u001b[0m\u001b[1m\u001b[34mrun_tasks_base\u001b[0m]\u001b[0m\n",
"\n",
"\u001b[2m2025-10-22T17:32:50.893556\u001b[0m [\u001b[32m\u001b[1minfo \u001b[0m] \u001b[1mCoroutine task completed: `extract_graph_from_data`\u001b[0m [\u001b[0m\u001b[1m\u001b[34mrun_tasks_base\u001b[0m]\u001b[0m\n",
"\n",
"\u001b[2m2025-10-22T17:32:50.893897\u001b[0m [\u001b[32m\u001b[1minfo \u001b[0m] \u001b[1mAsync Generator task completed: `extract_chunks_from_documents`\u001b[0m [\u001b[0m\u001b[1m\u001b[34mrun_tasks_base\u001b[0m]\u001b[0m\n",
"\n",
"\u001b[2m2025-10-22T17:32:50.894327\u001b[0m [\u001b[32m\u001b[1minfo \u001b[0m] \u001b[1mCoroutine task completed: `check_permissions_on_dataset`\u001b[0m [\u001b[0m\u001b[1m\u001b[34mrun_tasks_base\u001b[0m]\u001b[0m\n",
"\n",
"\u001b[2m2025-10-22T17:32:50.894620\u001b[0m [\u001b[32m\u001b[1minfo \u001b[0m] \u001b[1mCoroutine task completed: `classify_documents`\u001b[0m [\u001b[0m\u001b[1m\u001b[34mrun_tasks_base\u001b[0m]\u001b[0m\n",
"\n",
"\u001b[2m2025-10-22T17:32:50.894963\u001b[0m [\u001b[32m\u001b[1minfo \u001b[0m] \u001b[1mPipeline run completed: `b5e94561-ce91-54f8-90cb-65ced43a80c8`\u001b[0m [\u001b[0m\u001b[1m\u001b[34mrun_tasks_with_telemetry()\u001b[0m]\u001b[0m\n",
"\n",
"\u001b[2m2025-10-22T17:32:51.137472\u001b[0m [\u001b[32m\u001b[1minfo \u001b[0m] \u001b[1mCoroutine task started: `add_data_points`\u001b[0m [\u001b[0m\u001b[1m\u001b[34mrun_tasks_base\u001b[0m]\u001b[0m\n",
"\n",
"\u001b[2m2025-10-22T17:32:51.837645\u001b[0m [\u001b[32m\u001b[1minfo \u001b[0m] \u001b[1mCoroutine task completed: `add_data_points`\u001b[0m [\u001b[0m\u001b[1m\u001b[34mrun_tasks_base\u001b[0m]\u001b[0m\n",
"\n",
"\u001b[2m2025-10-22T17:32:51.838458\u001b[0m [\u001b[32m\u001b[1minfo \u001b[0m] \u001b[1mCoroutine task completed: `summarize_text`\u001b[0m [\u001b[0m\u001b[1m\u001b[34mrun_tasks_base\u001b[0m]\u001b[0m\n",
"\n",
"\u001b[2m2025-10-22T17:32:51.838913\u001b[0m [\u001b[32m\u001b[1minfo \u001b[0m] \u001b[1mCoroutine task completed: `extract_graph_from_data`\u001b[0m [\u001b[0m\u001b[1m\u001b[34mrun_tasks_base\u001b[0m]\u001b[0m\n",
"\n",
"\u001b[2m2025-10-22T17:32:51.839292\u001b[0m [\u001b[32m\u001b[1minfo \u001b[0m] \u001b[1mAsync Generator task completed: `extract_chunks_from_documents`\u001b[0m [\u001b[0m\u001b[1m\u001b[34mrun_tasks_base\u001b[0m]\u001b[0m\n",
"\n",
"\u001b[2m2025-10-22T17:32:51.839766\u001b[0m [\u001b[32m\u001b[1minfo \u001b[0m] \u001b[1mCoroutine task completed: `check_permissions_on_dataset`\u001b[0m [\u001b[0m\u001b[1m\u001b[34mrun_tasks_base\u001b[0m]\u001b[0m\n",
"\n",
"\u001b[2m2025-10-22T17:32:51.840098\u001b[0m [\u001b[32m\u001b[1minfo \u001b[0m] \u001b[1mCoroutine task completed: `classify_documents`\u001b[0m [\u001b[0m\u001b[1m\u001b[34mrun_tasks_base\u001b[0m]\u001b[0m\n",
"\n",
"\u001b[2m2025-10-22T17:32:51.840455\u001b[0m [\u001b[32m\u001b[1minfo \u001b[0m] \u001b[1mPipeline run completed: `b5e94561-ce91-54f8-90cb-65ced43a80c8`\u001b[0m [\u001b[0m\u001b[1m\u001b[34mrun_tasks_with_telemetry()\u001b[0m]\u001b[0m\n",
"\n",
"\u001b[2m2025-10-22T17:32:52.121172\u001b[0m [\u001b[32m\u001b[1minfo \u001b[0m] \u001b[1mCoroutine task started: `add_data_points`\u001b[0m [\u001b[0m\u001b[1m\u001b[34mrun_tasks_base\u001b[0m]\u001b[0m\n",
"\n",
"\u001b[2m2025-10-22T17:32:52.440873\u001b[0m [\u001b[32m\u001b[1minfo \u001b[0m] \u001b[1mCoroutine task started: `add_data_points`\u001b[0m [\u001b[0m\u001b[1m\u001b[34mrun_tasks_base\u001b[0m]\u001b[0m\n",
"\n",
"\u001b[2m2025-10-22T17:32:52.819130\u001b[0m [\u001b[32m\u001b[1minfo \u001b[0m] \u001b[1mCoroutine task completed: `add_data_points`\u001b[0m [\u001b[0m\u001b[1m\u001b[34mrun_tasks_base\u001b[0m]\u001b[0m\n",
"\n",
"\u001b[2m2025-10-22T17:32:52.819594\u001b[0m [\u001b[32m\u001b[1minfo \u001b[0m] \u001b[1mCoroutine task completed: `summarize_text`\u001b[0m [\u001b[0m\u001b[1m\u001b[34mrun_tasks_base\u001b[0m]\u001b[0m\n",
"\n",
"\u001b[2m2025-10-22T17:32:52.819851\u001b[0m [\u001b[32m\u001b[1minfo \u001b[0m] \u001b[1mCoroutine task completed: `extract_graph_from_data`\u001b[0m [\u001b[0m\u001b[1m\u001b[34mrun_tasks_base\u001b[0m]\u001b[0m\n",
"\n",
"\u001b[2m2025-10-22T17:32:52.820103\u001b[0m [\u001b[32m\u001b[1minfo \u001b[0m] \u001b[1mAsync Generator task completed: `extract_chunks_from_documents`\u001b[0m [\u001b[0m\u001b[1m\u001b[34mrun_tasks_base\u001b[0m]\u001b[0m\n",
"\n",
"\u001b[2m2025-10-22T17:32:52.820333\u001b[0m [\u001b[32m\u001b[1minfo \u001b[0m] \u001b[1mCoroutine task completed: `check_permissions_on_dataset`\u001b[0m [\u001b[0m\u001b[1m\u001b[34mrun_tasks_base\u001b[0m]\u001b[0m\n",
"\n",
"\u001b[2m2025-10-22T17:32:52.820601\u001b[0m [\u001b[32m\u001b[1minfo \u001b[0m] \u001b[1mCoroutine task completed: `classify_documents`\u001b[0m [\u001b[0m\u001b[1m\u001b[34mrun_tasks_base\u001b[0m]\u001b[0m\n",
"\n",
"\u001b[2m2025-10-22T17:32:52.820827\u001b[0m [\u001b[32m\u001b[1minfo \u001b[0m] \u001b[1mPipeline run completed: `b5e94561-ce91-54f8-90cb-65ced43a80c8`\u001b[0m [\u001b[0m\u001b[1m\u001b[34mrun_tasks_with_telemetry()\u001b[0m]\u001b[0m\n",
"\n",
"\u001b[2m2025-10-22T17:32:53.127628\u001b[0m [\u001b[32m\u001b[1minfo \u001b[0m] \u001b[1mCoroutine task completed: `add_data_points`\u001b[0m [\u001b[0m\u001b[1m\u001b[34mrun_tasks_base\u001b[0m]\u001b[0m\n",
"\n",
"\u001b[2m2025-10-22T17:32:53.128258\u001b[0m [\u001b[32m\u001b[1minfo \u001b[0m] \u001b[1mCoroutine task completed: `summarize_text`\u001b[0m [\u001b[0m\u001b[1m\u001b[34mrun_tasks_base\u001b[0m]\u001b[0m\n",
"\n",
"\u001b[2m2025-10-22T17:32:53.128614\u001b[0m [\u001b[32m\u001b[1minfo \u001b[0m] \u001b[1mCoroutine task completed: `extract_graph_from_data`\u001b[0m [\u001b[0m\u001b[1m\u001b[34mrun_tasks_base\u001b[0m]\u001b[0m\n",
"\n",
"\u001b[2m2025-10-22T17:32:53.129006\u001b[0m [\u001b[32m\u001b[1minfo \u001b[0m] \u001b[1mAsync Generator task completed: `extract_chunks_from_documents`\u001b[0m [\u001b[0m\u001b[1m\u001b[34mrun_tasks_base\u001b[0m]\u001b[0m\n",
"\n",
"\u001b[2m2025-10-22T17:32:53.129250\u001b[0m [\u001b[32m\u001b[1minfo \u001b[0m] \u001b[1mCoroutine task completed: `check_permissions_on_dataset`\u001b[0m [\u001b[0m\u001b[1m\u001b[34mrun_tasks_base\u001b[0m]\u001b[0m\n",
"\n",
"\u001b[2m2025-10-22T17:32:53.129562\u001b[0m [\u001b[32m\u001b[1minfo \u001b[0m] \u001b[1mCoroutine task completed: `classify_documents`\u001b[0m [\u001b[0m\u001b[1m\u001b[34mrun_tasks_base\u001b[0m]\u001b[0m\n",
"\n",
"\u001b[2m2025-10-22T17:32:53.130173\u001b[0m [\u001b[32m\u001b[1minfo \u001b[0m] \u001b[1mPipeline run completed: `b5e94561-ce91-54f8-90cb-65ced43a80c8`\u001b[0m [\u001b[0m\u001b[1m\u001b[34mrun_tasks_with_telemetry()\u001b[0m]\u001b[0m\n",
"\n",
"\u001b[2m2025-10-22T17:32:53.162811\u001b[0m [\u001b[32m\u001b[1minfo \u001b[0m] \u001b[1mCoroutine task completed: `add_data_points`\u001b[0m [\u001b[0m\u001b[1m\u001b[34mrun_tasks_base\u001b[0m]\u001b[0m\n",
"\n",
"\u001b[2m2025-10-22T17:32:53.163317\u001b[0m [\u001b[32m\u001b[1minfo \u001b[0m] \u001b[1mCoroutine task completed: `summarize_text`\u001b[0m [\u001b[0m\u001b[1m\u001b[34mrun_tasks_base\u001b[0m]\u001b[0m\n",
"\n",
"\u001b[2m2025-10-22T17:32:53.163749\u001b[0m [\u001b[32m\u001b[1minfo \u001b[0m] \u001b[1mCoroutine task completed: `extract_graph_from_data`\u001b[0m [\u001b[0m\u001b[1m\u001b[34mrun_tasks_base\u001b[0m]\u001b[0m\n",
"\n",
"\u001b[2m2025-10-22T17:32:53.164292\u001b[0m [\u001b[32m\u001b[1minfo \u001b[0m] \u001b[1mAsync Generator task completed: `extract_chunks_from_documents`\u001b[0m [\u001b[0m\u001b[1m\u001b[34mrun_tasks_base\u001b[0m]\u001b[0m\n",
"\n",
"\u001b[2m2025-10-22T17:32:53.164713\u001b[0m [\u001b[32m\u001b[1minfo \u001b[0m] \u001b[1mCoroutine task completed: `check_permissions_on_dataset`\u001b[0m [\u001b[0m\u001b[1m\u001b[34mrun_tasks_base\u001b[0m]\u001b[0m\n",
"\n",
"\u001b[2m2025-10-22T17:32:53.165106\u001b[0m [\u001b[32m\u001b[1minfo \u001b[0m] \u001b[1mCoroutine task completed: `classify_documents`\u001b[0m [\u001b[0m\u001b[1m\u001b[34mrun_tasks_base\u001b[0m]\u001b[0m\n",
"\n",
"\u001b[2m2025-10-22T17:32:53.165381\u001b[0m [\u001b[32m\u001b[1minfo \u001b[0m] \u001b[1mPipeline run completed: `b5e94561-ce91-54f8-90cb-65ced43a80c8`\u001b[0m [\u001b[0m\u001b[1m\u001b[34mrun_tasks_with_telemetry()\u001b[0m]\u001b[0m\n",
"\n",
"\u001b[2m2025-10-22T17:32:53.831843\u001b[0m [\u001b[32m\u001b[1minfo \u001b[0m] \u001b[1mCoroutine task started: `add_data_points`\u001b[0m [\u001b[0m\u001b[1m\u001b[34mrun_tasks_base\u001b[0m]\u001b[0m\n",
"\n",
"\u001b[2m2025-10-22T17:32:55.281094\u001b[0m [\u001b[32m\u001b[1minfo \u001b[0m] \u001b[1mCoroutine task completed: `add_data_points`\u001b[0m [\u001b[0m\u001b[1m\u001b[34mrun_tasks_base\u001b[0m]\u001b[0m\n",
"\n",
"\u001b[2m2025-10-22T17:32:55.281876\u001b[0m [\u001b[32m\u001b[1minfo \u001b[0m] \u001b[1mCoroutine task completed: `summarize_text`\u001b[0m [\u001b[0m\u001b[1m\u001b[34mrun_tasks_base\u001b[0m]\u001b[0m\n",
"\n",
"\u001b[2m2025-10-22T17:32:55.282324\u001b[0m [\u001b[32m\u001b[1minfo \u001b[0m] \u001b[1mCoroutine task completed: `extract_graph_from_data`\u001b[0m [\u001b[0m\u001b[1m\u001b[34mrun_tasks_base\u001b[0m]\u001b[0m\n",
"\n",
"\u001b[2m2025-10-22T17:32:55.282761\u001b[0m [\u001b[32m\u001b[1minfo \u001b[0m] \u001b[1mAsync Generator task completed: `extract_chunks_from_documents`\u001b[0m [\u001b[0m\u001b[1m\u001b[34mrun_tasks_base\u001b[0m]\u001b[0m\n",
"\n",
"\u001b[2m2025-10-22T17:32:55.283172\u001b[0m [\u001b[32m\u001b[1minfo \u001b[0m] \u001b[1mCoroutine task completed: `check_permissions_on_dataset`\u001b[0m [\u001b[0m\u001b[1m\u001b[34mrun_tasks_base\u001b[0m]\u001b[0m\n",
"\n",
"\u001b[2m2025-10-22T17:32:55.283546\u001b[0m [\u001b[32m\u001b[1minfo \u001b[0m] \u001b[1mCoroutine task completed: `classify_documents`\u001b[0m [\u001b[0m\u001b[1m\u001b[34mrun_tasks_base\u001b[0m]\u001b[0m\n",
"\n",
"\u001b[2m2025-10-22T17:32:55.283864\u001b[0m [\u001b[32m\u001b[1minfo \u001b[0m] \u001b[1mPipeline run completed: `b5e94561-ce91-54f8-90cb-65ced43a80c8`\u001b[0m [\u001b[0m\u001b[1m\u001b[34mrun_tasks_with_telemetry()\u001b[0m]\u001b[0m\n"
]
}
],
"source": [
"from cognee.modules.users.methods import get_default_user\n",
"from cognee.modules.data.methods import get_datasets_by_name\n",
"from cognee.modules.users.methods import get_user\n",
"\n",
"default_user = await get_default_user()\n",
"\n",
"user = await get_user(default_user.id)\n",
"\n",
"datasets = await get_datasets_by_name([\"example\"], user.id)\n",
"\n",
"await run_cognify_pipeline(datasets[0], user)"
]
},
{
"cell_type": "code",
"execution_count": 13,
"id": "9dd29caf28c272d1",
"metadata": {
"ExecuteTime": {
"end_time": "2025-06-30T11:40:23.648015Z",
"start_time": "2025-06-30T11:40:23.638810Z"
}
},
"outputs": [
{
"name": "stderr",
"output_type": "stream",
"text": [
"\n",
"\u001b[2m2025-10-22T17:32:55.436203\u001b[0m [\u001b[32m\u001b[1minfo \u001b[0m] \u001b[1mGraph visualization saved as /Users/daulet/Desktop/dev/cognee-claude/notebooks/.artifacts/graph_visualization.html\u001b[0m [\u001b[0m\u001b[1m\u001b[34mcognee.shared.logging_utils\u001b[0m]\u001b[0m\n",
"\n",
"\u001b[2m2025-10-22T17:32:55.440788\u001b[0m [\u001b[32m\u001b[1minfo \u001b[0m] \u001b[1mThe HTML file has been stored at path: /Users/daulet/Desktop/dev/cognee-claude/notebooks/.artifacts/graph_visualization.html\u001b[0m [\u001b[0m\u001b[1m\u001b[34mcognee.shared.logging_utils\u001b[0m]\u001b[0m\n"
]
}
],
"source": [
"import pathlib\n",
"from cognee.api.v1.visualize import visualize_graph\n",
"\n",
"# Use the current working directory instead of __file__:\n",
"notebook_dir = pathlib.Path.cwd()\n",
"\n",
"graph_file_path = (notebook_dir / \".artifacts\" / \"graph_visualization.html\").resolve()\n",
"\n",
"# Make sure to convert to string if visualize_graph expects a string\n",
"b = await visualize_graph(str(graph_file_path))"
]
},
{
"cell_type": "markdown",
"id": "219a6d41",
"metadata": {},
"source": [
" We get the url to the graph on graphistry in the notebook cell bellow, showing nodes and connections made by the cognify process."
]
},
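{
"cell_type": "markdown",
"id": "open-graph-html-note",
"metadata": {},
"source": [
"As a small optional sketch, you can open that HTML file straight from the notebook with the standard library `webbrowser` module. It assumes `graph_file_path` from the cell above and that the visualization was written successfully."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "open-graph-html-sketch",
"metadata": {},
"outputs": [],
"source": [
"import webbrowser\n",
"\n",
"# `graph_file_path` is the resolved pathlib.Path built in the previous cell.\n",
"# Opening the file:// URI shows the interactive graph in the default browser.\n",
"webbrowser.open(graph_file_path.as_uri())"
]
},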
{
"cell_type": "markdown",
"id": "59e6c3c3",
"metadata": {},
"source": [
" We can also do a search on the data to explore the knowledge."
]
},
{
"cell_type": "code",
"execution_count": 14,
"id": "e5e7dfc8",
"metadata": {
"ExecuteTime": {
"end_time": "2025-06-30T11:40:31.638407Z",
"start_time": "2025-06-30T11:40:31.331926Z"
}
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"{'id': '4d8dda57-2681-5264-a2bd-e2ddfe66a785', 'payload': {'id': '4d8dda57-2681-5264-a2bd-e2ddfe66a785', 'created_at': 1761154370407, 'updated_at': 1761154370407, 'ontology_valid': False, 'version': 1, 'topological_rank': 0, 'type': 'IndexSchema', 'text': 'sarah nguyen'}, 'score': 0.5706782937049866}\n",
"{'id': '198e2ab8-75e9-5931-97ab-da9a5a8e188c', 'payload': {'id': '198e2ab8-75e9-5931-97ab-da9a5a8e188c', 'created_at': 1761154372158, 'updated_at': 1761154372158, 'ontology_valid': False, 'version': 1, 'topological_rank': 0, 'type': 'IndexSchema', 'text': 'san francisco, ca'}, 'score': 1.3495469093322754}\n",
"{'id': '435dbd37-ab20-503c-9e99-ab8b8a3484e5', 'payload': {'id': '435dbd37-ab20-503c-9e99-ab8b8a3484e5', 'created_at': 1761154372158, 'updated_at': 1761154372158, 'ontology_valid': False, 'version': 1, 'topological_rank': 0, 'type': 'IndexSchema', 'text': 'senior data scientist'}, 'score': 1.3935251235961914}\n",
"{'id': '36a5e3c8-c5f5-5ab5-8d59-ea69d8b36932', 'payload': {'id': '36a5e3c8-c5f5-5ab5-8d59-ea69d8b36932', 'created_at': 1761154372497, 'updated_at': 1761154372497, 'ontology_valid': False, 'version': 1, 'topological_rank': 0, 'type': 'IndexSchema', 'text': 'jessica miller'}, 'score': 1.4042136669158936}\n",
"{'id': 'fb525174-8b91-5986-8fc5-abf254e9fa2b', 'payload': {'id': 'fb525174-8b91-5986-8fc5-abf254e9fa2b', 'created_at': 1761154370215, 'updated_at': 1761154370215, 'ontology_valid': False, 'version': 1, 'topological_rank': 0, 'type': 'IndexSchema', 'text': 'm.s. in data science'}, 'score': 1.4239115715026855}\n",
"{'id': '73ae630f-7b09-5dce-8c18-45d0a57b30f9', 'payload': {'id': '73ae630f-7b09-5dce-8c18-45d0a57b30f9', 'created_at': 1761154370214, 'updated_at': 1761154370214, 'ontology_valid': False, 'version': 1, 'topological_rank': 0, 'type': 'IndexSchema', 'text': 'michael rodriguez'}, 'score': 1.4521194696426392}\n",
"{'id': '7e8fd696-6d82-57a8-9a1e-7b21aae89d07', 'payload': {'id': '7e8fd696-6d82-57a8-9a1e-7b21aae89d07', 'created_at': 1761154371182, 'updated_at': 1761154371182, 'ontology_valid': False, 'version': 1, 'topological_rank': 0, 'type': 'IndexSchema', 'text': 'dr. emily carter'}, 'score': 1.4712798595428467}\n",
"{'id': 'd953f294-a346-5035-ad49-1ad411c3ed2f', 'payload': {'id': 'd953f294-a346-5035-ad49-1ad411c3ed2f', 'created_at': 1761154370215, 'updated_at': 1761154370215, 'ontology_valid': False, 'version': 1, 'topological_rank': 0, 'type': 'IndexSchema', 'text': 'b.s. in computer science'}, 'score': 1.4747629165649414}\n",
"{'id': 'ce8b394a-b30e-52fc-b80a-6352edc60e5b', 'payload': {'id': 'ce8b394a-b30e-52fc-b80a-6352edc60e5b', 'created_at': 1761154371182, 'updated_at': 1761154371182, 'ontology_valid': False, 'version': 1, 'topological_rank': 0, 'type': 'IndexSchema', 'text': 'stanford university'}, 'score': 1.48723304271698}\n",
"{'id': '9780afb1-dccc-53eb-9a30-c0d4ce033711', 'payload': {'id': '9780afb1-dccc-53eb-9a30-c0d4ce033711', 'created_at': 1761154371182, 'updated_at': 1761154371182, 'ontology_valid': False, 'version': 1, 'topological_rank': 0, 'type': 'IndexSchema', 'text': 'innovateai labs'}, 'score': 1.4982985258102417}\n"
]
}
],
"source": [
"async def search(\n",
" vector_engine,\n",
" collection_name: str,\n",
" query_text: str = None,\n",
"):\n",
" query_vector = (await vector_engine.embedding_engine.embed_text([query_text]))[0]\n",
"\n",
" connection = await vector_engine.get_connection()\n",
" collection = await connection.open_table(collection_name)\n",
"\n",
" results = await collection.vector_search(query_vector).limit(10).to_pandas()\n",
"\n",
" result_values = list(results.to_dict(\"index\").values())\n",
"\n",
" return [\n",
" dict(\n",
" id=str(result[\"id\"]),\n",
" payload=result[\"payload\"],\n",
" score=result[\"_distance\"],\n",
" )\n",
" for result in result_values\n",
" ]\n",
"\n",
"\n",
"from cognee.infrastructure.databases.vector import get_vector_engine\n",
"\n",
"vector_engine = get_vector_engine()\n",
"results = await search(vector_engine, \"Entity_name\", \"sarah.nguyen@example.com\")\n",
"for result in results:\n",
" print(result)"
]
},
{
"cell_type": "markdown",
"id": "81fa2b00",
"metadata": {},
"source": [
" We normalize search output scores so the lower the score of the search result is the higher the chance that it's what you're looking for. In the example above we have searched for node entities in the knowledge graph related to \"sarah.nguyen@example.com\""
]
},
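{
"cell_type": "markdown",
"id": "best-match-note",
"metadata": {},
"source": [
"As a quick sketch of using that ordering, you can take the lowest-scoring hit from the `results` list above as the best candidate. This assumes each result dict carries the `score` and `payload` keys shown in the printed output."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "best-match-sketch",
"metadata": {},
"outputs": [],
"source": [
"# `results` comes from the search() call in the previous cell.\n",
"# The smallest score (vector distance) is the closest match to the query.\n",
"best_match = min(results, key=lambda result: result[\"score\"])\n",
"print(best_match[\"payload\"][\"text\"], \"->\", best_match[\"score\"])"
]
},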
{
"cell_type": "markdown",
"id": "1b94ff96",
"metadata": {},
"source": [
" In the example bellow we'll use cognee search to summarize information regarding the node most related to \"sarah.nguyen@example.com\" in the knowledge graph"
]
},
{
"cell_type": "code",
"execution_count": 15,
"id": "21a3e9a6",
"metadata": {
"ExecuteTime": {
"end_time": "2025-06-30T11:40:42.923695Z",
"start_time": "2025-06-30T11:40:42.104461Z"
}
},
"outputs": [
{
"name": "stderr",
"output_type": "stream",
"text": [
"\n",
"\u001b[2m2025-10-22T17:32:55.955610\u001b[0m [\u001b[32m\u001b[1minfo \u001b[0m] \u001b[1mStarting summary retrieval for query: 'sarah nguyen'\u001b[0m [\u001b[0m\u001b[1m\u001b[34mSummariesRetriever\u001b[0m]\u001b[0m\n",
"\n",
"\u001b[2m2025-10-22T17:32:56.190497\u001b[0m [\u001b[32m\u001b[1minfo \u001b[0m] \u001b[1mFound 6 summaries from vector search\u001b[0m [\u001b[0m\u001b[1m\u001b[34mSummariesRetriever\u001b[0m]\u001b[0m\n",
"\n",
"\u001b[2m2025-10-22T17:32:56.191007\u001b[0m [\u001b[32m\u001b[1minfo \u001b[0m] \u001b[1mReturning 6 summary payloads \u001b[0m [\u001b[0m\u001b[1m\u001b[34mSummariesRetriever\u001b[0m]\u001b[0m\n",
"\n",
"\u001b[2m2025-10-22T17:32:56.191316\u001b[0m [\u001b[32m\u001b[1minfo \u001b[0m] \u001b[1mStarting completion generation for query: 'sarah nguyen'\u001b[0m [\u001b[0m\u001b[1m\u001b[34mSummariesRetriever\u001b[0m]\u001b[0m\n",
"\n",
"\u001b[2m2025-10-22T17:32:56.191586\u001b[0m [\u001b[32m\u001b[1minfo \u001b[0m] \u001b[1mReturning context with 6 item(s)\u001b[0m [\u001b[0m\u001b[1m\u001b[34mSummariesRetriever\u001b[0m]\u001b[0m\n"
]
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"\n",
"\\Extracted summaries are:\n",
"\n",
"{'id': 'ca911035-51e1-511f-8674-0422ef81e150', 'created_at': 1761154370214, 'updated_at': 1761154370214, 'ontology_valid': False, 'version': 1, 'topological_rank': 0, 'type': 'IndexSchema', 'text': 'Michael Rodriguez is an experienced Data Scientist with expertise in machine learning and statistical analysis, adept at managing extensive datasets and deriving meaningful business conclusions.'}\n",
"\n",
"{'id': '60ccaf78-2265-5f54-b9e6-daa57da13543', 'created_at': 1761154373882, 'updated_at': 1761154373882, 'ontology_valid': False, 'version': 1, 'topological_rank': 0, 'type': 'IndexSchema', 'text': 'Experienced Graphic Designer with over 8 years in visual design and branding.'}\n",
"\n",
"{'id': 'a3ab0d89-fbc9-586c-b0ea-9c51556c88d4', 'created_at': 1761154372158, 'updated_at': 1761154372158, 'ontology_valid': False, 'version': 1, 'topological_rank': 0, 'type': 'IndexSchema', 'text': 'Senior Data Scientist with expertise in Machine Learning'}\n",
"\n",
"{'id': '902bae53-0cf3-5efc-9d71-549cc9aaec57', 'created_at': 1761154371182, 'updated_at': 1761154371182, 'ontology_valid': False, 'version': 1, 'topological_rank': 0, 'type': 'IndexSchema', 'text': 'Experienced Senior Data Scientist with over 8 years in machine learning and predictive analytics, specializing in algorithm development and model deployment.'}\n",
"\n",
"{'id': '65e40d1f-813a-5cd1-a0a2-ccf532618a43', 'created_at': 1761154370407, 'updated_at': 1761154370407, 'ontology_valid': False, 'version': 1, 'topological_rank': 0, 'type': 'IndexSchema', 'text': 'Data Scientist with 6 years of expertise in machine learning, focused on data-driven business solutions and enhancing product performance.'}\n",
"\n",
"{'id': '8abed59a-3092-52a0-83e0-ad724cc75f74', 'created_at': 1761154372497, 'updated_at': 1761154372497, 'ontology_valid': False, 'version': 1, 'topological_rank': 0, 'type': 'IndexSchema', 'text': 'Accomplished Sales Manager with a proven history in enhancing sales performance and nurturing effective teams. Strong leadership and communication capabilities.'}\n",
"\n"
]
}
],
"source": [
"from cognee.api.v1.search import SearchType\n",
"\n",
"node = (await vector_engine.search(\"Entity_name\", \"sarah.nguyen@example.com\"))[0]\n",
"node_name = node.payload[\"text\"]\n",
"\n",
"search_results = await cognee.search(query_type=SearchType.SUMMARIES, query_text=node_name)\n",
"print(\"\\n\\Extracted summaries are:\\n\")\n",
"for result in search_results:\n",
" print(f\"{result}\\n\")"
]
},
{
"cell_type": "markdown",
"id": "fd6e5fe2",
"metadata": {},
"source": [
"In this example we'll use cognee search to find chunks in which the node most related to \"sarah.nguyen@example.com\" is a part of"
]
},
{
"cell_type": "code",
"execution_count": 16,
"id": "c7a8abff",
"metadata": {
"ExecuteTime": {
"end_time": "2025-06-30T11:40:46.410055Z",
"start_time": "2025-06-30T11:40:46.103960Z"
}
},
"outputs": [
{
"name": "stderr",
"output_type": "stream",
"text": [
"\n",
"\u001b[2m2025-10-22T17:32:56.208076\u001b[0m [\u001b[32m\u001b[1minfo \u001b[0m] \u001b[1mStarting chunk retrieval for query: 'sarah nguyen'\u001b[0m [\u001b[0m\u001b[1m\u001b[34mChunksRetriever\u001b[0m]\u001b[0m\n",
"\n",
"\u001b[2m2025-10-22T17:32:56.421367\u001b[0m [\u001b[32m\u001b[1minfo \u001b[0m] \u001b[1mFound 6 chunks from vector search\u001b[0m [\u001b[0m\u001b[1m\u001b[34mChunksRetriever\u001b[0m]\u001b[0m\n",
"\n",
"\u001b[2m2025-10-22T17:32:56.421954\u001b[0m [\u001b[32m\u001b[1minfo \u001b[0m] \u001b[1mReturning 6 chunk payloads \u001b[0m [\u001b[0m\u001b[1m\u001b[34mChunksRetriever\u001b[0m]\u001b[0m\n",
"\n",
"\u001b[2m2025-10-22T17:32:56.422324\u001b[0m [\u001b[32m\u001b[1minfo \u001b[0m] \u001b[1mStarting completion generation for query: 'sarah nguyen'\u001b[0m [\u001b[0m\u001b[1m\u001b[34mChunksRetriever\u001b[0m]\u001b[0m\n",
"\n",
"\u001b[2m2025-10-22T17:32:56.422688\u001b[0m [\u001b[32m\u001b[1minfo \u001b[0m] \u001b[1mReturning context with 6 item(s)\u001b[0m [\u001b[0m\u001b[1m\u001b[34mChunksRetriever\u001b[0m]\u001b[0m\n"
]
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"\n",
"\n",
"Extracted chunks are:\n",
"\n",
"{'id': '3fa6c5fc-402e-5dc9-aad0-8e0d2951ffe9', 'created_at': 1761154370407, 'updated_at': 1761154370407, 'ontology_valid': False, 'version': 1, 'topological_rank': 0, 'type': 'IndexSchema', 'text': '\\nCV 3: Relevant\\nName: Sarah Nguyen\\nContact Information:\\n\\nEmail: sarah.nguyen@example.com\\nPhone: (555) 345-6789\\nSummary:\\n\\nData Scientist specializing in machine learning with 6 years of experience. Passionate about leveraging data to drive business solutions and improve product performance.\\n\\nEducation:\\n\\nM.S. in Statistics, University of Washington (2014)\\nB.S. in Applied Mathematics, University of Texas at Austin (2012)\\nExperience:\\n\\nData Scientist, QuantumTech (2016 – Present)\\nDesigned and implemented machine learning algorithms for financial forecasting.\\nImproved model efficiency by 20% through algorithm optimization.\\nJunior Data Scientist, DataCore Solutions (2014 – 2016)\\nAssisted in developing predictive models for supply chain optimization.\\nConducted data cleaning and preprocessing on large datasets.\\nSkills:\\n\\nProgramming Languages: Python, R\\nMachine Learning Frameworks: PyTorch, Scikit-Learn\\nStatistical Analysis: SAS, SPSS\\nCloud Platforms: AWS, Azure\\n'}\n",
"\n",
"{'id': '7dd3c5d3-e566-56ed-a12b-9a561c9681e1', 'created_at': 1761154372497, 'updated_at': 1761154372497, 'ontology_valid': False, 'version': 1, 'topological_rank': 0, 'type': 'IndexSchema', 'text': \"\\nCV 5: Not Relevant\\nName: Jessica Miller\\nContact Information:\\n\\nEmail: jessica.miller@example.com\\nPhone: (555) 567-8901\\nSummary:\\n\\nExperienced Sales Manager with a strong track record in driving sales growth and building high-performing teams. Excellent communication and leadership skills.\\n\\nEducation:\\n\\nB.A. in Business Administration, University of Southern California (2010)\\nExperience:\\n\\nSales Manager, Global Enterprises (2015 – Present)\\nManaged a sales team of 15 members, achieving a 20% increase in annual revenue.\\nDeveloped sales strategies that expanded customer base by 25%.\\nSales Representative, Market Leaders Inc. (2010 – 2015)\\nConsistently exceeded sales targets and received the 'Top Salesperson' award in 2013.\\nSkills:\\n\\nSales Strategy and Planning\\nTeam Leadership and Development\\nCRM Software: Salesforce, Zoho\\nNegotiation and Relationship Building\\n\"}\n",
"\n",
"{'id': 'bd353807-e0a4-55a9-b646-0471f17cd7b7', 'created_at': 1761154373882, 'updated_at': 1761154373882, 'ontology_valid': False, 'version': 1, 'topological_rank': 0, 'type': 'IndexSchema', 'text': '\\nCV 4: Not Relevant\\nName: David Thompson\\nContact Information:\\n\\nEmail: david.thompson@example.com\\nPhone: (555) 456-7890\\nSummary:\\n\\nCreative Graphic Designer with over 8 years of experience in visual design and branding. Proficient in Adobe Creative Suite and passionate about creating compelling visuals.\\n\\nEducation:\\n\\nB.F.A. in Graphic Design, Rhode Island School of Design (2012)\\nExperience:\\n\\nSenior Graphic Designer, CreativeWorks Agency (2015 – Present)\\nLed design projects for clients in various industries.\\nCreated branding materials that increased client engagement by 30%.\\nGraphic Designer, Visual Innovations (2012 – 2015)\\nDesigned marketing collateral, including brochures, logos, and websites.\\nCollaborated with the marketing team to develop cohesive brand strategies.\\nSkills:\\n\\nDesign Software: Adobe Photoshop, Illustrator, InDesign\\nWeb Design: HTML, CSS\\nSpecialties: Branding and Identity, Typography\\n'}\n",
"\n",
"{'id': '54ced21b-9c63-5d18-9368-64b267a69770', 'created_at': 1761154371182, 'updated_at': 1761154371182, 'ontology_valid': False, 'version': 1, 'topological_rank': 0, 'type': 'IndexSchema', 'text': '\\nCV 1: Relevant\\nName: Dr. Emily Carter\\nContact Information:\\n\\nEmail: emily.carter@example.com\\nPhone: (555) 123-4567\\nSummary:\\n\\nSenior Data Scientist with over 8 years of experience in machine learning and predictive analytics. Expertise in developing advanced algorithms and deploying scalable models in production environments.\\n\\nEducation:\\n\\nPh.D. in Computer Science, Stanford University (2014)\\nB.S. in Mathematics, University of California, Berkeley (2010)\\nExperience:\\n\\nSenior Data Scientist, InnovateAI Labs (2016 – Present)\\nLed a team in developing machine learning models for natural language processing applications.\\nImplemented deep learning algorithms that improved prediction accuracy by 25%.\\nCollaborated with cross-functional teams to integrate models into cloud-based platforms.\\nData Scientist, DataWave Analytics (2014 – 2016)\\nDeveloped predictive models for customer segmentation and churn analysis.\\nAnalyzed large datasets using Hadoop and Spark frameworks.\\nSkills:\\n\\nProgramming Languages: Python, R, SQL\\nMachine Learning: TensorFlow, Keras, Scikit-Learn\\nBig Data Technologies: Hadoop, Spark\\nData Visualization: Tableau, Matplotlib\\n'}\n",
"\n",
"{'id': '090781c7-a653-53de-b685-8d1152d142f3', 'created_at': 1761154370214, 'updated_at': 1761154370214, 'ontology_valid': False, 'version': 1, 'topological_rank': 0, 'type': 'IndexSchema', 'text': '\\nCV 2: Relevant\\nName: Michael Rodriguez\\nContact Information:\\n\\nEmail: michael.rodriguez@example.com\\nPhone: (555) 234-5678\\nSummary:\\n\\nData Scientist with a strong background in machine learning and statistical modeling. Skilled in handling large datasets and translating data into actionable business insights.\\n\\nEducation:\\n\\nM.S. in Data Science, Carnegie Mellon University (2013)\\nB.S. in Computer Science, University of Michigan (2011)\\nExperience:\\n\\nSenior Data Scientist, Alpha Analytics (2017 – Present)\\nDeveloped machine learning models to optimize marketing strategies.\\nReduced customer acquisition cost by 15% through predictive modeling.\\nData Scientist, TechInsights (2013 – 2017)\\nAnalyzed user behavior data to improve product features.\\nImplemented A/B testing frameworks to evaluate product changes.\\nSkills:\\n\\nProgramming Languages: Python, Java, SQL\\nMachine Learning: Scikit-Learn, XGBoost\\nData Visualization: Seaborn, Plotly\\nDatabases: MySQL, MongoDB\\n'}\n",
"\n",
"{'id': 'a18a68c8-f05b-5c20-bd84-23cc9c829b58', 'created_at': 1761154372158, 'updated_at': 1761154372158, 'ontology_valid': False, 'version': 1, 'topological_rank': 0, 'type': 'IndexSchema', 'text': 'Senior Data Scientist (Machine Learning)\\n\\nCompany: TechNova Solutions\\nLocation: San Francisco, CA\\n\\nJob Description:\\n\\nTechNova Solutions is seeking a Senior Data Scientist specializing in Machine Learning to join our dynamic analytics team. The ideal candidate will have a strong background in developing and deploying machine learning models, working with large datasets, and translating complex data into actionable insights.\\n\\nResponsibilities:\\n\\nDevelop and implement advanced machine learning algorithms and models.\\nAnalyze large, complex datasets to extract meaningful patterns and insights.\\nCollaborate with cross-functional teams to integrate predictive models into products.\\nStay updated with the latest advancements in machine learning and data science.\\nMentor junior data scientists and provide technical guidance.\\nQualifications:\\n\\nMaster’s or Ph.D. in Data Science, Computer Science, Statistics, or a related field.\\n5+ years of experience in data science and machine learning.\\nProficient in Python, R, and SQL.\\nExperience with deep learning frameworks (e.g., TensorFlow, PyTorch).\\nStrong problem-solving skills and attention to detail.\\nCandidate CVs\\n'}\n",
"\n"
]
}
],
"source": [
"search_results = await cognee.search(query_type=SearchType.CHUNKS, query_text=node_name)\n",
"print(\"\\n\\nExtracted chunks are:\\n\")\n",
"for result in search_results:\n",
" print(f\"{result}\\n\")"
]
},
{
"cell_type": "markdown",
"id": "47f0112f",
"metadata": {},
"source": [
" In this example we'll use cognee search to give us insights from the knowledge graph related to the node most related to \"sarah.nguyen@example.com\""
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "706a3954",
"metadata": {},
"outputs": [
{
"name": "stderr",
"output_type": "stream",
"text": [
"\n",
"\u001b[2m2025-10-22T17:32:56.440035\u001b[0m [\u001b[32m\u001b[1minfo \u001b[0m] \u001b[1mGraph projection completed: 79 nodes, 155 edges in 0.00s\u001b[0m [\u001b[0m\u001b[1m\u001b[34mCogneeGraph\u001b[0m]\u001b[0m\n",
"\n",
"\u001b[2m2025-10-22T17:32:56.733002\u001b[0m [\u001b[32m\u001b[1minfo \u001b[0m] \u001b[1mVector collection retrieval completed: Retrieved distances from 6 collections in 0.06s\u001b[0m [\u001b[0m\u001b[1m\u001b[34mcognee.shared.logging_utils\u001b[0m]\u001b[0m\n"
]
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"\n",
"\n",
"Extracted sentences are:\n",
"\n",
"Sarah Nguyen is a Data Scientist specializing in machine learning, with 6 years of experience. She holds an M.S. in Statistics from the University of Washington and a B.S. in Applied Mathematics from the University of Texas at Austin. She currently works at QuantumTech and previously worked at DataCore Solutions.\n",
"\n"
]
}
],
"source": [
"search_results = await cognee.search(query_type=SearchType.GRAPH_COMPLETION, query_text=node_name)\n",
"print(\"\\n\\nExtracted sentences are:\\n\")\n",
"for result in search_results:\n",
" print(f\"{result}\\n\")"
]
},
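{
"cell_type": "markdown",
"id": "search-type-note",
"metadata": {},
"source": [
"`SearchType` may expose additional query modes beyond SUMMARIES, CHUNKS, and GRAPH_COMPLETION. As a quick sketch (assuming `SearchType` is a standard Python enum), you can list the available modes like this:"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "search-type-sketch",
"metadata": {},
"outputs": [],
"source": [
"# List the members of the SearchType enum to see which query modes exist.\n",
"print([search_type.name for search_type in SearchType])"
]
},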
{
"cell_type": "code",
"execution_count": null,
"id": "d42b3245",
"metadata": {},
"outputs": [
],
"source": [
"# Only exit in interactive mode, not during GitHub Actions\n",
"import os\n",
"\n",
"# Skip exit if we're running in GitHub Actions\n",
"if not os.environ.get('GITHUB_ACTIONS'):\n",
" print(\"Exiting kernel to clean up resources...\")\n",
" os._exit(0)\n",
"else:\n",
" print(\"Skipping kernel exit - running in GitHub Actions\")"
]
},
{
"cell_type": "markdown",
"id": "288ab570",
"metadata": {},
"source": [
"### Give us a star if you like it!\n",
"https://github.com/topoteretes/cognee"
]
}
],
"metadata": {
"kernelspec": {
"display_name": ".venv",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.10.11"
}
},
"nbformat": 4,
"nbformat_minor": 5
}