<!-- .github/pull_request_template.md --> ## Description <!-- Provide a clear description of the changes in this PR --> ## DCO Affirmation I affirm that all code in every commit of this pull request conforms to the terms of the Topoteretes Developer Certificate of Origin.
308 lines
8.6 KiB
Text
308 lines
8.6 KiB
Text
{
|
|
"cells": [
|
|
{
|
|
"cell_type": "markdown",
|
|
"id": "25cf0a40e669a70",
|
|
"metadata": {},
|
|
"source": [
|
|
"# Understanding Ontologies with Cognee\n",
|
|
"\n",
|
|
"This notebook demonstrates how to work with ontologies in scientific research using the Cognee framework. We'll explore how ontologies can enhance our understanding and querying of scientific papers.\n",
|
|
"\n",
|
|
"## What is an Ontology?\n",
|
|
"\n",
|
|
"An ontology is a formal representation of knowledge that defines:\n",
|
|
"- Concepts within a domain\n",
|
|
"- Relationships between concepts\n",
|
|
"- Properties and attributes\n",
|
|
"- Rules and constraints\n",
|
|
"\n",
|
|
"Key terms:\n",
|
|
"- **Classes**: Categories or types (e.g., Disease, Symptom)\n",
|
|
"- **Instances**: Specific examples of classes (e.g., Type 2 Diabetes)\n",
|
|
"- **Properties**: Relationships between classes/instances (e.g., hasSymptom)\n",
|
|
"- **Axioms**: Logical statements defining relationships"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "markdown",
|
|
"id": "441248da37f2b901",
|
|
"metadata": {},
|
|
"source": [
|
|
"## Setup\n",
|
|
"\n",
|
|
"First, let's install the required packages and set up our environment:"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "code",
|
|
"execution_count": 17,
|
|
"id": "8cf7ba29f9a150af",
|
|
"metadata": {
|
|
"ExecuteTime": {
|
|
"end_time": "2025-03-26T16:17:55.937140Z",
|
|
"start_time": "2025-03-26T16:17:55.908542Z"
|
|
}
|
|
},
|
|
"outputs": [],
|
|
"source": [
|
|
"# Install required package\n",
|
|
"# !pip install cognee"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "code",
|
|
"execution_count": 18,
|
|
"id": "d825d126b3a0ec26",
|
|
"metadata": {
|
|
"ExecuteTime": {
|
|
"end_time": "2025-03-26T16:18:09.382400Z",
|
|
"start_time": "2025-03-26T16:18:09.342349Z"
|
|
}
|
|
},
|
|
"outputs": [],
|
|
"source": [
|
|
"# Import required libraries\n",
|
|
"import cognee\n",
|
|
"import asyncio\n",
|
|
"from cognee.shared.logging_utils import get_logger\n",
|
|
"import os\n",
|
|
"import textwrap\n",
|
|
"from cognee.api.v1.search import SearchType\n",
|
|
"from cognee.api.v1.visualize.visualize import visualize_graph\n",
|
|
"\n",
|
|
"logger = get_logger()\n",
|
|
"\n",
|
|
"# Set up OpenAI API key (required for Cognee's LLM functionality)\n",
|
|
"os.environ[\"LLM_API_KEY\"] = \"your-api-key-here\" # Replace with your API key"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "markdown",
|
|
"id": "6af350837e86b7a1",
|
|
"metadata": {},
|
|
"source": [
|
|
"## Creating the Pipeline\n",
|
|
"\n",
|
|
"Let's create a pipeline that will:\n",
|
|
"1. Clean existing data\n",
|
|
"2. Process scientific papers\n",
|
|
"3. Apply ontological knowledge"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "code",
|
|
"execution_count": 26,
|
|
"id": "4d0e4a58e4207a7d",
|
|
"metadata": {
|
|
"ExecuteTime": {
|
|
"end_time": "2025-04-09T17:12:54.006718Z",
|
|
"start_time": "2025-04-09T17:12:53.992906Z"
|
|
}
|
|
},
|
|
"outputs": [],
|
|
"source": [
|
|
"async def run_pipeline(ontology_path=None):\n",
|
|
" # Clean existing data\n",
|
|
" await cognee.prune.prune_data()\n",
|
|
" await cognee.prune.prune_system(metadata=True)\n",
|
|
" \n",
|
|
" # Set up path to scientific papers\n",
|
|
" scientific_papers_dir = os.path.join(\n",
|
|
" os.path.dirname(os.path.dirname(os.path.abspath(\".\"))), \n",
|
|
" \"cognee\",\n",
|
|
" \"examples\",\n",
|
|
" \"data\", \n",
|
|
" \"scientific_papers/\"\n",
|
|
" )\n",
|
|
" \n",
|
|
" # Add papers to the system\n",
|
|
" await cognee.add(scientific_papers_dir)\n",
|
|
" \n",
|
|
" # Cognify with optional ontology\n",
|
|
" return await cognee.cognify(ontology_file_path=ontology_path)\n",
|
|
"\n",
|
|
"async def query_pipeline(questions):\n",
|
|
" answers = []\n",
|
|
" for question in questions:\n",
|
|
" search_results = await cognee.search(\n",
|
|
" query_type=SearchType.GRAPH_COMPLETION,\n",
|
|
" query_text=question,\n",
|
|
" )\n",
|
|
" answers.append(search_results)\n",
|
|
" return answers"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "markdown",
|
|
"id": "c87c21a75d6f4d79",
|
|
"metadata": {},
|
|
"source": [
|
|
"## Running the Demo\n",
|
|
"\n",
|
|
"Let's test our system with some medical questions, comparing results with and without ontological knowledge:"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "code",
|
|
"execution_count": 27,
|
|
"id": "1363772d2b48f5c0",
|
|
"metadata": {
|
|
"ExecuteTime": {
|
|
"end_time": "2025-04-09T17:14:31.818452Z",
|
|
"start_time": "2025-04-09T17:12:55.491598Z"
|
|
}
|
|
},
|
|
"outputs": [],
|
|
"source": [
|
|
"# Test questions\n",
|
|
"questions = [\n",
|
|
" \"What are common risk factors for Type 2 Diabetes?\",\n",
|
|
" \"What preventive measures reduce the risk of Hypertension?\",\n",
|
|
" \"What symptoms indicate possible Cardiovascular Disease?\",\n",
|
|
" \"What diseases are associated with Obesity?\"\n",
|
|
"]\n",
|
|
"\n",
|
|
"# Path to medical ontology\n",
|
|
"ontology_path = \"examples/python/ontology_input_example/enriched_medical_ontology_with_classes.owl\" # Update with your ontology path\n",
|
|
"\n",
|
|
"# Run with ontology\n",
|
|
"print(\"\\n--- Results WITH ontology ---\\n\")\n",
|
|
"await run_pipeline(ontology_path=ontology_path)\n",
|
|
"answers_with = await query_pipeline(questions)\n",
|
|
"for q, a in zip(questions, answers_with):\n",
|
|
" print(f\"Q: {q}\\nA: {a}\\n\")"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "code",
|
|
"execution_count": null,
|
|
"id": "89e2e53dcecb78eb",
|
|
"metadata": {},
|
|
"outputs": [],
|
|
"source": []
|
|
},
|
|
{
|
|
"cell_type": "code",
|
|
"execution_count": 23,
|
|
"id": "3aa18f4cdd5ceff6",
|
|
"metadata": {
|
|
"ExecuteTime": {
|
|
"end_time": "2025-04-09T14:32:24.891560Z",
|
|
"start_time": "2025-04-09T14:30:47.863808Z"
|
|
}
|
|
},
|
|
"outputs": [],
|
|
"source": [
|
|
"# Run without ontology\n",
|
|
"print(\"\\n--- Results WITHOUT ontology ---\\n\")\n",
|
|
"await run_pipeline()\n",
|
|
"answers_without = await query_pipeline(questions)\n",
|
|
"for q, a in zip(questions, answers_without):\n",
|
|
" print(f\"Q: {q}\\nA: {a}\\n\")"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "markdown",
|
|
"id": "c60533d2423acdb0",
|
|
"metadata": {},
|
|
"source": [
|
|
"## Visualizing the Knowledge Graph\n",
|
|
"\n",
|
|
"Let's visualize how our ontology connects different medical concepts:"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "code",
|
|
"execution_count": 25,
|
|
"id": "36ee2a360f47a054",
|
|
"metadata": {
|
|
"ExecuteTime": {
|
|
"end_time": "2025-04-09T15:25:33.512697Z",
|
|
"start_time": "2025-04-09T15:25:33.471854Z"
|
|
}
|
|
},
|
|
"outputs": [],
|
|
"source": [
|
|
"from cognee.api.v1.visualize import visualize_graph\n",
|
|
"await visualize_graph()"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "code",
|
|
"execution_count": null,
|
|
"id": "9268fa61dbc81664",
|
|
"metadata": {
|
|
"ExecuteTime": {
|
|
"end_time": "2025-04-10T16:34:04.760472Z",
|
|
"start_time": "2025-04-10T16:34:04.736095Z"
|
|
}
|
|
},
|
|
"outputs": [],
|
|
"source": []
|
|
},
|
|
{
|
|
"cell_type": "markdown",
|
|
"id": "ff39326921b75273",
|
|
"metadata": {},
|
|
"source": [
|
|
"## Understanding the Results\n",
|
|
"\n",
|
|
"The demonstration above shows how ontologies enhance our analysis by:\n",
|
|
"\n",
|
|
"1. **Making Connections**: \n",
|
|
" - Linking related medical concepts even when not explicitly stated\n",
|
|
" - Identifying relationships between symptoms, diseases, and risk factors\n",
|
|
"\n",
|
|
"2. **Standardizing Terms**: \n",
|
|
" - Unifying different ways of referring to the same medical condition\n",
|
|
" - Ensuring consistent terminology across documents\n",
|
|
"\n",
|
|
"3. **Enabling Inference**: \n",
|
|
" - Drawing conclusions based on ontological relationships\n",
|
|
" - Discovering implicit connections in the data\n",
|
|
"\n",
|
|
"## Next Steps\n",
|
|
"\n",
|
|
"To learn more about Cognee and ontologies:\n",
|
|
"1. Check out the [Cognee documentation](https://docs.cognee.ai/)\n",
|
|
"2. Explore more examples in the `examples` directory\n",
|
|
"3. Try creating your own domain-specific ontology\n",
|
|
"\n",
|
|
"Remember to:\n",
|
|
"- Place your scientific papers in the appropriate directory\n",
|
|
"- Update the ontology path to point to your .owl file\n",
|
|
"- Replace the API key with your own OpenAI key"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "code",
|
|
"execution_count": null,
|
|
"id": "8d2a0fe555a7bc0f",
|
|
"metadata": {},
|
|
"outputs": [],
|
|
"source": []
|
|
}
|
|
],
|
|
"metadata": {
|
|
"kernelspec": {
|
|
"display_name": "Python 3",
|
|
"language": "python",
|
|
"name": "python3"
|
|
},
|
|
"language_info": {
|
|
"codemirror_mode": {
|
|
"name": "ipython",
|
|
"version": 2
|
|
},
|
|
"file_extension": ".py",
|
|
"mimetype": "text/x-python",
|
|
"name": "python",
|
|
"nbconvert_exporter": "python",
|
|
"pygments_lexer": "ipython2",
|
|
"version": "2.7.6"
|
|
}
|
|
},
|
|
"nbformat": 4,
|
|
"nbformat_minor": 5
|
|
}
|