<!-- .github/pull_request_template.md --> ## Description <!-- Provide a clear description of the changes in this PR --> ## DCO Affirmation I affirm that all code in every commit of this pull request conforms to the terms of the Topoteretes Developer Certificate of Origin. --------- Co-authored-by: vasilije <vas.markovic@gmail.com> Co-authored-by: Igor Ilic <30923996+dexters1@users.noreply.github.com> Co-authored-by: Vasilije <8619304+Vasilije1990@users.noreply.github.com> Co-authored-by: Igor Ilic <igorilic03@gmail.com> Co-authored-by: Hande <159312713+hande-k@users.noreply.github.com> Co-authored-by: Matea Pesic <80577904+matea16@users.noreply.github.com> Co-authored-by: hajdul88 <52442977+hajdul88@users.noreply.github.com> Co-authored-by: Daniel Molnar <soobrosa@gmail.com> Co-authored-by: Diego Baptista Theuerkauf <34717973+diegoabt@users.noreply.github.com>
308 lines
8.6 KiB
Text
Vendored
{
"cells": [
{
"cell_type": "markdown",
"id": "25cf0a40e669a70",
"metadata": {},
"source": [
"# Understanding Ontologies with Cognee\n",
"\n",
"This notebook demonstrates how to work with ontologies in scientific research using the Cognee framework. We'll explore how ontologies can enhance our understanding and querying of scientific papers.\n",
"\n",
"## What is an Ontology?\n",
"\n",
"An ontology is a formal representation of knowledge that defines:\n",
"- Concepts within a domain\n",
"- Relationships between concepts\n",
"- Properties and attributes\n",
"- Rules and constraints\n",
"\n",
"Key terms:\n",
"- **Classes**: Categories or types (e.g., Disease, Symptom)\n",
"- **Instances**: Specific examples of classes (e.g., Type 2 Diabetes)\n",
"- **Properties**: Relationships between classes/instances (e.g., hasSymptom)\n",
"- **Axioms**: Logical statements that define or constrain classes and relationships (e.g., declaring that Disease and Symptom are disjoint classes)"
]
},
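{
"cell_type": "markdown",
"id": "a1b2c3d4e5f60718",
"metadata": {},
"source": [
"The key terms above can be sketched in plain Python. The next cell is a minimal, hypothetical illustration that uses neither Cognee nor the `.owl` file loaded later. A few hand-written `(subject, predicate, object)` triples encode an axiom (`Disease` is a subclass of a made-up `MedicalCondition` class), an instance (`Type2Diabetes`), and a property (`hasSymptom`). A small function then infers every class an instance belongs to by following `subClassOf` links. This is the kind of implicit knowledge an ontology makes available."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "b7e9d2c4f1a03856",
"metadata": {},
"outputs": [],
"source": [
"# Hypothetical toy ontology stored as (subject, predicate, object) triples.\n",
"# The names are illustrative only; they are not read from the notebook's .owl file.\n",
"triples = {\n",
"    ('Disease', 'subClassOf', 'MedicalCondition'),       # axiom: class hierarchy\n",
"    ('Type2Diabetes', 'instanceOf', 'Disease'),          # instance of a class\n",
"    ('Type2Diabetes', 'hasSymptom', 'IncreasedThirst'),  # property linking instances\n",
"}\n",
"\n",
"\n",
"def classes_of(instance):\n",
"    # Start from the instance's direct classes, then follow subClassOf links transitively\n",
"    found = {o for s, p, o in triples if s == instance and p == 'instanceOf'}\n",
"    frontier = list(found)\n",
"    while frontier:\n",
"        cls = frontier.pop()\n",
"        for s, p, o in triples:\n",
"            if s == cls and p == 'subClassOf' and o not in found:\n",
"                found.add(o)\n",
"                frontier.append(o)\n",
"    return found\n",
"\n",
"\n",
"print(sorted(classes_of('Type2Diabetes')))  # ['Disease', 'MedicalCondition']"
]
},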
{
"cell_type": "markdown",
"id": "441248da37f2b901",
"metadata": {},
"source": [
"## Setup\n",
"\n",
"First, install Cognee (uncomment the `pip` line if it is not installed yet) and set up the environment:"
]
},
{
"cell_type": "code",
"execution_count": 17,
"id": "8cf7ba29f9a150af",
"metadata": {
"ExecuteTime": {
"end_time": "2025-03-26T16:17:55.937140Z",
"start_time": "2025-03-26T16:17:55.908542Z"
}
},
"outputs": [],
"source": [
"# Install Cognee (uncomment if needed)\n",
"# !pip install cognee"
]
},
{
"cell_type": "code",
"execution_count": 18,
"id": "d825d126b3a0ec26",
"metadata": {
"ExecuteTime": {
"end_time": "2025-03-26T16:18:09.382400Z",
"start_time": "2025-03-26T16:18:09.342349Z"
}
},
"outputs": [],
"source": [
"# Import required libraries\n",
"import os\n",
"\n",
"# Set the LLM API key before importing Cognee (OpenAI is Cognee's default LLM provider)\n",
"os.environ[\"LLM_API_KEY\"] = \"your-api-key-here\"  # Replace with your API key\n",
"\n",
"import cognee\n",
"from cognee.api.v1.search import SearchType\n",
"from cognee.api.v1.visualize.visualize import visualize_graph\n",
"from cognee.shared.logging_utils import get_logger\n",
"\n",
"logger = get_logger()"
]
},
{
"cell_type": "markdown",
"id": "6af350837e86b7a1",
"metadata": {},
"source": [
"## Creating the Pipeline\n",
"\n",
"Let's define two helper functions. `run_pipeline` will:\n",
"1. Clean existing data and system metadata\n",
"2. Add the scientific papers\n",
"3. Cognify them into a knowledge graph, applying the ontology if one is provided\n",
"\n",
"`query_pipeline` then answers each question with a `GRAPH_COMPLETION` search over that graph."
]
},
{
"cell_type": "code",
"execution_count": 26,
"id": "4d0e4a58e4207a7d",
"metadata": {
"ExecuteTime": {
"end_time": "2025-04-09T17:12:54.006718Z",
"start_time": "2025-04-09T17:12:53.992906Z"
}
},
"outputs": [],
"source": [
"async def run_pipeline(ontology_path=None):\n",
"    # Clean existing data and system metadata\n",
"    await cognee.prune.prune_data()\n",
"    await cognee.prune.prune_system(metadata=True)\n",
"\n",
"    # Path to the scientific papers; assumes the notebook runs one directory\n",
"    # below a checkout of the repository named `cognee`\n",
"    scientific_papers_dir = os.path.join(\n",
"        os.path.dirname(os.path.dirname(os.path.abspath(\".\"))),\n",
"        \"cognee\",\n",
"        \"examples\",\n",
"        \"data\",\n",
"        \"scientific_papers/\",\n",
"    )\n",
"\n",
"    # Add papers to the system\n",
"    await cognee.add(scientific_papers_dir)\n",
"\n",
"    # Cognify, applying the ontology only when a path is given\n",
"    return await cognee.cognify(ontology_file_path=ontology_path)\n",
"\n",
"\n",
"async def query_pipeline(questions):\n",
"    # Run one graph-completion search per question and collect the results\n",
"    answers = []\n",
"    for question in questions:\n",
"        search_results = await cognee.search(\n",
"            query_type=SearchType.GRAPH_COMPLETION,\n",
"            query_text=question,\n",
"        )\n",
"        answers.append(search_results)\n",
"    return answers"
]
},
{
"cell_type": "markdown",
"id": "c87c21a75d6f4d79",
"metadata": {},
"source": [
"## Running the Demo\n",
"\n",
"Let's test our system with some medical questions, comparing results with and without ontological knowledge:"
]
},
{
"cell_type": "code",
"execution_count": 27,
"id": "1363772d2b48f5c0",
"metadata": {
"ExecuteTime": {
"end_time": "2025-04-09T17:14:31.818452Z",
"start_time": "2025-04-09T17:12:55.491598Z"
}
},
"outputs": [],
"source": [
"# Test questions\n",
"questions = [\n",
"    \"What are common risk factors for Type 2 Diabetes?\",\n",
"    \"What preventive measures reduce the risk of Hypertension?\",\n",
"    \"What symptoms indicate possible Cardiovascular Disease?\",\n",
"    \"What diseases are associated with Obesity?\",\n",
"]\n",
"\n",
"# Path to the medical ontology, relative to the current working directory\n",
"ontology_path = \"examples/python/ontology_input_example/enriched_medical_ontology_with_classes.owl\"  # Update with your ontology path\n",
"\n",
"# Run with ontology\n",
"print(\"\\n--- Results WITH ontology ---\\n\")\n",
"await run_pipeline(ontology_path=ontology_path)\n",
"answers_with = await query_pipeline(questions)\n",
"for q, a in zip(questions, answers_with):\n",
"    print(f\"Q: {q}\\nA: {a}\\n\")"
]
},
{
"cell_type": "code",
"execution_count": 23,
"id": "3aa18f4cdd5ceff6",
"metadata": {
"ExecuteTime": {
"end_time": "2025-04-09T14:32:24.891560Z",
"start_time": "2025-04-09T14:30:47.863808Z"
}
},
"outputs": [],
"source": [
"# Run without ontology\n",
"print(\"\\n--- Results WITHOUT ontology ---\\n\")\n",
"await run_pipeline()\n",
"answers_without = await query_pipeline(questions)\n",
"for q, a in zip(questions, answers_without):\n",
"    print(f\"Q: {q}\\nA: {a}\\n\")"
]
},
{
"cell_type": "markdown",
"id": "c60533d2423acdb0",
"metadata": {},
"source": [
"## Visualizing the Knowledge Graph\n",
"\n",
"Let's visualize how the ontology connects different medical concepts. Because every `run_pipeline` call prunes the previous data, `visualize_graph` draws the graph from the most recent run, so make sure that run used the ontology:"
]
},
{
"cell_type": "code",
"execution_count": 25,
"id": "36ee2a360f47a054",
"metadata": {
"ExecuteTime": {
"end_time": "2025-04-09T15:25:33.512697Z",
"start_time": "2025-04-09T15:25:33.471854Z"
}
},
"outputs": [],
"source": [
"# The previous cell rebuilt the graph WITHOUT the ontology, so rebuild it with the ontology first\n",
"await run_pipeline(ontology_path=ontology_path)\n",
"\n",
"# visualize_graph was already imported in the setup cell\n",
"await visualize_graph()"
]
},
{
"cell_type": "markdown",
"id": "ff39326921b75273",
"metadata": {},
"source": [
"## Understanding the Results\n",
"\n",
"Comparing the two sets of answers above should show how an ontology can enhance the analysis by:\n",
"\n",
"1. **Making Connections**:\n",
"   - Linking related medical concepts even when the papers do not state the link explicitly\n",
"   - Identifying relationships between symptoms, diseases, and risk factors\n",
"\n",
"2. **Standardizing Terms**:\n",
"   - Unifying different ways of referring to the same medical condition\n",
"   - Ensuring consistent terminology across documents\n",
"\n",
"3. **Enabling Inference**:\n",
"   - Drawing conclusions based on ontological relationships\n",
"   - Discovering implicit connections in the data\n",
"\n",
"## Next Steps\n",
"\n",
"To learn more about Cognee and ontologies:\n",
"1. Check out the [Cognee documentation](https://docs.cognee.ai/)\n",
"2. Explore more examples in the `examples` directory\n",
"3. Try creating your own domain-specific ontology\n",
"\n",
"Remember to:\n",
"- Place your scientific papers in the directory that `run_pipeline` reads (`examples/data/scientific_papers/`)\n",
"- Update the ontology path to point to your `.owl` file\n",
"- Replace the API key with your own LLM API key (an OpenAI key with Cognee's default settings)"
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3"
}
},
"nbformat": 4,
"nbformat_minor": 5
}