{
 "cells": [
  {
   "cell_type": "markdown",
   "id": "25cf0a40e669a70",
   "metadata": {},
   "source": [
    "# Understanding Ontologies with Cognee\n",
    "\n",
    "This notebook demonstrates how to work with ontologies in scientific research using the Cognee framework. We'll explore how ontologies can enhance our understanding and querying of scientific papers.\n",
    "\n",
    "## What is an Ontology?\n",
    "\n",
    "An ontology is a formal representation of knowledge that defines:\n",
    "- Concepts within a domain\n",
    "- Relationships between concepts\n",
    "- Properties and attributes\n",
    "- Rules and constraints\n",
    "\n",
    "Key terms:\n",
    "- **Classes**: Categories or types (e.g., Disease, Symptom)\n",
    "- **Instances**: Specific examples of classes (e.g., Type 2 Diabetes)\n",
    "- **Properties**: Relationships between classes/instances (e.g., hasSymptom)\n",
    "- **Axioms**: Logical statements defining relationships"
   ]
  },
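  {
   "cell_type": "markdown",
   "id": "e1a7c0de5b3f4a01",
   "metadata": {},
   "source": [
    "The key terms above can be illustrated with a small, library-free sketch. Everything in it is hypothetical: the triples, the `ChronicDisease` class, and the helper names `superclasses` and `classes_of` are invented for illustration and are not taken from Cognee or from the medical ontology used later in this notebook. It shows classes, instances, and properties as plain triples, with one axiom-like rule (transitive `subClassOf`) used to infer that `Type2Diabetes` is a `Disease` even though no triple states it directly."
   ]
  },
  {
   "cell_type": "code",
   "id": "e1a7c0de5b3f4a02",
   "metadata": {},
   "source": [
    "# Hypothetical toy ontology stored as (subject, predicate, object) triples.\n",
    "TRIPLES = {\n",
    "    ('ChronicDisease', 'subClassOf', 'Disease'),        # class hierarchy\n",
    "    ('Type2Diabetes', 'instanceOf', 'ChronicDisease'),  # instance of a class\n",
    "    ('Type2Diabetes', 'hasSymptom', 'Fatigue'),         # property linking instances\n",
    "    ('Fatigue', 'instanceOf', 'Symptom'),\n",
    "}\n",
    "\n",
    "\n",
    "def superclasses(cls):\n",
    "    # Collect every class reachable from cls by following subClassOf links.\n",
    "    found, stack = set(), [cls]\n",
    "    while stack:\n",
    "        current = stack.pop()\n",
    "        for s, p, o in TRIPLES:\n",
    "            if s == current and p == 'subClassOf' and o not in found:\n",
    "                found.add(o)\n",
    "                stack.append(o)\n",
    "    return found\n",
    "\n",
    "\n",
    "def classes_of(instance):\n",
    "    # Asserted classes of the instance plus all inferred superclasses.\n",
    "    direct = {o for s, p, o in TRIPLES if s == instance and p == 'instanceOf'}\n",
    "    inferred = set()\n",
    "    for cls in direct:\n",
    "        inferred |= superclasses(cls)\n",
    "    return direct | inferred\n",
    "\n",
    "\n",
    "print(sorted(classes_of('Type2Diabetes')))"
   ],
   "outputs": [],
   "execution_count": null
  },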
  {
   "cell_type": "markdown",
   "id": "441248da37f2b901",
   "metadata": {},
   "source": [
    "## Setup\n",
    "\n",
    "First, let's install the required packages and set up our environment:"
   ]
  },
  {
   "cell_type": "code",
   "id": "8cf7ba29f9a150af",
   "metadata": {},
   "source": [
    "# Install required package\n",
    "# !pip install cognee"
   ],
   "outputs": [],
   "execution_count": null
  },
  {
   "cell_type": "code",
   "id": "abb86851",
   "metadata": {},
   "source": [
    "import os\n",
    "\n",
    "# Set up OpenAI API key (required for Cognee's LLM functionality)\n",
    "if \"LLM_API_KEY\" not in os.environ:\n",
    "    os.environ[\"LLM_API_KEY\"] = \"your-api-key-here\"  # Replace with your API key"
   ],
   "outputs": [],
   "execution_count": null
  },
  {
   "cell_type": "code",
   "id": "d825d126b3a0ec26",
   "metadata": {},
   "source": [
    "# Import required libraries\n",
    "import cognee\n",
    "from cognee.api.v1.search import SearchType\n",
    "from cognee.shared.logging_utils import get_logger\n",
    "\n",
    "# Configure the LLM model and provider used by Cognee\n",
    "cognee.config.set_llm_model(\"gpt-4o-mini\")\n",
    "cognee.config.set_llm_provider(\"openai\")\n",
    "\n",
    "logger = get_logger()"
   ],
   "outputs": [],
   "execution_count": null
  },
  {
   "cell_type": "markdown",
   "id": "6af350837e86b7a1",
   "metadata": {},
   "source": [
    "## Creating the Pipeline\n",
    "\n",
    "Let's create a pipeline that will:\n",
    "1. Clean existing data\n",
    "2. Process scientific papers\n",
    "3. Apply ontological knowledge"
   ]
  },
  {
   "cell_type": "code",
   "id": "4d0e4a58e4207a7d",
   "metadata": {},
   "source": [
    "async def run_pipeline(config=None):\n",
    "    # Clean existing data\n",
    "    await cognee.prune.prune_data()\n",
    "    await cognee.prune.prune_system(metadata=True)\n",
    "\n",
    "    # Set up path to scientific papers\n",
    "    scientific_papers_dir = os.path.join(\n",
    "        os.path.dirname(os.path.dirname(os.path.abspath(\".\"))),\n",
    "        \"cognee\",\n",
    "        \"examples\",\n",
    "        \"data\",\n",
    "        \"scientific_papers/\"\n",
    "    )\n",
    "\n",
    "    # Add papers to the system\n",
    "    await cognee.add(scientific_papers_dir)\n",
    "\n",
    "    # Cognify with optional ontology\n",
    "    return await cognee.cognify(config=config)\n",
    "\n",
    "\n",
    "async def query_pipeline(questions):\n",
    "    # Ask each question against the knowledge graph and collect the results\n",
    "    answers = []\n",
    "    for question in questions:\n",
    "        search_results = await cognee.search(\n",
    "            query_type=SearchType.GRAPH_COMPLETION,\n",
    "            query_text=question,\n",
    "        )\n",
    "        answers.append(search_results)\n",
    "    return answers"
   ],
   "outputs": [],
   "execution_count": null
  },
  {
   "cell_type": "markdown",
   "id": "c87c21a75d6f4d79",
   "metadata": {},
   "source": [
    "## Running the Demo\n",
    "\n",
    "Let's test our system with some medical questions, comparing results with and without ontological knowledge:"
   ]
  },
  {
   "cell_type": "code",
   "id": "1363772d2b48f5c0",
   "metadata": {},
   "source": [
    "from cognee.modules.ontology.rdf_xml.RDFLibOntologyResolver import RDFLibOntologyResolver\n",
    "from cognee.modules.ontology.ontology_config import Config\n",
    "\n",
    "# Test questions\n",
    "questions = [\n",
    "    \"What are common risk factors for Type 2 Diabetes?\",\n",
    "    \"What preventive measures reduce the risk of Hypertension?\",\n",
    "    \"What symptoms indicate possible Cardiovascular Disease?\",\n",
    "    \"What diseases are associated with Obesity?\"\n",
    "]\n",
    "\n",
    "# Path to medical ontology\n",
    "ontology_path = \"../examples/python/ontology_input_example/enriched_medical_ontology_with_classes.owl\"  # Update with your ontology path\n",
    "\n",
    "config: Config = {\n",
    "    \"ontology_config\": {\n",
    "        \"ontology_resolver\": RDFLibOntologyResolver(ontology_file=ontology_path)\n",
    "    }\n",
    "}\n",
    "\n",
    "# Run with ontology\n",
    "print(\"\\n--- Results WITH ontology ---\\n\")\n",
    "await run_pipeline(config=config)\n",
    "answers_with = await query_pipeline(questions)\n",
    "for q, a in zip(questions, answers_with):\n",
    "    print(f\"Q: {q}\\nA: {a}\\n\")"
   ],
   "outputs": [],
   "execution_count": null
  },
  {
   "cell_type": "code",
   "id": "3aa18f4cdd5ceff6",
   "metadata": {},
   "source": [
    "# Run without ontology\n",
    "print(\"\\n--- Results WITHOUT ontology ---\\n\")\n",
    "await run_pipeline()\n",
    "answers_without = await query_pipeline(questions)\n",
    "for q, a in zip(questions, answers_without):\n",
    "    print(f\"Q: {q}\\nA: {a}\\n\")"
   ],
   "outputs": [],
   "execution_count": null
  },
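  {
   "cell_type": "markdown",
   "id": "e1a7c0de5b3f4a03",
   "metadata": {},
   "source": [
    "To make the comparison easier to read, the two result sets can also be printed side by side. This is a minimal sketch that assumes both cells above have already run, so that `questions`, `answers_with`, and `answers_without` are defined; the output layout is an arbitrary choice."
   ]
  },
  {
   "cell_type": "code",
   "id": "e1a7c0de5b3f4a04",
   "metadata": {},
   "source": [
    "# Print each question followed by the answer from each run.\n",
    "for q, a_with, a_without in zip(questions, answers_with, answers_without):\n",
    "    print(f'Q: {q}')\n",
    "    print(f'  With ontology:    {a_with}')\n",
    "    print(f'  Without ontology: {a_without}\\n')"
   ],
   "outputs": [],
   "execution_count": null
  },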
  {
   "cell_type": "markdown",
   "id": "c60533d2423acdb0",
   "metadata": {},
   "source": [
    "## Visualizing the Knowledge Graph\n",
    "\n",
    "Let's visualize how our ontology connects different medical concepts:"
   ]
  },
  {
   "cell_type": "code",
   "id": "36ee2a360f47a054",
   "metadata": {},
   "source": [
    "import os\n",
    "import webbrowser\n",
    "\n",
    "from cognee.api.v1.visualize.visualize import visualize_graph\n",
    "\n",
    "# Render the knowledge graph as HTML\n",
    "html = await visualize_graph()\n",
    "\n",
    "# This cell assumes the rendered file is saved as graph_visualization.html in the home directory\n",
    "home_dir = os.path.expanduser(\"~\")\n",
    "html_file = os.path.join(home_dir, \"graph_visualization.html\")\n",
    "\n",
    "# Show the file path in the notebook, then open the file in the default browser\n",
    "display(html_file)\n",
    "webbrowser.open(f\"file://{html_file}\")"
   ],
   "outputs": [],
   "execution_count": null
  },
  {
   "cell_type": "markdown",
   "id": "ff39326921b75273",
   "metadata": {},
   "source": [
    "## Understanding the Results\n",
    "\n",
    "The demonstration above shows how ontologies enhance our analysis by:\n",
    "\n",
    "1. **Making Connections**:\n",
    "   - Linking related medical concepts even when not explicitly stated\n",
    "   - Identifying relationships between symptoms, diseases, and risk factors\n",
    "\n",
    "2. **Standardizing Terms**:\n",
    "   - Unifying different ways of referring to the same medical condition\n",
    "   - Ensuring consistent terminology across documents\n",
    "\n",
    "3. **Enabling Inference**:\n",
    "   - Drawing conclusions based on ontological relationships\n",
    "   - Discovering implicit connections in the data\n",
    "\n",
    "## Next Steps\n",
    "\n",
    "To learn more about Cognee and ontologies:\n",
    "1. Check out the [Cognee documentation](https://docs.cognee.ai/)\n",
    "2. Explore more examples in the `examples` directory\n",
    "3. Try creating your own domain-specific ontology\n",
    "\n",
    "Remember to:\n",
    "- Place your scientific papers in the appropriate directory\n",
    "- Update the ontology path to point to your .owl file\n",
    "- Replace the API key with your own OpenAI key"
   ]
  },
  {
   "cell_type": "code",
   "id": "8d2a0fe555a7bc0f",
   "metadata": {},
   "source": [
    "# Only exit in interactive mode, not during GitHub Actions\n",
    "import os\n",
    "\n",
    "# Skip exit if we're running in GitHub Actions\n",
    "if not os.environ.get('GITHUB_ACTIONS'):\n",
    "    print(\"Exiting kernel to clean up resources...\")\n",
    "    os._exit(0)\n",
    "else:\n",
    "    print(\"Skipping kernel exit - running in GitHub Actions\")"
   ],
   "outputs": [],
   "execution_count": null
  },
  {
   "metadata": {},
   "cell_type": "code",
   "source": "",
   "id": "adb6601890237b6a",
   "outputs": [],
   "execution_count": null
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": ".venv",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.12.7"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 5
}