Merge branch 'dev' into COG-3146

This commit is contained in:
Vasilije 2026-01-11 16:16:06 +01:00 committed by GitHub
commit 6ec44f64f7
No known key found for this signature in database
GPG key ID: B5690EEEBB952194
4 changed files with 629 additions and 7 deletions

CLAUDE.md (new file, +588 lines)

@@ -0,0 +1,588 @@
# CLAUDE.md
This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.
## Project Overview
Cognee is an open-source AI memory platform that transforms raw data into persistent knowledge graphs for AI agents. It replaces traditional RAG (Retrieval-Augmented Generation) with an ECL (Extract, Cognify, Load) pipeline combining vector search, graph databases, and LLM-powered entity extraction.
**Requirements**: Python 3.9 - 3.12
## Development Commands
### Setup
```bash
# Create virtual environment (recommended: uv)
uv venv && source .venv/bin/activate
# Install with pip, poetry, or uv
uv pip install -e .
# Install with dev dependencies
uv pip install -e ".[dev]"
# Install with specific extras
uv pip install -e ".[postgres,neo4j,docs,chromadb]"
# Set up pre-commit hooks
pre-commit install
```
### Available Installation Extras
- **postgres** / **postgres-binary** - PostgreSQL + PGVector support
- **neo4j** - Neo4j graph database support
- **neptune** - AWS Neptune support
- **chromadb** - ChromaDB vector database
- **docs** - Document processing (unstructured library)
- **scraping** - Web scraping (Tavily, BeautifulSoup, Playwright)
- **langchain** - LangChain integration
- **llama-index** - LlamaIndex integration
- **anthropic** - Anthropic Claude models
- **gemini** - Google Gemini models
- **ollama** - Ollama local models
- **mistral** - Mistral AI models
- **groq** - Groq API support
- **llama-cpp** - Llama.cpp local inference
- **huggingface** - HuggingFace transformers
- **aws** - S3 storage backend
- **redis** - Redis caching
- **graphiti** - Graphiti-core integration
- **baml** - BAML structured output
- **dlt** - Data load tool (dlt) integration
- **docling** - Docling document processing
- **codegraph** - Code graph extraction
- **evals** - Evaluation tools
- **deepeval** - DeepEval testing framework
- **posthog** - PostHog analytics
- **monitoring** - Sentry + Langfuse observability
- **distributed** - Modal distributed execution
- **dev** - All development tools (pytest, mypy, ruff, etc.)
- **debug** - Debugpy for debugging
### Testing
```bash
# Run all tests
pytest
# Run with coverage
pytest --cov=cognee --cov-report=html
# Run specific test file
pytest cognee/tests/test_custom_model.py
# Run specific test function
pytest cognee/tests/test_custom_model.py::test_function_name
# Run unit tests only
pytest cognee/tests/unit/
# Run integration tests only (add -v for verbose output)
pytest -v cognee/tests/integration/
```
### Code Quality
```bash
# Run ruff linter
ruff check .
# Run ruff formatter
ruff format .
# Run both linting and formatting (pre-commit)
pre-commit run --all-files
# Type checking with mypy
mypy cognee/
# Run pylint
pylint cognee/
```
### Running Cognee
```bash
# Using Python SDK
python examples/python/simple_example.py
# Using CLI
cognee-cli add "Your text here"
cognee-cli cognify
cognee-cli search "Your query"
cognee-cli delete --all
# Launch full stack with UI
cognee-cli -ui
```
## Architecture Overview
### Core Workflow: add → cognify → search/memify
1. **add()** - Ingest data (files, URLs, text) into datasets
2. **cognify()** - Extract entities/relationships and build knowledge graph
3. **search()** - Query knowledge using various retrieval strategies
4. **memify()** - Enrich graph with additional context and rules (see the workflow sketch below)
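A minimal end-to-end sketch of this workflow (it assumes `LLM_API_KEY` is configured as described under Important Configuration):
```python
import asyncio
import cognee

async def main():
    # add → cognify → search, using the default local databases
    await cognee.add("Cognee turns raw text into a knowledge graph.", dataset_name="demo")
    await cognee.cognify(datasets=["demo"])
    results = await cognee.search(query_text="What does cognee do?")
    print(results)

asyncio.run(main())
```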
### Key Architectural Patterns
#### 1. Pipeline-Based Processing
All data flows through task-based pipelines (`cognee/modules/pipelines/`). Tasks are composable units that can run sequentially or in parallel. Example pipeline tasks: `classify_documents`, `extract_graph_from_data`, `add_data_points`.
#### 2. Interface-Based Database Adapters
Multiple backends are supported through adapter interfaces:
- **Graph**: Kuzu (default), Neo4j, Neptune via `GraphDBInterface`
- **Vector**: LanceDB (default), ChromaDB, PGVector via `VectorDBInterface`
- **Relational**: SQLite (default), PostgreSQL
Key files:
- `cognee/infrastructure/databases/graph/graph_db_interface.py`
- `cognee/infrastructure/databases/vector/vector_db_interface.py`
#### 3. Multi-Tenant Access Control
User → Dataset → Data hierarchy with permission-based filtering. Enable with `ENABLE_BACKEND_ACCESS_CONTROL=True`. Each user+dataset combination can have isolated graph/vector databases (when using supported backends: Kuzu, LanceDB, SQLite, Postgres).
### Layer Structure
```
API Layer (cognee/api/v1/)
Main Functions (add, cognify, search, memify)
Pipeline Orchestrator (cognee/modules/pipelines/)
Task Execution Layer (cognee/tasks/)
Domain Modules (graph, retrieval, ingestion, etc.)
Infrastructure Adapters (LLM, databases)
External Services (OpenAI, Kuzu, LanceDB, etc.)
```
### Critical Data Flow Paths
#### ADD: Data Ingestion
`add()` → `resolve_data_directories` → `ingest_data` → `save_data_item_to_storage` → Create Dataset + Data records in relational DB
Key files: `cognee/api/v1/add/add.py`, `cognee/tasks/ingestion/ingest_data.py`
#### COGNIFY: Knowledge Graph Construction
`cognify()` → `classify_documents` → `extract_chunks_from_documents` → `extract_graph_from_data` (LLM extracts entities/relationships using Instructor) → `summarize_text` → `add_data_points` (store in graph + vector DBs)
Key files:
- `cognee/api/v1/cognify/cognify.py`
- `cognee/tasks/graph/extract_graph_from_data.py`
- `cognee/tasks/storage/add_data_points.py`
#### SEARCH: Retrieval
`search(query_text, query_type)` → route to retriever type → filter by permissions → return results
Available search types (from `cognee/modules/search/types/SearchType.py`; usage sketch follows the list):
- **GRAPH_COMPLETION** (default) - Graph traversal + LLM completion
- **GRAPH_SUMMARY_COMPLETION** - Uses pre-computed summaries with graph context
- **GRAPH_COMPLETION_COT** - Chain-of-thought reasoning over graph
- **GRAPH_COMPLETION_CONTEXT_EXTENSION** - Extended context graph retrieval
- **TRIPLET_COMPLETION** - Triplet-based (subject-predicate-object) search
- **RAG_COMPLETION** - Traditional RAG with chunks
- **CHUNKS** - Vector similarity search over chunks
- **CHUNKS_LEXICAL** - Lexical (keyword) search over chunks
- **SUMMARIES** - Search pre-computed document summaries
- **CYPHER** - Direct Cypher query execution (requires `ALLOW_CYPHER_QUERY=True`)
- **NATURAL_LANGUAGE** - Natural language to structured query
- **TEMPORAL** - Time-aware graph search
- **FEELING_LUCKY** - Automatic search type selection
- **FEEDBACK** - User feedback-based refinement
- **CODING_RULES** - Code-specific search rules
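For example, a non-default search type can be selected via the enum above (a minimal sketch; it assumes data has already been added and cognified):
```python
import cognee
from cognee.modules.search.types import SearchType

results = await cognee.search(
    query_text="How does the ECL pipeline differ from RAG?",
    query_type=SearchType.GRAPH_COMPLETION_COT,
)
```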
Key files:
- `cognee/api/v1/search/search.py`
- `cognee/modules/retrieval/context_providers/TripletSearchContextProvider.py`
- `cognee/modules/search/types/SearchType.py`
### Core Data Models
#### Engine Models (`cognee/infrastructure/engine/models/`)
- **DataPoint** - Base class for all graph nodes (versioned, with metadata; subclassing sketch below)
- **Edge** - Graph relationships (source, target, relationship type)
- **Triplet** - (Subject, Predicate, Object) representation
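Custom node types subclass `DataPoint`. A minimal sketch (assuming `DataPoint` is re-exported from `cognee.infrastructure.engine`, and that the `index_fields` metadata marks which attributes get embedded for vector search):
```python
from cognee.infrastructure.engine import DataPoint

class Person(DataPoint):
    name: str
    role: str = "unknown"
    # fields listed in index_fields are embedded into the vector store
    metadata: dict = {"index_fields": ["name"]}
```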
#### Graph Models (`cognee/shared/data_models.py`)
- **KnowledgeGraph** - Container for nodes and edges
- **Node** - Entity (id, name, type, description)
- **Edge** - Relationship (source_node_id, target_node_id, relationship_name)
### Key Infrastructure Components
#### LLM Gateway (`cognee/infrastructure/llm/LLMGateway.py`)
Unified interface for multiple LLM providers: OpenAI, Anthropic, Gemini, Ollama, Mistral, Bedrock. Uses Instructor for structured output extraction.
#### Embedding Engines
Factory pattern for embeddings: `cognee/infrastructure/databases/vector/embeddings/get_embedding_engine.py`
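A sketch of using the factory directly (assumes the module path above is importable as shown and that the returned engine exposes an async `embed_text` method):
```python
from cognee.infrastructure.databases.vector.embeddings.get_embedding_engine import (
    get_embedding_engine,
)

embedding_engine = get_embedding_engine()
# batch-embed a list of strings into vectors
vectors = await embedding_engine.embed_text(["knowledge graphs", "AI memory"])
```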
#### Document Loaders
Support for PDF, DOCX, CSV, images, audio, code files in `cognee/infrastructure/files/`
## Important Configuration
### Environment Setup
Copy `.env.template` to `.env` and configure:
```bash
# Minimal setup (defaults to OpenAI + local file-based databases)
LLM_API_KEY="your_openai_api_key"
LLM_MODEL="openai/gpt-4o-mini" # Default model
```
**Important**: If you configure only LLM or only embeddings, the other defaults to OpenAI. Ensure you have a working OpenAI API key, or configure both to avoid unexpected defaults.
Default databases (no extra setup needed):
- **Relational**: SQLite (metadata and state storage)
- **Vector**: LanceDB (embeddings for semantic search)
- **Graph**: Kuzu (knowledge graph and relationships)
By default these are stored inside the active virtual environment (e.g., under `.venv`). Override the locations with `DATA_ROOT_DIRECTORY` and `SYSTEM_ROOT_DIRECTORY`.
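The same roots can be set from Python through the `config` entry point (a sketch; the method names here are assumptions based on the `config()` SDK entry point listed under Python SDK Entry Points):
```python
import os
import cognee

# keep data and system state outside the virtual environment
cognee.config.data_root_directory(os.path.abspath("./.cognee/data"))
cognee.config.system_root_directory(os.path.abspath("./.cognee/system"))
```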
### Switching Databases
#### Relational Databases
```bash
# PostgreSQL (requires postgres extra: pip install cognee[postgres])
DB_PROVIDER=postgres
DB_HOST=localhost
DB_PORT=5432
DB_USERNAME=cognee
DB_PASSWORD=cognee
DB_NAME=cognee_db
```
#### Vector Databases
Supported: lancedb (default), pgvector, chromadb, qdrant, weaviate, milvus
```bash
# ChromaDB (requires chromadb extra)
VECTOR_DB_PROVIDER=chromadb
# PGVector (requires postgres extra)
VECTOR_DB_PROVIDER=pgvector
VECTOR_DB_URL=postgresql://cognee:cognee@localhost:5432/cognee_db
```
#### Graph Databases
Supported: kuzu (default), neo4j, neptune, kuzu-remote
```bash
# Neo4j (requires neo4j extra: pip install cognee[neo4j])
GRAPH_DATABASE_PROVIDER=neo4j
GRAPH_DATABASE_URL=bolt://localhost:7687
GRAPH_DATABASE_NAME=neo4j
GRAPH_DATABASE_USERNAME=neo4j
GRAPH_DATABASE_PASSWORD=yourpassword
# Remote Kuzu
GRAPH_DATABASE_PROVIDER=kuzu-remote
GRAPH_DATABASE_URL=http://localhost:8000
GRAPH_DATABASE_USERNAME=your_username
GRAPH_DATABASE_PASSWORD=your_password
```
### LLM Provider Configuration
Supported providers: OpenAI (default), Azure OpenAI, Google Gemini, Anthropic, AWS Bedrock, Ollama, LM Studio, Custom (OpenAI-compatible APIs)
#### OpenAI (Recommended - Minimal Setup)
```bash
LLM_API_KEY="your_openai_api_key"
LLM_MODEL="openai/gpt-4o-mini" # or gpt-4o, gpt-4-turbo, etc.
LLM_PROVIDER="openai"
```
#### Azure OpenAI
```bash
LLM_PROVIDER="azure"
LLM_MODEL="azure/gpt-4o-mini"
LLM_ENDPOINT="https://YOUR-RESOURCE.openai.azure.com/openai/deployments/gpt-4o-mini"
LLM_API_KEY="your_azure_api_key"
LLM_API_VERSION="2024-12-01-preview"
```
#### Google Gemini (requires gemini extra)
```bash
LLM_PROVIDER="gemini"
LLM_MODEL="gemini/gemini-2.0-flash-exp"
LLM_API_KEY="your_gemini_api_key"
```
#### Anthropic Claude (requires anthropic extra)
```bash
LLM_PROVIDER="anthropic"
LLM_MODEL="claude-3-5-sonnet-20241022"
LLM_API_KEY="your_anthropic_api_key"
```
#### Ollama (Local - requires ollama extra)
```bash
LLM_PROVIDER="ollama"
LLM_MODEL="llama3.1:8b"
LLM_ENDPOINT="http://localhost:11434/v1"
LLM_API_KEY="ollama"
EMBEDDING_PROVIDER="ollama"
EMBEDDING_MODEL="nomic-embed-text:latest"
EMBEDDING_ENDPOINT="http://localhost:11434/api/embed"
HUGGINGFACE_TOKENIZER="nomic-ai/nomic-embed-text-v1.5"
```
#### Custom / OpenRouter / vLLM
```bash
LLM_PROVIDER="custom"
LLM_MODEL="openrouter/google/gemini-2.0-flash-lite-preview-02-05:free"
LLM_ENDPOINT="https://openrouter.ai/api/v1"
LLM_API_KEY="your_api_key"
```
#### AWS Bedrock (requires aws extra)
```bash
LLM_PROVIDER="bedrock"
LLM_MODEL="anthropic.claude-3-sonnet-20240229-v1:0"
AWS_REGION="us-east-1"
AWS_ACCESS_KEY_ID="your_access_key"
AWS_SECRET_ACCESS_KEY="your_secret_key"
# Optional for temporary credentials:
# AWS_SESSION_TOKEN="your_session_token"
```
#### LLM Rate Limiting
```bash
LLM_RATE_LIMIT_ENABLED=true
LLM_RATE_LIMIT_REQUESTS=60 # Requests per interval
LLM_RATE_LIMIT_INTERVAL=60 # Interval in seconds
```
#### Instructor Mode (Structured Output)
```bash
# LLM_INSTRUCTOR_MODE controls how structured data is extracted
# Each LLM has its own default (e.g., gpt-4o models use "json_schema_mode")
# Override if needed:
LLM_INSTRUCTOR_MODE="json_schema_mode" # or "tool_call", "md_json", etc.
```
### Structured Output Framework
```bash
# Use Instructor (default, via litellm)
STRUCTURED_OUTPUT_FRAMEWORK="instructor"
# Or use BAML (requires baml extra: pip install cognee[baml])
STRUCTURED_OUTPUT_FRAMEWORK="baml"
BAML_LLM_PROVIDER=openai
BAML_LLM_MODEL="gpt-4o-mini"
BAML_LLM_API_KEY="your_api_key"
```
### Storage Backend
```bash
# Local filesystem (default)
STORAGE_BACKEND="local"
# S3 (requires aws extra: pip install cognee[aws])
STORAGE_BACKEND="s3"
STORAGE_BUCKET_NAME="your-bucket-name"
AWS_REGION="us-east-1"
AWS_ACCESS_KEY_ID="your_access_key"
AWS_SECRET_ACCESS_KEY="your_secret_key"
DATA_ROOT_DIRECTORY="s3://your-bucket/cognee/data"
SYSTEM_ROOT_DIRECTORY="s3://your-bucket/cognee/system"
```
## Extension Points
### Adding New Functionality
1. **New Task Type**: Create task function in `cognee/tasks/`, return Task object, register in pipeline (see the sketch after this list)
2. **New Database Backend**: Implement `GraphDBInterface` or `VectorDBInterface` in `cognee/infrastructure/databases/`
3. **New LLM Provider**: Add configuration in LLM config (uses litellm)
4. **New Document Processor**: Extend loaders in `cognee/modules/data/processing/`
5. **New Search Type**: Add to `SearchType` enum and implement retriever in `cognee/modules/retrieval/`
6. **Custom Graph Models**: Define Pydantic models extending `DataPoint` in your code
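Extension points 1 and 6 compose naturally: wrap an async function in `Task` and hand it to `memify()`. A sketch (the keyword argument follows the `memify(extraction_tasks, enrichment_tasks)` signature listed under Python SDK Entry Points):
```python
import cognee
from cognee.modules.pipelines.tasks.Task import Task

async def tag_coding_rules(subgraphs):
    # enrichment logic over extracted subgraphs goes here
    return subgraphs

await cognee.memify(enrichment_tasks=[Task(tag_coding_rules)])
```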
### Working with Ontologies
Cognee supports ontology-based entity extraction to ground knowledge graphs in standardized semantic frameworks (e.g., OWL ontologies).
Configuration:
```bash
ONTOLOGY_RESOLVER=rdflib # Default: uses rdflib and OWL files
MATCHING_STRATEGY=fuzzy # Default: fuzzy matching with 80% similarity
ONTOLOGY_FILE_PATH=/path/to/your/ontology.owl # Full path to ontology file
```
Implementation: `cognee/modules/ontology/`
## Branching Strategy
**IMPORTANT**: Always branch from `dev`, not `main`. The `dev` branch is the active development branch.
```bash
git checkout dev
git pull origin dev
git checkout -b feature/your-feature-name
```
## Code Style
- Ruff for linting and formatting (configured in `pyproject.toml`)
- Line length: 100 characters
- Pre-commit hooks run ruff automatically
- Type hints encouraged (mypy checks enabled)
## Testing Strategy
Tests are organized in `cognee/tests/`:
- `unit/` - Unit tests for individual modules
- `integration/` - Full pipeline integration tests
- `cli_tests/` - CLI command tests
- `tasks/` - Task-specific tests
When adding features, add corresponding tests. Integration tests should cover the full add → cognify → search flow.
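A sketch of such an integration test (assuming `pytest-asyncio` from the dev extras; the `prune` helpers reset local state between runs):
```python
import pytest
import cognee

@pytest.mark.asyncio
async def test_add_cognify_search_flow():
    # start from a clean slate
    await cognee.prune.prune_data()
    await cognee.prune.prune_system(metadata=True)

    await cognee.add("Cognee builds knowledge graphs from text.")
    await cognee.cognify()

    results = await cognee.search(query_text="What does cognee build?")
    assert len(results) > 0
```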
## API Structure
FastAPI application with versioned routes under `cognee/api/v1/`:
- `/add` - Data ingestion
- `/cognify` - Knowledge graph processing
- `/search` - Query interface
- `/memify` - Graph enrichment
- `/datasets` - Dataset management
- `/users` - Authentication (if `REQUIRE_AUTHENTICATION=True`)
- `/visualize` - Graph visualization server
## Python SDK Entry Points
Main functions exported from `cognee/__init__.py`:
- `add(data, dataset_name)` - Ingest data
- `cognify(datasets)` - Build knowledge graph
- `search(query_text, query_type)` - Query knowledge
- `memify(extraction_tasks, enrichment_tasks)` - Enrich graph
- `delete(data_id)` - Remove data
- `config()` - Configuration management
- `datasets()` - Dataset operations
All functions are async - use `await` or `asyncio.run()`.
## Security Considerations
Several security environment variables in `.env`:
- `ACCEPT_LOCAL_FILE_PATH` - Allow local file paths (default: True)
- `ALLOW_HTTP_REQUESTS` - Allow HTTP requests from Cognee (default: True)
- `ALLOW_CYPHER_QUERY` - Allow raw Cypher queries (default: True)
- `REQUIRE_AUTHENTICATION` - Enable API authentication (default: False)
- `ENABLE_BACKEND_ACCESS_CONTROL` - Multi-tenant isolation (default: True)
For production deployments, review and tighten these settings.
## Common Patterns
### Creating a Custom Pipeline Task
```python
from cognee.modules.pipelines.tasks.Task import Task

async def my_custom_task(data):
    # Your logic here
    processed_data = process(data)
    return processed_data

# Use in pipeline
task = Task(my_custom_task)
```
### Accessing Databases Directly
```python
from cognee.infrastructure.databases.graph import get_graph_engine
from cognee.infrastructure.databases.vector import get_vector_engine
graph_engine = await get_graph_engine()
vector_engine = await get_vector_engine()
```
### Using LLM Gateway
```python
from cognee.infrastructure.llm.get_llm_client import get_llm_client

llm_client = get_llm_client()
# response_model takes any Pydantic model describing the expected output
response = await llm_client.acreate_structured_output(
    text_input="Your prompt",
    system_prompt="System instructions",
    response_model=YourPydanticModel,
)
```
## Key Concepts
### Datasets
Datasets are project-level containers that support organization, permissions, and isolated processing workflows. Each user can have multiple datasets with different access permissions.
```python
# Create/use a dataset
await cognee.add(data, dataset_name="my_project")
await cognee.cognify(datasets=["my_project"])
```
### DataPoints
Atomic knowledge units that form the foundation of graph structures. All graph nodes extend the `DataPoint` base class with versioning and metadata support.
### Permissions System
Multi-tenant architecture with users, roles, and Access Control Lists (ACLs):
- Read, write, delete, and share permissions per dataset
- Enable with `ENABLE_BACKEND_ACCESS_CONTROL=True`
- Supports isolated databases per user+dataset (Kuzu, LanceDB, SQLite, Postgres)
### Graph Visualization
Launch visualization server:
```bash
# Via CLI
cognee-cli -ui # Launches full stack with UI at http://localhost:3000
# Via Python
from cognee.api.v1.visualize import start_visualization_server
await start_visualization_server(port=8080)
```
## Debugging & Troubleshooting
### Debug Configuration
- Set `LITELLM_LOG="DEBUG"` for verbose LLM logs (default: "ERROR")
- Enable debug mode: `ENV="development"` or `ENV="debug"`
- Disable telemetry: `TELEMETRY_DISABLED=1`
- Check logs in structured format (uses structlog; usage sketch below)
- Use `debugpy` optional dependency for debugging: `pip install cognee[debug]`
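Application code can emit structured logs through the same setup (a sketch; `get_logger` comes from `cognee.shared.logging_utils`, and the key-value call style follows structlog conventions):
```python
from cognee.shared.logging_utils import get_logger

logger = get_logger()
# structured, key-value log entry rendered by structlog
logger.info("cognify_started", dataset="my_project", document_count=3)
```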
### Common Issues
**Ollama + OpenAI Embeddings NoDataError**
- Issue: Mixing Ollama with OpenAI embeddings can cause errors
- Solution: Configure both LLM and embeddings to use the same provider, or ensure `HUGGINGFACE_TOKENIZER` is set when using Ollama
**LM Studio Structured Output**
- Issue: LM Studio requires explicit instructor mode
- Solution: Set `LLM_INSTRUCTOR_MODE="json_schema_mode"` (or appropriate mode)
**Default Provider Fallback**
- Issue: Configuring only LLM or only embeddings defaults the other to OpenAI
- Solution: Always configure both LLM and embedding providers, or ensure valid OpenAI API key
**Permission Denied on Search**
- Behavior: Returns empty list rather than error (prevents information leakage)
- Solution: Check dataset permissions and user access rights
**Database Connection Issues**
- Check: Verify database URLs, credentials, and that services are running
- Docker users: Use `DB_HOST=host.docker.internal` for local databases
**Rate Limiting Errors**
- Enable client-side rate limiting: `LLM_RATE_LIMIT_ENABLED=true`
- Adjust limits: `LLM_RATE_LIMIT_REQUESTS` and `LLM_RATE_LIMIT_INTERVAL`
## Resources
- [Documentation](https://docs.cognee.ai/)
- [Discord Community](https://discord.gg/NQPKmU5CCg)
- [GitHub Issues](https://github.com/topoteretes/cognee/issues)
- [Example Notebooks](examples/python/)
- [Research Paper](https://arxiv.org/abs/2505.24478) - Optimizing knowledge graphs for LLM reasoning

@@ -8,12 +8,14 @@ from fastapi.encoders import jsonable_encoder
 from cognee.modules.search.types import SearchType, SearchResult, CombinedSearchResult
 from cognee.api.DTO import InDTO, OutDTO
-from cognee.modules.users.exceptions.exceptions import PermissionDeniedError
+from cognee.modules.users.exceptions.exceptions import PermissionDeniedError, UserNotFoundError
 from cognee.modules.users.models import User
 from cognee.modules.search.operations import get_history
 from cognee.modules.users.methods import get_authenticated_user
 from cognee.shared.utils import send_telemetry
 from cognee import __version__ as cognee_version
+from cognee.infrastructure.databases.exceptions import DatabaseNotCreatedError
+from cognee.exceptions import CogneeValidationError


 # Note: Datasets sent by name will only map to datasets owned by the request sender

@@ -138,6 +140,17 @@ def get_search_router() -> APIRouter:
             )

             return jsonable_encoder(results)
+        except (DatabaseNotCreatedError, UserNotFoundError, CogneeValidationError) as e:
+            # Return a clear 422 with actionable guidance instead of leaking a stacktrace
+            status_code = getattr(e, "status_code", 422)
+            return JSONResponse(
+                status_code=status_code,
+                content={
+                    "error": "Search prerequisites not met",
+                    "detail": str(e),
+                    "hint": "Run `await cognee.add(...)` then `await cognee.cognify()` before searching.",
+                },
+            )
         except PermissionDeniedError:
             return []
         except Exception as error:

@@ -11,6 +11,9 @@ from cognee.modules.data.methods import get_authorized_existing_datasets
 from cognee.modules.data.exceptions import DatasetNotFoundError
 from cognee.context_global_variables import set_session_user_context_variable
 from cognee.shared.logging_utils import get_logger
+from cognee.infrastructure.databases.exceptions import DatabaseNotCreatedError
+from cognee.exceptions import CogneeValidationError
+from cognee.modules.users.exceptions.exceptions import UserNotFoundError

 logger = get_logger()

@@ -176,7 +179,18 @@ async def search(
         datasets = [datasets]

     if user is None:
-        user = await get_default_user()
+        try:
+            user = await get_default_user()
+        except (DatabaseNotCreatedError, UserNotFoundError) as error:
+            # Provide a clear, actionable message instead of surfacing low-level stacktraces
+            raise CogneeValidationError(
+                message=(
+                    "Search prerequisites not met: no database/default user found. "
+                    "Initialize Cognee before searching by:\n"
+                    "• running `await cognee.add(...)` followed by `await cognee.cognify()`."
+                ),
+                name="SearchPreconditionError",
+            ) from error

     await set_session_user_context_variable(user)

@@ -8,7 +8,8 @@ import http.server
 import socketserver
 from threading import Thread
 import pathlib
-from uuid import uuid4, uuid5, NAMESPACE_OID
+from typing import Union, Any, Dict, List
+from uuid import uuid4, uuid5, NAMESPACE_OID, UUID

 from cognee.base_config import get_base_config
 from cognee.shared.logging_utils import get_logger

@@ -58,7 +59,7 @@ def get_anonymous_id():
     return anonymous_id


-def _sanitize_nested_properties(obj, property_names: list[str]):
+def _sanitize_nested_properties(obj: Any, property_names: list[str]) -> Any:
     """
     Recursively replaces any property whose key matches one of `property_names`
     (e.g., ['url', 'path']) in a nested dict or list with a uuid5 hash

@@ -78,7 +79,9 @@ def _sanitize_nested_properties(obj, property_names: list[str]):
     return obj


-def send_telemetry(event_name: str, user_id, additional_properties: dict = {}):
+def send_telemetry(event_name: str, user_id: Union[str, UUID], additional_properties: dict = {}):
+    if additional_properties is None:
+        additional_properties = {}
     if os.getenv("TELEMETRY_DISABLED"):
         return

@@ -108,7 +111,7 @@ def send_telemetry(event_name: str, user_id: Union[str, UUID], additional_properties: dict = {}):
             print(f"Error sending telemetry through proxy: {response.status_code}")


-def embed_logo(p, layout_scale, logo_alpha, position):
+def embed_logo(p: Any, layout_scale: float, logo_alpha: float, position: str):
     """
     Embed a logo into the graph visualization as a watermark.
     """

@@ -138,7 +141,11 @@ def embed_logo(p, layout_scale, logo_alpha, position):
 def start_visualization_server(
-    host="0.0.0.0", port=8001, handler_class=http.server.SimpleHTTPRequestHandler
+    host: str = "0.0.0.0",
+    port: int = 8001,
+    handler_class: type[
+        http.server.SimpleHTTPRequestHandler
+    ] = http.server.SimpleHTTPRequestHandler,
 ):
     """
     Spin up a simple HTTP server in a background thread to serve files.