CLAUDE.md

This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.

Project Overview

Graphiti is a Python framework for building temporally aware knowledge graphs designed for AI agents. It enables real-time incremental updates to knowledge graphs without batch recomputation, making it well suited to dynamic environments.

Key features:

  • Bi-temporal data model with explicit tracking of event occurrence times
  • Hybrid retrieval combining semantic embeddings, keyword search (BM25), and graph traversal
  • Support for custom entity definitions via Pydantic models
  • Integration with Neo4j and FalkorDB as graph storage backends
  • Optional OpenTelemetry distributed tracing support
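The bi-temporal idea above can be sketched with plain dataclasses. This is an illustrative model only, assuming nothing about Graphiti's actual graph element classes: the point is that when a fact was true in the world (valid_at/invalid_at) is tracked separately from when the graph learned about it (created_at).

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Optional

# Illustrative sketch of bi-temporal tracking; Fact is a made-up class,
# not one of Graphiti's graph elements.
@dataclass
class Fact:
    statement: str
    valid_at: datetime             # when the event occurred in the world
    invalid_at: Optional[datetime] # when it stopped being true, if known
    created_at: datetime           # when it was ingested into the graph

fact = Fact(
    statement="Alice works at Acme",
    valid_at=datetime(2023, 1, 1, tzinfo=timezone.utc),
    invalid_at=None,
    created_at=datetime.now(timezone.utc),
)
```

Keeping the two timelines separate is what lets a fact be superseded later (by setting its invalid_at) without rewriting history about when the graph first recorded it.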

Development Commands

Main Development Commands (run from project root)

# Install dependencies
uv sync --extra dev

# Format code (ruff import sorting + formatting)
make format

# Lint code (ruff + pyright type checking)
make lint

# Run tests
make test

# Run all checks (format, lint, test)
make check

Server Development (run from server/ directory)

cd server/
# Install server dependencies
uv sync --extra dev

# Run server in development mode
uvicorn graph_service.main:app --reload

# Format, lint, test server code
make format
make lint
make test

MCP Server Development (run from mcp_server/ directory)

cd mcp_server/
# Install MCP server dependencies
uv sync

# Run with Docker Compose
docker-compose up

Code Architecture

Core Library (graphiti_core/)

  • Main Entry Point: graphiti.py - Contains the main Graphiti class that orchestrates all functionality
  • Graph Storage: driver/ - Database drivers for Neo4j and FalkorDB
  • LLM Integration: llm_client/ - Clients for OpenAI, Anthropic, Gemini, Groq
  • Embeddings: embedder/ - Embedding clients for various providers
  • Graph Elements: nodes.py, edges.py - Core graph data structures
  • Search: search/ - Hybrid search implementation with configurable strategies
  • Prompts: prompts/ - LLM prompts for entity extraction, deduplication, summarization
  • Utilities: utils/ - Maintenance operations, bulk processing, datetime handling

Server (server/)

  • FastAPI Service: graph_service/main.py - REST API server
  • Routers: routers/ - API endpoints for ingestion and retrieval
  • DTOs: dto/ - Data transfer objects for API contracts

MCP Server (mcp_server/)

  • MCP Implementation: graphiti_mcp_server.py - Model Context Protocol server for AI assistants
  • Docker Support: Containerized deployment with Neo4j

Testing

  • Unit Tests: tests/ - Comprehensive test suite using pytest
  • Integration Tests: Tests suffixed with _int require database connections
  • Evaluation: tests/evals/ - End-to-end evaluation scripts

Configuration

Environment Variables

  • OPENAI_API_KEY - Required for LLM inference and embeddings
  • USE_PARALLEL_RUNTIME - Optional boolean for Neo4j parallel runtime (enterprise only)
  • Provider-specific keys: ANTHROPIC_API_KEY, GOOGLE_API_KEY, GROQ_API_KEY, VOYAGE_API_KEY
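The variables above are read from the process environment in the usual way. A hedged sketch of that pattern follows; the helper functions and defaults here are illustrative, not part of graphiti_core's API:

```python
import os

# Illustrative helpers for the environment variables listed above.
def parallel_runtime_enabled(env=os.environ):
    # USE_PARALLEL_RUNTIME is an optional boolean flag (enterprise only)
    return env.get("USE_PARALLEL_RUNTIME", "false").lower() in ("true", "1")

def require_openai_key(env=os.environ):
    # OPENAI_API_KEY is required for LLM inference and embeddings
    key = env.get("OPENAI_API_KEY")
    if not key:
        raise RuntimeError("OPENAI_API_KEY is required")
    return key
```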

Database Setup

  • Neo4j: Version 5.26+ required, available via Neo4j Desktop
    • Database name defaults to neo4j (hardcoded in Neo4jDriver)
    • Override by passing database parameter to driver constructor
  • FalkorDB: Version 1.1.2+ as alternative backend
    • Database name defaults to default_db (hardcoded in FalkorDriver)
    • Override by passing database parameter to driver constructor
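The default-plus-override behavior described above follows a common constructor pattern. The sketch below uses a made-up GraphDriver class to show the shape of it; it is not the real Neo4jDriver or FalkorDriver:

```python
# Hypothetical sketch of the default-database pattern; GraphDriver is
# illustrative, not Graphiti's actual driver class.
class GraphDriver:
    DEFAULT_DATABASE = "neo4j"  # FalkorDriver's default is "default_db"

    def __init__(self, uri, user, password, database=None):
        self.uri = uri
        self._auth = (user, password)
        # fall back to the hardcoded default when no database is given
        self.database = database or self.DEFAULT_DATABASE

default_driver = GraphDriver("bolt://localhost:7687", "neo4j", "password")
custom_driver = GraphDriver(
    "bolt://localhost:7687", "neo4j", "password", database="graphiti"
)
```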

Development Guidelines

Code Style

  • Use Ruff for formatting and linting (configured in pyproject.toml)
  • Line length: 100 characters
  • Quote style: single quotes
  • Type checking with Pyright is enforced
  • Main project uses typeCheckingMode = "basic", server uses typeCheckingMode = "standard"

Testing Requirements

  • Run tests with make test or pytest
  • Integration tests require database connections and are marked with _int suffix
  • Use pytest-xdist for parallel test execution
  • Run specific test files: pytest tests/test_specific_file.py
  • Run specific test methods: pytest tests/test_file.py::test_method_name
  • Run only integration tests: pytest tests/ -k "_int"
  • Run only unit tests: pytest tests/ -k "not _int"
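The -k filters above select tests by substring match against the test name, which is why the _int suffix convention works. The test names below are hypothetical, chosen only to demonstrate the split:

```python
# Hypothetical collected test names; -k "_int" selects by substring,
# -k "not _int" selects the complement.
collected = [
    "test_parse_datetime",
    "test_hybrid_search_ranking",
    "test_neo4j_roundtrip_int",
]

integration_tests = [name for name in collected if "_int" in name]
unit_tests = [name for name in collected if "_int" not in name]
```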

LLM Provider Support

The codebase supports multiple LLM providers but works best with services that support structured output (OpenAI, Gemini). Other providers may produce schema validation errors, especially with smaller models.
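Schema validation issues like those mentioned above usually surface as unparseable or incomplete JSON from the model. A minimal sketch of the kind of check a caller ends up doing, assuming a hypothetical required-key set rather than Graphiti's actual extraction schema:

```python
import json

REQUIRED_KEYS = {"name", "entity_type"}  # hypothetical extraction schema

def parse_extraction(raw: str) -> dict:
    """Parse an LLM response, failing loudly when the schema was not honored."""
    data = json.loads(raw)  # raises on free-form, non-JSON output
    missing = REQUIRED_KEYS - data.keys()
    if missing:
        raise ValueError(f"response missing required keys: {sorted(missing)}")
    return data
```

Providers with native structured-output support make both failure modes (non-JSON text, missing fields) far less likely.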

MCP Server Usage Guidelines

When working with the MCP server, follow the patterns established in mcp_server/cursor_rules.md:

  • Always search for existing knowledge before adding new information
  • Use specific entity type filters (Preference, Procedure, Requirement)
  • Store new information immediately using add_memory
  • Follow discovered procedures and respect established preferences