graphiti/graphiti_core/tracer.py
Daniel Chalef 6ad695186a
Add OpenTelemetry distributed tracing support (#982)
* Add OpenTelemetry distributed tracing support

- Add tracer abstraction with no-op and OpenTelemetry implementations
- Instrument add_episode and add_episode_bulk with tracing spans
- Instrument LLM client with cache-aware tracing
- Add configurable span name prefix support
- Refactor add_episode methods to improve code quality
- Add OTEL_TRACING.md documentation

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>

* Fix linting errors in tracing implementation

- Remove unused episodes_by_uuid variable
- Fix tracer type annotations for context manager support
- Replace isinstance tuple with union syntax
- Use contextlib.suppress for exception handling
- Fix import ordering and use AbstractContextManager

* Address PR review feedback on tracing implementation

Critical fixes:
- Remove flawed error span creation in graphiti.py that created orphaned spans
- Restructure LLM client tracing to create span once at start, eliminating code duplication
- Initialize LLM client tracer to NoOpTracer by default to fix type checking

Enhancements:
- Add comprehensive span attributes to add_episode: reference_time, entity/edge type counts, previous episodes count, invalidated edge count, community count
- Optimize isinstance check for better performance

* Add prompt name tracking to OpenTelemetry tracing spans

Add prompt_name parameter to all LLM client generate_response() methods
and set it as a span attribute in the llm.generate span. This enables
better observability by identifying which prompt template was used for
each LLM call.

Changes:
- Add prompt_name parameter to LLMClient.generate_response() base method
- Add prompt_name parameter and tracing to OpenAIBaseClient,
  AnthropicClient, GeminiClient, and OpenAIGenericClient
- Update all 14 LLM call sites across maintenance operations to include
  prompt_name:
  - edge_operations.py: 4 calls
  - node_operations.py: 6 calls (note: 7 listed but only 6 unique)
  - temporal_operations.py: 2 calls
  - community_operations.py: 2 calls

* Fix exception handling in add_episode to record errors in OpenTelemetry span

Moved try-except block inside the OpenTelemetry span context and added
proper error recording with span.set_status() and span.record_exception().
This ensures exceptions are captured in the distributed trace, matching
the pattern used in add_episode_bulk.

---------

Co-authored-by: Claude <noreply@anthropic.com>
2025-10-05 12:26:14 -07:00

193 lines
6 KiB
Python

"""
Copyright 2024, Zep Software, Inc.

Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at

    http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
"""

from abc import ABC, abstractmethod
from collections.abc import Generator
from contextlib import AbstractContextManager, contextmanager, suppress
from typing import TYPE_CHECKING, Any

if TYPE_CHECKING:
    from opentelemetry.trace import Span, StatusCode

# OpenTelemetry is an optional dependency; record whether it can be imported
try:
    from opentelemetry.trace import Span, StatusCode

    OTEL_AVAILABLE = True
except ImportError:
    OTEL_AVAILABLE = False


class TracerSpan(ABC):
    """Abstract base class for tracer spans."""

    @abstractmethod
    def add_attributes(self, attributes: dict[str, Any]) -> None:
        """Add attributes to the span."""
        pass

    @abstractmethod
    def set_status(self, status: str, description: str | None = None) -> None:
        """Set the status of the span."""
        pass

    @abstractmethod
    def record_exception(self, exception: Exception) -> None:
        """Record an exception in the span."""
        pass


class Tracer(ABC):
    """Abstract base class for tracers."""

    @abstractmethod
    def start_span(self, name: str) -> AbstractContextManager[TracerSpan]:
        """Start a new span with the given name."""
        pass


class NoOpSpan(TracerSpan):
    """No-op span implementation that does nothing."""

    def add_attributes(self, attributes: dict[str, Any]) -> None:
        pass

    def set_status(self, status: str, description: str | None = None) -> None:
        pass

    def record_exception(self, exception: Exception) -> None:
        pass


class NoOpTracer(Tracer):
    """No-op tracer implementation that does nothing."""

    @contextmanager
    def start_span(self, name: str) -> Generator[NoOpSpan, None, None]:
        """Return a no-op span."""
        yield NoOpSpan()


class OpenTelemetrySpan(TracerSpan):
    """Wrapper for OpenTelemetry span."""

    def __init__(self, span: 'Span'):
        self._span = span

    def add_attributes(self, attributes: dict[str, Any]) -> None:
        """Add attributes to the OpenTelemetry span."""
        try:
            # Filter out None values and convert all values to appropriate types
            filtered_attrs = {}
            for key, value in attributes.items():
                if value is not None:
                    # Convert to string if not a primitive type
                    if isinstance(value, str | int | float | bool):
                        filtered_attrs[key] = value
                    else:
                        filtered_attrs[key] = str(value)

            if filtered_attrs:
                self._span.set_attributes(filtered_attrs)
        except Exception:
            # Silently ignore tracing errors
            pass

    def set_status(self, status: str, description: str | None = None) -> None:
        """Set the status of the OpenTelemetry span."""
        try:
            if OTEL_AVAILABLE:
                if status == 'error':
                    self._span.set_status(StatusCode.ERROR, description)
                elif status == 'ok':
                    self._span.set_status(StatusCode.OK, description)
        except Exception:
            # Silently ignore tracing errors
            pass

    def record_exception(self, exception: Exception) -> None:
        """Record an exception in the OpenTelemetry span."""
        with suppress(Exception):
            self._span.record_exception(exception)


class OpenTelemetryTracer(Tracer):
    """Wrapper for OpenTelemetry tracer with configurable span name prefix."""

    def __init__(self, tracer: Any, span_prefix: str = 'graphiti'):
        """
        Initialize the OpenTelemetry tracer wrapper.

        Parameters
        ----------
        tracer : opentelemetry.trace.Tracer
            The OpenTelemetry tracer instance.
        span_prefix : str, optional
            Prefix to prepend to all span names. Defaults to 'graphiti'.
        """
        if not OTEL_AVAILABLE:
            raise ImportError(
                'OpenTelemetry is not installed. Install it with: pip install opentelemetry-api'
            )
        self._tracer = tracer
        self._span_prefix = span_prefix.rstrip('.')

    @contextmanager
    def start_span(self, name: str) -> Generator[OpenTelemetrySpan | NoOpSpan, None, None]:
        """Start a new OpenTelemetry span with the configured prefix."""
        full_name = f'{self._span_prefix}.{name}'
        # Tracks whether the caller's block has started. A @contextmanager generator
        # must not yield a second time: doing so after an exception is thrown in
        # raises RuntimeError and masks the caller's original error.
        yielded = False
        try:
            with self._tracer.start_as_current_span(full_name) as span:
                yielded = True
                yield OpenTelemetrySpan(span)
        except Exception:
            if yielded:
                # The error came from the traced operation (or span teardown): propagate it
                raise
            # If span creation fails, yield a no-op span to prevent breaking the operation
            yield NoOpSpan()


def create_tracer(otel_tracer: Any | None = None, span_prefix: str = 'graphiti') -> Tracer:
    """
    Create a tracer instance.

    Parameters
    ----------
    otel_tracer : opentelemetry.trace.Tracer | None, optional
        An OpenTelemetry tracer instance. If None, a no-op tracer is returned.
    span_prefix : str, optional
        Prefix to prepend to all span names. Defaults to 'graphiti'.

    Returns
    -------
    Tracer
        A tracer instance (either OpenTelemetryTracer or NoOpTracer).

    Examples
    --------
    Using with OpenTelemetry:

    >>> from opentelemetry import trace
    >>> otel_tracer = trace.get_tracer(__name__)
    >>> tracer = create_tracer(otel_tracer, span_prefix='myapp.graphiti')

    Using no-op tracer:

    >>> tracer = create_tracer()  # Returns NoOpTracer
    """
    if otel_tracer is None:
        return NoOpTracer()

    if not OTEL_AVAILABLE:
        return NoOpTracer()

    return OpenTelemetryTracer(otel_tracer, span_prefix)