feat: add telemetry with PostHog and update Docker configurations (#633)

* Implement telemetry feature for anonymous usage statistics collection in Graphiti; update Dockerfile CMD format for better signal handling; adjust Neo4j URI and healthcheck in docker-compose.yml; add new dependencies in pyproject.toml and poetry.lock.

* remove duplicated properties

* Update Dockerfile CMD to use JSON array format for improved signal handling

* remove tomllib dep on Python 3.11+ (it is stdlib there)

* Delete server/graph_service/logging_config.py
Author: Daniel Chalef, 2025-06-27 12:23:30 -07:00 (committed by GitHub)
Commit: cb4e187aed (parent a7ca777af5)
GPG key ID: B5690EEEBB952194
10 changed files with 5881 additions and 1 deletion

.github/secret_scanning.yml (new file, vendored, +11)

@@ -0,0 +1,11 @@
# Secret scanning configuration
# This file excludes specific files/directories from secret scanning alerts
paths-ignore:
# PostHog public API key for anonymous telemetry
# This is a public key intended for client-side use and safe to commit
# Key: phc_UG6EcfDbuXz92neb3rMlQFDY0csxgMqRcIPWESqnSmo
- "graphiti_core/telemetry/telemetry.py"
# Example/test directories that may contain dummy credentials
- "tests/**/fixtures/**"

CLAUDE.md (new file, +133)

@@ -0,0 +1,133 @@
# CLAUDE.md
This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.
## Project Overview
Graphiti is a Python framework for building temporally-aware knowledge graphs designed for AI agents. It enables real-time incremental updates to knowledge graphs without batch recomputation, making it suitable for dynamic environments.
Key features:
- Bi-temporal data model with explicit tracking of event occurrence times
- Hybrid retrieval combining semantic embeddings, keyword search (BM25), and graph traversal
- Support for custom entity definitions via Pydantic models
- Integration with Neo4j and FalkorDB as graph storage backends
## Development Commands
### Main Development Commands (run from project root)
```bash
# Install dependencies
uv sync --extra dev
# Format code (ruff import sorting + formatting)
make format
# Lint code (ruff + mypy type checking)
make lint
# Run tests
make test
# Run all checks (format, lint, test)
make check
```
### Server Development (run from server/ directory)
```bash
cd server/
# Install server dependencies
uv sync --extra dev
# Run server in development mode
uvicorn graph_service.main:app --reload
# Format, lint, test server code
make format
make lint
make test
```
### MCP Server Development (run from mcp_server/ directory)
```bash
cd mcp_server/
# Install MCP server dependencies
uv sync
# Run with Docker Compose
docker-compose up
```
## Code Architecture
### Core Library (`graphiti_core/`)
- **Main Entry Point**: `graphiti.py` - Contains the main `Graphiti` class that orchestrates all functionality
- **Graph Storage**: `driver/` - Database drivers for Neo4j and FalkorDB
- **LLM Integration**: `llm_client/` - Clients for OpenAI, Anthropic, Gemini, Groq
- **Embeddings**: `embedder/` - Embedding clients for various providers
- **Graph Elements**: `nodes.py`, `edges.py` - Core graph data structures
- **Search**: `search/` - Hybrid search implementation with configurable strategies
- **Prompts**: `prompts/` - LLM prompts for entity extraction, deduplication, summarization
- **Utilities**: `utils/` - Maintenance operations, bulk processing, datetime handling
### Server (`server/`)
- **FastAPI Service**: `graph_service/main.py` - REST API server
- **Routers**: `routers/` - API endpoints for ingestion and retrieval
- **DTOs**: `dto/` - Data transfer objects for API contracts
### MCP Server (`mcp_server/`)
- **MCP Implementation**: `graphiti_mcp_server.py` - Model Context Protocol server for AI assistants
- **Docker Support**: Containerized deployment with Neo4j
## Testing
- **Unit Tests**: `tests/` - Comprehensive test suite using pytest
- **Integration Tests**: Tests marked with `_int` suffix require database connections
- **Evaluation**: `tests/evals/` - End-to-end evaluation scripts
## Configuration
### Environment Variables
- `OPENAI_API_KEY` - Required for LLM inference and embeddings
- `USE_PARALLEL_RUNTIME` - Optional boolean for Neo4j parallel runtime (enterprise only)
- Provider-specific keys: `ANTHROPIC_API_KEY`, `GOOGLE_API_KEY`, `GROQ_API_KEY`, `VOYAGE_API_KEY`
### Database Setup
- **Neo4j**: Version 5.26+ required, available via Neo4j Desktop
- **FalkorDB**: Version 1.1.2+ as alternative backend
## Development Guidelines
### Code Style
- Use Ruff for formatting and linting (configured in pyproject.toml)
- Line length: 100 characters
- Quote style: single quotes
- Type checking with MyPy is enforced
### Testing Requirements
- Run tests with `make test` or `pytest`
- Integration tests require database connections
- Use `pytest-xdist` for parallel test execution
### LLM Provider Support
The codebase supports multiple LLM providers but works best with services supporting structured output (OpenAI, Gemini). Other providers may cause schema validation issues, especially with smaller models.
### MCP Server Usage Guidelines
When working with the MCP server, follow the patterns established in `mcp_server/cursor_rules.md`:
- Always search for existing knowledge before adding new information
- Use specific entity type filters (`Preference`, `Procedure`, `Requirement`)
- Store new information immediately using `add_memory`
- Follow discovered procedures and respect established preferences


@@ -84,4 +84,4 @@ ENV PORT=8000
EXPOSE $PORT
# Use uv run for execution
CMD ["uv", "run", "uvicorn", "graph_service.main:app", "--host", "0.0.0.0", "--port", "8000"]


@@ -351,6 +351,87 @@ Ensure Ollama is running (`ollama serve`) and that you have pulled the models yo
- [Quick Start](https://help.getzep.com/graphiti/graphiti/quick-start)
- [Building an agent with LangChain's LangGraph and Graphiti](https://help.getzep.com/graphiti/graphiti/lang-graph-agent)
## Telemetry
Graphiti collects anonymous usage statistics to help us understand how the framework is being used and improve it for everyone. We believe transparency is important, so here's exactly what we collect and why.
### What We Collect
When you initialize a Graphiti instance, we collect:
- **Anonymous identifier**: A randomly generated UUID stored locally in `~/.cache/graphiti/telemetry_anon_id`
- **System information**: Operating system, Python version, and system architecture
- **Graphiti version**: The version you're using
- **Configuration choices**:
- LLM provider type (OpenAI, Azure, Anthropic, etc.)
- Database backend (Neo4j, FalkorDB)
- Embedder provider (OpenAI, Azure, Voyage, etc.)
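
Concretely, an initialization event carries roughly this shape (an illustrative sketch based on the description above; field names are assumptions, not the real telemetry schema):

```python
import platform
import sys
import uuid

# Illustrative sketch of the payload described above; field names and the
# exact wire format are assumptions, not the real telemetry schema.
event = {
    'anonymous_id': str(uuid.uuid4()),           # random UUID, cached locally
    'graphiti_version': '0.0.0',                 # placeholder version string
    'os': platform.system(),
    'python_version': sys.version.split()[0],
    'architecture': platform.machine(),
    'llm_provider': 'openai',                    # example configuration choice
    'database_provider': 'neo4j',
    'embedder_provider': 'openai',
}
```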
### What We Don't Collect
We are committed to protecting your privacy. We **never** collect:
- Personal information or identifiers
- API keys or credentials
- Your actual data, queries, or graph content
- IP addresses or hostnames
- File paths or system-specific information
- Any content from your episodes, nodes, or edges
### Why We Collect This Data
This information helps us:
- Understand which configurations are most popular to prioritize support and testing
- Identify which LLM and database providers to focus development efforts on
- Track adoption patterns to guide our roadmap
- Ensure compatibility across different Python versions and operating systems
By sharing this anonymous information, you help us make Graphiti better for everyone in the community.
### View the Telemetry Code
The telemetry implementation [can be found here](graphiti_core/telemetry/telemetry.py).
### How to Disable Telemetry
Telemetry is **opt-out** and can be disabled at any time. To disable telemetry collection:
**Option 1: Environment Variable**
```bash
export GRAPHITI_TELEMETRY_ENABLED=false
```
**Option 2: Set in your shell profile**
```bash
# For bash users (~/.bashrc or ~/.bash_profile)
echo 'export GRAPHITI_TELEMETRY_ENABLED=false' >> ~/.bashrc
# For zsh users (~/.zshrc)
echo 'export GRAPHITI_TELEMETRY_ENABLED=false' >> ~/.zshrc
```
**Option 3: Set for a specific Python session**
```python
import os
os.environ['GRAPHITI_TELEMETRY_ENABLED'] = 'false'
# Then initialize Graphiti as usual
from graphiti_core import Graphiti
graphiti = Graphiti(...)
```
Telemetry is automatically disabled during test runs (when `pytest` is detected).
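
That check can be as simple as looking for pytest among the imported modules (a sketch of the documented behavior; the function name is illustrative):

```python
import sys

def telemetry_active(env_value: str = 'true') -> bool:
    # Sketch of the documented behavior (names are illustrative): telemetry
    # is off whenever pytest has been imported, otherwise it follows the
    # GRAPHITI_TELEMETRY_ENABLED-style value (default: enabled).
    if 'pytest' in sys.modules:
        return False
    return env_value.strip().lower() in ('true', '1', 'yes', 'on')
```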
### Technical Details
- Telemetry uses PostHog for anonymous analytics collection
- All telemetry operations are designed to fail silently - they will never interrupt your application or affect Graphiti functionality
- The anonymous ID is stored locally and is not tied to any personal information
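
The fail-silent guarantee can be implemented by wrapping every telemetry call in a broad exception handler; a minimal sketch of that pattern (illustrative, not the actual Graphiti code):

```python
from typing import Any, Callable

def fail_silently(fn: Callable[..., Any]) -> Callable[..., Any]:
    """Swallow any error raised by a telemetry call so that it can never
    interrupt the host application (illustrative pattern)."""
    def wrapper(*args: Any, **kwargs: Any) -> Any:
        try:
            return fn(*args, **kwargs)
        except Exception:
            return None  # telemetry errors are deliberately dropped
    return wrapper

@fail_silently
def send_event(name: str) -> str:
    # Hypothetical telemetry call that fails when the backend is unavailable
    if not name:
        raise ValueError('telemetry backend unavailable')
    return f'sent:{name}'
```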
## Status and Roadmap
Graphiti is under active development. We aim to maintain API stability while working on:


@@ -51,6 +51,7 @@ from graphiti_core.search.search_utils import (
    get_mentioned_nodes,
    get_relevant_edges,
)
from graphiti_core.telemetry import capture_event
from graphiti_core.utils.bulk_utils import (
    RawEpisode,
    add_nodes_and_edges_bulk,
@@ -186,6 +187,61 @@ class Graphiti:
            cross_encoder=self.cross_encoder,
        )

        # Capture telemetry event
        self._capture_initialization_telemetry()

    def _capture_initialization_telemetry(self):
        """Capture telemetry event for Graphiti initialization."""
        try:
            # Detect provider types from class names
            llm_provider = self._get_provider_type(self.llm_client)
            embedder_provider = self._get_provider_type(self.embedder)
            reranker_provider = self._get_provider_type(self.cross_encoder)
            database_provider = self._get_provider_type(self.driver)

            properties = {
                'llm_provider': llm_provider,
                'embedder_provider': embedder_provider,
                'reranker_provider': reranker_provider,
                'database_provider': database_provider,
            }
            capture_event('graphiti_initialized', properties)
        except Exception:
            # Silently handle telemetry errors
            pass

    def _get_provider_type(self, client) -> str:
        """Get provider type from client class name."""
        if client is None:
            return 'none'

        class_name = client.__class__.__name__.lower()

        # LLM providers
        if 'openai' in class_name:
            return 'openai'
        elif 'azure' in class_name:
            return 'azure'
        elif 'anthropic' in class_name:
            return 'anthropic'
        elif 'crossencoder' in class_name:
            return 'crossencoder'
        elif 'gemini' in class_name:
            return 'gemini'
        elif 'groq' in class_name:
            return 'groq'
        # Database providers
        elif 'neo4j' in class_name:
            return 'neo4j'
        elif 'falkor' in class_name:
            return 'falkordb'
        # Embedder providers
        elif 'voyage' in class_name:
            return 'voyage'
        else:
            return 'unknown'

    async def close(self):
        """
        Close the connection to the Neo4j database.


@@ -0,0 +1,9 @@
"""
Telemetry module for Graphiti.

This module provides anonymous usage analytics to help improve Graphiti.
"""

from .telemetry import capture_event, is_telemetry_enabled

__all__ = ['capture_event', 'is_telemetry_enabled']


@@ -0,0 +1,117 @@
"""
Telemetry client for Graphiti.

Collects anonymous usage statistics to help improve the product.
"""

import contextlib
import os
import platform
import sys
import uuid
from pathlib import Path
from typing import Any

# PostHog configuration
# Note: This is a public API key intended for client-side use and safe to commit
# PostHog public keys are designed to be exposed in client applications
POSTHOG_API_KEY = 'phc_UG6EcfDbuXz92neb3rMlQFDY0csxgMqRcIPWESqnSmo'
POSTHOG_HOST = 'https://us.i.posthog.com'

# Environment variable to control telemetry
TELEMETRY_ENV_VAR = 'GRAPHITI_TELEMETRY_ENABLED'

# Cache directory for anonymous ID
CACHE_DIR = Path.home() / '.cache' / 'graphiti'
ANON_ID_FILE = CACHE_DIR / 'telemetry_anon_id'


def is_telemetry_enabled() -> bool:
    """Check if telemetry is enabled."""
    # Disable during pytest runs
    if 'pytest' in sys.modules:
        return False

    # Check environment variable (default: enabled)
    env_value = os.environ.get(TELEMETRY_ENV_VAR, 'true').lower()
    return env_value in ('true', '1', 'yes', 'on')


def get_anonymous_id() -> str:
    """Get or create anonymous user ID."""
    try:
        # Create cache directory if it doesn't exist
        CACHE_DIR.mkdir(parents=True, exist_ok=True)

        # Try to read existing ID
        if ANON_ID_FILE.exists():
            try:
                return ANON_ID_FILE.read_text().strip()
            except Exception:
                pass

        # Generate new ID
        anon_id = str(uuid.uuid4())

        # Save to file
        with contextlib.suppress(Exception):
            ANON_ID_FILE.write_text(anon_id)

        return anon_id
    except Exception:
        return 'UNKNOWN'


def get_graphiti_version() -> str:
    """Get Graphiti version."""
    try:
        # Try to get version from package metadata
        import importlib.metadata

        return importlib.metadata.version('graphiti-core')
    except Exception:
        return 'unknown'


def initialize_posthog():
    """Initialize PostHog client."""
    try:
        import posthog

        posthog.api_key = POSTHOG_API_KEY
        posthog.host = POSTHOG_HOST
        return posthog
    except ImportError:
        # PostHog not installed, silently disable telemetry
        return None
    except Exception:
        # Any other error, silently disable telemetry
        return None


def capture_event(event_name: str, properties: dict[str, Any] | None = None) -> None:
    """Capture a telemetry event."""
    if not is_telemetry_enabled():
        return

    try:
        posthog_client = initialize_posthog()
        if posthog_client is None:
            return

        # Get anonymous ID
        user_id = get_anonymous_id()

        # Prepare event properties
        event_properties = {
            '$process_person_profile': False,
            'graphiti_version': get_graphiti_version(),
            'architecture': platform.machine(),
            **(properties or {}),
        }

        # Capture the event
        posthog_client.capture(distinct_id=user_id, event=event_name, properties=event_properties)
    except Exception:
        # Silently handle all telemetry errors to avoid disrupting the main application
        pass


@@ -349,6 +349,32 @@ The Graphiti MCP Server container uses the SSE MCP transport. Claude Desktop doe
- OpenAI API key (for LLM operations and embeddings)
- MCP-compatible client
## Telemetry
The Graphiti MCP server uses the Graphiti core library, which includes anonymous telemetry collection. When you initialize the Graphiti MCP server, anonymous usage statistics are collected to help improve the framework.
### What's Collected
- Anonymous identifier and system information (OS, Python version)
- Graphiti version and configuration choices (LLM provider, database backend, embedder type)
- **No personal data, API keys, or actual graph content is ever collected**
### How to Disable
To disable telemetry in the MCP server, set the environment variable:
```bash
export GRAPHITI_TELEMETRY_ENABLED=false
```
Or add it to your `.env` file:
```
GRAPHITI_TELEMETRY_ENABLED=false
```
For complete details about what's collected and why, see the [Telemetry section in the main Graphiti README](../README.md#telemetry).
## License
This project is licensed under the same license as the parent Graphiti project.

poetry.lock (generated, +5446)

File diff suppressed because it is too large.


@@ -18,6 +18,7 @@ dependencies = [
    "tenacity>=9.0.0",
    "numpy>=1.0.0",
    "python-dotenv>=1.0.1",
    "posthog>=3.0.0",
]

[project.urls]