feat: add telemetry with PostHog and update Docker configurations (#633)

* Implement telemetry feature for anonymous usage statistics collection in Graphiti; update Dockerfile CMD format for better signal handling; adjust Neo4j URI and healthcheck in docker-compose.yml; add new dependencies in pyproject.toml and poetry.lock.

* remove duplicated properties

* Update Dockerfile CMD to use JSON array format for improved signal handling

* remove tomllib dep on Python 3.11+ (it is stdlib there)

* Delete server/graph_service/logging_config.py
Author: Daniel Chalef, 2025-06-27 12:23:30 -07:00 (committed by GitHub)
Commit: cb4e187aed (parent a7ca777af5)
GPG key ID: B5690EEEBB952194
10 changed files with 5881 additions and 1 deletion

.github/secret_scanning.yml (new file, vendored, +11)

@@ -0,0 +1,11 @@
# Secret scanning configuration
# This file excludes specific files/directories from secret scanning alerts
paths-ignore:
# PostHog public API key for anonymous telemetry
# This is a public key intended for client-side use and safe to commit
# Key: phc_UG6EcfDbuXz92neb3rMlQFDY0csxgMqRcIPWESqnSmo
- "graphiti_core/telemetry/telemetry.py"
# Example/test directories that may contain dummy credentials
- "tests/**/fixtures/**"

CLAUDE.md (new file, +133)

@@ -0,0 +1,133 @@
# CLAUDE.md
This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.
## Project Overview
Graphiti is a Python framework for building temporally-aware knowledge graphs designed for AI agents. It enables real-time incremental updates to knowledge graphs without batch recomputation, making it suitable for dynamic environments.
Key features:
- Bi-temporal data model with explicit tracking of event occurrence times
- Hybrid retrieval combining semantic embeddings, keyword search (BM25), and graph traversal
- Support for custom entity definitions via Pydantic models
- Integration with Neo4j and FalkorDB as graph storage backends
## Development Commands
### Main Development Commands (run from project root)
```bash
# Install dependencies
uv sync --extra dev
# Format code (ruff import sorting + formatting)
make format
# Lint code (ruff + mypy type checking)
make lint
# Run tests
make test
# Run all checks (format, lint, test)
make check
```
### Server Development (run from server/ directory)
```bash
cd server/
# Install server dependencies
uv sync --extra dev
# Run server in development mode
uvicorn graph_service.main:app --reload
# Format, lint, test server code
make format
make lint
make test
```
### MCP Server Development (run from mcp_server/ directory)
```bash
cd mcp_server/
# Install MCP server dependencies
uv sync
# Run with Docker Compose
docker-compose up
```
## Code Architecture
### Core Library (`graphiti_core/`)
- **Main Entry Point**: `graphiti.py` - Contains the main `Graphiti` class that orchestrates all functionality
- **Graph Storage**: `driver/` - Database drivers for Neo4j and FalkorDB
- **LLM Integration**: `llm_client/` - Clients for OpenAI, Anthropic, Gemini, Groq
- **Embeddings**: `embedder/` - Embedding clients for various providers
- **Graph Elements**: `nodes.py`, `edges.py` - Core graph data structures
- **Search**: `search/` - Hybrid search implementation with configurable strategies
- **Prompts**: `prompts/` - LLM prompts for entity extraction, deduplication, summarization
- **Utilities**: `utils/` - Maintenance operations, bulk processing, datetime handling
### Server (`server/`)
- **FastAPI Service**: `graph_service/main.py` - REST API server
- **Routers**: `routers/` - API endpoints for ingestion and retrieval
- **DTOs**: `dto/` - Data transfer objects for API contracts
### MCP Server (`mcp_server/`)
- **MCP Implementation**: `graphiti_mcp_server.py` - Model Context Protocol server for AI assistants
- **Docker Support**: Containerized deployment with Neo4j
## Testing
- **Unit Tests**: `tests/` - Comprehensive test suite using pytest
- **Integration Tests**: Tests marked with `_int` suffix require database connections
- **Evaluation**: `tests/evals/` - End-to-end evaluation scripts
## Configuration
### Environment Variables
- `OPENAI_API_KEY` - Required for LLM inference and embeddings
- `USE_PARALLEL_RUNTIME` - Optional boolean for Neo4j parallel runtime (enterprise only)
- Provider-specific keys: `ANTHROPIC_API_KEY`, `GOOGLE_API_KEY`, `GROQ_API_KEY`, `VOYAGE_API_KEY`
### Database Setup
- **Neo4j**: Version 5.26+ required, available via Neo4j Desktop
- **FalkorDB**: Version 1.1.2+ as alternative backend
## Development Guidelines
### Code Style
- Use Ruff for formatting and linting (configured in pyproject.toml)
- Line length: 100 characters
- Quote style: single quotes
- Type checking with MyPy is enforced
### Testing Requirements
- Run tests with `make test` or `pytest`
- Integration tests require database connections
- Use `pytest-xdist` for parallel test execution
### LLM Provider Support
The codebase supports multiple LLM providers but works best with services supporting structured output (OpenAI, Gemini). Other providers may cause schema validation issues, especially with smaller models.
### MCP Server Usage Guidelines
When working with the MCP server, follow the patterns established in `mcp_server/cursor_rules.md`:
- Always search for existing knowledge before adding new information
- Use specific entity type filters (`Preference`, `Procedure`, `Requirement`)
- Store new information immediately using `add_memory`
- Follow discovered procedures and respect established preferences


@@ -84,4 +84,4 @@ ENV PORT=8000
EXPOSE $PORT
# Use uv run for execution
CMD ["uv", "run", "uvicorn", "graph_service.main:app", "--host", "0.0.0.0", "--port", "8000"]


@@ -351,6 +351,87 @@ Ensure Ollama is running (`ollama serve`) and that you have pulled the models yo
- [Quick Start](https://help.getzep.com/graphiti/graphiti/quick-start)
- [Building an agent with LangChain's LangGraph and Graphiti](https://help.getzep.com/graphiti/graphiti/lang-graph-agent)
## Telemetry
Graphiti collects anonymous usage statistics to help us understand how the framework is being used and improve it for everyone. We believe transparency is important, so here's exactly what we collect and why.
### What We Collect
When you initialize a Graphiti instance, we collect:
- **Anonymous identifier**: A randomly generated UUID stored locally in `~/.cache/graphiti/telemetry_anon_id`
- **System information**: Operating system, Python version, and system architecture
- **Graphiti version**: The version you're using
- **Configuration choices**:
- LLM provider type (OpenAI, Azure, Anthropic, etc.)
- Database backend (Neo4j, FalkorDB)
- Embedder provider (OpenAI, Azure, Voyage, etc.)
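
Concretely, an initialization event carries roughly this shape (an illustrative sketch based on the description above; field names are assumptions, not the real telemetry schema):

```python
import platform
import sys
import uuid

# Illustrative sketch of the payload described above; field names and the
# exact wire format are assumptions, not the real telemetry schema.
event = {
    'anonymous_id': str(uuid.uuid4()),           # random UUID, cached locally
    'graphiti_version': '0.0.0',                 # placeholder version string
    'os': platform.system(),
    'python_version': sys.version.split()[0],
    'architecture': platform.machine(),
    'llm_provider': 'openai',                    # example configuration choice
    'database_provider': 'neo4j',
    'embedder_provider': 'openai',
}
```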
### What We Don't Collect
We are committed to protecting your privacy. We **never** collect:
- Personal information or identifiers
- API keys or credentials
- Your actual data, queries, or graph content
- IP addresses or hostnames
- File paths or system-specific information
- Any content from your episodes, nodes, or edges
### Why We Collect This Data
This information helps us:
- Understand which configurations are most popular to prioritize support and testing
- Identify which LLM and database providers to focus development efforts on
- Track adoption patterns to guide our roadmap
- Ensure compatibility across different Python versions and operating systems
By sharing this anonymous information, you help us make Graphiti better for everyone in the community.
### View the Telemetry Code
The telemetry implementation [can be found here](graphiti_core/telemetry/telemetry.py).
### How to Disable Telemetry
Telemetry is **opt-out** and can be disabled at any time. To disable telemetry collection:
**Option 1: Environment Variable**
```bash
export GRAPHITI_TELEMETRY_ENABLED=false
```
**Option 2: Set in your shell profile**
```bash
# For bash users (~/.bashrc or ~/.bash_profile)
echo 'export GRAPHITI_TELEMETRY_ENABLED=false' >> ~/.bashrc
# For zsh users (~/.zshrc)
echo 'export GRAPHITI_TELEMETRY_ENABLED=false' >> ~/.zshrc
```
**Option 3: Set for a specific Python session**
```python
import os
os.environ['GRAPHITI_TELEMETRY_ENABLED'] = 'false'
# Then initialize Graphiti as usual
from graphiti_core import Graphiti
graphiti = Graphiti(...)
```
Telemetry is automatically disabled during test runs (when `pytest` is detected).
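
That check can be as simple as looking for pytest among the imported modules (a sketch of the documented behavior; the function name is illustrative):

```python
import sys

def telemetry_active(env_value: str = 'true') -> bool:
    # Sketch of the documented behavior (names are illustrative): telemetry
    # is off whenever pytest has been imported, otherwise it follows the
    # GRAPHITI_TELEMETRY_ENABLED-style value (default: enabled).
    if 'pytest' in sys.modules:
        return False
    return env_value.strip().lower() in ('true', '1', 'yes', 'on')
```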
### Technical Details
- Telemetry uses PostHog for anonymous analytics collection
- All telemetry operations are designed to fail silently - they will never interrupt your application or affect Graphiti functionality
- The anonymous ID is stored locally and is not tied to any personal information
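
The fail-silent guarantee can be implemented by wrapping every telemetry call in a broad exception handler; a minimal sketch of that pattern (illustrative, not the actual Graphiti code):

```python
from typing import Any, Callable

def fail_silently(fn: Callable[..., Any]) -> Callable[..., Any]:
    """Swallow any error raised by a telemetry call so that it can never
    interrupt the host application (illustrative pattern)."""
    def wrapper(*args: Any, **kwargs: Any) -> Any:
        try:
            return fn(*args, **kwargs)
        except Exception:
            return None  # telemetry errors are deliberately dropped
    return wrapper

@fail_silently
def send_event(name: str) -> str:
    # Hypothetical telemetry call that fails when the backend is unavailable
    if not name:
        raise ValueError('telemetry backend unavailable')
    return f'sent:{name}'
```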
## Status and Roadmap
Graphiti is under active development. We aim to maintain API stability while working on:


@@ -51,6 +51,7 @@ from graphiti_core.search.search_utils import (
    get_mentioned_nodes,
    get_relevant_edges,
)
from graphiti_core.telemetry import capture_event
from graphiti_core.utils.bulk_utils import (
    RawEpisode,
    add_nodes_and_edges_bulk,
@@ -186,6 +187,61 @@ class Graphiti:
            cross_encoder=self.cross_encoder,
        )

        # Capture telemetry event
        self._capture_initialization_telemetry()

    def _capture_initialization_telemetry(self):
        """Capture telemetry event for Graphiti initialization."""
        try:
            # Detect provider types from class names
            llm_provider = self._get_provider_type(self.llm_client)
            embedder_provider = self._get_provider_type(self.embedder)
            reranker_provider = self._get_provider_type(self.cross_encoder)
            database_provider = self._get_provider_type(self.driver)

            properties = {
                'llm_provider': llm_provider,
                'embedder_provider': embedder_provider,
                'reranker_provider': reranker_provider,
                'database_provider': database_provider,
            }
            capture_event('graphiti_initialized', properties)
        except Exception:
            # Silently handle telemetry errors
            pass

    def _get_provider_type(self, client) -> str:
        """Get provider type from client class name."""
        if client is None:
            return 'none'

        class_name = client.__class__.__name__.lower()

        # LLM providers
        if 'openai' in class_name:
            return 'openai'
        elif 'azure' in class_name:
            return 'azure'
        elif 'anthropic' in class_name:
            return 'anthropic'
        elif 'crossencoder' in class_name:
            return 'crossencoder'
        elif 'gemini' in class_name:
            return 'gemini'
        elif 'groq' in class_name:
            return 'groq'
        # Database providers
        elif 'neo4j' in class_name:
            return 'neo4j'
        elif 'falkor' in class_name:
            return 'falkordb'
        # Embedder providers
        elif 'voyage' in class_name:
            return 'voyage'
        else:
            return 'unknown'

    async def close(self):
        """
        Close the connection to the Neo4j database.


@@ -0,0 +1,9 @@
"""
Telemetry module for Graphiti.

This module provides anonymous usage analytics to help improve Graphiti.
"""

from .telemetry import capture_event, is_telemetry_enabled

__all__ = ['capture_event', 'is_telemetry_enabled']


@@ -0,0 +1,117 @@
"""
Telemetry client for Graphiti.

Collects anonymous usage statistics to help improve the product.
"""

import contextlib
import os
import platform
import sys
import uuid
from pathlib import Path
from typing import Any

# PostHog configuration
# Note: This is a public API key intended for client-side use and safe to commit
# PostHog public keys are designed to be exposed in client applications
POSTHOG_API_KEY = 'phc_UG6EcfDbuXz92neb3rMlQFDY0csxgMqRcIPWESqnSmo'
POSTHOG_HOST = 'https://us.i.posthog.com'

# Environment variable to control telemetry
TELEMETRY_ENV_VAR = 'GRAPHITI_TELEMETRY_ENABLED'

# Cache directory for anonymous ID
CACHE_DIR = Path.home() / '.cache' / 'graphiti'
ANON_ID_FILE = CACHE_DIR / 'telemetry_anon_id'


def is_telemetry_enabled() -> bool:
    """Check if telemetry is enabled."""
    # Disable during pytest runs
    if 'pytest' in sys.modules:
        return False

    # Check environment variable (default: enabled)
    env_value = os.environ.get(TELEMETRY_ENV_VAR, 'true').lower()
    return env_value in ('true', '1', 'yes', 'on')


def get_anonymous_id() -> str:
    """Get or create anonymous user ID."""
    try:
        # Create cache directory if it doesn't exist
        CACHE_DIR.mkdir(parents=True, exist_ok=True)

        # Try to read existing ID
        if ANON_ID_FILE.exists():
            try:
                return ANON_ID_FILE.read_text().strip()
            except Exception:
                pass

        # Generate new ID
        anon_id = str(uuid.uuid4())

        # Save to file
        with contextlib.suppress(Exception):
            ANON_ID_FILE.write_text(anon_id)

        return anon_id
    except Exception:
        return 'UNKNOWN'


def get_graphiti_version() -> str:
    """Get Graphiti version."""
    try:
        # Try to get version from package metadata
        import importlib.metadata

        return importlib.metadata.version('graphiti-core')
    except Exception:
        return 'unknown'


def initialize_posthog():
    """Initialize PostHog client."""
    try:
        import posthog

        posthog.api_key = POSTHOG_API_KEY
        posthog.host = POSTHOG_HOST
        return posthog
    except ImportError:
        # PostHog not installed, silently disable telemetry
        return None
    except Exception:
        # Any other error, silently disable telemetry
        return None


def capture_event(event_name: str, properties: dict[str, Any] | None = None) -> None:
    """Capture a telemetry event."""
    if not is_telemetry_enabled():
        return

    try:
        posthog_client = initialize_posthog()
        if posthog_client is None:
            return

        # Get anonymous ID
        user_id = get_anonymous_id()

        # Prepare event properties
        event_properties = {
            '$process_person_profile': False,
            'graphiti_version': get_graphiti_version(),
            'architecture': platform.machine(),
            **(properties or {}),
        }

        # Capture the event
        posthog_client.capture(distinct_id=user_id, event=event_name, properties=event_properties)
    except Exception:
        # Silently handle all telemetry errors to avoid disrupting the main application
        pass


@@ -349,6 +349,32 @@ The Graphiti MCP Server container uses the SSE MCP transport. Claude Desktop doe
- OpenAI API key (for LLM operations and embeddings)
- MCP-compatible client
## Telemetry
The Graphiti MCP server uses the Graphiti core library, which includes anonymous telemetry collection. When you initialize the Graphiti MCP server, anonymous usage statistics are collected to help improve the framework.
### What's Collected
- Anonymous identifier and system information (OS, Python version)
- Graphiti version and configuration choices (LLM provider, database backend, embedder type)
- **No personal data, API keys, or actual graph content is ever collected**
### How to Disable
To disable telemetry in the MCP server, set the environment variable:
```bash
export GRAPHITI_TELEMETRY_ENABLED=false
```
Or add it to your `.env` file:
```
GRAPHITI_TELEMETRY_ENABLED=false
```
For complete details about what's collected and why, see the [Telemetry section in the main Graphiti README](../README.md#telemetry).
## License
This project is licensed under the same license as the parent Graphiti project.

poetry.lock (generated, +5446)

File diff suppressed because it is too large.


@@ -18,6 +18,7 @@ dependencies = [
    "tenacity>=9.0.0",
    "numpy>=1.0.0",
    "python-dotenv>=1.0.1",
    "posthog>=3.0.0",
]

[project.urls]