feat(mcp): add configurable similarity thresholds

Allows users to tune search quality for their domain via environment variables:
- GRAPHITI_MIN_SIMILARITY_SCORE: Controls semantic search filtering (default 0.6)
- GRAPHITI_RERANKER_MIN_SCORE: Controls post-RRF filtering (default 0.0)

Higher thresholds favor precision; lower thresholds favor recall.
For example, technical documentation may benefit from a higher threshold (such as 0.8) to filter noise.
Brandt Weary 2025-09-19 16:20:05 -07:00
parent 3efe085a92
commit d7290df5be
2 changed files with 13 additions and 0 deletions

@@ -101,6 +101,8 @@ The server uses the following environment variables:
- `AZURE_OPENAI_EMBEDDING_API_VERSION`: Optional Azure OpenAI API version
- `AZURE_OPENAI_USE_MANAGED_IDENTITY`: Optional; use Azure Managed Identities for authentication
- `SEMAPHORE_LIMIT`: Episode processing concurrency. See [Concurrency and LLM Provider 429 Rate Limit Errors](#concurrency-and-llm-provider-429-rate-limit-errors)
- `GRAPHITI_MIN_SIMILARITY_SCORE`: Minimum similarity score for semantic search (default: `0.6`, range: 0.0-1.0). Higher values filter more aggressively.
- `GRAPHITI_RERANKER_MIN_SCORE`: Minimum score after RRF fusion (default: `0.0`). Typically left at 0.0 as RRF is rank-based.
You can set these variables in a `.env` file in the project directory.

@@ -864,6 +864,17 @@ async def search_memory_nodes(
search_config = NODE_HYBRID_SEARCH_RRF.model_copy(deep=True)
search_config.limit = max_nodes
# Apply configurable similarity thresholds from environment
# sim_min_score filters during initial semantic search (recommended to tune for your domain)
sim_min_score = float(os.getenv('GRAPHITI_MIN_SIMILARITY_SCORE', '0.6'))
# reranker_min_score filters after RRF fusion (recommended to keep at 0.0: RRF scores are
# derived from rank positions rather than similarity, so a fixed score cutoff is rarely useful)
reranker_min_score = float(os.getenv('GRAPHITI_RERANKER_MIN_SCORE', '0.0'))
if search_config.node_config:
    search_config.node_config.sim_min_score = sim_min_score
search_config.reranker_min_score = reranker_min_score
filters = SearchFilters()
if entity != '':
    filters.node_labels = [entity]
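
The hunk above passes the environment values straight to `float()`, so a malformed setting such as `GRAPHITI_MIN_SIMILARITY_SCORE=high` would raise `ValueError` when a search runs. One possible hardening is a minimal sketch like the one below. The helper name `read_threshold` and its `low`, `high`, and `environ` parameters are hypothetical and not part of this commit. It falls back to the default on unparsable input and clamps the result to the 0.0-1.0 range documented in the README.

```python
import math
import os


def read_threshold(name, default, low=0.0, high=1.0, environ=None):
    """Hypothetical helper (not part of the commit): read a float threshold.

    Returns `default` when the variable is unset, empty, non-numeric, or not
    finite; otherwise returns the parsed value clamped to [low, high].
    """
    env = os.environ if environ is None else environ
    raw = env.get(name, '')
    try:
        value = float(raw)
    except ValueError:
        # Unset (empty string) or non-numeric input: keep the default.
        return default
    if not math.isfinite(value):
        # 'nan' and 'inf' parse as floats but are not usable thresholds.
        return default
    return min(max(value, low), high)


# Same variable name and default as in the commit.
sim_min_score = read_threshold('GRAPHITI_MIN_SIMILARITY_SCORE', 0.6)
```

Whether silently falling back is preferable to failing loudly is a design choice. A deployment that wants misconfiguration to be visible could log a warning in the `except` branch instead.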