Add separate endpoint configuration for LLM and embeddings in evaluation

- Split LLM and embedding API configs
- Add fallback chain for API keys
- Update docs with usage examples
yangdx 2025-11-05 18:54:38 +08:00
parent 994a82dc7f
commit 9c05706062
3 changed files with 132 additions and 26 deletions


@@ -399,14 +399,24 @@ MEMGRAPH_DATABASE=memgraph
 ### Evaluation Configuration
 ############################
 ### RAGAS evaluation models (used for RAG quality assessment)
+### ⚠️ IMPORTANT: Both LLM and Embedding endpoints MUST be OpenAI-compatible
 ### Default uses OpenAI models for evaluation
+
+### LLM Configuration for Evaluation
 # EVAL_LLM_MODEL=gpt-4o-mini
-# EVAL_EMBEDDING_MODEL=text-embedding-3-large
-### API key for evaluation (fallback to OPENAI_API_KEY if not set)
+### API key for LLM evaluation (fallback to OPENAI_API_KEY if not set)
 # EVAL_LLM_BINDING_API_KEY=your_api_key
-### Custom endpoint for evaluation models (optional, for OpenAI-compatible services)
+### Custom OpenAI-compatible endpoint for LLM evaluation (optional)
 # EVAL_LLM_BINDING_HOST=https://api.openai.com/v1
+
+### Embedding Configuration for Evaluation
+# EVAL_EMBEDDING_MODEL=text-embedding-3-large
+### API key for embeddings (fallback: EVAL_LLM_BINDING_API_KEY -> OPENAI_API_KEY)
+# EVAL_EMBEDDING_BINDING_API_KEY=your_embedding_api_key
+### Custom OpenAI-compatible endpoint for embeddings (fallback: EVAL_LLM_BINDING_HOST)
+# EVAL_EMBEDDING_BINDING_HOST=https://api.openai.com/v1
+
+### Performance Tuning
 ### Number of concurrent test case evaluations
 ### Lower values reduce API rate limit issues but increase evaluation time
 # EVAL_MAX_CONCURRENT=2
@@ -415,4 +425,4 @@ MEMGRAPH_DATABASE=memgraph
 # EVAL_QUERY_TOP_K=10
 ### LLM request retry and timeout settings for evaluation
 # EVAL_LLM_MAX_RETRIES=5
-# EVAL_LLM_TIMEOUT=120
+# EVAL_LLM_TIMEOUT=180
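The comments above encode two fallback chains. A minimal Python sketch of how the evaluator resolves them (the variable names are the real ones from this commit; the snippet itself is illustrative):

```python
import os

# Fallback chains added in this commit:
#   API key: EVAL_EMBEDDING_BINDING_API_KEY -> EVAL_LLM_BINDING_API_KEY -> OPENAI_API_KEY
#   Host:    EVAL_EMBEDDING_BINDING_HOST -> EVAL_LLM_BINDING_HOST -> None (OpenAI official)
embedding_api_key = (
    os.getenv("EVAL_EMBEDDING_BINDING_API_KEY")
    or os.getenv("EVAL_LLM_BINDING_API_KEY")
    or os.getenv("OPENAI_API_KEY")
)
embedding_base_url = os.getenv("EVAL_EMBEDDING_BINDING_HOST") or os.getenv(
    "EVAL_LLM_BINDING_HOST"
)  # None means the official OpenAI endpoint is used
```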


@@ -147,12 +147,22 @@ python lightrag/evaluation/eval_rag_quality.py --help
 The evaluation framework supports customization through environment variables:
 
+**⚠️ IMPORTANT: Both LLM and Embedding endpoints MUST be OpenAI-compatible**
+
+- The RAGAS framework requires OpenAI-compatible API interfaces
+- Custom endpoints must implement the OpenAI API format (e.g., vLLM, SGLang, LocalAI)
+- Non-compatible endpoints will cause evaluation failures
+
 | Variable | Default | Description |
 |----------|---------|-------------|
+| **LLM Configuration** | | |
 | `EVAL_LLM_MODEL` | `gpt-4o-mini` | LLM model used for RAGAS evaluation |
+| `EVAL_LLM_BINDING_API_KEY` | falls back to `OPENAI_API_KEY` | API key for LLM evaluation |
+| `EVAL_LLM_BINDING_HOST` | (optional) | Custom OpenAI-compatible endpoint URL for LLM |
+| **Embedding Configuration** | | |
 | `EVAL_EMBEDDING_MODEL` | `text-embedding-3-large` | Embedding model for evaluation |
-| `EVAL_LLM_BINDING_API_KEY` | falls back to `OPENAI_API_KEY` | API key for evaluation models |
-| `EVAL_LLM_BINDING_HOST` | (optional) | Custom endpoint URL for OpenAI-compatible services |
+| `EVAL_EMBEDDING_BINDING_API_KEY` | falls back to `EVAL_LLM_BINDING_API_KEY` → `OPENAI_API_KEY` | API key for embeddings |
+| `EVAL_EMBEDDING_BINDING_HOST` | falls back to `EVAL_LLM_BINDING_HOST` | Custom OpenAI-compatible endpoint URL for embeddings |
+| **Performance Tuning** | | |
 | `EVAL_MAX_CONCURRENT` | 2 | Number of concurrent test case evaluations (1=serial) |
 | `EVAL_QUERY_TOP_K` | 10 | Number of documents to retrieve per query |
 | `EVAL_LLM_MAX_RETRIES` | 5 | Maximum LLM request retries |
@@ -160,13 +170,14 @@ The evaluation framework supports customization through environment variables:
 ### Usage Examples
 
-**Default Configuration (OpenAI):**
+**Example 1: Default Configuration (OpenAI Official API)**
 ```bash
 export OPENAI_API_KEY=sk-xxx
 python lightrag/evaluation/eval_rag_quality.py
 ```
 
-**Custom Model:**
+Both LLM and embeddings use OpenAI's official API with default models.
+
+**Example 2: Custom Models on OpenAI**
 ```bash
 export OPENAI_API_KEY=sk-xxx
 export EVAL_LLM_MODEL=gpt-4o-mini
@@ -174,11 +185,60 @@ export EVAL_EMBEDDING_MODEL=text-embedding-3-large
 python lightrag/evaluation/eval_rag_quality.py
 ```
 
-**OpenAI-Compatible Endpoint:**
+**Example 3: Same Custom OpenAI-Compatible Endpoint for Both**
 ```bash
+# Both LLM and embeddings use the same custom endpoint
 export EVAL_LLM_BINDING_API_KEY=your-custom-key
-export EVAL_LLM_BINDING_HOST=https://api.openai.com/v1
+export EVAL_LLM_BINDING_HOST=http://localhost:8000/v1
 export EVAL_LLM_MODEL=qwen-plus
+export EVAL_EMBEDDING_MODEL=BAAI/bge-m3
 python lightrag/evaluation/eval_rag_quality.py
 ```
+
+Embeddings automatically inherit the LLM endpoint configuration.
+
+**Example 4: Separate Endpoints (Cost Optimization)**
+```bash
+# Use OpenAI for LLM (high quality)
+export EVAL_LLM_BINDING_API_KEY=sk-openai-key
+export EVAL_LLM_MODEL=gpt-4o-mini
+# No EVAL_LLM_BINDING_HOST means use OpenAI official API
+
+# Use local vLLM for embeddings (cost-effective)
+export EVAL_EMBEDDING_BINDING_API_KEY=local-key
+export EVAL_EMBEDDING_BINDING_HOST=http://localhost:8001/v1
+export EVAL_EMBEDDING_MODEL=BAAI/bge-m3
+
+python lightrag/evaluation/eval_rag_quality.py
+```
+
+LLM uses the OpenAI official API; embeddings use a local custom endpoint.
+
+**Example 5: Different Custom Endpoints for LLM and Embeddings**
+```bash
+# LLM on one OpenAI-compatible server
+export EVAL_LLM_BINDING_API_KEY=key1
+export EVAL_LLM_BINDING_HOST=http://llm-server:8000/v1
+export EVAL_LLM_MODEL=custom-llm
+
+# Embeddings on another OpenAI-compatible server
+export EVAL_EMBEDDING_BINDING_API_KEY=key2
+export EVAL_EMBEDDING_BINDING_HOST=http://embedding-server:8001/v1
+export EVAL_EMBEDDING_MODEL=custom-embedding
+
+python lightrag/evaluation/eval_rag_quality.py
+```
+
+Both use different custom OpenAI-compatible endpoints.
+
+**Example 6: Using Environment Variables from .env File**
+```bash
+# Create .env file in project root
+cat > .env << EOF
+EVAL_LLM_BINDING_API_KEY=your-key
+EVAL_LLM_BINDING_HOST=http://localhost:8000/v1
+EVAL_LLM_MODEL=qwen-plus
+EVAL_EMBEDDING_MODEL=BAAI/bge-m3
+EOF
+
+# Run evaluation (automatically loads .env)
 python lightrag/evaluation/eval_rag_quality.py
 ```
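All of the custom endpoints in Examples 3–5 must speak the OpenAI wire format, so a quick smoke test before a long evaluation run can catch incompatible servers early. A sketch using the official openai Python client; the endpoint, key, and model are placeholders borrowed from Example 4, not prescribed values:

```python
from openai import OpenAI  # pip install openai

# Placeholder endpoint, key, and model from Example 4; substitute your own.
client = OpenAI(base_url="http://localhost:8001/v1", api_key="local-key")

# If this returns a vector, the server implements the OpenAI embeddings API
# and should work as EVAL_EMBEDDING_BINDING_HOST.
resp = client.embeddings.create(model="BAAI/bge-m3", input="smoke test")
print(len(resp.data[0].embedding))
```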


@@ -127,8 +127,10 @@ class RAGEvaluator:
     Environment Variables:
         EVAL_LLM_MODEL: LLM model for evaluation (default: gpt-4o-mini)
         EVAL_EMBEDDING_MODEL: Embedding model for evaluation (default: text-embedding-3-small)
-        EVAL_LLM_BINDING_API_KEY: API key for evaluation models (fallback to OPENAI_API_KEY)
-        EVAL_LLM_BINDING_HOST: Custom endpoint URL for evaluation models (optional)
+        EVAL_LLM_BINDING_API_KEY: API key for LLM (fallback to OPENAI_API_KEY)
+        EVAL_LLM_BINDING_HOST: Custom endpoint URL for LLM (optional)
+        EVAL_EMBEDDING_BINDING_API_KEY: API key for embeddings (fallback: EVAL_LLM_BINDING_API_KEY -> OPENAI_API_KEY)
+        EVAL_EMBEDDING_BINDING_HOST: Custom endpoint URL for embeddings (fallback: EVAL_LLM_BINDING_HOST)
 
     Raises:
         ImportError: If ragas or datasets packages are not installed
@@ -141,11 +143,11 @@ class RAGEvaluator:
                 "Install with: pip install ragas datasets"
             )
 
-        # Configure evaluation models (for RAGAS scoring)
-        eval_api_key = os.getenv("EVAL_LLM_BINDING_API_KEY") or os.getenv(
+        # Configure evaluation LLM (for RAGAS scoring)
+        eval_llm_api_key = os.getenv("EVAL_LLM_BINDING_API_KEY") or os.getenv(
             "OPENAI_API_KEY"
         )
-        if not eval_api_key:
+        if not eval_llm_api_key:
             raise EnvironmentError(
                 "EVAL_LLM_BINDING_API_KEY or OPENAI_API_KEY is required for evaluation. "
                 "Set EVAL_LLM_BINDING_API_KEY to use a custom API key, "
@@ -153,23 +155,40 @@ class RAGEvaluator:
             )
 
         eval_model = os.getenv("EVAL_LLM_MODEL", "gpt-4o-mini")
+        eval_llm_base_url = os.getenv("EVAL_LLM_BINDING_HOST")
+
+        # Configure evaluation embeddings (for RAGAS scoring)
+        # Fallback chain: EVAL_EMBEDDING_BINDING_API_KEY -> EVAL_LLM_BINDING_API_KEY -> OPENAI_API_KEY
+        eval_embedding_api_key = (
+            os.getenv("EVAL_EMBEDDING_BINDING_API_KEY")
+            or os.getenv("EVAL_LLM_BINDING_API_KEY")
+            or os.getenv("OPENAI_API_KEY")
+        )
+
         eval_embedding_model = os.getenv(
             "EVAL_EMBEDDING_MODEL", "text-embedding-3-large"
         )
-        eval_base_url = os.getenv("EVAL_LLM_BINDING_HOST")
+        # Fallback chain: EVAL_EMBEDDING_BINDING_HOST -> EVAL_LLM_BINDING_HOST -> None
+        eval_embedding_base_url = os.getenv("EVAL_EMBEDDING_BINDING_HOST") or os.getenv(
+            "EVAL_LLM_BINDING_HOST"
+        )
 
         # Create LLM and Embeddings instances for RAGAS
         llm_kwargs = {
             "model": eval_model,
-            "api_key": eval_api_key,
+            "api_key": eval_llm_api_key,
             "max_retries": int(os.getenv("EVAL_LLM_MAX_RETRIES", "5")),
             "request_timeout": int(os.getenv("EVAL_LLM_TIMEOUT", "180")),
         }
-        embedding_kwargs = {"model": eval_embedding_model, "api_key": eval_api_key}
+        embedding_kwargs = {
+            "model": eval_embedding_model,
+            "api_key": eval_embedding_api_key,
+        }
 
-        if eval_base_url:
-            llm_kwargs["base_url"] = eval_base_url
-            embedding_kwargs["base_url"] = eval_base_url
+        if eval_llm_base_url:
+            llm_kwargs["base_url"] = eval_llm_base_url
+        if eval_embedding_base_url:
+            embedding_kwargs["base_url"] = eval_embedding_base_url
 
         # Create base LangChain LLM
         base_llm = ChatOpenAI(**llm_kwargs)
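To make the new branching concrete, here is an illustrative sketch (not the shipped code) of the kwargs this logic produces under the Example 4 environment from the docs, where only the embeddings get a custom base_url:

```python
import os

# Assumed Example 4 environment (placeholder values from the docs).
os.environ.pop("EVAL_LLM_BINDING_HOST", None)  # LLM stays on OpenAI official
os.environ["EVAL_LLM_BINDING_API_KEY"] = "sk-openai-key"
os.environ["EVAL_EMBEDDING_BINDING_API_KEY"] = "local-key"
os.environ["EVAL_EMBEDDING_BINDING_HOST"] = "http://localhost:8001/v1"

llm_base_url = os.getenv("EVAL_LLM_BINDING_HOST")  # -> None
embedding_base_url = os.getenv("EVAL_EMBEDDING_BINDING_HOST") or os.getenv(
    "EVAL_LLM_BINDING_HOST"
)  # -> "http://localhost:8001/v1"

llm_kwargs = {"model": "gpt-4o-mini", "api_key": os.environ["EVAL_LLM_BINDING_API_KEY"]}
embedding_kwargs = {
    "model": "BAAI/bge-m3",
    "api_key": os.environ["EVAL_EMBEDDING_BINDING_API_KEY"],
}
if llm_base_url:
    llm_kwargs["base_url"] = llm_base_url  # not taken: LLM uses OpenAI official
if embedding_base_url:
    embedding_kwargs["base_url"] = embedding_base_url  # embeddings go local

print(llm_kwargs)        # no 'base_url' key
print(embedding_kwargs)  # includes 'base_url': 'http://localhost:8001/v1'
```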
@@ -209,7 +228,8 @@ class RAGEvaluator:
         # Store configuration values for display
         self.eval_model = eval_model
         self.eval_embedding_model = eval_embedding_model
-        self.eval_base_url = eval_base_url
+        self.eval_llm_base_url = eval_llm_base_url
+        self.eval_embedding_base_url = eval_embedding_base_url
         self.eval_max_retries = llm_kwargs["max_retries"]
         self.eval_timeout = llm_kwargs["request_timeout"]
@@ -221,13 +241,29 @@ class RAGEvaluator:
         logger.info("Evaluation Models:")
         logger.info(" • LLM Model: %s", self.eval_model)
         logger.info(" • Embedding Model: %s", self.eval_embedding_model)
-        if self.eval_base_url:
-            logger.info(" • Custom Endpoint: %s", self.eval_base_url)
+
+        # Display LLM endpoint
+        if self.eval_llm_base_url:
+            logger.info(" • LLM Endpoint: %s", self.eval_llm_base_url)
             logger.info(
-                " • Bypass N-Parameter: Enabled (use LangchainLLMWrapperfor compatibility)"
+                " • Bypass N-Parameter: Enabled (use LangchainLLMWrapper for compatibility)"
             )
         else:
-            logger.info(" • Endpoint: OpenAI Official API")
+            logger.info(" • LLM Endpoint: OpenAI Official API")
+
+        # Display Embedding endpoint (only if different from LLM)
+        if self.eval_embedding_base_url:
+            if self.eval_embedding_base_url != self.eval_llm_base_url:
+                logger.info(
+                    " • Embedding Endpoint: %s", self.eval_embedding_base_url
+                )
+            # If same as LLM endpoint, no need to display separately
+        elif not self.eval_llm_base_url:
+            # Both using OpenAI - already displayed above
+            pass
+        else:
+            # LLM uses custom endpoint, but embeddings use OpenAI
+            logger.info(" • Embedding Endpoint: OpenAI Official API")
 
         logger.info("Concurrency & Rate Limiting:")
         query_top_k = int(os.getenv("EVAL_QUERY_TOP_K", "10"))
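The embedding-endpoint display above is a three-way decision. A compact sketch of the same logic as a standalone function (a hypothetical helper for illustration, not part of the commit):

```python
from typing import Optional

def embedding_endpoint_log_line(
    llm_base_url: Optional[str], embedding_base_url: Optional[str]
) -> Optional[str]:
    """Return the embedding-endpoint log line, or None when it would be redundant."""
    if embedding_base_url:
        if embedding_base_url != llm_base_url:
            return f" • Embedding Endpoint: {embedding_base_url}"
        return None  # same as the LLM endpoint, already shown
    if not llm_base_url:
        return None  # both on OpenAI official, already shown
    return " • Embedding Endpoint: OpenAI Official API"  # LLM custom, embeddings OpenAI

# Spot checks of the three cases:
assert embedding_endpoint_log_line(None, None) is None
assert embedding_endpoint_log_line("http://llm:8000/v1", None) == (
    " • Embedding Endpoint: OpenAI Official API"
)
assert embedding_endpoint_log_line("http://x:8000/v1", "http://x:8000/v1") is None
```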