cherry-pick 9c057060
parent 89f8048df5
commit 56b8806256
3 changed files with 980 additions and 242 deletions

env.example (66 lines changed)
@@ -50,6 +50,8 @@ OLLAMA_EMULATING_MODEL_TAG=latest
 # JWT_ALGORITHM=HS256
 
 ### API-Key to access LightRAG Server API
+### Use this key in HTTP requests with the 'X-API-Key' header
+### Example: curl -H "X-API-Key: your-secure-api-key-here" http://localhost:9621/query
 # LIGHTRAG_API_KEY=your-secure-api-key-here
 # WHITELIST_PATHS=/health,/api/*
 
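The new comments say the API key travels in an `X-API-Key` header, as in the curl example. Below is a minimal standard-library sketch of the same request. The `build_query_request` helper is hypothetical. The `{"query": ...}` JSON body is an assumption about the endpoint's schema; this commit does not specify it.

```python
import json
import urllib.request
from typing import Optional


def build_query_request(base_url: str, question: str, api_key: Optional[str] = None) -> urllib.request.Request:
    """Build (but do not send) a POST request for the LightRAG /query endpoint.

    Assumption: the JSON body carries the question in a "query" field; confirm
    the exact request schema against your server's API documentation.
    """
    body = json.dumps({"query": question}).encode("utf-8")
    headers = {"Content-Type": "application/json"}
    if api_key:
        # Same header as the curl example: -H "X-API-Key: <key>"
        headers["X-API-Key"] = api_key
    url = base_url.rstrip("/") + "/query"
    return urllib.request.Request(url, data=body, headers=headers, method="POST")


if __name__ == "__main__":
    import os

    req = build_query_request("http://localhost:9621", "What is LightRAG?", os.environ.get("LIGHTRAG_API_KEY"))
    # urllib.request.urlopen(req, timeout=60) would actually send it.
    print(req.get_method(), req.full_url, req.header_items())
```

The header is attached only when a key is configured, mirroring the commented-out `LIGHTRAG_API_KEY` default.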
@@ -73,16 +75,6 @@ ENABLE_LLM_CACHE=true
 # MAX_RELATION_TOKENS=8000
 ### control the maximum tokens send to LLM (include entities, relations and chunks)
 # MAX_TOTAL_TOKENS=30000
-### control the maximum chunk_ids stored in vector and graph db
-# MAX_SOURCE_IDS_PER_ENTITY=300
-# MAX_SOURCE_IDS_PER_RELATION=300
-### control chunk_ids limitation method: KEEP, FIFO (KEPP: Ingore New Chunks, FIFO: New chunks replace old chunks)
-# SOURCE_IDS_LIMIT_METHOD=KEEP
-
-### maximum number of related chunks per source entity or relation
-### The chunk picker uses this value to determine the total number of chunks selected from KG(knowledge graph)
-### Higher values increase re-ranking time
-# RELATED_CHUNK_NUMBER=5
 
 ### chunk selection strategies
 ### VECTOR: Pick KG chunks by vector similarity, delivered chunks to the LLM aligning more closely with naive retrieval
@@ -110,9 +102,6 @@ RERANK_BINDING=null
 # RERANK_MODEL=rerank-v3.5
 # RERANK_BINDING_HOST=https://api.cohere.com/v2/rerank
 # RERANK_BINDING_API_KEY=your_rerank_api_key_here
-### Cohere rerank chunking configuration (useful for models with token limits like ColBERT)
-# RERANK_ENABLE_CHUNKING=true
-# RERANK_MAX_TOKENS_PER_DOC=480
 
 ### Default value for Jina AI
 # RERANK_MODEL=jina-reranker-v2-base-multilingual
@@ -132,6 +121,9 @@ ENABLE_LLM_CACHE_FOR_EXTRACT=true
 ### Document processing output language: English, Chinese, French, German ...
 SUMMARY_LANGUAGE=English
 
+### PDF decryption password for protected PDF files
+# PDF_DECRYPT_PASSWORD=your_pdf_password_here
+
 ### Entity types that the LLM will attempt to recognize
 # ENTITY_TYPES='["Person", "Creature", "Organization", "Location", "Event", "Concept", "Method", "Content", "Data", "Artifact", "NaturalObject"]'
 
@@ -148,6 +140,22 @@ SUMMARY_LANGUAGE=English
 ### Maximum context size sent to LLM for description summary
 # SUMMARY_CONTEXT_SIZE=12000
 
+### control the maximum chunk_ids stored in vector and graph db
+# MAX_SOURCE_IDS_PER_ENTITY=300
+# MAX_SOURCE_IDS_PER_RELATION=300
+### control chunk_ids limitation method: FIFO, KEEP
+### FIFO: First in first out
+### KEEP: Keep oldest (less merge action and faster)
+# SOURCE_IDS_LIMIT_METHOD=FIFO
+
+# Maximum number of file paths stored in entity/relation file_path field (For displayed only, does not affect query performance)
+# MAX_FILE_PATHS=100
+
+### maximum number of related chunks per source entity or relation
+### The chunk picker uses this value to determine the total number of chunks selected from KG(knowledge graph)
+### Higher values increase re-ranking time
+# RELATED_CHUNK_NUMBER=5
+
 ###############################
 ### Concurrency Configuration
 ###############################
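The relocated comments describe two policies for capping the chunk IDs stored per entity or relation. Under KEEP, the oldest IDs are retained and new ones are ignored. Under FIFO, the oldest IDs are evicted so new ones fit. The function below is a hypothetical illustration of those documented semantics, not LightRAG's merge code, and the name `limit_source_ids` is invented for this sketch.

```python
from typing import List


def limit_source_ids(existing: List[str], new: List[str], max_ids: int, method: str = "FIFO") -> List[str]:
    """Hypothetical illustration of SOURCE_IDS_LIMIT_METHOD (FIFO or KEEP)."""
    if max_ids < 1:
        raise ValueError("max_ids must be at least 1")
    # Merge in arrival order (oldest first), dropping duplicate IDs.
    merged = list(dict.fromkeys(existing + new))
    if len(merged) <= max_ids:
        return merged
    if method == "KEEP":
        # KEEP: retain the oldest IDs; newly arriving chunk IDs are ignored.
        return merged[:max_ids]
    if method == "FIFO":
        # FIFO: evict the oldest IDs first so the newest ones fit.
        return merged[-max_ids:]
    raise ValueError(f"unknown limit method: {method!r}")
```

With `MAX_SOURCE_IDS_PER_ENTITY=300`, KEEP leaves a full list untouched when new chunks arrive. This is consistent with the comment describing it as the option with less merge action.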
@@ -386,3 +394,35 @@ MEMGRAPH_USERNAME=
 MEMGRAPH_PASSWORD=
 MEMGRAPH_DATABASE=memgraph
 # MEMGRAPH_WORKSPACE=forced_workspace_name
+
+############################
+### Evaluation Configuration
+############################
+### RAGAS evaluation models (used for RAG quality assessment)
+### ⚠️ IMPORTANT: Both LLM and Embedding endpoints MUST be OpenAI-compatible
+### Default uses OpenAI models for evaluation
+
+### LLM Configuration for Evaluation
+# EVAL_LLM_MODEL=gpt-4o-mini
+### API key for LLM evaluation (fallback to OPENAI_API_KEY if not set)
+# EVAL_LLM_BINDING_API_KEY=your_api_key
+### Custom OpenAI-compatible endpoint for LLM evaluation (optional)
+# EVAL_LLM_BINDING_HOST=https://api.openai.com/v1
+
+### Embedding Configuration for Evaluation
+# EVAL_EMBEDDING_MODEL=text-embedding-3-large
+### API key for embeddings (fallback: EVAL_LLM_BINDING_API_KEY -> OPENAI_API_KEY)
+# EVAL_EMBEDDING_BINDING_API_KEY=your_embedding_api_key
+### Custom OpenAI-compatible endpoint for embeddings (fallback: EVAL_LLM_BINDING_HOST)
+# EVAL_EMBEDDING_BINDING_HOST=https://api.openai.com/v1
+
+### Performance Tuning
+### Number of concurrent test case evaluations
+### Lower values reduce API rate limit issues but increase evaluation time
+# EVAL_MAX_CONCURRENT=2
+### TOP_K query parameter of LightRAG (default: 10)
+### Number of entities or relations retrieved from KG
+# EVAL_QUERY_TOP_K=10
+### LLM request retry and timeout settings for evaluation
+# EVAL_LLM_MAX_RETRIES=5
+# EVAL_LLM_TIMEOUT=180
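The new `EVAL_*` comments in env.example define several defaults and fallback chains. The embedding key falls back to the LLM key and then to `OPENAI_API_KEY`, and the embedding host falls back to the LLM host. The sketch below writes that documented precedence out as code. It is not the evaluator's actual implementation: `EvalConfig`, `resolve_eval_config` and `_first_set` are hypothetical names. Treating an empty string as unset is an assumption.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


def _first_set(env: Mapping[str, str], *names: str) -> Optional[str]:
    """Return the value of the first variable in `names` that is set and non-empty."""
    for name in names:
        value = env.get(name)
        if value:  # assumption: an empty string counts as "not set"
            return value
    return None


@dataclass(frozen=True)
class EvalConfig:
    llm_model: str
    llm_api_key: Optional[str]
    llm_host: Optional[str]  # None means the OpenAI official API
    embedding_model: str
    embedding_api_key: Optional[str]
    embedding_host: Optional[str]
    max_concurrent: int
    query_top_k: int
    llm_max_retries: int
    llm_timeout: int


def resolve_eval_config(env: Optional[Mapping[str, str]] = None) -> EvalConfig:
    """Apply the defaults and fallback chains documented for the EVAL_* variables."""
    env = os.environ if env is None else env
    return EvalConfig(
        llm_model=_first_set(env, "EVAL_LLM_MODEL") or "gpt-4o-mini",
        llm_api_key=_first_set(env, "EVAL_LLM_BINDING_API_KEY", "OPENAI_API_KEY"),
        llm_host=_first_set(env, "EVAL_LLM_BINDING_HOST"),
        embedding_model=_first_set(env, "EVAL_EMBEDDING_MODEL") or "text-embedding-3-large",
        # EVAL_EMBEDDING_BINDING_API_KEY -> EVAL_LLM_BINDING_API_KEY -> OPENAI_API_KEY
        embedding_api_key=_first_set(
            env, "EVAL_EMBEDDING_BINDING_API_KEY", "EVAL_LLM_BINDING_API_KEY", "OPENAI_API_KEY"
        ),
        # EVAL_EMBEDDING_BINDING_HOST -> EVAL_LLM_BINDING_HOST
        embedding_host=_first_set(env, "EVAL_EMBEDDING_BINDING_HOST", "EVAL_LLM_BINDING_HOST"),
        max_concurrent=int(_first_set(env, "EVAL_MAX_CONCURRENT") or 2),
        query_top_k=int(_first_set(env, "EVAL_QUERY_TOP_K") or 10),
        llm_max_retries=int(_first_set(env, "EVAL_LLM_MAX_RETRIES") or 5),
        llm_timeout=int(_first_set(env, "EVAL_LLM_TIMEOUT") or 180),
    )
```

With only `OPENAI_API_KEY` set, both keys resolve to it and both hosts stay `None`. Setting `EVAL_LLM_BINDING_HOST` alone routes embeddings to the same endpoint.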
@@ -1,12 +1,8 @@
-# 📊 LightRAG Evaluation Framework
+# 📊 RAGAS-based Evaluation Framework
 
-RAGAS-based offline evaluation of your LightRAG system.
-
 ## What is RAGAS?
 
-**RAGAS** (Retrieval Augmented Generation Assessment) is a framework for reference-free evaluation of RAG systems using LLMs.
+**RAGAS** (Retrieval Augmented Generation Assessment) is a framework for reference-free evaluation of RAG systems using LLMs. RAGAS uses state-of-the-art evaluation metrics:
 
-Instead of requiring human-annotated ground truth, RAGAS uses state-of-the-art evaluation metrics:
-
 ### Core Metrics
 
@@ -18,9 +14,7 @@ Instead of requiring human-annotated ground truth, RAGAS uses state-of-the-art e
 | **Context Precision** | Is retrieved context clean without irrelevant noise? | > 0.80 |
 | **RAGAS Score** | Overall quality metric (average of above) | > 0.80 |
 
----
+### 📁 LightRAG Evalua'tion Framework Directory Structure
 
-## 📁 Structure
-
 ```
 lightrag/evaluation/
@@ -42,7 +36,7 @@ lightrag/evaluation/
 
 **Quick Test:** Index files from `sample_documents/` into LightRAG, then run the evaluator to reproduce results (~89-100% RAGAS score per question).
 
----
+
 
 ## 🚀 Quick Start
 
@@ -55,20 +49,35 @@ pip install ragas datasets langfuse
 Or use your project dependencies (already included in pyproject.toml):
 
 ```bash
-pip install -e ".[offline-llm]"
+pip install -e ".[evaluation]"
 ```
 
 ### 2. Run Evaluation
 
+**Basic usage (uses defaults):**
 ```bash
 cd /path/to/LightRAG
-python -m lightrag.evaluation.eval_rag_quality
+python lightrag/evaluation/eval_rag_quality.py
 ```
 
-Or directly:
-
+**Specify custom dataset:**
 ```bash
-python lightrag/evaluation/eval_rag_quality.py
+python lightrag/evaluation/eval_rag_quality.py --dataset my_test.json
+```
+
+**Specify custom RAG endpoint:**
+```bash
+python lightrag/evaluation/eval_rag_quality.py --ragendpoint http://my-server.com:9621
+```
+
+**Specify both (short form):**
+```bash
+python lightrag/evaluation/eval_rag_quality.py -d my_test.json -r http://localhost:9621
+```
+
+**Get help:**
+```bash
+python lightrag/evaluation/eval_rag_quality.py --help
 ```
 
 ### 3. View Results
@@ -87,7 +96,179 @@ results/
 - 📋 Individual test case results
 - 📈 Performance breakdown by question
 
----
+
 
+## 📋 Command-Line Arguments
+
+The evaluation script supports command-line arguments for easy configuration:
+
+| Argument | Short | Default | Description |
+|----------|-------|---------|-------------|
+| `--dataset` | `-d` | `sample_dataset.json` | Path to test dataset JSON file |
+| `--ragendpoint` | `-r` | `http://localhost:9621` or `$LIGHTRAG_API_URL` | LightRAG API endpoint URL |
+
+### Usage Examples
+
+**Use default dataset and endpoint:**
+```bash
+python lightrag/evaluation/eval_rag_quality.py
+```
+
+**Custom dataset with default endpoint:**
+```bash
+python lightrag/evaluation/eval_rag_quality.py --dataset path/to/my_dataset.json
+```
+
+**Default dataset with custom endpoint:**
+```bash
+python lightrag/evaluation/eval_rag_quality.py --ragendpoint http://my-server.com:9621
+```
+
+**Custom dataset and endpoint:**
+```bash
+python lightrag/evaluation/eval_rag_quality.py -d my_dataset.json -r http://localhost:9621
+```
+
+**Absolute path to dataset:**
+```bash
+python lightrag/evaluation/eval_rag_quality.py -d /path/to/custom_dataset.json
+```
+
+**Show help message:**
+```bash
+python lightrag/evaluation/eval_rag_quality.py --help
+```
+
+
+
+## ⚙️ Configuration
+
+### Environment Variables
+
+The evaluation framework supports customization through environment variables:
+
+**⚠️ IMPORTANT: Both LLM and Embedding endpoints MUST be OpenAI-compatible**
+- The RAGAS framework requires OpenAI-compatible API interfaces
+- Custom endpoints must implement the OpenAI API format (e.g., vLLM, SGLang, LocalAI)
+- Non-compatible endpoints will cause evaluation failures
+
+| Variable | Default | Description |
+|----------|---------|-------------|
+| **LLM Configuration** | | |
+| `EVAL_LLM_MODEL` | `gpt-4o-mini` | LLM model used for RAGAS evaluation |
+| `EVAL_LLM_BINDING_API_KEY` | falls back to `OPENAI_API_KEY` | API key for LLM evaluation |
+| `EVAL_LLM_BINDING_HOST` | (optional) | Custom OpenAI-compatible endpoint URL for LLM |
+| **Embedding Configuration** | | |
+| `EVAL_EMBEDDING_MODEL` | `text-embedding-3-large` | Embedding model for evaluation |
+| `EVAL_EMBEDDING_BINDING_API_KEY` | falls back to `EVAL_LLM_BINDING_API_KEY` → `OPENAI_API_KEY` | API key for embeddings |
+| `EVAL_EMBEDDING_BINDING_HOST` | falls back to `EVAL_LLM_BINDING_HOST` | Custom OpenAI-compatible endpoint URL for embeddings |
+| **Performance Tuning** | | |
+| `EVAL_MAX_CONCURRENT` | 2 | Number of concurrent test case evaluations (1=serial) |
+| `EVAL_QUERY_TOP_K` | 10 | Number of documents to retrieve per query |
+| `EVAL_LLM_MAX_RETRIES` | 5 | Maximum LLM request retries |
+| `EVAL_LLM_TIMEOUT` | 180 | LLM request timeout in seconds |
+
+### Usage Examples
+
+**Example 1: Default Configuration (OpenAI Official API)**
+```bash
+export OPENAI_API_KEY=sk-xxx
+python lightrag/evaluation/eval_rag_quality.py
+```
+Both LLM and embeddings use OpenAI's official API with default models.
+
+**Example 2: Custom Models on OpenAI**
+```bash
+export OPENAI_API_KEY=sk-xxx
+export EVAL_LLM_MODEL=gpt-4o-mini
+export EVAL_EMBEDDING_MODEL=text-embedding-3-large
+python lightrag/evaluation/eval_rag_quality.py
+```
+
+**Example 3: Same Custom OpenAI-Compatible Endpoint for Both**
+```bash
+# Both LLM and embeddings use the same custom endpoint
+export EVAL_LLM_BINDING_API_KEY=your-custom-key
+export EVAL_LLM_BINDING_HOST=http://localhost:8000/v1
+export EVAL_LLM_MODEL=qwen-plus
+export EVAL_EMBEDDING_MODEL=BAAI/bge-m3
+python lightrag/evaluation/eval_rag_quality.py
+```
+Embeddings automatically inherit LLM endpoint configuration.
+
+**Example 4: Separate Endpoints (Cost Optimization)**
+```bash
+# Use OpenAI for LLM (high quality)
+export EVAL_LLM_BINDING_API_KEY=sk-openai-key
+export EVAL_LLM_MODEL=gpt-4o-mini
+# No EVAL_LLM_BINDING_HOST means use OpenAI official API
+
+# Use local vLLM for embeddings (cost-effective)
+export EVAL_EMBEDDING_BINDING_API_KEY=local-key
+export EVAL_EMBEDDING_BINDING_HOST=http://localhost:8001/v1
+export EVAL_EMBEDDING_MODEL=BAAI/bge-m3
+
+python lightrag/evaluation/eval_rag_quality.py
+```
+LLM uses OpenAI official API, embeddings use local custom endpoint.
+
+**Example 5: Different Custom Endpoints for LLM and Embeddings**
+```bash
+# LLM on one OpenAI-compatible server
+export EVAL_LLM_BINDING_API_KEY=key1
+export EVAL_LLM_BINDING_HOST=http://llm-server:8000/v1
+export EVAL_LLM_MODEL=custom-llm
+
+# Embeddings on another OpenAI-compatible server
+export EVAL_EMBEDDING_BINDING_API_KEY=key2
+export EVAL_EMBEDDING_BINDING_HOST=http://embedding-server:8001/v1
+export EVAL_EMBEDDING_MODEL=custom-embedding
+
+python lightrag/evaluation/eval_rag_quality.py
+```
+Both use different custom OpenAI-compatible endpoints.
+
+**Example 6: Using Environment Variables from .env File**
+```bash
+# Create .env file in project root
+cat > .env << EOF
+EVAL_LLM_BINDING_API_KEY=your-key
+EVAL_LLM_BINDING_HOST=http://localhost:8000/v1
+EVAL_LLM_MODEL=qwen-plus
+EVAL_EMBEDDING_MODEL=BAAI/bge-m3
+EOF
+
+# Run evaluation (automatically loads .env)
+python lightrag/evaluation/eval_rag_quality.py
+```
+
+### Concurrency Control & Rate Limiting
+
+The evaluation framework includes built-in concurrency control to prevent API rate limiting issues:
+
+**Why Concurrency Control Matters:**
+- RAGAS internally makes many concurrent LLM calls for each test case
+- Context Precision metric calls LLM once per retrieved document
+- Without control, this can easily exceed API rate limits
+
+**Default Configuration (Conservative):**
+```bash
+EVAL_MAX_CONCURRENT=2 # Serial evaluation (one test at a time)
+EVAL_QUERY_TOP_K=10 # OP_K query parameter of LightRAG
+EVAL_LLM_MAX_RETRIES=5 # Retry failed requests 5 times
+EVAL_LLM_TIMEOUT=180 # 3-minute timeout per request
+```
+
+**Common Issues and Solutions:**
+
+| Issue | Solution |
+|-------|----------|
+| **Warning: "LM returned 1 generations instead of 3"** | Reduce `EVAL_MAX_CONCURRENT` to 1 or decrease `EVAL_QUERY_TOP_K` |
+| **Context Precision returns NaN** | Lower `EVAL_QUERY_TOP_K` to reduce LLM calls per test case |
+| **Rate limit errors (429)** | Increase `EVAL_LLM_MAX_RETRIES` and decrease `EVAL_MAX_CONCURRENT` |
+| **Request timeouts** | Increase `EVAL_LLM_TIMEOUT` to 180 or higher |
+
+
+
 ## 📝 Test Dataset
 
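The new command-line argument table maps directly onto `argparse`. Below is a sketch of an equivalent parser. It assumes that a set `LIGHTRAG_API_URL` takes precedence over the built-in `http://localhost:9621`, since the table lists both without stating the order. `build_parser` is a hypothetical name, and the real script may resolve the default dataset path differently (for example, relative to its own directory).

```python
import argparse
import os
from typing import Mapping, Optional


def build_parser(env: Optional[Mapping[str, str]] = None) -> argparse.ArgumentParser:
    """Parser mirroring the documented --dataset/-d and --ragendpoint/-r options."""
    env = os.environ if env is None else env
    parser = argparse.ArgumentParser(description="Evaluate a running LightRAG server with RAGAS")
    parser.add_argument(
        "-d", "--dataset",
        default="sample_dataset.json",
        help="Path to test dataset JSON file",
    )
    parser.add_argument(
        "-r", "--ragendpoint",
        # Assumption: $LIGHTRAG_API_URL, when set, overrides the built-in default.
        default=env.get("LIGHTRAG_API_URL", "http://localhost:9621"),
        help="LightRAG API endpoint URL",
    )
    return parser
```

Passing the environment in explicitly keeps the parser easy to test. For example, `build_parser().parse_args(["-d", "my_test.json"])` yields `dataset="my_test.json"`.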
@@ -101,7 +282,7 @@ results/
 {
 "question": "Your question here",
 "ground_truth": "Expected answer from your data",
-"context": "topic"
+"project": "evaluation_project_name"
 }
 ]
 }
@@ -166,6 +347,50 @@ results/
 pip install ragas datasets
 ```
 
+### "Warning: LM returned 1 generations instead of requested 3" or Context Precision NaN
+
+**Cause**: This warning indicates API rate limiting or concurrent request overload:
+- RAGAS makes multiple LLM calls per test case (faithfulness, relevancy, recall, precision)
+- Context Precision calls LLM once per retrieved document (with `EVAL_QUERY_TOP_K=10`, that's 10 calls)
+- Concurrent evaluation multiplies these calls: `EVAL_MAX_CONCURRENT × LLM calls per test`
+
+**Solutions** (in order of effectiveness):
+
+1. **Serial Evaluation** (Default):
+```bash
+export EVAL_MAX_CONCURRENT=1
+python lightrag/evaluation/eval_rag_quality.py
+```
+
+2. **Reduce Retrieved Documents**:
+```bash
+export EVAL_QUERY_TOP_K=5 # Halves Context Precision LLM calls
+python lightrag/evaluation/eval_rag_quality.py
+```
+
+3. **Increase Retry & Timeout**:
+```bash
+export EVAL_LLM_MAX_RETRIES=10
+export EVAL_LLM_TIMEOUT=180
+python lightrag/evaluation/eval_rag_quality.py
+```
+
+4. **Use Higher Quota API** (if available):
+- Upgrade to OpenAI Tier 2+ for higher RPM limits
+- Use self-hosted OpenAI-compatible service with no rate limits
+
+### "AttributeError: 'InstructorLLM' object has no attribute 'agenerate_prompt'" or NaN results
+
+This error occurs with RAGAS 0.3.x when LLM and Embeddings are not explicitly configured. The evaluation framework now handles this automatically by:
+- Using environment variables to configure evaluation models
+- Creating proper LLM and Embeddings instances for RAGAS
+
+**Solution**: Ensure you have set one of the following:
+- `OPENAI_API_KEY` environment variable (default)
+- `EVAL_LLM_BINDING_API_KEY` for custom API key
+
+The framework will automatically configure the evaluation models.
+
 ### "No sample_dataset.json found"
 
 Make sure you're running from the project root:
@@ -175,11 +400,10 @@ cd /path/to/LightRAG
 python lightrag/evaluation/eval_rag_quality.py
 ```
 
-### "LLM API errors during evaluation"
+### "LightRAG query API errors during evaluation"
 
 The evaluation uses your configured LLM (OpenAI by default). Ensure:
 - API keys are set in `.env`
-- Have sufficient API quota
 - Network connection is stable
 
 ### Evaluation requires running LightRAG API
@@ -189,15 +413,74 @@ The evaluator queries a running LightRAG API server at `http://localhost:9621`.
 2. Documents are indexed in your LightRAG instance
 3. API is accessible at the configured URL
 
----
+
 
 ## 📝 Next Steps
 
-1. Index documents into LightRAG (WebUI or API)
-2. Start LightRAG API server
+1. Start LightRAG API server
+2. Upload sample documents into LightRAG throught WebUI
 3. Run `python lightrag/evaluation/eval_rag_quality.py`
 4. Review results (JSON/CSV) in `results/` folder
-5. Adjust entity extraction prompts or retrieval settings based on scores
+
+Evaluation Result Sample:
+
+```
+INFO: ======================================================================
+INFO: 🔍 RAGAS Evaluation - Using Real LightRAG API
+INFO: ======================================================================
+INFO: Evaluation Models:
+INFO: • LLM Model: gpt-4.1
+INFO: • Embedding Model: text-embedding-3-large
+INFO: • Endpoint: OpenAI Official API
+INFO: Concurrency & Rate Limiting:
+INFO: • Query Top-K: 10 Entities/Relations
+INFO: • LLM Max Retries: 5
+INFO: • LLM Timeout: 180 seconds
+INFO: Test Configuration:
+INFO: • Total Test Cases: 6
+INFO: • Test Dataset: sample_dataset.json
+INFO: • LightRAG API: http://localhost:9621
+INFO: • Results Directory: results
+INFO: ======================================================================
+INFO: 🚀 Starting RAGAS Evaluation of LightRAG System
+INFO: 🔧 RAGAS Evaluation (Stage 2): 2 concurrent
+INFO: ======================================================================
+INFO:
+INFO: ===================================================================================================================
+INFO: 📊 EVALUATION RESULTS SUMMARY
+INFO: ===================================================================================================================
+INFO: # | Question | Faith | AnswRel | CtxRec | CtxPrec | RAGAS | Status
+INFO: -------------------------------------------------------------------------------------------------------------------
+INFO: 1 | How does LightRAG solve the hallucination probl... | 1.0000 | 1.0000 | 1.0000 | 1.0000 | 1.0000 | ✓
+INFO: 2 | What are the three main components required in ... | 0.8500 | 0.5790 | 1.0000 | 1.0000 | 0.8573 | ✓
+INFO: 3 | How does LightRAG's retrieval performance compa... | 0.8056 | 1.0000 | 1.0000 | 1.0000 | 0.9514 | ✓
+INFO: 4 | What vector databases does LightRAG support and... | 0.8182 | 0.9807 | 1.0000 | 1.0000 | 0.9497 | ✓
+INFO: 5 | What are the four key metrics for evaluating RA... | 1.0000 | 0.7452 | 1.0000 | 1.0000 | 0.9363 | ✓
+INFO: 6 | What are the core benefits of LightRAG and how ... | 0.9583 | 0.8829 | 1.0000 | 1.0000 | 0.9603 | ✓
+INFO: ===================================================================================================================
+INFO:
+INFO: ======================================================================
+INFO: 📊 EVALUATION COMPLETE
+INFO: ======================================================================
+INFO: Total Tests: 6
+INFO: Successful: 6
+INFO: Failed: 0
+INFO: Success Rate: 100.00%
+INFO: Elapsed Time: 161.10 seconds
+INFO: Avg Time/Test: 26.85 seconds
+INFO:
+INFO: ======================================================================
+INFO: 📈 BENCHMARK RESULTS (Average)
+INFO: ======================================================================
+INFO: Average Faithfulness: 0.9053
+INFO: Average Answer Relevance: 0.8646
+INFO: Average Context Recall: 1.0000
+INFO: Average Context Precision: 1.0000
+INFO: Average RAGAS Score: 0.9425
+INFO: ----------------------------------------------------------------------
+INFO: Min RAGAS Score: 0.8573
+INFO: Max RAGAS Score: 1.0000
+```
 
 ---
 
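The README's metric table defines RAGAS Score as the average of the four core metrics. Recomputing the per-question rows of the sample log with a plain arithmetic mean reproduces the printed RAGAS column and the summary values to within rounding. This sketch checks the documented definition only. It is not the evaluator's code, and `ragas_score` is a hypothetical name.

```python
from statistics import mean


def ragas_score(faithfulness: float, answer_relevance: float,
                context_recall: float, context_precision: float) -> float:
    """RAGAS Score as described in the metric table: the mean of the four core metrics."""
    return mean([faithfulness, answer_relevance, context_recall, context_precision])


# Per-question metrics copied from the sample log (Faith, AnswRel, CtxRec, CtxPrec).
SAMPLE = [
    (1.0000, 1.0000, 1.0000, 1.0000),
    (0.8500, 0.5790, 1.0000, 1.0000),
    (0.8056, 1.0000, 1.0000, 1.0000),
    (0.8182, 0.9807, 1.0000, 1.0000),
    (1.0000, 0.7452, 1.0000, 1.0000),
    (0.9583, 0.8829, 1.0000, 1.0000),
]

scores = [ragas_score(*row) for row in SAMPLE]
for i, score in enumerate(scores, start=1):
    print(f"{i}: {score:.4f}")
print(f"Average RAGAS Score: {mean(scores):.4f}")
print(f"Min/Max RAGAS Score: {min(scores):.4f} / {max(scores):.4f}")
```

Because the log's inputs are already rounded to four decimals, small last-digit differences from the printed values are possible.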
File diff suppressed because it is too large