Doc: Refactor evaluation README to improve clarity and structure
parent a73314a4ba
commit f490622b72
1 changed file with 13 additions and 21 deletions
````diff
@@ -1,12 +1,8 @@
-# 📊 LightRAG Evaluation Framework
+# 📊 RAGAS-based Evaluation Framework
 
-RAGAS-based offline evaluation of your LightRAG system.
-
 ## What is RAGAS?
 
-**RAGAS** (Retrieval Augmented Generation Assessment) is a framework for reference-free evaluation of RAG systems using LLMs.
+**RAGAS** (Retrieval Augmented Generation Assessment) is a framework for reference-free evaluation of RAG systems using LLMs. RAGAS uses state-of-the-art evaluation metrics:
 
-Instead of requiring human-annotated ground truth, RAGAS uses state-of-the-art evaluation metrics:
-
 ### Core Metrics
 
````
````diff
@@ -18,9 +14,7 @@ Instead of requiring human-annotated ground truth, RAGAS uses state-of-the-art e
 | **Context Precision** | Is retrieved context clean without irrelevant noise? | > 0.80 |
 | **RAGAS Score** | Overall quality metric (average of above) | > 0.80 |
 
----
-
-## 📁 Structure
+### 📁 LightRAG Evaluation Framework Directory Structure
 
 ```
 lightrag/evaluation/
````
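For readers who want to see how the metrics in the table above are produced, a minimal standalone sketch with the `ragas` and `datasets` packages (the same ones this README installs) follows. It is not the repository's `eval_rag_quality.py`; the column names follow the ragas 0.1-style API, the example strings are placeholders, and an `OPENAI_API_KEY` is assumed to be configured.

```python
# Minimal RAGAS scoring sketch (not the repo's eval_rag_quality.py).
# Assumes `pip install ragas datasets` and OPENAI_API_KEY in the environment.
from datasets import Dataset
from ragas import evaluate
from ragas.metrics import faithfulness, answer_relevancy, context_precision, context_recall

rows = {
    "question": ["Placeholder question about your indexed documents"],
    "answer": ["Placeholder answer returned by your RAG system"],
    "contexts": [["Placeholder retrieved chunk 1", "Placeholder retrieved chunk 2"]],
    "ground_truth": ["Placeholder expected answer"],
}

result = evaluate(
    Dataset.from_dict(rows),
    metrics=[faithfulness, answer_relevancy, context_precision, context_recall],
)
print(result)  # per-metric scores; averaging them gives an overall "RAGAS score"
```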
````diff
@@ -42,7 +36,7 @@ lightrag/evaluation/
 
 **Quick Test:** Index files from `sample_documents/` into LightRAG, then run the evaluator to reproduce results (~89-100% RAGAS score per question).
 
----
+
 
 ## 🚀 Quick Start
 
````
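To reproduce the quick test above without the WebUI, one option is to push the sample files to a running API server programmatically. The sketch below assumes the default server address mentioned later in the README, and the `/documents/text` and `/query` endpoint paths and `sample_documents/` location are assumptions; verify them against your server's OpenAPI docs.

```python
# Hedged sketch: index the sample documents into a running LightRAG API server,
# then issue one query as a sanity check before evaluating.
from pathlib import Path
import requests

BASE_URL = "http://localhost:9621"  # default LightRAG API address used by the evaluator

for doc in Path("lightrag/evaluation/sample_documents").glob("*.*"):
    resp = requests.post(
        f"{BASE_URL}/documents/text",              # assumed endpoint; check /docs
        json={"text": doc.read_text(encoding="utf-8")},
        timeout=60,
    )
    resp.raise_for_status()
    print(f"indexed {doc.name}")

answer = requests.post(
    f"{BASE_URL}/query",                           # assumed endpoint; check /docs
    json={"query": "Placeholder question", "mode": "hybrid"},
    timeout=120,
).json()
print(answer)
```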
````diff
@@ -55,7 +49,7 @@ pip install ragas datasets langfuse
 Or use your project dependencies (already included in pyproject.toml):
 
 ```bash
-pip install -e ".[offline-llm]"
+pip install -e ".[evaluation]"
 ```
 
 ### 2. Run Evaluation
````
````diff
@@ -102,7 +96,7 @@ results/
 - 📋 Individual test case results
 - 📈 Performance breakdown by question
 
----
+
 
 ## 📋 Command-Line Arguments
 
````
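Because the diff only hints at the report layout, here is a rough, heavily hedged helper for skimming a run's JSON output. The directory location and field names are assumptions; adapt them to whatever `eval_rag_quality.py` actually writes into `results/`.

```python
# Rough sketch: skim the newest JSON report in results/ (assumed location and layout).
import json
from pathlib import Path

reports = sorted(Path("lightrag/evaluation/results").glob("*.json"))
if not reports:
    raise SystemExit("No JSON reports found; run eval_rag_quality.py first.")

data = json.loads(reports[-1].read_text(encoding="utf-8"))
cases = data if isinstance(data, list) else data.get("results", [])  # assumed structure

for case in cases:
    question = str(case.get("question", ""))[:60]
    scores = {k: round(v, 3) for k, v in case.items() if isinstance(v, (int, float))}
    print(question, scores)
```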
````diff
@@ -145,7 +139,7 @@ python lightrag/evaluation/eval_rag_quality.py -d /path/to/custom_dataset.json
 python lightrag/evaluation/eval_rag_quality.py --help
 ```
 
----
+
 
 ## ⚙️ Configuration
 
````
````diff
@@ -214,7 +208,7 @@ EVAL_LLM_TIMEOUT=180 # 3-minute timeout per request
 | **Rate limit errors (429)** | Increase `EVAL_LLM_MAX_RETRIES` and decrease `EVAL_MAX_CONCURRENT` |
 | **Request timeouts** | Increase `EVAL_LLM_TIMEOUT` to 180 or higher |
 
----
+
 
 ## 📝 Test Dataset
 
````
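The troubleshooting rows above lean on three environment variables. As a sketch only: `EVAL_LLM_TIMEOUT=180` comes from the README itself, while the other two fallback values shown here are placeholders, not the script's real defaults; check `eval_rag_quality.py` for those.

```python
# Hedged sketch: reading the tuning knobs named in the README.
import os

EVAL_LLM_TIMEOUT = int(os.getenv("EVAL_LLM_TIMEOUT", "180"))        # per-request timeout in seconds (180 from README)
EVAL_LLM_MAX_RETRIES = int(os.getenv("EVAL_LLM_MAX_RETRIES", "3"))  # retries on 429/transient errors (placeholder default)
EVAL_MAX_CONCURRENT = int(os.getenv("EVAL_MAX_CONCURRENT", "2"))    # parallel evaluation requests (placeholder default)

print(EVAL_LLM_TIMEOUT, EVAL_LLM_MAX_RETRIES, EVAL_MAX_CONCURRENT)
```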
````diff
@@ -228,7 +222,7 @@ EVAL_LLM_TIMEOUT=180 # 3-minute timeout per request
     {
       "question": "Your question here",
       "ground_truth": "Expected answer from your data",
-      "context": "topic"
+      "project": "evaluation_project_name"
     }
   ]
 }
````
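To build a custom dataset in the shape shown above and pass it with `-d`, a small sketch follows. The top-level key name is an assumption; mirror the bundled dataset file in `lightrag/evaluation/` if yours differs.

```python
# Hedged sketch: write a custom dataset matching the structure in the README diff.
import json
from pathlib import Path

dataset = {
    "questions": [  # assumed top-level key; check the bundled dataset file
        {
            "question": "Your question here",
            "ground_truth": "Expected answer from your data",
            "project": "evaluation_project_name",
        }
    ]
}

Path("custom_dataset.json").write_text(json.dumps(dataset, indent=2), encoding="utf-8")
# Then: python lightrag/evaluation/eval_rag_quality.py -d custom_dataset.json
```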
````diff
@@ -346,11 +340,10 @@ cd /path/to/LightRAG
 python lightrag/evaluation/eval_rag_quality.py
 ```
 
-### "LLM API errors during evaluation"
+### "LightRAG query API errors during evaluation"
 
 The evaluation uses your configured LLM (OpenAI by default). Ensure:
 - API keys are set in `.env`
-- Have sufficient API quota
 - Network connection is stable
 
 ### Evaluation requires running LightRAG API
````
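Before running the evaluator, it can help to confirm the API server is actually reachable at the address the README names. A minimal check is sketched below; the `/health` path is an assumption, so consult the server's `/docs` page for the endpoints your build exposes.

```python
# Hedged sketch: verify the LightRAG API server is up before evaluating.
import sys
import requests

BASE_URL = "http://localhost:9621"  # default address used by the evaluator

try:
    resp = requests.get(f"{BASE_URL}/health", timeout=5)  # assumed endpoint
    resp.raise_for_status()
except requests.RequestException as exc:
    sys.exit(f"LightRAG API not reachable at {BASE_URL}: {exc}")

print("LightRAG API is up; safe to run eval_rag_quality.py")
```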
````diff
@@ -360,15 +353,14 @@ The evaluator queries a running LightRAG API server at `http://localhost:9621`.
 2. Documents are indexed in your LightRAG instance
 3. API is accessible at the configured URL
 
----
+
 
 ## 📝 Next Steps
 
-1. Index sample documents into LightRAG (WebUI or API)
-2. Start LightRAG API server
+1. Start LightRAG API server
+2. Upload sample documents into LightRAG through the WebUI
 3. Run `python lightrag/evaluation/eval_rag_quality.py`
 4. Review results (JSON/CSV) in `results/` folder
-5. Adjust entity extraction prompts or retrieval settings based on scores
 
 Evaluation Result Sample:
 
````