Doc: Refactor evaluation README to improve clarity and structure
parent a73314a4ba
commit f490622b72
1 changed file with 13 additions and 21 deletions
````diff
@@ -1,12 +1,8 @@
-# 📊 LightRAG Evaluation Framework
+# 📊 RAGAS-based Evaluation Framework
 
-RAGAS-based offline evaluation of your LightRAG system.
-
 ## What is RAGAS?
 
-**RAGAS** (Retrieval Augmented Generation Assessment) is a framework for reference-free evaluation of RAG systems using LLMs.
+**RAGAS** (Retrieval Augmented Generation Assessment) is a framework for reference-free evaluation of RAG systems using LLMs. RAGAS uses state-of-the-art evaluation metrics:
 
-Instead of requiring human-annotated ground truth, RAGAS uses state-of-the-art evaluation metrics:
-
 ### Core Metrics
 
````
````diff
@@ -18,9 +14,7 @@ Instead of requiring human-annotated ground truth, RAGAS uses state-of-the-art e
 | **Context Precision** | Is retrieved context clean without irrelevant noise? | > 0.80 |
 | **RAGAS Score** | Overall quality metric (average of above) | > 0.80 |
 
----
-
-## 📁 Structure
+### 📁 LightRAG Evaluation Framework Directory Structure
 
 ```
 lightrag/evaluation/
````
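For readers who want to see how the metrics in the table above are produced, a minimal standalone sketch with the `ragas` and `datasets` packages (the same ones this README installs) follows. It is not the repository's `eval_rag_quality.py`; the column names follow the ragas 0.1-style API, the example strings are placeholders, and an `OPENAI_API_KEY` is assumed to be configured.

```python
# Minimal RAGAS scoring sketch (not the repo's eval_rag_quality.py).
# Assumes `pip install ragas datasets` and OPENAI_API_KEY in the environment.
from datasets import Dataset
from ragas import evaluate
from ragas.metrics import faithfulness, answer_relevancy, context_precision, context_recall

rows = {
    "question": ["Placeholder question about your indexed documents"],
    "answer": ["Placeholder answer returned by your RAG system"],
    "contexts": [["Placeholder retrieved chunk 1", "Placeholder retrieved chunk 2"]],
    "ground_truth": ["Placeholder expected answer"],
}

result = evaluate(
    Dataset.from_dict(rows),
    metrics=[faithfulness, answer_relevancy, context_precision, context_recall],
)
print(result)  # per-metric scores; averaging them gives an overall "RAGAS score"
```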
````diff
@@ -42,7 +36,7 @@ lightrag/evaluation/
 
 **Quick Test:** Index files from `sample_documents/` into LightRAG, then run the evaluator to reproduce results (~89-100% RAGAS score per question).
 
----
+
 
 ## 🚀 Quick Start
 
````
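To reproduce the quick test above without the WebUI, one option is to push the sample files to a running API server programmatically. The sketch below assumes the default server address mentioned later in the README, and the `/documents/text` and `/query` endpoint paths and `sample_documents/` location are assumptions; verify them against your server's OpenAPI docs.

```python
# Hedged sketch: index the sample documents into a running LightRAG API server,
# then issue one query as a sanity check before evaluating.
from pathlib import Path
import requests

BASE_URL = "http://localhost:9621"  # default LightRAG API address used by the evaluator

for doc in Path("lightrag/evaluation/sample_documents").glob("*.*"):
    resp = requests.post(
        f"{BASE_URL}/documents/text",              # assumed endpoint; check /docs
        json={"text": doc.read_text(encoding="utf-8")},
        timeout=60,
    )
    resp.raise_for_status()
    print(f"indexed {doc.name}")

answer = requests.post(
    f"{BASE_URL}/query",                           # assumed endpoint; check /docs
    json={"query": "Placeholder question", "mode": "hybrid"},
    timeout=120,
).json()
print(answer)
```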
````diff
@@ -55,7 +49,7 @@ pip install ragas datasets langfuse
 Or use your project dependencies (already included in pyproject.toml):
 
 ```bash
-pip install -e ".[offline-llm]"
+pip install -e ".[evaluation]"
 ```
 
 ### 2. Run Evaluation
````
````diff
@@ -102,7 +96,7 @@ results/
 - 📋 Individual test case results
 - 📈 Performance breakdown by question
 
----
+
 
 ## 📋 Command-Line Arguments
 
````
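Because the diff only hints at the report layout, here is a rough, heavily hedged helper for skimming a run's JSON output. The directory location and field names are assumptions; adapt them to whatever `eval_rag_quality.py` actually writes into `results/`.

```python
# Rough sketch: skim the newest JSON report in results/ (assumed location and layout).
import json
from pathlib import Path

reports = sorted(Path("lightrag/evaluation/results").glob("*.json"))
if not reports:
    raise SystemExit("No JSON reports found; run eval_rag_quality.py first.")

data = json.loads(reports[-1].read_text(encoding="utf-8"))
cases = data if isinstance(data, list) else data.get("results", [])  # assumed structure

for case in cases:
    question = str(case.get("question", ""))[:60]
    scores = {k: round(v, 3) for k, v in case.items() if isinstance(v, (int, float))}
    print(question, scores)
```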
````diff
@@ -145,7 +139,7 @@ python lightrag/evaluation/eval_rag_quality.py -d /path/to/custom_dataset.json
 python lightrag/evaluation/eval_rag_quality.py --help
 ```
 
----
+
 
 ## ⚙️ Configuration
 
````
````diff
@@ -214,7 +208,7 @@ EVAL_LLM_TIMEOUT=180 # 3-minute timeout per request
 | **Rate limit errors (429)** | Increase `EVAL_LLM_MAX_RETRIES` and decrease `EVAL_MAX_CONCURRENT` |
 | **Request timeouts** | Increase `EVAL_LLM_TIMEOUT` to 180 or higher |
 
----
+
 
 ## 📝 Test Dataset
 
````
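The troubleshooting rows above lean on three environment variables. As a sketch only: `EVAL_LLM_TIMEOUT=180` comes from the README itself, while the other two fallback values shown here are placeholders, not the script's real defaults; check `eval_rag_quality.py` for those.

```python
# Hedged sketch: reading the tuning knobs named in the README.
import os

EVAL_LLM_TIMEOUT = int(os.getenv("EVAL_LLM_TIMEOUT", "180"))        # per-request timeout in seconds (180 from README)
EVAL_LLM_MAX_RETRIES = int(os.getenv("EVAL_LLM_MAX_RETRIES", "3"))  # retries on 429/transient errors (placeholder default)
EVAL_MAX_CONCURRENT = int(os.getenv("EVAL_MAX_CONCURRENT", "2"))    # parallel evaluation requests (placeholder default)

print(EVAL_LLM_TIMEOUT, EVAL_LLM_MAX_RETRIES, EVAL_MAX_CONCURRENT)
```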
````diff
@@ -228,7 +222,7 @@ EVAL_LLM_TIMEOUT=180 # 3-minute timeout per request
     {
       "question": "Your question here",
       "ground_truth": "Expected answer from your data",
-      "context": "topic"
+      "project": "evaluation_project_name"
     }
   ]
 }
````
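To build a custom dataset in the shape shown above and pass it with `-d`, a small sketch follows. The top-level key name is an assumption; mirror the bundled dataset file in `lightrag/evaluation/` if yours differs.

```python
# Hedged sketch: write a custom dataset matching the structure in the README diff.
import json
from pathlib import Path

dataset = {
    "questions": [  # assumed top-level key; check the bundled dataset file
        {
            "question": "Your question here",
            "ground_truth": "Expected answer from your data",
            "project": "evaluation_project_name",
        }
    ]
}

Path("custom_dataset.json").write_text(json.dumps(dataset, indent=2), encoding="utf-8")
# Then: python lightrag/evaluation/eval_rag_quality.py -d custom_dataset.json
```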
````diff
@@ -346,11 +340,10 @@ cd /path/to/LightRAG
 python lightrag/evaluation/eval_rag_quality.py
 ```
 
-### "LLM API errors during evaluation"
+### "LightRAG query API errors during evaluation"
 
 The evaluation uses your configured LLM (OpenAI by default). Ensure:
 - API keys are set in `.env`
-- Have sufficient API quota
 - Network connection is stable
 
 ### Evaluation requires running LightRAG API
````
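Before running the evaluator, it can help to confirm the API server is actually reachable at the address the README names. A minimal check is sketched below; the `/health` path is an assumption, so consult the server's `/docs` page for the endpoints your build exposes.

```python
# Hedged sketch: verify the LightRAG API server is up before evaluating.
import sys
import requests

BASE_URL = "http://localhost:9621"  # default address used by the evaluator

try:
    resp = requests.get(f"{BASE_URL}/health", timeout=5)  # assumed endpoint
    resp.raise_for_status()
except requests.RequestException as exc:
    sys.exit(f"LightRAG API not reachable at {BASE_URL}: {exc}")

print("LightRAG API is up; safe to run eval_rag_quality.py")
```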
````diff
@@ -360,15 +353,14 @@ The evaluator queries a running LightRAG API server at `http://localhost:9621`.
 2. Documents are indexed in your LightRAG instance
 3. API is accessible at the configured URL
 
----
+
 
 ## 📝 Next Steps
 
-1. Index sample documents into LightRAG (WebUI or API)
-2. Start LightRAG API server
+1. Start LightRAG API server
+2. Upload sample documents into LightRAG through the WebUI
 3. Run `python lightrag/evaluation/eval_rag_quality.py`
 4. Review results (JSON/CSV) in `results/` folder
-5. Adjust entity extraction prompts or retrieval settings based on scores
 
 Evaluation Result Sample:
 
````