**Critical Fix: Contexts vs Ground Truth**
- RAGAS metrics now evaluate actual retrieval performance
- Previously: used ground_truth as contexts (always perfect scores)
- Now: uses retrieved documents from the LightRAG API (real evaluation)

**Changes to generate_rag_response (lines 100-156)** (sketched below):
- Remove the unused 'context' parameter
- Change return type: Dict[str, str] → Dict[str, Any]
- Extract contexts as a list of strings from references[].text
- Return a 'contexts' key (list of strings) instead of the old 'context' key (a single JSON dump)
- Add response.raise_for_status() for better error handling
- Add an httpx.HTTPStatusError exception handler

**Changes to evaluate_responses (lines 180-191)** (second sketch below):
- Line 183: extract retrieved_contexts from rag_response
- Line 190: use [retrieved_contexts] instead of [[ground_truth]]
- Now correctly evaluates retrieval quality, not ground_truth quality

**Impact on RAGAS Metrics**:
- Context Precision: now ranks the actually retrieved docs by relevance
- Context Recall: compares ground_truth against the actual retrieval
- Faithfulness: verifies the answer against the actually retrieved contexts
- Answer Relevance: unchanged (question-answer relevance)

Fixes the incorrect evaluation methodology. Per the RAGAS documentation:
- contexts = documents retrieved by the RAG system
- ground_truth = reference answer, used by the context_recall metric

References:
- https://docs.ragas.io/en/stable/concepts/components/eval_dataset/
- https://docs.ragas.io/en/stable/concepts/metrics/
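A minimal sketch of the corrected `generate_rag_response`, assuming the LightRAG server is queried over HTTP at a `/query` endpoint that returns `response` and `references[].text` fields; the base URL, request payload, and helper shape are illustrative assumptions, not the exact code in this commit:

```python
from typing import Any, Dict, List

import httpx

# Assumed LightRAG API endpoint; adjust host/port to the deployed server.
LIGHTRAG_URL = "http://localhost:9621/query"


def generate_rag_response(question: str) -> Dict[str, Any]:
    """Query the LightRAG API and return the answer plus retrieved contexts."""
    try:
        response = httpx.post(
            LIGHTRAG_URL,
            json={"query": question, "mode": "hybrid"},  # payload shape is an assumption
            timeout=60.0,
        )
        # Surface non-2xx responses instead of silently parsing error bodies.
        response.raise_for_status()
        data = response.json()

        # Contexts must be a list of strings (one per retrieved chunk),
        # taken from references[].text rather than a single JSON dump.
        contexts: List[str] = [
            ref["text"] for ref in data.get("references", []) if ref.get("text")
        ]
        return {"answer": data.get("response", ""), "contexts": contexts}
    except httpx.HTTPStatusError as exc:
        # The server answered with a 4xx/5xx status.
        return {"answer": f"HTTP error: {exc.response.status_code}", "contexts": []}
    except httpx.RequestError as exc:
        # Network-level failure (connection refused, timeout, ...).
        return {"answer": f"Request failed: {exc}", "contexts": []}
```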
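And a sketch of the corresponding `evaluate_responses` change. The single-sample dataset construction and the classic `question`/`answer`/`contexts`/`ground_truth` column names are assumptions about the surrounding script (newer RAGAS releases use `user_input`/`response`/`retrieved_contexts`/`reference`); the relevant fix is passing `[retrieved_contexts]` where `[[ground_truth]]` used to be:

```python
from datasets import Dataset
from ragas import evaluate
from ragas.metrics import (
    answer_relevancy,
    context_precision,
    context_recall,
    faithfulness,
)


def evaluate_responses(question: str, ground_truth: str, rag_response: dict):
    """Score one sample against the contexts that were actually retrieved."""
    # List of retrieved chunk texts produced by generate_rag_response above.
    retrieved_contexts = rag_response.get("contexts", [])

    dataset = Dataset.from_dict(
        {
            "question": [question],
            "answer": [rag_response.get("answer", "")],
            # Was [[ground_truth]]: the context metrics then scored the
            # reference answer against itself and always looked perfect.
            "contexts": [retrieved_contexts],
            "ground_truth": [ground_truth],
        }
    )

    # evaluate() needs an LLM judge configured (e.g. an OpenAI key) at runtime.
    return evaluate(
        dataset,
        metrics=[context_precision, context_recall, faithfulness, answer_relevancy],
    )
```

With this shape, Context Precision and Context Recall score what the retriever actually returned, and Faithfulness checks the generated answer against those same chunks.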