Commit graph

5 commits

Author SHA1 Message Date
anouarbm
026bca00d9 fix: Use actual retrieved contexts for RAGAS evaluation
**Critical Fix: Contexts vs Ground Truth**
- RAGAS metrics now evaluate actual retrieval performance
- Previously: used ground_truth as the contexts, so retrieval scores were always perfect
- Now: uses the documents actually retrieved from the LightRAG API (a real evaluation)

**Changes to generate_rag_response (lines 100-156)**:
- Remove unused 'context' parameter
- Change return type: Dict[str, str] → Dict[str, Any]
- Extract contexts as list of strings from references[].text
- Return a 'contexts' key (list of strings) instead of the JSON-dumped 'context' string
- Add response.raise_for_status() for better error handling
- Add httpx.HTTPStatusError exception handler
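
A minimal sketch of the updated function, assuming a LightRAG query endpoint that returns a JSON body with a response string and a references list (the endpoint path, payload keys, and timeout below are assumptions; only the changes listed above come from this commit):

```python
# Sketch only: endpoint path, payload keys, and timeout are assumptions.
from typing import Any, Dict

import httpx


async def generate_rag_response(query: str, api_url: str) -> Dict[str, Any]:
    """Query the LightRAG API; return the answer plus retrieved contexts."""
    async with httpx.AsyncClient(timeout=60.0) as client:
        try:
            response = await client.post(
                f"{api_url}/query",
                json={"query": query},
            )
            response.raise_for_status()  # surface 4xx/5xx as exceptions
        except httpx.HTTPStatusError as exc:
            return {"answer": "", "contexts": [], "error": str(exc)}
        data = response.json()
    # contexts is a list of strings taken from references[].text,
    # replacing the old JSON-dumped 'context' value.
    contexts = [ref["text"] for ref in data.get("references", [])]
    return {"answer": data.get("response", ""), "contexts": contexts}
```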

**Changes to evaluate_responses (lines 180-191)**:
- Line 183: Extract retrieved_contexts from rag_response
- Line 190: Use [retrieved_contexts] instead of [[ground_truth]]
- Now correctly evaluates retrieval quality, not ground_truth quality
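
The corresponding row assembly might look like the following hypothetical helper (its name and the test-case field names are assumptions; the bracketing mirrors the [retrieved_contexts] vs [[ground_truth]] change above):

```python
from typing import Any, Dict, List


def build_ragas_columns(
    item: Dict[str, str], rag_response: Dict[str, Any]
) -> Dict[str, List[Any]]:
    """Column-oriented data for one test case, as RAGAS expects it."""
    retrieved_contexts: List[str] = rag_response["contexts"]
    return {
        "question": [item["question"]],
        "answer": [rag_response["answer"]],
        "contexts": [retrieved_contexts],      # was [[item["ground_truth"]]]
        "ground_truth": [item["ground_truth"]],
    }
```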

**Impact on RAGAS Metrics**:
- Context Precision: Now ranks actual retrieved docs by relevance
- Context Recall: Compares ground_truth against actual retrieval
- Faithfulness: Verifies answer based on actual retrieved contexts
- Answer Relevance: Unchanged (question-answer relevance)
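
For reference, a hedged sketch of wiring these four metrics through ragas (import paths and the evaluate() signature vary across ragas releases, so treat this as one plausible shape, not the file's actual code):

```python
from datasets import Dataset
from ragas import evaluate
from ragas.metrics import (
    answer_relevancy,
    context_precision,
    context_recall,
    faithfulness,
)

# One illustrative row; real rows come from the LightRAG API responses.
dataset = Dataset.from_dict({
    "question": ["What is LightRAG?"],
    "answer": ["LightRAG is a retrieval-augmented generation framework."],
    "contexts": [["LightRAG combines knowledge graphs with vector retrieval."]],
    "ground_truth": ["LightRAG is a RAG framework built on knowledge graphs."],
})

result = evaluate(
    dataset,
    metrics=[faithfulness, answer_relevancy, context_recall, context_precision],
)
print(result)  # aggregate score per metric
```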

Fixes the incorrect evaluation methodology. Per the RAGAS documentation:
- contexts = retrieved documents from RAG system
- ground_truth = reference answer for context_recall metric

References:
- https://docs.ragas.io/en/stable/concepts/components/eval_dataset/
- https://docs.ragas.io/en/stable/concepts/metrics/
2025-11-02 16:16:00 +01:00
anouarbm
b12b693a81 fix: apply ruff formatting to the CSV path 2025-11-02 11:46:22 +01:00
anouarbm
5cdb4b0ef2 fix: Apply ruff formatting and rename test_dataset to sample_dataset
**Lint Fixes (ruff)**:
- Sort imports alphabetically (I001)
- Add blank line after import traceback (E302)
- Add trailing comma to dict literals (COM812)
- Reformat writer.writerow for readability (E501)
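
An illustrative before/after for that writer.writerow reformat (the real CSV columns and values in eval_rag_quality.py may differ):

```python
import csv

# Illustrative only; column names and values are placeholders.
result = {
    "question": "What is LightRAG?",
    "faithfulness": 0.0,
    "answer_relevance": 0.0,
}

with open("results.csv", "w", newline="") as f:
    writer = csv.writer(f)
    # Before the fix this was a single over-long line (E501); the
    # reformatted call puts one element per line with trailing commas (COM812).
    writer.writerow(
        [
            result["question"],
            result["faithfulness"],
            result["answer_relevance"],
        ],
    )
```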

**Rename test_dataset.json → sample_dataset.json**:
- Avoids .gitignore pattern conflict (test_* is ignored)
- More descriptive name: it's a sample/template, not actual test data
- Updated all references in eval_rag_quality.py and README.md

Resolves lint-and-format CI check failure.
Addresses reviewer feedback about test dataset naming.
2025-11-02 10:36:03 +01:00
anouarbm
aa916f28d2 docs: add generic test_dataset.json for evaluation examples
Test cases with generic examples covering:
- LightRAG framework features and capabilities
- RAG system architecture and components
- Vector database support (ChromaDB, Neo4j, Milvus, etc.)
- LLM provider integrations (OpenAI, Anthropic, Ollama, etc.)
- RAG evaluation metrics explanation
- Deployment options (Docker, FastAPI, direct integration)
- Knowledge graph-based retrieval concepts

Changes:
- Added generic test_dataset.json with 8 LightRAG-focused test cases
- File added with git add -f to override test_* pattern

This provides realistic, reusable examples for users testing their
LightRAG deployments and helps demonstrate the evaluation framework.
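
For orientation, a hypothetical sketch of the file's entry shape (field names are assumed to match the evaluator's question/ground_truth columns; the example text is invented):

```python
import json

# Hypothetical entry shape; see test_dataset.json for the real cases.
example_case = {
    "question": "Which vector databases does LightRAG support?",
    "ground_truth": "LightRAG supports backends such as ChromaDB, Neo4j, and Milvus.",
}

with open("lightrag/evaluation/test_dataset.json") as f:
    test_cases = json.load(f)  # expected: a list of such dicts
```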
2025-11-01 22:27:26 +01:00
anouarbm
1ad0bf82f9 feat: add RAGAS evaluation framework for RAG quality assessment
This contribution adds a comprehensive evaluation system using the RAGAS
framework to assess LightRAG's retrieval and generation quality.

Features:
- RAGEvaluator class with four key metrics:
  * Faithfulness: Answer accuracy against the retrieved context
  * Answer Relevance: Query-response alignment
  * Context Recall: Retrieval completeness
  * Context Precision: Retrieved context quality
- HTTP API integration for live system testing
- JSON and CSV report generation
- Configurable test datasets
- Complete documentation with examples
- Sample test dataset included

Changes:
- Added lightrag/evaluation/eval_rag_quality.py (RAGAS evaluator implementation)
- Added lightrag/evaluation/README.md (comprehensive documentation)
- Added lightrag/evaluation/__init__.py (package initialization)
- Updated pyproject.toml with optional 'evaluation' dependencies
- Updated .gitignore to exclude evaluation results directory

Installation:
pip install "lightrag-hku[evaluation]"
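
After installation, a hypothetical end-to-end run; only the RAGEvaluator and evaluate_responses names appear in these commits, so the constructor arguments, method binding, and dataset path below are assumptions (the documented interface is in lightrag/evaluation/README.md):

```python
import asyncio

# Hypothetical usage; constructor and method signatures are assumed.
from lightrag.evaluation import RAGEvaluator


async def main() -> None:
    evaluator = RAGEvaluator(api_url="http://localhost:9621")  # assumed ctor
    results = await evaluator.evaluate_responses("test_dataset.json")
    print(results)  # scores are also written to the JSON/CSV reports


asyncio.run(main())
```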

Dependencies:
- ragas>=0.3.7
- datasets>=4.3.0
- httpx>=0.28.1
- pytest>=8.4.2
- pytest-asyncio>=1.2.0
2025-11-01 21:36:39 +01:00