Commit graph

7 commits

Author SHA1 Message Date
yangdx
41c26a3677 feat: add command-line args to RAG evaluation script
- Add --dataset and --ragendpoint flags
- Support short forms -d and -r
- Update README with usage examples
2025-11-04 21:40:27 +08:00
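The flags named in commit 41c26a3677 could be wired up with `argparse` roughly as follows. This is a minimal sketch, not the script's actual code: the function name `build_parser`, the help strings, the `None` defaults, and the example values are assumptions; only the flag names and their short forms come from the commit message.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Build a parser exposing the two flags named in the commit message."""
    parser = argparse.ArgumentParser(
        description="RAG evaluation script (illustrative sketch)"
    )
    # Long and short forms are registered together; argparse derives the
    # attribute names `dataset` and `ragendpoint` from the long options.
    parser.add_argument("-d", "--dataset", default=None,
                        help="path to the test dataset JSON file")
    parser.add_argument("-r", "--ragendpoint", default=None,
                        help="base URL of the RAG API to evaluate")
    return parser


# Placeholder values, parsed from an explicit list instead of sys.argv.
short_args = build_parser().parse_args(
    ["-d", "my_tests.json", "-r", "http://localhost:8080"]
)
long_args = build_parser().parse_args(
    ["--dataset", "my_tests.json", "--ragendpoint", "http://localhost:8080"]
)
```

Both invocations yield the same namespace, so the short forms are pure aliases of the long ones.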
yangdx
d4b8a229b9 Update RAGAS evaluation to use gpt-4o-mini and improve compatibility
- Change default model to gpt-4o-mini
- Add deprecation warning suppression
- Update docs and comments for LightRAG
- Improve output formatting and timing
2025-11-04 18:50:53 +08:00
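One common way to implement the "deprecation warning suppression" mentioned in commit d4b8a229b9 is the standard `warnings` filter. The sketch below is an assumption about the general technique, not the commit's code; `legacy_call` is a hypothetical stand-in for a library call that emits a `DeprecationWarning`.

```python
import warnings


def legacy_call() -> str:
    """Hypothetical stand-in for a library call that warns about deprecation."""
    warnings.warn("legacy_call is deprecated", DeprecationWarning, stacklevel=2)
    return "ok"


# catch_warnings() restores the previous filter state on exit, so the
# suppression stays scoped to this block; record=True collects any warning
# that is not filtered out.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")  # show everything by default
    # Added afterwards, so it takes precedence over the "always" rule.
    warnings.filterwarnings("ignore", category=DeprecationWarning)
    result = legacy_call()

suppressed = len(caught) == 0
```

Scoping the filter with `catch_warnings()` avoids hiding deprecations from unrelated code; a module-level `filterwarnings` call would apply process-wide.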
yangdx
7abc687742 Add comprehensive configuration and compatibility fixes for RAGAS
- Fix RAGAS LLM wrapper compatibility
- Add concurrency control for rate limits
- Add eval env vars for model config
- Improve error handling and logging
- Update documentation with examples
2025-11-04 14:39:27 +08:00
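The "concurrency control for rate limits" item in commit 7abc687742 can be illustrated with an `asyncio.Semaphore` whose size comes from an environment variable. This is a sketch under stated assumptions: the variable name `EXAMPLE_EVAL_MAX_CONCURRENT`, the default of 2, and the function names are hypothetical, and the `sleep` call stands in for a real LLM or HTTP request.

```python
import asyncio
import os

# Hypothetical variable name, used only for this sketch.
MAX_CONCURRENT = int(os.getenv("EXAMPLE_EVAL_MAX_CONCURRENT", "2"))


async def evaluate_case(case_id: int, sem: asyncio.Semaphore, state: dict) -> int:
    """Evaluate one test case while holding a semaphore slot."""
    async with sem:
        state["active"] += 1
        state["peak"] = max(state["peak"], state["active"])
        await asyncio.sleep(0.01)  # stands in for an LLM or HTTP call
        state["active"] -= 1
        return case_id


async def run_all(n_cases: int):
    """Run all cases concurrently, never exceeding MAX_CONCURRENT at once."""
    sem = asyncio.Semaphore(MAX_CONCURRENT)
    state = {"active": 0, "peak": 0}
    results = await asyncio.gather(
        *(evaluate_case(i, sem, state) for i in range(n_cases))
    )
    return list(results), state["peak"]


results, peak = asyncio.run(run_all(6))
```

`asyncio.gather` preserves input order, so `results` lines up with the test cases even though they finish at different times.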
anouarbm
a172cf893d feat(evaluation): Add sample documents for reproducible RAGAS testing
Add 5 markdown documents that users can index to reproduce evaluation results.

Changes:
- Add sample_documents/ folder with 5 markdown files covering LightRAG features
- Update sample_dataset.json with 3 improved, specific test questions
- Shorten and correct evaluation README (removed outdated info about mock responses)
- Add sample_documents reference with expected ~95% RAGAS score

Test Results with sample documents:
- Average RAGAS Score: 95.28%
- Faithfulness: 100%, Answer Relevance: 96.67%
- Context Recall: 88.89%, Context Precision: 95.56%
2025-11-03 13:28:46 +01:00
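The figures reported in commit a172cf893d are consistent with the average being the unweighted mean of the four metrics. That interpretation is an assumption (the commit message does not state the formula), and the dictionary keys below are illustrative; the check can be reproduced as:

```python
from statistics import mean

# Per-metric percentages as reported in the commit message.
metrics = {
    "faithfulness": 100.00,
    "answer_relevance": 96.67,
    "context_recall": 88.89,
    "context_precision": 95.56,
}

# (100.00 + 96.67 + 88.89 + 95.56) / 4 = 381.12 / 4
average = round(mean(metrics.values()), 2)
```

The result, 95.28, matches the reported Average RAGAS Score.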
anouarbm
36694eb9f2 fix(evaluation): Move import-time validation to runtime and improve documentation
Changes:
- Move sys.exit() calls from module level to __init__() method
- Raise proper exceptions (ImportError, ValueError, EnvironmentError) instead of sys.exit()
- Add lazy import for RAGEvaluator in __init__.py using __getattr__
- Update README to clarify sample_dataset.json contains generic test data (not personal)
- Fix README to reflect actual output format (JSON + CSV, not HTML)
- Improve documentation for custom test case creation

Addresses code review feedback about import-time validation and module exports.
2025-11-03 05:56:38 +01:00
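The two techniques described in commit 36694eb9f2, validating at construction time and exposing a class lazily through a module-level `__getattr__` (PEP 562), can be sketched as below. Everything here is hypothetical: `ExampleEvaluator`, `EXAMPLE_API_KEY`, and `example_pkg` are illustrative names, and the module is built in memory only so that the example is self-contained.

```python
import os
import types


class ExampleEvaluator:
    """Hypothetical stand-in for an evaluator class; not the project's code."""

    def __init__(self, api_key_var: str = "EXAMPLE_API_KEY"):
        # Validation runs when an instance is created, not at import time,
        # and raises an exception instead of calling sys.exit().
        if not os.getenv(api_key_var):
            raise EnvironmentError(f"{api_key_var} is not set")
        self.api_key_var = api_key_var


def _lazy_getattr(name: str):
    """Module-level __getattr__: called only when normal lookup fails."""
    if name == "ExampleEvaluator":
        # In a real package this would be a deferred import, e.g.
        # `from .some_module import ExampleEvaluator`.
        return ExampleEvaluator
    raise AttributeError(f"module 'example_pkg' has no attribute {name!r}")


# Simulate a package __init__.py that defines __getattr__.
example_pkg = types.ModuleType("example_pkg")
example_pkg.__getattr__ = _lazy_getattr

lazy_cls = example_pkg.ExampleEvaluator  # resolved on first access
```

With this layout, importing the package succeeds even when optional dependencies or environment variables are missing; the error surfaces only when the evaluator is actually requested or instantiated.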
anouarbm
5cdb4b0ef2 fix: Apply ruff formatting and rename test_dataset to sample_dataset
**Lint Fixes (ruff)**:
- Sort imports alphabetically (I001)
- Add blank line after import traceback (E302)
- Add trailing comma to dict literals (COM812)
- Reformat writer.writerow for readability (E501)

**Rename test_dataset.json → sample_dataset.json**:
- Avoids .gitignore pattern conflict (test_* is ignored)
- More descriptive name: the file is a sample/template, not actual test data
- Updated all references in eval_rag_quality.py and README.md

Resolves lint-and-format CI check failure.
Addresses reviewer feedback about test dataset naming.
2025-11-02 10:36:03 +01:00
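The naming conflict described in commit 5cdb4b0ef2 can be illustrated with shell-style glob matching. This is only an approximation: `fnmatchcase` is not git's matcher, and the sketch assumes the ignore rule is the bare pattern `test_*` applied to the file name. For an authoritative answer in a real checkout, `git check-ignore -v <path>` reports which rule matches.

```python
from fnmatch import fnmatchcase

# Assumed ignore rule, taken from the commit message.
pattern = "test_*"

# Old and new file names from the rename.
names = ["test_dataset.json", "sample_dataset.json"]

# True means the name would be caught by the ignore pattern.
ignored = {name: fnmatchcase(name, pattern) for name in names}
```

Under this approximation the old name matches the pattern and the new name does not, which is the reason the commit gives for the rename.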
anouarbm
1ad0bf82f9 feat: add RAGAS evaluation framework for RAG quality assessment
This contribution adds a comprehensive evaluation system using the RAGAS
framework to assess LightRAG's retrieval and generation quality.

Features:
- RAGEvaluator class with four key metrics:
  * Faithfulness: Answer accuracy vs context
  * Answer Relevance: Query-response alignment
  * Context Recall: Retrieval completeness
  * Context Precision: Retrieved context quality
- HTTP API integration for live system testing
- JSON and CSV report generation
- Configurable test datasets
- Complete documentation with examples
- Sample test dataset included

Changes:
- Added lightrag/evaluation/eval_rag_quality.py (RAGAS evaluator implementation)
- Added lightrag/evaluation/README.md (comprehensive documentation)
- Added lightrag/evaluation/__init__.py (package initialization)
- Updated pyproject.toml with optional 'evaluation' dependencies
- Updated .gitignore to exclude evaluation results directory

Installation:
pip install "lightrag-hku[evaluation]"

Dependencies:
- ragas>=0.3.7
- datasets>=4.3.0
- httpx>=0.28.1
- pytest>=8.4.2
- pytest-asyncio>=1.2.0
2025-11-01 21:36:39 +01:00
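The "JSON and CSV report generation" feature listed in commit 1ad0bf82f9 can be approximated with the standard library alone. The sketch below is not the evaluator's implementation: the row layout, field names, helper names, and the placeholder scores are all assumptions made for illustration.

```python
import csv
import io
import json

# Placeholder per-question rows; field names and values are illustrative.
rows = [
    {"question": "q1", "faithfulness": 1.0, "answer_relevance": 0.9},
    {"question": "q2", "faithfulness": 0.8, "answer_relevance": 0.7},
]


def to_json(results: list) -> str:
    """Serialize the rows as a JSON document."""
    return json.dumps({"results": results}, indent=2)


def to_csv(results: list) -> str:
    """Serialize the same rows as CSV, one line per question."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=list(results[0].keys()))
    writer.writeheader()
    writer.writerows(results)
    return buffer.getvalue()


json_report = to_json(rows)
csv_report = to_csv(rows)
```

Writing to strings keeps the example free of file-system side effects; a real script would write the same content to files in its results directory.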