Commit graph

8 commits

Author SHA1 Message Date
anouarbm
349c1945db Optimize RAGAS evaluation with parallel execution and chunk content enrichment
Added efficient RAG evaluation system with optimized API calls and comprehensive benchmarking.

Key Features:
- Single API call per evaluation (2x faster than before)
- Parallel evaluation based on MAX_ASYNC environment variable
- Chunk content enrichment in /query endpoint responses
- Comprehensive benchmark statistics (averages)
- NaN-safe metric calculations
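
A minimal sketch of what a NaN-safe average could look like, assuming metric values arrive as floats that may be NaN or None when a metric fails to compute. The helper name `nan_safe_mean` is hypothetical and is not taken from eval_rag_quality.py.

```python
import math


def nan_safe_mean(values):
    """Average the usable entries of `values`; return NaN if none are usable."""
    # Drop missing values and NaN so one failed metric does not poison the mean.
    usable = [v for v in values if v is not None and not math.isnan(v)]
    if not usable:
        return float("nan")
    return sum(usable) / len(usable)


if __name__ == "__main__":
    # One metric failed (NaN); the average is taken over the remaining two.
    print(nan_safe_mean([1.0, float("nan"), 0.5]))
```

Returning NaN for an empty input keeps "no data" distinguishable from a real score of 0.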

API Changes:
- Added include_chunk_content parameter to QueryRequest (backward compatible)
- /query endpoint enriches references with actual chunk content when requested
- No breaking changes - default behavior unchanged
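
The backward-compatible flag described above might be modeled roughly as follows. Only the names `QueryRequest` and `include_chunk_content` come from this commit; the `query` field, the `chunk_id` and `content` keys, and the `enrich_references` helper are hypothetical and used purely for illustration.

```python
from dataclasses import dataclass


@dataclass
class QueryRequest:
    query: str  # assumed field name
    include_chunk_content: bool = False  # off by default, so old clients are unaffected


def enrich_references(references, chunks_by_id, include_chunk_content):
    """Attach chunk text to each reference only when the caller asked for it."""
    if not include_chunk_content:
        return references  # default behavior unchanged
    return [
        {**ref, "content": chunks_by_id.get(ref["chunk_id"])}  # hypothetical keys
        for ref in references
    ]
```

Because the flag defaults to False, a request that omits it gets the same response shape as before the change.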

Evaluation Improvements:
- Parallel execution using asyncio.Semaphore (respects MAX_ASYNC)
- Shared HTTP client with connection pooling
- Proper timeout handling (3min connect, 5min read)
- Debug output for context retrieval verification
- Benchmark statistics with averages, min/max scores
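
The semaphore-bounded parallelism listed above can be sketched as follows. This is an illustration under assumptions, not the code from this commit: `evaluate_case` is a hypothetical placeholder for the real HTTP call and scoring, and the fallback value of 4 for MAX_ASYNC is an assumption.

```python
import asyncio
import os

# MAX_ASYNC is read from the environment; the default of 4 is an assumption.
MAX_ASYNC = int(os.getenv("MAX_ASYNC", "4"))


async def evaluate_case(case_id, semaphore):
    """Hypothetical stand-in for one evaluation (query + scoring)."""
    async with semaphore:  # at most `limit` cases run this section at once
        await asyncio.sleep(0.01)  # placeholder for the network round trip
        return {"case": case_id, "score": 1.0}


async def run_all(case_ids, limit=MAX_ASYNC):
    semaphore = asyncio.Semaphore(limit)
    # gather() preserves input order even though the cases run concurrently.
    return await asyncio.gather(*(evaluate_case(c, semaphore) for c in case_ids))


if __name__ == "__main__":
    results = asyncio.run(run_all(range(5)))
    print(len(results))
```

All tasks are created up front, but only `limit` of them hold the semaphore at any moment, which is what caps the number of in-flight requests.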

Results:
- Average RAGAS Score: 0.9772
- Perfect Faithfulness: 1.0000
- Perfect Context Recall: 1.0000
- Perfect Context Precision: 1.0000
- Excellent Answer Relevance: 0.9087

(cherry picked from commit 0bbef9814e)
2025-12-04 19:11:20 +08:00
anouarbm
8650307e65 feat(evaluation): Add sample documents for reproducible RAGAS testing
Add 5 markdown documents that users can index to reproduce evaluation results.

Changes:
- Add sample_documents/ folder with 5 markdown files covering LightRAG features
- Update sample_dataset.json with 3 improved, specific test questions
- Shorten and correct evaluation README (removed outdated info about mock responses)
- Add sample_documents reference with expected ~95% RAGAS score

Test Results with sample documents:
- Average RAGAS Score: 95.28%
- Faithfulness: 100%, Answer Relevance: 96.67%
- Context Recall: 88.89%, Context Precision: 95.56%

(cherry picked from commit a172cf893d)
2025-12-04 19:11:09 +08:00
anouarbm
ccdd3c2786 fixed ruff format of csv path
(cherry picked from commit b12b693a81)
2025-12-04 19:11:08 +08:00
anouarbm
949bfc4228 fix: Apply ruff formatting and rename test_dataset to sample_dataset
**Lint Fixes (ruff)**:
- Sort imports alphabetically (I001)
- Add blank line after import traceback (E302)
- Add trailing comma to dict literals (COM812)
- Reformat writer.writerow for readability (E501)

**Rename test_dataset.json → sample_dataset.json**:
- Avoids .gitignore pattern conflict (test_* is ignored)
- More descriptive name - it's a sample/template, not actual test data
- Updated all references in eval_rag_quality.py and README.md

Resolves lint-and-format CI check failure.
Addresses reviewer feedback about test dataset naming.

(cherry picked from commit 5cdb4b0ef2)
2025-12-04 19:11:08 +08:00
yangdx
7896c42fba Restructure semaphore control to manage entire evaluation pipeline
• Move rag_semaphore to wrap full function
• Increase RAG concurrency to 2x eval limit
• Prevent memory buildup from slow evals
• Keep eval_semaphore for RAGAS control
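
One way to read the arrangement described above: an outer rag_semaphore held for the whole per-question function, sized at twice the evaluation limit, and an inner eval_semaphore around the scoring step only. The sketch below assumes that reading. `generate_answer` and `score_answer` are hypothetical placeholders, and the `eval_limit` default of 2 is an assumption.

```python
import asyncio


async def generate_answer(question):
    """Hypothetical stage 1: ask the RAG server for an answer."""
    await asyncio.sleep(0.01)
    return f"answer to {question}"


async def score_answer(answer):
    """Hypothetical stage 2: score the answer (RAGAS in the real evaluator)."""
    await asyncio.sleep(0.01)
    return 1.0


async def evaluate_one(question, rag_semaphore, eval_semaphore):
    # The outer semaphore is held until scoring finishes, so answers cannot
    # pile up in memory while they wait for a slow evaluation slot.
    async with rag_semaphore:
        answer = await generate_answer(question)
        async with eval_semaphore:
            score = await score_answer(answer)
    return question, score


async def run_pipeline(questions, eval_limit=2):
    eval_semaphore = asyncio.Semaphore(eval_limit)
    rag_semaphore = asyncio.Semaphore(eval_limit * 2)  # 2x the eval limit
    tasks = [evaluate_one(q, rag_semaphore, eval_semaphore) for q in questions]
    return await asyncio.gather(*tasks)
```

With this shape, at most `2 * eval_limit` questions are in flight at once, and at most `eval_limit` of them are being scored.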

(cherry picked from commit e5abe9dd3d)
2025-12-04 19:09:02 +08:00
yangdx
c459caed26 Implement two-stage pipeline for RAG evaluation with separate semaphores
• Split RAG gen and eval stages
• Add rag_semaphore for stage 1
• Add eval_semaphore for stage 2
• Improve concurrency control
• Update connection pool limits

(cherry picked from commit 83715a3ac1)
2025-12-04 19:09:02 +08:00
ben moussa anouar
dd425e5513 Update lightrag/evaluation/eval_rag_quality.py for language
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
(cherry picked from commit 98f0464a31)
2025-12-04 19:09:02 +08:00
yangdx
dec282694c Update .env loading and add API authentication to RAG evaluator
• Load .env from current directory
• Support LIGHTRAG_API_KEY auth header
• Override=False for env precedence
• Add Bearer token to API requests
• Enable per-instance .env configs
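
A standard-library sketch of the two behaviors listed above, under stated assumptions. `load_env_file` is a simplified, hypothetical stand-in for python-dotenv's `load_dotenv(override=False)`: variables already set in the process keep precedence over the file. `build_headers` assumes the key is sent as an `Authorization: Bearer` header, which follows the commit's wording; the exact header used by the evaluator is not shown here.

```python
import os


def load_env_file(path=".env"):
    """Load KEY=VALUE pairs from `path` without overriding existing variables."""
    if not os.path.exists(path):
        return
    with open(path, encoding="utf-8") as fh:
        for raw in fh:
            line = raw.strip()
            if not line or line.startswith("#") or "=" not in line:
                continue
            key, _, value = line.partition("=")
            # setdefault mirrors override=False: the process environment wins.
            os.environ.setdefault(key.strip(), value.strip().strip('"').strip("'"))


def build_headers(api_key=None):
    """Build request headers, adding a Bearer token only when a key is set."""
    headers = {"Content-Type": "application/json"}
    if api_key:
        headers["Authorization"] = f"Bearer {api_key}"  # assumed header scheme
    return headers


if __name__ == "__main__":
    load_env_file(".env")  # current working directory, one .env per instance
    print(build_headers(os.getenv("LIGHTRAG_API_KEY")))
```

Reading the file from the current directory is what lets several evaluator instances, each started from its own folder, use different settings.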

(cherry picked from commit 72db042667)
2025-12-04 19:04:25 +08:00