LightRAG

Claude ec70d9c857 Add comprehensive comparison of RAG evaluation methods		2025-11-19 13:36:56 +00:00

This guide addresses the important question: "Is RAGAS the universally accepted standard?"

TL;DR:
❌ RAGAS is NOT a universal standard
✅ RAGAS is the most popular open-source RAG evaluation framework (7k+ GitHub stars)
⚠️ RAG evaluation has no single "gold standard" yet - the field is too new

Content:

1. Evaluation Method Landscape:
- LLM-based (RAGAS, ARES, TruLens, G-Eval)
- Embedding-based (BERTScore, Semantic Similarity)
- Traditional NLP (BLEU, ROUGE, METEOR)
- Retrieval metrics (MRR, NDCG, MAP)
- Human evaluation
- End-to-end task metrics

2. Detailed Framework Comparison:

RAGAS (Most Popular)
- Pros: Comprehensive, automated, low cost ($1-2/100 questions), easy to use
- Cons: Depends on evaluation LLM, requires ground truth, non-deterministic
- Best for: Quick prototyping, comparing configurations

ARES (Stanford)
- Pros: Low cost after training, fast, privacy-friendly
- Cons: High upfront cost, domain-specific, cold start problem
- Best for: Large-scale production (>10k evals/month)

TruLens (Observability Platform)
- Pros: Real-time monitoring, visualization, flexible
- Cons: Complex, heavy dependencies
- Best for: Production monitoring, debugging

LlamaIndex Eval
- Pros: Native LlamaIndex integration
- Cons: Framework-specific, limited features
- Best for: LlamaIndex users

DeepEval
- Pros: pytest-style testing, CI/CD friendly
- Cons: Relatively new, smaller community
- Best for: Development testing

Traditional Metrics (BLEU/ROUGE/BERTScore)
- Pros: Fast, free, deterministic
- Cons: Surface-level, doesn't detect hallucination
- Best for: Quick baselines, cost-sensitive scenarios

3. Comprehensive Comparison Matrix:
- Comprehensiveness, automation, cost, speed, accuracy, ease of use
- Cost estimates for 1000 questions ($0-$5000)
- Academic vs industry practices

4. Real-World Recommendations:
- Prototyping: RAGAS + manual sampling (20-50 questions)
- Production Prep: RAGAS (100-500 cases) + expert review (50-100) + A/B test
- Production Running: TruLens/monitoring + RAGAS sampling + user feedback
- Large Scale: ARES training + real-time eval + sampling
- High-Risk: Automated + mandatory human review + compliance

5. Decision Tree:
- Based on: ground truth availability, budget, monitoring needs, scale, risk level
- Helps users choose the right evaluation strategy

6. LightRAG Recommendations:
- Short-term: Add BLEU/ROUGE, retrieval metrics (Recall@K, MRR), human eval guide
- Mid-term: TruLens integration (optional), custom eval functions
- Long-term: Explore ARES for large-scale users

7. Key Insights:
- No perfect evaluation method exists
- Recommend combining multiple approaches
- Automatic eval ≠ completely trustworthy
- Real user feedback is the ultimate standard
- Match evaluation strategy to use case

References:
- Academic papers (RAGAS 2023, ARES 2024, G-Eval 2023)
- Open-source projects (links to all frameworks)
- Industry reports (Anthropic, OpenAI, Gartner 2024)

Helps users make informed decisions about RAG evaluation strategies beyond just RAGAS.
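The commit message recommends adding retrieval metrics such as Recall@K and MRR in the short term. As a rough illustration of what those two metrics compute, here is a minimal sketch in plain Python. It is not LightRAG code: the function names (`recall_at_k`, `reciprocal_rank`, `mean_reciprocal_rank`) and the input shape (a ranked list of retrieved IDs plus a set of relevant IDs per query) are assumptions made for this example.

```python
from typing import Hashable, Iterable, Sequence, Set, Tuple


def recall_at_k(retrieved: Sequence[Hashable], relevant: Set[Hashable], k: int) -> float:
    """Fraction of the relevant items that appear in the top-k retrieved items."""
    if not relevant:
        return 0.0  # convention chosen for this sketch: no relevant items -> 0
    top_k = set(retrieved[:k])
    return len(top_k & relevant) / len(relevant)


def reciprocal_rank(retrieved: Sequence[Hashable], relevant: Set[Hashable]) -> float:
    """1 / rank of the first relevant item (ranks start at 1), or 0 if none is retrieved."""
    for rank, item in enumerate(retrieved, start=1):
        if item in relevant:
            return 1.0 / rank
    return 0.0


def mean_reciprocal_rank(
    runs: Iterable[Tuple[Sequence[Hashable], Set[Hashable]]]
) -> float:
    """Average reciprocal rank over (retrieved, relevant) pairs, one pair per query."""
    runs = list(runs)
    if not runs:
        return 0.0
    return sum(reciprocal_rank(ret, rel) for ret, rel in runs) / len(runs)


if __name__ == "__main__":
    # Hypothetical chunk IDs for two queries.
    queries = [
        (["c3", "c1", "c7"], {"c1", "c9"}),  # first relevant hit at rank 2
        (["c2", "c5"], {"c2"}),              # first relevant hit at rank 1
    ]
    print(recall_at_k(queries[0][0], queries[0][1], k=2))
    print(mean_reciprocal_rank(queries))
```

Both metrics need labeled relevance judgments per query, which is the same ground-truth requirement the message lists as a drawback of RAGAS; unlike LLM-based scoring, they are deterministic and free to compute.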
Algorithm.md	Create Algorithm.md	2025-01-24 21:19:04 +01:00
DockerDeployment.md	Add BuildKit cache mounts to optimize Docker build performance	2025-11-03 12:40:30 +08:00
EvaluatingEntityRelationQuality-zh.md	Add comprehensive entity/relation extraction quality evaluation guide	2025-11-19 12:45:31 +00:00
FrontendBuildGuide.md	Use frozen lockfile for consistent frontend builds	2025-10-14 03:34:55 +08:00
LightRAG_concurrent_explain.md	Update README	2025-07-27 17:26:49 +08:00
OfflineDeployment.md	refactor: move document deps to api group, remove dynamic imports	2025-11-13 13:34:09 +08:00
PerformanceFAQ-zh.md	Add comprehensive performance FAQ addressing max_async, LLM selection, and database optimization	2025-11-19 10:21:58 +00:00
PerformanceOptimization-zh.md	Add performance optimization guide and configuration for LightRAG indexing	2025-11-19 09:55:28 +00:00
PerformanceOptimization.md	Add performance optimization guide and configuration for LightRAG indexing	2025-11-19 09:55:28 +00:00
RAGEvaluationMethodsComparison-zh.md	Add comprehensive comparison of RAG evaluation methods	2025-11-19 13:36:56 +00:00
SelfHostedOptimization-zh.md	Add comprehensive self-hosted LLM optimization guide for LightRAG	2025-11-19 10:53:48 +00:00
UV_LOCK_GUIDE.md	Migrate Dockerfile from pip to uv package manager for faster builds	2025-10-16 01:54:20 +08:00
WhatIsGleaning-zh.md	Add comprehensive guide explaining gleaning concept in LightRAG	2025-11-19 11:45:07 +00:00
WhatIsRAGAS-zh.md	Add comprehensive RAGAS evaluation framework guide	2025-11-19 12:52:22 +00:00