lxobr cfe9c949a7 feat: unify comparative evals (#916), 2025-06-11 10:06:09 +02:00

Description

- Comparative Framework: Independent benchmarking system for evaluating different RAG/QA systems
- HotpotQA Dataset: 50 instances corpus and corresponding QA pairs for standardized evaluation
- Base Class: Abstract QABenchmarkRAG with async pipeline for document ingestion and question answering
- Three Benchmarks: Standalone implementations for Mem0, LightRAG, and Graphiti with specific dependencies

DCO Affirmation

I affirm that all code in every commit of this pull request conforms to the terms of the Topoteretes Developer Certificate of Origin.

Co-authored-by: hajdul88 <52442977+hajdul88@users.noreply.github.com>

Comparative QA Benchmarks

Independent benchmarks for different QA/RAG systems using the HotpotQA dataset.

Dataset Files

- hotpot_50_corpus.json: corpus of 50 HotpotQA instances
- hotpot_50_qa_pairs.json: the corresponding QA pairs

Each benchmark can be run independently with appropriate dependencies:

pip install mem0ai openai
python qa_benchmark_mem0.py

pip install "lightrag-hku[api]"
python qa_benchmark_lightrag.py

pip install graphiti-core
python qa_benchmark_graphiti.py

Create a .env file with the required API keys.
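
The list of required keys is not shown here, so the following is only a minimal sketch of how a script could check a .env file before starting a run. The helper names parse_env and missing_keys are hypothetical, and OPENAI_API_KEY is an assumed example of a key a benchmark might need, not a confirmed requirement.

```python
import os


def parse_env(text):
    """Parse simple KEY=VALUE lines, skipping blanks and # comments."""
    values = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Drop surrounding whitespace and optional quotes around the value.
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def missing_keys(required, env=None):
    """Return the names in `required` that are unset or empty in `env`."""
    env = os.environ if env is None else env
    return [name for name in required if not env.get(name)]


# Assumption: the real list of keys depends on the benchmark being run.
REQUIRED_KEYS = ["OPENAI_API_KEY"]
```

A script could call missing_keys(REQUIRED_KEYS, parse_env(open(".env").read())) and stop with a clear message if the returned list is not empty.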

Each benchmark inherits from the QABenchmarkRAG base class and can be configured independently.
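
To illustrate the inheritance pattern, here is a rough sketch of an abstract base class with an async ingest-then-answer pipeline and a toy subclass. It is not the code in qa_benchmark_base.py: the class QABenchmarkSketch, its method names (ingest, answer, run), and the record fields are all assumptions made for this example.

```python
import asyncio
from abc import ABC, abstractmethod


class QABenchmarkSketch(ABC):
    """Hypothetical stand-in for an abstract QA benchmark base class."""

    def __init__(self, corpus, qa_pairs):
        self.corpus = corpus      # list of document strings (assumed shape)
        self.qa_pairs = qa_pairs  # list of {"question", "answer"} dicts (assumed shape)

    @abstractmethod
    async def ingest(self, documents):
        """Load the documents into the system under test."""

    @abstractmethod
    async def answer(self, question):
        """Return the system's answer to a single question."""

    async def run(self):
        """Ingest the corpus, then answer every question in order."""
        await self.ingest(self.corpus)
        results = []
        for pair in self.qa_pairs:
            predicted = await self.answer(pair["question"])
            results.append({
                "question": pair["question"],
                "golden_answer": pair["answer"],
                "answer": predicted,
            })
        return results


class KeywordBenchmark(QABenchmarkSketch):
    """Toy subclass: returns the first document sharing a word with the question."""

    async def ingest(self, documents):
        self.store = list(documents)

    async def answer(self, question):
        words = set(question.lower().rstrip("?").split())
        for doc in self.store:
            if words & set(doc.lower().split()):
                return doc
        return ""
```

A subclass for a real system would replace the two abstract methods with calls into that system's client, and the pipeline would be started with asyncio.run(benchmark.run()).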

Updated results will be posted soon.