**Critical Fix: Contexts vs Ground Truth**
- RAGAS metrics now evaluate actual retrieval performance
- Previously: used ground_truth as contexts, so every run scored perfectly
- Now: uses documents actually retrieved from the LightRAG API (a real evaluation)
**Changes to generate_rag_response (lines 100-156)** (see the sketch after this list):
- Remove unused 'context' parameter
- Change return type: Dict[str, str] → Dict[str, Any]
- Extract contexts as list of strings from references[].text
- Return a 'contexts' key (list of strings) instead of the old 'context' key (a single JSON dump)
- Add response.raise_for_status() for better error handling
- Add httpx.HTTPStatusError exception handler
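A minimal sketch of the reworked function, assuming the LightRAG /query endpoint and a references[].text response shape; the endpoint path, port, and payload fields are illustrative, not the exact code in the PR:

```python
from typing import Any, Dict

import httpx

LIGHTRAG_URL = "http://localhost:9621/query"  # assumed endpoint, adjust to your deployment


async def generate_rag_response(question: str) -> Dict[str, Any]:
    try:
        async with httpx.AsyncClient(timeout=120.0) as client:
            response = await client.post(
                LIGHTRAG_URL,
                json={"query": question, "mode": "hybrid", "include_references": True},
            )
            response.raise_for_status()  # turn 4xx/5xx into exceptions instead of bad data
            data = response.json()
    except httpx.HTTPStatusError as exc:
        # New handler: record the failure instead of crashing the whole evaluation run
        return {"answer": "", "contexts": [], "error": f"HTTP {exc.response.status_code}"}

    # contexts is a list of strings taken from the retrieved documents,
    # replacing the old single 'context' JSON dump
    contexts = [ref.get("text", "") for ref in data.get("references", [])]
    return {"answer": data.get("response", ""), "contexts": contexts}
```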
**Changes to evaluate_responses (lines 180-191)**:
- Line 183: Extract retrieved_contexts from rag_response
- Line 190: Use [retrieved_contexts] instead of [[ground_truth]]
- Now correctly evaluates retrieval quality rather than ground_truth quality (see the sketch below)
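A minimal sketch of the change, taken as a fragment from inside the async evaluation loop; `data` is assumed to be a dict of column lists and the exact variable names in eval_rag_quality.py may differ:

```python
rag_response = await generate_rag_response(item["question"])
retrieved_contexts = rag_response["contexts"]        # list[str] from the RAG API

data["question"].append(item["question"])
data["answer"].append(rag_response["answer"])
data["contexts"].append(retrieved_contexts)          # before the fix: [item["ground_truth"]]
data["ground_truth"].append(item["ground_truth"])    # still needed by context_recall
```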
**Impact on RAGAS Metrics** (evaluation sketch after this list):
- Context Precision: Now ranks actual retrieved docs by relevance
- Context Recall: Compares ground_truth against actual retrieval
- Faithfulness: Verifies answer based on actual retrieved contexts
- Answer Relevance: Unchanged (question-answer relevance)
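A minimal sketch of running the four metrics over the corrected columns, assuming the HF-datasets format accepted by ragas.evaluate; imports may vary slightly across RAGAS versions:

```python
from datasets import Dataset
from ragas import evaluate
from ragas.metrics import (
    answer_relevancy,
    context_precision,
    context_recall,
    faithfulness,
)

# data holds the question / answer / contexts / ground_truth columns built above
dataset = Dataset.from_dict(data)
result = evaluate(
    dataset,
    metrics=[context_precision, context_recall, faithfulness, answer_relevancy],
)
print(result)
```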
Fixes the incorrect evaluation methodology. Based on the RAGAS documentation:
- contexts = retrieved documents from RAG system
- ground_truth = reference answer for context_recall metric
References:
- https://docs.ragas.io/en/stable/concepts/components/eval_dataset/
- https://docs.ragas.io/en/stable/concepts/metrics/
**Lint Fixes (ruff)**:
- Sort imports alphabetically (I001)
- Add blank line after import traceback (E302)
- Add trailing comma to dict literals (COM812)
- Wrap the long writer.writerow call to fit the line-length limit (E501)
**Rename test_dataset.json → sample_dataset.json**:
- Avoids .gitignore pattern conflict (test_* is ignored)
- More descriptive name: it is a sample/template, not actual test data
- Updated all references in eval_rag_quality.py and README.md
Resolves lint-and-format CI check failure.
Addresses reviewer feedback about test dataset naming.
This contribution adds optional Langfuse support for LLM observability and tracing.
Langfuse provides a drop-in replacement for the OpenAI client that automatically
tracks all LLM interactions without requiring code changes.
Features:
- Optional Langfuse integration with graceful fallback
- Automatic LLM request/response tracing
- Token usage tracking
- Latency metrics
- Error tracking
- Zero code changes required for existing functionality
Implementation:
- Modified lightrag/llm/openai.py to conditionally use Langfuse's AsyncOpenAI
- Falls back to standard OpenAI client if Langfuse is not installed
- Logs observability status on import (see the sketch after this list)
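A minimal sketch of the conditional import in lightrag/llm/openai.py; the logger name and log wording are illustrative:

```python
import logging

logger = logging.getLogger("lightrag")

try:
    # Langfuse ships a drop-in AsyncOpenAI that traces requests, tokens, and
    # latency automatically when the LANGFUSE_* environment variables are set.
    from langfuse.openai import AsyncOpenAI  # noqa: F401

    logger.info("Langfuse observability enabled for OpenAI LLM calls")
except ImportError:
    # Langfuse not installed: fall back to the standard client, behaviour unchanged
    from openai import AsyncOpenAI  # noqa: F401

    logger.info("Langfuse not installed; using the standard OpenAI client")
```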
Configuration:
To enable Langfuse tracing, install the observability extras and set environment variables:
```bash
pip install lightrag-hku[observability]
export LANGFUSE_PUBLIC_KEY="your_public_key"
export LANGFUSE_SECRET_KEY="your_secret_key"
export LANGFUSE_HOST="https://cloud.langfuse.com" # or your self-hosted instance
```
If Langfuse is not installed or the environment variables are not set, LightRAG
falls back to the standard OpenAI client with no change in functionality.
Changes:
- Modified lightrag/llm/openai.py (added optional Langfuse import)
- Updated pyproject.toml with optional 'observability' dependencies
Dependencies (optional):
- langfuse>=3.8.1
- Rename _build_llm_context to _build_context_str
- Change text_units_context to chunks_context
- Move string building before early return
- Update log messages and comments
- Consistent variable naming throughout
- Return bool from check_frontend_build()
- Add ⚠️ symbol to outdated versions
- Show tooltip with rebuild message
- Add translations for warning text
- Fix tailwind config filename typo
- Monitor pipeline busy->idle transitions
- Reload labels on dropdown open if needed
- Add onBeforeOpen callback to AsyncSelect
- Clear refresh flags after processing
- Improve label sync with backend state