Commit graph

34 commits

clssck
663ada943a chore: add citation system and enhance RAG UI components
Add citation tracking and display system across backend and frontend components.
Backend changes include citation.py for document attribution, enhanced query routes
with citation metadata, improved prompt templates, and PostgreSQL schema updates.
Frontend includes CitationMarker component, HoverCard UI, QuerySettings refinements,
and ChatMessage enhancements for displaying document sources. Update dependencies
and docker-compose test configuration for improved development workflow.
2025-12-01 17:50:00 +01:00
clssck
9f5948650e chore(lightrag): add wikipedia test dataset for evaluation
Add comprehensive test dataset with 7 domain-specific Wikipedia documents
(climate, finance, medical, sports) and corresponding test cases in JSON format.
Total of 2292 lines of test data across 8 files for RAG quality evaluation
and end-to-end testing infrastructure.
2025-11-30 20:14:52 +01:00
clssck
43af31f888 feat: add db_degree visibility and orphan connection UI
Graph Connectivity Awareness:
- Add db_degree property to all KG implementations (NetworkX, Postgres, Neo4j, Mongo, Memgraph)
- Show database degree vs visual degree in node panel with amber badge
- Add visual indicator (amber border) for nodes with hidden connections
- Add "Load X hidden connection(s)" button to expand hidden neighbors
- Add configurable "Expand Depth" setting (1-5) in graph settings
- Use global maxNodes setting for node expansion consistency

Orphan Connection UI:
- Add OrphanConnectionDialog component for manual orphan entity connection
- Add OrphanConnectionControl button in graph sidebar
- Expose /graph/orphans/connect API endpoint for frontend use

Backend Improvements:
- Add get_orphan_entities() and connect_orphan_entities() to base storage
- Add orphan connection configuration parameters
- Improve entity extraction with relationship density requirements

Frontend:
- Add graphExpandDepth and graphIncludeOrphans to settings store
- Add min_degree and include_orphans graph filtering parameters
- Update translations (en.json, zh.json)
2025-11-29 21:08:07 +01:00
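A hypothetical sketch of the orphan helpers this commit adds to base storage, shown on a NetworkX-backed store; the `_graph` attribute, method bodies, and edge payload are assumptions, not LightRAG's actual implementation.

```python
# Hypothetical sketch of get_orphan_entities()/connect_orphan_entities()
# on a NetworkX-backed store; internals are assumed, not LightRAG's code.
import networkx as nx


class NetworkXStorageSketch:
    def __init__(self) -> None:
        self._graph = nx.Graph()

    def get_orphan_entities(self) -> list[str]:
        # An orphan is a node whose database degree is zero.
        return [n for n in self._graph.nodes if self._graph.degree(n) == 0]

    def connect_orphan_entities(self, orphan: str, target: str) -> None:
        # Manually attach an orphan to a chosen neighbor, as the
        # OrphanConnectionDialog would request via /graph/orphans/connect.
        self._graph.add_edge(orphan, target, description="manual connection")
```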
clssck
ef7327bb3e chore(docker-compose, lightrag): optimize test infrastructure and add evaluation tools
Add comprehensive E2E testing infrastructure with PostgreSQL performance tuning,
Gunicorn multi-worker support, and evaluation scripts for RAGAS-based quality
assessment. Introduces 4 new evaluation utilities: compare_results.py for A/B test
analysis, download_wikipedia.py for reproducible test datasets, e2e_test_harness.py
for automated evaluation pipelines, and ingest_test_docs.py for batch document
ingestion. Updates docker-compose.test.yml with aggressive async settings, memory
limits, and optimized chunking parameters. Parallelize entity summarization in
operate.py for improved extraction performance. Fix typos in merge node/edge logs.
2025-11-29 10:39:20 +01:00
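The parallelized entity summarization mentioned for operate.py boils down to bounded fan-out; `summarize_entity` below is a stand-in for the real LLM-backed call, and the concurrency limit is illustrative.

```python
import asyncio


async def summarize_entity(name: str) -> str:
    await asyncio.sleep(0.1)  # stand-in for the LLM-backed summary call
    return f"summary of {name}"


async def summarize_all(entities: list[str], max_concurrency: int = 8) -> list[str]:
    # Bounded fan-out: summarize entities concurrently instead of one by one.
    sem = asyncio.Semaphore(max_concurrency)

    async def bounded(name: str) -> str:
        async with sem:
            return await summarize_entity(name)

    return await asyncio.gather(*(bounded(n) for n in entities))


if __name__ == "__main__":
    print(asyncio.run(summarize_all(["alpha", "beta", "gamma"])))
```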
yangdx
987bc09cab Update LLM cache migration docs and improve UX prompts 2025-11-08 23:48:19 +08:00
yangdx
9c05706062 Add separate endpoint configuration for LLM and embeddings in evaluation
- Split LLM and embedding API configs
- Add fallback chain for API keys
- Update docs with usage examples
2025-11-05 18:54:38 +08:00
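A fallback chain for API keys can be as small as the helper below; the environment variable names are illustrative guesses, not necessarily the ones this commit wires up.

```python
import os


def resolve_key(*env_names: str) -> str | None:
    # Return the first environment variable in the chain that is set.
    for name in env_names:
        value = os.environ.get(name)
        if value:
            return value
    return None


# e.g. metric-specific key first, then the generic bindings (names assumed):
llm_key = resolve_key("EVAL_LLM_API_KEY", "LLM_BINDING_API_KEY", "OPENAI_API_KEY")
embed_key = resolve_key("EVAL_EMBEDDING_API_KEY", "EMBEDDING_BINDING_API_KEY", "OPENAI_API_KEY")
```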
yangdx
994a82dc7f Suppress token usage warnings for custom OpenAI-compatible endpoints
• Add warning filter for token usage
• Support vLLM, SGLang endpoints
• Non-critical for RAGAS evaluation
2025-11-05 18:25:28 +08:00
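Suppressing a non-critical warning by message pattern is standard library territory; the pattern below is a placeholder, since the commit does not quote the exact warning text.

```python
import warnings

# Silence the non-critical token-usage warning emitted when a custom
# OpenAI-compatible endpoint (vLLM, SGLang, ...) omits usage stats.
# The message pattern is a placeholder; match it to the real warning text.
warnings.filterwarnings("ignore", message=".*token usage.*")
```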
yangdx
f490622b72 Doc: Refactor evaluation README to improve clarity and structure 2025-11-05 10:43:55 +08:00
yangdx
a73314a4ba Refactor evaluation results display and logging format 2025-11-05 10:08:17 +08:00
yangdx
06b91d00f8 Improve RAG evaluation progress display with zero-padded eval index 2025-11-05 09:46:07 +08:00
yangdx
2823f92fb6 Fix tqdm progress bar conflicts in concurrent RAG evaluation
• Add position pool for tqdm bars
• Serialize tqdm creation with lock
• Set leave=False to clear completed bars
• Pass position/lock to eval tasks
• Import tqdm.auto for better display
2025-11-05 02:04:13 +08:00
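One way to realize the position pool and lock described above: each task reserves a display row and creates its bar under a lock, so concurrent bars never overwrite each other. This is a sketch of the pattern, not the commit's code.

```python
import asyncio

from tqdm.auto import tqdm

# Pool of display rows; the lock serializes bar creation so two tasks
# never race on the same position.
positions = list(range(4))
pos_lock = asyncio.Lock()


async def eval_task(name: str, steps: int = 50) -> None:
    async with pos_lock:
        position = positions.pop()
        bar = tqdm(total=steps, desc=name, position=position, leave=False)
    try:
        for _ in range(steps):
            await asyncio.sleep(0.01)
            bar.update(1)
    finally:
        bar.close()  # leave=False clears the finished bar
        positions.append(position)


async def main() -> None:
    await asyncio.gather(*(eval_task(f"case-{i:02d}") for i in range(4)))


asyncio.run(main())
```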
yangdx
e5abe9dd3d Restructure semaphore control to manage entire evaluation pipeline
• Move rag_semaphore to wrap full function
• Increase RAG concurrency to 2x eval limit
• Prevent memory buildup from slow evals
• Keep eval_semaphore for RAGAS control
2025-11-05 01:07:53 +08:00
yangdx
83715a3ac1 Implement two-stage pipeline for RAG evaluation with separate semaphores
• Split RAG gen and eval stages
• Add rag_semaphore for stage 1
• Add eval_semaphore for stage 2
• Improve concurrency control
• Update connection pool limits
2025-11-05 00:36:09 +08:00
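Together with e5abe9dd3d above, the two-stage design comes down to two semaphores, with the RAG limit at twice the eval limit so generation stays ahead of the slower evaluation stage without unbounded memory buildup. Function bodies are stand-ins.

```python
import asyncio

EVAL_LIMIT = 2
# RAG concurrency at 2x the eval limit keeps generation ahead of the
# slower evaluation stage without letting results pile up unboundedly.
rag_semaphore = asyncio.Semaphore(EVAL_LIMIT * 2)
eval_semaphore = asyncio.Semaphore(EVAL_LIMIT)


async def generate_answer(question: str) -> str:
    await asyncio.sleep(0.2)  # stand-in for the RAG API call
    return f"answer to {question!r}"


async def evaluate_answer(answer: str) -> float:
    await asyncio.sleep(1.0)  # stand-in for a RAGAS evaluation
    return 0.9


async def run_case(question: str) -> float:
    async with rag_semaphore:   # stage 1: bounded RAG generation
        answer = await generate_answer(question)
    async with eval_semaphore:  # stage 2: bounded RAGAS evaluation
        return await evaluate_answer(answer)


async def main() -> None:
    print(await asyncio.gather(*(run_case(f"q{i}") for i in range(6))))


asyncio.run(main())
```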
yangdx
d36be1f499 Improve RAGAS evaluation progress tracking and clean up output handling
• Add tqdm progress bar for eval steps
• Pass progress bar to RAGAS evaluate
• Ensure progress bar cleanup in finally
• Remove redundant output buffer flushes
2025-11-05 00:16:02 +08:00
yangdx
c358f405a9 Update evaluation defaults and expand sample dataset
• Lower concurrent evals from 3 to 2
• Standardize project names in samples
• Add 3 new evaluation questions
• Expand ground truth detail coverage
• Improve dataset comprehensiveness
2025-11-04 22:17:17 +08:00
yangdx
41c26a3677 feat: add command-line args to RAG evaluation script
- Add --dataset and --ragendpoint flags
- Support short forms -d and -r
- Update README with usage examples
2025-11-04 21:40:27 +08:00
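The flags named in the commit map directly onto argparse; the defaults shown are guesses.

```python
import argparse

parser = argparse.ArgumentParser(description="Run RAG quality evaluation")
parser.add_argument("-d", "--dataset", default="sample_dataset.json",
                    help="path to the test dataset JSON")
parser.add_argument("-r", "--ragendpoint", default="http://localhost:9621",
                    help="base URL of the running LightRAG API")
args = parser.parse_args()
```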
yangdx
d4b8a229b9 Update RAGAS evaluation to use gpt-4o-mini and improve compatibility
- Change default model to gpt-4o-mini
- Add deprecation warning suppression
- Update docs and comments for LightRAG
- Improve output formatting and timing
2025-11-04 18:50:53 +08:00
yangdx
6d61f70b92 Clean up RAG evaluator logging and remove excessive separator lines
• Remove excessive separator lines
• Add RAGAS concurrency comment
• Fix output buffer timing
2025-11-04 18:04:19 +08:00
yangdx
4e4b8d7e25 Update RAG evaluation metrics to instantiate classes instead of importing pre-built instances
• Import metric classes not instances
• Instantiate metrics with () syntax
2025-11-04 15:56:57 +08:00
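In recent ragas releases the switch looks like the snippet below; older versions exported ready-made lowercase instances instead. Verify the class names against the installed ragas version.

```python
# Newer ragas releases export metric classes; older releases exported
# ready-made instances (faithfulness, answer_relevancy, ...).
from ragas.metrics import (
    AnswerRelevancy,
    ContextPrecision,
    ContextRecall,
    Faithfulness,
)

# Instantiate with () so each run gets a fresh, configurable object.
metrics = [Faithfulness(), AnswerRelevancy(), ContextRecall(), ContextPrecision()]
```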
yangdx
7abc687742 Add comprehensive configuration and compatibility fixes for RAGAS
- Fix RAGAS LLM wrapper compatibility
- Add concurrency control for rate limits
- Add eval env vars for model config
- Improve error handling and logging
- Update documentation with examples
2025-11-04 14:39:27 +08:00
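The LLM wrapper compatibility fix presumably means handing RAGAS an explicitly wrapped chat model; ragas exposes LangchainLLMWrapper for this, though whether the commit wires it exactly this way is an assumption.

```python
# Assumed wiring: give RAGAS an explicitly wrapped chat model. Requires
# the langchain-openai package alongside ragas.
from langchain_openai import ChatOpenAI
from ragas.llms import LangchainLLMWrapper

evaluator_llm = LangchainLLMWrapper(ChatOpenAI(model="gpt-4o-mini"))
```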
yangdx
72db042667 Update .env loading and add API authentication to RAG evaluator
• Load .env from current directory
• Support LIGHTRAG_API_KEY auth header
• Override=False for env precedence
• Add Bearer token to API requests
• Enable per-instance .env configs
2025-11-04 10:59:09 +08:00
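The .env precedence and Bearer auth described above take only a few lines with python-dotenv and httpx; a sketch under those assumptions, with the endpoint URL as a placeholder.

```python
import os

import httpx
from dotenv import load_dotenv

# override=False: variables already exported in the shell win over .env,
# which is what enables per-instance .env configs.
load_dotenv(".env", override=False)

headers = {}
if api_key := os.environ.get("LIGHTRAG_API_KEY"):
    headers["Authorization"] = f"Bearer {api_key}"

client = httpx.Client(base_url="http://localhost:9621", headers=headers)
```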
anouarbm
a172cf893d feat(evaluation): Add sample documents for reproducible RAGAS testing
Add 5 markdown documents that users can index to reproduce evaluation results.

Changes:
- Add sample_documents/ folder with 5 markdown files covering LightRAG features
- Update sample_dataset.json with 3 improved, specific test questions
- Shorten and correct evaluation README (removed outdated info about mock responses)
- Add sample_documents reference with expected ~95% RAGAS score

Test Results with sample documents:
- Average RAGAS Score: 95.28%
- Faithfulness: 100%, Answer Relevance: 96.67%
- Context Recall: 88.89%, Context Precision: 95.56%
2025-11-03 13:28:46 +01:00
anouarbm
36694eb9f2 fix(evaluation): Move import-time validation to runtime and improve documentation
Changes:
- Move sys.exit() calls from module level to __init__() method
- Raise proper exceptions (ImportError, ValueError, EnvironmentError) instead of sys.exit()
- Add lazy import for RAGEvaluator in __init__.py using __getattr__
- Update README to clarify sample_dataset.json contains generic test data (not personal)
- Fix README to reflect actual output format (JSON + CSV, not HTML)
- Improve documentation for custom test case creation

Addresses code review feedback about import-time validation and module exports.
2025-11-03 05:56:38 +01:00
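The lazy export uses module-level __getattr__ (PEP 562), so importing the package stays cheap until RAGEvaluator is actually touched; a sketch of that __init__.py.

```python
# lightrag/evaluation/__init__.py -- sketch of the PEP 562 lazy export.
# Importing the package stays cheap; ragas and its environment checks
# only run once RAGEvaluator is first accessed.


def __getattr__(name: str):
    if name == "RAGEvaluator":
        from .eval_rag_quality import RAGEvaluator

        return RAGEvaluator
    raise AttributeError(f"module {__name__!r} has no attribute {name!r}")
```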
anouarbm
9d69e8d776 fix(api): Change content field from string to list in query responses
BREAKING CHANGE: The `content` field in query response references is now
an array of strings instead of a concatenated string. This preserves
individual chunk boundaries when a single file has multiple chunks.

Changes:
- Update QueryResponse Pydantic model to accept List[str] for content
- Modify query_text endpoint to return content as list (query_routes.py:425)
- Modify query_text_stream endpoint to support chunk content enrichment
- Update OpenAPI schema and examples to reflect array structure
- Update API README with breaking change notice and migration guide
- Fix RAGAS evaluation to flatten chunk content lists
2025-11-03 04:37:09 +01:00
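Client code migrating across this breaking change can accept both shapes; the normalization helper below is illustrative, with the field path following the commit's description.

```python
def normalize_content(content: str | list[str]) -> list[str]:
    # Pre-change servers return one concatenated string; post-change
    # servers return a list of chunk strings. Normalize to a list.
    return [content] if isinstance(content, str) else content


# e.g. for each reference in a /query response:
# chunks = normalize_content(reference["content"])
```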
anouarbm
363f3051b1 Eval using OpenAI 2025-11-02 19:39:56 +01:00
anouarbm
77db08038c Merge remote-tracking branch 'lightrag-fork/feat/ragas-evaluation' into feat/ragas-evaluation 2025-11-02 18:47:40 +01:00
anouarbm
0b5e3f9dc4 Use logger in RAG evaluation and optimize reference content joins 2025-11-02 18:43:53 +01:00
ben moussa anouar
98f0464a31 Update lightrag/evaluation/eval_rag_quality.py for language
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
2025-11-02 18:03:54 +01:00
anouarbm
0bbef9814e Optimize RAGAS evaluation with parallel execution and chunk content enrichment
Added efficient RAG evaluation system with optimized API calls and comprehensive benchmarking.

Key Features:
- Single API call per evaluation (2x faster than before)
- Parallel evaluation based on MAX_ASYNC environment variable
- Chunk content enrichment in /query endpoint responses
- Comprehensive benchmark statistics (averages)
- NaN-safe metric calculations

API Changes:
- Added include_chunk_content parameter to QueryRequest (backward compatible)
- /query endpoint enriches references with actual chunk content when requested
- No breaking changes - default behavior unchanged

Evaluation Improvements:
- Parallel execution using asyncio.Semaphore (respects MAX_ASYNC)
- Shared HTTP client with connection pooling
- Proper timeout handling (3min connect, 5min read)
- Debug output for context retrieval verification
- Benchmark statistics with averages, min/max scores

Results:
- Average RAGAS Score: 0.9772
- Perfect Faithfulness: 1.0000
- Perfect Context Recall: 1.0000
- Perfect Context Precision: 1.0000
- Excellent Answer Relevance: 0.9087
2025-11-02 17:39:43 +01:00
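The shared client, connection pooling, and split connect/read timeouts map onto httpx roughly as below; the MAX_ASYNC gating and include_chunk_content flag follow the commit, while the pool limits and endpoint URL are assumptions.

```python
import asyncio
import os

import httpx

# One shared client reuses pooled connections across all parallel
# evaluations; 3 min connect / 5 min read per the commit, pool limits
# are illustrative.
TIMEOUT = httpx.Timeout(connect=180.0, read=300.0, write=60.0, pool=60.0)
LIMITS = httpx.Limits(max_connections=20, max_keepalive_connections=10)
semaphore = asyncio.Semaphore(int(os.environ.get("MAX_ASYNC", "4")))


async def query_once(client: httpx.AsyncClient, question: str) -> dict:
    async with semaphore:  # parallelism bounded by MAX_ASYNC
        resp = await client.post("/query", json={
            "query": question,
            "include_chunk_content": True,  # opt-in enrichment, backward compatible
        })
        resp.raise_for_status()
        return resp.json()


async def main(questions: list[str]) -> list[dict]:
    async with httpx.AsyncClient(base_url="http://localhost:9621",
                                 timeout=TIMEOUT, limits=LIMITS) as client:
        return await asyncio.gather(*(query_once(client, q) for q in questions))
```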
anouarbm
026bca00d9 fix: Use actual retrieved contexts for RAGAS evaluation
**Critical Fix: Contexts vs Ground Truth**
- RAGAS metrics now evaluate actual retrieval performance
- Previously: Used ground_truth as contexts (always perfect scores)
- Now: Uses retrieved documents from LightRAG API (real evaluation)

**Changes to generate_rag_response (lines 100-156)**:
- Remove unused 'context' parameter
- Change return type: Dict[str, str] → Dict[str, Any]
- Extract contexts as list of strings from references[].text
- Return 'contexts' key instead of 'context' (JSON dump)
- Add response.raise_for_status() for better error handling
- Add httpx.HTTPStatusError exception handler

**Changes to evaluate_responses (lines 180-191)**:
- Line 183: Extract retrieved_contexts from rag_response
- Line 190: Use [retrieved_contexts] instead of [[ground_truth]]
- Now correctly evaluates: retrieval quality, not ground_truth quality

**Impact on RAGAS Metrics**:
- Context Precision: Now ranks actual retrieved docs by relevance
- Context Recall: Compares ground_truth against actual retrieval
- Faithfulness: Verifies answer based on actual retrieved contexts
- Answer Relevance: Unchanged (question-answer relevance)

Fixes incorrect evaluation methodology. Based on RAGAS documentation:
- contexts = retrieved documents from RAG system
- ground_truth = reference answer for context_recall metric

References:
- https://docs.ragas.io/en/stable/concepts/components/eval_dataset/
- https://docs.ragas.io/en/stable/concepts/metrics/
2025-11-02 16:16:00 +01:00
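The heart of the fix is which field feeds RAGAS: contexts must be what the system actually retrieved, never the ground truth. A sketch of building one evaluation row; the references[].text path follows the commit, while the answer field name is assumed.

```python
def build_eval_row(question: str, ground_truth: str, rag_response: dict) -> dict:
    # contexts = what the system actually retrieved (references[].text);
    # ground_truth stays a separate reference answer for context_recall.
    retrieved_contexts = [ref["text"] for ref in rag_response.get("references", [])]
    return {
        "question": question,
        "answer": rag_response["response"],  # answer field name is assumed
        "contexts": retrieved_contexts,
        "ground_truth": ground_truth,
    }
```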
anouarbm
b12b693a81 Fix ruff formatting of CSV path 2025-11-02 11:46:22 +01:00
anouarbm
5cdb4b0ef2 fix: Apply ruff formatting and rename test_dataset to sample_dataset
**Lint Fixes (ruff)**:
- Sort imports alphabetically (I001)
- Add blank line after import traceback (E302)
- Add trailing comma to dict literals (COM812)
- Reformat writer.writerow for readability (E501)

**Rename test_dataset.json → sample_dataset.json**:
- Avoids .gitignore pattern conflict (test_* is ignored)
- More descriptive name - it's a sample/template, not actual test data
- Updated all references in eval_rag_quality.py and README.md

Resolves lint-and-format CI check failure.
Addresses reviewer feedback about test dataset naming.
2025-11-02 10:36:03 +01:00
anouarbm
aa916f28d2 docs: add generic test_dataset.json for evaluation examples
Test cases with generic examples about:
- LightRAG framework features and capabilities
- RAG system architecture and components
- Vector database support (ChromaDB, Neo4j, Milvus, etc.)
- LLM provider integrations (OpenAI, Anthropic, Ollama, etc.)
- RAG evaluation metrics explanation
- Deployment options (Docker, FastAPI, direct integration)
- Knowledge graph-based retrieval concepts

Changes:
- Added generic test_dataset.json with 8 LightRAG-focused test cases
- File added with git add -f to override test_* pattern

This provides realistic, reusable examples for users testing their
LightRAG deployments and helps demonstrate the evaluation framework.
2025-11-01 22:27:26 +01:00
anouarbm
1ad0bf82f9 feat: add RAGAS evaluation framework for RAG quality assessment
This contribution adds a comprehensive evaluation system using the RAGAS
framework to assess LightRAG's retrieval and generation quality.

Features:
- RAGEvaluator class with four key metrics:
  * Faithfulness: Answer accuracy vs context
  * Answer Relevance: Query-response alignment
  * Context Recall: Retrieval completeness
  * Context Precision: Retrieved context quality
- HTTP API integration for live system testing
- JSON and CSV report generation
- Configurable test datasets
- Complete documentation with examples
- Sample test dataset included

Changes:
- Added lightrag/evaluation/eval_rag_quality.py (RAGAS evaluator implementation)
- Added lightrag/evaluation/README.md (comprehensive documentation)
- Added lightrag/evaluation/__init__.py (package initialization)
- Updated pyproject.toml with optional 'evaluation' dependencies
- Updated .gitignore to exclude evaluation results directory

Installation:
pip install lightrag-hku[evaluation]

Dependencies:
- ragas>=0.3.7
- datasets>=4.3.0
- httpx>=0.28.1
- pytest>=8.4.2
- pytest-asyncio>=1.2.0
2025-11-01 21:36:39 +01:00
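For orientation, a hypothetical driver against a running LightRAG API; RAGEvaluator is the class this commit adds, but the constructor and method names below are assumptions, not the actual interface.

```python
# Hypothetical usage of the evaluator added in this commit; constructor
# arguments and the run() method name are assumptions.
import asyncio

from lightrag.evaluation import RAGEvaluator


async def main() -> None:
    evaluator = RAGEvaluator()       # reads endpoint/model config from .env
    results = await evaluator.run()  # method name is an assumption
    print(results)


asyncio.run(main())
```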