Commit graph

3534 commits

Author SHA1 Message Date
yangdx
994a82dc7f Suppress token usage warnings for custom OpenAI-compatible endpoints
• Add warning filter for token usage
• Support vLLM, SGLang endpoints
• Non-critical for RAGAS evaluation
2025-11-05 18:25:28 +08:00
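A minimal sketch of what such a warning filter could look like, assuming the warnings are raised through Python's standard `warnings` module. The message pattern and the function name are hypothetical; the commit message does not show the actual warning text.

```python
import warnings

# Hypothetical pattern: the real warning text emitted for vLLM/SGLang
# endpoints is not shown in the commit message.
TOKEN_USAGE_PATTERN = r".*token usage.*"


def suppress_token_usage_warnings(pattern: str = TOKEN_USAGE_PATTERN) -> None:
    """Ignore warnings whose message matches `pattern` (case-insensitive)."""
    # `message` is a regex matched against the start of the warning text.
    warnings.filterwarnings("ignore", message=pattern)
```

Other warnings are left untouched, so only the non-critical token usage noise is hidden.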
yangdx
f490622b72 Doc: Refactor evaluation README to improve clarity and structure 2025-11-05 10:43:55 +08:00
yangdx
a73314a4ba Refactor evaluation results display and logging format 2025-11-05 10:08:17 +08:00
yangdx
06b91d00f8 Improve RAG evaluation progress eval index display with zero padding 2025-11-05 09:46:07 +08:00
yangdx
2823f92fb6 Fix tqdm progress bar conflicts in concurrent RAG evaluation
• Add position pool for tqdm bars
• Serialize tqdm creation with lock
• Set leave=False to clear completed bars
• Pass position/lock to eval tasks
• Import tqdm.auto for better display
2025-11-05 02:04:13 +08:00
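The position pool described above can be sketched roughly as follows. This is an illustration under assumptions, not the code from the commit: `PositionPool`, `run_eval`, and `main` are hypothetical names, and the bar creation itself is left as a comment so the sketch runs without tqdm installed.

```python
import asyncio


class PositionPool:
    """Hands out distinct progress-bar rows to concurrent tasks."""

    def __init__(self, size: int) -> None:
        self._free = asyncio.Queue()
        for position in range(size):
            self._free.put_nowait(position)

    async def acquire(self) -> int:
        # Waits until some row is free, then reserves it.
        return await self._free.get()

    def release(self, position: int) -> None:
        self._free.put_nowait(position)


async def run_eval(idx, pool, creation_lock, log):
    position = await pool.acquire()
    try:
        async with creation_lock:
            # Bar creation is serialized. The real code would create it here,
            # e.g. tqdm(total=..., position=position, leave=False).
            log.append((idx, position))
        await asyncio.sleep(0.01)  # stand-in for the evaluation work
    finally:
        pool.release(position)  # the row becomes reusable by the next task


async def main(num_tasks: int = 6, pool_size: int = 2):
    pool = PositionPool(pool_size)
    creation_lock = asyncio.Lock()
    log = []
    await asyncio.gather(
        *(run_eval(i, pool, creation_lock, log) for i in range(num_tasks))
    )
    return log
```

Because a task holds its row for the whole evaluation, two live bars never share a screen position.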
yangdx
e5abe9dd3d Restructure semaphore control to manage entire evaluation pipeline
• Move rag_semaphore to wrap full function
• Increase RAG concurrency to 2x eval limit
• Prevent memory buildup from slow evals
• Keep eval_semaphore for RAGAS control
2025-11-05 01:07:53 +08:00
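A rough sketch of the semaphore layout this commit describes: an outer `rag_semaphore` around the whole per-case pipeline, sized at twice the eval limit, and an inner `eval_semaphore` around the RAGAS stage. The function names, sleep durations, and `stats` bookkeeping are assumptions made for illustration.

```python
import asyncio


async def evaluate_case(idx, rag_semaphore, eval_semaphore, stats):
    # The outer semaphore wraps the full pipeline, so finished RAG responses
    # cannot pile up in memory while waiting for a slow evaluation stage.
    async with rag_semaphore:
        stats["in_pipeline"] += 1
        stats["max_in_pipeline"] = max(stats["max_in_pipeline"], stats["in_pipeline"])
        await asyncio.sleep(0.01)  # stage 1: stand-in for the RAG query

        # The inner semaphore limits concurrent RAGAS evaluations.
        async with eval_semaphore:
            stats["in_eval"] += 1
            stats["max_in_eval"] = max(stats["max_in_eval"], stats["in_eval"])
            await asyncio.sleep(0.02)  # stage 2: stand-in for RAGAS scoring
            stats["in_eval"] -= 1

        stats["in_pipeline"] -= 1
    return idx


async def run_all(num_cases: int = 10, eval_limit: int = 2):
    rag_semaphore = asyncio.Semaphore(eval_limit * 2)  # 2x the eval limit
    eval_semaphore = asyncio.Semaphore(eval_limit)
    stats = {"in_pipeline": 0, "max_in_pipeline": 0, "in_eval": 0, "max_in_eval": 0}
    results = await asyncio.gather(
        *(evaluate_case(i, rag_semaphore, eval_semaphore, stats) for i in range(num_cases))
    )
    return results, stats
```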
yangdx
83715a3ac1 Implement two-stage pipeline for RAG evaluation with separate semaphores
• Split RAG gen and eval stages
• Add rag_semaphore for stage 1
• Add eval_semaphore for stage 2
• Improve concurrency control
• Update connection pool limits
2025-11-05 00:36:09 +08:00
yangdx
d36be1f499 Improve RAGAS evaluation progress tracking and clean up output handling
• Add tqdm progress bar for eval steps
• Pass progress bar to RAGAS evaluate
• Ensure progress bar cleanup in finally
• Remove redundant output buffer flushes
2025-11-05 00:16:02 +08:00
yangdx
c358f405a9 Update evaluation defaults and expand sample dataset
• Lower concurrent evals from 3 to 2
• Standardize project names in samples
• Add 3 new evaluation questions
• Expand ground truth detail coverage
• Improve dataset comprehensiveness
2025-11-04 22:17:17 +08:00
yangdx
41c26a3677 feat: add command-line args to RAG evaluation script
- Add --dataset and --ragendpoint flags
- Support short forms -d and -r
- Update README with usage examples
2025-11-04 21:40:27 +08:00
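The flags listed above could be declared with `argparse` along these lines. Only the flag names and short forms come from the commit message; the defaults, help text, and `build_parser` name are assumptions.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(description="RAG evaluation (sketch)")
    # Defaults of None are an assumption; the script may fall back to
    # a bundled dataset or an environment variable instead.
    parser.add_argument("--dataset", "-d", default=None,
                        help="Path to the evaluation dataset (JSON)")
    parser.add_argument("--ragendpoint", "-r", default=None,
                        help="Base URL of the LightRAG API to evaluate")
    return parser
```

For example, `build_parser().parse_args(["-d", "my_dataset.json"])` yields a namespace whose `dataset` attribute is `"my_dataset.json"` and whose `ragendpoint` attribute is `None`.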
yangdx
d4b8a229b9 Update RAGAS evaluation to use gpt-4o-mini and improve compatibility
- Change default model to gpt-4o-mini
- Add deprecation warning suppression
- Update docs and comments for LightRAG
- Improve output formatting and timing
2025-11-04 18:50:53 +08:00
yangdx
6d61f70b92 Clean up RAG evaluator logging and remove excessive separator lines
• Remove excessive separator lines
• Add RAGAS concurrency comment
• Fix output buffer timing
2025-11-04 18:04:19 +08:00
yangdx
4e4b8d7e25 Update RAG evaluation to import metric classes and instantiate them instead of using prebuilt instances
• Import metric classes not instances
• Instantiate metrics with () syntax
2025-11-04 15:56:57 +08:00
yangdx
7abc687742 Add comprehensive configuration and compatibility fixes for RAGAS
- Fix RAGAS LLM wrapper compatibility
- Add concurrency control for rate limits
- Add eval env vars for model config
- Improve error handling and logging
- Update documentation with examples
2025-11-04 14:39:27 +08:00
yangdx
72db042667 Update .env loading and add API authentication to RAG evaluator
• Load .env from current directory
• Support LIGHTRAG_API_KEY auth header
• Set override=False so existing environment variables take precedence
• Add Bearer token to API requests
• Enable per-instance .env configs
2025-11-04 10:59:09 +08:00
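A small sketch of how the API key might be turned into a request header. `LIGHTRAG_API_KEY` comes from the commit message; the `Authorization` header name and the `build_auth_headers` helper are assumptions based on the usual Bearer token convention.

```python
import os
from typing import Dict, Mapping, Optional


def build_auth_headers(env: Optional[Mapping[str, str]] = None) -> Dict[str, str]:
    """Build request headers, adding a Bearer token when an API key is set."""
    # In the real script the environment would first be populated from the
    # .env file in the current directory, e.g. with python-dotenv:
    #   load_dotenv(dotenv_path=".env", override=False)
    env = os.environ if env is None else env
    headers = {"Content-Type": "application/json"}
    api_key = env.get("LIGHTRAG_API_KEY")
    if api_key:
        headers["Authorization"] = f"Bearer {api_key}"  # assumed header name
    return headers
```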
anouarbm
ad2d3c2cc0 Merge remote-tracking branch 'origin/main' into feat/ragas-evaluation 2025-11-03 13:48:14 +01:00
anouarbm
debfa0ec96 Merge branch 'feat/ragas-evaluation' of https://github.com/anouar-bm/LightRAG into feat/ragas-evaluation 2025-11-03 13:30:16 +01:00
anouarbm
a172cf893d feat(evaluation): Add sample documents for reproducible RAGAS testing
Add 5 markdown documents that users can index to reproduce evaluation results.

Changes:
- Add sample_documents/ folder with 5 markdown files covering LightRAG features
- Update sample_dataset.json with 3 improved, specific test questions
- Shorten and correct evaluation README (removed outdated info about mock responses)
- Add sample_documents reference with expected ~95% RAGAS score

Test Results with sample documents:
- Average RAGAS Score: 95.28%
- Faithfulness: 100%, Answer Relevance: 96.67%
- Context Recall: 88.89%, Context Precision: 95.56%
2025-11-03 13:28:46 +01:00
yangdx
10f6e6955f Improve Langfuse integration and stream response cleanup handling
• Check env vars before enabling Langfuse
• Move imports after env check logic
• Handle wrapper client aclose() issues
• Add debug logs for cleanup failures
2025-11-03 13:09:45 +08:00
ben moussa anouar
5da709b42a Merge branch 'main' into feat/ragas-evaluation 2025-11-03 06:01:46 +01:00
anouarbm
36694eb9f2 fix(evaluation): Move import-time validation to runtime and improve documentation
Changes:
- Move sys.exit() calls from module level to __init__() method
- Raise proper exceptions (ImportError, ValueError, EnvironmentError) instead of sys.exit()
- Add lazy import for RAGEvaluator in __init__.py using __getattr__
- Update README to clarify sample_dataset.json contains generic test data (not personal)
- Fix README to reflect actual output format (JSON + CSV, not HTML)
- Improve documentation for custom test case creation

Addresses code review feedback about import-time validation and module exports.
2025-11-03 05:56:38 +01:00
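The lazy-export idea from this commit can be illustrated with a module-level `__getattr__` (PEP 562). The sketch below builds a synthetic module so it can run on its own; in a real package the `__getattr__` function would simply be defined at the top level of `__init__.py`. `make_lazy_package` and the export mapping are hypothetical.

```python
import importlib
import types


def make_lazy_package(name, lazy_exports):
    """Create a module whose listed attributes are imported on first access.

    `lazy_exports` maps attribute name -> (module name, symbol name).
    """
    package = types.ModuleType(name)

    def __getattr__(attr):
        if attr in lazy_exports:
            module_name, symbol = lazy_exports[attr]
            value = getattr(importlib.import_module(module_name), symbol)
            setattr(package, attr, value)  # cache so later lookups are direct
            return value
        # Raise instead of calling sys.exit(), so importers can handle it.
        raise AttributeError(f"module {name!r} has no attribute {attr!r}")

    package.__getattr__ = __getattr__
    return package
```

Nothing is imported until the attribute is first read, so importing the package does not fail when optional evaluation dependencies are missing.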
anouarbm
9495778c2d refactor: reorder Langfuse import logic for improved clarity
Moved logger import before Langfuse block to fix NameError.
2025-11-03 05:27:41 +01:00
anouarbm
c9e1c6c1c2 fix(api): change content field to list in query responses
BREAKING CHANGE: content field is now List[str] instead of str

- Add ReferenceItem Pydantic model for type safety
- Update /query and /query/stream to return content as list
- Update OpenAPI schema and examples
- Add migration guide to API README
- Fix RAGAS evaluation to handle list format

Addresses PR #2297 feedback. Tested with RAGAS: 97.37% score.
2025-11-03 04:57:08 +01:00
anouarbm
9d69e8d776 fix(api): Change content field from string to list in query responses
BREAKING CHANGE: The `content` field in query response references is now
an array of strings instead of a concatenated string. This preserves
individual chunk boundaries when a single file has multiple chunks.

Changes:
- Update QueryResponse Pydantic model to accept List[str] for content
- Modify query_text endpoint to return content as list (query_routes.py:425)
- Modify query_text_stream endpoint to support chunk content enrichment
- Update OpenAPI schema and examples to reflect array structure
- Update API README with breaking change notice and migration guide
- Fix RAGAS evaluation to flatten chunk content lists
2025-11-03 04:37:09 +01:00
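On the consumer side, flattening the new list-valued `content` field might look like the sketch below. Only the `content` field is taken from the commit message; the helper name and the tolerance for the legacy string form are assumptions.

```python
from typing import Any, Dict, List


def flatten_reference_contents(references: List[Dict[str, Any]]) -> List[str]:
    """Collect every chunk from every reference into one flat list."""
    contexts: List[str] = []
    for ref in references:
        content = ref.get("content")
        if content is None:
            continue  # reference returned without chunk content
        if isinstance(content, str):
            contexts.append(content)  # legacy format: one concatenated string
        else:
            contexts.extend(str(chunk) for chunk in content)  # new format
    return contexts
```

Chunk boundaries are preserved: a file with two chunks contributes two separate context strings.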
anouarbm
363f3051b1 Eval using OpenAI 2025-11-02 19:39:56 +01:00
anouarbm
77db08038c Merge remote-tracking branch 'lightrag-fork/feat/ragas-evaluation' into feat/ragas-evaluation 2025-11-02 18:47:40 +01:00
anouarbm
0b5e3f9dc4 Use logger in RAG evaluation and optimize reference content joins 2025-11-02 18:43:53 +01:00
ben moussa anouar
98f0464a31 Update lightrag/evaluation/eval_rag_quality.py for language
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
2025-11-02 18:03:54 +01:00
anouarbm
963ad4c637 docs: Add documentation and examples for include_chunk_content parameter
Added comprehensive documentation for the new include_chunk_content parameter
that enables retrieval of actual chunk text content in API responses.

Documentation Updates:
- Added "Include Chunk Content in References" section to API README
- Explained use cases: RAG evaluation, debugging, citations, transparency
- Provided JSON request/response examples
- Clarified parameter interaction with include_references

OpenAPI/Swagger Examples:
- Added "Response with chunk content" example to /query endpoint
- Shows complete reference structure with content field
- Demonstrates realistic chunk text content

This makes the feature discoverable through:
1. API documentation (README.md)
2. Interactive Swagger UI (http://localhost:9621/docs)
3. Code examples for developers
2025-11-02 17:53:27 +01:00
anouarbm
0bbef9814e Optimize RAGAS evaluation with parallel execution and chunk content enrichment
Added efficient RAG evaluation system with optimized API calls and comprehensive benchmarking.

Key Features:
- Single API call per evaluation (2x faster than before)
- Parallel evaluation based on MAX_ASYNC environment variable
- Chunk content enrichment in /query endpoint responses
- Comprehensive benchmark statistics (averages)
- NaN-safe metric calculations

API Changes:
- Added include_chunk_content parameter to QueryRequest (backward compatible)
- /query endpoint enriches references with actual chunk content when requested
- No breaking changes - default behavior unchanged

Evaluation Improvements:
- Parallel execution using asyncio.Semaphore (respects MAX_ASYNC)
- Shared HTTP client with connection pooling
- Proper timeout handling (3min connect, 5min read)
- Debug output for context retrieval verification
- Benchmark statistics with averages, min/max scores

Results:
- Average RAGAS Score: 0.9772
- Perfect Faithfulness: 1.0000
- Perfect Context Recall: 1.0000
- Perfect Context Precision: 1.0000
- Excellent Answer Relevance: 0.9087
2025-11-02 17:39:43 +01:00
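A NaN-safe average of the kind mentioned under Key Features could be sketched like this. The function names and the shape of the returned statistics are assumptions; the commit message does not show the actual implementation.

```python
import math
from typing import Dict, Iterable, Optional


def nan_safe_mean(values: Iterable[Optional[float]]) -> float:
    """Average the valid scores, skipping None and NaN entries."""
    valid = [v for v in values if v is not None and not math.isnan(v)]
    return sum(valid) / len(valid) if valid else float("nan")


def benchmark_stats(scores: Iterable[Optional[float]]) -> Dict[str, float]:
    """Average, min, and max over the valid scores only."""
    valid = [v for v in scores if v is not None and not math.isnan(v)]
    if not valid:
        return {"average": float("nan"), "min": float("nan"), "max": float("nan")}
    return {"average": sum(valid) / len(valid), "min": min(valid), "max": max(valid)}
```

Without the filtering step, a single NaN metric from one failed evaluation would turn the whole benchmark average into NaN.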
anouarbm
026bca00d9 fix: Use actual retrieved contexts for RAGAS evaluation
**Critical Fix: Contexts vs Ground Truth**
- RAGAS metrics now evaluate actual retrieval performance
- Previously: Used ground_truth as contexts (always perfect scores)
- Now: Uses retrieved documents from LightRAG API (real evaluation)

**Changes to generate_rag_response (lines 100-156)**:
- Remove unused 'context' parameter
- Change return type: Dict[str, str] → Dict[str, Any]
- Extract contexts as list of strings from references[].text
- Return 'contexts' key instead of 'context' (JSON dump)
- Add response.raise_for_status() for better error handling
- Add httpx.HTTPStatusError exception handler

**Changes to evaluate_responses (lines 180-191)**:
- Line 183: Extract retrieved_contexts from rag_response
- Line 190: Use [retrieved_contexts] instead of [[ground_truth]]
- Now correctly evaluates: retrieval quality, not ground_truth quality

**Impact on RAGAS Metrics**:
- Context Precision: Now ranks actual retrieved docs by relevance
- Context Recall: Compares ground_truth against actual retrieval
- Faithfulness: Verifies answer based on actual retrieved contexts
- Answer Relevance: Unchanged (question-answer relevance)

Fixes incorrect evaluation methodology. Based on RAGAS documentation:
- contexts = retrieved documents from RAG system
- ground_truth = reference answer for context_recall metric

References:
- https://docs.ragas.io/en/stable/concepts/components/eval_dataset/
- https://docs.ragas.io/en/stable/concepts/metrics/
2025-11-02 16:16:00 +01:00
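The core of this fix can be shown as a single evaluation row, where `contexts` holds what the RAG system actually retrieved and `ground_truth` stays a separate reference answer. The `contexts` and `ground_truth` keys follow the commit message; the `answer` key and the `build_eval_row` helper are assumptions.

```python
from typing import Any, Dict


def build_eval_row(question: str, ground_truth: str,
                   rag_response: Dict[str, Any]) -> Dict[str, Any]:
    """Assemble one evaluation sample from a RAG response."""
    return {
        "question": question,
        "answer": rag_response["answer"],
        # Retrieved documents, not the ground truth: using ground_truth here
        # would make the context metrics trivially perfect.
        "contexts": list(rag_response["contexts"]),
        "ground_truth": ground_truth,
    }
```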
anouarbm
b12b693a81 Fix ruff formatting of CSV path 2025-11-02 11:46:22 +01:00
anouarbm
5cdb4b0ef2 fix: Apply ruff formatting and rename test_dataset to sample_dataset
**Lint Fixes (ruff)**:
- Sort imports alphabetically (I001)
- Add blank line after import traceback (E302)
- Add trailing comma to dict literals (COM812)
- Reformat writer.writerow for readability (E501)

**Rename test_dataset.json → sample_dataset.json**:
- Avoids .gitignore pattern conflict (test_* is ignored)
- More descriptive name - it's a sample/template, not actual test data
- Updated all references in eval_rag_quality.py and README.md

Resolves lint-and-format CI check failure.
Addresses reviewer feedback about test dataset naming.
2025-11-02 10:36:03 +01:00
anouarbm
aa916f28d2 docs: add generic test_dataset.json for evaluation examples
Test cases with generic examples about:
- LightRAG framework features and capabilities
- RAG system architecture and components
- Vector database support (ChromaDB, Neo4j, Milvus, etc.)
- LLM provider integrations (OpenAI, Anthropic, Ollama, etc.)
- RAG evaluation metrics explanation
- Deployment options (Docker, FastAPI, direct integration)
- Knowledge graph-based retrieval concepts

Changes:
- Added generic test_dataset.json with 8 LightRAG-focused test cases
- File added with git add -f to override test_* pattern

This provides realistic, reusable examples for users testing their
LightRAG deployments and helps demonstrate the evaluation framework.
2025-11-01 22:27:26 +01:00
anouarbm
626b42bc40 feat: add optional Langfuse observability integration
This contribution adds optional Langfuse support for LLM observability and tracing.
Langfuse provides a drop-in replacement for the OpenAI client that automatically
tracks all LLM interactions without requiring code changes.

Features:
- Optional Langfuse integration with graceful fallback
- Automatic LLM request/response tracing
- Token usage tracking
- Latency metrics
- Error tracking
- Zero code changes required for existing functionality

Implementation:
- Modified lightrag/llm/openai.py to conditionally use Langfuse's AsyncOpenAI
- Falls back to standard OpenAI client if Langfuse is not installed
- Logs observability status on import

Configuration:
To enable Langfuse tracing, install the observability extras and set environment variables:

```bash
pip install lightrag-hku[observability]

export LANGFUSE_PUBLIC_KEY="your_public_key"
export LANGFUSE_SECRET_KEY="your_secret_key"
export LANGFUSE_HOST="https://cloud.langfuse.com"  # or your self-hosted instance
```

If Langfuse is not installed or environment variables are not set, LightRAG
will use the standard OpenAI client without any functionality changes.

Changes:
- Modified lightrag/llm/openai.py (added optional Langfuse import)
- Updated pyproject.toml with optional 'observability' dependencies

Dependencies (optional):
- langfuse>=3.8.1
2025-11-01 21:40:22 +01:00
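The graceful fallback described in this commit amounts to a guarded choice between two client modules. The sketch below only decides which module name to import, so it runs without either package installed. Treating the public and secret keys as the required variables is an assumption, as are the helper names.

```python
import importlib.util
import os
from typing import Mapping, Optional

# Assumption: these two variables are what gates the integration.
REQUIRED_ENV_VARS = ("LANGFUSE_PUBLIC_KEY", "LANGFUSE_SECRET_KEY")


def langfuse_enabled(env: Optional[Mapping[str, str]] = None) -> bool:
    """True only if the keys are set and the langfuse package is importable."""
    env = os.environ if env is None else env
    has_keys = all(env.get(name) for name in REQUIRED_ENV_VARS)
    installed = importlib.util.find_spec("langfuse") is not None
    return has_keys and installed


def client_module_name(env: Optional[Mapping[str, str]] = None) -> str:
    """Module to import AsyncOpenAI from: the Langfuse wrapper or plain openai."""
    return "langfuse.openai" if langfuse_enabled(env) else "openai"
```

The caller would then import `AsyncOpenAI` from the returned module, so existing code keeps working whether or not tracing is active.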
anouarbm
1ad0bf82f9 feat: add RAGAS evaluation framework for RAG quality assessment
This contribution adds a comprehensive evaluation system using the RAGAS
framework to assess LightRAG's retrieval and generation quality.

Features:
- RAGEvaluator class with four key metrics:
  * Faithfulness: Answer accuracy vs context
  * Answer Relevance: Query-response alignment
  * Context Recall: Retrieval completeness
  * Context Precision: Retrieved context quality
- HTTP API integration for live system testing
- JSON and CSV report generation
- Configurable test datasets
- Complete documentation with examples
- Sample test dataset included

Changes:
- Added lightrag/evaluation/eval_rag_quality.py (RAGAS evaluator implementation)
- Added lightrag/evaluation/README.md (comprehensive documentation)
- Added lightrag/evaluation/__init__.py (package initialization)
- Updated pyproject.toml with optional 'evaluation' dependencies
- Updated .gitignore to exclude evaluation results directory

Installation:
pip install lightrag-hku[evaluation]

Dependencies:
- ragas>=0.3.7
- datasets>=4.3.0
- httpx>=0.28.1
- pytest>=8.4.2
- pytest-asyncio>=1.2.0
2025-11-01 21:36:39 +01:00
yangdx
61b57cbb5d Add PDF decryption support for password-protected files
• Add PDF_DECRYPT_PASSWORD env variable
• Check encryption status before reading
• Handle decrypt errors gracefully
• Log detailed error messages
• Support both encrypted/plain PDFs
2025-11-01 15:01:17 +08:00
yangdx
728721b14f Remove redundant separator lines in gunicorn shutdown handler 2025-11-01 12:53:54 +08:00
yangdx
6d4a55100e Remove redundant shutdown message from gunicorn 2025-11-01 12:52:22 +08:00
yangdx
ec2ea4fd3f Rename function and variables for clarity in context building
- Rename _build_llm_context to _build_context_str
- Change text_units_context to chunks_context
- Move string building before early return
- Update log messages and comments
- Consistent variable naming throughout
2025-11-01 12:15:24 +08:00
yangdx
9a8742da59 Improve entity merge logging by removing redundant message and fixing typo 2025-10-31 17:16:59 +08:00
yangdx
6b4514c8ef Reduce logging verbosity in entity merge relation processing 2025-10-31 17:02:10 +08:00
yangdx
7ccc1fdd27 Add frontend rebuild warning indicator to version display
- Return bool from check_frontend_build()
- Add ⚠️ symbol to outdated versions
- Show tooltip with rebuild message
- Add translations for warning text
- Fix tailwind config filename typo
2025-10-31 06:09:46 +08:00
yangdx
e5414c61ef Bump core version to 1.4.9.8 and API version to 0250 2025-10-31 05:23:48 +08:00
yangdx
afb5e5c1cb Fix edge cleanup when deleting entities to prevent orphaned relationships
- Track edges to delete in set
- Clean VDB before node deletion
- Remove from relation chunks storage
- Prevent orphaned relationship data
2025-10-31 02:36:15 +08:00
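The cleanup order listed above can be illustrated with a toy in-memory graph. This is not LightRAG's storage API: `TinyGraph`, its fields, and `delete_entity` are hypothetical stand-ins for the graph store, the relation vector database, and the relation chunks storage.

```python
class TinyGraph:
    """Toy stand-in for the graph, relation VDB, and relation chunk stores."""

    def __init__(self):
        self.nodes = set()
        self.edges = set()          # (source, target) tuples
        self.relation_vdb = {}      # edge -> vector record
        self.relation_chunks = {}   # edge -> list of chunk ids


def delete_entity(graph: TinyGraph, entity: str) -> set:
    # 1. Track every edge touching the entity in a set.
    edges_to_delete = {edge for edge in graph.edges if entity in edge}
    # 2. Clean the vector DB and chunk storage before the node is removed.
    for edge in edges_to_delete:
        graph.relation_vdb.pop(edge, None)
        graph.relation_chunks.pop(edge, None)
    # 3. Remove the edges, then the node itself.
    graph.edges -= edges_to_delete
    graph.nodes.discard(entity)
    return edges_to_delete
```

Collecting the edges first means the auxiliary stores can still be cleaned after the node lookup would no longer find them.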
yangdx
c46c1b26a9 Add pycryptodome dependency for PDF encryption support 2025-10-31 01:49:42 +08:00
yangdx
c36afecba4 Remove redundant await call in file extraction pipeline 2025-10-30 20:35:41 +08:00
yangdx
c9e73bb450 Bump core version to 1.4.9.7 and API version to 0249 2025-10-30 19:43:35 +08:00
yangdx
5f4a280458 Add Qdrant legacy collection migration with workspace support
- Add QdrantMigrationError exception
- Implement automatic data migration
- Support workspace-based partitioning
- Add migration verification logic
- Update collection naming scheme
2025-10-30 19:16:33 +08:00
yangdx
f610fdaf9b Merge branch 'main' into Anush008/main 2025-10-30 11:07:39 +08:00