Commit graph

5486 commits

Author SHA1 Message Date
anouarbm
0bbef9814e Optimize RAGAS evaluation with parallel execution and chunk content enrichment
Added efficient RAG evaluation system with optimized API calls and comprehensive benchmarking.

Key Features:
- Single API call per evaluation (2x faster than before)
- Parallel evaluation based on MAX_ASYNC environment variable
- Chunk content enrichment in /query endpoint responses
- Comprehensive benchmark statistics (moyennes)
- NaN-safe metric calculations

API Changes:
- Added include_chunk_content parameter to QueryRequest (backward compatible)
- /query endpoint enriches references with actual chunk content when requested
- No breaking changes - default behavior unchanged

Evaluation Improvements:
- Parallel execution using asyncio.Semaphore (respects MAX_ASYNC)
- Shared HTTP client with connection pooling
- Proper timeout handling (3min connect, 5min read)
- Debug output for context retrieval verification
- Benchmark statistics with averages, min/max scores

Results:
- Moyenne RAGAS Score: 0.9772
- Perfect Faithfulness: 1.0000
- Perfect Context Recall: 1.0000
- Perfect Context Precision: 1.0000
- Excellent Answer Relevance: 0.9087
2025-11-02 17:39:43 +01:00
anouarbm
026bca00d9 fix: Use actual retrieved contexts for RAGAS evaluation
**Critical Fix: Contexts vs Ground Truth**
- RAGAS metrics now evaluate actual retrieval performance
- Previously: Used ground_truth as contexts (always perfect scores)
- Now: Uses retrieved documents from LightRAG API (real evaluation)

**Changes to generate_rag_response (lines 100-156)**:
- Remove unused 'context' parameter
- Change return type: Dict[str, str] → Dict[str, Any]
- Extract contexts as list of strings from references[].text
- Return 'contexts' key instead of 'context' (JSON dump)
- Add response.raise_for_status() for better error handling
- Add httpx.HTTPStatusError exception handler

**Changes to evaluate_responses (lines 180-191)**:
- Line 183: Extract retrieved_contexts from rag_response
- Line 190: Use [retrieved_contexts] instead of [[ground_truth]]
- Now correctly evaluates: retrieval quality, not ground_truth quality

**Impact on RAGAS Metrics**:
- Context Precision: Now ranks actual retrieved docs by relevance
- Context Recall: Compares ground_truth against actual retrieval
- Faithfulness: Verifies answer based on actual retrieved contexts
- Answer Relevance: Unchanged (question-answer relevance)

Fixes incorrect evaluation methodology. Based on RAGAS documentation:
- contexts = retrieved documents from RAG system
- ground_truth = reference answer for context_recall metric

References:
- https://docs.ragas.io/en/stable/concepts/components/eval_dataset/
- https://docs.ragas.io/en/stable/concepts/metrics/
2025-11-02 16:16:00 +01:00
anouarbm
b12b693a81 fixed ruff format of csv path 2025-11-02 11:46:22 +01:00
anouarbm
5cdb4b0ef2 fix: Apply ruff formatting and rename test_dataset to sample_dataset
**Lint Fixes (ruff)**:
- Sort imports alphabetically (I001)
- Add blank line after import traceback (E302)
- Add trailing comma to dict literals (COM812)
- Reformat writer.writerow for readability (E501)

**Rename test_dataset.json → sample_dataset.json**:
- Avoids .gitignore pattern conflict (test_* is ignored)
- More descriptive name - it's a sample/template, not actual test data
- Updated all references in eval_rag_quality.py and README.md

Resolves lint-and-format CI check failure.
Addresses reviewer feedback about test dataset naming.
2025-11-02 10:36:03 +01:00
anouarbm
aa916f28d2 docs: add generic test_dataset.json for evaluation examples
Test cases with generic examples about:
- LightRAG framework features and capabilities
- RAG system architecture and components
- Vector database support (ChromaDB, Neo4j, Milvus, etc.)
- LLM provider integrations (OpenAI, Anthropic, Ollama, etc.)
- RAG evaluation metrics explanation
- Deployment options (Docker, FastAPI, direct integration)
- Knowledge graph-based retrieval concepts

Changes:
- Added generic test_dataset.json with 8 LightRAG-focused test cases
- File added with git add -f to override test_* pattern

This provides realistic, reusable examples for users testing their
LightRAG deployments and helps demonstrate the evaluation framework.
2025-11-01 22:27:26 +01:00
anouarbm
1ad0bf82f9 feat: add RAGAS evaluation framework for RAG quality assessment
This contribution adds a comprehensive evaluation system using the RAGAS
framework to assess LightRAG's retrieval and generation quality.

Features:
- RAGEvaluator class with four key metrics:
  * Faithfulness: Answer accuracy vs context
  * Answer Relevance: Query-response alignment
  * Context Recall: Retrieval completeness
  * Context Precision: Retrieved context quality
- HTTP API integration for live system testing
- JSON and CSV report generation
- Configurable test datasets
- Complete documentation with examples
- Sample test dataset included

Changes:
- Added lightrag/evaluation/eval_rag_quality.py (RAGAS evaluator implementation)
- Added lightrag/evaluation/README.md (comprehensive documentation)
- Added lightrag/evaluation/__init__.py (package initialization)
- Updated pyproject.toml with optional 'evaluation' dependencies
- Updated .gitignore to exclude evaluation results directory

Installation:
pip install lightrag-hku[evaluation]

Dependencies:
- ragas>=0.3.7
- datasets>=4.3.0
- httpx>=0.28.1
- pytest>=8.4.2
- pytest-asyncio>=1.2.0
2025-11-01 21:36:39 +01:00
Daniel.y
ece0398dfc
Merge pull request #2296 from danielaskdd/pdf-decryption
Feat: Add PDF Decryption Support for Password-Protected Files
2025-11-01 15:14:24 +08:00
yangdx
61b57cbb5d Add PDF decryption support for password-protected files
• Add PDF_DECRYPT_PASSWORD env variable
• Check encryption status before reading
• Handle decrypt errors gracefully
• Log detailed error messages
• Support both encrypted/plain PDFs
2025-11-01 15:01:17 +08:00
yangdx
728721b14f Remove redundant separator lines in gunicorn shutdown handler 2025-11-01 12:53:54 +08:00
yangdx
6d4a55100e Remove redundant shutdown message from gunicorn 2025-11-01 12:52:22 +08:00
Daniel.y
bc8a8842c5
Merge pull request #2295 from danielaskdd/mix-query-without-kg
Fix empty context validation bug and improve naming consistency in query context building
2025-11-01 12:20:16 +08:00
yangdx
ec2ea4fd3f Rename function and variables for clarity in context building
- Rename _build_llm_context to _build_context_str
- Change text_units_context to chunks_context
- Move string building before early return
- Update log messages and comments
- Consistent variable naming throughout
2025-11-01 12:15:24 +08:00
yangdx
9a8742da59 Improve entity merge logging by removing redundant message and fixing typo 2025-10-31 17:16:59 +08:00
yangdx
6b4514c8ef Reduce logging verbosity in entity merge relation processing 2025-10-31 17:02:10 +08:00
yangdx
2496d87148 Add data/ directory to .gitignore 2025-10-31 14:51:53 +08:00
yangdx
7ccc1fdd27 Add frontend rebuild warning indicator to version display
- Return bool from check_frontend_build()
- Add ⚠️ symbol to outdated versions
- Show tooltip with rebuild message
- Add translations for warning text
- Fix tailwind config filename typo
2025-10-31 06:09:46 +08:00
yangdx
e5414c61ef Bump core version to 1.4.9.8 and API version to 0250 2025-10-31 05:23:48 +08:00
Daniel.y
08b0283b04
Merge pull request #2291 from danielaskdd/reload-popular-labels
Refact: Auto-refresh of Popular Labels When Pipeline Completes
2025-10-31 05:20:54 +08:00
yangdx
58c83f9da5 Add auto-refresh of popular labels when pipeline completes
• Monitor pipeline busy->idle transitions
• Reload labels on dropdown open if needed
• Add onBeforeOpen callback to AsyncSelect
• Clear refresh flags after processing
• Improve label sync with backend state
2025-10-31 04:45:35 +08:00
Daniel.y
94cdbe77c5
Merge pull request #2290 from danielaskdd/delete-residual-edges
Fix: Clean Residual Edges from VDB During Entity Deletion
2025-10-31 02:44:23 +08:00
yangdx
afb5e5c1cb Fix edge cleanup when deleting entities to prevent orphaned relationships
- Track edges to delete in set
- Clean VDB before node deletion
- Remove from relation chunks storage
- Prevent orphaned relationship data
2025-10-31 02:36:15 +08:00
Daniel.y
3b48cf1643
Merge pull request #2289 from danielaskdd/fix-pycrptodome-missing
Fix: Add PyCryptodome dependency for encrypted PDF processing
2025-10-31 01:52:58 +08:00
yangdx
c46c1b26a9 Add pycryptodome dependency for PDF encryption support 2025-10-31 01:49:42 +08:00
Daniel.y
bda52a8773
Merge pull request #2287 from danielaskdd/fix-ui
Refact: Enhance Property editing UI for KG Nodes
2025-10-31 00:23:39 +08:00
yangdx
71b27ec4aa Optimize property edit dialog to use trimmed value consistently 2025-10-31 00:08:02 +08:00
yangdx
4cbd876126 feat: Update node color and legent after entity_type changed
- Move color constants to utils module
- Extract resolveNodeColor function
- Update node colors on type changes
- Simplify hook color logic
2025-10-31 00:03:55 +08:00
yangdx
79a17c3f7f Fix graph value handling for entity_id updates
• Use finalValue for entity_id changes
• Keep original value for other props
• Fix property update logic
2025-10-30 23:43:46 +08:00
yangdx
c36afecba4 Remove redundant await call in file extraction pipeline 2025-10-30 20:35:41 +08:00
yangdx
c9e73bb450 Bump core version to 1.4.9.7 and API version to 0249 2025-10-30 19:43:35 +08:00
yangdx
042cbad047 Merge branch 'qdrant-multi-tenancy' 2025-10-30 19:32:25 +08:00
yangdx
5f4a280458 Add Qdrant legacy collection migration with workspace support
- Add QdrantMigrationError exception
- Implement automatic data migration
- Support workspace-based partitioning
- Add migration verification logic
- Update collection naming scheme
2025-10-30 19:16:33 +08:00
yangdx
0498e80a42 Merge branch 'main' into qdrant-multi-tenancy 2025-10-30 14:11:00 +08:00
yangdx
78ccc4f6fd Refactor .gitignore 2025-10-30 12:56:40 +08:00
yangdx
783e2f3b1f Update uv.lock 2025-10-30 11:18:10 +08:00
yangdx
f610fdaf9b Merge branch 'main' into Anush008/main 2025-10-30 11:07:39 +08:00
Daniel.y
8145201d2e
Merge pull request #2284 from danielaskdd/fix-static-missiing
HotFix: Include static files in package distribution
2025-10-30 10:52:53 +08:00
yangdx
16d3d82a0e Include static files in package distribution
- Add static dir to MANIFEST.in
- Update package data config
- Ensure static assets are bundled
- Fix missing static file issue
2025-10-30 10:50:28 +08:00
yangdx
8af8bd80d2 docs: add frontend build steps to server installation guide 2025-10-29 21:54:47 +08:00
yangdx
0fa2fc9cab Refactor systemd service config to use environment variables
• Add LIGHTRAG_HOME environment variable
• Use .venv instead of venv directory
2025-10-29 20:14:17 +08:00
yangdx
6dc027cb75 Merge branch 'fix-exit-handler' 2025-10-29 19:15:24 +08:00
Daniel.y
a1cf01dcc1
Merge pull request #2280 from danielaskdd/fix-exit-handler
Refact: Graceful shutdown and signal handling in Gunicorn Mode
2025-10-29 19:14:46 +08:00
Daniel.y
c5ad9982d9
Merge pull request #2281 from danielaskdd/restore-query-example
Restore query generation example and fix README path reference
2025-10-29 19:12:53 +08:00
yangdx
14a015d4ad Restore query generation example and fix README path reference
• Fix path from example/ to examples/
• Add generate_query.py implementation
2025-10-29 19:11:40 +08:00
yangdx
3a7f753560 Bump core version to 1.4.9.6 and API version to 0248 2025-10-29 19:08:32 +08:00
yangdx
d5bcd14c6f Refactor service deployment to use direct process execution
- Remove bash wrapper script
- Update systemd service configuration
- Improve process management for gunicorn
- Simplify shared storage cleanup logic
- Update documentation for deployment
2025-10-29 18:55:47 +08:00
yangdx
6489aaa7f0 Remove worker_exit hook and improve cleanup logging
• Remove unreliable worker_exit function
• Add debug logs for cleanup modes
• Move DEBUG_LOCKS to top of file
2025-10-29 15:15:13 +08:00
yangdx
4a46d39c93 Replace GUNICORN_CMD_ARGS with custom LIGHTRAG_GUNICORN_MODE flag
• Use custom env var for mode detection
• Improve Gunicorn mode reliability
2025-10-29 14:06:03 +08:00
yangdx
816feefd84 Fix cleanup coordination between Gunicorn and UvicornWorker lifecycles
• Document UvicornWorker hook limitations
• Add GUNICORN_CMD_ARGS cleanup guard
• Prevent double cleanup in workers
2025-10-29 13:53:46 +08:00
yangdx
72b29659c9 Fix worker process cleanup to prevent shared resource conflicts
• Add worker_exit hook in gunicorn config
• Add shutdown_manager parameter in finalize_share_data of share_storage
• Prevent Manager shutdown in workers
• Remove custom signal handlers
2025-10-29 13:33:21 +08:00
yangdx
0692175c7b Remove enable_logging parameter from get_data_init_lock call in MilvusVectorDBStorage 2025-10-29 09:49:59 +08:00