Commit graph

3503 commits

Author SHA1 Message Date
anouarbm
026bca00d9 fix: Use actual retrieved contexts for RAGAS evaluation
**Critical Fix: Contexts vs Ground Truth**
- RAGAS metrics now evaluate actual retrieval performance
- Previously: Used ground_truth as contexts (always perfect scores)
- Now: Uses retrieved documents from LightRAG API (real evaluation)

**Changes to generate_rag_response (lines 100-156)**:
- Remove unused 'context' parameter
- Change return type: Dict[str, str] → Dict[str, Any]
- Extract contexts as list of strings from references[].text
- Return 'contexts' key instead of 'context' (JSON dump)
- Add response.raise_for_status() for better error handling
- Add httpx.HTTPStatusError exception handler
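
The context extraction described above can be sketched as follows. This is a minimal sketch, not the code in `eval_rag_quality.py`. It assumes the LightRAG API returns a JSON body with retrieved chunks under `references`, each carrying a `text` field (stated in this commit). It also assumes the answer sits under a `response` key, which is an assumption. `extract_rag_result` is a hypothetical helper.

```python
from typing import Any, Dict, List


def extract_rag_result(payload: Dict[str, Any]) -> Dict[str, Any]:
    """Split a LightRAG query response into the answer and a list of contexts.

    Assumed payload shape (hypothetical apart from references[].text):
        {"response": "...", "references": [{"text": "..."}, ...]}
    """
    references = payload.get("references") or []
    # Keep each retrieved chunk as its own string: RAGAS expects a list of
    # contexts per sample, not one JSON-dumped blob.
    contexts: List[str] = [
        ref["text"]
        for ref in references
        if isinstance(ref, dict) and ref.get("text")
    ]
    return {"answer": payload.get("response", ""), "contexts": contexts}


sample = {
    "response": "example answer",
    "references": [{"text": "first chunk"}, {"text": "second chunk"}],
}
print(extract_rag_result(sample)["contexts"])  # ['first chunk', 'second chunk']
```

The resulting `contexts` list is what goes into each RAGAS sample in place of `[ground_truth]`.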

**Changes to evaluate_responses (lines 180-191)**:
- Line 183: Extract retrieved_contexts from rag_response
- Line 190: Use [retrieved_contexts] instead of [[ground_truth]]
- Now evaluates retrieval quality instead of scoring the ground truth against itself

**Impact on RAGAS Metrics**:
- Context Precision: Now checks whether relevant chunks among the actually retrieved docs are ranked near the top
- Context Recall: Compares ground_truth against actual retrieval
- Faithfulness: Verifies that the answer is grounded in the actually retrieved contexts
- Answer Relevance: Unchanged (question-answer relevance)
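
This sketch shows what Context Precision now measures on real retrievals. The RAGAS documentation defines Context Precision@K as the sum over ranks k of (Precision@k × v_k), divided by the number of relevant items in the top K, where v_k is 1 if the chunk at rank k is relevant. In RAGAS the relevance flags come from an LLM judge; here they are supplied by hand. `context_precision_at_k` is a hypothetical helper, not a RAGAS API.

```python
from typing import Sequence


def context_precision_at_k(relevance: Sequence[int]) -> float:
    """Context Precision@K from ranked relevance flags (1 = relevant, 0 = not).

    Computes sum_k(Precision@k * v_k) / (number of relevant items in top K).
    """
    hits = 0
    weighted_sum = 0.0
    for k, v_k in enumerate(relevance, start=1):
        if v_k:
            hits += 1
            # Precision@k = relevant items seen so far / k
            weighted_sum += hits / k
    return weighted_sum / hits if hits else 0.0


# The same single relevant chunk scores higher when it is ranked first.
print(context_precision_at_k([1, 0, 0]))  # 1.0
print(context_precision_at_k([0, 0, 1]))
```

With ground_truth passed as the only context, every sample trivially had one "relevant" chunk at rank 1. That is why the metric could not distinguish good retrieval from bad before this fix.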

Fixes incorrect evaluation methodology. Based on RAGAS documentation:
- contexts = retrieved documents from RAG system
- ground_truth = reference answer for context_recall metric

References:
- https://docs.ragas.io/en/stable/concepts/components/eval_dataset/
- https://docs.ragas.io/en/stable/concepts/metrics/
2025-11-02 16:16:00 +01:00
anouarbm
b12b693a81 Fix ruff formatting of CSV path 2025-11-02 11:46:22 +01:00
anouarbm
5cdb4b0ef2 fix: Apply ruff formatting and rename test_dataset to sample_dataset
**Lint Fixes (ruff)**:
- Sort imports alphabetically (I001)
- Add blank line after import traceback (E302)
- Add trailing comma to dict literals (COM812)
- Reformat writer.writerow for readability (E501)

**Rename test_dataset.json → sample_dataset.json**:
- Avoids .gitignore pattern conflict (test_* is ignored)
- More descriptive name: it is a sample/template, not actual test data
- Updated all references in eval_rag_quality.py and README.md

Resolves lint-and-format CI check failure.
Addresses reviewer feedback about test dataset naming.
2025-11-02 10:36:03 +01:00
anouarbm
aa916f28d2 docs: add generic test_dataset.json for evaluation examples
Test cases with generic examples about:
- LightRAG framework features and capabilities
- RAG system architecture and components
- Vector database support (ChromaDB, Neo4j, Milvus, etc.)
- LLM provider integrations (OpenAI, Anthropic, Ollama, etc.)
- RAG evaluation metrics explanation
- Deployment options (Docker, FastAPI, direct integration)
- Knowledge graph-based retrieval concepts

Changes:
- Added generic test_dataset.json with 8 LightRAG-focused test cases
- File added with git add -f to bypass the .gitignore test_* pattern

This provides realistic, reusable examples for users testing their
LightRAG deployments and helps demonstrate the evaluation framework.
2025-11-01 22:27:26 +01:00
anouarbm
1ad0bf82f9 feat: add RAGAS evaluation framework for RAG quality assessment
This contribution adds a comprehensive evaluation system using the RAGAS
framework to assess LightRAG's retrieval and generation quality.

Features:
- RAGEvaluator class with four key metrics:
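  * Faithfulness: Factual consistency of the answer with the retrieved context
  * Answer Relevance: How well the response addresses the query
  * Context Recall: Whether retrieval covers the information in the reference answer
  * Context Precision: Whether relevant retrieved contexts are ranked near the top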
- HTTP API integration for live system testing
- JSON and CSV report generation
- Configurable test datasets
- Complete documentation with examples
- Sample test dataset included
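
The JSON and CSV report generation listed above can be sketched as follows. This is a minimal sketch that assumes each evaluated test case is a dict holding the question and one score per metric. The metric column names and `render_reports` are hypothetical, not the evaluator's actual schema.

```python
import csv
import io
import json
from typing import Dict, List, Tuple, Union

# Hypothetical metric column names; the real evaluator may label them differently.
METRICS = ["faithfulness", "answer_relevancy", "context_recall", "context_precision"]

Row = Dict[str, Union[str, float]]


def render_reports(results: List[Row]) -> Tuple[str, str]:
    """Serialize per-case results to JSON text and CSV text (one row per case)."""
    json_text = json.dumps(results, indent=2)
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    writer.writerow(["question", *METRICS])
    for row in results:
        # Missing metrics become empty cells instead of raising KeyError.
        writer.writerow([row["question"], *(row.get(m, "") for m in METRICS)])
    return json_text, buf.getvalue()
```

Writing each string to a file under the results directory is then one `Path.write_text` call. Keeping serialization separate from file I/O makes the output easy to check in tests.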

Changes:
- Added lightrag/evaluation/eval_rag_quality.py (RAGAS evaluator implementation)
- Added lightrag/evaluation/README.md (comprehensive documentation)
- Added lightrag/evaluation/__init__.py (package initialization)
- Updated pyproject.toml with optional 'evaluation' dependencies
- Updated .gitignore to exclude evaluation results directory

Installation:
pip install "lightrag-hku[evaluation]"

Dependencies:
- ragas>=0.3.7
- datasets>=4.3.0
- httpx>=0.28.1
- pytest>=8.4.2
- pytest-asyncio>=1.2.0
2025-11-01 21:36:39 +01:00
yangdx
61b57cbb5d Add PDF decryption support for password-protected files
• Add PDF_DECRYPT_PASSWORD env variable
• Check encryption status before reading
• Handle decrypt errors gracefully
• Log detailed error messages
• Support both encrypted and plain PDFs
2025-11-01 15:01:17 +08:00
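
The decrypt-before-read flow in commit 61b57cbb5d can be sketched as follows. This is a minimal sketch written against a pypdf-style reader that exposes `is_encrypted`, `pages` and `decrypt(password)`. In pypdf, `decrypt` returns a `PasswordType` whose `NOT_DECRYPTED` member is falsy. The reader is passed in so the logic runs without a real PDF. `open_pdf_pages` is hypothetical, and LightRAG's actual loader may differ.

```python
import logging
import os
from typing import Any, List, Optional

logger = logging.getLogger("pdf_loader")


def open_pdf_pages(reader: Any, password: Optional[str] = None) -> Optional[List[Any]]:
    """Return the reader's pages, decrypting first if needed; None on failure."""
    if password is None:
        password = os.environ.get("PDF_DECRYPT_PASSWORD")
    # Check encryption status before touching any page content.
    if reader.is_encrypted:
        if not password:
            logger.error("PDF is encrypted but PDF_DECRYPT_PASSWORD is not set")
            return None
        try:
            result = reader.decrypt(password)
        except Exception as exc:  # e.g. a missing crypto backend, depending on the library
            logger.error("PDF decryption failed: %s", exc)
            return None
        if not result:  # a falsy result means the password was rejected
            logger.error("PDF decryption failed: wrong password")
            return None
    # Plain (unencrypted) PDFs fall straight through to reading.
    return list(reader.pages)
```

Returning `None` instead of raising lets a batch pipeline log the failure and move on to the next file. That matches the "handle decrypt errors gracefully" intent.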
yangdx
728721b14f Remove redundant separator lines in gunicorn shutdown handler 2025-11-01 12:53:54 +08:00
yangdx
6d4a55100e Remove redundant shutdown message from gunicorn 2025-11-01 12:52:22 +08:00
yangdx
ec2ea4fd3f Rename function and variables for clarity in context building
- Rename _build_llm_context to _build_context_str
- Change text_units_context to chunks_context
- Move string building before early return
- Update log messages and comments
- Consistent variable naming throughout
2025-11-01 12:15:24 +08:00
yangdx
9a8742da59 Improve entity merge logging by removing redundant message and fixing typo 2025-10-31 17:16:59 +08:00
yangdx
6b4514c8ef Reduce logging verbosity in entity merge relation processing 2025-10-31 17:02:10 +08:00
yangdx
7ccc1fdd27 Add frontend rebuild warning indicator to version display
- Return bool from check_frontend_build()
- Add ⚠️ symbol to outdated versions
- Show tooltip with rebuild message
- Add translations for warning text
- Fix tailwind config filename typo
2025-10-31 06:09:46 +08:00
yangdx
e5414c61ef Bump core version to 1.4.9.8 and API version to 0250 2025-10-31 05:23:48 +08:00
yangdx
afb5e5c1cb Fix edge cleanup when deleting entities to prevent orphaned relationships
- Track edges to delete in set
- Clean VDB before node deletion
- Remove from relation chunks storage
- Prevent orphaned relationship data
2025-10-31 02:36:15 +08:00
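
The ordering in commit afb5e5c1cb can be sketched with plain dictionaries standing in for the graph, the relation vector DB and relation chunk storage. The steps are: collect the edges first, clean the derived stores, then drop the node. All store shapes and the `delete_entity` name are hypothetical, since LightRAG's storage backends have their own async APIs.

```python
from typing import Dict, Set, Tuple

Edge = Tuple[str, str]


def delete_entity(
    entity: str,
    graph: Dict[str, Set[str]],             # undirected adjacency: node -> neighbours
    relation_vdb: Dict[Edge, list],         # hypothetical relation vector store
    relation_chunks: Dict[Edge, Set[str]],  # hypothetical relation -> chunk IDs
) -> Set[Edge]:
    # 1. Track every edge touching the entity *before* the node disappears.
    edges_to_delete: Set[Edge] = {
        tuple(sorted((entity, nbr))) for nbr in graph.get(entity, set())
    }
    # 2. Clean the vector DB and chunk tracking so nothing refers to a missing node.
    for edge in edges_to_delete:
        relation_vdb.pop(edge, None)
        relation_chunks.pop(edge, None)
    # 3. Finally remove the node and its back-references from neighbours.
    for nbr in graph.pop(entity, set()):
        graph.get(nbr, set()).discard(entity)
    return edges_to_delete
```

If the node were deleted first, the neighbour set needed to find its edges would already be gone. The vector and chunk entries for those edges would then be orphaned.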
yangdx
c46c1b26a9 Add pycryptodome dependency for PDF encryption support 2025-10-31 01:49:42 +08:00
yangdx
c36afecba4 Remove redundant await call in file extraction pipeline 2025-10-30 20:35:41 +08:00
yangdx
c9e73bb450 Bump core version to 1.4.9.7 and API version to 0249 2025-10-30 19:43:35 +08:00
yangdx
5f4a280458 Add Qdrant legacy collection migration with workspace support
- Add QdrantMigrationError exception
- Implement automatic data migration
- Support workspace-based partitioning
- Add migration verification logic
- Update collection naming scheme
2025-10-30 19:16:33 +08:00
yangdx
f610fdaf9b Merge branch 'main' into Anush008/main 2025-10-30 11:07:39 +08:00
yangdx
3a7f753560 Bump core version to 1.4.9.6 and API version to 0248 2025-10-29 19:08:32 +08:00
yangdx
d5bcd14c6f Refactor service deployment to use direct process execution
- Remove bash wrapper script
- Update systemd service configuration
- Improve process management for gunicorn
- Simplify shared storage cleanup logic
- Update documentation for deployment
2025-10-29 18:55:47 +08:00
yangdx
6489aaa7f0 Remove worker_exit hook and improve cleanup logging
• Remove unreliable worker_exit function
• Add debug logs for cleanup modes
• Move DEBUG_LOCKS to top of file
2025-10-29 15:15:13 +08:00
yangdx
4a46d39c93 Replace GUNICORN_CMD_ARGS with custom LIGHTRAG_GUNICORN_MODE flag
• Use custom env var for mode detection
• Improve Gunicorn mode reliability
2025-10-29 14:06:03 +08:00
yangdx
816feefd84 Fix cleanup coordination between Gunicorn and UvicornWorker lifecycles
• Document UvicornWorker hook limitations
• Add GUNICORN_CMD_ARGS cleanup guard
• Prevent double cleanup in workers
2025-10-29 13:53:46 +08:00
yangdx
72b29659c9 Fix worker process cleanup to prevent shared resource conflicts
• Add worker_exit hook in gunicorn config
• Add shutdown_manager parameter to finalize_share_data in share_storage
• Prevent Manager shutdown in workers
• Remove custom signal handlers
2025-10-29 13:33:21 +08:00
yangdx
0692175c7b Remove enable_logging parameter from get_data_init_lock call in MilvusVectorDBStorage 2025-10-29 09:49:59 +08:00
yangdx
da2e9efd11 Bump API version to 0247 2025-10-29 01:39:55 +08:00
yangdx
3fa79026e0 Fix Entity Source IDs Tracking Problem
- Handle existing node updates properly in edge merging stage
- Fix source_ids merging logic
- Reorder entity deletion and optimize node operations
- Delete relationships before entities
- Add edge existence debugging logs
2025-10-29 01:19:55 +08:00
yangdx
29c4a91dc3 Move relationship ID sorting to before vector DB operations
• Remove verbose entity rebuild logging
• Sort IDs before vector DB updates
• Keep graph storage with original order
2025-10-28 19:13:48 +08:00
yangdx
c81a56a113 Fix entity and relationship deletion when no chunk references remain 2025-10-28 16:02:35 +08:00
yangdx
88d12beae2 Add offline Swagger UI support with custom static file serving
- Disable default docs URL
- Add custom /docs endpoint
- Mount static Swagger UI files
- Include OAuth2 redirect handler
- Support offline documentation access
2025-10-28 02:23:08 +08:00
yangdx
ea006bd386 Fix entity update logic to handle renaming operations
- Add is_renaming condition check
- Ensure updates when entity renamed
2025-10-28 00:12:23 +08:00
yangdx
5155edd8d2 feat: Improve entity merge and edit UX
- **API:** The `graph/entity/edit` endpoint now returns a detailed `operation_summary` for better client-side handling of update, rename, and merge outcomes.
- **Web UI:** Added an "auto-merge on rename" option. The UI now gracefully handles merge success, partial failures (update OK, merge fail), and other errors with specific user feedback.
2025-10-27 23:42:08 +08:00
yangdx
97034f06e3 Add allow_merge parameter to entity update API endpoint 2025-10-27 14:30:27 +08:00
yangdx
11a1631d76 Refactor entity edit and merge functions to support merge-on-rename
• Extract internal implementation helpers
• Add allow_merge parameter to aedit_entity
• Support merging when renaming to existing name
• Improve code reusability and modularity
• Maintain backward compatibility
2025-10-27 14:23:51 +08:00
yangdx
411e92e6b9 Fix vector deletion logging to show actual deleted count 2025-10-27 14:22:16 +08:00
yangdx
94f24a66f2 Bump API version to 0246 2025-10-27 12:28:46 +08:00
yangdx
8dfd3bf428 Replace global graph DB lock with fine-grained keyed locking
• Use entity/relation-specific locks
• Lock multiple entities when needed
2025-10-27 02:55:58 +08:00
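
Fine-grained keyed locking, as introduced in commit 8dfd3bf428, can be sketched in asyncio. The sketch uses one lock per entity or relation key. When several keys are needed, they are acquired in sorted order to avoid deadlocks. `KeyedLocks` is hypothetical and does not reproduce LightRAG's shared-storage locks, which may also need to coordinate across Gunicorn worker processes.

```python
import asyncio
from contextlib import asynccontextmanager
from typing import AsyncIterator, Dict, Iterable, List


class KeyedLocks:
    """One asyncio.Lock per key instead of a single global graph lock."""

    def __init__(self) -> None:
        self._locks: Dict[str, asyncio.Lock] = {}

    def _lock_for(self, key: str) -> asyncio.Lock:
        lock = self._locks.get(key)
        if lock is None:
            lock = self._locks[key] = asyncio.Lock()
        return lock

    @asynccontextmanager
    async def hold(self, keys: Iterable[str]) -> AsyncIterator[None]:
        # Acquire in sorted order so two tasks locking overlapping key sets
        # can never take the same locks in opposite orders.
        acquired: List[asyncio.Lock] = []
        try:
            for key in sorted(set(keys)):
                lock = self._lock_for(key)
                await lock.acquire()
                acquired.append(lock)
            yield
        finally:
            for lock in reversed(acquired):
                lock.release()


async def demo() -> List[str]:
    locks = KeyedLocks()
    finished: List[str] = []

    async def worker(name: str, keys: List[str]) -> None:
        async with locks.hold(keys):
            await asyncio.sleep(0)  # yield while holding the locks
            finished.append(name)

    # t1 and t2 request the same pair in opposite orders; t3 touches an
    # unrelated key and does not wait for either of them.
    await asyncio.gather(worker("t1", ["A", "B"]), worker("t2", ["B", "A"]), worker("t3", ["C"]))
    return finished


if __name__ == "__main__":
    print(sorted(asyncio.run(demo())))  # ['t1', 't2', 't3']
```

This sketch never evicts unused locks. A long-running service would need reference counting or periodic cleanup to keep the lock table bounded.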
yangdx
2c09adb8d3 Add chunk tracking support to entity merge functionality
- Pass chunk storages to merge function
- Merge relation chunk tracking data
- Merge entity chunk tracking data
- Delete old chunk tracking records
- Persist chunk storage updates
2025-10-27 02:06:21 +08:00
yangdx
a25003c336 Fix relation deduplication logic and standardize log message prefixes 2025-10-27 00:52:56 +08:00
yangdx
ab32456a79 Refactor entity merging with unified attribute merge function
• Clarify GRAPH_FIELD_SEP comment
• Deprecate merge_strategy parameter
• Unify entity/relation merge logic
• Add join_unique_comma strategy
2025-10-27 00:04:17 +08:00
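
A single attribute-merge function shared by entities and relations, as described in commit ab32456a79, can be sketched with per-field strategies. `join_unique_comma` is the strategy named in that commit, but its semantics here are an assumption: deduplicate, keep first-seen order, join with ", ". The other strategy names, the `"<SEP>"` value for `GRAPH_FIELD_SEP` and `merge_attributes` itself are also assumptions.

```python
from typing import Callable, Dict, List

GRAPH_FIELD_SEP = "<SEP>"  # assumed separator for multi-valued fields


def _unique(values: List[str]) -> List[str]:
    # dict.fromkeys drops duplicates while keeping first-seen order
    return list(dict.fromkeys(v for v in values if v))


# Hypothetical strategy table; only "join_unique_comma" is named in the commit.
STRATEGIES: Dict[str, Callable[[List[str]], str]] = {
    "keep_first": lambda vals: vals[0] if vals else "",
    "join_unique": lambda vals: GRAPH_FIELD_SEP.join(_unique(vals)),
    "join_unique_comma": lambda vals: ", ".join(_unique(vals)),
}


def merge_attributes(
    records: List[Dict[str, str]],
    strategy: Dict[str, str],
    default: str = "keep_first",
) -> Dict[str, str]:
    """Merge the same attributes from several entity or relation records."""
    keys = list(dict.fromkeys(k for r in records for k in r))
    merged: Dict[str, str] = {}
    for key in keys:
        values = [r[key] for r in records if key in r]
        merged[key] = STRATEGIES[strategy.get(key, default)](values)
    return merged
```

Because entities and relations both pass through the same function, a fix to one strategy applies to both record types at once.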
yangdx
38559373b3 Fix entity merging to include target entity relationships
* Include target entity in collection
* Merge all relevant relationships
* Prevent relationship loss
* Fix merge completeness
2025-10-26 23:13:50 +08:00
yangdx
6015e8bc68 Refactor graph utils to use unified persistence callback
- Add _persist_graph_updates function
- Remove duplicate callback functions
2025-10-26 20:20:16 +08:00
yangdx
a3370b024d Add chunk tracking cleanup to entity/relation deletion and creation
• Clean up chunk storage on delete
• Track chunks in create operations
• Normalize relation keys consistently
2025-10-26 17:06:16 +08:00
yangdx
bf1897a67e Normalize entity order for undirected graph consistency
• Normalize entity pairs for storage
• Update API docs for undirected edges
2025-10-26 15:53:31 +08:00
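
Normalizing entity pairs for an undirected graph, as in commit bf1897a67e, comes down to one canonical ordering for each pair. A minimal sketch using lexicographic order follows. The ordering rule and `normalize_pair` are assumptions, not LightRAG's exact helper.

```python
from typing import Dict, Tuple


def normalize_pair(src: str, tgt: str) -> Tuple[str, str]:
    """Canonical key for an undirected edge: (a, b) and (b, a) give the same tuple."""
    return (src, tgt) if src <= tgt else (tgt, src)


# Storing and looking up edges through the normalized key means the
# direction used at insert time and at lookup time no longer matters.
edge_descriptions: Dict[Tuple[str, str], str] = {}
edge_descriptions[normalize_pair("Alice", "Acme")] = "works at"
print(edge_descriptions[normalize_pair("Acme", "Alice")])  # works at
```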
yangdx
3fbd704bf9 Enhance entity/relation editing with chunk tracking synchronization
• Add chunk storage sync to edit ops
• Implement incremental chunk ID updates
• Support entity renaming migrations
• Normalize relation keys consistently
• Preserve chunk references on edits
2025-10-26 14:34:56 +08:00
Anush008
8584980e3a refactor: Qdrant Multi-tenancy (Include staged)
Signed-off-by: Anush008 <anushshetty90@gmail.com>
2025-10-26 09:58:24 +05:30
yangdx
29bf593663 Fix entity and relation chunk cleanup in deletion pipeline
• Delete from entity_chunks storage
• Delete from relation_chunks storage
2025-10-25 22:32:27 +08:00
yangdx
5ee9a2f8c6 Fix entity consistency in knowledge graph rebuilding and merging
• Sort src/tgt for consistent ordering
• Create missing nodes before edges
• Update entity chunks storage
• Pass entity_vdb to rebuild function
• Ensure entities exist in all storages
2025-10-25 21:37:03 +08:00
yangdx
a97e5dad4c Optimize PostgreSQL graph queries to avoid Cypher overhead and complexity
• Replace Cypher with native SQL queries
• Reduce query complexity from O(N²) to O(E)
• Add error handling for parse failures
• Use direct table access pattern
• Eliminate Cartesian product joins
2025-10-25 14:37:18 +08:00
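
The complexity change in commit a97e5dad4c can be illustrated in plain Python, independent of any database. Collecting the edges among N nodes via a Cartesian product tests N² candidate pairs. Scanning the edge table once and filtering with a hashed node set costs O(E). This is an analogy for the query-shape change, not the SQL used in the commit.

```python
from itertools import product
from typing import Iterable, List, Set, Tuple

Edge = Tuple[str, str]


def edges_among_pairwise(nodes: List[str], edge_set: Set[Edge]) -> List[Edge]:
    """Cartesian-product style: tests every (u, v) pair, N^2 lookups."""
    return [(u, v) for u, v in product(nodes, nodes) if (u, v) in edge_set]


def edges_among_scan(nodes: Iterable[str], edges: Iterable[Edge]) -> List[Edge]:
    """Edge-scan style: one pass over the edge table, O(E) with a hashed node set."""
    wanted = set(nodes)
    return [(u, v) for u, v in edges if u in wanted and v in wanted]
```

Both functions return the same edges, possibly in a different order. The scan wins whenever the graph is sparse relative to the N² pairs of the requested node set.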