LightRAG

Author	SHA1	Message	Date
anouarbm	9d69e8d776	fix(api): Change content field from string to list in query responses BREAKING CHANGE: The `content` field in query response references is now an array of strings instead of a concatenated string. This preserves individual chunk boundaries when a single file has multiple chunks. Changes: - Update QueryResponse Pydantic model to accept List[str] for content - Modify query_text endpoint to return content as list (query_routes.py:425) - Modify query_text_stream endpoint to support chunk content enrichment - Update OpenAPI schema and examples to reflect array structure - Update API README with breaking change notice and migration guide - Fix RAGAS evaluation to flatten chunk content lists	2025-11-03 04:37:09 +01:00
anouarbm	363f3051b1	eval using open ai	2025-11-02 19:39:56 +01:00
anouarbm	77db08038c	Merge remote-tracking branch 'lightrag-fork/feat/ragas-evaluation' into feat/ragas-evaluation	2025-11-02 18:47:40 +01:00
anouarbm	0b5e3f9dc4	Use logger in RAG evaluation and optimize reference content joins	2025-11-02 18:43:53 +01:00
ben moussa anouar	98f0464a31	Update lightrag/evaluation/eval_rag_quality.py for language Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>	2025-11-02 18:03:54 +01:00
anouarbm	963ad4c637	docs: Add documentation and examples for include_chunk_content parameter Added comprehensive documentation for the new include_chunk_content parameter that enables retrieval of actual chunk text content in API responses. Documentation Updates: - Added "Include Chunk Content in References" section to API README - Explained use cases: RAG evaluation, debugging, citations, transparency - Provided JSON request/response examples - Clarified parameter interaction with include_references OpenAPI/Swagger Examples: - Added "Response with chunk content" example to /query endpoint - Shows complete reference structure with content field - Demonstrates realistic chunk text content This makes the feature discoverable through: 1. API documentation (README.md) 2. Interactive Swagger UI (http://localhost:9621/docs) 3. Code examples for developers	2025-11-02 17:53:27 +01:00
anouarbm	0bbef9814e	Optimize RAGAS evaluation with parallel execution and chunk content enrichment Added efficient RAG evaluation system with optimized API calls and comprehensive benchmarking. Key Features: - Single API call per evaluation (2x faster than before) - Parallel evaluation based on MAX_ASYNC environment variable - Chunk content enrichment in /query endpoint responses - Comprehensive benchmark statistics (averages) - NaN-safe metric calculations API Changes: - Added include_chunk_content parameter to QueryRequest (backward compatible) - /query endpoint enriches references with actual chunk content when requested - No breaking changes - default behavior unchanged Evaluation Improvements: - Parallel execution using asyncio.Semaphore (respects MAX_ASYNC) - Shared HTTP client with connection pooling - Proper timeout handling (3min connect, 5min read) - Debug output for context retrieval verification - Benchmark statistics with averages, min/max scores Results: - Average RAGAS Score: 0.9772 - Perfect Faithfulness: 1.0000 - Perfect Context Recall: 1.0000 - Perfect Context Precision: 1.0000 - Excellent Answer Relevance: 0.9087	2025-11-02 17:39:43 +01:00
anouarbm	026bca00d9	fix: Use actual retrieved contexts for RAGAS evaluation Critical Fix: Contexts vs Ground Truth - RAGAS metrics now evaluate actual retrieval performance - Previously: Used ground_truth as contexts (always perfect scores) - Now: Uses retrieved documents from LightRAG API (real evaluation) Changes to generate_rag_response (lines 100-156): - Remove unused 'context' parameter - Change return type: Dict[str, str] → Dict[str, Any] - Extract contexts as list of strings from references[].text - Return 'contexts' key instead of 'context' (JSON dump) - Add response.raise_for_status() for better error handling - Add httpx.HTTPStatusError exception handler Changes to evaluate_responses (lines 180-191): - Line 183: Extract retrieved_contexts from rag_response - Line 190: Use [retrieved_contexts] instead of [[ground_truth]] - Now correctly evaluates: retrieval quality, not ground_truth quality Impact on RAGAS Metrics: - Context Precision: Now ranks actual retrieved docs by relevance - Context Recall: Compares ground_truth against actual retrieval - Faithfulness: Verifies answer based on actual retrieved contexts - Answer Relevance: Unchanged (question-answer relevance) Fixes incorrect evaluation methodology. Based on RAGAS documentation: - contexts = retrieved documents from RAG system - ground_truth = reference answer for context_recall metric References: - https://docs.ragas.io/en/stable/concepts/components/eval_dataset/ - https://docs.ragas.io/en/stable/concepts/metrics/	2025-11-02 16:16:00 +01:00
anouarbm	b12b693a81	fixed ruff format of csv path	2025-11-02 11:46:22 +01:00
anouarbm	5cdb4b0ef2	fix: Apply ruff formatting and rename test_dataset to sample_dataset Lint Fixes (ruff): - Sort imports alphabetically (I001) - Add blank line after import traceback (E302) - Add trailing comma to dict literals (COM812) - Reformat writer.writerow for readability (E501) Rename test_dataset.json → sample_dataset.json: - Avoids .gitignore pattern conflict (test_* is ignored) - More descriptive name - it's a sample/template, not actual test data - Updated all references in eval_rag_quality.py and README.md Resolves lint-and-format CI check failure. Addresses reviewer feedback about test dataset naming.	2025-11-02 10:36:03 +01:00
anouarbm	aa916f28d2	docs: add generic test_dataset.json for evaluation examples Test cases with generic examples about: - LightRAG framework features and capabilities - RAG system architecture and components - Vector database support (ChromaDB, Neo4j, Milvus, etc.) - LLM provider integrations (OpenAI, Anthropic, Ollama, etc.) - RAG evaluation metrics explanation - Deployment options (Docker, FastAPI, direct integration) - Knowledge graph-based retrieval concepts Changes: - Added generic test_dataset.json with 8 LightRAG-focused test cases - File added with git add -f to override test_* pattern This provides realistic, reusable examples for users testing their LightRAG deployments and helps demonstrate the evaluation framework.	2025-11-01 22:27:26 +01:00
anouarbm	1ad0bf82f9	feat: add RAGAS evaluation framework for RAG quality assessment This contribution adds a comprehensive evaluation system using the RAGAS framework to assess LightRAG's retrieval and generation quality. Features: - RAGEvaluator class with four key metrics: * Faithfulness: Answer accuracy vs context * Answer Relevance: Query-response alignment * Context Recall: Retrieval completeness * Context Precision: Retrieved context quality - HTTP API integration for live system testing - JSON and CSV report generation - Configurable test datasets - Complete documentation with examples - Sample test dataset included Changes: - Added lightrag/evaluation/eval_rag_quality.py (RAGAS evaluator implementation) - Added lightrag/evaluation/README.md (comprehensive documentation) - Added lightrag/evaluation/__init__.py (package initialization) - Updated pyproject.toml with optional 'evaluation' dependencies - Updated .gitignore to exclude evaluation results directory Installation: pip install lightrag-hku[evaluation] Dependencies: - ragas>=0.3.7 - datasets>=4.3.0 - httpx>=0.28.1 - pytest>=8.4.2 - pytest-asyncio>=1.2.0	2025-11-01 21:36:39 +01:00
Daniel.y	ece0398dfc	Merge pull request #2296 from danielaskdd/pdf-decryption Feat: Add PDF Decryption Support for Password-Protected Files	2025-11-01 15:14:24 +08:00
yangdx	61b57cbb5d	Add PDF decryption support for password-protected files • Add PDF_DECRYPT_PASSWORD env variable • Check encryption status before reading • Handle decrypt errors gracefully • Log detailed error messages • Support both encrypted/plain PDFs	2025-11-01 15:01:17 +08:00
yangdx	728721b14f	Remove redundant separator lines in gunicorn shutdown handler	2025-11-01 12:53:54 +08:00
yangdx	6d4a55100e	Remove redundant shutdown message from gunicorn	2025-11-01 12:52:22 +08:00
Daniel.y	bc8a8842c5	Merge pull request #2295 from danielaskdd/mix-query-without-kg Fix empty context validation bug and improve naming consistency in query context building	2025-11-01 12:20:16 +08:00
yangdx	ec2ea4fd3f	Rename function and variables for clarity in context building - Rename _build_llm_context to _build_context_str - Change text_units_context to chunks_context - Move string building before early return - Update log messages and comments - Consistent variable naming throughout	2025-11-01 12:15:24 +08:00
yangdx	9a8742da59	Improve entity merge logging by removing redundant message and fixing typo	2025-10-31 17:16:59 +08:00
yangdx	6b4514c8ef	Reduce logging verbosity in entity merge relation processing	2025-10-31 17:02:10 +08:00
yangdx	2496d87148	Add data/ directory to .gitignore	2025-10-31 14:51:53 +08:00
yangdx	7ccc1fdd27	Add frontend rebuild warning indicator to version display - Return bool from check_frontend_build() - Add ⚠️ symbol to outdated versions - Show tooltip with rebuild message - Add translations for warning text - Fix tailwind config filename typo	2025-10-31 06:09:46 +08:00
yangdx	e5414c61ef	Bump core version to 1.4.9.8 and API version to 0250	2025-10-31 05:23:48 +08:00
Daniel.y	08b0283b04	Merge pull request #2291 from danielaskdd/reload-popular-labels Refact: Auto-refresh of Popular Labels When Pipeline Completes	2025-10-31 05:20:54 +08:00
yangdx	58c83f9da5	Add auto-refresh of popular labels when pipeline completes • Monitor pipeline busy->idle transitions • Reload labels on dropdown open if needed • Add onBeforeOpen callback to AsyncSelect • Clear refresh flags after processing • Improve label sync with backend state	2025-10-31 04:45:35 +08:00
Daniel.y	94cdbe77c5	Merge pull request #2290 from danielaskdd/delete-residual-edges Fix: Clean Residual Edges from VDB During Entity Deletion	2025-10-31 02:44:23 +08:00
yangdx	afb5e5c1cb	Fix edge cleanup when deleting entities to prevent orphaned relationships - Track edges to delete in set - Clean VDB before node deletion - Remove from relation chunks storage - Prevent orphaned relationship data	2025-10-31 02:36:15 +08:00
Daniel.y	3b48cf1643	Merge pull request #2289 from danielaskdd/fix-pycrptodome-missing Fix: Add PyCryptodome dependency for encrypted PDF processing	2025-10-31 01:52:58 +08:00
yangdx	c46c1b26a9	Add pycryptodome dependency for PDF encryption support	2025-10-31 01:49:42 +08:00
Daniel.y	bda52a8773	Merge pull request #2287 from danielaskdd/fix-ui Refact: Enhance Property editing UI for KG Nodes	2025-10-31 00:23:39 +08:00
yangdx	71b27ec4aa	Optimize property edit dialog to use trimmed value consistently	2025-10-31 00:08:02 +08:00
yangdx	4cbd876126	feat: Update node color and legend after entity_type changed - Move color constants to utils module - Extract resolveNodeColor function - Update node colors on type changes - Simplify hook color logic	2025-10-31 00:03:55 +08:00
yangdx	79a17c3f7f	Fix graph value handling for entity_id updates • Use finalValue for entity_id changes • Keep original value for other props • Fix property update logic	2025-10-30 23:43:46 +08:00
yangdx	c36afecba4	Remove redundant await call in file extraction pipeline	2025-10-30 20:35:41 +08:00
yangdx	c9e73bb450	Bump core version to 1.4.9.7 and API version to 0249	2025-10-30 19:43:35 +08:00
yangdx	042cbad047	Merge branch 'qdrant-multi-tenancy'	2025-10-30 19:32:25 +08:00
yangdx	5f4a280458	Add Qdrant legacy collection migration with workspace support - Add QdrantMigrationError exception - Implement automatic data migration - Support workspace-based partitioning - Add migration verification logic - Update collection naming scheme	2025-10-30 19:16:33 +08:00
yangdx	0498e80a42	Merge branch 'main' into qdrant-multi-tenancy	2025-10-30 14:11:00 +08:00
yangdx	78ccc4f6fd	Refactor .gitignore	2025-10-30 12:56:40 +08:00
yangdx	783e2f3b1f	Update uv.lock	2025-10-30 11:18:10 +08:00
yangdx	f610fdaf9b	Merge branch 'main' into Anush008/main	2025-10-30 11:07:39 +08:00
Daniel.y	8145201d2e	Merge pull request #2284 from danielaskdd/fix-static-missiing HotFix: Include static files in package distribution	2025-10-30 10:52:53 +08:00
yangdx	16d3d82a0e	Include static files in package distribution - Add static dir to MANIFEST.in - Update package data config - Ensure static assets are bundled - Fix missing static file issue	2025-10-30 10:50:28 +08:00
yangdx	8af8bd80d2	docs: add frontend build steps to server installation guide	2025-10-29 21:54:47 +08:00
yangdx	0fa2fc9cab	Refactor systemd service config to use environment variables • Add LIGHTRAG_HOME environment variable • Use .venv instead of venv directory	2025-10-29 20:14:17 +08:00
yangdx	6dc027cb75	Merge branch 'fix-exit-handler'	2025-10-29 19:15:24 +08:00
Daniel.y	a1cf01dcc1	Merge pull request #2280 from danielaskdd/fix-exit-handler Refact: Graceful shutdown and signal handling in Gunicorn Mode	2025-10-29 19:14:46 +08:00
Daniel.y	c5ad9982d9	Merge pull request #2281 from danielaskdd/restore-query-example Restore query generation example and fix README path reference	2025-10-29 19:12:53 +08:00
yangdx	14a015d4ad	Restore query generation example and fix README path reference • Fix path from example/ to examples/ • Add generate_query.py implementation	2025-10-29 19:11:40 +08:00
yangdx	3a7f753560	Bump core version to 1.4.9.6 and API version to 0248	2025-10-29 19:08:32 +08:00
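
Note on commit 9d69e8d776: the breaking change turns each reference's `content` field from a concatenated string into an array of strings. A client that has to work against servers on either side of that commit could normalize both shapes along the lines of the sketch below. This is only an illustration, not the code from the commit: the helper name `flatten_reference_contents` is hypothetical, and the only response details assumed are the ones the commit message names (a list of references, each optionally carrying `content`).

```python
from typing import Any, Dict, List


def flatten_reference_contents(references: List[Dict[str, Any]]) -> List[str]:
    """Collect chunk texts from references (hypothetical helper, not LightRAG API).

    Accepts both shapes described in the commit message:
    - old: "content" is one concatenated string
    - new: "content" is a list of strings, one per chunk
    """
    contexts: List[str] = []
    for ref in references:
        content = ref.get("content")
        if content is None:
            # Reference returned without chunk content; nothing to collect.
            continue
        if isinstance(content, str):
            # Old shape: keep the single string as one context entry.
            if content:
                contexts.append(content)
        else:
            # New shape: keep each non-empty chunk as its own entry.
            contexts.extend(chunk for chunk in content if chunk)
    return contexts


if __name__ == "__main__":
    sample = [
        {"content": ["first chunk", "second chunk"]},
        {"content": "legacy concatenated text"},
        {},
    ]
    print(flatten_reference_contents(sample))
```

Keeping one list entry per chunk preserves the chunk boundaries that the commit message gives as the reason for the change.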

5492 commits