LightRAG

Author	SHA1	Message	Date
anouarbm	349c1945db	Optimize RAGAS evaluation with parallel execution and chunk content enrichment Added efficient RAG evaluation system with optimized API calls and comprehensive benchmarking. Key Features: - Single API call per evaluation (2x faster than before) - Parallel evaluation based on MAX_ASYNC environment variable - Chunk content enrichment in /query endpoint responses - Comprehensive benchmark statistics (averages) - NaN-safe metric calculations API Changes: - Added include_chunk_content parameter to QueryRequest (backward compatible) - /query endpoint enriches references with actual chunk content when requested - No breaking changes - default behavior unchanged Evaluation Improvements: - Parallel execution using asyncio.Semaphore (respects MAX_ASYNC) - Shared HTTP client with connection pooling - Proper timeout handling (3min connect, 5min read) - Debug output for context retrieval verification - Benchmark statistics with averages, min/max scores Results: - Average RAGAS Score: 0.9772 - Perfect Faithfulness: 1.0000 - Perfect Context Recall: 1.0000 - Perfect Context Precision: 1.0000 - Excellent Answer Relevance: 0.9087 (cherry picked from commit `0bbef9814e`)	2025-12-04 19:11:20 +08:00
yangdx	8f16f6fe31	Fix entity and relationship deletion when no chunk references remain (cherry picked from commit `c81a56a113`)	2025-12-04 19:11:19 +08:00
yangdx	17a9771cfb	Add chunk tracking support to entity merge functionality - Pass chunk storages to merge function - Merge relation chunk tracking data - Merge entity chunk tracking data - Delete old chunk tracking records - Persist chunk storage updates (cherry picked from commit `2c09adb8d3`)	2025-12-04 19:11:19 +08:00
yangdx	450f969430	Add chunk tracking cleanup to entity/relation deletion and creation • Clean up chunk storage on delete • Track chunks in create operations • Normalize relation keys consistently (cherry picked from commit `a3370b024d`)	2025-12-04 19:11:19 +08:00
yangdx	7e0f12c28e	Enhance entity/relation editing with chunk tracking synchronization • Add chunk storage sync to edit ops • Implement incremental chunk ID updates • Support entity renaming migrations • Normalize relation keys consistently • Preserve chunk references on edits (cherry picked from commit `3fbd704bf9`)	2025-12-04 19:11:19 +08:00
yangdx	488f67e5b2	Fix entity and relation chunk cleanup in deletion pipeline • Delete from entity_chunks storage • Delete from relation_chunks storage (cherry picked from commit `29bf593663`)	2025-12-04 19:11:19 +08:00
yangdx	cb5451faf8	Add entity/relation chunk tracking with configurable source ID limits - Add entity_chunks & relation_chunks storage - Implement KEEP/FIFO limit strategies - Update env.example with new settings - Add migration for chunk tracking data - Support all KV storage (cherry picked from commit `dc62c78f98`)	2025-12-04 19:11:19 +08:00
yangdx	851b45f726	Add pipeline status lock function for legacy compatibility - Add get_pipeline_status_lock function - Return NamespaceLock for consistency - Support workspace parameter - Enable logging option - Legacy code compatibility (cherry picked from commit `93d445dfdd`)	2025-12-04 19:11:18 +08:00
yangdx	402d2f9a98	Fix namespace parsing when workspace contains colons • Use rsplit instead of split • Handle colons in workspace names (cherry picked from commit `f8dd2e0724`)	2025-12-04 19:11:18 +08:00
yangdx	6ba35f81df	Fix: auto-acquire pipeline when idle in document deletion • Track if we acquired the pipeline lock • Auto-acquire pipeline when idle • Only release if we acquired it • Prevent concurrent deletion conflicts • Improve deletion job validation (cherry picked from commit `4048fc4b89`)	2025-12-04 19:11:18 +08:00
yangdx	5febb88824	Fix missing workspace parameter in update flags status call (cherry picked from commit `1745b30a5f`)	2025-12-04 19:11:18 +08:00
yangdx	dc4c10c346	Fix NamespaceLock context variable timing to prevent lock bricking * Acquire lock before setting ContextVar * Prevent state corruption on cancellation * Fix permanent lock brick scenario * Store context only after success * Handle acquisition failure properly (cherry picked from commit `e8383df3b8`)	2025-12-04 19:11:17 +08:00
yangdx	87561f8b28	Remove manual initialize_pipeline_status() calls across codebase - Auto-init pipeline status in storages - Remove redundant import statements - Simplify initialization pattern - Update docs and examples (cherry picked from commit `cdd53ee875`)	2025-12-04 19:11:17 +08:00
yangdx	1e7bd654d8	Fix NamespaceLock concurrent coroutine safety with ContextVar - Use ContextVar for per-coroutine storage - Prevent state interference between coroutines - Add re-entrance protection check (cherry picked from commit `b6a5a90eaf`)	2025-12-04 19:11:17 +08:00
yangdx	f6a45245bd	Add pipeline status validation before document deletion (cherry picked from commit `9d7b7981ce`)	2025-12-04 19:11:17 +08:00
yangdx	94ae13a037	Refactor workspace handling to use default workspace and namespace locks - Remove DB-specific workspace configs - Add default workspace auto-setting - Replace global locks with namespace locks - Simplify pipeline status management - Remove redundant graph DB locking (cherry picked from commit `926960e957`)	2025-12-04 19:11:17 +08:00
yangdx	c01cfc3649	Fix workspace filtering logic in get_all_update_flags_status • Handle namespaces with/without prefixes • Fix workspace matching logic (cherry picked from commit `7ed0eac4c9`)	2025-12-04 19:11:16 +08:00
yangdx	50f8ddd933	Fix pipeline status namespace check to handle root case - Add check for bare "pipeline_status" - Handle namespace without prefix (cherry picked from commit `78689e8837`)	2025-12-04 19:11:16 +08:00
yangdx	dfab175c16	Fix workspace isolation for pipeline status across all operations - Fix final_namespace error in get_namespace_data() - Fix get_workspace_from_request return type - Add workspace param to pipeline status calls (cherry picked from commit `52c812b9a0`)	2025-12-04 19:11:16 +08:00
BukeLy	fe1576943f	fix: Add default workspace support for backward compatibility Fixes two compatibility issues in workspace isolation: 1. Problem: lightrag_server.py calls initialize_pipeline_status() without workspace parameter, causing pipeline to initialize in global namespace instead of rag's workspace. Solution: Add set_default_workspace() mechanism in shared_storage. LightRAG.initialize_storages() now sets default workspace, which initialize_pipeline_status() uses when called without parameters. 2. Problem: /health endpoint hardcoded to use "pipeline_status", cannot return workspace-specific status or support frontend workspace selection. Solution: Add LIGHTRAG-WORKSPACE header support. Endpoint now extracts workspace from header or falls back to server default, returning correct workspace-specific pipeline status. Changes: - lightrag/kg/shared_storage.py: Add set/get_default_workspace() - lightrag/lightrag.py: Call set_default_workspace() in initialize_storages() - lightrag/api/lightrag_server.py: Add get_workspace_from_request() helper, update /health endpoint to support LIGHTRAG-WORKSPACE header Testing: - Backward compatibility: Old code works without modification - Multi-instance safety: Explicit workspace passing preserved - /health endpoint: Supports both default and header-specified workspaces Related: #2353 (cherry picked from commit `18a4870229`)	2025-12-04 19:11:16 +08:00
BukeLy	f7b500bca2	feat: Add workspace isolation support for pipeline status Problem: In multi-tenant scenarios, different workspaces share a single global pipeline_status namespace, causing pipelines from different tenants to block each other, severely impacting concurrent processing performance. Solution: - Extended get_namespace_data() to recognize workspace-specific pipeline namespaces with pattern "{workspace}:pipeline" (following GraphDB pattern) - Added workspace parameter to initialize_pipeline_status() for per-tenant isolated pipeline namespaces - Updated all 7 call sites to use workspace-aware locks: * lightrag.py: process_document_queue(), aremove_document() * document_routes.py: background_delete_documents(), clear_documents(), cancel_pipeline(), get_pipeline_status(), delete_documents() Impact: - Different workspaces can process documents concurrently without blocking - Backward compatible: empty workspace defaults to "pipeline_status" - Maintains fail-fast: uninitialized pipeline raises clear error - Expected N× performance improvement for N concurrent tenants Bug fixes: - Fixed AttributeError by using self.workspace instead of self.global_config - Fixed pipeline status endpoint to show workspace-specific status - Fixed delete endpoint to check workspace-specific busy flag Code changes: 4 files, 141 insertions(+), 28 deletions(-) Testing: All syntax checks passed, comprehensive workspace isolation tests completed (cherry picked from commit `eb52ec94d7`)	2025-12-04 19:11:16 +08:00
yangdx	a7330f0b95	Remove redundant await call in file extraction pipeline (cherry picked from commit `c36afecba4`)	2025-12-04 19:11:15 +08:00
yangdx	537db072e0	Add Qdrant legacy collection migration with workspace support - Add QdrantMigrationError exception - Implement automatic data migration - Support workspace-based partitioning - Add migration verification logic - Update collection naming scheme (cherry picked from commit `5f4a280458`)	2025-12-04 19:11:15 +08:00
yangdx	687d2b6b13	Improve error handling and add cancellation checks in pipeline (cherry picked from commit `77336e50b6`)	2025-12-04 19:11:15 +08:00
yangdx	a471f1ca0e	Add pipeline cancellation feature for graceful processing termination • Add cancel_pipeline API endpoint • Implement PipelineCancelledException • Add cancellation checks in main loop • Handle task cancellation gracefully • Mark cancelled docs as FAILED (cherry picked from commit `743aefc655`)	2025-12-04 19:11:15 +08:00
yangdx	37d48bafb6	Simplify skip logging and reduce pipeline status updates (cherry picked from commit `a5253244f9`)	2025-12-04 19:11:14 +08:00
yangdx	d56b4c856e	Fix trailing whitespace and update test mocking for rerank module • Remove trailing whitespace • Fix TiktokenTokenizer import patch • Add async context manager mocks • Update aiohttp.ClientSession patch • Improve test reliability (cherry picked from commit `561ba4e4b5`)	2025-12-04 19:11:14 +08:00
yangdx	322ff19f72	Remove ascii_colors dependency and fix stream handling errors • Remove ascii_colors.trace_exception calls • Add SafeStreamHandler for closed streams • Patch ascii_colors console handler • Prevent ValueError on stream close • Improve logging error handling (cherry picked from commit `0fb2925c6a`)	2025-12-04 19:11:13 +08:00
yangdx	9cf7476dd4	Improve docling integration with macOS compatibility and CLI flag - Add --docling CLI flag for easier setup - Add numpy version constraints - Exclude docling on macOS (fork-safety) (cherry picked from commit `c246eff725`)	2025-12-04 19:11:10 +08:00
yangdx	95d47566c1	Improve docling integration with macOS compatibility and CLI flag - Add --docling CLI flag for easier setup - Add numpy version constraints - Exclude docling on macOS (fork-safety) (cherry picked from commit `a24d8181c2`)	2025-12-04 19:11:10 +08:00
yangdx	033ee5c0f5	Refactor keyword_extraction from kwargs to explicit parameter • Add keyword_extraction param to functions • Remove kwargs.pop() calls • Update function signatures • Improve parameter documentation • Make parameter handling consistent (cherry picked from commit `2f16065256`)	2025-12-04 19:11:09 +08:00
anouarbm	8650307e65	feat(evaluation): Add sample documents for reproducible RAGAS testing Add 5 markdown documents that users can index to reproduce evaluation results. Changes: - Add sample_documents/ folder with 5 markdown files covering LightRAG features - Update sample_dataset.json with 3 improved, specific test questions - Shorten and correct evaluation README (removed outdated info about mock responses) - Add sample_documents reference with expected ~95% RAGAS score Test Results with sample documents: - Average RAGAS Score: 95.28% - Faithfulness: 100%, Answer Relevance: 96.67% - Context Recall: 88.89%, Context Precision: 95.56% (cherry picked from commit `a172cf893d`)	2025-12-04 19:11:09 +08:00
yangdx	cc33728c10	Improve Langfuse integration and stream response cleanup handling • Check env vars before enabling Langfuse • Move imports after env check logic • Handle wrapper client aclose() issues • Add debug logs for cleanup failures (cherry picked from commit `10f6e6955f`)	2025-12-04 19:11:08 +08:00
anouarbm	ccdd3c2786	fixed ruff format of csv path (cherry picked from commit `b12b693a81`)	2025-12-04 19:11:08 +08:00
anouarbm	949bfc4228	fix: Apply ruff formatting and rename test_dataset to sample_dataset Lint Fixes (ruff): - Sort imports alphabetically (I001) - Add blank line after import traceback (E302) - Add trailing comma to dict literals (COM812) - Reformat writer.writerow for readability (E501) Rename test_dataset.json → sample_dataset.json: - Avoids .gitignore pattern conflict (test_* is ignored) - More descriptive name - it's a sample/template, not actual test data - Updated all references in eval_rag_quality.py and README.md Resolves lint-and-format CI check failure. Addresses reviewer feedback about test dataset naming. (cherry picked from commit `5cdb4b0ef2`)	2025-12-04 19:11:08 +08:00
anouarbm	a934becfcc	feat: add optional Langfuse observability integration This contribution adds optional Langfuse support for LLM observability and tracing. Langfuse provides a drop-in replacement for the OpenAI client that automatically tracks all LLM interactions without requiring code changes. Features: - Optional Langfuse integration with graceful fallback - Automatic LLM request/response tracing - Token usage tracking - Latency metrics - Error tracking - Zero code changes required for existing functionality Implementation: - Modified lightrag/llm/openai.py to conditionally use Langfuse's AsyncOpenAI - Falls back to standard OpenAI client if Langfuse is not installed - Logs observability status on import Configuration: To enable Langfuse tracing, install the observability extras and set environment variables: ```bash pip install lightrag-hku[observability] export LANGFUSE_PUBLIC_KEY="your_public_key" export LANGFUSE_SECRET_KEY="your_secret_key" export LANGFUSE_HOST="https://cloud.langfuse.com" # or your self-hosted instance ``` If Langfuse is not installed or environment variables are not set, LightRAG will use the standard OpenAI client without any functionality changes. Changes: - Modified lightrag/llm/openai.py (added optional Langfuse import) - Updated pyproject.toml with optional 'observability' dependencies Dependencies (optional): - langfuse>=3.8.1 (cherry picked from commit `626b42bc40`)	2025-12-04 19:11:08 +08:00
xiaojunxiang	355aa2593c	fix(docs): correct typo "acivate" → "activate" (cherry picked from commit `9e5004e24f`)	2025-12-04 19:11:08 +08:00
Raphaël MANSUY	ed73def994	fix: sync core modules with upstream for compatibility	2025-12-04 19:10:46 +08:00
yangdx	7ce3680ca5	Add retry decorators to Neo4j read operations for resilience (cherry picked from commit `7aaa51cda9`)	2025-12-04 19:09:08 +08:00
yangdx	00d51f9dba	Fix dimension type comparison in Milvus vector field validation • Convert dimensions to int for comparison • Handle string vs int type mismatches (cherry picked from commit `0fa9a2eee3`)	2025-12-04 19:09:08 +08:00
yangdx	0594a5a049	Update pymilvus dependency from 2.5.2 to >=2.6.2 (cherry picked from commit `baab992431`)	2025-12-04 19:09:07 +08:00
yangdx	de011c99a4	Add CASCADE to AGE extension creation in PostgreSQL implementation - Add CASCADE option to CREATE EXTENSION - Ensure dependencies are installed - Fix potential AGE setup issues (cherry picked from commit `d6019c82af`)	2025-12-04 19:09:07 +08:00
yangdx	bd93f13012	Refactor: Extract retry decorator to reduce code duplication in Neo4J storage • Define READ_RETRY_EXCEPTIONS constant • Create reusable READ_RETRY decorator • Replace 11 duplicate retry decorators • Improve code maintainability • Add missing retry to edge_degrees_batch (cherry picked from commit `8c4d7a00ad`)	2025-12-04 19:09:07 +08:00
copilot-swe-agent[bot]	b28a701532	Improve edge case handling for max_tokens=1 Co-authored-by: netbrah <162479981+netbrah@users.noreply.github.com> (cherry picked from commit `8835fc244a`)	2025-12-04 19:09:07 +08:00
wmsnp	ae5cd9262b	fix: add logger to configure_vchordrq() and format code (cherry picked from commit `f4bf5d279c`)	2025-12-04 19:09:06 +08:00
wmsnp	3954bb6579	feat(postgres_impl): add vchordrq vector index support and unify vector index creation logic (cherry picked from commit `d07023c962`)	2025-12-04 19:09:06 +08:00
yangdx	1cbe0ba885	Reduce log level and improve workspace mismatch message clarity • Change warning to info level • Simplify workspace mismatch wording (cherry picked from commit `6cef8df159`)	2025-12-04 19:09:06 +08:00
yangdx	0ac858d3e2	fix(postgres): allow vchordrq.epsilon config when probes is empty Previously, configure_vchordrq would fail silently when probes was empty (the default), preventing epsilon from being configured. Now each parameter is handled independently with conditional execution, and configuration errors fail-fast instead of being swallowed. This fixes the documented epsilon setting being impossible to use in the default configuration. (cherry picked from commit `3096f844fb`)	2025-12-04 19:09:06 +08:00
yangdx	5bd1320a1d	Refactor storage classes to use namespace instead of final_namespace (cherry picked from commit `fd486bc922`)	2025-12-04 19:09:06 +08:00
yangdx	ed46d375fb	Auto-initialize pipeline status in LightRAG.initialize_storages() • Remove manual initialize_pipeline_status calls • Auto-init in initialize_storages method • Update error messages for clarity • Warn on workspace conflicts (cherry picked from commit `e22ac52ebc`)	2025-12-04 19:09:05 +08:00
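
The namespace-parsing fix in commit `402d2f9a98` ("Use rsplit instead of split") can be illustrated with a minimal sketch. It assumes keys of the form `{workspace}:{namespace}`, as the workspace isolation commits above suggest. The helper name `parse_namespace` and the empty-string default for a bare namespace are hypothetical and not taken from the LightRAG source.

```python
def parse_namespace(final_namespace: str) -> tuple[str, str]:
    """Split a 'workspace:namespace' key into (workspace, namespace).

    Hypothetical helper for illustration only; the real LightRAG
    function name and return shape may differ.
    """
    if ":" not in final_namespace:
        # Bare namespace such as "pipeline_status": no workspace prefix.
        return "", final_namespace
    # rsplit with maxsplit=1 cuts at the LAST colon, so any colons
    # inside the workspace name stay part of the workspace.
    workspace, namespace = final_namespace.rsplit(":", 1)
    return workspace, namespace


if __name__ == "__main__":
    print(parse_namespace("tenant:a:pipeline_status"))
    print(parse_namespace("pipeline_status"))
```

Splitting from the left with `split(":", 1)` would instead treat `tenant` as the workspace and `a:pipeline_status` as the namespace, which is the kind of mis-parse the commit message describes.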


3430 commits
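
Commit `349c1945db` mentions parallel evaluation using `asyncio.Semaphore` bounded by the `MAX_ASYNC` environment variable. The sketch below shows that general pattern only. The function names `evaluate_all` and `evaluate_one`, the `max_async` parameter, and the fallback value of 4 are assumptions for illustration, not the evaluation script's actual API.

```python
import asyncio
import os


async def evaluate_all(cases, evaluate_one, max_async=None):
    """Run evaluate_one(case) for every case with bounded concurrency.

    Hypothetical sketch: the limit comes from max_async if given,
    otherwise from the MAX_ASYNC environment variable (fallback 4 is
    an assumption, not a documented default).
    """
    limit = max_async or int(os.getenv("MAX_ASYNC", "4"))
    semaphore = asyncio.Semaphore(limit)

    async def bounded(case):
        # At most `limit` coroutines are inside this block at once.
        async with semaphore:
            return await evaluate_one(case)

    # gather preserves input order in its result list.
    return await asyncio.gather(*(bounded(case) for case in cases))


if __name__ == "__main__":
    async def fake_eval(case):
        await asyncio.sleep(0.01)  # stand-in for an HTTP call
        return case * 2

    print(asyncio.run(evaluate_all([1, 2, 3], fake_eval, max_async=2)))
```

Using a semaphore rather than fixed-size batches keeps every slot busy: a new evaluation starts as soon as any running one finishes.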