- Add --docling CLI flag for easier setup
- Add numpy version constraints
- Exclude docling on macOS (fork-safety)
(cherry picked from commit c246eff725)
- Add --docling CLI flag for easier setup
- Add numpy version constraints
- Exclude docling on macOS (fork-safety)
(cherry picked from commit a24d8181c2)
Add 5 markdown documents that users can index to reproduce evaluation results.
Changes:
- Add sample_documents/ folder with 5 markdown files covering LightRAG features
- Update sample_dataset.json with 3 improved, specific test questions
- Shorten and correct evaluation README (remove outdated info about mock responses)
- Add sample_documents reference with expected ~95% RAGAS score
Test Results with sample documents:
- Average RAGAS Score: 95.28%
- Faithfulness: 100%, Answer Relevance: 96.67%
- Context Recall: 88.89%, Context Precision: 95.56%
(cherry picked from commit a172cf893d)
**Lint Fixes (ruff)**:
- Sort imports alphabetically (I001)
- Add blank line after import traceback (E302)
- Add trailing comma to dict literals (COM812)
- Reformat writer.writerow for readability (E501)
**Rename test_dataset.json → sample_dataset.json**:
- Avoids .gitignore pattern conflict (test_* is ignored)
- More descriptive name: it's a sample/template, not actual test data
- Update all references in eval_rag_quality.py and README.md
Resolves lint-and-format CI check failure.
Addresses reviewer feedback about test dataset naming.
(cherry picked from commit 5cdb4b0ef2)
This contribution adds optional Langfuse support for LLM observability and tracing.
Langfuse provides a drop-in replacement for the OpenAI client that automatically
tracks all LLM interactions without requiring code changes.
Features:
- Optional Langfuse integration with graceful fallback
- Automatic LLM request/response tracing
- Token usage tracking
- Latency metrics
- Error tracking
- Zero code changes required for existing functionality
Implementation:
- Modified lightrag/llm/openai.py to conditionally use Langfuse's AsyncOpenAI
- Falls back to standard OpenAI client if Langfuse is not installed
- Logs observability status on import
Configuration:
To enable Langfuse tracing, install the observability extras and set environment variables:
```bash
pip install lightrag-hku[observability]
export LANGFUSE_PUBLIC_KEY="your_public_key"
export LANGFUSE_SECRET_KEY="your_secret_key"
export LANGFUSE_HOST="https://cloud.langfuse.com" # or your self-hosted instance
```
If Langfuse is not installed or environment variables are not set, LightRAG
will use the standard OpenAI client without any functionality changes.
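A minimal sketch of the conditional-import pattern described above; the actual code in lightrag/llm/openai.py may differ in detail:
```python
import logging

try:
    # Langfuse's drop-in wrapper: same interface as openai, automatic tracing.
    from langfuse.openai import AsyncOpenAI
    LANGFUSE_ENABLED = True
except ImportError:
    from openai import AsyncOpenAI  # standard client, no tracing
    LANGFUSE_ENABLED = False

# Log observability status once on import.
logging.getLogger(__name__).info(
    "LLM observability: %s",
    "Langfuse tracing enabled" if LANGFUSE_ENABLED else "Langfuse not installed",
)
```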
Changes:
- Modified lightrag/llm/openai.py (added optional Langfuse import)
- Updated pyproject.toml with optional 'observability' dependencies
Dependencies (optional):
- langfuse>=3.8.1
(cherry picked from commit 626b42bc40)
Previously, configure_vchordrq would fail silently when probes was empty
(the default), preventing epsilon from being configured. Now each parameter
is handled independently with conditional execution, and configuration
errors fail fast instead of being swallowed.
This fixes the issue where the documented epsilon setting was impossible
to use in the default configuration.
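A minimal sketch of the fixed control flow, under the assumption that the settings are applied via `SET vchordrq.*`; the real function in the PostgreSQL backend may differ:
```python
async def configure_vchordrq(conn, probes: str = "", epsilon: float | None = None) -> None:
    # Each parameter is applied independently: an empty `probes` no longer
    # short-circuits the function, so `epsilon` can still be set.
    if probes:
        await conn.execute(f"SET vchordrq.probes = {probes}")
    if epsilon is not None:
        await conn.execute(f"SET vchordrq.epsilon = {epsilon}")
    # No blanket try/except: configuration errors propagate (fail fast)
    # instead of being silently swallowed.
```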
(cherry picked from commit 3096f844fb)
* Unify empty workspace behavior by changing workspace from "_" to ""
* Fix incorrect empty workspace detection in get_all_update_flags_status()
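A minimal sketch of the corrected check, assuming the new convention that "" (or None) denotes the default workspace; the actual code in get_all_update_flags_status() may differ:
```python
def is_empty_workspace(workspace: str | None) -> bool:
    # With "" as the canonical empty workspace, the old "_" sentinel
    # no longer needs special-casing.
    return not workspace  # "" and None both mean the default workspace
```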
(cherry picked from commit d54d0d55d9)
- Add _default_workspace to global vars
- Set _default_workspace to None on cleanup
- Ensure complete resource cleanup
- Fix missing workspace finalization
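A minimal sketch of the cleanup pattern described above; the variable and function names mirror the bullets, not necessarily the actual module:
```python
# Hypothetical module-level state; names follow the commit bullets.
_shared_data = None
_default_workspace = None

def finalize_share_data() -> None:
    """Release all module-level resources, including the workspace default."""
    global _shared_data, _default_workspace
    _shared_data = None
    _default_workspace = None  # previously missed during finalization
```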
(cherry picked from commit 6d6716e9f8)
• Add workspace param to get_namespace_data
• Update docstring with proper usage example
• Simplify demo to show correct workflow
• Remove confusing before/after comparison
• Clarify tool should run after init
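A hedged usage sketch matching the bullets above; the exact signature of get_namespace_data and the namespace name are assumptions:
```python
from lightrag.kg.shared_storage import get_namespace_data

async def show_update_flags(workspace: str = "") -> None:
    # Must run after LightRAG/storage initialization, per the bullet above.
    data = await get_namespace_data("update_flags", workspace=workspace)
    print(data)
```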
(cherry picked from commit 393f880311)
- Fast path for clean data (no sanitization)
- Slow path sanitizes during encoding
- Reload shared memory after sanitization
- Custom encoder avoids deep copies
- Comprehensive test coverage
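A minimal sketch of the two-path idea, assuming "sanitization" means replacing values json cannot encode; the real encoder and its sanitization rules may differ:
```python
import json

class SanitizingEncoder(json.JSONEncoder):
    """Sanitizes while encoding, avoiding a deep copy of the input."""

    def default(self, obj):
        # Called only for values json cannot serialize natively.
        return str(obj)  # replace instead of raising TypeError

def encode_shared_data(data) -> str:
    try:
        return json.dumps(data)  # fast path: clean data, no sanitization
    except (TypeError, ValueError):
        # Slow path: sanitize during encoding; the caller then reloads the
        # sanitized result back into shared memory (per the bullets above).
        return json.dumps(data, cls=SanitizingEncoder)
```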
(cherry picked from commit 777c987371)
- Snapshot JSON data before yielding batches
- Release lock during batch processing
- Exclude source type from target selection
- Add detailed docstring for lock behavior
- Filter available storage types properly
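A minimal sketch of the snapshot-then-release pattern from the bullets above (names illustrative):
```python
import asyncio

async def iter_json_batches(shared_json: dict, lock: asyncio.Lock, batch_size: int = 100):
    async with lock:
        # Snapshot under the lock so writers can't mutate mid-iteration.
        items = list(shared_json.items())
    # Lock released here: slow batch processing no longer blocks writers.
    for i in range(0, len(items), batch_size):
        yield items[i:i + batch_size]
```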
(cherry picked from commit 5be04263b2)
• Add MongoDB env requirements
• Support config.ini fallback
• Warn on missing env vars
• Check available storage count
• Show config source info
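A hedged sketch of the lookup order described above; the env var and config.ini section names are assumptions:
```python
import os
import warnings
from configparser import ConfigParser

def get_mongo_uri(config_path: str = "config.ini") -> str | None:
    uri = os.environ.get("MONGO_URI")  # environment takes precedence
    if uri:
        return uri
    parser = ConfigParser()
    if parser.read(config_path) and parser.has_option("mongodb", "uri"):
        return parser.get("mongodb", "uri")  # config.ini fallback
    warnings.warn("MONGO_URI not set and no [mongodb] uri in config.ini")
    return None
```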
(cherry picked from commit 1a91bcdb5f)
• Move rag_semaphore to wrap full function
• Increase RAG concurrency to 2x eval limit
• Prevent memory buildup from slow evals
• Keep eval_semaphore for RAGAS control
(cherry picked from commit e5abe9dd3d)
• Split RAG gen and eval stages
• Add rag_semaphore for stage 1
• Add eval_semaphore for stage 2
• Improve concurrency control
• Update connection pool limits
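A minimal sketch combining the two commit messages above: one semaphore per stage, with the RAG limit at twice the eval limit and the semaphore wrapping the full generation function (names and limits illustrative):
```python
import asyncio

EVAL_CONCURRENCY = 4
eval_semaphore = asyncio.Semaphore(EVAL_CONCURRENCY)
rag_semaphore = asyncio.Semaphore(EVAL_CONCURRENCY * 2)  # 2x eval limit

async def generate_answer(question: str) -> str:
    async with rag_semaphore:  # wraps the full function, bounding memory
        await asyncio.sleep(0)  # placeholder for the actual RAG query
        return f"answer: {question}"

async def evaluate_answer(question: str, answer: str) -> float:
    async with eval_semaphore:  # RAGAS calls stay under their own limit
        await asyncio.sleep(0)  # placeholder for the RAGAS evaluation
        return 1.0
```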
(cherry picked from commit 83715a3ac1)
- Batch index existence checks into single query (16+ queries -> 1 query)
- Batch timestamp column checks into single query (8 queries -> 1 query)
- Batch field length checks into single query (5 queries -> 1 query)
Performance improvement: ~70-80% faster initialization (35s -> 5-10s)
Key optimizations:
1. check_tables(): Use ANY($1) to check all indexes at once
2. _migrate_timestamp_columns(): Batch all column type checks
3. _migrate_field_lengths(): Batch all field definition checks
All changes are backward compatible with no schema or API changes.
Reduces database round-trips by batching information_schema queries.
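A hedged sketch of the ANY($1) batching (asyncpg-style; table and index names are illustrative):
```python
EXPECTED_INDEXES = ["idx_chunks_id", "idx_entities_name"]  # illustrative

async def find_missing_indexes(connection) -> set[str]:
    # One round-trip replaces a per-index existence check.
    rows = await connection.fetch(
        "SELECT indexname FROM pg_indexes WHERE indexname = ANY($1)",
        EXPECTED_INDEXES,
    )
    existing = {row["indexname"] for row in rows}
    return set(EXPECTED_INDEXES) - existing
```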
(cherry picked from commit 2f22336ace)