LightRAG

Author	SHA1	Message	Date
BukeLy	7dc1f83efb	fix: PostgreSQL read methods and delete_entity_relation bugs Why this change is needed: After implementing model isolation, two critical bugs were discovered that would cause data access failures: Bug 1: In delete_entity_relation(), the SQL query uses positional parameters ($1, $2) but the parameter dict was not converted to a list of values before passing to db.execute(). This caused parameter binding failures when trying to delete entity relations. Bug 2: Four read methods (get_by_id, get_by_ids, get_vectors_by_ids, drop) were still using namespace_to_table_name(self.namespace) to get legacy table names instead of self.table_name with model suffix. This meant these methods would query the wrong table (legacy without suffix) while data was being inserted into the new table (with suffix), causing data not found errors. How it solves it: - Bug 1: Convert parameter dict to list using list(params.values()) before passing to db.execute(), matching the pattern used in other methods - Bug 2: Replace all namespace_to_table_name(self.namespace) calls with self.table_name in the four affected methods, ensuring they query the correct model-specific table Impact: - delete_entity_relation now correctly deletes relations by entity name - All read operations now correctly query model-specific tables - Data written with model isolation can now be properly retrieved - Maintains consistency with write operations using self.table_name Testing: - All 6 PostgreSQL migration tests pass (test_postgres_migration.py) - All 6 Qdrant migration tests pass (test_qdrant_migration.py) - Verified parameter binding works correctly - Verified read methods access correct tables	2025-11-19 23:01:01 +08:00
BukeLy	ad68624d02	feat: PostgreSQL model isolation and auto-migration Why this change is needed: PostgreSQL vector storage needs model isolation to prevent dimension conflicts when different workspaces use different embedding models. Without this, the first workspace locks the vector dimension for all subsequent workspaces, causing failures. How it solves it: - Implements dynamic table naming with model suffix: {table}_{model}_{dim}d - Adds setup_table() method mirroring Qdrant's approach for consistency - Implements 4-branch migration logic: both exist -> warn, only new -> use, neither -> create, only legacy -> migrate - Batch migration: 500 records/batch (same as Qdrant) - No automatic rollback to support idempotent re-runs Impact: - PostgreSQL tables now isolated by embedding model and dimension - Automatic data migration from legacy tables on startup - Backward compatible: model_name=None defaults to "unknown" - All SQL operations use dynamic table names Testing: - 6 new tests for PostgreSQL migration (100% pass) - Tests cover: naming, migration trigger, scenarios 1-3 - 3 additional scenario tests added for Qdrant completeness Co-Authored-By: Claude <noreply@anthropic.com>	2025-11-19 22:54:37 +08:00
BukeLy	df5aacb545	feat: Qdrant model isolation and auto-migration Why this change is needed: To implement vector storage model isolation for Qdrant, allowing different workspaces to use different embedding models without conflict, and automatically migrating existing data. How it solves it: - Modified QdrantVectorDBStorage to use model-specific collection suffixes - Implemented automated migration logic from legacy collections to new schema - Fixed Shared-Data lock re-entrancy issue in multiprocess mode - Added comprehensive tests for collection naming and migration triggers Impact: - Existing users will have data automatically migrated on next startup - New workspaces will use isolated collections based on embedding model - Fixes potential lock-related bugs in shared storage Testing: - Added tests/test_qdrant_migration.py passing - Verified migration logic covers all 4 states (New/Legacy existence combinations)	2025-11-19 18:47:38 +08:00
yangdx	dbae327a17	Merge branch 'main' into dev-postgres-vchordrq	2025-11-18 22:13:27 +08:00
yangdx	3096f844fb	fix(postgres): allow vchordrq.epsilon config when probes is empty Previously, configure_vchordrq would fail silently when probes was empty (the default), preventing epsilon from being configured. Now each parameter is handled independently with conditional execution, and configuration errors fail-fast instead of being swallowed. This fixes the documented epsilon setting being impossible to use in the default configuration.	2025-11-18 21:58:36 +08:00
yangdx	f8dd2e0724	Fix namespace parsing when workspace contains colons • Use rsplit instead of split • Handle colons in workspace names	2025-11-18 12:23:05 +08:00
wmsnp	d07023c962	feat(postgres_impl): add vchordrq vector index support and unify vector index creation logic	2025-11-18 11:45:16 +08:00
yangdx	6d6716e9f8	Add _default_workspace to shared storage finalization - Add _default_workspace to global vars - Set _default_workspace to None on cleanup - Ensure complete resource cleanup - Fix missing workspace finalization	2025-11-17 13:46:46 +08:00
yangdx	e8383df3b8	Fix NamespaceLock context variable timing to prevent lock bricking * Acquire lock before setting ContextVar * Prevent state corruption on cancellation * Fix permanent lock brick scenario * Store context only after success * Handle acquisition failure properly	2025-11-17 12:54:33 +08:00
yangdx	95e1fb1612	Remove final_namespace attribute for in-memory storage and use namespace in clean_llm_query_cache.py	2025-11-17 12:54:33 +08:00
yangdx	7ed0eac4c9	Fix workspace filtering logic in get_all_update_flags_status • Handle namespaces with/without prefixes • Fix workspace matching logic	2025-11-17 12:54:33 +08:00
yangdx	78689e8837	Fix pipeline status namespace check to handle root case - Add check for bare "pipeline_status" - Handle namespace without prefix	2025-11-17 12:54:33 +08:00
yangdx	d54d0d55d9	Standardize empty workspace handling from "_" to "" across storage * Unify empty workspace behavior by changing workspace from "_" to "" * Fixed incorrect empty workspace detection in get_all_update_flags_status()	2025-11-17 12:54:33 +08:00
yangdx	b6a5a90eaf	Fix NamespaceLock concurrent coroutine safety with ContextVar - Use ContextVar for per-coroutine storage - Prevent state interference between coroutines - Add re-entrance protection check	2025-11-17 12:54:33 +08:00
yangdx	fd486bc922	Refactor storage classes to use namespace instead of final_namespace	2025-11-17 12:54:33 +08:00
yangdx	01814bfc7a	Fix missing function call parentheses in get_all_update_flags_status	2025-11-17 12:54:33 +08:00
yangdx	7deb9a64b9	Refactor namespace lock to support reusable async context manager • Add NamespaceLock class wrapper • Fix lock re-entrance issues • Enable concurrent lock usage • Fresh context per async with block • Update get_namespace_lock API	2025-11-17 12:54:33 +08:00
yangdx	52c812b9a0	Fix workspace isolation for pipeline status across all operations - Fix final_namespace error in get_namespace_data() - Fix get_workspace_from_request return type - Add workspace param to pipeline status calls	2025-11-17 12:54:33 +08:00
yangdx	926960e957	Refactor workspace handling to use default workspace and namespace locks - Remove DB-specific workspace configs - Add default workspace auto-setting - Replace global locks with namespace locks - Simplify pipeline status management - Remove redundant graph DB locking	2025-11-17 12:54:33 +08:00
yangdx	8283c86bce	Refactor exception handling in MemgraphStorage label methods	2025-11-17 12:54:32 +08:00
yangdx	423e4e927a	Fix null reference errors in graph database error handling - Initialize result vars to None - Add null checks before consume calls - Prevent crashes in except blocks - Apply fix to both Neo4J and Memgraph	2025-11-17 12:54:32 +08:00
yangdx	a08bc72635	Fix empty dict handling after JSON sanitization • Replace truthy checks with `is not None` • Handle empty dict edge case properly • Prevent data reload failures • Add comprehensive test coverage • Fix JsonKVStorage and DocStatusStorage	2025-11-17 12:54:32 +08:00
yangdx	cca0800ed4	Fix migration to reload sanitized data and prevent memory corruption • Reload cleaned data after sanitization • Update shared memory with clean data • Add specific surrogate char tests • Test migration sanitization flow • Prevent dirty data in memory	2025-11-17 12:54:32 +08:00
yangdx	f289cf6225	Optimize JSON write with fast/slow path to reduce memory usage - Fast path for clean data (no sanitization) - Slow path sanitizes during encoding - Reload shared memory after sanitization - Custom encoder avoids deep copies - Comprehensive test coverage	2025-11-17 12:54:32 +08:00
BukeLy	18a4870229	fix: Add default workspace support for backward compatibility Fixes two compatibility issues in workspace isolation: 1. Problem: lightrag_server.py calls initialize_pipeline_status() without workspace parameter, causing pipeline to initialize in global namespace instead of rag's workspace. Solution: Add set_default_workspace() mechanism in shared_storage. LightRAG.initialize_storages() now sets default workspace, which initialize_pipeline_status() uses when called without parameters. 2. Problem: /health endpoint hardcoded to use "pipeline_status", cannot return workspace-specific status or support frontend workspace selection. Solution: Add LIGHTRAG-WORKSPACE header support. Endpoint now extracts workspace from header or falls back to server default, returning correct workspace-specific pipeline status. Changes: - lightrag/kg/shared_storage.py: Add set/get_default_workspace() - lightrag/lightrag.py: Call set_default_workspace() in initialize_storages() - lightrag/api/lightrag_server.py: Add get_workspace_from_request() helper, update /health endpoint to support LIGHTRAG-WORKSPACE header Testing: - Backward compatibility: Old code works without modification - Multi-instance safety: Explicit workspace passing preserved - /health endpoint: Supports both default and header-specified workspaces Related: #2353	2025-11-17 12:54:20 +08:00
BukeLy	eb52ec94d7	feat: Add workspace isolation support for pipeline status Problem: In multi-tenant scenarios, different workspaces share a single global pipeline_status namespace, causing pipelines from different tenants to block each other, severely impacting concurrent processing performance. Solution: - Extended get_namespace_data() to recognize workspace-specific pipeline namespaces with pattern "{workspace}:pipeline" (following GraphDB pattern) - Added workspace parameter to initialize_pipeline_status() for per-tenant isolated pipeline namespaces - Updated all 7 call sites to use workspace-aware locks: * lightrag.py: process_document_queue(), aremove_document() * document_routes.py: background_delete_documents(), clear_documents(), cancel_pipeline(), get_pipeline_status(), delete_documents() Impact: - Different workspaces can process documents concurrently without blocking - Backward compatible: empty workspace defaults to "pipeline_status" - Maintains fail-fast: uninitialized pipeline raises clear error - Expected N× performance improvement for N concurrent tenants Bug fixes: - Fixed AttributeError by using self.workspace instead of self.global_config - Fixed pipeline status endpoint to show workspace-specific status - Fixed delete endpoint to check workspace-specific busy flag Code changes: 4 files, 141 insertions(+), 28 deletions(-) Testing: All syntax checks passed, comprehensive workspace isolation tests completed	2025-11-17 12:53:44 +08:00
yangdx	4401f86f07	Refactor exception handling in MemgraphStorage label methods	2025-11-14 11:01:26 +08:00
yangdx	1ccef2b932	Fix null reference errors in graph database error handling - Initialize result vars to None - Add null checks before consume calls - Prevent crashes in except blocks - Apply fix to both Neo4J and Memgraph	2025-11-14 10:39:04 +08:00
yangdx	70cc2419f2	Fix empty dict handling after JSON sanitization • Replace truthy checks with `is not None` • Handle empty dict edge case properly • Prevent data reload failures • Add comprehensive test coverage • Fix JsonKVStorage and DocStatusStorage	2025-11-12 16:40:57 +08:00
yangdx	dcf1d28681	Fix migration to reload sanitized data and prevent memory corruption • Reload cleaned data after sanitization • Update shared memory with clean data • Add specific surrogate char tests • Test migration sanitization flow • Prevent dirty data in memory	2025-11-12 16:16:28 +08:00
yangdx	777c987371	Optimize JSON write with fast/slow path to reduce memory usage - Fast path for clean data (no sanitization) - Slow path sanitizes during encoding - Reload shared memory after sanitization - Custom encoder avoids deep copies - Comprehensive test coverage	2025-11-12 13:48:56 +08:00
yangdx	1a91bcdb5f	Improve storage config validation and add config.ini fallback support • Add MongoDB env requirements • Support config.ini fallback • Warn on missing env vars • Check available storage count • Show config source info	2025-11-08 22:48:49 +08:00
yangdx	3276b7a49d	Fix linting	2025-11-06 20:48:51 +08:00
yangdx	155f59759b	Fix node ID normalization and improve batch operation consistency • Remove premature ID normalization • Add lookup mapping for node resolution • Filter results by requested nodes only • Improve error logging with workspace	2025-11-06 20:34:53 +08:00
yangdx	807d2461d3	Remove unused chunk-based node/edge retrieval methods	2025-11-06 18:17:10 +08:00
yangdx	5f4a280458	Add Qdrant legacy collection migration with workspace support - Add QdrantMigrationError exception - Implement automatic data migration - Support workspace-based partitioning - Add migration verification logic - Update collection naming scheme	2025-10-30 19:16:33 +08:00
yangdx	f610fdaf9b	Merge branch 'main' into Anush008/main	2025-10-30 11:07:39 +08:00
yangdx	d5bcd14c6f	Refactor service deployment to use direct process execution - Remove bash wrapper script - Update systemd service configuration - Improve process management for gunicorn - Simplify shared storage cleanup logic - Update documentation for deployment	2025-10-29 18:55:47 +08:00
yangdx	6489aaa7f0	Remove worker_exit hook and improve cleanup logging • Remove unreliable worker_exit function • Add debug logs for cleanup modes • Move DEBUG_LOCKS to top of file	2025-10-29 15:15:13 +08:00
yangdx	72b29659c9	Fix worker process cleanup to prevent shared resource conflicts • Add worker_exit hook in gunicorn config • Add shutdown_manager parameter in finalize_share_data of share_storage • Prevent Manager shutdown in workers • Remove custom signal handlers	2025-10-29 13:33:21 +08:00
yangdx	0692175c7b	Remove enable_logging parameter from get_data_init_lock call in MilvusVectorDBStorage	2025-10-29 09:49:59 +08:00
yangdx	411e92e6b9	Fix vector deletion logging to show actual deleted count	2025-10-27 14:22:16 +08:00
Anush008	8584980e3a	refactor: Qdrant Multi-tenancy (Include staged) Signed-off-by: Anush008 <anushshetty90@gmail.com>	2025-10-26 09:58:24 +05:30
yangdx	a97e5dad4c	Optimize PostgreSQL graph queries to avoid Cypher overhead and complexity • Replace Cypher with native SQL queries • Fix O(N²) to O(E) performance issue • Add error handling for parse failures • Use direct table access pattern • Eliminate Cartesian product joins	2025-10-25 14:37:18 +08:00
yangdx	083b163c1f	Improve lock logging with consistent messaging and debug levels	2025-10-25 11:04:21 +08:00
yangdx	a9ec15e669	Resolve lock leakage issue during user cancellation handling • Change default log level to INFO • Force enable error logging output • Add lock cleanup rollback protection • Handle LLM cache persistence errors • Fix async task exception handling	2025-10-25 03:06:45 +08:00
yangdx	0fa9a2eee3	Fix dimension type comparison in Milvus vector field validation • Convert dimensions to int for comparison • Handle string vs int type mismatches	2025-10-22 23:37:49 +08:00
Daniel.y	907204714b	Merge pull request #2237 from yrangana/feat/optimize-postgres-initialization Optimize PostgreSQL initialization performance	2025-10-21 22:17:46 +08:00
yangdx	e5e16b7bd1	Fix Redis data migration error • Use proper Redis connection context • Fix namespace pattern for key scanning • Propagate storage check exceptions • Remove defensive error swallowing	2025-10-21 16:27:04 +08:00
Yasiru Rangana	2f22336ace	Optimize PostgreSQL initialization performance - Batch index existence checks into single query (16+ queries -> 1 query) - Batch timestamp column checks into single query (8 queries -> 1 query) - Batch field length checks into single query (5 queries -> 1 query) Performance improvement: ~70-80% faster initialization (35s -> 5-10s) Key optimizations: 1. check_tables(): Use ANY($1) to check all indexes at once 2. _migrate_timestamp_columns(): Batch all column type checks 3. _migrate_field_lengths(): Batch all field definition checks All changes are backward compatible with no schema or API changes. Reduces database round-trips by batching information_schema queries.	2025-10-21 01:09:48 +11:00

1 2 3 4 5 ...

900 commits