LightRAG

Author	SHA1	Message	Date
BukeLy	4c12301e81	fix: correct parameter passing in delete_entity_relation Why this change is needed: The previous fix in commit `7dc1f83e` incorrectly "fixed" delete_entity_relation by converting the parameter dict to a list. However, PostgreSQLDB.execute() expects a dict[str, Any] parameter, not a list. The execute() method internally converts the dict values to a tuple (line 1487: tuple(data.values())), so passing a list bypasses the expected interface and causes parameter binding issues. What was wrong: `params = {"workspace": self.workspace, "entity_name": entity_name}` followed by `await self.db.execute(delete_sql, list(params.values()))` (wrong). The correct approach (matching the delete_entity method): `await self.db.execute(delete_sql, {"workspace": self.workspace, "entity_name": entity_name})`. How it solves it: - Pass parameters as a dict directly to db.execute(), matching the method signature - Maintain consistency with delete_entity(), which correctly passes a dict - Let db.execute() handle the dict-to-tuple conversion internally as designed Impact: - delete_entity_relation now correctly passes parameters to PostgreSQL - Method interface consistency with other delete operations - Proper parameter binding ensures reliable entity relation deletion Testing: - All 6 PostgreSQL migration tests pass - Verified parameter passing matches delete_entity pattern - Code review identified the issue before production use Related: - Fixes incorrect "fix" from commit `7dc1f83e` - Aligns with PostgreSQLDB.execute() interface (line 1477-1480)	2025-11-19 23:31:09 +08:00
BukeLy	7dc1f83efb	fix: PostgreSQL read methods and delete_entity_relation bugs Why this change is needed: After implementing model isolation, two critical bugs were discovered that would cause data access failures: Bug 1: In delete_entity_relation(), the SQL query uses positional parameters ($1, $2) but the parameter dict was not converted to a list of values before passing to db.execute(). This caused parameter binding failures when trying to delete entity relations. Bug 2: Four read methods (get_by_id, get_by_ids, get_vectors_by_ids, drop) were still using namespace_to_table_name(self.namespace) to get legacy table names instead of self.table_name with model suffix. This meant these methods would query the wrong table (legacy without suffix) while data was being inserted into the new table (with suffix), causing data not found errors. How it solves it: - Bug 1: Convert parameter dict to list using list(params.values()) before passing to db.execute(), matching the pattern used in other methods - Bug 2: Replace all namespace_to_table_name(self.namespace) calls with self.table_name in the four affected methods, ensuring they query the correct model-specific table Impact: - delete_entity_relation now correctly deletes relations by entity name - All read operations now correctly query model-specific tables - Data written with model isolation can now be properly retrieved - Maintains consistency with write operations using self.table_name Testing: - All 6 PostgreSQL migration tests pass (test_postgres_migration.py) - All 6 Qdrant migration tests pass (test_qdrant_migration.py) - Verified parameter binding works correctly - Verified read methods access correct tables	2025-11-19 23:01:01 +08:00
BukeLy	ad68624d02	feat: PostgreSQL model isolation and auto-migration Why this change is needed: PostgreSQL vector storage needs model isolation to prevent dimension conflicts when different workspaces use different embedding models. Without this, the first workspace locks the vector dimension for all subsequent workspaces, causing failures. How it solves it: - Implements dynamic table naming with model suffix: {table}_{model}_{dim}d - Adds setup_table() method mirroring Qdrant's approach for consistency - Implements 4-branch migration logic: both exist -> warn, only new -> use, neither -> create, only legacy -> migrate - Batch migration: 500 records/batch (same as Qdrant) - No automatic rollback to support idempotent re-runs Impact: - PostgreSQL tables now isolated by embedding model and dimension - Automatic data migration from legacy tables on startup - Backward compatible: model_name=None defaults to "unknown" - All SQL operations use dynamic table names Testing: - 6 new tests for PostgreSQL migration (100% pass) - Tests cover: naming, migration trigger, scenarios 1-3 - 3 additional scenario tests added for Qdrant completeness Co-Authored-By: Claude <noreply@anthropic.com>	2025-11-19 22:54:37 +08:00
BukeLy	df5aacb545	feat: Qdrant model isolation and auto-migration Why this change is needed: To implement vector storage model isolation for Qdrant, allowing different workspaces to use different embedding models without conflict, and automatically migrating existing data. How it solves it: - Modified QdrantVectorDBStorage to use model-specific collection suffixes - Implemented automated migration logic from legacy collections to new schema - Fixed Shared-Data lock re-entrancy issue in multiprocess mode - Added comprehensive tests for collection naming and migration triggers Impact: - Existing users will have data automatically migrated on next startup - New workspaces will use isolated collections based on embedding model - Fixes potential lock-related bugs in shared storage Testing: - Added tests/test_qdrant_migration.py passing - Verified migration logic covers all 4 states (New/Legacy existence combinations)	2025-11-19 18:47:38 +08:00
BukeLy	13f2440bbf	feat: enhance BaseVectorStorage for model isolation Why this change is needed: To enforce consistent naming and migration strategy across all vector storages. How it solves it: - Added _generate_collection_suffix() helper - Added _get_legacy_collection_name() and _get_new_collection_name() interfaces Impact: Prepares storage implementations for multi-model support. Testing: Added tests/test_base_storage_integrity.py passing.	2025-11-19 02:15:22 +08:00
BukeLy	5c10d3d58e	feat: enhance EmbeddingFunc with model_name support Why this change is needed: To support vector storage model isolation, we need to track which model is used for embeddings and generate unique identifiers for collections/tables. How it solves it: - Added model_name field to EmbeddingFunc - Added get_model_identifier() method to generate sanitized suffix - Added unit tests to verify behavior Impact: Enables subsequent changes in storage backends to isolate data by model. Testing: Added tests/test_embedding_func.py passing.	2025-11-19 02:11:39 +08:00
yangdx	d16c7840ab	Bump API version to 0256	2025-11-18 23:15:31 +08:00
yangdx	e77340d4a1	Adjust chunking parameters to match the default environment variable settings	2025-11-18 23:14:50 +08:00
yangdx	1bfa1f81cb	Merge branch 'main' into fix_chunk_comment	2025-11-18 22:38:50 +08:00
yangdx	9c10c87554	Fix linting	2025-11-18 22:38:43 +08:00
yangdx	dbae327a17	Merge branch 'main' into dev-postgres-vchordrq	2025-11-18 22:13:27 +08:00
yangdx	3096f844fb	fix(postgres): allow vchordrq.epsilon config when probes is empty Previously, configure_vchordrq would fail silently when probes was empty (the default), preventing epsilon from being configured. Now each parameter is handled independently with conditional execution, and configuration errors fail-fast instead of being swallowed. This fixes the documented epsilon setting being impossible to use in the default configuration.	2025-11-18 21:58:36 +08:00
EightyOliveira	dacca334e0	refactor(chunking): rename params and improve docstring for chunking_by_token_size	2025-11-18 15:46:28 +08:00
yangdx	702cfd2981	Fix document deletion concurrency control and validation logic • Clarify job naming for single vs batch deletion • Update job name validation in busy pipeline check	2025-11-18 13:59:24 +08:00
yangdx	4048fc4b89	Fix: auto-acquire pipeline when idle in document deletion • Track if we acquired the pipeline lock • Auto-acquire pipeline when idle • Only release if we acquired it • Prevent concurrent deletion conflicts • Improve deletion job validation	2025-11-18 13:25:13 +08:00
yangdx	1745b30a5f	Fix missing workspace parameter in update flags status call	2025-11-18 12:55:48 +08:00
yangdx	f8dd2e0724	Fix namespace parsing when workspace contains colons • Use rsplit instead of split • Handle colons in workspace names	2025-11-18 12:23:05 +08:00
wmsnp	d07023c962	feat(postgres_impl): add vchordrq vector index support and unify vector index creation logic	2025-11-18 11:45:16 +08:00
yangdx	6cef8df159	Reduce log level and improve workspace mismatch message clarity • Change warning to info level • Simplify workspace mismatch wording	2025-11-18 08:25:21 +08:00
yangdx	ddc76f0c80	Merge branch 'main' into workspace-isolation	2025-11-17 17:08:07 +08:00
yangdx	9262f66d13	Bump API version to 0255	2025-11-17 17:07:18 +08:00
yangdx	393f880311	Improve LightRAG initialization checker tool with better usage docs • Add workspace param to get_namespace_data • Update docstring with proper usage example • Simplify demo to show correct workflow • Remove confusing before/after comparison • Clarify tool should run after init	2025-11-17 15:42:54 +08:00
yangdx	9d7b7981ce	Add pipeline status validation before document deletion	2025-11-17 14:58:10 +08:00
yangdx	98e964dfc4	Fix initialization instructions in check_lightrag_setup function	2025-11-17 14:27:26 +08:00
yangdx	6d6716e9f8	Add _default_workspace to shared storage finalization - Add _default_workspace to global vars - Set _default_workspace to None on cleanup - Ensure complete resource cleanup - Fix missing workspace finalization	2025-11-17 13:46:46 +08:00
yangdx	f1d8f18c80	Merge branch 'main' into workspace-isolation	2025-11-17 13:01:33 +08:00
yangdx	cdd53ee875	Remove manual initialize_pipeline_status() calls across codebase - Auto-init pipeline status in storages - Remove redundant import statements - Simplify initialization pattern - Update docs and examples	2025-11-17 12:54:33 +08:00
yangdx	e22ac52ebc	Auto-initialize pipeline status in LightRAG.initialize_storages() • Remove manual initialize_pipeline_status calls • Auto-init in initialize_storages method • Update error messages for clarity • Warn on workspace conflicts	2025-11-17 12:54:33 +08:00
yangdx	e8383df3b8	Fix NamespaceLock context variable timing to prevent lock bricking * Acquire lock before setting ContextVar * Prevent state corruption on cancellation * Fix permanent lock brick scenario * Store context only after success * Handle acquisition failure properly	2025-11-17 12:54:33 +08:00
yangdx	95e1fb1612	Remove final_namespace attribute for in-memory storage and use namespace in clean_llm_query_cache.py	2025-11-17 12:54:33 +08:00
yangdx	7ed0eac4c9	Fix workspace filtering logic in get_all_update_flags_status • Handle namespaces with/without prefixes • Fix workspace matching logic	2025-11-17 12:54:33 +08:00
yangdx	78689e8837	Fix pipeline status namespace check to handle root case - Add check for bare "pipeline_status" - Handle namespace without prefix	2025-11-17 12:54:33 +08:00
yangdx	d54d0d55d9	Standardize empty workspace handling from "_" to "" across storage * Unify empty workspace behavior by changing workspace from "_" to "" * Fixed incorrect empty workspace detection in get_all_update_flags_status()	2025-11-17 12:54:33 +08:00
yangdx	b6a5a90eaf	Fix NamespaceLock concurrent coroutine safety with ContextVar - Use ContextVar for per-coroutine storage - Prevent state interference between coroutines - Add re-entrance protection check	2025-11-17 12:54:33 +08:00
yangdx	fd486bc922	Refactor storage classes to use namespace instead of final_namespace	2025-11-17 12:54:33 +08:00
yangdx	01814bfc7a	Fix missing function call parentheses in get_all_update_flags_status	2025-11-17 12:54:33 +08:00
yangdx	7deb9a64b9	Refactor namespace lock to support reusable async context manager • Add NamespaceLock class wrapper • Fix lock re-entrance issues • Enable concurrent lock usage • Fresh context per async with block • Update get_namespace_lock API	2025-11-17 12:54:33 +08:00
yangdx	52c812b9a0	Fix workspace isolation for pipeline status across all operations - Fix final_namespace error in get_namespace_data() - Fix get_workspace_from_request return type - Add workspace param to pipeline status calls	2025-11-17 12:54:33 +08:00
yangdx	926960e957	Refactor workspace handling to use default workspace and namespace locks - Remove DB-specific workspace configs - Add default workspace auto-setting - Replace global locks with namespace locks - Simplify pipeline status management - Remove redundant graph DB locking	2025-11-17 12:54:33 +08:00
yangdx	ec05d89c2a	Add macOS fork safety check for Gunicorn multi-worker mode • Check OBJC_DISABLE_INITIALIZE_FORK_SAFETY • Prevent NumPy/Accelerate crashes • Show detailed error message • Provide multiple fix options • Exit early if misconfigured	2025-11-17 12:54:33 +08:00
yangdx	e5addf4d94	Improve embedding config priority and add debug logging • Fix embedding_dim priority logic • Add final config logging	2025-11-17 12:54:32 +08:00
yangdx	2fb57e767d	Fix embedding token limit initialization order * Capture max_token_size before decorator * Apply wrapper after capturing attribute * Prevent decorator from stripping dataclass * Ensure token limit is properly set	2025-11-17 12:54:32 +08:00
yangdx	6b2af2b579	Refactor embedding function creation with proper attribute inheritance - Extract max_token_size from providers - Avoid double-wrapping EmbeddingFunc - Improve configuration priority logic - Add comprehensive debug logging - Return complete EmbeddingFunc instance	2025-11-17 12:54:32 +08:00
yangdx	f0254773c6	Convert embedding_token_limit from property to field with __post_init__ • Remove property decorator • Add field with init=False • Set value in __post_init__ method • embedding_token_limit is now in config dictionary	2025-11-17 12:54:32 +08:00
yangdx	14a6c24ed7	Add configurable embedding token limit with validation - Add EMBEDDING_TOKEN_LIMIT env var - Set max_token_size on embedding func - Add token limit property to LightRAG - Validate summary length vs limit - Log warning when limit exceeded	2025-11-17 12:54:32 +08:00
yangdx	f5b48587ed	Improve Bedrock error handling with retry logic and custom exceptions • Add specific exception types • Implement proper retry mechanism • Better error classification • Enhanced logging and validation • Enable embedding retry decorator	2025-11-17 12:54:32 +08:00
yangdx	77221564b0	Add max_token_size parameter to embedding function decorators - Add max_token_size=8192 to all embed funcs - Move siliconcloud to deprecated folder - Import wrap_embedding_func_with_attrs - Update EmbeddingFunc docstring - Fix langfuse import type annotation	2025-11-17 12:54:32 +08:00
yangdx	8283c86bce	Refactor exception handling in MemgraphStorage label methods	2025-11-17 12:54:32 +08:00
yangdx	423e4e927a	Fix null reference errors in graph database error handling - Initialize result vars to None - Add null checks before consume calls - Prevent crashes in except blocks - Apply fix to both Neo4J and Memgraph	2025-11-17 12:54:32 +08:00
yangdx	2f2f35b883	Add macOS compatibility check for DOCLING with multi-worker Gunicorn	2025-11-17 12:54:32 +08:00

3689 commits
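Commits 7dc1f83efb and 4c12301e81 both turn on how `PostgreSQLDB.execute()` binds parameters. The sketch below uses a hypothetical `FakePostgresDB` that mimics only the behaviour the commit message describes (the dict's values, via `tuple(data.values())`, feed `$1, $2, ...`); it is not the real class, and the table name in the SQL is illustrative. In this sketch a list argument fails immediately because lists have no `.values()`; the real method's failure mode may differ.

```python
import asyncio
from typing import Any, Dict, Optional


class FakePostgresDB:
    """Hypothetical stand-in for PostgreSQLDB: records the values it would bind."""

    def __init__(self) -> None:
        self.bound: Optional[tuple] = None

    async def execute(self, sql: str, data: Optional[Dict[str, Any]] = None) -> None:
        # Assumed behaviour (from the commit message): the dict's values, in
        # insertion order, become the positional arguments for $1, $2, ...
        self.bound = tuple(data.values()) if data else None


# Illustrative SQL; the real table name is model-specific.
DELETE_SQL = "DELETE FROM example_table WHERE workspace=$1 AND entity_name=$2"


async def delete_entity_relation(db: FakePostgresDB, workspace: str, entity_name: str) -> None:
    # Correct pattern: hand execute() the dict and let it build the tuple.
    await db.execute(DELETE_SQL, {"workspace": workspace, "entity_name": entity_name})


db = FakePostgresDB()
asyncio.run(delete_entity_relation(db, "ws1", "Alice"))
print(db.bound)  # ('ws1', 'Alice')

# The 7dc1f83e variant passed a list; in this sketch that fails loudly.
try:
    asyncio.run(db.execute(DELETE_SQL, ["ws1", "Alice"]))  # type: ignore[arg-type]
except AttributeError as exc:
    print("list rejected:", exc)
```

Keeping the dict as the single calling convention means every delete method binds parameters the same way, which is the consistency argument 4c12301e81 makes.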
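Commit ad68624d02 describes model-specific table names of the form `{table}_{model}_{dim}d`, a four-branch startup decision (both exist -> warn, only new -> use, neither -> create, only legacy -> migrate), and batched copying of 500 records per batch. The sketch expresses those three ideas as plain functions; the function names, the enum and the example base name are hypothetical, and the actual SQL copy performed by the real implementation is omitted.

```python
from enum import Enum
from typing import Iterator, Sequence, TypeVar

T = TypeVar("T")


class MigrationAction(Enum):
    WARN = "warn"        # legacy and new tables both exist
    USE = "use"          # only the new table exists
    CREATE = "create"    # neither table exists
    MIGRATE = "migrate"  # only the legacy table exists


def model_table_name(base: str, model: str, dim: int) -> str:
    """Build a name following the {table}_{model}_{dim}d pattern."""
    return f"{base}_{model}_{dim}d"


def decide_migration(legacy_exists: bool, new_exists: bool) -> MigrationAction:
    if legacy_exists and new_exists:
        return MigrationAction.WARN
    if new_exists:
        return MigrationAction.USE
    if not legacy_exists:
        return MigrationAction.CREATE
    return MigrationAction.MIGRATE


def batches(rows: Sequence[T], batch_size: int = 500) -> Iterator[Sequence[T]]:
    """Yield consecutive slices of at most batch_size rows."""
    for start in range(0, len(rows), batch_size):
        yield rows[start:start + batch_size]


print(model_table_name("vdb_chunks", "text_embedding_3_small", 1536))
# vdb_chunks_text_embedding_3_small_1536d
print(decide_migration(legacy_exists=True, new_exists=False))  # MigrationAction.MIGRATE
print([len(b) for b in batches(list(range(1201)))])  # [500, 500, 201]
```

The decision depends only on which tables exist, so it can be re-evaluated on every startup.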
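Commit 5c10d3d58e adds a `model_name` field to `EmbeddingFunc` and a `get_model_identifier()` method that produces a sanitized suffix, and commit ad68624d02 notes that `model_name=None` defaults to `"unknown"`. The sanitization rule below (lowercase, then collapse every run of non-alphanumeric characters into one underscore) and the `{model}_{dim}d` output shape are assumptions chosen to match the table-name pattern; `EmbeddingIdentitySketch` is a hypothetical class, not LightRAG's.

```python
import re
from dataclasses import dataclass
from typing import Optional


@dataclass
class EmbeddingIdentitySketch:
    embedding_dim: int
    model_name: Optional[str] = None

    def get_model_identifier(self) -> str:
        # Assumed rule: fall back to "unknown", then keep only [a-z0-9_].
        raw = (self.model_name or "unknown").lower()
        safe = re.sub(r"[^a-z0-9]+", "_", raw).strip("_") or "unknown"
        return f"{safe}_{self.embedding_dim}d"


print(EmbeddingIdentitySketch(1536, "text-embedding-3-small").get_model_identifier())
# text_embedding_3_small_1536d
print(EmbeddingIdentitySketch(1024, "BAAI/bge-m3").get_model_identifier())  # baai_bge_m3_1024d
print(EmbeddingIdentitySketch(768).get_model_identifier())  # unknown_768d
```

Restricting the suffix to lowercase letters, digits and underscores keeps it usable in both SQL table names and Qdrant collection names without quoting.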
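Commit 3096f844fb makes the `vchordrq.probes` and `vchordrq.epsilon` settings independent, so an empty `probes` (the default) no longer prevents `epsilon` from being applied, and it lets configuration errors propagate instead of swallowing them. A minimal sketch of that control flow, assuming a synchronous `execute` callable; the helper names and the exact value syntax are assumptions, and the real code runs these statements through its PostgreSQL connection.

```python
from typing import Callable, List, Optional


def build_vchordrq_settings(probes: Optional[str], epsilon: Optional[float]) -> List[str]:
    """Return one SET statement per configured parameter, each decided independently."""
    statements: List[str] = []
    if probes:  # empty string or None: leave the server default alone
        statements.append(f"SET vchordrq.probes = '{probes}'")
    if epsilon is not None:
        statements.append(f"SET vchordrq.epsilon = {float(epsilon)}")
    return statements


def configure_vchordrq(execute: Callable[[str], None],
                       probes: Optional[str], epsilon: Optional[float]) -> None:
    for statement in build_vchordrq_settings(probes, epsilon):
        # No try/except: a rejected setting fails fast instead of being swallowed.
        execute(statement)


executed: List[str] = []
configure_vchordrq(executed.append, probes="", epsilon=1.9)
print(executed)  # ['SET vchordrq.epsilon = 1.9']
```

Values are interpolated directly into SQL here, so they should come only from trusted configuration.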
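Commit 4048fc4b89 makes document deletion acquire the pipeline itself when it is idle and release it only if it was the one that acquired it. The sketch models that ownership flag with a hypothetical `PipelineState`; the rule that a busy pipeline may be joined only when its job name marks a deletion job is an assumption standing in for the job-name validation added in commits 702cfd2981 and 9d7b7981ce.

```python
import asyncio
from typing import List


class PipelineState:
    """Hypothetical, simplified pipeline status shared by ingestion and deletion."""

    def __init__(self) -> None:
        self.lock = asyncio.Lock()
        self.busy = False
        self.job_name = ""


async def delete_document(state: PipelineState, doc_id: str, deleted: List[str]) -> None:
    acquired_here = False
    async with state.lock:
        if not state.busy:
            # Idle: claim the pipeline for this deletion.
            state.busy, state.job_name = True, f"deleting {doc_id}"
            acquired_here = True
        elif not state.job_name.startswith("deleting"):
            # Assumed rule: only an existing deletion job may be joined.
            raise RuntimeError(f"pipeline busy with {state.job_name!r}")
    try:
        deleted.append(doc_id)  # placeholder for the real deletion work
    finally:
        if acquired_here:  # never release a pipeline someone else owns
            async with state.lock:
                state.busy, state.job_name = False, ""


async def demo() -> tuple:
    idle, batch, ingest = PipelineState(), PipelineState(), PipelineState()
    deleted: List[str] = []
    await delete_document(idle, "doc-1", deleted)

    batch.busy, batch.job_name = True, "deleting batch of 2"
    await delete_document(batch, "doc-2", deleted)

    ingest.busy, ingest.job_name = True, "indexing files"
    try:
        await delete_document(ingest, "doc-3", deleted)
        rejected = False
    except RuntimeError:
        rejected = True
    return deleted, idle.busy, batch.busy, rejected


print(asyncio.run(demo()))  # (['doc-1', 'doc-2'], False, True, True)
```

The batch case shows why the flag matters: a single deletion running inside a batch job must not mark the pipeline idle while the batch is still working.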
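Commit f8dd2e0724 switches namespace parsing from `split` to `rsplit` so that colons inside a workspace name survive. Assuming a combined key of the form `workspace:namespace` (the exact key format is an assumption here), splitting once from the right keeps everything before the last colon as the workspace. The bare-namespace branch mirrors commit 78689e8837 (bare `"pipeline_status"`) and commit d54d0d55d9 (empty workspace is `""`); `split_final_namespace` is a hypothetical helper.

```python
from typing import Tuple


def split_final_namespace(final_namespace: str) -> Tuple[str, str]:
    """Split a hypothetical 'workspace:namespace' key into its two parts."""
    if ":" not in final_namespace:
        # Bare namespace such as "pipeline_status": empty workspace.
        return "", final_namespace
    workspace, namespace = final_namespace.rsplit(":", 1)
    return workspace, namespace


key = "team:alpha:pipeline_status"
print(split_final_namespace(key))  # ('team:alpha', 'pipeline_status')
print(key.split(":", 1))           # ['team', 'alpha:pipeline_status'] (the old behaviour)
print(split_final_namespace("pipeline_status"))  # ('', 'pipeline_status')
```

This only works if namespace names themselves never contain colons, which is the trade-off `rsplit` makes.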
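Commits b6a5a90eaf, 7deb9a64b9 and e8383df3b8 describe a reusable async lock wrapper that tracks ownership in a `ContextVar` and records that ownership only after the underlying lock has been acquired. The class below is a hypothetical sketch of that ordering, not LightRAG's `NamespaceLock`: one module-level `ContextVar` holds the ids of locks held in the current context, re-entry in the same context raises instead of deadlocking, and a cancellation during `acquire()` leaves no stale ownership record because nothing is written until acquisition succeeds.

```python
import asyncio
from contextvars import ContextVar
from typing import FrozenSet, List, Tuple

# Ids of sketch locks held in the current context (each asyncio task has its own copy).
_HELD: ContextVar[FrozenSet[int]] = ContextVar("held_namespace_locks", default=frozenset())


class NamespaceLockSketch:
    """Reusable async context manager around a shared asyncio.Lock."""

    def __init__(self, lock: asyncio.Lock) -> None:
        self._lock = lock

    def locked(self) -> bool:
        return self._lock.locked()

    async def __aenter__(self) -> "NamespaceLockSketch":
        if id(self) in _HELD.get():
            raise RuntimeError("re-entrant acquisition in the same context")
        await self._lock.acquire()  # a cancellation here leaves _HELD untouched
        _HELD.set(_HELD.get() | {id(self)})  # record ownership only after success
        return self

    async def __aexit__(self, *exc_info: object) -> None:
        _HELD.set(_HELD.get() - {id(self)})
        self._lock.release()


async def demo() -> Tuple[List[int], bool, bool]:
    lock = NamespaceLockSketch(asyncio.Lock())
    order: List[int] = []

    async def worker(n: int) -> None:
        async with lock:  # the same instance is reused by concurrent tasks
            order.append(n)
            await asyncio.sleep(0)

    await asyncio.gather(worker(1), worker(2))

    reentry_blocked = False
    async with lock:
        try:
            async with lock:
                pass
        except RuntimeError:
            reentry_blocked = True
    return order, reentry_blocked, lock.locked()


print(asyncio.run(demo()))  # ([1, 2], True, False)
```

Because each task gets a copy of the context when it is created, one coroutine's ownership record is invisible to the others, which is what lets a single instance serve many concurrent `async with` blocks.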
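Commit ec05d89c2a adds a startup guard for Gunicorn multi-worker mode on macOS that checks `OBJC_DISABLE_INITIALIZE_FORK_SAFETY`. A minimal sketch of such a guard; the function name, the "more than one worker" threshold and the requirement that the variable equal `YES` (the value commonly used for this Objective-C runtime setting) are assumptions. The platform and environment are injectable so the check can be exercised on any OS.

```python
import os
import platform
from typing import Mapping, Optional


def fork_safety_error(workers: int, system: Optional[str] = None,
                      env: Optional[Mapping[str, str]] = None) -> Optional[str]:
    """Return an error message if forking workers on macOS looks unsafe, else None."""
    system = system if system is not None else platform.system()
    env = env if env is not None else os.environ
    if system != "Darwin" or workers <= 1:
        return None  # single process or non-macOS: nothing to check
    if env.get("OBJC_DISABLE_INITIALIZE_FORK_SAFETY") == "YES":
        return None
    return ("Gunicorn multi-worker mode on macOS needs "
            "OBJC_DISABLE_INITIALIZE_FORK_SAFETY=YES, or run with a single worker.")


message = fork_safety_error(workers=4, system="Darwin", env={})
if message:
    print(message)  # a real launcher would exit early here, e.g. sys.exit(1)
```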
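Commits 2fb57e767d and 6b2af2b579 fix an ordering problem: a wrapper applied to a provider's embedding function can return a plain function and drop attributes such as `max_token_size`, so the value has to be read before wrapping, and an object that is already an `EmbeddingFunc` should not be wrapped twice. The sketch shows that ordering with a hypothetical `EmbeddingFuncSketch` dataclass and a deliberately attribute-dropping `retry` decorator; neither is LightRAG's code.

```python
from dataclasses import dataclass
from typing import Any, Callable, Optional


@dataclass
class EmbeddingFuncSketch:
    embedding_dim: int
    func: Callable[..., Any]
    max_token_size: Optional[int] = None

    def __call__(self, *args: Any, **kwargs: Any) -> Any:
        return self.func(*args, **kwargs)


def retry(func: Callable[..., Any]) -> Callable[..., Any]:
    """Stand-in decorator that returns a bare function (no attributes survive)."""
    def wrapper(*args: Any, **kwargs: Any) -> Any:
        return func(*args, **kwargs)
    return wrapper


def build_embedding_func(provider: Any, embedding_dim: int) -> EmbeddingFuncSketch:
    if isinstance(provider, EmbeddingFuncSketch):
        return provider  # already wrapped: avoid double wrapping
    max_tokens = getattr(provider, "max_token_size", None)  # capture BEFORE wrapping
    return EmbeddingFuncSketch(embedding_dim, retry(provider), max_tokens)


def fake_embed(texts):
    return [[0.0] * 3 for _ in texts]


fake_embed.max_token_size = 8192  # type: ignore[attr-defined]  # set by a provider

ef = build_embedding_func(fake_embed, embedding_dim=3)
print(ef.max_token_size, ef(["a", "b"]))  # 8192 [[0.0, 0.0, 0.0], [0.0, 0.0, 0.0]]
```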
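Commit f0254773c6 turns `embedding_token_limit` from a property into a `field(init=False)` that is set in `__post_init__`, and commit 14a6c24ed7 logs a warning when a summary length exceeds that limit. One way a field ends up in a config dictionary while a property does not is `dataclasses.asdict()`, which includes `init=False` fields. A sketch of both ideas with hypothetical `EmbeddingStub` and `RagConfigSketch` classes; `summary_tokens` is an invented field name standing in for whatever summary-length setting LightRAG compares.

```python
import logging
from dataclasses import asdict, dataclass, field
from typing import Optional

logger = logging.getLogger("token_limit_sketch")


@dataclass
class EmbeddingStub:
    embedding_dim: int
    max_token_size: Optional[int] = None


@dataclass
class RagConfigSketch:
    embedding_func: EmbeddingStub
    summary_tokens: int = 500  # hypothetical summary-length setting
    # Not a constructor argument; derived once the other fields are set.
    embedding_token_limit: Optional[int] = field(init=False, default=None)

    def __post_init__(self) -> None:
        self.embedding_token_limit = self.embedding_func.max_token_size
        limit = self.embedding_token_limit
        if limit is not None and self.summary_tokens > limit:
            logger.warning("summary_tokens=%d exceeds embedding token limit %d",
                           self.summary_tokens, limit)


cfg = RagConfigSketch(EmbeddingStub(1024, max_token_size=8192))
print(asdict(cfg)["embedding_token_limit"])  # 8192
```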
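Commit 423e4e927a initializes result variables to `None` and checks them before calling `consume()` in `except` blocks. Without that, an exception raised by the query call itself leaves the variable unbound, and the cleanup code raises `UnboundLocalError`, hiding the original error. The sketch uses hypothetical `FakeResult` objects and session callables in place of the Neo4j and Memgraph driver objects.

```python
import asyncio
from typing import Any, Awaitable, Callable, List, Optional


class FakeResult:
    """Hypothetical stand-in for a driver result that must be consumed."""

    def __init__(self, records: List[Any]) -> None:
        self.records = records
        self.consumed = False

    async def consume(self) -> None:
        self.consumed = True


async def run_query(run: Callable[[], Awaitable[FakeResult]]) -> List[Any]:
    result: Optional[FakeResult] = None  # defined even if run() raises
    try:
        result = await run()
        records = list(result.records)
        await result.consume()
        return records
    except Exception:
        if result is not None:  # only clean up what was actually created
            await result.consume()
        raise


async def ok_run() -> FakeResult:
    return FakeResult(["node-1"])


async def failing_run() -> FakeResult:
    raise ConnectionError("database unavailable")


print(asyncio.run(run_query(ok_run)))  # ['node-1']
try:
    asyncio.run(run_query(failing_run))
except ConnectionError as exc:  # the original error, not UnboundLocalError
    print("propagated:", exc)
```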