LightRAG/lightrag/kg
BukeLy f69cf9bcd6 fix: prevent vector dimension mismatch crashes and data loss on no-suffix restarts
Why this change is needed:
Two critical issues were identified in Codex review of PR #2391:
1. Migration fails when legacy collections/tables use different embedding dimensions
   (e.g., upgrading from 1536d to 3072d models causes initialization failures)
2. When model_suffix is empty (no model_name provided), table_name equals legacy_table_name,
   causing Case 1 logic to delete the only table/collection on second startup

How it solves it:
- Added dimension compatibility checks before migration in both Qdrant and PostgreSQL
- PostgreSQL uses two-method detection: pg_attribute metadata query + vector sampling fallback
- When dimensions mismatch, skip migration and create new empty table/collection, preserving legacy data
- Added safety check to detect when new and legacy names are identical, preventing deletion
- Both backends log clear warnings about dimension mismatches and skipped migrations

Impact:
- lightrag/kg/qdrant_impl.py: Added dimension check (lines 254-297) and no-suffix safety (lines 163-169)
- lightrag/kg/postgres_impl.py: Added dimension check with fallback (lines 2347-2410) and no-suffix safety (lines 2281-2287)
- tests/test_no_model_suffix_safety.py: New test file with 4 test cases covering edge scenarios
- Backward compatible: All existing scenarios continue working unchanged

Testing:
- All 20 tests pass (16 existing migration tests + 4 new safety tests)
- E2E tests enhanced with explicit verification points for dimension mismatch scenarios
- Verified graceful degradation when dimension detection fails
- Code style verified with ruff and pre-commit hooks
2025-11-23 15:44:07 +08:00
..
deprecated Preserve ordering in get_by_ids methods across all storage implementations 2025-10-11 12:37:59 +08:00
__init__.py Improve storage config validation and add config.ini fallback support 2025-11-08 22:48:49 +08:00
faiss_impl.py Remove final_namespace attribute for in-memory storage and use namespace in clean_llm_query_cache.py 2025-11-17 12:54:33 +08:00
json_doc_status_impl.py Remove final_namespace attribute for in-memory storage and use namespace in clean_llm_query_cache.py 2025-11-17 12:54:33 +08:00
json_kv_impl.py Remove final_namespace attribute for in-memory storage and use namespace in clean_llm_query_cache.py 2025-11-17 12:54:33 +08:00
memgraph_impl.py Refactor workspace handling to use default workspace and namespace locks 2025-11-17 12:54:33 +08:00
milvus_impl.py Standardize empty workspace handling from "_" to "" across storage 2025-11-17 12:54:33 +08:00
mongo_impl.py Standardize empty workspace handling from "_" to "" across storage 2025-11-17 12:54:33 +08:00
nano_vector_db_impl.py Standardize empty workspace handling from "_" to "" across storage 2025-11-17 12:54:33 +08:00
neo4j_impl.py Refactor workspace handling to use default workspace and namespace locks 2025-11-17 12:54:33 +08:00
networkx_impl.py Remove final_namespace attribute for in-memory storage and use namespace in clean_llm_query_cache.py 2025-11-17 12:54:33 +08:00
postgres_impl.py fix: prevent vector dimension mismatch crashes and data loss on no-suffix restarts 2025-11-23 15:44:07 +08:00
qdrant_impl.py fix: prevent vector dimension mismatch crashes and data loss on no-suffix restarts 2025-11-23 15:44:07 +08:00
redis_impl.py Standardize empty workspace handling from "_" to "" across storage 2025-11-17 12:54:33 +08:00
shared_storage.py style: fix lint errors (trailing whitespace and formatting) 2025-11-20 01:41:23 +08:00