LightRAG/lightrag/kg
BukeLy 16fff353d9 fix: prevent data loss in PostgreSQL migration and add doc_status table creation
This commit fixes two critical issues in PostgreSQL storage:

BUG 1: Legacy table cleanup causing data loss across workspaces
---------------------------------------------------------------
PROBLEM:
- After migrating workspace_a's data out of the legacy table, the ENTIRE
  legacy table was dropped
- This destroyed workspace_b's data, which was still in the legacy table
- Multi-tenant data isolation was violated

FIX:
- Implement workspace-aware cleanup: delete only the migrated workspace's data
- Check whether other workspaces still have data before dropping the table
- Drop the legacy table only when it is completely empty
- If other workspaces' data exists, preserve the legacy table with the
  remaining records
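
The cleanup rule above can be sketched as follows. This is a minimal
illustration, not the code in postgres_impl.py: it uses the standard-library
sqlite3 module as a stand-in for PostgreSQL, and the names
cleanup_legacy_table, legacy_vectors, and demo are hypothetical.

```python
import sqlite3


def cleanup_legacy_table(conn, legacy_table, workspace):
    """Delete only `workspace` rows; drop the table once it is empty.

    Hypothetical helper for illustration. `legacy_table` is assumed to be a
    trusted identifier, since table names cannot be bound as parameters.
    Returns True if the table was dropped, False if it was preserved.
    """
    # Step 1: remove only the rows that belong to the migrated workspace.
    conn.execute(f"DELETE FROM {legacy_table} WHERE workspace = ?", (workspace,))

    # Step 2: check whether any other workspace still has data here.
    remaining = conn.execute(f"SELECT COUNT(*) FROM {legacy_table}").fetchone()[0]

    # Step 3: drop the table only when nothing is left in it.
    if remaining == 0:
        conn.execute(f"DROP TABLE {legacy_table}")
        return True
    return False


def demo():
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE legacy_vectors (id TEXT, workspace TEXT)")
    conn.executemany(
        "INSERT INTO legacy_vectors VALUES (?, ?)",
        [("a1", "workspace_a"), ("a2", "workspace_a"), ("b1", "workspace_b")],
    )
    # Cleaning up workspace_a must leave workspace_b's row untouched.
    dropped_after_a = cleanup_legacy_table(conn, "legacy_vectors", "workspace_a")
    remaining = conn.execute("SELECT id FROM legacy_vectors").fetchall()
    # Once workspace_b is also migrated, the table is empty and can go.
    dropped_after_b = cleanup_legacy_table(conn, "legacy_vectors", "workspace_b")
    return dropped_after_a, remaining, dropped_after_b
```

In this sketch the table survives the first cleanup because workspace_b still
has a row, and is dropped only after the second cleanup empties it.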

Location: postgres_impl.py PGVectorStorage.setup_table() lines 2510-2567

Test verification:
- test_workspace_migration_isolation_e2e_postgres validates this fix

BUG 2: PGDocStatusStorage missing table initialization
-------------------------------------------------------
PROBLEM:
- PGDocStatusStorage.initialize() only set the workspace and never created
  the table
- This caused "relation 'lightrag_doc_status' does not exist" errors
- Document insertion (ainsert) failed immediately

FIX:
- Add table creation to initialize() method using _pg_create_table()
- Consistent with other storage implementations:
  * MongoDocStatusStorage creates collections
  * JsonDocStatusStorage creates directories
  * PGDocStatusStorage now creates tables ✓
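
The effect of creating the table in initialize() can be sketched as follows.
This is an illustration under stated assumptions, not the actual
PGDocStatusStorage: it uses sqlite3 in place of PostgreSQL, it is synchronous
for brevity, and the class DocStatusStorageSketch, its columns, and its
methods are hypothetical. It does not reproduce _pg_create_table().

```python
import sqlite3


class DocStatusStorageSketch:
    """Hypothetical stand-in for a doc-status storage backend."""

    TABLE = "lightrag_doc_status"

    def __init__(self, conn, workspace):
        self.conn = conn
        self.workspace = workspace

    def initialize(self):
        # Idempotent table creation: safe to call on every startup.
        self.conn.execute(
            f"CREATE TABLE IF NOT EXISTS {self.TABLE} ("
            "workspace TEXT NOT NULL, "
            "id TEXT NOT NULL, "
            "status TEXT, "
            "PRIMARY KEY (workspace, id))"
        )

    def upsert(self, doc_id, status):
        # Fails with "no such table" if initialize() was never called.
        self.conn.execute(
            f"INSERT OR REPLACE INTO {self.TABLE} (workspace, id, status) "
            "VALUES (?, ?, ?)",
            (self.workspace, doc_id, status),
        )

    def get_status(self, doc_id):
        row = self.conn.execute(
            f"SELECT status FROM {self.TABLE} WHERE workspace = ? AND id = ?",
            (self.workspace, doc_id),
        ).fetchone()
        return row[0] if row else None
```

Without the call to initialize(), the first upsert fails because the table
does not exist, which mirrors the "relation does not exist" error described
above. Calling initialize() more than once is harmless in this sketch.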

Location: postgres_impl.py PGDocStatusStorage.initialize() lines 2965-2971

Test Results:
- Unit tests: 13/13 passed (test_unified_lock_safety,
  test_workspace_migration_isolation, test_dimension_mismatch)
- E2E tests require PostgreSQL server

Related: PR #2391 (Vector Storage Model Isolation)
2025-11-23 16:43:49 +08:00
deprecated Preserve ordering in get_by_ids methods across all storage implementations 2025-10-11 12:37:59 +08:00
__init__.py Improve storage config validation and add config.ini fallback support 2025-11-08 22:48:49 +08:00
faiss_impl.py Remove final_namespace attribute for in-memory storage and use namespace in clean_llm_query_cache.py 2025-11-17 12:54:33 +08:00
json_doc_status_impl.py Remove final_namespace attribute for in-memory storage and use namespace in clean_llm_query_cache.py 2025-11-17 12:54:33 +08:00
json_kv_impl.py Remove final_namespace attribute for in-memory storage and use namespace in clean_llm_query_cache.py 2025-11-17 12:54:33 +08:00
memgraph_impl.py Refactor workspace handling to use default workspace and namespace locks 2025-11-17 12:54:33 +08:00
milvus_impl.py Standardize empty workspace handling from "_" to "" across storage 2025-11-17 12:54:33 +08:00
mongo_impl.py Standardize empty workspace handling from "_" to "" across storage 2025-11-17 12:54:33 +08:00
nano_vector_db_impl.py Standardize empty workspace handling from "_" to "" across storage 2025-11-17 12:54:33 +08:00
neo4j_impl.py Refactor workspace handling to use default workspace and namespace locks 2025-11-17 12:54:33 +08:00
networkx_impl.py Remove final_namespace attribute for in-memory storage and use namespace in clean_llm_query_cache.py 2025-11-17 12:54:33 +08:00
postgres_impl.py fix: prevent data loss in PostgreSQL migration and add doc_status table creation 2025-11-23 16:43:49 +08:00
qdrant_impl.py fix: prevent vector dimension mismatch crashes and data loss on no-suffix restarts 2025-11-23 15:44:07 +08:00
redis_impl.py Standardize empty workspace handling from "_" to "" across storage 2025-11-17 12:54:33 +08:00
shared_storage.py fix: prevent double-release in UnifiedLock.__aexit__ error recovery 2025-11-23 16:34:08 +08:00