LightRAG

Author	SHA1	Message	Date
BukeLy	6bef40766d	style: fix lint errors (trailing whitespace and formatting)	2025-11-20 01:41:23 +08:00
BukeLy	5d9547344a	fix: correct Qdrant legacy_namespace for data migration Why this change is needed: The legacy_namespace logic was incorrectly including workspace in the collection name, causing migration to fail in E2E tests. When workspace was set (e.g., to a temp directory path), legacy_namespace became "/tmp/xxx_chunks" instead of "lightrag_vdb_chunks", so the migration logic couldn't find the legacy collection. How it solves it: Changed legacy_namespace to always use the old naming scheme without workspace prefix: "lightrag_vdb_{namespace}". This matches the actual collection names from pre-migration code and aligns with PostgreSQL's approach where legacy_table_name = base_table (without workspace). Impact: - Qdrant legacy data migration now works correctly in E2E tests - All unit tests pass (6/6 for both Qdrant and PostgreSQL) - E2E test_legacy_migration_qdrant should now pass Testing: - Unit tests: pytest tests/test_qdrant_migration.py -v (6/6 passed) - Unit tests: pytest tests/test_postgres_migration.py -v (6/6 passed) - Updated test_qdrant_collection_naming to verify new legacy_namespace	2025-11-20 01:08:15 +08:00
BukeLy	ad68624d02	feat: PostgreSQL model isolation and auto-migration Why this change is needed: PostgreSQL vector storage needs model isolation to prevent dimension conflicts when different workspaces use different embedding models. Without this, the first workspace locks the vector dimension for all subsequent workspaces, causing failures. How it solves it: - Implements dynamic table naming with model suffix: {table}_{model}_{dim}d - Adds setup_table() method mirroring Qdrant's approach for consistency - Implements 4-branch migration logic: both exist -> warn, only new -> use, neither -> create, only legacy -> migrate - Batch migration: 500 records/batch (same as Qdrant) - No automatic rollback to support idempotent re-runs Impact: - PostgreSQL tables now isolated by embedding model and dimension - Automatic data migration from legacy tables on startup - Backward compatible: model_name=None defaults to "unknown" - All SQL operations use dynamic table names Testing: - 6 new tests for PostgreSQL migration (100% pass) - Tests cover: naming, migration trigger, scenarios 1-3 - 3 additional scenario tests added for Qdrant completeness Co-Authored-By: Claude <noreply@anthropic.com>	2025-11-19 22:54:37 +08:00
BukeLy	df5aacb545	feat: Qdrant model isolation and auto-migration Why this change is needed: To implement vector storage model isolation for Qdrant, allowing different workspaces to use different embedding models without conflict, and automatically migrating existing data. How it solves it: - Modified QdrantVectorDBStorage to use model-specific collection suffixes - Implemented automated migration logic from legacy collections to new schema - Fixed Shared-Data lock re-entrancy issue in multiprocess mode - Added comprehensive tests for collection naming and migration triggers Impact: - Existing users will have data automatically migrated on next startup - New workspaces will use isolated collections based on embedding model - Fixes potential lock-related bugs in shared storage Testing: - Added tests/test_qdrant_migration.py passing - Verified migration logic covers all 4 states (New/Legacy existence combinations)	2025-11-19 18:47:38 +08:00

4 commits