Why this change is needed:
The E2E test test_backward_compat_old_workspace_naming_qdrant was failing
because _find_legacy_collection() searched for generic "lightrag_vdb_{namespace}"
before workspace-specific "{workspace}_{namespace}" collections. When both
existed, it would always find the generic one first (which might be empty),
ignoring the workspace collection that actually contained the data to migrate.
How it solves it:
Reordered the candidates list in _find_legacy_collection() to prioritize
more specific naming patterns over generic ones:
1. {workspace}_{namespace} (most specific, old workspace format)
2. lightrag_vdb_{namespace} (generic legacy format)
3. {namespace} (most generic, oldest format)
This ensures the migration finds the correct source collection with actual data.
Impact:
- Fixes test_backward_compat_old_workspace_naming_qdrant which creates
a "prod_chunks" collection with 10 points
- Migration will now correctly find and migrate from workspace-specific
legacy collections before falling back to generic collections
- Maintains backward compatibility with all legacy naming patterns
Testing:
Run: pytest tests/test_e2e_multi_instance.py::test_backward_compat_old_workspace_naming_qdrant -v
This change ensures that when the model_suffix is empty, the final_namespace falls back to the legacy_namespace, preventing potential naming issues. A warning is logged to inform users about the missing model suffix and the fallback to the legacy naming scheme.
Additionally, comprehensive tests have been added to verify the behavior of both PostgreSQL and Qdrant storage when model_suffix is empty, ensuring that the naming conventions are correctly applied and that no trailing underscores are present.
Impact:
- Prevents crashes due to empty model_suffix
- Provides clear feedback to users regarding configuration issues
- Maintains backward compatibility with existing setups
Testing:
All new tests pass, validating the handling of empty model_suffix scenarios.
Why this change is needed:
The legacy_namespace logic was incorrectly including workspace in the
collection name, causing migration to fail in E2E tests. When workspace
was set (e.g., to a temp directory path), legacy_namespace became
"/tmp/xxx_chunks" instead of "lightrag_vdb_chunks", so the migration
logic couldn't find the legacy collection.
How it solves it:
Changed legacy_namespace to always use the old naming scheme without
workspace prefix: "lightrag_vdb_{namespace}". This matches the actual
collection names from pre-migration code and aligns with PostgreSQL's
approach where legacy_table_name = base_table (without workspace).
Impact:
- Qdrant legacy data migration now works correctly in E2E tests
- All unit tests pass (6/6 for both Qdrant and PostgreSQL)
- E2E test_legacy_migration_qdrant should now pass
Testing:
- Unit tests: pytest tests/test_qdrant_migration.py -v (6/6 passed)
- Unit tests: pytest tests/test_postgres_migration.py -v (6/6 passed)
- Updated test_qdrant_collection_naming to verify new legacy_namespace
Why this change is needed:
To implement vector storage model isolation for Qdrant, allowing different workspaces to use different embedding models without conflict, and automatically migrating existing data.
How it solves it:
- Modified QdrantVectorDBStorage to use model-specific collection suffixes
- Implemented automated migration logic from legacy collections to new schema
- Fixed Shared-Data lock re-entrancy issue in multiprocess mode
- Added comprehensive tests for collection naming and migration triggers
Impact:
- Existing users will have data automatically migrated on next startup
- New workspaces will use isolated collections based on embedding model
- Fixes potential lock-related bugs in shared storage
Testing:
- Added tests/test_qdrant_migration.py passing
- Verified migration logic covers all 4 states (New/Legacy existence combinations)
Refactored the LLM cache to a flat Key-Value (KV) structure, replacing the previous nested format. The old structure used the 'mode' as a key and stored specific cache content as JSON nested under it. This change significantly enhances cache recall efficiency.