chore: remove internal analysis document from PR
parent 0508ad7a15
commit d12c14946b
1 changed file with 0 additions and 376 deletions

@@ -1,376 +0,0 @@

# Vector Model Isolation - Migration Scenario Coverage Analysis

## Analysis Date

2025-11-20

## Key Findings

### ⚠️ Critical Issue: Qdrant Legacy Naming Is Incompatible

#### Problem Description

**Qdrant naming in the old version (main branch)**:

```python
# Get legacy namespace for data migration from old version
if effective_workspace:
    self.legacy_namespace = f"{effective_workspace}_{self.namespace}"
else:
    self.legacy_namespace = self.namespace

self.final_namespace = f"lightrag_vdb_{self.namespace}"
```

Example:

- workspace="my_workspace", namespace="chunks"
- legacy_namespace = "my_workspace_chunks"
- final_namespace = "lightrag_vdb_chunks"

**Qdrant naming in the new version (feature branch)**:

```python
# Legacy collection name (without model suffix, for migration)
self.legacy_namespace = f"lightrag_vdb_{self.namespace}"

# New naming scheme with model isolation
self.final_namespace = f"lightrag_vdb_{self.namespace}_{model_suffix}"
```

Example:

- workspace="my_workspace", namespace="chunks"
- legacy_namespace = "lightrag_vdb_chunks" ❌ does not match the old version!
- final_namespace = "lightrag_vdb_chunks_text_embedding_ada_002_1536d"

#### Impact Analysis

1. **Migration fails when upgrading from the old version**:
   - An old-version user's collection may be named `my_workspace_chunks` or `chunks`
   - The new version tries to migrate from `lightrag_vdb_chunks`
   - Result: the legacy collection is not found, so automatic migration cannot happen!

2. **Risk of data loss**:
   - Users may not see their old data after upgrading
   - The data has to be migrated manually

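
The mismatch can be reproduced without a Qdrant server. The sketch below re-implements only the two naming rules quoted above as plain functions; the function names and `SUFFIX` are illustrative and do not exist in the codebase.

```python
def old_qdrant_names(workspace, namespace):
    """Names as derived in the main-branch excerpt quoted above."""
    legacy = f"{workspace}_{namespace}" if workspace else namespace
    final = f"lightrag_vdb_{namespace}"
    return legacy, final


def new_qdrant_names(namespace, model_suffix):
    """Names as derived in the feature-branch excerpt quoted above."""
    legacy = f"lightrag_vdb_{namespace}"
    final = f"lightrag_vdb_{namespace}_{model_suffix}"
    return legacy, final


def unreachable_old_collections(workspace, namespace, model_suffix):
    """Old-version names that the new version never looks at."""
    old_names = set(old_qdrant_names(workspace, namespace))
    new_names = set(new_qdrant_names(namespace, model_suffix))
    return sorted(old_names - new_names)


# Suffix copied from the example in this document.
SUFFIX = "text_embedding_ada_002_1536d"

print(unreachable_old_collections("my_workspace", "chunks", SUFFIX))
# -> ['my_workspace_chunks']
print(unreachable_old_collections(None, "chunks", SUFFIX))
# -> ['chunks']
```

Any name this helper returns is a collection the old version could have written to but the new version would not consider as a migration source.
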
### ✅ PostgreSQL Migration Logic Is Correct

The PostgreSQL migration logic is straightforward:

**Old version**:

- Table names are fixed, e.g. `lightrag_vdb_chunks`

**New version**:

- legacy_table_name = `lightrag_vdb_chunks`
- table_name = `lightrag_vdb_chunks_{model}_{dim}d`

This logic is correct because the old PostgreSQL implementation always used the fixed table names.

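
As an illustration of the `{model}_{dim}d` pattern, here is a minimal sketch of how the two table names could be derived. The sanitization rule (lowercase, non-alphanumeric runs replaced by `_`) is an assumption chosen so that it reproduces the `text_embedding_ada_002_1536d` example in this document; the actual `model_suffix` logic in the feature branch may differ, and both function names are hypothetical.

```python
import re


def model_suffix(model_name: str, embedding_dim: int) -> str:
    # Assumption: lowercase the model name and collapse every run of
    # non-alphanumeric characters into a single underscore.
    safe = re.sub(r"[^a-z0-9]+", "_", model_name.lower()).strip("_")
    return f"{safe}_{embedding_dim}d"


def postgres_table_names(base_table: str, model_name: str, embedding_dim: int):
    """Return (legacy_table_name, table_name) for one vector table."""
    legacy_table_name = base_table  # the old version used the fixed name
    table_name = f"{base_table}_{model_suffix(model_name, embedding_dim)}"
    return legacy_table_name, table_name


print(postgres_table_names("lightrag_vdb_chunks", "text-embedding-ada-002", 1536))
# -> ('lightrag_vdb_chunks', 'lightrag_vdb_chunks_text_embedding_ada_002_1536d')
```
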
---

## Test Coverage Analysis

### Scenarios Covered by the Current E2E Tests

| Test name | Database | Scenario | Case covered |
|-----------|----------|----------|--------------|
| `test_legacy_migration_postgres` | PostgreSQL | Migrate from the legacy table | Case 4: Legacy→New |
| `test_legacy_migration_qdrant` | Qdrant | Migrate from the legacy collection | Case 4: Legacy→New |
| `test_multi_instance_postgres` | PostgreSQL | Multiple models coexist | Case 3: create new table |
| `test_multi_instance_qdrant` | Qdrant | Multiple models coexist | Case 3: create new collection |

### Missing Test Scenarios

#### Uncovered Cases

1. ❌ **Case 1: Legacy and new coexist (warning)**
   - Scenario: both legacy and new exist
   - Expected: log a warning only, do not migrate
   - Status: not tested

2. ❌ **Case 2: Already migrated**
   - Scenario: only new exists; legacy has been deleted
   - Expected: check the indexes, then use normally
   - Status: not tested

3. ❌ **Upgrade from a real old version**
   - Scenario: a user upgrades from an old LightRAG version
   - Qdrant: the legacy name is `{workspace}_{namespace}` or `{namespace}`
   - Expected: the legacy collection is correctly detected and migrated
   - Status: **not covered, and a compatibility issue exists!**

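
The four cases referred to above can be summarized as a small decision function. This is a sketch of the branching only, not the project's `setup_table`/`setup_collection` code; the enum and function names are hypothetical, and reading Case 3 as "neither exists" is inferred from the "create new table/collection" entries in the coverage table.

```python
from enum import Enum


class MigrationAction(Enum):
    WARN_BOTH_EXIST = "case 1: warn only, do not migrate"
    USE_NEW = "case 2: check indexes, use the new store"
    CREATE_NEW = "case 3: create the new table/collection"
    MIGRATE = "case 4: migrate legacy data into the new store"


def decide_migration(legacy_exists: bool, new_exists: bool) -> MigrationAction:
    """Map the existence of the legacy and new stores to one of the four cases."""
    if legacy_exists and new_exists:
        return MigrationAction.WARN_BOTH_EXIST
    if new_exists:
        return MigrationAction.USE_NEW
    if legacy_exists:
        return MigrationAction.MIGRATE
    return MigrationAction.CREATE_NEW


for legacy, new in [(True, True), (False, True), (False, False), (True, False)]:
    print(legacy, new, decide_migration(legacy, new).name)
```

Writing the branching as a pure function like this makes Case 1 and Case 2 testable without a database, which is exactly where the coverage gap lies.
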
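#### Uncovered Edge Cases

1. ❌ **Empty-data migration**
   - Scenario: legacy exists but is empty
   - Expected: skip migration and create the new table/collection
   - Status: the code handles this, but it is not tested

2. ❌ **Rollback on migration failure**
   - Scenario: the migration is interrupted midway
   - Expected: an exception is raised and data consistency is preserved
   - Status: not tested

3. ❌ **Workspace isolation verification**
   - Scenario: multiple workspaces inside the same collection/table
   - Expected: data is fully isolated
   - Status: not explicitly tested

4. ❌ **Model switch**
   - Scenario: the user switches embedding models
   - Expected: a new table/collection is created and the old data is kept
   - Status: not tested
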
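
The first two edge cases can be exercised against an in-memory stand-in. The sketch below assumes a batched copy followed by a row-count check; `MigrationError`, `migrate_rows`, and the callback parameters are hypothetical names, and the batch size of 500 is the value this document reports for both backends.

```python
from typing import Callable, Iterator, Sequence

BATCH_SIZE = 500  # per-batch size reported for PostgreSQL and Qdrant


class MigrationError(RuntimeError):
    """Hypothetical error raised when the post-migration check fails."""


def iter_batches(rows: Sequence[dict], batch_size: int = BATCH_SIZE) -> Iterator[Sequence[dict]]:
    """Yield consecutive slices of at most batch_size rows."""
    for start in range(0, len(rows), batch_size):
        yield rows[start:start + batch_size]


def migrate_rows(
    legacy_rows: Sequence[dict],
    write_batch: Callable[[Sequence[dict]], None],
    count_target: Callable[[], int],
    batch_size: int = BATCH_SIZE,
) -> int:
    """Copy rows in batches, then verify the target row count."""
    if not legacy_rows:
        return 0  # empty legacy store: nothing to copy
    for batch in iter_batches(legacy_rows, batch_size):
        write_batch(batch)
    migrated = count_target()
    if migrated != len(legacy_rows):
        raise MigrationError(f"expected {len(legacy_rows)} rows, found {migrated}")
    return migrated


target: list = []
print(migrate_rows([{"id": i} for i in range(1201)], target.extend, lambda: len(target)))
# -> 1201
```
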
---

## Backward Compatibility Analysis

### ✅ PostgreSQL - Fully Compatible

- Old-version table name: `lightrag_vdb_chunks`
- New version looks for: `legacy_table_name = "lightrag_vdb_chunks"`
- Conclusion: **fully compatible**

### ❌ Qdrant - Incompatible!

#### Compatibility Issue Details

**Scenario 1: old-version users with a workspace**

```python
# Old version (main)
workspace = "prod"
legacy_namespace = "prod_chunks"  # name generated by the old version
final_namespace = "lightrag_vdb_chunks"

# New version (feature)
legacy_namespace = "lightrag_vdb_chunks"  # legacy name the new version expects
final_namespace = "lightrag_vdb_chunks_text_embedding_ada_002_1536d"

# Result: the "prod_chunks" collection is not found, migration fails!
```

**Scenario 2: old-version users without a workspace**

```python
# Old version (main)
workspace = None
legacy_namespace = "chunks"  # name generated by the old version
final_namespace = "lightrag_vdb_chunks"

# New version (feature)
legacy_namespace = "lightrag_vdb_chunks"  # legacy name the new version expects
final_namespace = "lightrag_vdb_chunks_text_embedding_ada_002_1536d"

# Result: the "chunks" collection is not found, migration fails!
```

#### Scope of Impact

1. **All Qdrant users who use a workspace** - data is inaccessible after upgrading
2. **All Qdrant users who do not use a workspace** - data is inaccessible after upgrading
3. **Only users whose old version already used `lightrag_vdb_{namespace}` as the collection name are unaffected**

---

## Code Style Consistency Check

### ✅ Overall Code Style

1. **Unified migration logic pattern**:
   - PostgreSQL and Qdrant use the same 4-case logic
   - Both have a `setup_table`/`setup_collection` static method
   - ✅ Good consistency

2. **Naming conventions**:
   - Both use `legacy_*` and `final_*` / `table_name` naming
   - Both use the same `model_suffix` generation logic
   - ✅ Good consistency

3. **Log format**:
   - Both use the same log format and levels
   - Both report migration progress clearly
   - ✅ Good consistency

4. **Error handling**:
   - Both define a dedicated migration exception class
   - Both include migration verification logic
   - ✅ Good consistency

5. **Batch size**:
   - PostgreSQL: 500 records per batch
   - Qdrant: 500 records per batch
   - ✅ Good consistency

### ⚠️ Areas for Improvement

1. **Inconsistent comment language**:
   - Some comments are in Chinese
   - Some comments are in English
   - Suggestion: standardize on English

2. **Test naming**:
   - Some tests have Chinese docstrings
   - Suggestion: keep them bilingual (Chinese and English)

---

## Recommended Fixes

### 1. Fix the Qdrant Compatibility Issue

#### Option A: Support Multiple Legacy Naming Patterns (Recommended)

```python
import logging

from qdrant_client import QdrantClient

# Assumption: a module-level logger; the project may supply its own.
logger = logging.getLogger(__name__)


async def _find_legacy_collection(
    client: QdrantClient,
    workspace: str | None,
    namespace: str,
) -> str | None:
    """
    Try to find legacy collection with various naming patterns
    for backward compatibility.

    Returns:
        Collection name if found, None otherwise
    """
    # Pattern 1: New legacy format (from previous feature branch)
    candidate1 = f"lightrag_vdb_{namespace}"

    # Pattern 2: Old format with workspace
    candidate2 = f"{workspace}_{namespace}" if workspace else None

    # Pattern 3: Old format without workspace
    candidate3 = namespace

    # Try each pattern in priority order; skip patterns that do not apply
    for candidate in [candidate1, candidate2, candidate3]:
        if candidate and client.collection_exists(candidate):
            logger.info(f"Found legacy collection: {candidate}")
            return candidate

    return None
```

Then use it in `setup_collection`:

```python
# Find legacy collection with backward compatibility
legacy_collection = await _find_legacy_collection(
    client, workspace, namespace
)

legacy_exists = legacy_collection is not None

# Case 4: Only legacy exists - Migrate data
if legacy_exists and not new_collection_exists:
    logger.info(
        f"Qdrant: Migrating data from legacy collection '{legacy_collection}'"
    )
    # ... the migration logic uses legacy_collection
```

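
The candidate order in Option A can be checked without a running Qdrant instance. The sketch below restates the lookup as a synchronous function and runs it against a stub; `StubClient` and `find_legacy_collection` are hypothetical names, and the stub models only the `collection_exists()` call used by the proposal.

```python
class StubClient:
    """Stands in for QdrantClient; only collection_exists() is modelled."""

    def __init__(self, existing):
        self._existing = set(existing)

    def collection_exists(self, name):
        return name in self._existing


def find_legacy_collection(client, workspace, namespace):
    """Synchronous restatement of the Option A lookup order."""
    candidates = [
        f"lightrag_vdb_{namespace}",                          # pattern 1
        f"{workspace}_{namespace}" if workspace else None,    # pattern 2
        namespace,                                            # pattern 3
    ]
    for candidate in candidates:
        if candidate and client.collection_exists(candidate):
            return candidate
    return None


both = StubClient({"lightrag_vdb_chunks", "prod_chunks"})
print(find_legacy_collection(both, "prod", "chunks"))
# -> lightrag_vdb_chunks (pattern 1 wins when several names exist)
print(find_legacy_collection(StubClient({"prod_chunks"}), "prod", "chunks"))
# -> prod_chunks
print(find_legacy_collection(StubClient({"chunks"}), None, "chunks"))
# -> chunks
print(find_legacy_collection(StubClient(set()), "prod", "chunks"))
# -> None
```

One design point worth reviewing: because pattern 3 is tried even when a workspace is set, a deployment with `workspace="prod"` would pick up a bare `chunks` collection if that is the only match. Whether that fallback is desirable depends on whether such a collection could belong to a different tenant.
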
#### Option B: Document Manual Migration Steps

If automatic detection is not going to be supported, clear manual migration documentation must be provided at a minimum.

### 2. Add the Missing Tests

#### High-Priority Tests

```python
import pytest


@pytest.mark.asyncio
async def test_qdrant_legacy_workspace_migration():
    """Test migration from old workspace-based naming"""
    # Create a collection in the old format: "workspace_chunks"
    # Verify that the new code detects and migrates it
    pass


@pytest.mark.asyncio
async def test_case1_both_exist_warning():
    """Test Case 1: Both legacy and new exist"""
    # Verify that only a warning is logged and nothing is migrated
    pass


@pytest.mark.asyncio
async def test_case2_only_new_exists():
    """Test Case 2: Only new table/collection exists"""
    # Verify that migration is skipped and indexes are checked
    pass


@pytest.mark.asyncio
async def test_empty_legacy_migration():
    """Test migration when legacy is empty"""
    # Verify that data migration is skipped and only the new
    # table/collection is created
    pass


@pytest.mark.asyncio
async def test_workspace_isolation():
    """Test workspace isolation within same collection/table"""
    # Verify that data from different workspaces is fully isolated
    pass
```

#### Medium-Priority Tests

```python
import pytest


@pytest.mark.asyncio
async def test_model_switch_scenario():
    """Test switching embedding models"""
    # Verify that a new table/collection is created after switching models
    pass


@pytest.mark.asyncio
async def test_migration_failure_handling():
    """Test migration error handling"""
    # Verify exception handling when migration fails
    pass
```

### 3. Improve the Documentation

The Migration Guide needs to state clearly:

1. **Special considerations for Qdrant users**
2. **How to migrate an old collection manually**
3. **Backup recommendations before upgrading**
4. **Steps to verify that the migration succeeded**

---

## Summary

### Key Issues

1. ❌ **Serious Qdrant backward compatibility problem** - must be fixed!
2. ❌ **Insufficient test coverage** - tests for key scenarios are missing
3. ✅ **PostgreSQL migration logic is correct**
4. ✅ **Code style is largely consistent**

### Recommended Priorities

1. **P0 - Fix immediately**:
   - Fix the Qdrant backward compatibility problem
   - Add compatibility tests

2. **P1 - Before the PR is merged**:
   - Add tests for Case 1 and Case 2
   - Add a workspace isolation test
   - Update the Migration Guide

3. **P2 - Follow-up improvements**:
   - Add edge-case tests
   - Unify the comment language
   - Add more detailed error messages

### Risk Assessment

- **Qdrant compatibility left unfixed**: 🔴 High risk - users lose access to their data after upgrading
- **Insufficient test coverage**: 🟡 Medium risk - unexpected problems may surface in production
- **Incomplete documentation**: 🟡 Medium risk - users will not know how to upgrade correctly

---

## Next Steps

1. Confirm with the user whether to adopt Option A (recommended) or Option B
2. Implement the chosen fix
3. Add the key tests
4. Update the documentation
5. Re-run all E2E tests
6. Prepare the release