Update LLM cache migration docs and improve UX prompts

This commit is contained in:
yangdx 2025-11-08 23:48:19 +08:00
parent 1a91bcdb5f
commit 987bc09cab
3 changed files with 153 additions and 98 deletions


@ -17,59 +17,11 @@ The tool migrates the following cache types:
- `default:extract:*` - Entity and relationship extraction caches
- `default:summary:*` - Entity and relationship summary caches
**Note**: Query caches (modes like `local`, `global`, etc.) are NOT migrated.
**Note**: Query caches (modes like `mix`, `local`, `global`, etc.) are NOT migrated.
## Prerequisites
### 1. Environment Variable Configuration
Ensure the relevant storage environment variables are configured in your `.env` file:
#### Workspace Configuration (Optional)
```bash
# Generic workspace (shared by all storages)
WORKSPACE=space1
# Or configure independent workspace for specific storage
POSTGRES_WORKSPACE=pg_space
MONGODB_WORKSPACE=mongo_space
REDIS_WORKSPACE=redis_space
```
**Workspace Priority**: Storage-specific > Generic WORKSPACE > Empty string
#### JsonKVStorage
```bash
WORKING_DIR=./rag_storage
```
#### RedisKVStorage
```bash
REDIS_URI=redis://localhost:6379
```
#### PGKVStorage
```bash
POSTGRES_HOST=localhost
POSTGRES_PORT=5432
POSTGRES_USER=your_username
POSTGRES_PASSWORD=your_password
POSTGRES_DATABASE=your_database
```
#### MongoKVStorage
```bash
MONGO_URI=mongodb://root:root@localhost:27017/
MONGO_DATABASE=LightRAG
```
### 2. Install Dependencies
Ensure LightRAG and its dependencies are installed:
```bash
pip install -r requirements.txt
```
The LLM Cache Migration Tool reads the storage configuration of the LightRAG Server and provides an LLM migration option to select source and destination storage. Ensure that both the source and destination storage have been correctly configured and are accessible via the LightRAG Server before cache migration.
## Usage
@ -95,10 +47,10 @@ Supported KV Storage Types:
[3] PGKVStorage
[4] MongoKVStorage
Select Source storage type (1-4) (Press Enter or 0 to exit): 1
Select Source storage type (1-4) (Press Enter to exit): 1
```
**Note**: You can press Enter or type `0` at the source storage selection to exit gracefully.
**Note**: You can press Enter or type `0` at any storage selection prompt to exit gracefully.
#### 2. Source Storage Validation
The tool will:
@ -121,23 +73,43 @@ Counting cache records...
```
**Progress Display by Storage Type:**
- **JsonKVStorage**: Fast in-memory counting, no progress display needed
- **RedisKVStorage**: Real-time scanning progress
- **JsonKVStorage**: Fast in-memory counting, displays final count without incremental progress
```
Counting cache records...
- Total: 8,734 records
```
- **RedisKVStorage**: Real-time scanning progress with incremental counts
```
Scanning Redis keys... found 8,734 records
```
- **PostgreSQL**: Shows timing if operation takes >1 second
- **PostgreSQL**: Quick COUNT(*) query, shows timing only if operation takes >1 second
```
Counting PostgreSQL records... (took 2.3s)
```
- **MongoDB**: Shows timing if operation takes >1 second
- **MongoDB**: Fast count_documents(), shows timing only if operation takes >1 second
```
Counting MongoDB documents... (took 1.8s)
```
#### 3. Select Target Storage Type
Repeat steps 1-2 to select and validate the target storage.
The tool automatically excludes the source storage type from the target selection and renumbers the remaining options sequentially:
```
Available Storage Types for Target (source: JsonKVStorage excluded):
[1] RedisKVStorage
[2] PGKVStorage
[3] MongoKVStorage
Select Target storage type (1-3) (Press Enter to exit): 1
```
**Important Notes:**
- You **cannot** select the same storage type for both source and target
- Options are automatically renumbered (e.g., [1], [2], [3] instead of [2], [3], [4])
- You can press Enter or type `0` to exit at this point as well
The tool then validates the target storage following the same process as the source (checking environment variables, initializing connection, counting records).
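The renumbering described above can be sketched in a few lines of Python. This is a minimal illustration, not the tool's actual code: the `STORAGE_TYPES` list order is taken from the menu shown earlier, and `target_options` is a hypothetical helper name.

```python
# Storage types in the order shown in the source selection menu.
STORAGE_TYPES = ["JsonKVStorage", "RedisKVStorage", "PGKVStorage", "MongoKVStorage"]


def target_options(source, storage_types=STORAGE_TYPES):
    """Hypothetical helper: drop the source type and renumber the rest from 1."""
    remaining = [name for name in storage_types if name != source]
    return {index: name for index, name in enumerate(remaining, start=1)}


# With JsonKVStorage as the source, the three remaining types are numbered 1-3.
for number, name in target_options("JsonKVStorage").items():
    print(f"[{number}] {name}")
```

Because the numbers depend on which source was chosen, the same storage type can appear under a different number in the target menu than in the source menu.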
#### 4. Confirm Migration
@ -147,8 +119,9 @@ Migration Confirmation
Source: JsonKVStorage (workspace: space1) - 8,734 records
Target: MongoKVStorage (workspace: space1) - 0 records
Batch Size: 1,000 records/batch
Memory Mode: Streaming (memory-optimized)
⚠ Warning: Target storage already has 0 records
Warning: Target storage already has 0 records
Migration will overwrite records with the same keys
Continue? (y/n): y
@ -156,18 +129,27 @@ Continue? (y/n): y
#### 5. Execute Migration
Observe migration progress:
The tool uses **streaming migration** by default for memory efficiency. Observe migration progress:
```
=== Starting Migration ===
Batch 1/9: ████████░░ 1000/8734 (11%) - default:extract
Batch 2/9: ████████████████░░ 2000/8734 (23%) - default:extract
=== Starting Streaming Migration ===
💡 Memory-optimized mode: Processing 1,000 records at a time
Batch 1/9: ████████░░░░░░░░░░░░ 1000/8734 (11.4%) - default:extract ✓
Batch 2/9: ████████████░░░░░░░░ 2000/8734 (22.9%) - default:extract ✓
...
Batch 9/9: ████████████████████ 8734/8734 (100%) - default:summary
Batch 9/9: ████████████████████ 8734/8734 (100.0%) - default:summary
Persisting data to disk...
✓ Data persisted successfully
```
**Key Features:**
- **Streaming mode**: Processes data in batches without loading entire dataset into memory
- **Real-time progress**: Shows progress bar with precise percentage and cache type
- **Success indicators**: ✓ for successful batches, ✗ for failed batches
- **Constant memory usage**: Handles millions of records efficiently
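As a rough sketch of how batch-wise streaming keeps memory bounded, the snippet below pulls records from an iterator in fixed-size batches and formats a progress line per batch. The names `iter_batches` and `progress_line` are hypothetical, and the progress bar here is drawn proportionally, which is an assumption and may not match the tool's exact rendering.

```python
import math
from itertools import islice


def iter_batches(records, batch_size=1000):
    """Yield lists of at most batch_size items without materializing all records."""
    iterator = iter(records)
    while True:
        batch = list(islice(iterator, batch_size))
        if not batch:
            return
        yield batch


def progress_line(batch_no, total_batches, done, total, width=20):
    """Format one progress line; the bar is filled proportionally (assumed)."""
    filled = int(width * done / total) if total else width
    bar = "█" * filled + "░" * (width - filled)
    percent = 100 * done / total if total else 100.0
    return f"Batch {batch_no}/{total_batches}: {bar} {done}/{total} ({percent:.1f}%)"


total = 8734
total_batches = math.ceil(total / 1000)  # 8,734 records at 1,000 per batch
done = 0
lines = []
for batch_no, batch in enumerate(iter_batches(range(total)), start=1):
    done += len(batch)  # a real migration would write the batch to the target here
    lines.append(progress_line(batch_no, total_batches, done, total))

print(lines[0])
print(lines[-1])
```

Only one batch is held in memory at a time, which is why memory use does not grow with the size of the dataset.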
#### 6. Review Migration Report
The tool provides a comprehensive final report showing statistics and any errors encountered:
@ -290,24 +272,96 @@ After migration completes, a detailed report includes:
1. **Data Overwrite Warning**
- Migration will overwrite records with the same keys in the target storage
- Tool displays a warning if target storage already has data
- Data migration can be performed repeatedly
- Pre-existing data in target storage is handled correctly
2. **Workspace Consistency**
- Recommended to use the same workspace for source and target
- Cache data in different workspaces is completely isolated
3. **Interrupt and Resume**
- Migration can be interrupted at any time (Ctrl+C)
- Already migrated data will remain in target storage
- Re-running will overwrite existing records
- Failed batches can be manually retried
4. **Performance Considerations**
- Large data migration may take considerable time
- Recommend migrating during off-peak hours
- Ensure stable network connection (for remote databases)
- Memory usage stays constant regardless of dataset size
## Storage Configuration
The tool supports multiple configuration methods with the following priority:
1. **Environment variables** (highest priority)
2. **config.ini file** (medium priority)
3. **Default values** (lowest priority)
#### Option A: Environment Variable Configuration
Configure storage settings in your `.env` file:
#### Workspace Configuration (Optional)
```bash
# Generic workspace (shared by all storages)
WORKSPACE=space1
# Or configure independent workspace for specific storage
POSTGRES_WORKSPACE=pg_space
MONGODB_WORKSPACE=mongo_space
REDIS_WORKSPACE=redis_space
```
**Workspace Priority**: Storage-specific > Generic WORKSPACE > Empty string
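The priority rule can be illustrated with a short Python sketch. `resolve_workspace` is a hypothetical helper written for this example, and treating a blank storage-specific value as "not set" is an assumption rather than confirmed tool behavior.

```python
import os


def resolve_workspace(storage_env_var, env=None):
    """Storage-specific variable > generic WORKSPACE > empty string."""
    env = os.environ if env is None else env
    specific = env.get(storage_env_var, "").strip()
    if specific:
        return specific
    # Fall back to the generic workspace, or "" if neither is set.
    return env.get("WORKSPACE", "").strip()


example_env = {"WORKSPACE": "space1", "POSTGRES_WORKSPACE": "pg_space"}
print(resolve_workspace("POSTGRES_WORKSPACE", example_env))
print(resolve_workspace("REDIS_WORKSPACE", example_env))
```

With the example values, the PostgreSQL storage resolves to its own workspace while Redis falls back to the generic one.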
#### JsonKVStorage
```bash
WORKING_DIR=./rag_storage
```
#### RedisKVStorage
```bash
REDIS_URI=redis://localhost:6379
```
#### PGKVStorage
```bash
POSTGRES_HOST=localhost
POSTGRES_PORT=5432
POSTGRES_USER=your_username
POSTGRES_PASSWORD=your_password
POSTGRES_DATABASE=your_database
```
#### MongoKVStorage
```bash
MONGO_URI=mongodb://root:root@localhost:27017/
MONGO_DATABASE=LightRAG
```
#### Option B: config.ini Configuration
Alternatively, create a `config.ini` file in the project root:
```ini
[redis]
uri = redis://localhost:6379
[postgres]
host = localhost
port = 5432
user = postgres
password = yourpassword
database = lightrag
[mongodb]
uri = mongodb://root:root@localhost:27017/
database = LightRAG
```
**Note**: Environment variables take precedence over config.ini settings. JsonKVStorage uses `WORKING_DIR` environment variable or defaults to `./rag_storage`.
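The lookup order can be sketched with the standard-library `configparser` module. This is an illustration under assumptions: `get_setting` is a hypothetical helper, the host names are made up, and the tool's own lookup code may be organized differently.

```python
import configparser
import os


def get_setting(env_name, section, option, default, config, env=None):
    """Resolve one setting: environment variable > config.ini > default."""
    env = os.environ if env is None else env
    if env_name in env:
        return env[env_name]
    if config.has_option(section, option):
        return config.get(section, option)
    return default


# Stand-in for a config.ini file in the project root (values are illustrative).
config = configparser.ConfigParser()
config.read_string("""
[redis]
uri = redis://config-host:6379
""")

example_env = {"REDIS_URI": "redis://env-host:6379"}

from_env = get_setting("REDIS_URI", "redis", "uri", "redis://localhost:6379", config, example_env)
from_ini = get_setting("REDIS_URI", "redis", "uri", "redis://localhost:6379", config, {})
from_default = get_setting("POSTGRES_HOST", "postgres", "host", "localhost", config, {})
print(from_env, from_ini, from_default)
```

Here the environment value wins when present, the `config.ini` value is used otherwise, and the default applies only when neither source defines the setting.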
## Troubleshooting
### Missing Environment Variables
@ -345,22 +399,12 @@ MONGO_DATABASE=LightRAG
# 2. Run tool
python -m lightrag.tools.migrate_llm_cache
# 3. Select: 1 (JsonKVStorage) -> 4 (MongoKVStorage)
# 3. Select: 1 (JsonKVStorage) -> 3 (MongoKVStorage - renumbered from 4)
```
### Scenario 2: PostgreSQL Database Switch
**Note**: After selecting JsonKVStorage as source, MongoKVStorage will be shown as option [3] in the target selection, since the remaining options are renumbered after excluding the source.
Use case: Database migration or upgrade
```bash
# 1. Configure old and new databases
POSTGRES_WORKSPACE=old_db # Source
# ... Configure new database as default
# 2. Run tool and select same storage type
```
### Scenario 3: Redis to PostgreSQL
### Scenario 2: Redis to PostgreSQL
Use case: Migrating from cache storage to relational database
@ -373,20 +417,38 @@ POSTGRES_HOST=new-postgres-server
# 2. Run tool
python -m lightrag.tools.migrate_llm_cache
# 3. Select: 2 (RedisKVStorage) -> 3 (PGKVStorage)
# 3. Select: 2 (RedisKVStorage) -> 2 (PGKVStorage - renumbered from 3)
```
**Note**: After selecting RedisKVStorage as source, PGKVStorage will be shown as option [2] in the target selection.
### Scenario 3: Different Workspaces Migration
Use case: Migrating data between different workspace environments
```bash
# Configure separate workspaces for source and target
POSTGRES_WORKSPACE=dev_workspace # For development environment
MONGODB_WORKSPACE=prod_workspace # For production environment
# Run tool
python -m lightrag.tools.migrate_llm_cache
# Select: 3 (PGKVStorage with dev_workspace) -> 3 (MongoKVStorage with prod_workspace)
```
**Note**: This allows you to migrate between different logical data partitions while changing storage backends.
## Tool Limitations
1. **Only Default Mode Caches**
1. **Same Storage Type Not Allowed**
- You cannot migrate between the same storage type (e.g., PostgreSQL to PostgreSQL)
- This is enforced by the tool automatically excluding the source storage type from target selection
- For same-storage migrations (e.g., database switches), use database-native tools instead
2. **Only Default Mode Caches**
- Only migrates `default:extract:*` and `default:summary:*`
- Query caches are not included
2. **Workspace Isolation**
- Different workspaces are treated as completely separate
- Cross-workspace migration requires manual workspace reconfiguration
3. **Network Dependency**
4. **Network Dependency**
- Tool requires stable network connection for remote databases
- Large datasets may fail if connection is interrupted
@ -407,10 +469,3 @@ python -m lightrag.tools.migrate_llm_cache
4. **Clean Old Data**
- After successful migration, consider cleaning old cache data
- Keep backups for a reasonable period before deletion
## Support
For issues or questions:
- Check LightRAG documentation
- Review error logs for detailed information
- Ensure all environment variables are correctly configured


@ -975,7 +975,7 @@ class MigrationTool:
# Custom input handling with exit support
while True:
choice = input(
f"\nSelect {storage_type} storage type ({prompt_range}) (Press Enter or 0 to exit): "
f"\nSelect {storage_type} storage type ({prompt_range}) (Press Enter to exit): "
).strip()
# Check for exit