Update LLM cache migration docs and improve UX prompts
This commit is contained in:
parent
1a91bcdb5f
commit
987bc09cab
3 changed files with 153 additions and 98 deletions
|
|
@ -17,59 +17,11 @@ The tool migrates the following cache types:
|
||||||
- `default:extract:*` - Entity and relationship extraction caches
|
- `default:extract:*` - Entity and relationship extraction caches
|
||||||
- `default:summary:*` - Entity and relationship summary caches
|
- `default:summary:*` - Entity and relationship summary caches
|
||||||
|
|
||||||
**Note**: Query caches (modes like `local`, `global`, etc.) are NOT migrated.
|
**Note**: Query caches (modes like `mix`, `local`, `global`, etc.) are NOT migrated.
|
||||||
|
|
||||||
## Prerequisites
|
## Prerequisites
|
||||||
|
|
||||||
### 1. Environment Variable Configuration
|
The LLM Cache Migration Tool reads the LightRAG Server's storage configuration and lets you select the source and destination storage interactively. Before migrating the cache, ensure that both the source and destination storage are correctly configured and accessible from the LightRAG Server.
|
||||||
|
|
||||||
Ensure the relevant storage environment variables are configured in your `.env` file:
|
|
||||||
|
|
||||||
#### Workspace Configuration (Optional)
|
|
||||||
```bash
|
|
||||||
# Generic workspace (shared by all storages)
|
|
||||||
WORKSPACE=space1
|
|
||||||
|
|
||||||
# Or configure independent workspace for specific storage
|
|
||||||
POSTGRES_WORKSPACE=pg_space
|
|
||||||
MONGODB_WORKSPACE=mongo_space
|
|
||||||
REDIS_WORKSPACE=redis_space
|
|
||||||
```
|
|
||||||
|
|
||||||
**Workspace Priority**: Storage-specific > Generic WORKSPACE > Empty string
|
|
||||||
|
|
||||||
#### JsonKVStorage
|
|
||||||
```bash
|
|
||||||
WORKING_DIR=./rag_storage
|
|
||||||
```
|
|
||||||
|
|
||||||
#### RedisKVStorage
|
|
||||||
```bash
|
|
||||||
REDIS_URI=redis://localhost:6379
|
|
||||||
```
|
|
||||||
|
|
||||||
#### PGKVStorage
|
|
||||||
```bash
|
|
||||||
POSTGRES_HOST=localhost
|
|
||||||
POSTGRES_PORT=5432
|
|
||||||
POSTGRES_USER=your_username
|
|
||||||
POSTGRES_PASSWORD=your_password
|
|
||||||
POSTGRES_DATABASE=your_database
|
|
||||||
```
|
|
||||||
|
|
||||||
#### MongoKVStorage
|
|
||||||
```bash
|
|
||||||
MONGO_URI=mongodb://root:root@localhost:27017/
|
|
||||||
MONGO_DATABASE=LightRAG
|
|
||||||
```
|
|
||||||
|
|
||||||
### 2. Install Dependencies
|
|
||||||
|
|
||||||
Ensure LightRAG and its dependencies are installed:
|
|
||||||
|
|
||||||
```bash
|
|
||||||
pip install -r requirements.txt
|
|
||||||
```
|
|
||||||
|
|
||||||
## Usage
|
## Usage
|
||||||
|
|
||||||
|
|
@ -95,10 +47,10 @@ Supported KV Storage Types:
|
||||||
[3] PGKVStorage
|
[3] PGKVStorage
|
||||||
[4] MongoKVStorage
|
[4] MongoKVStorage
|
||||||
|
|
||||||
Select Source storage type (1-4) (Press Enter or 0 to exit): 1
|
Select Source storage type (1-4) (Press Enter to exit): 1
|
||||||
```
|
```
|
||||||
|
|
||||||
**Note**: You can press Enter or type `0` at the source storage selection to exit gracefully.
|
**Note**: You can press Enter or type `0` at any storage selection prompt to exit gracefully.
|
||||||
|
|
||||||
#### 2. Source Storage Validation
|
#### 2. Source Storage Validation
|
||||||
The tool will:
|
The tool will:
|
||||||
|
|
@ -121,23 +73,43 @@ Counting cache records...
|
||||||
```
|
```
|
||||||
|
|
||||||
**Progress Display by Storage Type:**
|
**Progress Display by Storage Type:**
|
||||||
- **JsonKVStorage**: Fast in-memory counting, no progress display needed
|
- **JsonKVStorage**: Fast in-memory counting, displays final count without incremental progress
|
||||||
- **RedisKVStorage**: Real-time scanning progress
|
```
|
||||||
|
Counting cache records...
|
||||||
|
- Total: 8,734 records
|
||||||
|
```
|
||||||
|
- **RedisKVStorage**: Real-time scanning progress with incremental counts
|
||||||
```
|
```
|
||||||
Scanning Redis keys... found 8,734 records
|
Scanning Redis keys... found 8,734 records
|
||||||
```
|
```
|
||||||
- **PostgreSQL**: Shows timing if operation takes >1 second
|
- **PostgreSQL**: Quick COUNT(*) query, shows timing only if operation takes >1 second
|
||||||
```
|
```
|
||||||
Counting PostgreSQL records... (took 2.3s)
|
Counting PostgreSQL records... (took 2.3s)
|
||||||
```
|
```
|
||||||
- **MongoDB**: Shows timing if operation takes >1 second
|
- **MongoDB**: Fast count_documents(), shows timing only if operation takes >1 second
|
||||||
```
|
```
|
||||||
Counting MongoDB documents... (took 1.8s)
|
Counting MongoDB documents... (took 1.8s)
|
||||||
```
|
```
|
||||||
|
|
||||||
#### 3. Select Target Storage Type
|
#### 3. Select Target Storage Type
|
||||||
|
|
||||||
Repeat steps 1-2 to select and validate the target storage.
|
The tool automatically excludes the source storage type from the target selection and renumbers the remaining options sequentially:
|
||||||
|
|
||||||
|
```
|
||||||
|
Available Storage Types for Target (source: JsonKVStorage excluded):
|
||||||
|
[1] RedisKVStorage
|
||||||
|
[2] PGKVStorage
|
||||||
|
[3] MongoKVStorage
|
||||||
|
|
||||||
|
Select Target storage type (1-3) (Press Enter to exit): 1
|
||||||
|
```
|
||||||
|
|
||||||
|
**Important Notes:**
|
||||||
|
- You **cannot** select the same storage type for both source and target
|
||||||
|
- Options are automatically renumbered (e.g., [1], [2], [3] instead of [2], [3], [4])
|
||||||
|
- You can press Enter or type `0` to exit at this point as well
|
||||||
|
|
||||||
|
The tool then validates the target storage following the same process as the source (checking environment variables, initializing connection, counting records).
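
The exclusion and renumbering described above can be sketched as follows. This is a minimal illustration, not the tool's actual code: the `STORAGE_TYPES` list and the `target_options` helper are hypothetical names, and only the four storage type names and their order come from the menu shown earlier.

```python
# Hypothetical sketch: storage names follow the documented menu order.
STORAGE_TYPES = ["JsonKVStorage", "RedisKVStorage", "PGKVStorage", "MongoKVStorage"]


def target_options(source: str) -> dict[int, str]:
    """Return the target menu with the source type removed and renumbered from 1."""
    remaining = [name for name in STORAGE_TYPES if name != source]
    return {number: name for number, name in enumerate(remaining, start=1)}


if __name__ == "__main__":
    # With JsonKVStorage as the source, MongoKVStorage moves from [4] to [3].
    for number, name in target_options("JsonKVStorage").items():
        print(f"[{number}] {name}")
```

Because the numbering is rebuilt from the remaining list, the number to type for a given target depends on which source was selected.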
|
||||||
|
|
||||||
#### 4. Confirm Migration
|
#### 4. Confirm Migration
|
||||||
|
|
||||||
|
|
@ -147,8 +119,9 @@ Migration Confirmation
|
||||||
Source: JsonKVStorage (workspace: space1) - 8,734 records
|
Source: JsonKVStorage (workspace: space1) - 8,734 records
|
||||||
Target: MongoKVStorage (workspace: space1) - 0 records
|
Target: MongoKVStorage (workspace: space1) - 0 records
|
||||||
Batch Size: 1,000 records/batch
|
Batch Size: 1,000 records/batch
|
||||||
|
Memory Mode: Streaming (memory-optimized)
|
||||||
|
|
||||||
⚠ Warning: Target storage already has 0 records
|
⚠️ Warning: Target storage already has 0 records
|
||||||
Migration will overwrite records with the same keys
|
Migration will overwrite records with the same keys
|
||||||
|
|
||||||
Continue? (y/n): y
|
Continue? (y/n): y
|
||||||
|
|
@ -156,18 +129,27 @@ Continue? (y/n): y
|
||||||
|
|
||||||
#### 5. Execute Migration
|
#### 5. Execute Migration
|
||||||
|
|
||||||
Observe migration progress:
|
The tool uses **streaming migration** by default for memory efficiency. Observe migration progress:
|
||||||
|
|
||||||
```
|
```
|
||||||
=== Starting Migration ===
|
=== Starting Streaming Migration ===
|
||||||
Batch 1/9: ████████░░ 1000/8734 (11%) - default:extract
|
💡 Memory-optimized mode: Processing 1,000 records at a time
|
||||||
Batch 2/9: ████████████████░░ 2000/8734 (23%) - default:extract
|
|
||||||
|
Batch 1/9: ████████░░░░░░░░░░░░ 1000/8734 (11.4%) - default:extract ✓
|
||||||
|
Batch 2/9: ████████████░░░░░░░░ 2000/8734 (22.9%) - default:extract ✓
|
||||||
...
|
...
|
||||||
Batch 9/9: ████████████████████ 8734/8734 (100%) - default:summary
|
Batch 9/9: ████████████████████ 8734/8734 (100.0%) - default:summary ✓
|
||||||
|
|
||||||
Persisting data to disk...
|
Persisting data to disk...
|
||||||
|
✓ Data persisted successfully
|
||||||
```
|
```
|
||||||
|
|
||||||
|
**Key Features:**
|
||||||
|
- **Streaming mode**: Processes data in batches without loading entire dataset into memory
|
||||||
|
- **Real-time progress**: Shows progress bar with precise percentage and cache type
|
||||||
|
- **Success indicators**: ✓ for successful batches, ✗ for failed batches
|
||||||
|
- **Constant memory usage**: Handles millions of records efficiently
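
To make the streaming idea concrete, here is a minimal sketch of batch-wise copying with a running percentage. It assumes the source can be read as an iterator of `(key, value)` pairs and the target exposes some bulk write callable; `batched`, `migrate`, and `upsert` are illustrative names, not the tool's real API.

```python
from itertools import islice
from typing import Callable, Iterable, Iterator


def batched(records: Iterable, batch_size: int = 1000) -> Iterator[list]:
    """Yield lists of at most batch_size items without materializing the input."""
    iterator = iter(records)
    while True:
        batch = list(islice(iterator, batch_size))
        if not batch:
            return
        yield batch


def migrate(
    records: Iterable[tuple[str, dict]],
    upsert: Callable[[dict], None],  # hypothetical bulk writer for the target
    total: int,
    batch_size: int = 1000,
) -> int:
    """Copy records batch by batch; only one batch is held in memory at a time."""
    done = 0
    for index, batch in enumerate(batched(records, batch_size), start=1):
        upsert(dict(batch))  # same keys are overwritten in the target
        done += len(batch)
        percent = done / total if total else 1.0
        print(f"Batch {index}: {done}/{total} ({percent:.1%})")
    return done


if __name__ == "__main__":
    target: dict[str, dict] = {}
    source = ((f"default:extract:{i}", {"return": i}) for i in range(8734))
    migrate(source, target.update, total=8734)
```

Since the source is consumed lazily, peak memory is bounded by the batch size rather than by the number of records.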
|
||||||
|
|
||||||
#### 6. Review Migration Report
|
#### 6. Review Migration Report
|
||||||
|
|
||||||
The tool provides a comprehensive final report showing statistics and any errors encountered:
|
The tool provides a comprehensive final report showing statistics and any errors encountered:
|
||||||
|
|
@ -290,24 +272,96 @@ After migration completes, a detailed report includes:
|
||||||
1. **Data Overwrite Warning**
|
1. **Data Overwrite Warning**
|
||||||
- Migration will overwrite records with the same keys in the target storage
|
- Migration will overwrite records with the same keys in the target storage
|
||||||
- Tool displays a warning if target storage already has data
|
- Tool displays a warning if target storage already has data
|
||||||
|
- Data migration can be performed repeatedly
|
||||||
- Pre-existing data in target storage is handled correctly
|
- Pre-existing data in target storage is handled correctly
|
||||||
|
|
||||||
2. **Workspace Consistency**
|
|
||||||
- Recommended to use the same workspace for source and target
|
|
||||||
- Cache data in different workspaces are completely isolated
|
|
||||||
|
|
||||||
3. **Interrupt and Resume**
|
3. **Interrupt and Resume**
|
||||||
- Migration can be interrupted at any time (Ctrl+C)
|
- Migration can be interrupted at any time (Ctrl+C)
|
||||||
- Already migrated data will remain in target storage
|
- Already migrated data will remain in target storage
|
||||||
- Re-running will overwrite existing records
|
- Re-running will overwrite existing records
|
||||||
- Failed batches can be manually retried
|
- Failed batches can be manually retried
|
||||||
|
|
||||||
4. **Performance Considerations**
|
4. **Performance Considerations**
|
||||||
- Large data migration may take considerable time
|
- Large data migration may take considerable time
|
||||||
- Recommend migrating during off-peak hours
|
- Recommend migrating during off-peak hours
|
||||||
- Ensure stable network connection (for remote databases)
|
- Ensure stable network connection (for remote databases)
|
||||||
- Memory usage stays constant regardless of dataset size
|
- Memory usage stays constant regardless of dataset size
|
||||||
|
|
||||||
|
## Storage Configuration
|
||||||
|
|
||||||
|
The tool supports multiple configuration methods with the following priority:
|
||||||
|
|
||||||
|
1. **Environment variables** (highest priority)
|
||||||
|
2. **config.ini file** (medium priority)
|
||||||
|
3. **Default values** (lowest priority)
|
||||||
|
|
||||||
|
#### Option A: Environment Variable Configuration
|
||||||
|
|
||||||
|
Configure storage settings in your `.env` file:
|
||||||
|
|
||||||
|
#### Workspace Configuration (Optional)
|
||||||
|
|
||||||
|
```bash
|
||||||
|
# Generic workspace (shared by all storages)
|
||||||
|
WORKSPACE=space1
|
||||||
|
|
||||||
|
# Or configure independent workspace for specific storage
|
||||||
|
POSTGRES_WORKSPACE=pg_space
|
||||||
|
MONGODB_WORKSPACE=mongo_space
|
||||||
|
REDIS_WORKSPACE=redis_space
|
||||||
|
```
|
||||||
|
|
||||||
|
**Workspace Priority**: Storage-specific > Generic WORKSPACE > Empty string
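
The priority rule above can be read as a simple lookup chain. The sketch below is an assumption about how such a lookup could be written, not the tool's implementation: `WORKSPACE_ENV` and `resolve_workspace` are hypothetical names, and treating an empty storage-specific variable as unset is also an assumption.

```python
import os
from typing import Mapping, Optional

# Storage-specific variable names as listed in the configuration above.
WORKSPACE_ENV = {
    "PGKVStorage": "POSTGRES_WORKSPACE",
    "MongoKVStorage": "MONGODB_WORKSPACE",
    "RedisKVStorage": "REDIS_WORKSPACE",
}


def resolve_workspace(storage_name: str, env: Optional[Mapping[str, str]] = None) -> str:
    """Storage-specific variable first, then WORKSPACE, then an empty string."""
    env = os.environ if env is None else env
    specific = WORKSPACE_ENV.get(storage_name)
    if specific and env.get(specific):
        return env[specific]
    return env.get("WORKSPACE", "")


if __name__ == "__main__":
    example = {"WORKSPACE": "space1", "POSTGRES_WORKSPACE": "pg_space"}
    print(resolve_workspace("PGKVStorage", example))
    print(resolve_workspace("MongoKVStorage", example))
```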
|
||||||
|
|
||||||
|
#### JsonKVStorage
|
||||||
|
|
||||||
|
```bash
|
||||||
|
WORKING_DIR=./rag_storage
|
||||||
|
```
|
||||||
|
|
||||||
|
#### RedisKVStorage
|
||||||
|
|
||||||
|
```bash
|
||||||
|
REDIS_URI=redis://localhost:6379
|
||||||
|
```
|
||||||
|
|
||||||
|
#### PGKVStorage
|
||||||
|
|
||||||
|
```bash
|
||||||
|
POSTGRES_HOST=localhost
|
||||||
|
POSTGRES_PORT=5432
|
||||||
|
POSTGRES_USER=your_username
|
||||||
|
POSTGRES_PASSWORD=your_password
|
||||||
|
POSTGRES_DATABASE=your_database
|
||||||
|
```
|
||||||
|
|
||||||
|
#### MongoKVStorage
|
||||||
|
|
||||||
|
```bash
|
||||||
|
MONGO_URI=mongodb://root:root@localhost:27017/
|
||||||
|
MONGO_DATABASE=LightRAG
|
||||||
|
```
|
||||||
|
|
||||||
|
#### Option B: config.ini Configuration
|
||||||
|
|
||||||
|
Alternatively, create a `config.ini` file in the project root:
|
||||||
|
|
||||||
|
```ini
|
||||||
|
[redis]
|
||||||
|
uri = redis://localhost:6379
|
||||||
|
|
||||||
|
[postgres]
|
||||||
|
host = localhost
|
||||||
|
port = 5432
|
||||||
|
user = postgres
|
||||||
|
password = yourpassword
|
||||||
|
database = lightrag
|
||||||
|
|
||||||
|
[mongodb]
|
||||||
|
uri = mongodb://root:root@localhost:27017/
|
||||||
|
database = LightRAG
|
||||||
|
```
|
||||||
|
|
||||||
|
**Note**: Environment variables take precedence over config.ini settings. JsonKVStorage uses the `WORKING_DIR` environment variable or defaults to `./rag_storage`.
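
As an illustration of the three-level precedence (environment variable, then `config.ini`, then default), the following sketch uses only the standard library. The `resolve_setting` helper and its parameters are hypothetical; the tool's real lookup may differ.

```python
import configparser
import os
from typing import Mapping, Optional


def resolve_setting(
    env_name: str,
    section: str,
    option: str,
    default: str,
    config: configparser.ConfigParser,
    env: Optional[Mapping[str, str]] = None,
) -> str:
    """Return the environment value if set, else the config.ini value, else the default."""
    env = os.environ if env is None else env
    if env_name in env:
        return env[env_name]
    # fallback covers both a missing section and a missing option
    return config.get(section, option, fallback=default)


if __name__ == "__main__":
    config = configparser.ConfigParser()
    config.read_string("[redis]\nuri = redis://from-ini:6379\n")
    print(resolve_setting("REDIS_URI", "redis", "uri", "redis://localhost:6379", config, env={}))
```

Passing `env` explicitly keeps the helper easy to test without modifying the real process environment.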
|
||||||
|
|
||||||
## Troubleshooting
|
## Troubleshooting
|
||||||
|
|
||||||
### Missing Environment Variables
|
### Missing Environment Variables
|
||||||
|
|
@ -345,22 +399,12 @@ MONGO_DATABASE=LightRAG
|
||||||
# 2. Run tool
|
# 2. Run tool
|
||||||
python -m lightrag.tools.migrate_llm_cache
|
python -m lightrag.tools.migrate_llm_cache
|
||||||
|
|
||||||
# 3. Select: 1 (JsonKVStorage) -> 4 (MongoKVStorage)
|
# 3. Select: 1 (JsonKVStorage) -> 3 (MongoKVStorage - renumbered from 4)
|
||||||
```
|
```
|
||||||

|
||||||
### Scenario 2: PostgreSQL Database Switch
|
**Note**: After selecting JsonKVStorage as source, MongoKVStorage will be shown as option [3] in the target selection since options are renumbered after excluding the source.
|
||||||
|
|
||||||
Use case: Database migration or upgrade
|
### Scenario 2: Redis to PostgreSQL
|
||||||
|
|
||||||
```bash
|
|
||||||
# 1. Configure old and new databases
|
|
||||||
POSTGRES_WORKSPACE=old_db # Source
|
|
||||||
# ... Configure new database as default
|
|
||||||
|
|
||||||
# 2. Run tool and select same storage type
|
|
||||||
```
|
|
||||||
|
|
||||||
### Scenario 3: Redis to PostgreSQL
|
|
||||||
|
|
||||||
Use case: Migrating from cache storage to relational database
|
Use case: Migrating from cache storage to relational database
|
||||||
|
|
||||||
|
|
@ -373,20 +417,38 @@ POSTGRES_HOST=new-postgres-server
|
||||||
# 2. Run tool
|
# 2. Run tool
|
||||||
python -m lightrag.tools.migrate_llm_cache
|
python -m lightrag.tools.migrate_llm_cache
|
||||||
|
|
||||||
# 3. Select: 2 (RedisKVStorage) -> 3 (PGKVStorage)
|
# 3. Select: 2 (RedisKVStorage) -> 2 (PGKVStorage - renumbered from 3)
|
||||||
```
|
```
|
||||||
|
|
||||||
|
**Note**: After selecting RedisKVStorage as source, PGKVStorage will be shown as option [2] in the target selection.
|
||||||
|
|
||||||
|
### Scenario 3: Different Workspaces Migration
|
||||||
|
|
||||||
|
Use case: Migrating data between different workspace environments
|
||||||
|
|
||||||
|
```bash
|
||||||
|
# Configure separate workspaces for source and target
|
||||||
|
POSTGRES_WORKSPACE=dev_workspace # For development environment
|
||||||
|
MONGODB_WORKSPACE=prod_workspace # For production environment
|
||||||
|
|
||||||
|
# Run tool
|
||||||
|
python -m lightrag.tools.migrate_llm_cache
|
||||||
|
|
||||||
|
# Select: 3 (PGKVStorage with dev_workspace) -> 3 (MongoKVStorage with prod_workspace)
|
||||||
|
```
|
||||||
|
|
||||||
|
**Note**: This allows you to migrate between different logical data partitions while changing storage backends.
|
||||||
|
|
||||||
## Tool Limitations
|
## Tool Limitations
|
||||||
|
|
||||||
1. **Only Default Mode Caches**
|
1. **Same Storage Type Not Allowed**
|
||||||
|
- You cannot migrate between the same storage type (e.g., PostgreSQL to PostgreSQL)
|
||||||
|
- This is enforced by the tool automatically excluding the source storage type from target selection
|
||||||
|
- For same-storage migrations (e.g., database switches), use database-native tools instead
|
||||||
|
2. **Only Default Mode Caches**
|
||||||
- Only migrates `default:extract:*` and `default:summary:*`
|
- Only migrates `default:extract:*` and `default:summary:*`
|
||||||
- Query caches are not included
|
- Query caches are not included
|
||||||
|
3. **Network Dependency**
|
||||||
2. **Workspace Isolation**
|
|
||||||
- Different workspaces are treated as completely separate
|
|
||||||
- Cross-workspace migration requires manual workspace reconfiguration
|
|
||||||
|
|
||||||
3. **Network Dependency**
|
|
||||||
- Tool requires stable network connection for remote databases
|
- Tool requires stable network connection for remote databases
|
||||||
- Large datasets may fail if connection is interrupted
|
- Large datasets may fail if connection is interrupted
|
||||||
|
|
||||||
|
|
@ -407,10 +469,3 @@ python -m lightrag.tools.migrate_llm_cache
|
||||||
4. **Clean Old Data**
|
4. **Clean Old Data**
|
||||||
- After successful migration, consider cleaning old cache data
|
- After successful migration, consider cleaning old cache data
|
||||||
- Keep backups for a reasonable period before deletion
|
- Keep backups for a reasonable period before deletion
|
||||||
|
|
||||||
## Support
|
|
||||||
|
|
||||||
For issues or questions:
|
|
||||||
- Check LightRAG documentation
|
|
||||||
- Review error logs for detailed information
|
|
||||||
- Ensure all environment variables are correctly configured
|
|
||||||
|
|
|
||||||
|
|
@ -975,7 +975,7 @@ class MigrationTool:
|
||||||
# Custom input handling with exit support
|
# Custom input handling with exit support
|
||||||
while True:
|
while True:
|
||||||
choice = input(
|
choice = input(
|
||||||
f"\nSelect {storage_type} storage type ({prompt_range}) (Press Enter or 0 to exit): "
|
f"\nSelect {storage_type} storage type ({prompt_range}) (Press Enter to exit): "
|
||||||
).strip()
|
).strip()
|
||||||
|
|
||||||
# Check for exit
|
# Check for exit
|
||||||
|
|
|
||||||