# LLM Cache Migration Tool - User Guide

## Overview

This tool migrates LightRAG's LLM response cache between different KV storage implementations. It specifically migrates caches generated during file extraction (mode `default`), including entity extraction and summary caches.

## Supported Storage Types

1. **JsonKVStorage** - File-based JSON storage
2. **RedisKVStorage** - Redis database storage
3. **PGKVStorage** - PostgreSQL database storage
4. **MongoKVStorage** - MongoDB database storage

## Cache Types

The tool migrates the following cache types:

- `default:extract:*` - Entity and relationship extraction caches
- `default:summary:*` - Entity and relationship summary caches

**Note**: Query caches (modes like `local`, `global`, etc.) are NOT migrated.

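The filter is simple in principle: a cache key is migrated only if it carries one of the two prefixes above. The following sketch is illustrative only; the exact key layout after the prefix is an assumption, not taken from the tool.

```python
MIGRATED_PREFIXES = ("default:extract:", "default:summary:")

def should_migrate(cache_key: str) -> bool:
    """Only default-mode extraction and summary caches are copied."""
    return cache_key.startswith(MIGRATED_PREFIXES)

# Illustrative keys (the hash part is hypothetical):
should_migrate("default:extract:3fa2c9")  # True
should_migrate("default:summary:91bc07")  # True
should_migrate("local:query:d41d8c")      # False - query caches are skipped
```
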
## Prerequisites

### 1. Environment Variable Configuration

Ensure the relevant storage environment variables are configured in your `.env` file:

#### Workspace Configuration (Optional)
```bash
# Generic workspace (shared by all storages)
WORKSPACE=space1

# Or configure an independent workspace for a specific storage
POSTGRES_WORKSPACE=pg_space
MONGODB_WORKSPACE=mongo_space
REDIS_WORKSPACE=redis_space
```

**Workspace Priority**: Storage-specific > Generic WORKSPACE > Empty string

#### JsonKVStorage
```bash
WORKING_DIR=./rag_storage
```

#### RedisKVStorage
```bash
REDIS_URI=redis://localhost:6379
```

#### PGKVStorage
```bash
POSTGRES_HOST=localhost
POSTGRES_PORT=5432
POSTGRES_USER=your_username
POSTGRES_PASSWORD=your_password
POSTGRES_DATABASE=your_database
```

#### MongoKVStorage
```bash
MONGO_URI=mongodb://root:root@localhost:27017/
MONGO_DATABASE=LightRAG
```

### 2. Install Dependencies

Ensure LightRAG and its dependencies are installed:

```bash
pip install -r requirements.txt
```

## Usage

### Basic Usage

Run from the LightRAG project root directory:

```bash
python tools/migrate_llm_cache.py
```

### Interactive Workflow

The tool guides you through the following steps:

#### 1. Select Source Storage Type
```
Supported KV Storage Types:
[1] JsonKVStorage
[2] RedisKVStorage
[3] PGKVStorage
[4] MongoKVStorage

Select Source storage type (1-4) (Press Enter or 0 to exit): 1
```

**Note**: You can press Enter or type `0` at the source storage selection to exit gracefully.

#### 2. Source Storage Validation

The tool will:
- Check required environment variables
- Auto-detect workspace configuration
- Initialize and connect to storage
- Count cache records available for migration

```
Checking environment variables...
✓ All required environment variables are set

Initializing Source storage...
- Storage Type: JsonKVStorage
- Workspace: space1
- Connection Status: ✓ Success

Counting cache records...
- Total: 8,734 records
```

**Progress Display by Storage Type:**
- **JsonKVStorage**: Fast in-memory counting, no progress display needed
- **RedisKVStorage**: Real-time scanning progress
  ```
  Scanning Redis keys... found 8,734 records
  ```
- **PostgreSQL**: Shows timing if operation takes >1 second
  ```
  Counting PostgreSQL records... (took 2.3s)
  ```
- **MongoDB**: Shows timing if operation takes >1 second
  ```
  Counting MongoDB documents... (took 1.8s)
  ```

#### 3. Select Target Storage Type

Repeat steps 1-2 to select and validate the target storage.

#### 4. Confirm Migration

```
==================================================
Migration Confirmation
Source: JsonKVStorage (workspace: space1) - 8,734 records
Target: MongoKVStorage (workspace: space1) - 0 records
Batch Size: 1,000 records/batch

⚠ Warning: Target storage already has 0 records
Migration will overwrite records with the same keys

Continue? (y/n): y
```

#### 5. Execute Migration

Observe migration progress:

```
=== Starting Migration ===
Batch 1/9: ████████░░ 1000/8734 (11%) - default:extract
Batch 2/9: ████████████████░░ 2000/8734 (23%) - default:extract
...
Batch 9/9: ████████████████████ 8734/8734 (100%) - default:summary

Persisting data to disk...
```

#### 6. Review Migration Report

The tool provides a comprehensive final report showing statistics and any errors encountered:

**Successful Migration:**
```
Migration Complete - Final Report

📊 Statistics:
Total source records: 8,734
Total batches: 9
Successful batches: 9
Failed batches: 0
Successfully migrated: 8,734
Failed to migrate: 0
Success rate: 100.00%

✓ SUCCESS: All records migrated successfully!
```

**Migration with Errors:**
```
Migration Complete - Final Report

📊 Statistics:
Total source records: 8,734
Total batches: 9
Successful batches: 8
Failed batches: 1
Successfully migrated: 7,734
Failed to migrate: 1,000
Success rate: 88.55%

⚠️ Errors encountered: 1

Error Details:
------------------------------------------------------------

Error Summary:
- ConnectionError: 1 occurrence(s)

First 5 errors:

1. Batch 2
   Type: ConnectionError
   Message: Connection timeout after 30s
   Records lost: 1,000

⚠️ WARNING: Migration completed with errors!
Please review the error details above.
```

## Technical Details

### Workspace Handling

The tool resolves the workspace in the following priority order (see the sketch after this list):

1. **Storage-specific workspace environment variables**
   - PGKVStorage: `POSTGRES_WORKSPACE`
   - MongoKVStorage: `MONGODB_WORKSPACE`
   - RedisKVStorage: `REDIS_WORKSPACE`
2. **Generic workspace environment variable**
   - `WORKSPACE`
3. **Default value**
   - Empty string (uses the storage's default workspace)

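A minimal sketch of this resolution order, assuming only the environment variables listed above (the function name and mapping are illustrative, not the tool's actual API):

```python
import os

# Hypothetical mapping of storage type -> storage-specific workspace variable.
STORAGE_WORKSPACE_ENV = {
    "PGKVStorage": "POSTGRES_WORKSPACE",
    "MongoKVStorage": "MONGODB_WORKSPACE",
    "RedisKVStorage": "REDIS_WORKSPACE",
}

def resolve_workspace(storage_type: str) -> str:
    """Storage-specific variable > generic WORKSPACE > empty string."""
    specific_var = STORAGE_WORKSPACE_ENV.get(storage_type)
    if specific_var and os.getenv(specific_var):
        return os.environ[specific_var]
    return os.getenv("WORKSPACE", "")
```
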
### Batch Migration

- Default batch size: 1000 records/batch
- Avoids memory overflow from loading too much data at once
- Each batch is committed independently, supporting resume capability (see the sketch below)

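As a minimal sketch of this behavior (not the tool's actual code), batching might look like the following; the write/commit call for the target storage is left abstract:

```python
from itertools import islice
from typing import Dict, Iterable, Iterator, Tuple

BATCH_SIZE = 1000  # default batch size described above

def batches(
    items: Iterable[Tuple[str, dict]], size: int = BATCH_SIZE
) -> Iterator[Dict[str, dict]]:
    """Group (key, value) cache entries into dicts of at most `size` entries."""
    it = iter(items)
    while True:
        chunk = dict(islice(it, size))
        if not chunk:
            break
        yield chunk

# Each yielded batch is written and committed on its own, so an interrupted
# run still leaves the already-migrated batches in the target storage.
```
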
### Memory-Efficient Pagination

For large datasets, the tool implements storage-specific pagination strategies:

- **JsonKVStorage**: Direct in-memory access (data already loaded in shared storage)
- **RedisKVStorage**: Cursor-based SCAN with pipeline batching (1000 keys/batch); see the sketch below
- **PGKVStorage**: SQL LIMIT/OFFSET pagination (1000 records/batch)
- **MongoKVStorage**: Cursor streaming with batch_size (1000 documents/batch)

This ensures the tool can handle millions of cache records without memory issues.

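To make the Redis strategy concrete, here is a minimal, illustrative sketch using the `redis` Python client; the key pattern and the JSON value encoding are assumptions about the cache layout, not a description of the tool's internals:

```python
import json
from typing import Dict, Iterator

import redis

def scan_cache(r: redis.Redis, pattern: str, page: int = 1000) -> Iterator[Dict[str, dict]]:
    """Stream matching cache entries page by page using SCAN plus a pipeline."""
    cursor = 0
    while True:
        cursor, keys = r.scan(cursor=cursor, match=pattern, count=page)
        if keys:
            pipe = r.pipeline()
            for key in keys:
                pipe.get(key)
            values = pipe.execute()
            # Hand back one page at a time; the full dataset is never held in memory.
            yield {
                k.decode(): json.loads(v)
                for k, v in zip(keys, values)
                if v is not None
            }
        if cursor == 0:
            break

# Usage sketch (the pattern is an assumed namespace prefix):
# r = redis.Redis.from_url("redis://localhost:6379")
# for page in scan_cache(r, "*default:extract:*"):
#     ...
```
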
### Prefix Filtering Implementation

The tool uses optimized filtering methods for different storage types:

- **JsonKVStorage**: Direct dictionary iteration with lock protection
- **RedisKVStorage**: SCAN command with namespace-prefixed patterns + pipeline for bulk GET
- **PGKVStorage**: SQL LIKE queries with proper field mapping (id, return_value, etc.)
- **MongoKVStorage**: MongoDB regex queries on `_id` field with cursor streaming (illustrated below)

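For illustration, the MongoDB case could be expressed roughly as below; the collection name and document layout are assumptions, not the tool's actual schema:

```python
from pymongo import MongoClient

client = MongoClient("mongodb://root:root@localhost:27017/")
collection = client["LightRAG"]["llm_response_cache"]  # collection name is assumed

# Anchored regex on _id matches keys such as "default:extract:<hash>" and
# "default:summary:<hash>"; batch_size streams documents instead of loading all.
cursor = collection.find(
    {"_id": {"$regex": "^default:(extract|summary):"}},
    batch_size=1000,
)
for doc in cursor:
    print(doc["_id"])
```
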
## Error Handling & Resilience

The tool implements comprehensive error tracking to ensure transparent and resilient migrations:

### Batch-Level Error Tracking
- Each batch is independently error-checked
- Failed batches are logged but don't stop the migration
- Successful batches are committed even if later batches fail
- Real-time progress shows ✓ (success) or ✗ (failed) for each batch; a minimal sketch of this pattern follows

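A minimal sketch of this per-batch pattern, assuming a `write_batch` callable that commits one batch to the target storage (the name is hypothetical):

```python
from typing import Callable, Dict, Iterable, List

def migrate_batches(batch_iter: Iterable[Dict[str, dict]], write_batch: Callable[[dict], None]) -> dict:
    """Write each batch independently; record failures instead of aborting."""
    errors: List[dict] = []
    migrated = 0
    for i, batch in enumerate(batch_iter, start=1):
        try:
            write_batch(batch)            # this batch is committed on its own
            migrated += len(batch)
            print(f"Batch {i}: ✓ {len(batch)} records")
        except Exception as exc:          # a failed batch is logged, not fatal
            errors.append({
                "batch": i,
                "type": type(exc).__name__,
                "message": str(exc),
                "records_lost": len(batch),
            })
            print(f"Batch {i}: ✗ {type(exc).__name__}")
    return {"migrated": migrated, "errors": errors}
```
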
### Error Reporting

After the migration completes, a detailed report includes:

- **Statistics**: Total records, success/failure counts, success rate
- **Error Summary**: Grouped by error type with occurrence counts (see the sketch after this list)
- **Error Details**: Batch number, error type, message, and records lost
- **Recommendations**: Clear indication of success or need for review

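For example, the grouped error summary can be produced with `collections.Counter`; the `errors` list below mirrors the per-batch error records from the sketch above and is illustrative only:

```python
from collections import Counter

errors = [
    {"batch": 2, "type": "ConnectionError", "message": "Connection timeout after 30s", "records_lost": 1000},
]

summary = Counter(err["type"] for err in errors)
for error_type, count in summary.items():
    print(f"- {error_type}: {count} occurrence(s)")
```
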
### No Double Data Loading

- Unlike traditional verification approaches, the tool does NOT reload all target data
- Errors are detected during migration, not after
- This eliminates memory overhead and handles pre-existing target data correctly

## Important Notes

1. **Data Overwrite Warning**
   - Migration will overwrite records with the same keys in the target storage
   - The tool displays a warning if the target storage already has data
   - Pre-existing data in the target storage is handled correctly

2. **Workspace Consistency**
   - It is recommended to use the same workspace for source and target
   - Cache data in different workspaces is completely isolated

3. **Interrupt and Resume**
   - Migration can be interrupted at any time (Ctrl+C)
   - Already migrated data will remain in the target storage
   - Re-running will overwrite existing records
   - Failed batches can be manually retried

4. **Performance Considerations**
   - Large data migrations may take considerable time
   - Migrate during off-peak hours where possible
   - Ensure a stable network connection (for remote databases)
   - Memory usage stays constant regardless of dataset size

## Troubleshooting

### Missing Environment Variables
```
✗ Missing required environment variables: POSTGRES_USER, POSTGRES_PASSWORD
```
**Solution**: Add the missing variables to your `.env` file

### Connection Failed
```
✗ Initialization failed: Connection refused
```
**Solutions**:
- Check whether the database service is running
- Verify connection parameters (host, port, credentials)
- Check firewall settings

### Migration Completed with Errors
**Solutions**:
- Check the migration process output for error logs
- Re-run the migration tool
- Check target storage capacity and permissions

## Example Scenarios

### Scenario 1: JSON to MongoDB Migration

Use case: Migrating from single-machine development to production

```bash
# 1. Configure environment variables
WORKSPACE=production
MONGO_URI=mongodb://user:pass@prod-server:27017/
MONGO_DATABASE=LightRAG

# 2. Run tool
python tools/migrate_llm_cache.py

# 3. Select: 1 (JsonKVStorage) -> 4 (MongoKVStorage)
```

### Scenario 2: PostgreSQL Database Switch

Use case: Database migration or upgrade

```bash
# 1. Configure old and new databases
POSTGRES_WORKSPACE=old_db  # Source
# ... Configure new database as default

# 2. Run tool and select same storage type
```

### Scenario 3: Redis to PostgreSQL

Use case: Migrating from cache storage to relational database

```bash
# 1. Ensure both databases are accessible
REDIS_URI=redis://old-redis:6379
POSTGRES_HOST=new-postgres-server
# ... Other PostgreSQL configs

# 2. Run tool
python tools/migrate_llm_cache.py

# 3. Select: 2 (RedisKVStorage) -> 3 (PGKVStorage)
```

## Tool Limitations

1. **Only Default Mode Caches**
   - Only migrates `default:extract:*` and `default:summary:*`
   - Query caches are not included

2. **Workspace Isolation**
   - Different workspaces are treated as completely separate
   - Cross-workspace migration requires manual workspace reconfiguration

3. **Network Dependency**
   - The tool requires a stable network connection for remote databases
   - Large dataset migrations may fail if the connection is interrupted

## Best Practices

1. **Backup Before Migration**
   - Always back up your data before migration
   - Test the migration on non-production data first

2. **Verify Results**
   - Review the final migration report after migration
   - Manually verify a few cache entries if needed

3. **Monitor Performance**
   - Watch database resource usage during migration
   - Consider migrating in smaller batches if needed

4. **Clean Old Data**
   - After a successful migration, consider cleaning up old cache data
   - Keep backups for a reasonable period before deletion

## Support

For issues or questions:
- Check LightRAG documentation
- Review error logs for detailed information
- Ensure all environment variables are correctly configured