cherry-pick 1485cb82
parent 47c48cca91
commit 462a3bade2
2 changed files with 668 additions and 50 deletions
661
lightrag/tools/README_CLEAN_LLM_QUERY_CACHE.md
Normal file
@@ -0,0 +1,661 @@
# LLM Query Cache Cleanup Tool - User Guide

## Overview

This tool removes LightRAG's LLM query cache entries from the supported KV storage implementations. It specifically targets the caches generated during RAG query operations (modes: `mix`, `hybrid`, `local`, `global`), covering both query and keywords caches.

## Supported Storage Types

1. **JsonKVStorage** - File-based JSON storage
2. **RedisKVStorage** - Redis database storage
3. **PGKVStorage** - PostgreSQL database storage
4. **MongoKVStorage** - MongoDB database storage

## Cache Types

The tool cleans up the following query cache types:

### Query Cache Modes (4 types)
- `mix:*` - Mixed mode query caches
- `hybrid:*` - Hybrid mode query caches
- `local:*` - Local mode query caches
- `global:*` - Global mode query caches

### Cache Content Types (2 types)
- `*:query:*` - Query result caches
- `*:keywords:*` - Keywords extraction caches

### Cache Key Format
```
<mode>:<cache_type>:<hash>
```

Examples:
- `mix:query:5ce04d25e957c290216cee5bfe6344fa`
- `mix:keywords:fee77b98244a0b047ce95e21060de60e`
- `global:query:abc123def456...`
- `local:keywords:789xyz...`

**Important Note**: This tool does NOT clean extraction caches (`default:extract:*` and `default:summary:*`). Use the migration tool or manual deletion for those caches.
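
The key format above can be checked with a few lines of Python. The following is a minimal sketch, not the tool's actual code: `CacheKey`, `parse_query_cache_key`, and the two constant tuples are hypothetical names introduced here for illustration. It accepts only keys whose mode and cache type are among those listed above, so extraction keys such as `default:extract:*` are rejected.

```python
from typing import NamedTuple, Optional

# Modes and cache types as listed in this guide
QUERY_MODES = ("mix", "hybrid", "local", "global")
CACHE_TYPES = ("query", "keywords")


class CacheKey(NamedTuple):
    """Hypothetical container for the three parts of a cache key."""
    mode: str
    cache_type: str
    digest: str


def parse_query_cache_key(key: str) -> Optional[CacheKey]:
    """Return the parsed key if it is a query cache key, otherwise None."""
    parts = key.split(":", 2)  # <mode>:<cache_type>:<hash>
    if len(parts) != 3:
        return None
    mode, cache_type, digest = parts
    if mode in QUERY_MODES and cache_type in CACHE_TYPES and digest:
        return CacheKey(mode, cache_type, digest)
    return None  # e.g. default:extract:* is not a query cache key


if __name__ == "__main__":
    print(parse_query_cache_key("mix:query:5ce04d25e957c290216cee5bfe6344fa"))
    print(parse_query_cache_key("default:extract:abc"))
```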

## Prerequisites

- The tool reads storage configuration from environment variables or `config.ini`
- Ensure the target storage is properly configured and accessible
- Back up important data before running cleanup operations

## Usage

### Basic Usage

Run from the LightRAG project root directory:

```bash
python -m lightrag.tools.clean_llm_query_cache
# or
python lightrag/tools/clean_llm_query_cache.py
```

### Interactive Workflow

The tool guides you through the following steps:

#### 1. Select Storage Type
```
============================================================
LLM Query Cache Cleanup Tool - LightRAG
============================================================

=== Storage Setup ===

Supported KV Storage Types:
[1] JsonKVStorage
[2] RedisKVStorage
[3] PGKVStorage
[4] MongoKVStorage

Select storage type (1-4) (Press Enter to exit): 1
```

**Note**: You can press Enter or type `0` at any prompt to exit gracefully.

#### 2. Storage Validation
The tool will:
- Check required environment variables
- Auto-detect workspace configuration
- Initialize and connect to storage
- Verify connection status

```
Checking configuration...
✓ All required environment variables are set

Initializing storage...
- Storage Type: JsonKVStorage
- Workspace: space1
- Connection Status: ✓ Success
```

#### 3. View Cache Statistics

The tool displays a detailed breakdown of query caches by mode and type:

```
Counting query cache records...

📊 Query Cache Statistics (Before Cleanup):
┌────────────┬────────────┬────────────┬────────────┐
│ Mode       │ Query      │ Keywords   │ Total      │
├────────────┼────────────┼────────────┼────────────┤
│ mix        │      1,234 │        567 │      1,801 │
│ hybrid     │        890 │        423 │      1,313 │
│ local      │      2,345 │      1,123 │      3,468 │
│ global     │        678 │        345 │      1,023 │
├────────────┼────────────┼────────────┼────────────┤
│ Total      │      5,147 │      2,458 │      7,605 │
└────────────┴────────────┴────────────┴────────────┘
```
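
A breakdown like the one above can be produced by tallying keys per mode and cache type. This is a rough sketch under stated assumptions rather than the tool's implementation: `count_by_mode` and `render_rows` are hypothetical helpers, and the row layout (10-character cells, numbers right-aligned with thousands separators) is an assumption modeled on the sample table.

```python
QUERY_MODES = ("mix", "hybrid", "local", "global")


def count_by_mode(keys):
    """Tally query and keywords cache keys for each query mode."""
    counts = {mode: {"query": 0, "keywords": 0} for mode in QUERY_MODES}
    for key in keys:
        mode, _, rest = key.partition(":")
        cache_type, _, _ = rest.partition(":")
        # Keys with other modes or types (e.g. default:extract:*) are ignored
        if mode in counts and cache_type in counts[mode]:
            counts[mode][cache_type] += 1
    return counts


def render_rows(counts):
    """Format one table row per mode (assumed layout: 10-character cells)."""
    rows = []
    for mode, c in counts.items():
        total = c["query"] + c["keywords"]
        rows.append(
            f"│ {mode:<10} │ {c['query']:>10,} │ {c['keywords']:>10,} │ {total:>10,} │"
        )
    return rows


if __name__ == "__main__":
    sample = ["mix:query:a", "mix:keywords:b", "local:query:c", "default:extract:d"]
    for row in render_rows(count_by_mode(sample)):
        print(row)
```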

#### 4. Select Cleanup Scope

Choose what type of caches to delete:

```
=== Cleanup Options ===
[1] Delete all query caches (both query and keywords)
[2] Delete query caches only (keep keywords)
[3] Delete keywords caches only (keep query)
[0] Cancel

Select cleanup option (0-3): 1
```

**Cleanup Types:**
- **Option 1 (all)**: Deletes both query and keywords caches across all modes
- **Option 2 (query)**: Deletes only query caches, preserves keywords caches
- **Option 3 (keywords)**: Deletes only keywords caches, preserves query caches
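
One way to think about the three options is as a mapping from cleanup type to the set of key prefixes that will be deleted. The sketch below is illustrative only; `CLEANUP_CACHE_TYPES` and `prefixes_for` are hypothetical names, and the real tool may organize this differently. Note that the `all` option yields eight prefixes (four modes times two cache types).

```python
from itertools import product

QUERY_MODES = ("mix", "hybrid", "local", "global")

# Cleanup type -> cache types it covers (names taken from the options above)
CLEANUP_CACHE_TYPES = {
    "all": ("query", "keywords"),
    "query": ("query",),
    "keywords": ("keywords",),
}


def prefixes_for(cleanup_type: str) -> list[str]:
    """Return every '<mode>:<cache_type>:' prefix covered by a cleanup type."""
    cache_types = CLEANUP_CACHE_TYPES[cleanup_type]
    return [f"{mode}:{ct}:" for mode, ct in product(QUERY_MODES, cache_types)]


if __name__ == "__main__":
    for option in ("all", "query", "keywords"):
        print(option, prefixes_for(option))
```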

#### 5. Confirm Deletion

Review the cleanup plan and confirm:

```
============================================================
Cleanup Confirmation
============================================================
Storage: JsonKVStorage (workspace: space1)
Cleanup Type: all
Records to Delete: 7,605 / 7,605

⚠️ WARNING: This will delete ALL query caches across all modes!

Continue with deletion? (y/n): y
```

#### 6. Execute Cleanup

The tool deletes the selected records and reports progress as it goes. The output depends on the storage backend:

**JsonKVStorage Example:**
```
=== Starting Cleanup ===
💡 Processing 1,000 records at a time from JsonKVStorage

Batch 1/8: ████░░░░░░░░░░░░░░░░ 1,000/7,605 (13.1%) ✓
Batch 2/8: ████████░░░░░░░░░░░░ 2,000/7,605 (26.3%) ✓
...
Batch 8/8: ████████████████████ 7,605/7,605 (100.0%) ✓

Persisting changes to storage...
✓ Changes persisted successfully
```

**RedisKVStorage Example:**
```
=== Starting Cleanup ===
💡 Processing Redis keys in batches of 1,000

Batch 1: Deleted 1,000 keys (Total: 1,000) ✓
Batch 2: Deleted 1,000 keys (Total: 2,000) ✓
...
```

**PostgreSQL Example:**
```
=== Starting Cleanup ===
💡 Executing PostgreSQL DELETE query

✓ Deleted 7,605 records in 0.45s
```

**MongoDB Example:**
```
=== Starting Cleanup ===
💡 Executing MongoDB deleteMany operations

Pattern 1/8: Deleted 1,234 records ✓
Pattern 2/8: Deleted 567 records ✓
...
Total deleted: 7,605 records
```

#### 7. Review Cleanup Report

The tool provides a comprehensive final report:

**Successful Cleanup:**
```
============================================================
Cleanup Complete - Final Report
============================================================

📊 Statistics:
Total records to delete: 7,605
Total batches: 8
Successful batches: 8
Failed batches: 0
Successfully deleted: 7,605
Failed to delete: 0
Success rate: 100.00%

📈 Before/After Comparison:
Total caches before: 7,605
Total caches after: 0
Net reduction: 7,605

============================================================
✓ SUCCESS: All records cleaned up successfully!
============================================================

📊 Query Cache Statistics (After Cleanup):
┌────────────┬────────────┬────────────┬────────────┐
│ Mode       │ Query      │ Keywords   │ Total      │
├────────────┼────────────┼────────────┼────────────┤
│ mix        │          0 │          0 │          0 │
│ hybrid     │          0 │          0 │          0 │
│ local      │          0 │          0 │          0 │
│ global     │          0 │          0 │          0 │
├────────────┼────────────┼────────────┼────────────┤
│ Total      │          0 │          0 │          0 │
└────────────┴────────────┴────────────┴────────────┘
```

**Cleanup with Errors:**
```
============================================================
Cleanup Complete - Final Report
============================================================

📊 Statistics:
Total records to delete: 7,605
Total batches: 8
Successful batches: 7
Failed batches: 1
Successfully deleted: 6,605
Failed to delete: 1,000
Success rate: 86.85%

📈 Before/After Comparison:
Total caches before: 7,605
Total caches after: 1,000
Net reduction: 6,605

⚠️ Errors encountered: 1

Error Details:
------------------------------------------------------------

Error Summary:
- ConnectionError: 1 occurrence(s)

First 5 errors:

1. Batch 3
Type: ConnectionError
Message: Connection timeout after 30s
Records lost: 1,000

============================================================
⚠️ WARNING: Cleanup completed with errors!
Please review the error details above.
============================================================
```

## Technical Details

### Workspace Handling

The tool resolves the workspace in the following priority order:

1. **Storage-specific workspace environment variables**
   - PGKVStorage: `POSTGRES_WORKSPACE`
   - MongoKVStorage: `MONGODB_WORKSPACE`
   - RedisKVStorage: `REDIS_WORKSPACE`

2. **Generic workspace environment variable**
   - `WORKSPACE`

3. **Default value**
   - Empty string (uses storage's default workspace)
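
The priority order above can be sketched as a small lookup. This is an illustration, not the tool's code: `resolve_workspace` and `STORAGE_WORKSPACE_ENV` are hypothetical names, and treating an empty storage-specific variable as unset is an assumption made here.

```python
import os
from typing import Mapping, Optional

# Storage-specific workspace variables named in this guide
STORAGE_WORKSPACE_ENV = {
    "PGKVStorage": "POSTGRES_WORKSPACE",
    "MongoKVStorage": "MONGODB_WORKSPACE",
    "RedisKVStorage": "REDIS_WORKSPACE",
}


def resolve_workspace(storage_name: str,
                      env: Optional[Mapping[str, str]] = None) -> str:
    """Storage-specific variable first, then WORKSPACE, then empty string."""
    env = os.environ if env is None else env
    specific_var = STORAGE_WORKSPACE_ENV.get(storage_name)
    # Assumption: an empty storage-specific value counts as "not set"
    if specific_var and env.get(specific_var):
        return env[specific_var]
    return env.get("WORKSPACE", "")


if __name__ == "__main__":
    demo_env = {"WORKSPACE": "space1", "POSTGRES_WORKSPACE": "pg_space"}
    print(resolve_workspace("PGKVStorage", demo_env))
    print(resolve_workspace("RedisKVStorage", demo_env))
```

JsonKVStorage has no storage-specific variable in the list above, so in this sketch it falls through to `WORKSPACE`.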

### Batch Deletion

- Default batch size: 1,000 records per batch
- Batching limits memory use and reduces the risk of connection timeouts
- Each batch is processed independently
- Failed batches are logged but don't stop cleanup
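
The batch behavior described above can be sketched as a loop that isolates each batch in its own `try` block, so one failure does not abort the rest. This is a simplified, synchronous illustration (the actual tool is asynchronous); `BatchStats`, `delete_in_batches`, and the `delete_batch` callback are hypothetical names.

```python
import math
from dataclasses import dataclass, field
from typing import Callable, List, Sequence, Tuple


@dataclass
class BatchStats:
    """Hypothetical counters mirroring the fields of the final report."""
    total_batches: int = 0
    successful_batches: int = 0
    failed_batches: int = 0
    deleted: int = 0
    failed: int = 0
    errors: List[Tuple[int, str, str]] = field(default_factory=list)

    @property
    def success_rate(self) -> float:
        total = self.deleted + self.failed
        return 100.0 * self.deleted / total if total else 100.0


def delete_in_batches(keys: Sequence[str],
                      delete_batch: Callable[[Sequence[str]], None],
                      batch_size: int = 1000) -> BatchStats:
    """Delete keys batch by batch; a failed batch is recorded, not fatal."""
    stats = BatchStats(total_batches=math.ceil(len(keys) / batch_size))
    for start in range(0, len(keys), batch_size):
        batch = keys[start:start + batch_size]
        batch_no = start // batch_size + 1
        try:
            delete_batch(batch)
        except Exception as exc:  # record the failure and keep going
            stats.failed_batches += 1
            stats.failed += len(batch)
            stats.errors.append((batch_no, type(exc).__name__, str(exc)))
        else:
            stats.successful_batches += 1
            stats.deleted += len(batch)
    return stats
```

With 7,605 keys and one failed batch of 1,000, this sketch reproduces the figures in the sample error report: 8 batches, 6,605 deleted, and a success rate of 6,605 / 7,605, which rounds to 86.85%.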

### Storage-Specific Deletion Strategies

#### JsonKVStorage
- Collects all matching keys first (snapshot approach)
- Deletes in batches with lock protection
- Fast in-memory operations

#### RedisKVStorage
- Uses SCAN with pattern matching
- Pipeline DELETE for batch operations
- Cursor-based iteration for large datasets

#### PostgreSQL
- Single DELETE query with OR conditions
- Efficient server-side bulk deletion
- Uses LIKE patterns for mode/type matching

#### MongoDB
- Multiple deleteMany operations (one per pattern)
- Regex-based document matching
- Returns exact deletion counts

### Pattern Matching Implementation

**JsonKVStorage:**
```python
# Direct key prefix matching (str.startswith accepts a tuple of prefixes);
# keys_to_delete is an illustrative name for the collected snapshot
if key.startswith(("mix:query:", "mix:keywords:")):
    keys_to_delete.append(key)
```

**RedisKVStorage:**
```python
# SCAN with namespace-prefixed patterns
pattern = f"{namespace}:mix:query:*"
cursor, keys = await redis.scan(cursor, match=pattern)
```

**PostgreSQL:**
```sql
-- SQL LIKE conditions
WHERE id LIKE 'mix:query:%' OR id LIKE 'mix:keywords:%'
```

**MongoDB:**
```python
# Regex queries on _id field
{"_id": {"$regex": "^mix:query:"}}
```

## Error Handling & Resilience

The tool tracks errors per batch and summarizes them in the final report:

### Batch-Level Error Tracking
- Each batch is independently error-checked
- Failed batches are logged with full details
- Successful batches commit even if later batches fail
- Real-time progress shows ✓ (success) or ✗ (failed)

### Error Reporting
After cleanup completes, a detailed report includes:
- **Statistics**: Total records, success/failure counts, success rate
- **Before/After Comparison**: Net reduction in cache count
- **Error Summary**: Grouped by error type with occurrence counts
- **Error Details**: Batch number, error type, message, and records lost
- **Recommendations**: Clear indication of success or need for review
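
Grouping errors by type, as in the Error Summary, is a natural fit for `collections.Counter`. The following sketch assumes each recorded error is a dict with a `type` key; that record shape and the `summarize_errors` name are assumptions made for illustration, not the tool's internal structure.

```python
from collections import Counter
from typing import Dict, List


def summarize_errors(errors: List[Dict[str, object]]) -> List[str]:
    """Build 'Error Summary' lines grouped by error type, most frequent first."""
    by_type = Counter(str(err["type"]) for err in errors)
    lines = ["Error Summary:"]
    lines += [f"- {name}: {count} occurrence(s)"
              for name, count in by_type.most_common()]
    return lines


if __name__ == "__main__":
    recorded = [
        {"batch": 3, "type": "ConnectionError",
         "message": "Connection timeout after 30s", "records_lost": 1000},
    ]
    print("\n".join(summarize_errors(recorded)))
```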

### Verification
- Post-cleanup count verification
- Before/after statistics comparison
- Identifies partial cleanup scenarios

## Important Notes

1. **Irreversible Operation**
   - Deleted caches cannot be recovered
   - Always back up important data before cleanup
   - Test on non-production data first

2. **Performance Impact**
   - Query performance may degrade temporarily after cleanup
   - Caches will rebuild on subsequent queries
   - Consider cleanup during off-peak hours

3. **Selective Cleanup**
   - Choose cleanup scope carefully
   - Keywords caches may be valuable for future queries
   - Query caches rebuild faster than keywords caches

4. **Workspace Isolation**
   - Cleanup only affects the selected workspace
   - Other workspaces remain untouched
   - Verify workspace before confirming cleanup

5. **Interrupt and Resume**
   - Cleanup can be interrupted at any time (Ctrl+C)
   - Already deleted records cannot be recovered
   - No automatic resume - must run tool again

## Storage Configuration

The tool supports multiple configuration methods with the following priority:

1. **Environment variables** (highest priority)
2. **config.ini file** (medium priority)
3. **Default values** (lowest priority)
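
The three-level priority can be sketched with the standard library's `configparser`. This is a hedged illustration of the lookup order only; `get_setting` is a hypothetical helper, and the section and option names mirror the `config.ini` layout described in this guide.

```python
import configparser
import os
from typing import Mapping, Optional


def get_setting(env_var: str, section: str, option: str, default: str,
                config: configparser.ConfigParser,
                env: Optional[Mapping[str, str]] = None) -> str:
    """Environment variable first, then config.ini, then the default value."""
    env = os.environ if env is None else env
    if env_var in env:
        return env[env_var]
    # fallback covers both a missing section and a missing option
    return config.get(section, option, fallback=default)


if __name__ == "__main__":
    config = configparser.ConfigParser()
    config.read_string("[redis]\nuri = redis://from-config:6379\n")
    print(get_setting("REDIS_URI", "redis", "uri",
                      "redis://localhost:6379", config, env={}))
```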

### Environment Variable Configuration

Configure storage settings in your `.env` file:

#### Workspace Configuration (Optional)

```bash
# Generic workspace (shared by all storages)
WORKSPACE=space1

# Or configure independent workspace for specific storage
POSTGRES_WORKSPACE=pg_space
MONGODB_WORKSPACE=mongo_space
REDIS_WORKSPACE=redis_space
```

**Workspace Priority**: Storage-specific > Generic WORKSPACE > Empty string

#### JsonKVStorage

```bash
WORKING_DIR=./rag_storage
```

#### RedisKVStorage

```bash
REDIS_URI=redis://localhost:6379
```

#### PGKVStorage

```bash
POSTGRES_HOST=localhost
POSTGRES_PORT=5432
POSTGRES_USER=your_username
POSTGRES_PASSWORD=your_password
POSTGRES_DATABASE=your_database
```

#### MongoKVStorage

```bash
MONGO_URI=mongodb://root:root@localhost:27017/
MONGO_DATABASE=LightRAG
```

### config.ini Configuration

Alternatively, create a `config.ini` file in the project root:

```ini
[redis]
uri = redis://localhost:6379

[postgres]
host = localhost
port = 5432
user = postgres
password = yourpassword
database = lightrag

[mongodb]
uri = mongodb://root:root@localhost:27017/
database = LightRAG
```

**Note**: Environment variables take precedence over config.ini settings.

## Troubleshooting

### Missing Environment Variables
```
⚠️ Warning: Missing environment variables: POSTGRES_USER, POSTGRES_PASSWORD
```
**Solution**: Add missing variables to your `.env` file or configure in `config.ini`

### Connection Failed
```
✗ Initialization failed: Connection refused
```
**Solutions**:
- Check if database service is running
- Verify connection parameters (host, port, credentials)
- Check firewall settings
- Ensure network connectivity for remote databases

### No Caches Found
```
⚠️ No query caches found in storage
```
**Possible Reasons**:
- No queries have been run yet
- Caches were already cleaned
- Wrong workspace selected
- Different storage type was used for queries

### Partial Cleanup
```
⚠️ WARNING: Cleanup completed with errors!
```
**Solutions**:
- Check error details in the report
- Verify storage connection stability
- Re-run tool to clean remaining caches
- Check storage capacity and permissions

## Use Cases

### Use Case 1: Clean All Query Caches

**Scenario**: Free up storage space by removing all query caches

```bash
# Run tool
python -m lightrag.tools.clean_llm_query_cache

# Select: Storage type -> Option 1 (all) -> Confirm (y)
```

**Result**: All query and keywords caches deleted, maximum storage freed

### Use Case 2: Refresh Query Caches Only

**Scenario**: Force query cache rebuild while keeping keywords

```bash
# Run tool
python -m lightrag.tools.clean_llm_query_cache

# Select: Storage type -> Option 2 (query only) -> Confirm (y)
```

**Result**: Query caches deleted, keywords preserved for faster rebuild

### Use Case 3: Clean Stale Keywords

**Scenario**: Remove outdated keywords while keeping recent query results

```bash
# Run tool
python -m lightrag.tools.clean_llm_query_cache

# Select: Storage type -> Option 3 (keywords only) -> Confirm (y)
```

**Result**: Keywords deleted, query caches preserved

### Use Case 4: Workspace-Specific Cleanup

**Scenario**: Clean caches for a specific workspace

```bash
# Configure workspace
export WORKSPACE=development

# Run tool
python -m lightrag.tools.clean_llm_query_cache

# Select: Storage type -> Cleanup option -> Confirm (y)
```

**Result**: Only development workspace caches cleaned

## Best Practices

1. **Back Up Before Cleanup**
   - Always back up your storage before major cleanup
   - Test cleanup on non-production data first
   - Document cleanup decisions

2. **Monitor Performance**
   - Watch storage metrics during cleanup
   - Monitor query performance after cleanup
   - Allow time for cache rebuild

3. **Scheduled Cleanup**
   - Clean caches periodically (weekly/monthly)
   - Automate cleanup for development environments
   - Keep production cleanup manual for safety

4. **Selective Deletion**
   - Consider cleanup scope based on needs
   - Keywords caches are harder to rebuild
   - Query caches rebuild automatically

5. **Storage Capacity**
   - Monitor storage usage trends
   - Clean caches before reaching capacity limits
   - Archive old data if needed

## Comparison with Migration Tool

| Feature | Cleanup Tool | Migration Tool |
|---------|-------------|----------------|
| **Purpose** | Delete query caches | Migrate extraction caches |
| **Modes** | mix/hybrid/local/global | default |
| **Cache Types** | query, keywords | extract, summary |
| **Operation** | Deletion | Copy between storages |
| **Reversible** | No | Yes (source unchanged) |
| **Use Case** | Free storage, refresh caches | Change storage backend |

## Limitations

1. **Single Storage Operation**
   - Can only clean one storage type at a time
   - To clean multiple storages, run tool multiple times

2. **No Dry Run Mode**
   - Deletion is immediate after confirmation
   - No preview-only mode available
   - Test on non-production first

3. **No Selective Mode Cleanup**
   - Cannot clean only specific modes (e.g., only `mix`)
   - Cleanup applies to all modes for selected cache type
   - All-or-nothing per cache type

4. **No Scheduled Cleanup**
   - Manual execution required
   - No built-in scheduling
   - Use cron/scheduler if automation needed

5. **Verification Limitations**
   - Post-cleanup verification may fail in error scenarios
   - Manual verification recommended for critical operations

## Future Enhancements

Potential improvements for future versions:

- Selective mode cleanup (e.g., clean only `mix` mode)
- Age-based cleanup (delete caches older than X days)
- Size-based cleanup (delete largest caches first)
- Dry run mode for safe preview
- Automated scheduling support
- Cache statistics export
- Incremental cleanup with pause/resume

## Support

For issues, questions, or feature requests:
- Check the error details in the cleanup report
- Review storage configuration
- Verify workspace settings
- Test with a small dataset first
- Report bugs through the project issue tracker
@@ -463,7 +463,7 @@ class CleanupTool:
 
 # CRITICAL: Set update flag so changes persist to disk
 # Without this, deletions remain in-memory only and are lost on exit
-await set_all_update_flags(storage.final_namespace, storage.workspace)
+await set_all_update_flags(storage.final_namespace)
 
 # Success
 stats.successful_batches += 1
@@ -719,7 +719,7 @@ class CleanupTool:
 """
 print(f"\n{title}")
 print("┌" + "─" * 12 + "┬" + "─" * 12 + "┬" + "─" * 12 + "┬" + "─" * 12 + "┐")
-print(f"│ {'Mode':<10} │ {'Query':>10} │ {'Keywords':>10} │ {'Total':>10} │")
+print(f"│ {'Mode':<10} │ {'Query':<10} │ {'Keywords':<10} │ {'Total':<10} │")
 print("├" + "─" * 12 + "┼" + "─" * 12 + "┼" + "─" * 12 + "┼" + "─" * 12 + "┤")
 
 total_query = 0
@@ -873,31 +873,6 @@ class CleanupTool:
 
 storage_name = STORAGE_TYPES[choice]
 
-# Special warning for JsonKVStorage about concurrent access
-if storage_name == "JsonKVStorage":
-    print("\n" + "=" * 60)
-    print(f"{BOLD_RED}⚠️ IMPORTANT WARNING - JsonKVStorage Concurrency{RESET}")
-    print("=" * 60)
-    print("\nJsonKVStorage is an in-memory database that does NOT support")
-    print("concurrent access to the same file by multiple programs.")
-    print("\nBefore proceeding, please ensure that:")
-    print(" • LightRAG Server is completely shut down")
-    print(" • No other programs are accessing the storage files")
-    print("\n" + "=" * 60)
-
-    confirm = (
-        input("\nHas LightRAG Server been shut down? (yes/no): ")
-        .strip()
-        .lower()
-    )
-    if confirm != "yes":
-        print(
-            "\n✓ Operation cancelled - Please shut down LightRAG Server first"
-        )
-        return None, None, None
-
-    print("✓ Proceeding with JsonKVStorage cleanup...")
-
 # Check configuration (warnings only, doesn't block)
 print("\nChecking configuration...")
 self.check_env_vars(storage_name)
@@ -1006,36 +981,18 @@ class CleanupTool:
         return
     elif choice == "1":
         cleanup_type = "all"
+        break
     elif choice == "2":
         cleanup_type = "query"
+        break
     elif choice == "3":
         cleanup_type = "keywords"
+        break
     else:
         print("✗ Invalid choice. Please enter 0, 1, 2, or 3")
-        continue
 
-    # Calculate total to delete for the selected type
-    stats.total_to_delete = self.calculate_total_to_delete(
-        counts, cleanup_type
-    )
-
-    # Check if there are any records to delete
-    if stats.total_to_delete == 0:
-        if cleanup_type == "all":
-            print(f"\n{BOLD_RED}⚠️ No query caches found to delete!{RESET}")
-        elif cleanup_type == "query":
-            print(
-                f"\n{BOLD_RED}⚠️ No query caches found to delete! (Only keywords exist){RESET}"
-            )
-        elif cleanup_type == "keywords":
-            print(
-                f"\n{BOLD_RED}⚠️ No keywords caches found to delete! (Only query caches exist){RESET}"
-            )
-        print(" Please select a different cleanup option.\n")
-        continue
-
-    # Valid selection with records to delete
-    break
+# Calculate total to delete
+stats.total_to_delete = self.calculate_total_to_delete(counts, cleanup_type)
 
 # Confirm deletion
 print("\n" + "=" * 60)