# LightRAG Offline Deployment Guide
This guide provides comprehensive instructions for deploying LightRAG in offline environments where internet access is limited or unavailable.
## Table of Contents
- [Overview](#overview)
- [Quick Start](#quick-start)
- [Layered Dependencies](#layered-dependencies)
- [Tiktoken Cache Management](#tiktoken-cache-management)
- [Complete Offline Deployment Workflow](#complete-offline-deployment-workflow)
- [Troubleshooting](#troubleshooting)
## Overview
LightRAG installs optional packages on demand (via `pipmaster`), depending on the file types you process and the storage and LLM backends you configure. In offline environments these on-demand installations fail, so all necessary dependencies and cache files must be installed in advance; this guide shows you how.
### What Gets Dynamically Installed?
LightRAG dynamically installs packages for:
- **Document Processing**: `docling`, `pypdf2`, `python-docx`, `python-pptx`, `openpyxl`
- **Storage Backends**: `redis`, `neo4j`, `pymilvus`, `pymongo`, `asyncpg`, `qdrant-client`
- **LLM Providers**: `openai`, `anthropic`, `ollama`, `zhipuai`, `aioboto3`, `voyageai`, `llama-index`, `lmdeploy`, `transformers`, `torch`
- **Tiktoken Models**: BPE encoding models downloaded from the OpenAI CDN
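Because these installations happen lazily, the first time a feature is used, a missing dependency may not surface until runtime. A minimal preflight sketch (the package-to-module mapping below is illustrative, not exhaustive) can confirm your chosen optional dependencies are importable before you disconnect:
```python
# preflight_check.py: verify optional dependencies before going offline.
import importlib.util

# pip package -> import name; illustrative subset, extend as needed
OPTIONAL_MODULES = {
    "python-docx": "docx",
    "python-pptx": "pptx",
    "openpyxl": "openpyxl",
    "redis": "redis",
    "neo4j": "neo4j",
    "openai": "openai",
}

for package, module in OPTIONAL_MODULES.items():
    status = "OK" if importlib.util.find_spec(module) else "MISSING"
    print(f"{package:12s} -> {module:9s} [{status}]")
```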
## Quick Start
### Option 1: Using pip with Offline Extras
```bash
# Online environment: Install all offline dependencies
pip install lightrag-hku[offline]
# Download tiktoken cache
lightrag-download-cache
# Create offline package
pip download lightrag-hku[offline] -d ./offline-packages
tar -czf lightrag-offline.tar.gz ./offline-packages ~/.tiktoken_cache
# Transfer to offline server
scp lightrag-offline.tar.gz user@offline-server:/path/to/
# Offline environment: Install
tar -xzf lightrag-offline.tar.gz
pip install --no-index --find-links=./offline-packages lightrag-hku[offline]
export TIKTOKEN_CACHE_DIR=~/.tiktoken_cache
```
### Option 2: Using Requirements Files
```bash
# Online environment: Download packages
pip download -r requirements-offline.txt -d ./packages
# Transfer to offline server
tar -czf packages.tar.gz ./packages
scp packages.tar.gz user@offline-server:/path/to/
# Offline environment: Install
tar -xzf packages.tar.gz
pip install --no-index --find-links=./packages -r requirements-offline.txt
```
## Layered Dependencies
LightRAG provides flexible dependency groups for different use cases:
### Available Dependency Groups
| Group | Description | Use Case |
|-------|-------------|----------|
| `offline-docs` | Document processing | PDF, DOCX, PPTX, XLSX files |
| `offline-storage` | Storage backends | Redis, Neo4j, MongoDB, PostgreSQL, etc. |
| `offline-llm` | LLM providers | OpenAI, Anthropic, Ollama, etc. |
| `offline` | All of the above | Complete offline deployment |
### Installation Examples
```bash
# Install only document processing dependencies
pip install lightrag-hku[offline-docs]
# Install document processing and storage backends
pip install lightrag-hku[offline-docs,offline-storage]
# Install all offline dependencies
pip install lightrag-hku[offline]
```
### Using Individual Requirements Files
```bash
# Document processing only
pip install -r requirements-offline-docs.txt
# Storage backends only
pip install -r requirements-offline-storage.txt
# LLM providers only
pip install -r requirements-offline-llm.txt
# All offline dependencies
pip install -r requirements-offline.txt
```
## Tiktoken Cache Management
Tiktoken downloads BPE encoding models on first use. In offline environments, you must pre-download these models.
### Using the CLI Command
After installing LightRAG, use the built-in command:
```bash
# Download to default location (~/.tiktoken_cache)
lightrag-download-cache
# Download to specific directory
lightrag-download-cache --cache-dir ./tiktoken_cache
# Download specific models only
lightrag-download-cache --models gpt-4o-mini gpt-4
```
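If the CLI is unavailable, you can also prime the cache directly with `tiktoken` itself (standard `tiktoken` calls; the model list is just an example). The key detail is that `TIKTOKEN_CACHE_DIR` must be set before the first encoding is requested:
```python
# prime_tiktoken.py: run in the ONLINE environment to populate a portable cache.
import os

os.environ["TIKTOKEN_CACHE_DIR"] = "./tiktoken_cache"  # must be set before first use

import tiktoken

for model in ("gpt-4o-mini", "gpt-4"):
    enc = tiktoken.encoding_for_model(model)  # downloads once, then caches
    print(f"{model}: encoding={enc.name}, vocab={enc.n_vocab}")
```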
### Default Models Downloaded
- `gpt-4o-mini` (LightRAG default)
- `gpt-4o`
- `gpt-4`
- `gpt-3.5-turbo`
- `text-embedding-ada-002`
- `text-embedding-3-small`
- `text-embedding-3-large`
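Several of these models share the same underlying BPE file (for example, `gpt-4` and `gpt-3.5-turbo` both resolve to `cl100k_base`), so the cache holds fewer files than the model list suggests. You can inspect the mapping without triggering any downloads:
```python
# Name lookup only; no files are downloaded by this check.
import tiktoken

for model in ("gpt-4o-mini", "gpt-4", "gpt-3.5-turbo", "text-embedding-3-small"):
    print(f"{model} -> {tiktoken.encoding_name_for_model(model)}")
```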
### Setting Cache Location in Offline Environment
```bash
# Option 1: Environment variable (temporary)
export TIKTOKEN_CACHE_DIR=/path/to/tiktoken_cache
# Option 2: Add to ~/.bashrc or ~/.zshrc (persistent)
echo 'export TIKTOKEN_CACHE_DIR=~/.tiktoken_cache' >> ~/.bashrc
source ~/.bashrc
# Option 3: Copy contents to the default location
mkdir -p ~/.tiktoken_cache
cp -r /path/to/tiktoken_cache/* ~/.tiktoken_cache/
```
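To confirm the transferred cache is actually being used, a quick sanity check (assuming the cache lives at `~/.tiktoken_cache`) is to load a tokenizer with the environment variable set; if the expected files are present, no network access is required:
```python
# verify_cache.py: run in the OFFLINE environment.
import os
from pathlib import Path

cache_dir = Path.home() / ".tiktoken_cache"
os.environ["TIKTOKEN_CACHE_DIR"] = str(cache_dir)
print("Cached files:", sorted(p.name for p in cache_dir.iterdir()))

import tiktoken

enc = tiktoken.encoding_for_model("gpt-4o-mini")  # should load from cache
print("Loaded encoding:", enc.name)
```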
## Complete Offline Deployment Workflow
### Step 1: Prepare in Online Environment
```bash
# 1. Install LightRAG with offline dependencies
pip install lightrag-hku[offline]
# 2. Download tiktoken cache
lightrag-download-cache --cache-dir ./offline_cache/tiktoken
# 3. Download all Python packages
pip download lightrag-hku[offline] -d ./offline_cache/packages
# 4. Create archive for transfer
tar -czf lightrag-offline-complete.tar.gz ./offline_cache
# 5. Verify contents
tar -tzf lightrag-offline-complete.tar.gz | head -20
```
### Step 2: Transfer to Offline Environment
```bash
# Using scp
scp lightrag-offline-complete.tar.gz user@offline-server:/tmp/
# Or using USB/physical media
# Copy lightrag-offline-complete.tar.gz to USB drive
```
### Step 3: Install in Offline Environment
```bash
# 1. Extract archive
cd /tmp
tar -xzf lightrag-offline-complete.tar.gz
# 2. Install Python packages
pip install --no-index \
    --find-links=/tmp/offline_cache/packages \
    lightrag-hku[offline]
# 3. Set up tiktoken cache
mkdir -p ~/.tiktoken_cache
cp -r /tmp/offline_cache/tiktoken/* ~/.tiktoken_cache/
export TIKTOKEN_CACHE_DIR=~/.tiktoken_cache
# 4. Add to shell profile for persistence
echo 'export TIKTOKEN_CACHE_DIR=~/.tiktoken_cache' >> ~/.bashrc
```
### Step 4: Verify Installation
```bash
# Test Python import
python -c "from lightrag import LightRAG; print('✓ LightRAG imported')"
# Test tiktoken
python -c "from lightrag.utils import TiktokenTokenizer; t = TiktokenTokenizer(); print('✓ Tiktoken working')"
# Test optional dependencies (if installed)
python -c "import docling; print('✓ Docling available')"
python -c "import redis; print('✓ Redis available')"
```
## Troubleshooting
### Issue: Tiktoken fails with network error
**Problem**: `Unable to load tokenizer for model gpt-4o-mini`
**Solution**:
```bash
# Ensure TIKTOKEN_CACHE_DIR is set
echo $TIKTOKEN_CACHE_DIR
# Verify cache files exist
ls -la ~/.tiktoken_cache/
# If empty, download the cache in an online environment first
```
### Issue: Dynamic package installation fails
**Problem**: `Error installing package xxx`
**Solution**:
```bash
# Pre-install the specific package you need
# For document processing:
pip install lightrag-hku[offline-docs]
# For storage backends:
pip install lightrag-hku[offline-storage]
# For LLM providers:
pip install lightrag-hku[offline-llm]
```
### Issue: Missing dependencies at runtime
**Problem**: `ModuleNotFoundError: No module named 'xxx'`
**Solution**:
```bash
# Check what you have installed
pip list | grep -i xxx
# Install missing component
pip install lightrag-hku[offline] # Install all offline deps
```
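Keep in mind that the module name in the error and the pip package name often differ (`docx` is provided by `python-docx`). On Python 3.10+, a quick way to find which installed distribution provides a module is `importlib.metadata.packages_distributions()`; a minimal sketch:
```python
# which_package.py: map an import name to the pip distribution providing it
# (Python 3.10+). Reports only distributions that are already installed.
from importlib.metadata import packages_distributions

module = "docx"  # the name from the ModuleNotFoundError
providers = packages_distributions().get(module)
print(f"'{module}' is provided by: {providers or 'no installed distribution'}")
```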
### Issue: Permission denied on tiktoken cache
**Problem**: `PermissionError: [Errno 13] Permission denied`
**Solution**:
```bash
# Ensure cache directory has correct permissions
chmod 755 ~/.tiktoken_cache
chmod 644 ~/.tiktoken_cache/*
# Or use a user-writable directory
export TIKTOKEN_CACHE_DIR=~/my_tiktoken_cache
mkdir -p ~/my_tiktoken_cache
```
## Best Practices
1. **Test in Online Environment First**: Always test your complete setup in an online environment before going offline.
2. **Keep Cache Updated**: Periodically update your offline cache when new models are released.
3. **Document Your Setup**: Keep notes on which optional dependencies you actually need.
4. **Version Pinning**: Consider pinning specific versions in production:
```bash
pip freeze > requirements-production.txt
```
5. **Minimal Installation**: Only install what you need:
```bash
# If you only process PDFs with OpenAI
pip install lightrag-hku[offline-docs]
# Then manually add: pip install openai
```
## Additional Resources
- [LightRAG GitHub Repository](https://github.com/HKUDS/LightRAG)
- [Docker Deployment Guide](./DockerDeployment.md)
- [API Documentation](../lightrag/api/README.md)
## Support
If you encounter issues not covered in this guide:
1. Check the [GitHub Issues](https://github.com/HKUDS/LightRAG/issues)
2. Review the [project documentation](../README.md)
3. Create a new issue with your offline deployment details