# LightRAG Offline Deployment Guide

This guide provides comprehensive instructions for deploying LightRAG in offline environments where internet access is limited or unavailable.

## Table of Contents

- [Overview](#overview)
- [Quick Start](#quick-start)
- [Layered Dependencies](#layered-dependencies)
- [Tiktoken Cache Management](#tiktoken-cache-management)
- [Complete Offline Deployment Workflow](#complete-offline-deployment-workflow)
- [Troubleshooting](#troubleshooting)
- [Best Practices](#best-practices)
- [Additional Resources](#additional-resources)
- [Support](#support)

## Overview

LightRAG uses dynamic package installation (via `pipmaster`) for optional features based on file types and configuration. In offline environments, these dynamic installations will fail. This guide shows how to pre-install all necessary dependencies and cache files.

### What Gets Dynamically Installed?

LightRAG dynamically installs packages for:

- **Document Processing**: `docling`, `pypdf2`, `python-docx`, `python-pptx`, `openpyxl`
- **Storage Backends**: `redis`, `neo4j`, `pymilvus`, `pymongo`, `asyncpg`, `qdrant-client`
- **LLM Providers**: `openai`, `anthropic`, `ollama`, `zhipuai`, `aioboto3`, `voyageai`, `llama-index`, `lmdeploy`, `transformers`, `torch`
- **Tiktoken Models**: BPE encoding models downloaded from the OpenAI CDN

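
Before moving to the offline host, it can help to confirm which of these optional modules are actually importable in your prepared environment. A minimal pre-flight sketch (the module list is an example; substitute the ones your deployment uses):

```bash
# Report which optional modules are importable; adjust the list as needed
for mod in docling redis neo4j openai tiktoken; do
    if python -c "import ${mod}" 2>/dev/null; then
        echo "OK      ${mod}"
    else
        echo "MISSING ${mod}"
    fi
done
```

Run the same loop on the offline host after installation; any `MISSING` line points at a dependency group you still need.
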
## Quick Start

### Option 1: Using pip with Offline Extras

```bash
# Online environment: Install all offline dependencies
pip install "lightrag-hku[offline]"

# Download tiktoken cache
lightrag-download-cache

# Create offline package
pip download "lightrag-hku[offline]" -d ./offline-packages
tar -czf lightrag-offline.tar.gz ./offline-packages ~/.tiktoken_cache

# Transfer to offline server
scp lightrag-offline.tar.gz user@offline-server:/path/to/

# Offline environment: Install
tar -xzf lightrag-offline.tar.gz
pip install --no-index --find-links=./offline-packages "lightrag-hku[offline]"
export TIKTOKEN_CACHE_DIR=~/.tiktoken_cache
```

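
As a quick sanity check before transferring, you can count what landed in the archive. A sketch (the `grep` patterns are simple heuristics, not an exhaustive verification):

```bash
# Count downloaded distributions and confirm the tiktoken cache is included
tar -tzf lightrag-offline.tar.gz | grep -cE '\.(whl|tar\.gz)$' || true
tar -tzf lightrag-offline.tar.gz | grep -c 'tiktoken_cache' || true
```

Both counts should be non-zero; a zero in the second line means the cache directory was missed when the archive was created.
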
### Option 2: Using Requirements Files

```bash
# Online environment: Download packages
pip download -r requirements-offline.txt -d ./packages

# Transfer to offline server
tar -czf packages.tar.gz ./packages
scp packages.tar.gz user@offline-server:/path/to/

# Offline environment: Install
tar -xzf packages.tar.gz
pip install --no-index --find-links=./packages -r requirements-offline.txt
```

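
Whichever option you use, a checksum makes it easy to confirm the archive survived the transfer intact. This is general practice rather than anything LightRAG-specific:

```bash
# Online environment: record a checksum next to the archive
sha256sum packages.tar.gz > packages.tar.gz.sha256

# Offline environment: verify after copying both files over
sha256sum -c packages.tar.gz.sha256
```

A successful check prints `packages.tar.gz: OK`; anything else means the copy should be redone.
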
## Layered Dependencies

LightRAG provides flexible dependency groups for different use cases:

### Available Dependency Groups

| Group | Description | Use Case |
|-------|-------------|----------|
| `offline-docs` | Document processing | PDF, DOCX, PPTX, XLSX files |
| `offline-storage` | Storage backends | Redis, Neo4j, MongoDB, PostgreSQL, etc. |
| `offline-llm` | LLM providers | OpenAI, Anthropic, Ollama, etc. |
| `offline` | All of the above | Complete offline deployment |

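
If you are unsure which groups a given release actually exposes, the declared extras can be read from the installed distribution's metadata. A sketch, assuming `lightrag-hku` is already installed:

```bash
# Print the extras declared by the installed lightrag-hku distribution
python -c "from importlib.metadata import metadata; print(metadata('lightrag-hku').get_all('Provides-Extra'))" 2>/dev/null || echo "lightrag-hku is not installed"
```
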
### Installation Examples

```bash
# Install only document processing dependencies
pip install "lightrag-hku[offline-docs]"

# Install document processing and storage backends
pip install "lightrag-hku[offline-docs,offline-storage]"

# Install all offline dependencies
pip install "lightrag-hku[offline]"
```

### Using Individual Requirements Files

```bash
# Document processing only
pip install -r requirements-offline-docs.txt

# Storage backends only
pip install -r requirements-offline-storage.txt

# LLM providers only
pip install -r requirements-offline-llm.txt

# All offline dependencies
pip install -r requirements-offline.txt
```

## Tiktoken Cache Management

Tiktoken downloads BPE encoding models on first use. In offline environments, you must pre-download these models.

### Using the CLI Command

After installing LightRAG, use the built-in command:

```bash
# Download to default location (~/.tiktoken_cache)
lightrag-download-cache

# Download to specific directory
lightrag-download-cache --cache-dir ./tiktoken_cache

# Download specific models only
lightrag-download-cache --models gpt-4o-mini gpt-4
```

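
For debugging, it helps to know that tiktoken names each cached BPE file after the SHA-1 hash of its download URL. A sketch for the `cl100k_base` encoding (the URL shown is the one tiktoken uses at the time of writing; confirm against your tiktoken version):

```bash
# Compute the filename tiktoken will look for in its cache directory
url="https://openaipublic.blob.core.windows.net/encodings/cl100k_base.tiktoken"
fname=$(printf '%s' "$url" | sha1sum | awk '{print $1}')

# Present and non-empty means the encoding is usable offline
ls -l "${TIKTOKEN_CACHE_DIR:-$HOME/.tiktoken_cache}/$fname" 2>/dev/null \
    || echo "cache file $fname not found"
```
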
### Default Models Downloaded

- `gpt-4o-mini` (LightRAG default)
- `gpt-4o`
- `gpt-4`
- `gpt-3.5-turbo`
- `text-embedding-ada-002`
- `text-embedding-3-small`
- `text-embedding-3-large`

### Setting Cache Location in Offline Environment

```bash
# Option 1: Environment variable (temporary)
export TIKTOKEN_CACHE_DIR=/path/to/tiktoken_cache

# Option 2: Add to ~/.bashrc or ~/.zshrc (persistent)
echo 'export TIKTOKEN_CACHE_DIR=~/.tiktoken_cache' >> ~/.bashrc
source ~/.bashrc

# Option 3: Copy contents to the default location
mkdir -p ~/.tiktoken_cache
cp -r /path/to/tiktoken_cache/* ~/.tiktoken_cache/
```

## Complete Offline Deployment Workflow

### Step 1: Prepare in Online Environment

```bash
# 1. Install LightRAG with offline dependencies
pip install "lightrag-hku[offline]"

# 2. Download tiktoken cache
lightrag-download-cache --cache-dir ./offline_cache/tiktoken

# 3. Download all Python packages
pip download "lightrag-hku[offline]" -d ./offline_cache/packages

# 4. Create archive for transfer
tar -czf lightrag-offline-complete.tar.gz ./offline_cache

# 5. Verify contents
tar -tzf lightrag-offline-complete.tar.gz | head -20
```
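
Before transferring, a throwaway virtual environment can prove that the download set resolves with the network disabled. A sketch under the directory layout used above:

```bash
# Dry run: install from the local package directory only (--no-index)
python -m venv /tmp/offline_test
/tmp/offline_test/bin/pip install --no-index \
    --find-links=./offline_cache/packages "lightrag-hku[offline]" \
    && echo "package set is complete" \
    || echo "resolution failed; check ./offline_cache/packages"
rm -rf /tmp/offline_test
```

If this succeeds on the online machine, the same `--no-index` install will succeed offline.
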

### Step 2: Transfer to Offline Environment

```bash
# Using scp
scp lightrag-offline-complete.tar.gz user@offline-server:/tmp/

# Or using USB/physical media:
# copy lightrag-offline-complete.tar.gz to the drive
```

### Step 3: Install in Offline Environment

```bash
# 1. Extract archive
cd /tmp
tar -xzf lightrag-offline-complete.tar.gz

# 2. Install Python packages
pip install --no-index \
    --find-links=/tmp/offline_cache/packages \
    "lightrag-hku[offline]"

# 3. Set up tiktoken cache
mkdir -p ~/.tiktoken_cache
cp -r /tmp/offline_cache/tiktoken/* ~/.tiktoken_cache/
export TIKTOKEN_CACHE_DIR=~/.tiktoken_cache

# 4. Add to shell profile for persistence
echo 'export TIKTOKEN_CACHE_DIR=~/.tiktoken_cache' >> ~/.bashrc
```

### Step 4: Verify Installation

```bash
# Test Python import
python -c "from lightrag import LightRAG; print('✓ LightRAG imported')"

# Test tiktoken
python -c "from lightrag.utils import TiktokenTokenizer; t = TiktokenTokenizer(); print('✓ Tiktoken working')"

# Test optional dependencies (if installed)
python -c "import docling; print('✓ Docling available')"
python -c "import redis; print('✓ Redis available')"
```

## Troubleshooting

### Issue: Tiktoken fails with a network error

**Problem**: `Unable to load tokenizer for model gpt-4o-mini`

**Solution**:

```bash
# Ensure TIKTOKEN_CACHE_DIR is set
echo $TIKTOKEN_CACHE_DIR

# Verify cache files exist
ls -la ~/.tiktoken_cache/

# If empty, download the cache in an online environment first
```

### Issue: Dynamic package installation fails

**Problem**: `Error installing package xxx`

**Solution**:

```bash
# Pre-install the specific packages you need

# For document processing:
pip install "lightrag-hku[offline-docs]"

# For storage backends:
pip install "lightrag-hku[offline-storage]"

# For LLM providers:
pip install "lightrag-hku[offline-llm]"
```

### Issue: Missing dependencies at runtime

**Problem**: `ModuleNotFoundError: No module named 'xxx'`

**Solution**:

```bash
# Check what you have installed
pip list | grep -i xxx

# Install the missing component
pip install "lightrag-hku[offline]"  # install all offline deps
```

### Issue: Permission denied on tiktoken cache

**Problem**: `PermissionError: [Errno 13] Permission denied`

**Solution**:

```bash
# Ensure cache directory has correct permissions
chmod 755 ~/.tiktoken_cache
chmod 644 ~/.tiktoken_cache/*

# Or use a user-writable directory
export TIKTOKEN_CACHE_DIR=~/my_tiktoken_cache
mkdir -p ~/my_tiktoken_cache
```

## Best Practices

1. **Test in Online Environment First**: Always test your complete setup in an online environment before going offline.

2. **Keep Cache Updated**: Periodically update your offline cache when new models are released.

3. **Document Your Setup**: Keep notes on which optional dependencies you actually need.

4. **Version Pinning**: Consider pinning specific versions in production:

   ```bash
   pip freeze > requirements-production.txt
   ```

5. **Minimal Installation**: Only install what you need:

   ```bash
   # If you only process PDFs with OpenAI
   pip install "lightrag-hku[offline-docs]"
   # Then manually add: pip install openai
   ```

## Additional Resources

- [LightRAG GitHub Repository](https://github.com/HKUDS/LightRAG)
- [Docker Deployment Guide](./DockerDeployment.md)
- [API Documentation](../lightrag/api/README.md)

## Support

If you encounter issues not covered in this guide:

1. Check the [GitHub Issues](https://github.com/HKUDS/LightRAG/issues)
2. Review the [project documentation](../README.md)
3. Create a new issue with your offline deployment details