# LightRAG Offline Deployment Guide
This guide provides comprehensive instructions for deploying LightRAG in offline environments where internet access is limited or unavailable.
## Table of Contents
- [Overview](#overview)
- [Quick Start](#quick-start)
- [Layered Dependencies](#layered-dependencies)
- [Tiktoken Cache Management](#tiktoken-cache-management)
- [Complete Offline Deployment Workflow](#complete-offline-deployment-workflow)
- [Troubleshooting](#troubleshooting)
## Overview
LightRAG installs optional packages on demand (via `pipmaster`), depending on the file types you process and the storage and LLM backends you configure. In offline environments these on-demand installations fail, so all necessary dependencies and cache files must be installed in advance; this guide shows you how.
### What Gets Dynamically Installed?
LightRAG dynamically installs packages for:
- **Document Processing**: `docling`, `pypdf2`, `python-docx`, `python-pptx`, `openpyxl`
- **Storage Backends**: `redis`, `neo4j`, `pymilvus`, `pymongo`, `asyncpg`, `qdrant-client`
- **LLM Providers**: `openai`, `anthropic`, `ollama`, `zhipuai`, `aioboto3`, `voyageai`, `llama-index`, `lmdeploy`, `transformers`, `torch`
- **Tiktoken Models**: BPE encoding models downloaded from the OpenAI CDN
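Because these installations happen lazily, the first time a feature is used, a missing dependency may not surface until runtime. A minimal preflight sketch (the package-to-module mapping below is illustrative, not exhaustive) can confirm your chosen optional dependencies are importable before you disconnect:
```python
# preflight_check.py: verify optional dependencies before going offline.
import importlib.util

# pip package -> import name; illustrative subset, extend as needed
OPTIONAL_MODULES = {
    "python-docx": "docx",
    "python-pptx": "pptx",
    "openpyxl": "openpyxl",
    "redis": "redis",
    "neo4j": "neo4j",
    "openai": "openai",
}

for package, module in OPTIONAL_MODULES.items():
    status = "OK" if importlib.util.find_spec(module) else "MISSING"
    print(f"{package:12s} -> {module:9s} [{status}]")
```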
## Quick Start
### Option 1: Using pip with Offline Extras
```bash
# Online environment: Install all offline dependencies
pip install lightrag-hku[offline]
# Download tiktoken cache
lightrag-download-cache
# Create offline package
pip download lightrag-hku[offline] -d ./offline-packages
tar -czf lightrag-offline.tar.gz ./offline-packages ~/.tiktoken_cache
# Transfer to offline server
scp lightrag-offline.tar.gz user@offline-server:/path/to/
# Offline environment: Install
tar -xzf lightrag-offline.tar.gz
pip install --no-index --find-links=./offline-packages lightrag-hku[offline]
export TIKTOKEN_CACHE_DIR=~/.tiktoken_cache
```
### Option 2: Using Requirements Files
```bash
# Online environment: Download packages
pip download -r requirements-offline.txt -d ./packages
# Transfer to offline server
tar -czf packages.tar.gz ./packages
scp packages.tar.gz user@offline-server:/path/to/
# Offline environment: Install
tar -xzf packages.tar.gz
pip install --no-index --find-links=./packages -r requirements-offline.txt
```
## Layered Dependencies
LightRAG provides flexible dependency groups for different use cases:
### Available Dependency Groups
| Group | Description | Use Case |
|-------|-------------|----------|
| `offline-docs` | Document processing | PDF, DOCX, PPTX, XLSX files |
| `offline-storage` | Storage backends | Redis, Neo4j, MongoDB, PostgreSQL, etc. |
| `offline-llm` | LLM providers | OpenAI, Anthropic, Ollama, etc. |
| `offline` | All of the above | Complete offline deployment |
### Installation Examples
```bash
# Install only document processing dependencies
pip install lightrag-hku[offline-docs]
# Install document processing and storage backends
pip install lightrag-hku[offline-docs,offline-storage]
# Install all offline dependencies
pip install lightrag-hku[offline]
```
### Using Individual Requirements Files
```bash
# Document processing only
pip install -r requirements-offline-docs.txt
# Storage backends only
pip install -r requirements-offline-storage.txt
# LLM providers only
pip install -r requirements-offline-llm.txt
# All offline dependencies
pip install -r requirements-offline.txt
```
## Tiktoken Cache Management
Tiktoken downloads BPE encoding models on first use. In offline environments, you must pre-download these models.
### Using the CLI Command
After installing LightRAG, use the built-in command:
```bash
# Download to default location (~/.tiktoken_cache)
lightrag-download-cache
# Download to specific directory
lightrag-download-cache --cache-dir ./tiktoken_cache
# Download specific models only
lightrag-download-cache --models gpt-4o-mini gpt-4
```
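If the CLI is unavailable, you can also prime the cache directly with `tiktoken` itself (standard `tiktoken` calls; the model list is just an example). The key detail is that `TIKTOKEN_CACHE_DIR` must be set before the first encoding is requested:
```python
# prime_tiktoken.py: run in the ONLINE environment to populate a portable cache.
import os

os.environ["TIKTOKEN_CACHE_DIR"] = "./tiktoken_cache"  # must be set before first use

import tiktoken

for model in ("gpt-4o-mini", "gpt-4"):
    enc = tiktoken.encoding_for_model(model)  # downloads once, then caches
    print(f"{model}: encoding={enc.name}, vocab={enc.n_vocab}")
```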
### Default Models Downloaded
- `gpt-4o-mini` (LightRAG default)
- `gpt-4o`
- `gpt-4`
- `gpt-3.5-turbo`
- `text-embedding-ada-002`
- `text-embedding-3-small`
- `text-embedding-3-large`
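Several of these models share the same underlying BPE file (for example, `gpt-4` and `gpt-3.5-turbo` both resolve to `cl100k_base`), so the cache holds fewer files than the model list suggests. You can inspect the mapping without triggering any downloads:
```python
# Name lookup only; no files are downloaded by this check.
import tiktoken

for model in ("gpt-4o-mini", "gpt-4", "gpt-3.5-turbo", "text-embedding-3-small"):
    print(f"{model} -> {tiktoken.encoding_name_for_model(model)}")
```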
### Setting Cache Location in Offline Environment
```bash
# Option 1: Environment variable (temporary)
export TIKTOKEN_CACHE_DIR=/path/to/tiktoken_cache
# Option 2: Add to ~/.bashrc or ~/.zshrc (persistent)
echo 'export TIKTOKEN_CACHE_DIR=~/.tiktoken_cache' >> ~/.bashrc
source ~/.bashrc
# Option 3: Copy contents to the default location
mkdir -p ~/.tiktoken_cache
cp -r /path/to/tiktoken_cache/* ~/.tiktoken_cache/
```
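To confirm the transferred cache is actually being used, a quick sanity check (assuming the cache lives at `~/.tiktoken_cache`) is to load a tokenizer with the environment variable set; if the expected files are present, no network access is required:
```python
# verify_cache.py: run in the OFFLINE environment.
import os
from pathlib import Path

cache_dir = Path.home() / ".tiktoken_cache"
os.environ["TIKTOKEN_CACHE_DIR"] = str(cache_dir)
print("Cached files:", sorted(p.name for p in cache_dir.iterdir()))

import tiktoken

enc = tiktoken.encoding_for_model("gpt-4o-mini")  # should load from cache
print("Loaded encoding:", enc.name)
```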
## Complete Offline Deployment Workflow
### Step 1: Prepare in Online Environment
```bash
# 1. Install LightRAG with offline dependencies
pip install lightrag-hku[offline]
# 2. Download tiktoken cache
lightrag-download-cache --cache-dir ./offline_cache/tiktoken
# 3. Download all Python packages
pip download lightrag-hku[offline] -d ./offline_cache/packages
# 4. Create archive for transfer
tar -czf lightrag-offline-complete.tar.gz ./offline_cache
# 5. Verify contents
tar -tzf lightrag-offline-complete.tar.gz | head -20
```
### Step 2: Transfer to Offline Environment
```bash
# Using scp
scp lightrag-offline-complete.tar.gz user@offline-server:/tmp/
# Or using USB/physical media
# Copy lightrag-offline-complete.tar.gz to USB drive
```
### Step 3: Install in Offline Environment
```bash
# 1. Extract archive
cd /tmp
tar -xzf lightrag-offline-complete.tar.gz
# 2. Install Python packages
pip install --no-index \
    --find-links=/tmp/offline_cache/packages \
    lightrag-hku[offline]
# 3. Set up tiktoken cache
mkdir -p ~/.tiktoken_cache
cp -r /tmp/offline_cache/tiktoken/* ~/.tiktoken_cache/
export TIKTOKEN_CACHE_DIR=~/.tiktoken_cache
# 4. Add to shell profile for persistence
echo 'export TIKTOKEN_CACHE_DIR=~/.tiktoken_cache' >> ~/.bashrc
```
### Step 4: Verify Installation
```bash
# Test Python import
python -c "from lightrag import LightRAG; print('✓ LightRAG imported')"
# Test tiktoken
python -c "from lightrag.utils import TiktokenTokenizer; t = TiktokenTokenizer(); print('✓ Tiktoken working')"
# Test optional dependencies (if installed)
python -c "import docling; print('✓ Docling available')"
python -c "import redis; print('✓ Redis available')"
```
## Troubleshooting
### Issue: Tiktoken fails with network error
**Problem**: `Unable to load tokenizer for model gpt-4o-mini`
**Solution**:
```bash
# Ensure TIKTOKEN_CACHE_DIR is set
echo $TIKTOKEN_CACHE_DIR
# Verify cache files exist
ls -la ~/.tiktoken_cache/
# If empty, download the cache in an online environment first
```
### Issue: Dynamic package installation fails
**Problem**: `Error installing package xxx`
**Solution**:
```bash
# Pre-install the specific package you need
# For document processing:
pip install lightrag-hku[offline-docs]
# For storage backends:
pip install lightrag-hku[offline-storage]
# For LLM providers:
pip install lightrag-hku[offline-llm]
```
### Issue: Missing dependencies at runtime
**Problem**: `ModuleNotFoundError: No module named 'xxx'`
**Solution**:
```bash
# Check what you have installed
pip list | grep -i xxx
# Install missing component
pip install lightrag-hku[offline] # Install all offline deps
```
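Keep in mind that the module name in the error and the pip package name often differ (`docx` is provided by `python-docx`). On Python 3.10+, a quick way to find which installed distribution provides a module is `importlib.metadata.packages_distributions()`; a minimal sketch:
```python
# which_package.py: map an import name to the pip distribution providing it
# (Python 3.10+). Reports only distributions that are already installed.
from importlib.metadata import packages_distributions

module = "docx"  # the name from the ModuleNotFoundError
providers = packages_distributions().get(module)
print(f"'{module}' is provided by: {providers or 'no installed distribution'}")
```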
### Issue: Permission denied on tiktoken cache
**Problem**: `PermissionError: [Errno 13] Permission denied`
**Solution**:
```bash
# Ensure cache directory has correct permissions
chmod 755 ~/.tiktoken_cache
chmod 644 ~/.tiktoken_cache/*
# Or use a user-writable directory
export TIKTOKEN_CACHE_DIR=~/my_tiktoken_cache
mkdir -p ~/my_tiktoken_cache
```
## Best Practices
1. **Test in Online Environment First**: Always test your complete setup in an online environment before going offline.
2. **Keep Cache Updated**: Periodically update your offline cache when new models are released.
3. **Document Your Setup**: Keep notes on which optional dependencies you actually need.
4. **Version Pinning**: Consider pinning specific versions in production:
```bash
pip freeze > requirements-production.txt
```
5. **Minimal Installation**: Only install what you need:
```bash
# If you only process PDFs with OpenAI
pip install lightrag-hku[offline-docs]
# Then manually add: pip install openai
```
## Additional Resources
- [LightRAG GitHub Repository](https://github.com/HKUDS/LightRAG)
- [Docker Deployment Guide](./DockerDeployment.md)
- [API Documentation](../lightrag/api/README.md)
## Support
If you encounter issues not covered in this guide:
1. Check the [GitHub Issues](https://github.com/HKUDS/LightRAG/issues)
2. Review the [project documentation](../README.md)
3. Create a new issue with your offline deployment details