Pre-built Knowledge Graph for Docker Deployments

This directory can contain a pre-built knowledge graph that will be included in Docker images built with Dockerfile.prebuilt-graph, enabling instant query capability without re-indexing.

Benefits

  • 💰 Cost Savings: No indexing-time embedding API costs in production (queries still call the configured model API)
  • ⚡ Fast Startup: Instant query capability (no indexing delay)
  • 🔄 Consistency: Same embeddings across all deployments
  • 📦 Portable: Ship ready-to-query Docker images

Usage

1. Build Your Knowledge Graph Locally

# Index your documents locally
python -m lightrag.examples.lightrag_api_openai_compatible_demo

# Or use the API
curl -X POST http://localhost:9621/insert \
  -H "Content-Type: application/json" \
  -d '{"text": "Your document content here"}'

This will create graph_chunk_entity_relation.graphml in your local rag_storage/ directory.

2. Build Docker Image with Pre-built Graph

# Ensure graph file exists
ls rag_storage/graph_chunk_entity_relation.graphml

# Build Docker image (graph will be included automatically)
docker build -f Dockerfile.prebuilt-graph -t lightrag:prebuilt .
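
Because the build succeeds with or without a graph file, it can help to run a pre-flight check before `docker build`. The sketch below is one possible check, not part of LightRAG: the function name `preflight` is hypothetical, and it assumes the graph is a GraphML document whose root element is `graphml` (with or without the standard GraphML namespace).

```python
import xml.etree.ElementTree as ET
from pathlib import Path

# Standard GraphML namespace URI.
GRAPHML_NS = "http://graphml.graphdrawing.org/xmlns"


def preflight(graph_path):
    """Return a list of problems; an empty list means the file looks usable."""
    path = Path(graph_path)
    if not path.is_file():
        return [f"{path} does not exist"]
    if path.stat().st_size == 0:
        return [f"{path} is empty"]
    try:
        root = ET.parse(str(path)).getroot()
    except ET.ParseError as exc:
        return [f"{path} is not well-formed XML: {exc}"]
    # Accept both namespaced and non-namespaced root elements.
    if root.tag not in ("graphml", f"{{{GRAPHML_NS}}}graphml"):
        return [f"unexpected root element {root.tag!r}"]
    return []
```

A CI step could call `preflight("rag_storage/graph_chunk_entity_relation.graphml")` and abort the image build when the returned list is not empty.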

3. Deploy Without Re-indexing

# Run container - queries work immediately
docker run -p 9621:9621 \
  -e OPENAI_API_KEY=your_key \
  lightrag:prebuilt

# Test query (no indexing needed!)
curl -X POST http://localhost:9621/query \
  -H "Content-Type: application/json" \
  -d '{"query": "What is LightRAG?", "mode": "hybrid"}'
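
The same test query can be sent from Python using only the standard library. This is a minimal sketch: the endpoint, port, and payload fields are taken from the curl command above, while the helper names `build_query_request` and `run_query` are hypothetical, and the assumption that the server answers with a JSON body is not confirmed by this README.

```python
import json
import urllib.request


def build_query_request(question, mode="hybrid", base_url="http://localhost:9621"):
    """Build the POST request that mirrors the curl example."""
    payload = json.dumps({"query": question, "mode": mode}).encode("utf-8")
    return urllib.request.Request(
        f"{base_url}/query",
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def run_query(question, mode="hybrid", base_url="http://localhost:9621", timeout=60):
    """Send the query and decode the response (assumed to be JSON)."""
    request = build_query_request(question, mode, base_url)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

Separating request construction from sending keeps the payload easy to inspect in tests without a running container.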

How It Works

Dockerfile Integration

Dockerfile.prebuilt-graph includes this optional step:

# Copy pre-built knowledge graph if available (optional)
ARG GRAPH_SOURCE=rag_storage/
COPY --chown=root:root ${GRAPH_SOURCE} /app/data/rag_storage/

Specify an alternate directory when building (COPY copies an archive as-is rather than unpacking it, so extract archives first):

docker build -f Dockerfile.prebuilt-graph \
  --build-arg GRAPH_SOURCE=artifacts/graphs/ \
  -t lightrag:prebuilt .

.dockerignore Configuration

# Exclude rag_storage but allow pre-built knowledge graph (optional)
/rag_storage/*
!/rag_storage/graph_chunk_entity_relation.graphml
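
To see which files these two rules let into the build context, the rough simulation below applies them in order, with the last matching rule deciding the outcome. It is a simplified sketch, not Docker's actual matcher: it handles only `*` and `?` wildcards (no `**`), and the function names are hypothetical.

```python
import re


def _to_regex(pattern):
    """Translate a simple glob ('*' and '?' only) into a compiled regex."""
    parts = []
    for ch in pattern.strip("/"):
        if ch == "*":
            parts.append("[^/]*")  # '*' does not cross directory separators
        elif ch == "?":
            parts.append("[^/]")
        else:
            parts.append(re.escape(ch))
    return re.compile("".join(parts))


def is_ignored(path, patterns):
    """Return True if `path` would be excluded; the last matching rule wins."""
    segments = path.strip("/").split("/")
    # A rule applies if it matches the path itself or any ancestor directory.
    candidates = ["/".join(segments[:i]) for i in range(1, len(segments) + 1)]
    ignored = False
    for raw in patterns:
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        negate = line.startswith("!")
        regex = _to_regex(line[1:] if negate else line)
        if any(regex.fullmatch(candidate) for candidate in candidates):
            ignored = not negate
    return ignored
```

With the two rules above, `is_ignored("rag_storage/README.md", rules)` returns `True`, while `is_ignored("rag_storage/graph_chunk_entity_relation.graphml", rules)` returns `False`, so only the graph file reaches the image.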

Build Behavior

  • With graph file: File is copied into image → instant queries
  • Without graph file: Build continues normally → index at runtime

File Format

The graph_chunk_entity_relation.graphml file contains:

  • Entities: Extracted from documents
  • Relationships: Connections between entities
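  • Chunks: References from entities and relationships back to their source document segments (in upstream LightRAG's default file-based storage, chunk text and embedding vectors are kept in separate kv_store_*.json and vdb_*.json files, not in the GraphML)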
  • Metadata: Source information and timestamps
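
To check what a particular graph file actually holds, it can be inspected with the standard library. The sketch below assumes the file uses the standard GraphML namespace (as NetworkX-written GraphML does); `summarize_graphml` is a hypothetical helper, not a LightRAG API.

```python
import xml.etree.ElementTree as ET

# Standard GraphML namespace, mapped to the prefix "g" for lookups.
NS = {"g": "http://graphml.graphdrawing.org/xmlns"}


def summarize_graphml(path):
    """Count nodes and edges and list the attribute names attached to them."""
    root = ET.parse(str(path)).getroot()
    # <key> elements map short ids (e.g. "d0") to readable attribute names.
    key_names = {k.get("id"): k.get("attr.name") for k in root.findall("g:key", NS)}
    nodes = root.findall(".//g:node", NS)
    edges = root.findall(".//g:edge", NS)
    attributes = set()
    for element in nodes + edges:
        for data in element.findall("g:data", NS):
            key_id = data.get("key")
            name = key_names.get(key_id) or key_id
            if name:
                attributes.add(name)
    return {"nodes": len(nodes), "edges": len(edges), "attributes": sorted(attributes)}
```

Running it on `rag_storage/graph_chunk_entity_relation.graphml` gives entity and relationship counts plus the attribute names stored on them.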

Use Cases

Good Use Cases

  • Production deployments with stable document corpus
  • Demo/POC environments with sample data
  • Multi-region deployments with consistent data
  • Offline deployments without embedding API access
  • Cost optimization for large document sets

⚠️ Consider Alternatives

  • Frequently updated content: Use volume mounts instead
  • User-specific data: Mount per-user graph files
  • Dynamic indexing: Let containers index at runtime

Advanced Usage

Multiple Graph Files

To include multiple pre-built graphs:

# Custom Dockerfile
COPY rag_storage/*.graphml /app/data/rag_storage/

Update .dockerignore:

/rag_storage/*
!/rag_storage/*.graphml

Volume Override

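Even with a pre-built graph baked into the image, you can override it at runtime by mounting a volume over the storage directory:

# Mount a host directory containing a custom graph file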
docker run -p 9621:9621 \
  -v /path/to/custom/graph:/app/data/rag_storage \
  lightrag:prebuilt

Multi-stage Builds

For CI/CD pipelines:

# Stage 1: Build graph
FROM lightrag:base AS indexer
COPY documents/ /documents/
RUN python index_documents.py /documents

# Stage 2: Production image with graph
FROM lightrag:base AS production
COPY --from=indexer /app/data/rag_storage/*.graphml /app/data/rag_storage/

Troubleshooting

Graph Not Loaded

Symptoms: Container queries return empty results

Check:

# Verify graph file in image (--entrypoint bypasses any ENTRYPOINT the image defines)
docker run --rm --entrypoint ls lightrag:prebuilt -lh /app/data/rag_storage/

# Check logs
docker logs <container_id>

Build Fails

Error: COPY failed: file not found

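Solution: A COPY instruction references a path that is not in the build context, for example because the GRAPH_SOURCE directory is missing, .dockerignore excludes it, or a custom COPY rag_storage/*.graphml matches no files. Either: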

  1. Create the graph file before building
  2. Remove the COPY instruction for optional builds

Wrong Graph Loaded

Issue: Queries return stale data

Solution:

# Rebuild image with new graph
rm rag_storage/graph_chunk_entity_relation.graphml
python rebuild_index.py
docker build --no-cache -f Dockerfile.prebuilt-graph -t lightrag:prebuilt .

Best Practices

  1. Version your graph files: Tag Docker images with graph versions

    docker build -f Dockerfile.prebuilt-graph -t lightrag:v1.0-graph-20250101 .
    
  2. Document graph contents: Add a metadata file (allow it through .dockerignore if it should ship in the image)

    echo "Built: 2025-01-01, Documents: 1000, Entities: 5000" > rag_storage/graph_metadata.txt
    
  3. Test before deploying:

    # Validate graph locally
    python -m lightrag.tools.validate_graph rag_storage/graph_chunk_entity_relation.graphml
    
  4. Monitor graph size:

    # Check file size
    du -h rag_storage/graph_chunk_entity_relation.graphml
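
The versioning, metadata, and size checks above can be combined in one small helper. The sketch below is an illustration under stated assumptions: the function names are hypothetical, and appending a short SHA-256 prefix to the image tag is an addition to the date-based tag format shown in practice 1.

```python
import hashlib
from datetime import datetime, timezone
from pathlib import Path


def graph_fingerprint(graph_path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of the graph file, read in chunks."""
    digest = hashlib.sha256()
    with open(graph_path, "rb") as handle:
        for block in iter(lambda: handle.read(chunk_size), b""):
            digest.update(block)
    return digest.hexdigest()


def build_metadata(graph_path, version="v1.0"):
    """Collect build date, size, hash, and a suggested image tag."""
    path = Path(graph_path)
    sha = graph_fingerprint(path)
    built = datetime.now(timezone.utc)
    return {
        "built": built.strftime("%Y-%m-%d"),
        "size_bytes": path.stat().st_size,
        "sha256": sha,
        # Hypothetical tag scheme: version, UTC build date, short content hash.
        "image_tag": f"lightrag:{version}-graph-{built:%Y%m%d}-{sha[:8]}",
    }
```

Writing the returned dictionary next to the graph (for example with `json.dump`) records which graph an image contains, and the content hash changes whenever the graph does, which makes stale images easier to spot.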
    

Security Considerations

  • Sensitive Data: Don't include confidential information in public images
  • Access Control: Use private registries for graphs with proprietary data
  • Compliance: Ensure graph data complies with data residency requirements

Performance Tips

  • Graph Size: Keep the graph under 100 MB where possible for faster image pulls
  • Compression: GraphML compresses well with gzip
  • Caching: Use Docker layer caching for unchanged graphs
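
To estimate how much a given graph shrinks under gzip, the compressed size can be measured directly. This is a minimal sketch with a hypothetical helper name; the real ratio depends on the graph's contents.

```python
import gzip


def gzip_ratio(data, level=6):
    """Return compressed size divided by original size (lower is better)."""
    if not data:
        raise ValueError("cannot compute a ratio for empty input")
    return len(gzip.compress(data, compresslevel=level)) / len(data)
```

For example, `gzip_ratio(Path("rag_storage/graph_chunk_entity_relation.graphml").read_bytes())` (with `Path` imported from `pathlib`) reports the ratio for the current graph file.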

Note: This feature is optional. LightRAG works without pre-built graphs by indexing at runtime.