anouarbm 3f64a74c0a Add dedicated Dockerfile for prebuilt graphs and refresh docs		2025-11-02 20:06:10 +01:00

Pre-built Knowledge Graph for Docker Deployments

This directory can contain a pre-built knowledge graph that will be included in Docker images built with Dockerfile.prebuilt-graph, enabling instant query capability without re-indexing.

Benefits

💰 Cost Savings: No embedding API costs in production
⚡ Fast Startup: Instant query capability (no indexing delay)
🔄 Consistency: Same embeddings across all deployments
📦 Portable: Ship ready-to-query Docker images

Usage

1. Build Your Knowledge Graph Locally

# Index your documents locally
python -m lightrag.examples.lightrag_api_openai_compatible_demo

# Or use the API
curl -X POST http://localhost:9621/insert \
  -H "Content-Type: application/json" \
  -d '{"text": "Your document content here"}'

This will create graph_chunk_entity_relation.graphml in your local rag_storage/ directory.

2. Build Docker Image with Pre-built Graph

# Ensure graph file exists
ls rag_storage/graph_chunk_entity_relation.graphml

# Build Docker image (graph will be included automatically)
docker build -f Dockerfile.prebuilt-graph -t lightrag:prebuilt .

3. Deploy Without Re-indexing

# Run container - queries work immediately
docker run -p 9621:9621 \
  -e OPENAI_API_KEY=your_key \
  lightrag:prebuilt

# Test query (no indexing needed!)
curl -X POST http://localhost:9621/query \
  -H "Content-Type: application/json" \
  -d '{"query": "What is LightRAG?", "mode": "hybrid"}'
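The same test query can be issued from Python. This is a minimal sketch using only the standard library; it reuses the endpoint and payload from the curl example above, while the helper names `build_query_request` and `run_query` are illustrative and the assumption that the server replies with a JSON body is not confirmed by this README.

```python
import json
import urllib.request


def build_query_request(question, mode="hybrid", base_url="http://localhost:9621"):
    """Build the POST /query request shown in the curl example (hypothetical helper)."""
    body = json.dumps({"query": question, "mode": mode}).encode("utf-8")
    return urllib.request.Request(
        f"{base_url}/query",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def run_query(question, mode="hybrid", base_url="http://localhost:9621", timeout=30):
    """Send the query to a running container and decode the reply.

    Assumption: the server answers with a JSON document.
    """
    request = build_query_request(question, mode, base_url)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

With the container from step 3 running, `run_query("What is LightRAG?")` sends the same request as the curl command.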

How It Works

Dockerfile Integration

Dockerfile.prebuilt-graph includes this optional step:

# Copy pre-built knowledge graph if available (optional)
ARG GRAPH_SOURCE=rag_storage/
COPY --chown=root:root ${GRAPH_SOURCE} /app/data/rag_storage/

Specify an alternate directory when building. COPY does not unpack archives, so extract any archived graph into a directory first:

docker build -f Dockerfile.prebuilt-graph \
  --build-arg GRAPH_SOURCE=artifacts/graphs/ \
  -t lightrag:prebuilt .

.dockerignore Configuration

# Exclude rag_storage but allow pre-built knowledge graph (optional)
/rag_storage/*
!/rag_storage/graph_chunk_entity_relation.graphml

Build Behavior

With graph file: File is copied into image → instant queries
Without graph file: Build continues normally, provided the GRAPH_SOURCE directory itself exists → index at runtime

File Format

The graph_chunk_entity_relation.graphml file contains:

Entities: Extracted from documents
Relationships: Connections between entities
Chunks: References (source IDs) to the document segments behind each entity and relationship
Metadata: Source information and timestamps
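To see what a particular graph file actually holds, it can be inspected with the standard library. The sketch below assumes the file uses the standard GraphML XML namespace; `summarize_graphml` is an illustrative name, not part of LightRAG.

```python
import sys
import xml.etree.ElementTree as ET

# Standard GraphML namespace (assumed to be the one used by the graph file).
GRAPHML_NS = "{http://graphml.graphdrawing.org/xmlns}"


def summarize_graphml(path):
    """Return node/edge counts and the attribute names declared for each (hypothetical helper)."""
    root = ET.parse(path).getroot()
    summary = {"nodes": 0, "edges": 0, "node_attrs": [], "edge_attrs": []}
    # <key> elements declare which attributes nodes and edges may carry.
    for key in root.iter(f"{GRAPHML_NS}key"):
        name = key.get("attr.name")
        if key.get("for") == "node":
            summary["node_attrs"].append(name)
        elif key.get("for") == "edge":
            summary["edge_attrs"].append(name)
    summary["nodes"] = sum(1 for _ in root.iter(f"{GRAPHML_NS}node"))
    summary["edges"] = sum(1 for _ in root.iter(f"{GRAPHML_NS}edge"))
    return summary


if __name__ == "__main__" and len(sys.argv) > 1:
    print(summarize_graphml(sys.argv[1]))
```

Nodes correspond to entities and edges to relationships; the declared attribute names show which metadata fields are present.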

Use Cases

✅ Good Use Cases

Production deployments with stable document corpus
Demo/POC environments with sample data
Multi-region deployments with consistent data
Offline deployments without embedding API access
Cost optimization for large document sets

⚠️ Consider Alternatives

Frequently updated content: Use volume mounts instead
User-specific data: Mount per-user graph files
Dynamic indexing: Let containers index at runtime

Advanced Usage

Multiple Graph Files

To include multiple pre-built graphs:

# Custom Dockerfile
COPY rag_storage/*.graphml /app/data/rag_storage/

Update .dockerignore:

/rag_storage/*
!/rag_storage/*.graphml

Volume Override

Even with a pre-built graph baked into the image, you can override it at runtime:

# Mount a directory containing a custom graph over the built-in storage path
docker run -p 9621:9621 \
  -v /path/to/custom/graph:/app/data/rag_storage \
  lightrag:prebuilt

Multi-stage Builds

For CI/CD pipelines:

# Stage 1: Build graph
FROM lightrag:base AS indexer
COPY documents/ /documents/
RUN python index_documents.py /documents

# Stage 2: Production image with graph
FROM lightrag:base AS production
COPY --from=indexer /app/data/rag_storage/*.graphml /app/data/rag_storage/

Troubleshooting

Graph Not Loaded

Symptoms: Container queries return empty results

Check:

# Verify graph file in image (override any entrypoint so ls runs directly)
docker run --rm --entrypoint ls lightrag:prebuilt -lh /app/data/rag_storage/

# Check logs
docker logs <container_id>

Build Fails

Error: COPY failed: file not found

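Solution: COPY fails when its source path is missing from the build context, for example when the rag_storage/ directory (or the path passed as GRAPH_SOURCE) does not exist. Either:

Create the directory and graph file before building
Remove the COPY instruction for builds that do not need a pre-built graph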

Wrong Graph Loaded

Issue: Old data in queries

Solution:

# Rebuild image with new graph
rm rag_storage/graph_chunk_entity_relation.graphml
python rebuild_index.py
docker build --no-cache -f Dockerfile.prebuilt-graph -t lightrag:prebuilt .
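To confirm whether an image really carries a stale graph, one option is to copy the file out of a container (for example with `docker cp`) and compare checksums against the local file. The sketch below is one way to do that comparison; `file_sha256` and `same_graph` are illustrative names.

```python
import hashlib


def file_sha256(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, read in 1 MiB chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for block in iter(lambda: handle.read(chunk_size), b""):
            digest.update(block)
    return digest.hexdigest()


def same_graph(local_path, image_copy_path):
    """True when the local graph and the copy taken from the image are byte-identical."""
    return file_sha256(local_path) == file_sha256(image_copy_path)
```

If `same_graph` returns False, the image was built from a different graph file than the one currently in rag_storage/.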

Best Practices

Version your graph files: Tag Docker images with graph versions
```
docker build -f Dockerfile.prebuilt-graph -t lightrag:v1.0-graph-20250101 .
```

Document graph contents: Add metadata file

echo "Built: 2025-01-01, Documents: 1000, Entities: 5000" > rag_storage/graph_metadata.txt

Test before deploying:

# Validate graph locally
python -m lightrag.tools.validate_graph rag_storage/graph_chunk_entity_relation.graphml

Monitor graph size:

# Check file size
du -h rag_storage/graph_chunk_entity_relation.graphml
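The validation and metadata steps above can also be scripted with the standard library, which may help if a dedicated validation tool is not available in your environment. This is a sketch under stated assumptions: the file uses the standard GraphML namespace, "valid" here only means well-formed XML with no edge pointing at an undeclared node, and `validate_graph` and `metadata_line` are illustrative names rather than LightRAG APIs.

```python
import xml.etree.ElementTree as ET
from datetime import date

# Standard GraphML namespace (assumed to be the one used by the graph file).
GRAPHML_NS = "{http://graphml.graphdrawing.org/xmlns}"


def validate_graph(path):
    """Parse a GraphML file and return entity/relationship counts.

    Raises ET.ParseError for malformed XML and ValueError when an edge
    references a node id that is not declared in the file.
    """
    root = ET.parse(path).getroot()
    node_ids = {node.get("id") for node in root.iter(f"{GRAPHML_NS}node")}
    edges = list(root.iter(f"{GRAPHML_NS}edge"))
    for edge in edges:
        for endpoint in (edge.get("source"), edge.get("target")):
            if endpoint not in node_ids:
                raise ValueError(f"edge references unknown node: {endpoint!r}")
    return {"entities": len(node_ids), "relationships": len(edges)}


def metadata_line(stats, documents, built=None):
    """Format the one-line metadata record; the document count must be supplied by the caller."""
    built = built or date.today()
    return f"Built: {built.isoformat()}, Documents: {documents}, Entities: {stats['entities']}"
```

Writing the result of `metadata_line(validate_graph(path), documents=1000)` to rag_storage/graph_metadata.txt keeps the recorded entity count in step with the graph that is actually shipped.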

Security Considerations

Sensitive Data: Don't include confidential information in public images
Access Control: Use private registries for graphs with proprietary data
Compliance: Ensure graph data complies with data residency requirements

Performance Tips

Graph Size: Keep the graph under 100 MB for faster image pulls
Compression: GraphML compresses well with gzip
Caching: Use Docker layer caching for unchanged graphs
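The size and compression tips can be checked before building. The sketch below reports the raw and gzip-compressed size of a graph file and whether it stays under the 100 MB guideline; `size_report` is an illustrative name, and the file is read into memory in one go, which is assumed to be acceptable at these sizes.

```python
import gzip
import os

# The 100 MB guideline from the tips above, expressed in bytes.
SIZE_LIMIT_BYTES = 100 * 1024 * 1024


def size_report(path):
    """Return raw size, gzip-compressed size, and whether the raw size is under the limit."""
    raw = os.path.getsize(path)
    with open(path, "rb") as handle:
        compressed = len(gzip.compress(handle.read()))
    return {
        "raw_bytes": raw,
        "gzip_bytes": compressed,
        "within_limit": raw < SIZE_LIMIT_BYTES,
    }
```

Comparing `raw_bytes` with `gzip_bytes` gives a rough idea of how much a compressed transfer might save for a given graph.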

Note: This feature is optional. LightRAG works without pre-built graphs by indexing at runtime.