Pre-built Knowledge Graph for Docker Deployments
This directory can contain a pre-built knowledge graph that will be included in Docker images built with Dockerfile.prebuilt-graph, enabling instant query capability without re-indexing.
Benefits
- 💰 Cost Savings: No indexing-time embedding API costs in production
- ⚡ Fast Startup: Instant query capability (no indexing delay)
- 🔄 Consistency: Same embeddings across all deployments
- 📦 Portable: Ship ready-to-query Docker images
Usage
1. Build Your Knowledge Graph Locally
# Index your documents locally
python -m lightrag.examples.lightrag_api_openai_compatible_demo
# Or use the API
curl -X POST http://localhost:9621/insert \
-H "Content-Type: application/json" \
-d '{"text": "Your document content here"}'
This will create graph_chunk_entity_relation.graphml in your local rag_storage/ directory.
2. Build Docker Image with Pre-built Graph
# Ensure graph file exists
ls rag_storage/graph_chunk_entity_relation.graphml
# Build Docker image (graph will be included automatically)
docker build -f Dockerfile.prebuilt-graph -t lightrag:prebuilt .
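The `ls` check above only confirms that the file exists. A slightly stricter pre-build check could look like the following. This is a minimal sketch, not part of LightRAG: the helper name `graph_problem` is hypothetical, and the "looks like GraphML" test is only a heuristic that searches the first few kilobytes for a `<graphml` tag.

```python
from pathlib import Path

# Default location used throughout this README.
GRAPH_FILE = Path("rag_storage/graph_chunk_entity_relation.graphml")


def graph_problem(path=GRAPH_FILE):
    """Return a short description of the problem, or None if the file looks usable.

    Hypothetical helper: it checks existence, size, and a GraphML marker only.
    """
    path = Path(path)
    if not path.is_file():
        return f"{path} does not exist"
    if path.stat().st_size == 0:
        return f"{path} is empty"
    with path.open("rb") as handle:
        head = handle.read(4096)  # heuristic: the root tag should appear early
    if b"<graphml" not in head:
        return f"{path} does not look like GraphML"
    return None
```

A build script could call `graph_problem()` and abort before `docker build` whenever the result is not `None`.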
3. Deploy Without Re-indexing
# Run container - queries work immediately
docker run -p 9621:9621 \
-e OPENAI_API_KEY=your_key \
lightrag:prebuilt
# Test query (no indexing needed!)
curl -X POST http://localhost:9621/query \
-H "Content-Type: application/json" \
-d '{"query": "What is LightRAG?", "mode": "hybrid"}'
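The same test query can be issued from Python, for example in a deployment smoke test. This is a sketch using only the standard library and assuming the `/query` endpoint and JSON body shown in the curl call above. The helper names are hypothetical, and the `MODES` set is an assumption that covers only a few commonly used modes; the server may accept others.

```python
import json
import urllib.request

# Assumption: a subset of query modes; "hybrid" is the one used in the curl example.
MODES = {"naive", "local", "global", "hybrid"}


def build_query_request(question, mode="hybrid", base_url="http://localhost:9621"):
    """Build the POST request for /query without sending it (hypothetical helper)."""
    if mode not in MODES:
        raise ValueError(f"unsupported mode {mode!r}; expected one of {sorted(MODES)}")
    body = json.dumps({"query": question, "mode": mode}).encode("utf-8")
    return urllib.request.Request(
        f"{base_url}/query",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def run_query(question, mode="hybrid", base_url="http://localhost:9621", timeout=30):
    """Send the query to a running container and return the decoded JSON response."""
    request = build_query_request(question, mode, base_url)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

Keeping request construction separate from sending makes the payload easy to check without a running container.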
How It Works
Dockerfile Integration
Dockerfile.prebuilt-graph includes this optional step:
# Copy pre-built knowledge graph if available (optional)
ARG GRAPH_SOURCE=rag_storage/
COPY --chown=root:root ${GRAPH_SOURCE} /app/data/rag_storage/
Specify an alternate directory when building. COPY does not unpack archives, so extract any archived graph into a directory first:
docker build -f Dockerfile.prebuilt-graph \
--build-arg GRAPH_SOURCE=artifacts/graphs/ \
-t lightrag:prebuilt .
.dockerignore Configuration
# Exclude rag_storage but allow pre-built knowledge graph (optional)
/rag_storage/*
!/rag_storage/graph_chunk_entity_relation.graphml
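The two rules work together because .dockerignore rules are applied in order and the last matching rule decides, so the `!` exception re-includes the one graph file. The sketch below illustrates that ordering only. It is a deliberate simplification: it uses Python's `fnmatch` rather than Docker's own matcher, and it does not handle `**` or directory-level semantics. The function name `is_excluded` is hypothetical.

```python
from fnmatch import fnmatchcase

# The rules from the .dockerignore snippet above, in file order.
RULES = [
    "/rag_storage/*",
    "!/rag_storage/graph_chunk_entity_relation.graphml",
]


def is_excluded(path, rules=RULES):
    """Return True if `path` (relative to the build context) would be ignored.

    Simplified model: every rule is tried in order and the last match wins.
    """
    excluded = False
    for rule in rules:
        negated = rule.startswith("!")
        pattern = rule.lstrip("!").lstrip("/")  # a leading "/" anchors at the context root
        if fnmatchcase(path, pattern):
            excluded = not negated
    return excluded
```

Under this model, every other file in `rag_storage/` stays out of the build context, while the graph file and everything outside `rag_storage/` is sent to the builder.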
Build Behavior
- With graph file: File is copied into image → instant queries
- Without graph file: Build continues normally as long as the rag_storage/ directory itself exists (COPY fails on a missing source) → index at runtime
File Format
The graph_chunk_entity_relation.graphml file contains:
- Entities: Extracted from documents
- Relationships: Connections between entities
- Chunks: Document segments with embeddings
- Metadata: Source information and timestamps
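To check what a particular graph file actually contains, it can be inspected with the standard library alone. This is a minimal sketch, assuming the file uses the standard GraphML namespace; `summarize_graphml` is a hypothetical helper, and the attribute names it reports depend on how the graph was written.

```python
import xml.etree.ElementTree as ET

# Standard GraphML namespace, in ElementTree's "{uri}tag" notation.
GRAPHML_NS = "{http://graphml.graphdrawing.org/xmlns}"


def summarize_graphml(path):
    """Count nodes and edges and list the attribute names declared in <key> elements."""
    root = ET.parse(path).getroot()
    nodes = root.findall(f".//{GRAPHML_NS}node")
    edges = root.findall(f".//{GRAPHML_NS}edge")
    attributes = sorted(
        key.get("attr.name")
        for key in root.findall(f"{GRAPHML_NS}key")
        if key.get("attr.name")
    )
    return {"nodes": len(nodes), "edges": len(edges), "attributes": attributes}
```

Calling `summarize_graphml("rag_storage/graph_chunk_entity_relation.graphml")` returns a small dictionary whose counts can be compared against the expected number of entities and relationships before the image is built.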
Use Cases
✅ Good Use Cases
- Production deployments with stable document corpus
- Demo/POC environments with sample data
- Multi-region deployments with consistent data
- Offline deployments without embedding API access
- Cost optimization for large document sets
⚠️ Consider Alternatives
- Frequently updated content: Use volume mounts instead
- User-specific data: Mount per-user graph files
- Dynamic indexing: Let containers index at runtime
Advanced Usage
Multiple Graph Files
To include multiple pre-built graphs:
# Custom Dockerfile
COPY rag_storage/*.graphml /app/data/rag_storage/
Update .dockerignore:
/rag_storage/*
!/rag_storage/*.graphml
Volume Override
Even with a pre-built graph in the image, you can override it at runtime:
# Use custom graph file
docker run -p 9621:9621 \
-v /path/to/custom/graph:/app/data/rag_storage \
lightrag:prebuilt
Multi-stage Builds
For CI/CD pipelines:
# Stage 1: Build graph
FROM lightrag:base AS indexer
COPY documents/ /documents/
RUN python index_documents.py /documents
# Stage 2: Production image with graph
FROM lightrag:base AS production
COPY --from=indexer /app/data/rag_storage/*.graphml /app/data/rag_storage/
Troubleshooting
Graph Not Loaded
Symptoms: Container queries return empty results
Check:
# Verify graph file in image
docker run lightrag:prebuilt ls -lh /app/data/rag_storage/
# Check logs
docker logs <container_id>
Build Fails
Error: COPY failed: file not found
Solution: The COPY source is missing from the build context, either because the graph file does not exist or because .dockerignore excludes it. Either:
- Create the graph file before building
- Remove the COPY instruction for optional builds
Wrong Graph Loaded
Issue: Old data in queries
Solution:
# Rebuild image with new graph
rm rag_storage/graph_chunk_entity_relation.graphml
python rebuild_index.py
docker build --no-cache -f Dockerfile.prebuilt-graph -t lightrag:prebuilt .
Best Practices
- Version your graph files: Tag Docker images with graph versions
  docker build -t lightrag:v1.0-graph-20250101 .
- Document graph contents: Add metadata file
  echo "Built: 2025-01-01, Documents: 1000, Entities: 5000" > rag_storage/graph_metadata.txt
- Test before deploying:
  # Validate graph locally
  python -m lightrag.tools.validate_graph rag_storage/graph_chunk_entity_relation.graphml
- Monitor graph size:
  # Check file size
  du -h rag_storage/graph_chunk_entity_relation.graphml
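The versioning and metadata practices can be combined so that the image tag and the metadata file are both derived from the graph file itself. This is a sketch under stated assumptions: the helper names, the JSON layout, and the tag scheme (version, build date, short content hash) are illustrative choices, not LightRAG conventions.

```python
import hashlib
import json
from datetime import datetime, timezone
from pathlib import Path


def graph_metadata(graph_path):
    """Collect the build date, size, and content hash of a graph file."""
    data = Path(graph_path).read_bytes()
    return {
        "file": Path(graph_path).name,
        "built": datetime.now(timezone.utc).strftime("%Y-%m-%d"),
        "size_bytes": len(data),
        "sha256": hashlib.sha256(data).hexdigest(),
    }


def image_tag(metadata, version="v1.0"):
    """Derive a tag such as v1.0-graph-20250101-<short hash> (assumed naming scheme)."""
    date = metadata["built"].replace("-", "")
    return f"{version}-graph-{date}-{metadata['sha256'][:8]}"


def write_metadata(graph_path, out_path):
    """Write the metadata as JSON next to the graph and return it."""
    metadata = graph_metadata(graph_path)
    Path(out_path).write_text(json.dumps(metadata, indent=2) + "\n", encoding="utf-8")
    return metadata
```

Because the tag includes a content hash, two images built from different graphs on the same day remain distinguishable. With the .dockerignore rules shown earlier, a metadata file under rag_storage/ is excluded from the build context unless it gets its own `!` exception.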
Security Considerations
- Sensitive Data: Don't include confidential information in public images
- Access Control: Use private registries for graphs with proprietary data
- Compliance: Ensure graph data complies with data residency requirements
Performance Tips
- Graph Size: Keep the graph under 100 MB for faster image pulls
- Compression: GraphML compresses well with gzip
- Caching: Use Docker layer caching for unchanged graphs
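The size and compression tips can be checked for a specific graph before building. This is a small sketch using the standard `gzip` module; `size_report` is a hypothetical helper, and it reads the whole file into memory, which is acceptable around the 100 MB guideline but would call for streaming on much larger files.

```python
import gzip
from pathlib import Path

SIZE_BUDGET_BYTES = 100 * 1024 * 1024  # the 100 MB guideline from the tips above


def size_report(graph_path):
    """Report raw size, gzip-compressed size, and whether the file fits the budget."""
    data = Path(graph_path).read_bytes()
    compressed = len(gzip.compress(data))
    return {
        "raw_bytes": len(data),
        "gzip_bytes": compressed,
        "ratio": compressed / len(data) if data else 1.0,
        "within_budget": len(data) <= SIZE_BUDGET_BYTES,
    }
```

A low `ratio` indicates that the file compresses well. `within_budget` compares the uncompressed size against the guideline.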
Note: This feature is optional. LightRAG works without pre-built graphs by indexing at runtime.