# Pre-built Knowledge Graph for Docker Deployments

This directory can contain a pre-built knowledge graph that will be included in Docker images built with `Dockerfile.prebuilt-graph`, enabling instant query capability without re-indexing.

## Benefits

- **💰 Cost Savings**: No embedding API costs in production
- **⚡ Fast Startup**: Instant query capability (no indexing delay)
- **🔄 Consistency**: Same embeddings across all deployments
- **📦 Portable**: Ship ready-to-query Docker images

## Usage

### 1. Build Your Knowledge Graph Locally

```bash
# Index your documents locally
python -m lightrag.examples.lightrag_api_openai_compatible_demo

# Or use the API
curl -X POST http://localhost:9621/insert \
  -H "Content-Type: application/json" \
  -d '{"text": "Your document content here"}'
```

This will create `graph_chunk_entity_relation.graphml` in your local `rag_storage/` directory.

### 2. Build Docker Image with Pre-built Graph

```bash
# Ensure graph file exists
ls rag_storage/graph_chunk_entity_relation.graphml

# Build Docker image (graph will be included automatically)
docker build -f Dockerfile.prebuilt-graph -t lightrag:prebuilt .
```

### 3. Deploy Without Re-indexing

```bash
# Run container - queries work immediately
docker run -p 9621:9621 \
  -e OPENAI_API_KEY=your_key \
  lightrag:prebuilt

# Test query (no indexing needed!)
curl -X POST http://localhost:9621/query \
  -H "Content-Type: application/json" \
  -d '{"query": "What is LightRAG?", "mode": "hybrid"}'
```

## How It Works

### Dockerfile Integration

`Dockerfile.prebuilt-graph` includes this optional step:

```dockerfile
# Copy pre-built knowledge graph if available (optional)
ARG GRAPH_SOURCE=rag_storage/
COPY --chown=root:root ${GRAPH_SOURCE} /app/data/rag_storage/
```

Specify an alternate directory or archive when building:

```bash
docker build -f Dockerfile.prebuilt-graph \
  --build-arg GRAPH_SOURCE=artifacts/graphs/ \
  -t lightrag:prebuilt .
```

### .dockerignore Configuration

```
# Exclude rag_storage but allow pre-built knowledge graph (optional)
/rag_storage/*
!/rag_storage/graph_chunk_entity_relation.graphml
```

### Build Behavior

- **With graph file**: File is copied into image → instant queries
- **Without graph file**: Build continues normally → index at runtime

## File Format

The `graph_chunk_entity_relation.graphml` file contains:

- **Entities**: Extracted from documents
- **Relationships**: Connections between entities
- **Chunks**: Document segments with embeddings
- **Metadata**: Source information and timestamps

## Use Cases

### ✅ Good Use Cases

- **Production deployments** with a stable document corpus
- **Demo/POC environments** with sample data
- **Multi-region deployments** with consistent data
- **Offline deployments** without embedding API access
- **Cost optimization** for large document sets

### ⚠️ Consider Alternatives

- **Frequently updated content**: Use volume mounts instead
- **User-specific data**: Mount per-user graph files
- **Dynamic indexing**: Let containers index at runtime

## Advanced Usage

### Multiple Graph Files

To include multiple pre-built graphs:

```dockerfile
# Custom Dockerfile
COPY rag_storage/*.graphml /app/data/rag_storage/
```

Update `.dockerignore`:

```
/rag_storage/*
!/rag_storage/*.graphml
```

### Volume Override

Even when the image contains a pre-built graph, you can override it at runtime by mounting a volume over the storage directory:

```bash
# Use custom graph file
docker run -p 9621:9621 \
  -v /path/to/custom/graph:/app/data/rag_storage \
  lightrag:prebuilt
```

### Multi-stage Builds

For CI/CD pipelines:

```dockerfile
# Stage 1: Build graph
FROM lightrag:base AS indexer
COPY documents/ /documents/
RUN python index_documents.py /documents

# Stage 2: Production image with graph
FROM lightrag:base AS production
COPY --from=indexer /app/data/rag_storage/*.graphml /app/data/rag_storage/
```

## Troubleshooting

### Graph Not Loaded

**Symptoms**: Container queries return empty results

**Check**:

```bash
# Verify graph file in image
docker run lightrag:prebuilt ls -lh /app/data/rag_storage/

# Check logs (substitute the name or ID of your running container)
docker logs <container_name_or_id>
```

### Build Fails

**Error**: `COPY failed: file not found`

**Solution**: The Dockerfile expects the graph file, but it does not exist in the build context. Either:

1. Create the graph file before building
2. Remove the COPY instruction for optional builds

### Wrong Graph Loaded

**Issue**: Queries return old data

**Solution**:

```bash
# Rebuild image with new graph
rm rag_storage/graph_chunk_entity_relation.graphml
python rebuild_index.py
docker build --no-cache -f Dockerfile.prebuilt-graph -t lightrag:prebuilt .
```

## Best Practices

1. **Version your graph files**: Tag Docker images with graph versions

   ```bash
   docker build -f Dockerfile.prebuilt-graph -t lightrag:v1.0-graph-20250101 .
   ```

2. **Document graph contents**: Add a metadata file

   ```bash
   echo "Built: 2025-01-01, Documents: 1000, Entities: 5000" > rag_storage/graph_metadata.txt
   ```

3. **Test before deploying**:

   ```bash
   # Validate graph locally
   python -m lightrag.tools.validate_graph rag_storage/graph_chunk_entity_relation.graphml
   ```

4. **Monitor graph size**:

   ```bash
   # Check file size
   du -h rag_storage/graph_chunk_entity_relation.graphml
   ```

## Security Considerations

- **Sensitive Data**: Don't include confidential information in public images
- **Access Control**: Use private registries for graphs with proprietary data
- **Compliance**: Ensure graph data complies with data residency requirements

## Performance Tips

- **Graph Size**: Keep the graph file under 100 MB for faster image pulls
- **Compression**: GraphML compresses well with gzip
- **Caching**: Use Docker layer caching for unchanged graphs

---

**Note**: This feature is optional. LightRAG works without pre-built graphs by indexing at runtime.
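
### Appendix: Sanity-checking a Graph File Before Building

The "Document graph contents" and "Monitor graph size" practices above both depend on knowing what is in the graph file. The following is a minimal sketch of one way to collect those numbers using only the Python standard library. `summarize_graphml` is a hypothetical helper written for this README, not part of LightRAG, and it assumes the file uses the standard GraphML XML namespace.

```python
import os
import tempfile
import xml.etree.ElementTree as ET

# Standard GraphML namespace (assumed to be what the graph file declares).
GRAPHML_NS = "{http://graphml.graphdrawing.org/xmlns}"


def summarize_graphml(path):
    """Return node count, edge count, and size in bytes for a GraphML file.

    Hypothetical helper for illustration only. Raises FileNotFoundError if
    the file is missing and ET.ParseError if it is not well-formed XML.
    """
    nodes = edges = 0
    # iterparse streams the document instead of building the full tree first.
    for _event, elem in ET.iterparse(path, events=("end",)):
        if elem.tag == GRAPHML_NS + "node":
            nodes += 1
        elif elem.tag == GRAPHML_NS + "edge":
            edges += 1
        elem.clear()  # drop text and children of elements already counted
    return {"nodes": nodes, "edges": edges, "bytes": os.path.getsize(path)}


if __name__ == "__main__":
    # Self-contained demo on a tiny sample graph written to a temp directory.
    sample = (
        '<graphml xmlns="http://graphml.graphdrawing.org/xmlns">'
        '<graph edgedefault="undirected">'
        '<node id="a"/><node id="b"/>'
        '<edge source="a" target="b"/>'
        "</graph></graphml>"
    )
    with tempfile.TemporaryDirectory() as tmp:
        sample_path = os.path.join(tmp, "sample.graphml")
        with open(sample_path, "w", encoding="utf-8") as fh:
            fh.write(sample)
        print(summarize_graphml(sample_path))
```

To use it on a real graph, call `summarize_graphml("rag_storage/graph_chunk_entity_relation.graphml")` before running `docker build`. A missing file or a parse error at this point is cheaper to catch than an image that ships an empty or corrupt graph, and the returned counts can be written into the metadata file.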