LightRAG Storage Stack Configurations Report
Executive Summary
LightRAG supports a modular storage architecture with 4 distinct storage types that can be mixed and matched:
- Graph Storage: Knowledge graph relationships
- Vector Storage: Document embeddings
- KV Storage: Key-value pairs and metadata
- Document Status Storage: Document processing status
This report analyzes the 19 storage implementations listed below, spanning 8 storage backends, and recommends configurations for different use cases.
Storage Architecture Overview
Storage Types & Available Implementations
| Storage Type | Implementations | Count |
|---|---|---|
| Graph Storage | NetworkXStorage, Neo4JStorage, PGGraphStorage, AGEStorage¹, MongoGraphStorage¹ | 5 |
| Vector Storage | NanoVectorDBStorage, MilvusVectorDBStorage, ChromaVectorDBStorage, PGVectorStorage, FaissVectorDBStorage, QdrantVectorDBStorage, MongoVectorDBStorage | 7 |
| KV Storage | JsonKVStorage, RedisKVStorage, PGKVStorage, MongoKVStorage | 4 |
| Doc Status Storage | JsonDocStatusStorage, PGDocStatusStorage, MongoDocStatusStorage | 3 |
¹ Currently commented out in production
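Because the four storage types can be mixed freely, a configuration can be checked against the table before deployment. The following is a minimal sketch, not LightRAG's own validation logic: the implementation names come from the table above, the `LIGHTRAG_*` keys are the setting names used in this report's configuration snippets, and `validate_stack` is a hypothetical helper introduced only for illustration.

```python
# Implementation names copied from the table above; the LIGHTRAG_* keys are the
# setting names used in this report's configuration snippets.
STORAGE_IMPLEMENTATIONS = {
    "LIGHTRAG_GRAPH_STORAGE": [
        "NetworkXStorage", "Neo4JStorage", "PGGraphStorage",
        "AGEStorage", "MongoGraphStorage",
    ],
    "LIGHTRAG_VECTOR_STORAGE": [
        "NanoVectorDBStorage", "MilvusVectorDBStorage", "ChromaVectorDBStorage",
        "PGVectorStorage", "FaissVectorDBStorage", "QdrantVectorDBStorage",
        "MongoVectorDBStorage",
    ],
    "LIGHTRAG_KV_STORAGE": [
        "JsonKVStorage", "RedisKVStorage", "PGKVStorage", "MongoKVStorage",
    ],
    "LIGHTRAG_DOC_STATUS_STORAGE": [
        "JsonDocStatusStorage", "PGDocStatusStorage", "MongoDocStatusStorage",
    ],
}


def validate_stack(config):
    """Hypothetical helper: return a list of problems (empty means valid)."""
    problems = []
    for setting, allowed in STORAGE_IMPLEMENTATIONS.items():
        chosen = config.get(setting)
        if chosen is None:
            problems.append(f"{setting} is not set")
        elif chosen not in allowed:
            problems.append(f"{chosen} is not a {setting} implementation")
    return problems


if __name__ == "__main__":
    # A KV implementation placed in the graph slot is reported as a problem,
    # as are the three settings that are missing entirely.
    for problem in validate_stack({"LIGHTRAG_GRAPH_STORAGE": "RedisKVStorage"}):
        print(problem)
```

The sketch does not distinguish the two implementations marked ¹; a stricter check could reject them while they remain commented out.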
Database Technology Analysis
1. PostgreSQL + pgvector
Implementations: PGVectorStorage, PGKVStorage, PGGraphStorage, PGDocStatusStorage
Strengths:
- ✅ Unified Database: Single database for all storage types
- ✅ ACID Compliance: Full transactional support
- ✅ Mature Ecosystem: Well-established, enterprise-ready
- ✅ Minimal Operations: Only one service to deploy, monitor, and back up
- ✅ pgvector Extension: Native vector operations with good performance
- ✅ SQL Familiarity: Easy to query and debug
Weaknesses:
- ❌ Graph Limitations: Graph storage depends on the Apache AGE extension
- ❌ Vector Performance: Good but not specialized vector database performance
- ❌ Single Point of Failure: All data in one database
Configuration:
LIGHTRAG_KV_STORAGE: PGKVStorage
LIGHTRAG_VECTOR_STORAGE: PGVectorStorage
LIGHTRAG_DOC_STATUS_STORAGE: PGDocStatusStorage
LIGHTRAG_GRAPH_STORAGE: PGGraphStorage # Requires AGE extension
2. Neo4j (Graph Specialist)
Implementations: Neo4JStorage
Strengths:
- ✅ Graph Optimization: Purpose-built for graph operations
- ✅ Advanced Graph Analytics: Complex graph algorithms built-in
- ✅ Cypher Query Language: Powerful graph query capabilities
- ✅ Scalability: Excellent for large, complex graphs
- ✅ Visualization: Rich graph visualization tools
Weaknesses:
- ❌ Graph Only: Requires additional databases for vectors/KV
- ❌ Complexity: More complex setup and maintenance
- ❌ Cost: Enterprise features require licensing
- ❌ Memory Usage: Can be memory-intensive
Typical Configuration:
LIGHTRAG_GRAPH_STORAGE: Neo4JStorage
LIGHTRAG_VECTOR_STORAGE: MilvusVectorDBStorage # Or Qdrant
LIGHTRAG_KV_STORAGE: RedisKVStorage
LIGHTRAG_DOC_STATUS_STORAGE: PGDocStatusStorage
3. Milvus (Vector Specialist)
Implementations: MilvusVectorDBStorage
Strengths:
- ✅ Vector Performance: Optimized for high-performance vector search
- ✅ Scalability: Designed for billion-scale vector collections
- ✅ Multiple Indexes: Various indexing algorithms (IVF, HNSW, etc.)
- ✅ GPU Support: CUDA acceleration for vector operations
- ✅ Cloud Native: Kubernetes-ready architecture
Weaknesses:
- ❌ Complexity: Complex distributed architecture
- ❌ Resource Usage: High memory and compute requirements
- ❌ Overkill: May be excessive for smaller datasets
- ❌ Dependencies: Requires etcd and MinIO for full deployment
Typical Configuration:
LIGHTRAG_VECTOR_STORAGE: MilvusVectorDBStorage
LIGHTRAG_GRAPH_STORAGE: Neo4JStorage
LIGHTRAG_KV_STORAGE: RedisKVStorage
LIGHTRAG_DOC_STATUS_STORAGE: MongoDocStatusStorage
4. Qdrant (Vector Specialist)
Implementations: QdrantVectorDBStorage
Strengths:
- ✅ Performance: High-performance vector search with Rust backend
- ✅ Simplicity: Easier deployment than Milvus
- ✅ Filtering: Advanced payload filtering capabilities
- ✅ API: Rich REST and gRPC APIs
- ✅ Memory Efficiency: Lower memory footprint than Milvus
Weaknesses:
- ❌ Ecosystem: Smaller ecosystem compared to alternatives
- ❌ Vector Only: Requires additional databases for other storage types
5. MongoDB (Multi-Purpose)
Implementations: MongoKVStorage, MongoVectorDBStorage, MongoDocStatusStorage
Strengths:
- ✅ Flexibility: Schema-less document storage
- ✅ Vector Search: Native vector search capabilities (Atlas Vector Search)
- ✅ Multi-Purpose: Can handle KV, vectors, and document status
- ✅ Scalability: Horizontal scaling with sharding
- ✅ Developer Friendly: Easy to work with JSON documents
Weaknesses:
- ❌ Graph Limitations: Not optimized for graph operations
- ❌ Vector Performance: Vector search not as optimized as specialists
- ❌ Memory Usage: Can be memory-intensive for large datasets
6. Redis (KV Specialist)
Implementations: RedisKVStorage
Strengths:
- ✅ Speed: In-memory performance for KV operations
- ✅ Simplicity: Simple key-value operations
- ✅ Data Structures: Rich data structures (lists, sets, hashes)
- ✅ Caching: Excellent for caching and session storage
Weaknesses:
- ❌ Memory Bound: Limited by available RAM
- ❌ KV Only: Only suitable for key-value storage
- ❌ Persistence: Data persistence requires configuration
7. Local File Storage
Implementations: NetworkXStorage, JsonKVStorage, JsonDocStatusStorage, NanoVectorDBStorage, FaissVectorDBStorage
Strengths:
- ✅ Simplicity: No external dependencies
- ✅ Development: Perfect for development and testing
- ✅ Portability: Easy to backup and move
- ✅ Cost: No infrastructure costs
Weaknesses:
- ❌ Scalability: Limited by single machine resources
- ❌ Concurrency: No built-in concurrent access
- ❌ Performance: Limited performance for large datasets
- ❌ Reliability: Single point of failure
8. ChromaDB (Vector Specialist)
Implementations: ChromaVectorDBStorage
Strengths:
- ✅ Simplicity: Easy to deploy and use
- ✅ Python Native: Built for Python ML workflows
- ✅ Metadata: Rich metadata filtering capabilities
- ✅ Local/Distributed: Can run locally or distributed
Weaknesses:
- ❌ Performance: Slower than Milvus/Qdrant for large scales
- ❌ Maturity: Newer project with evolving feature set
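The analysis above implies a mapping from each implementation to the service that backs it, which in turn determines how many services a given configuration needs. The sketch below assumes that mapping; `BACKING_SERVICE` and `external_services` are hypothetical names, and file-based implementations are mapped to `None` because they need no external service.

```python
# Mapping derived from the "Implementations" lines of sections 1-8 above.
# None marks file-based implementations that need no external service.
BACKING_SERVICE = {
    "PGVectorStorage": "PostgreSQL",
    "PGKVStorage": "PostgreSQL",
    "PGGraphStorage": "PostgreSQL",
    "PGDocStatusStorage": "PostgreSQL",
    "Neo4JStorage": "Neo4j",
    "MilvusVectorDBStorage": "Milvus",
    "QdrantVectorDBStorage": "Qdrant",
    "MongoKVStorage": "MongoDB",
    "MongoVectorDBStorage": "MongoDB",
    "MongoDocStatusStorage": "MongoDB",
    "RedisKVStorage": "Redis",
    "ChromaVectorDBStorage": "ChromaDB",
    "NetworkXStorage": None,
    "JsonKVStorage": None,
    "JsonDocStatusStorage": None,
    "NanoVectorDBStorage": None,
    "FaissVectorDBStorage": None,
}


def external_services(config):
    """Hypothetical helper: set of external services a configuration needs."""
    services = {BACKING_SERVICE[impl] for impl in config.values()}
    services.discard(None)  # file-based storage adds no service
    return services


if __name__ == "__main__":
    # The "Typical Configuration" from the Neo4j section above.
    neo4j_typical = {
        "LIGHTRAG_GRAPH_STORAGE": "Neo4JStorage",
        "LIGHTRAG_VECTOR_STORAGE": "MilvusVectorDBStorage",
        "LIGHTRAG_KV_STORAGE": "RedisKVStorage",
        "LIGHTRAG_DOC_STATUS_STORAGE": "PGDocStatusStorage",
    }
    print(sorted(external_services(neo4j_typical)))
```

The count understates the real footprint for Milvus, which, as noted above, also requires etcd and MinIO for a full deployment.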
Recommended Stack Configurations
1. 🏆 Production High-Performance Stack
Best for: Large-scale production deployments, complex graph analytics
LIGHTRAG_GRAPH_STORAGE: Neo4JStorage
LIGHTRAG_VECTOR_STORAGE: MilvusVectorDBStorage
LIGHTRAG_KV_STORAGE: RedisKVStorage
LIGHTRAG_DOC_STATUS_STORAGE: PGDocStatusStorage
Services Required:
- Neo4j (Graph operations)
- Milvus + etcd + MinIO (Vector search)
- Redis (KV cache)
- PostgreSQL (Document status)
Pros: Maximum performance, specialized for each data type
Cons: High complexity, resource intensive, expensive
graph LR
LightRAG_App["LightRAG Application"]
Neo4j_Service["Neo4j Service"]
Milvus_Cluster["Milvus Cluster (Milvus, etcd, MinIO)"]
Redis_Service["Redis Service"]
PostgreSQL_Service["PostgreSQL Service"]
LightRAG_App --> |Graph Storage| Neo4j_Service
LightRAG_App --> |Vector Storage| Milvus_Cluster
LightRAG_App --> |KV Storage| Redis_Service
LightRAG_App --> |Doc Status Storage| PostgreSQL_Service
2. 🎯 Production Balanced Stack
Best for: Production deployments prioritizing simplicity
LIGHTRAG_GRAPH_STORAGE: NetworkXStorage
LIGHTRAG_VECTOR_STORAGE: QdrantVectorDBStorage
LIGHTRAG_KV_STORAGE: RedisKVStorage
LIGHTRAG_DOC_STATUS_STORAGE: PGDocStatusStorage
Services Required:
- Qdrant (Vector search)
- Redis (KV cache)
- PostgreSQL (Document status)
- File system (Graph storage)
Pros: Good performance, simpler than full specialist stack
Cons: Graph operations limited by file-based storage
graph LR
subgraph "LightRAG Application Environment"
LightRAG_App["LightRAG Application"]
NetworkX["NetworkX Graph Storage (Local FS)"]
LightRAG_App -.-> NetworkX
end
Qdrant_Service["Qdrant Service"]
Redis_Service["Redis Service"]
PostgreSQL_Service["PostgreSQL Service"]
LightRAG_App --> |Vector Storage| Qdrant_Service
LightRAG_App --> |KV Storage| Redis_Service
LightRAG_App --> |Doc Status Storage| PostgreSQL_Service
3. 💰 Production Minimal Stack
Best for: Budget-conscious production deployments
LIGHTRAG_GRAPH_STORAGE: NetworkXStorage
LIGHTRAG_VECTOR_STORAGE: PGVectorStorage
LIGHTRAG_KV_STORAGE: PGKVStorage
LIGHTRAG_DOC_STATUS_STORAGE: PGDocStatusStorage
Services Required:
- PostgreSQL + pgvector (All storage except graph)
- File system (Graph storage)
Pros: Single database, low cost, good for medium scale
Cons: Not optimized for very large datasets or complex graphs
graph LR
subgraph "LightRAG Application Environment"
LightRAG_App["LightRAG Application"]
NetworkX["NetworkX Graph Storage (Local FS)"]
LightRAG_App -.-> NetworkX
end
PostgreSQL_Service["PostgreSQL Service (+pgvector)"]
LightRAG_App --> |Vector Storage| PostgreSQL_Service
LightRAG_App --> |KV Storage| PostgreSQL_Service
LightRAG_App --> |Doc Status Storage| PostgreSQL_Service
4. 🚀 Development & Testing Stack
Best for: Local development, testing, small deployments
LIGHTRAG_GRAPH_STORAGE: NetworkXStorage
LIGHTRAG_VECTOR_STORAGE: NanoVectorDBStorage
LIGHTRAG_KV_STORAGE: JsonKVStorage
LIGHTRAG_DOC_STATUS_STORAGE: JsonDocStatusStorage
Services Required:
- None (all file-based)
Pros: Zero infrastructure, fast setup, portable
Cons: Limited scalability and performance
graph LR
subgraph "LightRAG Application (Local Process)"
LightRAG_App["LightRAG App"]
NetworkX["NetworkX (File System)"]
NanoVectorDB["NanoVectorDB (File System)"]
JsonKV["JsonKVStorage (File System)"]
JsonDocStatus["JsonDocStatusStorage (File System)"]
LightRAG_App -.-> |Graph| NetworkX
LightRAG_App -.-> |Vector| NanoVectorDB
LightRAG_App -.-> |KV| JsonKV
LightRAG_App -.-> |Doc Status| JsonDocStatus
end
5. 🐳 Docker All-in-One Stack
Best for: Containerized deployments, cloud environments
LIGHTRAG_GRAPH_STORAGE: Neo4JStorage
LIGHTRAG_VECTOR_STORAGE: QdrantVectorDBStorage
LIGHTRAG_KV_STORAGE: RedisKVStorage
LIGHTRAG_DOC_STATUS_STORAGE: MongoDocStatusStorage
Services Required:
- Neo4j (Graph)
- Qdrant (Vector)
- Redis (KV)
- MongoDB (Document status)
Pros: Cloud-native, each service containerized
Cons: More services to manage
graph LR
subgraph "Docker Environment (e.g., Docker Compose)"
LightRAG_Container["LightRAG App (Container)"]
Neo4j_Container["Neo4j (Container)"]
Qdrant_Container["Qdrant (Container)"]
Redis_Container["Redis (Container)"]
MongoDB_Container["MongoDB (Container)"]
end
LightRAG_Container --> |Graph Storage| Neo4j_Container
LightRAG_Container --> |Vector Storage| Qdrant_Container
LightRAG_Container --> |KV Storage| Redis_Container
LightRAG_Container --> |Doc Status Storage| MongoDB_Container
Performance Comparison
Vector Search Performance (Approximate)
| Implementation | Small (1K docs) | Medium (100K docs) | Large (1M+ docs) | Memory Usage |
|---|---|---|---|---|
| MilvusVectorDB | ⭐⭐⭐⭐⭐ | ⭐⭐⭐⭐⭐ | ⭐⭐⭐⭐⭐ | High |
| QdrantVectorDB | ⭐⭐⭐⭐⭐ | ⭐⭐⭐⭐⭐ | ⭐⭐⭐⭐ | Medium |
| PGVectorStorage | ⭐⭐⭐⭐ | ⭐⭐⭐ | ⭐⭐ | Medium |
| ChromaVectorDB | ⭐⭐⭐⭐ | ⭐⭐⭐ | ⭐⭐ | Medium |
| FaissVectorDB | ⭐⭐⭐ | ⭐⭐⭐ | ⭐⭐⭐ | Low |
| NanoVectorDB | ⭐⭐⭐ | ⭐⭐ | ⭐ | Low |
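The star ratings in the vector table can be treated as data to shortlist implementations for a given scale. This is a small sketch under that reading: the numbers are the star counts from the table, while `VECTOR_RATINGS` and `shortlist` are hypothetical names and the ratings remain approximate, as the heading says.

```python
# Star counts copied from the table above: (small, medium, large).
VECTOR_RATINGS = {
    "MilvusVectorDB": (5, 5, 5),
    "QdrantVectorDB": (5, 5, 4),
    "PGVectorStorage": (4, 3, 2),
    "ChromaVectorDB": (4, 3, 2),
    "FaissVectorDB": (3, 3, 3),
    "NanoVectorDB": (3, 2, 1),
}
SCALES = ("small", "medium", "large")


def shortlist(scale, min_stars):
    """Hypothetical helper: implementations rated at least min_stars at scale."""
    column = SCALES.index(scale)
    return [
        name for name, stars in VECTOR_RATINGS.items()
        if stars[column] >= min_stars
    ]


if __name__ == "__main__":
    print(shortlist("large", 4))   # ['MilvusVectorDB', 'QdrantVectorDB']
    print(shortlist("medium", 3))
```

A shortlist like this only narrows the options; memory usage and operational complexity, covered elsewhere in this report, still need to be weighed separately.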
Graph Operations Performance
| Implementation | Node Queries | Edge Traversal | Complex Analytics | Scalability |
|---|---|---|---|---|
| Neo4JStorage | ⭐⭐⭐⭐⭐ | ⭐⭐⭐⭐⭐ | ⭐⭐⭐⭐⭐ | ⭐⭐⭐⭐⭐ |
| PGGraphStorage | ⭐⭐⭐⭐ | ⭐⭐⭐ | ⭐⭐⭐ | ⭐⭐⭐⭐ |
| NetworkXStorage | ⭐⭐⭐ | ⭐⭐⭐ | ⭐⭐ | ⭐⭐ |
KV Operations Performance
| Implementation | Read Speed | Write Speed | Concurrency | Persistence |
|---|---|---|---|---|
| RedisKVStorage | ⭐⭐⭐⭐⭐ | ⭐⭐⭐⭐⭐ | ⭐⭐⭐⭐⭐ | ⭐⭐⭐ |
| PGKVStorage | ⭐⭐⭐⭐ | ⭐⭐⭐⭐ | ⭐⭐⭐⭐ | ⭐⭐⭐⭐⭐ |
| MongoKVStorage | ⭐⭐⭐⭐ | ⭐⭐⭐⭐ | ⭐⭐⭐⭐ | ⭐⭐⭐⭐⭐ |
| JsonKVStorage | ⭐⭐ | ⭐⭐ | ⭐ | ⭐⭐⭐⭐⭐ |
Deployment Considerations
Resource Requirements
| Configuration | CPU | Memory | Storage | Network |
|---|---|---|---|---|
| Development Stack | 2 cores | 4GB | 10GB | Minimal |
| Minimal Stack | 4 cores | 8GB | 50GB | Medium |
| Balanced Stack | 8 cores | 16GB | 100GB | High |
| High-Performance Stack | 16+ cores | 32GB+ | 500GB+ | Very High |
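As a rough illustration of how the table might be used, the sketch below picks the largest stack whose requirements a host satisfies. It assumes the table values are minimums (so "16+ cores" is read as at least 16) and ignores the Network column; `REQUIREMENTS` and `largest_stack_that_fits` are hypothetical names.

```python
# (stack, CPU cores, memory in GB, storage in GB), read as minimums from the
# table above and ordered from smallest to largest.
REQUIREMENTS = [
    ("Development", 2, 4, 10),
    ("Minimal", 4, 8, 50),
    ("Balanced", 8, 16, 100),
    ("High-Performance", 16, 32, 500),
]


def largest_stack_that_fits(cores, memory_gb, storage_gb):
    """Hypothetical helper: largest stack the host can run, or None."""
    best = None
    for name, need_cores, need_memory, need_storage in REQUIREMENTS:
        if (cores >= need_cores and memory_gb >= need_memory
                and storage_gb >= need_storage):
            best = name  # later entries are larger, so keep overwriting
    return best


if __name__ == "__main__":
    # Plenty of CPU and disk, but only 8 GB of memory.
    print(largest_stack_that_fits(cores=16, memory_gb=8, storage_gb=500))
```

In this example the host has enough CPU and disk for the largest stack, but its 8 GB of memory limits it to the Minimal Stack.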
Maintenance Complexity (more stars = more effort)
| Stack Type | Setup Complexity | Operational Overhead | Monitoring | Backup Strategy |
|---|---|---|---|---|
| Development | ⭐ | ⭐ | ⭐ | Simple |
| Minimal | ⭐⭐ | ⭐⭐ | ⭐⭐ | Medium |
| Balanced | ⭐⭐⭐ | ⭐⭐⭐ | ⭐⭐⭐ | Complex |
| High-Performance | ⭐⭐⭐⭐⭐ | ⭐⭐⭐⭐⭐ | ⭐⭐⭐⭐⭐ | Very Complex |
Migration Paths
Development → Production
- Start with Development Stack (all file-based)
- Migrate to Minimal Stack (PostgreSQL-based)
- Scale to Balanced Stack (move vectors to Qdrant and KV to Redis)
- Optimize with High-Performance Stack (full specialization)
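Each step on this path changes only some of the four storage settings, which is what makes gradual migration practical. The sketch below diffs consecutive stacks using the configurations given earlier in this report; `STACKS`, `MIGRATION_PATH`, and `changed_settings` are hypothetical names, and the sketch says nothing about how the data itself is moved.

```python
# Stack configurations copied from "Recommended Stack Configurations".
# Values are ordered as (graph, vector, KV, doc status).
SETTINGS = (
    "LIGHTRAG_GRAPH_STORAGE", "LIGHTRAG_VECTOR_STORAGE",
    "LIGHTRAG_KV_STORAGE", "LIGHTRAG_DOC_STATUS_STORAGE",
)
STACKS = {
    "Development": ("NetworkXStorage", "NanoVectorDBStorage",
                    "JsonKVStorage", "JsonDocStatusStorage"),
    "Minimal": ("NetworkXStorage", "PGVectorStorage",
                "PGKVStorage", "PGDocStatusStorage"),
    "Balanced": ("NetworkXStorage", "QdrantVectorDBStorage",
                 "RedisKVStorage", "PGDocStatusStorage"),
    "High-Performance": ("Neo4JStorage", "MilvusVectorDBStorage",
                         "RedisKVStorage", "PGDocStatusStorage"),
}
MIGRATION_PATH = ["Development", "Minimal", "Balanced", "High-Performance"]


def changed_settings(source, target):
    """Hypothetical helper: settings that differ, as {name: (old, new)}."""
    return {
        setting: (old, new)
        for setting, old, new in zip(SETTINGS, STACKS[source], STACKS[target])
        if old != new
    }


if __name__ == "__main__":
    for source, target in zip(MIGRATION_PATH, MIGRATION_PATH[1:]):
        print(f"{source} -> {target}")
        for setting, (old, new) in changed_settings(source, target).items():
            print(f"  {setting}: {old} -> {new}")
```

Under these configurations the graph storage stays on NetworkXStorage until the final step, so graph data only needs to be migrated once.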
Data Migration Tools
- Database-specific: Use native tools (pg_dump, neo4j-admin, etc.)
- LightRAG native: Built-in export/import capabilities
- Cross-platform: JSON export for universal compatibility
Recommendations by Use Case
📚 Documentation/Knowledge Base
- Small (<10K docs): Development Stack
- Medium (<100K docs): Minimal Stack
- Large (>100K docs): Balanced Stack
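The size thresholds above translate directly into a lookup. The following sketch assumes that a corpus of exactly 100K documents counts as large, since the list leaves that boundary unspecified; `recommend_stack` is a hypothetical helper.

```python
def recommend_stack(doc_count):
    """Hypothetical helper mapping corpus size to the stack suggested above.

    Assumption: exactly 100,000 documents is treated as "Large", because the
    thresholds above do not say which side that boundary falls on.
    """
    if doc_count < 0:
        raise ValueError("doc_count must be non-negative")
    if doc_count < 10_000:
        return "Development Stack"
    if doc_count < 100_000:
        return "Minimal Stack"
    return "Balanced Stack"


if __name__ == "__main__":
    for size in (500, 25_000, 2_000_000):
        print(size, "->", recommend_stack(size))
```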
🔬 Research/Analytics
- Graph-heavy: High-Performance Stack with Neo4j
- Vector-heavy: Balanced Stack, with Milvus in place of Qdrant
- Mixed workload: Balanced Stack
💼 Enterprise
- High Availability: High-Performance Stack with clustering
- Budget Conscious: Minimal Stack with PostgreSQL
- Regulatory: On-premises with full control
🚀 Startups/SMBs
- MVP: Development Stack
- Growing: Minimal Stack
- Scaling: Balanced Stack
Conclusion
The Minimal Stack (PostgreSQL + NetworkX) provides the best balance of performance, complexity, and cost for most use cases. It offers:
- ✅ Production-ready reliability
- ✅ Reasonable performance for medium-scale deployments
- ✅ Low operational overhead
- ✅ Clear upgrade path to specialized components
For specialized needs:
- High graph complexity → Add Neo4j
- High vector performance → Add Qdrant/Milvus
- High concurrency KV → Add Redis
The modular architecture allows gradual optimization based on actual performance bottlenecks rather than premature optimization.
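The incremental approach can be sketched as starting from the Minimal Stack and swapping in one specialized component per observed bottleneck. In this sketch, `MINIMAL_STACK` is the configuration given earlier in this report, while `UPGRADES` and `apply_upgrades` are hypothetical names; Qdrant is chosen for the vector upgrade as an assumption, since the list above allows either Qdrant or Milvus.

```python
# Minimal Stack configuration from "Recommended Stack Configurations".
MINIMAL_STACK = {
    "LIGHTRAG_GRAPH_STORAGE": "NetworkXStorage",
    "LIGHTRAG_VECTOR_STORAGE": "PGVectorStorage",
    "LIGHTRAG_KV_STORAGE": "PGKVStorage",
    "LIGHTRAG_DOC_STATUS_STORAGE": "PGDocStatusStorage",
}

# Bottleneck -> (setting, replacement), following the list above.
# Assumption: Qdrant is picked for vectors; Milvus is the other option named.
UPGRADES = {
    "graph": ("LIGHTRAG_GRAPH_STORAGE", "Neo4JStorage"),
    "vector": ("LIGHTRAG_VECTOR_STORAGE", "QdrantVectorDBStorage"),
    "kv": ("LIGHTRAG_KV_STORAGE", "RedisKVStorage"),
}


def apply_upgrades(base, bottlenecks):
    """Hypothetical helper: copy of base with the named bottlenecks upgraded."""
    config = dict(base)  # leave the base configuration untouched
    for bottleneck in bottlenecks:
        setting, implementation = UPGRADES[bottleneck]
        config[setting] = implementation
    return config


if __name__ == "__main__":
    # Only KV concurrency has been measured as a problem, so only KV changes.
    for setting, implementation in apply_upgrades(MINIMAL_STACK, ["kv"]).items():
        print(f"{setting}: {implementation}")
```

Applying the `vector` and `kv` upgrades together yields the Balanced Stack configuration described earlier.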
Report based on an analysis of the LightRAG v1.3.7 implementation.