LightRAG Storage Stack Configurations Report
Executive Summary
LightRAG supports a modular storage architecture with 4 distinct storage types that can be mixed and matched:
- Graph Storage: Knowledge graph relationships
- Vector Storage: Document embeddings
- KV Storage: Key-value pairs and metadata
- Document Status Storage: Document processing status
This report analyzes the 19 storage implementations listed in the table below, spanning 8 database technologies, and recommends configurations for different use cases.
Storage Architecture Overview
Storage Types & Available Implementations
| Storage Type | Implementations | Count |
|---|---|---|
| Graph Storage | NetworkXStorage, Neo4JStorage, PGGraphStorage, AGEStorage¹, MongoGraphStorage¹ | 5 |
| Vector Storage | NanoVectorDBStorage, MilvusVectorDBStorage, ChromaVectorDBStorage, PGVectorStorage, FaissVectorDBStorage, QdrantVectorDBStorage, MongoVectorDBStorage | 7 |
| KV Storage | JsonKVStorage, RedisKVStorage, PGKVStorage, MongoKVStorage | 4 |
| Doc Status Storage | JsonDocStatusStorage, PGDocStatusStorage, MongoDocStatusStorage | 3 |
¹ Currently commented out in production
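Because the four storage slots can be mixed freely, it helps to check a chosen combination against the table before deploying. The following is a minimal sketch, not part of LightRAG itself: `validate_stack`, `AVAILABLE`, and `DISABLED` are hypothetical names, and the keys assume the `LIGHTRAG_*_STORAGE` settings used in this report's configuration snippets.

```python
# Implementation names are copied from the table above. The keys follow the
# LIGHTRAG_*_STORAGE settings used in this report's configuration snippets.
AVAILABLE = {
    "LIGHTRAG_GRAPH_STORAGE": {"NetworkXStorage", "Neo4JStorage", "PGGraphStorage"},
    "LIGHTRAG_VECTOR_STORAGE": {
        "NanoVectorDBStorage", "MilvusVectorDBStorage", "ChromaVectorDBStorage",
        "PGVectorStorage", "FaissVectorDBStorage", "QdrantVectorDBStorage",
        "MongoVectorDBStorage",
    },
    "LIGHTRAG_KV_STORAGE": {
        "JsonKVStorage", "RedisKVStorage", "PGKVStorage", "MongoKVStorage",
    },
    "LIGHTRAG_DOC_STATUS_STORAGE": {
        "JsonDocStatusStorage", "PGDocStatusStorage", "MongoDocStatusStorage",
    },
}
# Footnoted entries: listed in the table but marked as commented out.
DISABLED = {"AGEStorage", "MongoGraphStorage"}


def validate_stack(config):
    """Return a list of problems; an empty list means every slot is valid."""
    problems = []
    for key, allowed in AVAILABLE.items():
        value = config.get(key)
        if value is None:
            problems.append(f"{key} is not set")
        elif value in DISABLED:
            problems.append(f"{key}={value} is listed but currently disabled")
        elif value not in allowed:
            problems.append(f"{key}={value} is not a known implementation")
    return problems
```

Returning a list of problems rather than raising on the first one lets a deployment script report every misconfigured slot at once.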
Database Technology Analysis
1. PostgreSQL + pgvector
Implementations: PGVectorStorage, PGKVStorage, PGGraphStorage, PGDocStatusStorage
Strengths:
- ✅ Unified Database: Single database for all storage types
- ✅ ACID Compliance: Full transactional support
- ✅ Mature Ecosystem: Well-established, enterprise-ready
- ✅ Minimal Operations: Only one database to deploy and maintain
- ✅ pgvector Extension: Native vector operations with good performance
- ✅ SQL Familiarity: Easy to query and debug
Weaknesses:
- ❌ Graph Limitations: Requires AGE extension for advanced graph operations
- ❌ Vector Performance: Good, but below that of specialized vector databases
- ❌ Single Point of Failure: All data in one database
Configuration:
LIGHTRAG_KV_STORAGE: PGKVStorage
LIGHTRAG_VECTOR_STORAGE: PGVectorStorage
LIGHTRAG_DOC_STATUS_STORAGE: PGDocStatusStorage
LIGHTRAG_GRAPH_STORAGE: PGGraphStorage # Requires AGE extension
2. Neo4j (Graph Specialist)
Implementations: Neo4JStorage
Strengths:
- ✅ Graph Optimization: Purpose-built for graph operations
- ✅ Advanced Graph Analytics: Complex graph algorithms built-in
- ✅ Cypher Query Language: Powerful graph query capabilities
- ✅ Scalability: Excellent for large, complex graphs
- ✅ Visualization: Rich graph visualization tools
Weaknesses:
- ❌ Graph Only: Requires additional databases for vectors/KV
- ❌ Complexity: More complex setup and maintenance
- ❌ Cost: Enterprise features require licensing
- ❌ Memory Usage: Can be memory-intensive
Typical Configuration:
LIGHTRAG_GRAPH_STORAGE: Neo4JStorage
LIGHTRAG_VECTOR_STORAGE: MilvusVectorDBStorage # Or Qdrant
LIGHTRAG_KV_STORAGE: RedisKVStorage
LIGHTRAG_DOC_STATUS_STORAGE: PGDocStatusStorage
3. Milvus (Vector Specialist)
Implementations: MilvusVectorDBStorage
Strengths:
- ✅ Vector Performance: Optimized for high-performance vector search
- ✅ Scalability: Designed for billion-scale vector collections
- ✅ Multiple Indexes: Various indexing algorithms (IVF, HNSW, etc.)
- ✅ GPU Support: CUDA acceleration for vector operations
- ✅ Cloud Native: Kubernetes-ready architecture
Weaknesses:
- ❌ Complexity: Complex distributed architecture
- ❌ Resource Usage: High memory and compute requirements
- ❌ Overkill: May be excessive for smaller datasets
- ❌ Dependencies: Requires etcd and MinIO for full deployment
Typical Configuration:
LIGHTRAG_VECTOR_STORAGE: MilvusVectorDBStorage
LIGHTRAG_GRAPH_STORAGE: Neo4JStorage
LIGHTRAG_KV_STORAGE: RedisKVStorage
LIGHTRAG_DOC_STATUS_STORAGE: MongoDocStatusStorage
4. Qdrant (Vector Specialist)
Implementations: QdrantVectorDBStorage
Strengths:
- ✅ Performance: High-performance vector search with Rust backend
- ✅ Simplicity: Easier deployment than Milvus
- ✅ Filtering: Advanced payload filtering capabilities
- ✅ API: Rich REST and gRPC APIs
- ✅ Memory Efficiency: Lower memory footprint than Milvus
Weaknesses:
- ❌ Ecosystem: Smaller ecosystem compared to alternatives
- ❌ Vector Only: Requires additional databases for other storage types
5. MongoDB (Multi-Purpose)
Implementations: MongoKVStorage, MongoVectorDBStorage, MongoDocStatusStorage
Strengths:
- ✅ Flexibility: Schema-less document storage
- ✅ Vector Search: Native vector search capabilities (Atlas Vector Search)
- ✅ Multi-Purpose: Can handle KV, vectors, and document status
- ✅ Scalability: Horizontal scaling with sharding
- ✅ Developer Friendly: Easy to work with JSON documents
Weaknesses:
- ❌ Graph Limitations: Not optimized for graph operations
- ❌ Vector Performance: Vector search is less optimized than in specialized vector databases
- ❌ Memory Usage: Can be memory-intensive for large datasets
6. Redis (KV Specialist)
Implementations: RedisKVStorage
Strengths:
- ✅ Speed: In-memory performance for KV operations
- ✅ Simplicity: Simple key-value operations
- ✅ Data Structures: Rich data structures (lists, sets, hashes)
- ✅ Caching: Excellent for caching and session storage
Weaknesses:
- ❌ Memory Bound: Limited by available RAM
- ❌ KV Only: Only suitable for key-value storage
- ❌ Persistence: Data persistence requires configuration
7. Local File Storage
Implementations: NetworkXStorage, JsonKVStorage, JsonDocStatusStorage, NanoVectorDBStorage, FaissVectorDBStorage
Strengths:
- ✅ Simplicity: No external dependencies
- ✅ Development: Perfect for development and testing
- ✅ Portability: Easy to backup and move
- ✅ Cost: No infrastructure costs
Weaknesses:
- ❌ Scalability: Limited by single machine resources
- ❌ Concurrency: No built-in concurrent access
- ❌ Performance: Limited performance for large datasets
- ❌ Reliability: Single point of failure
8. ChromaDB (Vector Specialist)
Implementations: ChromaVectorDBStorage
Strengths:
- ✅ Simplicity: Easy to deploy and use
- ✅ Python Native: Built for Python ML workflows
- ✅ Metadata: Rich metadata filtering capabilities
- ✅ Local/Distributed: Can run locally or distributed
Weaknesses:
- ❌ Performance: Slower than Milvus/Qdrant at large scale
- ❌ Maturity: Newer project with evolving feature set
Recommended Stack Configurations
1. 🏆 Production High-Performance Stack
Best for: Large-scale production deployments, complex graph analytics
LIGHTRAG_GRAPH_STORAGE: Neo4JStorage
LIGHTRAG_VECTOR_STORAGE: MilvusVectorDBStorage
LIGHTRAG_KV_STORAGE: RedisKVStorage
LIGHTRAG_DOC_STATUS_STORAGE: PGDocStatusStorage
Services Required:
- Neo4j (Graph operations)
- Milvus + etcd + MinIO (Vector search)
- Redis (KV cache)
- PostgreSQL (Document status)
Pros: Maximum performance, specialized for each data type
Cons: High complexity, resource intensive, expensive
graph LR
LightRAG_App["LightRAG Application"]
Neo4j_Service["Neo4j Service"]
Milvus_Cluster["Milvus Cluster (Milvus, etcd, MinIO)"]
Redis_Service["Redis Service"]
PostgreSQL_Service["PostgreSQL Service"]
LightRAG_App --> |Graph Storage| Neo4j_Service
LightRAG_App --> |Vector Storage| Milvus_Cluster
LightRAG_App --> |KV Storage| Redis_Service
LightRAG_App --> |Doc Status Storage| PostgreSQL_Service
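The "Services Required" list for a stack follows mechanically from the four storage settings. The sketch below derives it in code; `SERVICES`, `services_for`, and `required_services` are hypothetical helper names, and the implementation-to-service mapping is taken from the technology sections of this report rather than from LightRAG's code.

```python
# Backing services per implementation, as described in this report.
SERVICES = {
    "Neo4JStorage": {"Neo4j"},
    "MilvusVectorDBStorage": {"Milvus", "etcd", "MinIO"},
    "QdrantVectorDBStorage": {"Qdrant"},
    "ChromaVectorDBStorage": {"ChromaDB"},
    "RedisKVStorage": {"Redis"},
}


def services_for(implementation):
    """Return the external services one storage implementation needs."""
    if implementation in SERVICES:
        return SERVICES[implementation]
    if implementation.startswith("PG"):
        return {"PostgreSQL"}
    if implementation.startswith("Mongo"):
        return {"MongoDB"}
    # NetworkX, Json*, NanoVectorDB, and Faiss are file-based. Unknown names
    # also land here, which is an assumption of this sketch.
    return set()


def required_services(config):
    """Union of the services needed by every storage slot in the config."""
    needed = set()
    for implementation in config.values():
        needed |= services_for(implementation)
    return needed
```

Applied to the High-Performance configuration above, this yields Neo4j, Milvus with etcd and MinIO, Redis, and PostgreSQL, matching the list given for that stack.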
2. 🎯 Production Balanced Stack
Best for: Production deployments balancing performance against operational simplicity
LIGHTRAG_GRAPH_STORAGE: NetworkXStorage
LIGHTRAG_VECTOR_STORAGE: QdrantVectorDBStorage
LIGHTRAG_KV_STORAGE: RedisKVStorage
LIGHTRAG_DOC_STATUS_STORAGE: PGDocStatusStorage
Services Required:
- Qdrant (Vector search)
- Redis (KV cache)
- PostgreSQL (Document status)
- File system (Graph storage)
Pros: Good performance, simpler than full specialist stack
Cons: Graph operations limited by file-based storage
graph LR
subgraph "LightRAG Application Environment"
LightRAG_App["LightRAG Application"]
NetworkX["NetworkX Graph Storage (Local FS)"]
LightRAG_App -.-> NetworkX
end
Qdrant_Service["Qdrant Service"]
Redis_Service["Redis Service"]
PostgreSQL_Service["PostgreSQL Service"]
LightRAG_App --> |Vector Storage| Qdrant_Service
LightRAG_App --> |KV Storage| Redis_Service
LightRAG_App --> |Doc Status Storage| PostgreSQL_Service
3. 💰 Production Minimal Stack
Best for: Budget-conscious production deployments
LIGHTRAG_GRAPH_STORAGE: NetworkXStorage
LIGHTRAG_VECTOR_STORAGE: PGVectorStorage
LIGHTRAG_KV_STORAGE: PGKVStorage
LIGHTRAG_DOC_STATUS_STORAGE: PGDocStatusStorage
Services Required:
- PostgreSQL + pgvector (All storage except graph)
- File system (Graph storage)
Pros: Single database, low cost, good for medium scale
Cons: Not optimized for very large datasets or complex graphs
graph LR
subgraph "LightRAG Application Environment"
LightRAG_App["LightRAG Application"]
NetworkX["NetworkX Graph Storage (Local FS)"]
LightRAG_App -.-> NetworkX
end
PostgreSQL_Service["PostgreSQL Service (+pgvector)"]
LightRAG_App --> |Vector Storage| PostgreSQL_Service
LightRAG_App --> |KV Storage| PostgreSQL_Service
LightRAG_App --> |Doc Status Storage| PostgreSQL_Service
4. 🚀 Development & Testing Stack
Best for: Local development, testing, small deployments
LIGHTRAG_GRAPH_STORAGE: NetworkXStorage
LIGHTRAG_VECTOR_STORAGE: NanoVectorDBStorage
LIGHTRAG_KV_STORAGE: JsonKVStorage
LIGHTRAG_DOC_STATUS_STORAGE: JsonDocStatusStorage
Services Required:
- None (all file-based)
Pros: Zero infrastructure, fast setup, portable
Cons: Limited scalability and performance
graph LR
subgraph "LightRAG Application (Local Process)"
LightRAG_App["LightRAG App"]
NetworkX["NetworkX (File System)"]
NanoVectorDB["NanoVectorDB (File System)"]
JsonKV["JsonKVStorage (File System)"]
JsonDocStatus["JsonDocStatusStorage (File System)"]
LightRAG_App -.-> |Graph| NetworkX
LightRAG_App -.-> |Vector| NanoVectorDB
LightRAG_App -.-> |KV| JsonKV
LightRAG_App -.-> |Doc Status| JsonDocStatus
end
5. 🐳 Docker All-in-One Stack
Best for: Containerized deployments, cloud environments
LIGHTRAG_GRAPH_STORAGE: Neo4JStorage
LIGHTRAG_VECTOR_STORAGE: QdrantVectorDBStorage
LIGHTRAG_KV_STORAGE: RedisKVStorage
LIGHTRAG_DOC_STATUS_STORAGE: MongoDocStatusStorage
Services Required:
- Neo4j (Graph)
- Qdrant (Vector)
- Redis (KV)
- MongoDB (Document status)
Pros: Cloud-native, each service containerized
Cons: More services to manage
graph LR
subgraph "Docker Environment (e.g., Docker Compose)"
LightRAG_Container["LightRAG App (Container)"]
Neo4j_Container["Neo4j (Container)"]
Qdrant_Container["Qdrant (Container)"]
Redis_Container["Redis (Container)"]
MongoDB_Container["MongoDB (Container)"]
end
LightRAG_Container --> |Graph Storage| Neo4j_Container
LightRAG_Container --> |Vector Storage| Qdrant_Container
LightRAG_Container --> |KV Storage| Redis_Container
LightRAG_Container --> |Doc Status Storage| MongoDB_Container
Performance Comparison
Vector Search Performance (Approximate)
| Implementation | Small (1K docs) | Medium (100K docs) | Large (1M+ docs) | Memory Usage |
|---|---|---|---|---|
| MilvusVectorDB | ⭐⭐⭐⭐⭐ | ⭐⭐⭐⭐⭐ | ⭐⭐⭐⭐⭐ | High |
| QdrantVectorDB | ⭐⭐⭐⭐⭐ | ⭐⭐⭐⭐⭐ | ⭐⭐⭐⭐ | Medium |
| PGVectorStorage | ⭐⭐⭐⭐ | ⭐⭐⭐ | ⭐⭐ | Medium |
| ChromaVectorDB | ⭐⭐⭐⭐ | ⭐⭐⭐ | ⭐⭐ | Medium |
| FaissVectorDB | ⭐⭐⭐ | ⭐⭐⭐ | ⭐⭐⭐ | Low |
| NanoVectorDB | ⭐⭐⭐ | ⭐⭐ | ⭐ | Low |
Graph Operations Performance
| Implementation | Node Queries | Edge Traversal | Complex Analytics | Scalability |
|---|---|---|---|---|
| Neo4JStorage | ⭐⭐⭐⭐⭐ | ⭐⭐⭐⭐⭐ | ⭐⭐⭐⭐⭐ | ⭐⭐⭐⭐⭐ |
| PGGraphStorage | ⭐⭐⭐⭐ | ⭐⭐⭐ | ⭐⭐⭐ | ⭐⭐⭐⭐ |
| NetworkXStorage | ⭐⭐⭐ | ⭐⭐⭐ | ⭐⭐ | ⭐⭐ |
KV Operations Performance
| Implementation | Read Speed | Write Speed | Concurrency | Persistence |
|---|---|---|---|---|
| RedisKVStorage | ⭐⭐⭐⭐⭐ | ⭐⭐⭐⭐⭐ | ⭐⭐⭐⭐⭐ | ⭐⭐⭐ |
| PGKVStorage | ⭐⭐⭐⭐ | ⭐⭐⭐⭐ | ⭐⭐⭐⭐ | ⭐⭐⭐⭐⭐ |
| MongoKVStorage | ⭐⭐⭐⭐ | ⭐⭐⭐⭐ | ⭐⭐⭐⭐ | ⭐⭐⭐⭐⭐ |
| JsonKVStorage | ⭐⭐ | ⭐⭐ | ⭐ | ⭐⭐⭐⭐⭐ |
Deployment Considerations
Resource Requirements
| Configuration | CPU | Memory | Storage | Network |
|---|---|---|---|---|
| Development Stack | 2 cores | 4GB | 10GB | Minimal |
| Minimal Stack | 4 cores | 8GB | 50GB | Medium |
| Balanced Stack | 8 cores | 16GB | 100GB | High |
| High-Performance Stack | 16+ cores | 32GB+ | 500GB+ | Very High |
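The resource table can be turned into a quick capacity check. This is an illustrative sketch only: `MINIMUMS` and `stacks_that_fit` are hypothetical names, the figures are the table's minimums (treating "16+", "32GB+", and "500GB+" as lower bounds), and the Network column is left out because it is qualitative.

```python
# (stack name, CPU cores, memory in GB, storage in GB) from the table above.
MINIMUMS = [
    ("Development", 2, 4, 10),
    ("Minimal", 4, 8, 50),
    ("Balanced", 8, 16, 100),
    ("High-Performance", 16, 32, 500),
]


def stacks_that_fit(cores, memory_gb, storage_gb):
    """Return the stacks whose minimum requirements the given host meets."""
    return [
        name
        for name, min_cores, min_memory, min_storage in MINIMUMS
        if cores >= min_cores and memory_gb >= min_memory and storage_gb >= min_storage
    ]
```

For example, `stacks_that_fit(8, 16, 100)` returns the Development, Minimal, and Balanced stacks but not High-Performance.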
Maintenance Complexity
| Stack Type | Setup Complexity | Operational Overhead | Monitoring | Backup Strategy |
|---|---|---|---|---|
| Development | ⭐ | ⭐ | ⭐ | Simple |
| Minimal | ⭐⭐ | ⭐⭐ | ⭐⭐ | Medium |
| Balanced | ⭐⭐⭐ | ⭐⭐⭐ | ⭐⭐⭐ | Complex |
| High-Performance | ⭐⭐⭐⭐⭐ | ⭐⭐⭐⭐⭐ | ⭐⭐⭐⭐⭐ | Very Complex |
Migration Paths
Development → Production
- Start with Development Stack (all file-based)
- Migrate to Minimal Stack (PostgreSQL-based)
- Scale to Balanced Stack (add specialized vector DB)
- Optimize with High-Performance Stack (full specialization)
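Each step on this path changes only some of the four storage settings, which determines what data has to be migrated. The sketch below computes those differences from the stack configurations in this report; `STACKS`, `next_stack`, and `changes_between` are hypothetical names introduced for illustration.

```python
KEYS = (
    "LIGHTRAG_GRAPH_STORAGE",
    "LIGHTRAG_VECTOR_STORAGE",
    "LIGHTRAG_KV_STORAGE",
    "LIGHTRAG_DOC_STATUS_STORAGE",
)
# Values are in KEYS order and are copied from the recommended stacks above.
STACKS = {
    "Development": ("NetworkXStorage", "NanoVectorDBStorage",
                    "JsonKVStorage", "JsonDocStatusStorage"),
    "Minimal": ("NetworkXStorage", "PGVectorStorage",
                "PGKVStorage", "PGDocStatusStorage"),
    "Balanced": ("NetworkXStorage", "QdrantVectorDBStorage",
                 "RedisKVStorage", "PGDocStatusStorage"),
    "High-Performance": ("Neo4JStorage", "MilvusVectorDBStorage",
                         "RedisKVStorage", "PGDocStatusStorage"),
}
PATH = list(STACKS)  # dicts preserve insertion order, so this is the path


def next_stack(current):
    """Return the next stack on the migration path, or None at the end."""
    index = PATH.index(current)
    return PATH[index + 1] if index + 1 < len(PATH) else None


def changes_between(current, target):
    """Map each setting that differs to its (old, new) implementation pair."""
    return {
        key: (old, new)
        for key, old, new in zip(KEYS, STACKS[current], STACKS[target])
        if old != new
    }
```

Moving from Minimal to Balanced, for instance, touches only the vector and KV settings, so graph and document status data can stay where they are.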
Data Migration Tools
- Database-specific: Use native tools (pg_dump, neo4j-admin, etc.)
- LightRAG native: Built-in export/import capabilities
- Cross-platform: JSON export for universal compatibility
Recommendations by Use Case
📚 Documentation/Knowledge Base
- Small (<10K docs): Development Stack
- Medium (10K–100K docs): Minimal Stack
- Large (>100K docs): Balanced Stack
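These size thresholds can be expressed as a small selection function. It is a sketch with a hypothetical name, `recommend_stack`; the report does not say which tier a corpus of exactly 100K documents falls into, so placing it in the Minimal tier is an assumption.

```python
def recommend_stack(doc_count):
    """Pick a stack for a documentation/knowledge-base corpus by size."""
    if doc_count < 0:
        raise ValueError("doc_count must be non-negative")
    if doc_count < 10_000:
        return "Development Stack"
    # Assumption: exactly 100K documents is treated as the Minimal tier.
    if doc_count <= 100_000:
        return "Minimal Stack"
    return "Balanced Stack"
```

Document count is only a rough proxy; query load and graph complexity can justify moving up a tier earlier.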
🔬 Research/Analytics
- Graph-heavy: High-Performance Stack with Neo4j
- Vector-heavy: Balanced Stack, with Milvus in place of Qdrant for very large collections
- Mixed workload: Balanced Stack
💼 Enterprise
- High Availability: High-Performance Stack with clustering
- Budget Conscious: Minimal Stack with PostgreSQL
- Regulatory: On-premises with full control
🚀 Startups/SMBs
- MVP: Development Stack
- Growing: Minimal Stack
- Scaling: Balanced Stack
Conclusion
The Minimal Stack (PostgreSQL + NetworkX) provides the best balance of performance, complexity, and cost for most use cases. It offers:
- ✅ Production-ready reliability
- ✅ Reasonable performance for medium-scale deployments
- ✅ Low operational overhead
- ✅ Clear upgrade path to specialized components
For specialized needs:
- High graph complexity → Add Neo4j
- High vector performance → Add Qdrant/Milvus
- High concurrency KV → Add Redis
The modular architecture allows gradual optimization based on actual performance bottlenecks rather than premature optimization.
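One way to picture this gradual optimization is as single-setting overrides applied to the Minimal Stack. The sketch below uses hypothetical names (`MINIMAL`, `UPGRADES`, `upgrade`), and choosing Qdrant rather than Milvus for the vector upgrade is an arbitrary default, not a recommendation from the analysis.

```python
MINIMAL = {
    "LIGHTRAG_GRAPH_STORAGE": "NetworkXStorage",
    "LIGHTRAG_VECTOR_STORAGE": "PGVectorStorage",
    "LIGHTRAG_KV_STORAGE": "PGKVStorage",
    "LIGHTRAG_DOC_STATUS_STORAGE": "PGDocStatusStorage",
}
# Observed bottleneck -> (setting to change, specialized implementation).
UPGRADES = {
    "graph": ("LIGHTRAG_GRAPH_STORAGE", "Neo4JStorage"),
    "vector": ("LIGHTRAG_VECTOR_STORAGE", "QdrantVectorDBStorage"),
    "kv": ("LIGHTRAG_KV_STORAGE", "RedisKVStorage"),
}


def upgrade(config, bottleneck):
    """Return a copy of config with one slot swapped for a specialist."""
    key, implementation = UPGRADES[bottleneck]
    return {**config, key: implementation}
```

Because `upgrade` returns a new dictionary, the baseline stays intact and upgrades can be chained, for example `upgrade(upgrade(MINIMAL, "vector"), "kv")`.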
Report generated based on LightRAG v1.3.7 implementation analysis