Taddeus 4d9342c8e1 Cleans up documentation and deployment scripts for consistency

Removes trailing whitespace and fixes minor formatting issues in Kubernetes deployment docs, storage report, and Helm chart files.

Standardizes indentation and spacing in Docker Compose and deployment shell scripts to improve readability and maintainability.

These edits improve documentation clarity and make deployment scripts more robust without altering functionality.

Relates to MLO-469

2025-11-03 14:23:16 +02:00


LightRAG Storage Stack Configurations Report

Executive Summary

LightRAG supports a modular storage architecture with 4 distinct storage types that can be mixed and matched:

Graph Storage: Knowledge graph relationships
Vector Storage: Document embeddings
KV Storage: Key-value pairs and metadata
Document Status Storage: Document processing status

This report analyzes the 19 storage implementations listed below, spanning 8 database technologies, and recommends configurations for different use cases.

Storage Architecture Overview

Storage Types & Available Implementations

Storage Type	Implementations	Count
Graph Storage	NetworkXStorage, Neo4JStorage, PGGraphStorage, AGEStorage¹, MongoGraphStorage¹	5
Vector Storage	NanoVectorDBStorage, MilvusVectorDBStorage, ChromaVectorDBStorage, PGVectorStorage, FaissVectorDBStorage, QdrantVectorDBStorage, MongoVectorDBStorage	7
KV Storage	JsonKVStorage, RedisKVStorage, PGKVStorage, MongoKVStorage	4
Doc Status Storage	JsonDocStatusStorage, PGDocStatusStorage, MongoDocStatusStorage	3

¹ Currently commented out in production
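
The table above can be captured as a small lookup that rejects mismatched pairings, such as a KV class assigned to the graph slot. This is a minimal sketch: `IMPLEMENTATIONS` and `validate_stack` are illustrative names and not part of LightRAG; only the class names are taken from the table.

```python
# Illustrative registry built from the table above; not LightRAG's own code.
IMPLEMENTATIONS = {
    "graph": {"NetworkXStorage", "Neo4JStorage", "PGGraphStorage",
              "AGEStorage", "MongoGraphStorage"},
    "vector": {"NanoVectorDBStorage", "MilvusVectorDBStorage",
               "ChromaVectorDBStorage", "PGVectorStorage",
               "FaissVectorDBStorage", "QdrantVectorDBStorage",
               "MongoVectorDBStorage"},
    "kv": {"JsonKVStorage", "RedisKVStorage", "PGKVStorage", "MongoKVStorage"},
    "doc_status": {"JsonDocStatusStorage", "PGDocStatusStorage",
                   "MongoDocStatusStorage"},
}


def validate_stack(stack: dict) -> list:
    """Return a list of problems; an empty list means the stack is valid."""
    problems = []
    for storage_type, allowed in IMPLEMENTATIONS.items():
        chosen = stack.get(storage_type)
        if chosen is None:
            problems.append(f"missing {storage_type} storage")
        elif chosen not in allowed:
            problems.append(f"{chosen} is not a {storage_type} storage")
    return problems
```

Calling `validate_stack` with one entry per storage type returns an empty list when every class sits in the correct slot.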

Database Technology Analysis

1. PostgreSQL + pgvector

Implementations: PGVectorStorage, PGKVStorage, PGGraphStorage, PGDocStatusStorage

Strengths:

✅ Unified Database: Single database for all storage types
✅ ACID Compliance: Full transactional support
✅ Mature Ecosystem: Well-established, enterprise-ready
✅ Minimal Operations: Only one service to deploy, monitor, and back up
✅ pgvector Extension: Native vector operations with good performance
✅ SQL Familiarity: Easy to query and debug

Weaknesses:

❌ Graph Limitations: Requires AGE extension for advanced graph operations
❌ Vector Performance: Good but not specialized vector database performance
❌ Single Point of Failure: All data in one database

Configuration:

LIGHTRAG_KV_STORAGE: PGKVStorage
LIGHTRAG_VECTOR_STORAGE: PGVectorStorage
LIGHTRAG_DOC_STATUS_STORAGE: PGDocStatusStorage
LIGHTRAG_GRAPH_STORAGE: PGGraphStorage  # Requires AGE extension
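
If the four settings above are supplied as environment variables, a deployment script could read and check them along these lines. This is a sketch only: the variable names come from the configuration above, while `load_storage_config`, `uses_single_postgres`, and the file-based defaults are assumptions made for illustration, not documented LightRAG behaviour.

```python
import os

# Variable names are taken from the configuration above. The defaults are an
# assumption for this sketch (the file-based development stack).
DEFAULTS = {
    "LIGHTRAG_KV_STORAGE": "JsonKVStorage",
    "LIGHTRAG_VECTOR_STORAGE": "NanoVectorDBStorage",
    "LIGHTRAG_DOC_STATUS_STORAGE": "JsonDocStatusStorage",
    "LIGHTRAG_GRAPH_STORAGE": "NetworkXStorage",
}


def load_storage_config(env=None):
    """Read the four storage settings, falling back to the sketch defaults."""
    env = os.environ if env is None else env
    return {name: env.get(name, default) for name, default in DEFAULTS.items()}


def uses_single_postgres(config):
    """True when every storage type is served by a PG* implementation."""
    return all(value.startswith("PG") for value in config.values())
```

Passing a plain dictionary instead of `os.environ` keeps the helper easy to test without touching the real environment.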

2. Neo4j (Graph Specialist)

Implementations: Neo4JStorage

Strengths:

✅ Graph Optimization: Purpose-built for graph operations
✅ Advanced Graph Analytics: Complex graph algorithms built-in
✅ Cypher Query Language: Powerful graph query capabilities
✅ Scalability: Excellent for large, complex graphs
✅ Visualization: Rich graph visualization tools

Weaknesses:

❌ Graph Only: Requires additional databases for vectors/KV
❌ Complexity: More complex setup and maintenance
❌ Cost: Enterprise features require licensing
❌ Memory Usage: Can be memory-intensive

Typical Configuration:

LIGHTRAG_GRAPH_STORAGE: Neo4JStorage
LIGHTRAG_VECTOR_STORAGE: MilvusVectorDBStorage  # Or Qdrant
LIGHTRAG_KV_STORAGE: RedisKVStorage
LIGHTRAG_DOC_STATUS_STORAGE: PGDocStatusStorage

3. Milvus (Vector Specialist)

Implementations: MilvusVectorDBStorage

Strengths:

✅ Vector Performance: Optimized for high-performance vector search
✅ Scalability: Designed for billion-scale vector collections
✅ Multiple Indexes: Various indexing algorithms (IVF, HNSW, etc.)
✅ GPU Support: CUDA acceleration for vector operations
✅ Cloud Native: Kubernetes-ready architecture

Weaknesses:

❌ Complexity: Complex distributed architecture
❌ Resource Usage: High memory and compute requirements
❌ Overkill: May be excessive for smaller datasets
❌ Dependencies: Requires etcd and MinIO for full deployment

Typical Configuration:

LIGHTRAG_VECTOR_STORAGE: MilvusVectorDBStorage
LIGHTRAG_GRAPH_STORAGE: Neo4JStorage
LIGHTRAG_KV_STORAGE: RedisKVStorage
LIGHTRAG_DOC_STATUS_STORAGE: MongoDocStatusStorage

4. Qdrant (Vector Specialist)

Implementations: QdrantVectorDBStorage

Strengths:

✅ Performance: High-performance vector search with Rust backend
✅ Simplicity: Easier deployment than Milvus
✅ Filtering: Advanced payload filtering capabilities
✅ API: Rich REST and gRPC APIs
✅ Memory Efficiency: Lower memory footprint than Milvus

Weaknesses:

❌ Ecosystem: Smaller ecosystem compared to alternatives
❌ Vector Only: Requires additional databases for other storage types

5. MongoDB (Multi-Purpose)

Implementations: MongoKVStorage, MongoVectorDBStorage, MongoDocStatusStorage

Strengths:

✅ Flexibility: Schema-less document storage
✅ Vector Search: Native vector search capabilities (Atlas Vector Search)
✅ Multi-Purpose: Can handle KV, vectors, and document status
✅ Scalability: Horizontal scaling with sharding
✅ Developer Friendly: Easy to work with JSON documents

Weaknesses:

❌ Graph Limitations: Not optimized for graph operations
❌ Vector Performance: Vector search not as optimized as specialists
❌ Memory Usage: Can be memory-intensive for large datasets

6. Redis (KV Specialist)

Implementations: RedisKVStorage

Strengths:

✅ Speed: In-memory performance for KV operations
✅ Simplicity: Simple key-value operations
✅ Data Structures: Rich data structures (lists, sets, hashes)
✅ Caching: Excellent for caching and session storage

Weaknesses:

❌ Memory Bound: Limited by available RAM
❌ KV Only: Only suitable for key-value storage
❌ Persistence: Data persistence requires configuration

7. Local File Storage

Implementations: NetworkXStorage, JsonKVStorage, JsonDocStatusStorage, NanoVectorDBStorage, FaissVectorDBStorage

Strengths:

✅ Simplicity: No external dependencies
✅ Development: Perfect for development and testing
✅ Portability: Easy to backup and move
✅ Cost: No infrastructure costs

Weaknesses:

❌ Scalability: Limited by single machine resources
❌ Concurrency: No built-in concurrent access
❌ Performance: Limited performance for large datasets
❌ Reliability: Single point of failure

8. ChromaDB (Vector Specialist)

Implementations: ChromaVectorDBStorage

Strengths:

✅ Simplicity: Easy to deploy and use
✅ Python Native: Built for Python ML workflows
✅ Metadata: Rich metadata filtering capabilities
✅ Local/Distributed: Can run locally or distributed

Weaknesses:

❌ Performance: Slower than Milvus/Qdrant for large scales
❌ Maturity: Newer project with evolving feature set

Recommended Stack Configurations

1. 🏆 Production High-Performance Stack

Best for: Large-scale production deployments, complex graph analytics

LIGHTRAG_GRAPH_STORAGE: Neo4JStorage
LIGHTRAG_VECTOR_STORAGE: MilvusVectorDBStorage
LIGHTRAG_KV_STORAGE: RedisKVStorage
LIGHTRAG_DOC_STATUS_STORAGE: PGDocStatusStorage

Services Required:

Neo4j (Graph operations)
Milvus + etcd + MinIO (Vector search)
Redis (KV cache)
PostgreSQL (Document status)

Pros: Maximum performance, specialized for each data type
Cons: High complexity, resource intensive, expensive

graph LR
    LightRAG_App["LightRAG Application"]
    Neo4j_Service["Neo4j Service"]
    Milvus_Cluster["Milvus Cluster (Milvus, etcd, MinIO)"]
    Redis_Service["Redis Service"]
    PostgreSQL_Service["PostgreSQL Service"]

    LightRAG_App --> |Graph Storage| Neo4j_Service
    LightRAG_App --> |Vector Storage| Milvus_Cluster
    LightRAG_App --> |KV Storage| Redis_Service
    LightRAG_App --> |Doc Status Storage| PostgreSQL_Service

2. 🎯 Production Balanced Stack

Best for: Production deployments prioritizing simplicity

LIGHTRAG_GRAPH_STORAGE: NetworkXStorage
LIGHTRAG_VECTOR_STORAGE: QdrantVectorDBStorage
LIGHTRAG_KV_STORAGE: RedisKVStorage
LIGHTRAG_DOC_STATUS_STORAGE: PGDocStatusStorage

Services Required:

Qdrant (Vector search)
Redis (KV cache)
PostgreSQL (Document status)
File system (Graph storage)

Pros: Good performance, simpler than full specialist stack
Cons: Graph operations limited by file-based storage

graph LR
    subgraph "LightRAG Application Environment"
        LightRAG_App["LightRAG Application"]
        NetworkX["NetworkX Graph Storage (Local FS)"]
        LightRAG_App -.-> NetworkX
    end
    Qdrant_Service["Qdrant Service"]
    Redis_Service["Redis Service"]
    PostgreSQL_Service["PostgreSQL Service"]

    LightRAG_App --> |Vector Storage| Qdrant_Service
    LightRAG_App --> |KV Storage| Redis_Service
    LightRAG_App --> |Doc Status Storage| PostgreSQL_Service

3. 💰 Production Minimal Stack

Best for: Budget-conscious production deployments

LIGHTRAG_GRAPH_STORAGE: NetworkXStorage
LIGHTRAG_VECTOR_STORAGE: PGVectorStorage
LIGHTRAG_KV_STORAGE: PGKVStorage
LIGHTRAG_DOC_STATUS_STORAGE: PGDocStatusStorage

Services Required:

PostgreSQL + pgvector (All storage except graph)
File system (Graph storage)

Pros: Single database, low cost, good for medium scale
Cons: Not optimized for very large datasets or complex graphs

graph LR
    subgraph "LightRAG Application Environment"
        LightRAG_App["LightRAG Application"]
        NetworkX["NetworkX Graph Storage (Local FS)"]
        LightRAG_App -.-> NetworkX
    end
    PostgreSQL_Service["PostgreSQL Service (+pgvector)"]

    LightRAG_App --> |Vector Storage| PostgreSQL_Service
    LightRAG_App --> |KV Storage| PostgreSQL_Service
    LightRAG_App --> |Doc Status Storage| PostgreSQL_Service

4. 🚀 Development & Testing Stack

Best for: Local development, testing, small deployments

LIGHTRAG_GRAPH_STORAGE: NetworkXStorage
LIGHTRAG_VECTOR_STORAGE: NanoVectorDBStorage
LIGHTRAG_KV_STORAGE: JsonKVStorage
LIGHTRAG_DOC_STATUS_STORAGE: JsonDocStatusStorage

Services Required:

None (all file-based)

Pros: Zero infrastructure, fast setup, portable
Cons: Limited scalability and performance

graph LR
    subgraph "LightRAG Application (Local Process)"
        LightRAG_App["LightRAG App"]
        NetworkX["NetworkX (File System)"]
        NanoVectorDB["NanoVectorDB (File System)"]
        JsonKV["JsonKVStorage (File System)"]
        JsonDocStatus["JsonDocStatusStorage (File System)"]

        LightRAG_App -.-> |Graph| NetworkX
        LightRAG_App -.-> |Vector| NanoVectorDB
        LightRAG_App -.-> |KV| JsonKV
        LightRAG_App -.-> |Doc Status| JsonDocStatus
    end

5. 🐳 Docker All-in-One Stack

Best for: Containerized deployments, cloud environments

LIGHTRAG_GRAPH_STORAGE: Neo4JStorage
LIGHTRAG_VECTOR_STORAGE: QdrantVectorDBStorage
LIGHTRAG_KV_STORAGE: RedisKVStorage
LIGHTRAG_DOC_STATUS_STORAGE: MongoDocStatusStorage

Services Required:

Neo4j (Graph)
Qdrant (Vector)
Redis (KV)
MongoDB (Document status)

Pros: Cloud-native, each service containerized
Cons: More services to manage

graph LR
    subgraph "Docker Environment (e.g., Docker Compose)"
        LightRAG_Container["LightRAG App (Container)"]
        Neo4j_Container["Neo4j (Container)"]
        Qdrant_Container["Qdrant (Container)"]
        Redis_Container["Redis (Container)"]
        MongoDB_Container["MongoDB (Container)"]
    end
    LightRAG_Container --> |Graph Storage| Neo4j_Container
    LightRAG_Container --> |Vector Storage| Qdrant_Container
    LightRAG_Container --> |KV Storage| Redis_Container
    LightRAG_Container --> |Doc Status Storage| MongoDB_Container
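
The "Services Required" list of each stack follows mechanically from the chosen implementations. A minimal sketch of that mapping is shown here; `SERVICE_FOR`, `FILE_BASED`, and `required_services` are hypothetical helpers, and the service groupings are the ones used in this report (note that `PGVectorStorage` additionally needs the pgvector extension inside PostgreSQL).

```python
# Implementation -> external service, as grouped in this report.
SERVICE_FOR = {
    "Neo4JStorage": "Neo4j",
    "MilvusVectorDBStorage": "Milvus + etcd + MinIO",
    "QdrantVectorDBStorage": "Qdrant",
    "ChromaVectorDBStorage": "ChromaDB",
    "RedisKVStorage": "Redis",
    "PGVectorStorage": "PostgreSQL",  # also requires the pgvector extension
    "PGKVStorage": "PostgreSQL",
    "PGGraphStorage": "PostgreSQL",
    "PGDocStatusStorage": "PostgreSQL",
    "MongoKVStorage": "MongoDB",
    "MongoVectorDBStorage": "MongoDB",
    "MongoDocStatusStorage": "MongoDB",
}

# Implementations that live on the local file system and need no service.
FILE_BASED = {"NetworkXStorage", "NanoVectorDBStorage", "FaissVectorDBStorage",
              "JsonKVStorage", "JsonDocStatusStorage"}


def required_services(stack):
    """Return the sorted, de-duplicated services a stack depends on."""
    services = set()
    for implementation in stack.values():
        if implementation in FILE_BASED:
            continue
        services.add(SERVICE_FOR[implementation])  # KeyError on unknown names
    return sorted(services)
```

For the Balanced Stack this yields PostgreSQL, Qdrant, and Redis; for the Development Stack it yields an empty list.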

Performance Comparison

Vector Search Performance (Approximate)

Implementation	Small (1K docs)	Medium (100K docs)	Large (1M+ docs)	Memory Usage
MilvusVectorDBStorage	⭐⭐⭐⭐⭐	⭐⭐⭐⭐⭐	⭐⭐⭐⭐⭐	High
QdrantVectorDBStorage	⭐⭐⭐⭐⭐	⭐⭐⭐⭐⭐	⭐⭐⭐⭐	Medium
PGVectorStorage	⭐⭐⭐⭐	⭐⭐⭐	⭐⭐	Medium
ChromaVectorDBStorage	⭐⭐⭐⭐	⭐⭐⭐	⭐⭐	Medium
FaissVectorDBStorage	⭐⭐⭐	⭐⭐⭐	⭐⭐⭐	Low
NanoVectorDBStorage	⭐⭐⭐	⭐⭐	⭐	Low
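
One way to use the ratings above is to filter by a memory budget and then take the highest-rated option for the expected dataset size. The sketch below encodes the table as data, using the full class names from the implementations table; `RATINGS` and `best_vector_storage` are illustrative names, and the star counts are this report's approximate ratings rather than measured benchmarks.

```python
# Star ratings and memory usage copied from the table above (approximate).
RATINGS = {
    "MilvusVectorDBStorage": {"small": 5, "medium": 5, "large": 5, "memory": "High"},
    "QdrantVectorDBStorage": {"small": 5, "medium": 5, "large": 4, "memory": "Medium"},
    "PGVectorStorage": {"small": 4, "medium": 3, "large": 2, "memory": "Medium"},
    "ChromaVectorDBStorage": {"small": 4, "medium": 3, "large": 2, "memory": "Medium"},
    "FaissVectorDBStorage": {"small": 3, "medium": 3, "large": 3, "memory": "Low"},
    "NanoVectorDBStorage": {"small": 3, "medium": 2, "large": 1, "memory": "Low"},
}
MEMORY_ORDER = {"Low": 0, "Medium": 1, "High": 2}


def best_vector_storage(scale, max_memory="High"):
    """Pick the top-rated storage for `scale` within a memory budget.

    Ties are resolved in table order, because max() keeps the first maximum.
    """
    budget = MEMORY_ORDER[max_memory]
    candidates = [name for name, row in RATINGS.items()
                  if MEMORY_ORDER[row["memory"]] <= budget]
    return max(candidates, key=lambda name: RATINGS[name][scale])
```

With a "Medium" memory budget and a large dataset, the lookup selects `QdrantVectorDBStorage`; with a "Low" budget it falls back to `FaissVectorDBStorage`.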

Graph Operations Performance

Implementation	Node Queries	Edge Traversal	Complex Analytics	Scalability
Neo4JStorage	⭐⭐⭐⭐⭐	⭐⭐⭐⭐⭐	⭐⭐⭐⭐⭐	⭐⭐⭐⭐⭐
PGGraphStorage	⭐⭐⭐⭐	⭐⭐⭐	⭐⭐⭐	⭐⭐⭐⭐
NetworkXStorage	⭐⭐⭐	⭐⭐⭐	⭐⭐	⭐⭐

KV Operations Performance

Implementation	Read Speed	Write Speed	Concurrency	Persistence
RedisKVStorage	⭐⭐⭐⭐⭐	⭐⭐⭐⭐⭐	⭐⭐⭐⭐⭐	⭐⭐⭐
PGKVStorage	⭐⭐⭐⭐	⭐⭐⭐⭐	⭐⭐⭐⭐	⭐⭐⭐⭐⭐
MongoKVStorage	⭐⭐⭐⭐	⭐⭐⭐⭐	⭐⭐⭐⭐	⭐⭐⭐⭐⭐
JsonKVStorage	⭐⭐	⭐⭐	⭐	⭐⭐⭐⭐⭐

Deployment Considerations

Resource Requirements

Configuration	CPU	Memory	Storage	Network
Development Stack	2 cores	4GB	10GB	Minimal
Minimal Stack	4 cores	8GB	50GB	Medium
Balanced Stack	8 cores	16GB	100GB	High
High-Performance Stack	16+ cores	32GB+	500GB+	Very High
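
The table above can double as a quick capacity check for a target host. The following sketch lists every stack whose minimum requirements a machine satisfies; `Requirement`, `REQUIREMENTS`, and `stacks_that_fit` are hypothetical names, the "16+", "32GB+", and "500GB+" entries are treated as lower bounds, and the qualitative Network column is left out.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Requirement:
    cpu_cores: int
    memory_gb: int
    storage_gb: int


# Minimum figures from the table above, ordered from smallest to largest stack.
REQUIREMENTS = {
    "Development": Requirement(2, 4, 10),
    "Minimal": Requirement(4, 8, 50),
    "Balanced": Requirement(8, 16, 100),
    "High-Performance": Requirement(16, 32, 500),
}


def stacks_that_fit(cpu_cores, memory_gb, storage_gb):
    """Return the stacks whose minimum requirements the host meets."""
    return [
        name
        for name, req in REQUIREMENTS.items()
        if cpu_cores >= req.cpu_cores
        and memory_gb >= req.memory_gb
        and storage_gb >= req.storage_gb
    ]
```

A host with 8 cores, 16 GB of memory, and 200 GB of storage fits every stack up to and including Balanced.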

Maintenance Complexity

Stack Type	Setup Complexity	Operational Overhead	Monitoring	Backup Strategy
Development	⭐	⭐	⭐	Simple
Minimal	⭐⭐	⭐⭐	⭐⭐	Medium
Balanced	⭐⭐⭐	⭐⭐⭐	⭐⭐⭐	Complex
High-Performance	⭐⭐⭐⭐⭐	⭐⭐⭐⭐⭐	⭐⭐⭐⭐⭐	Very Complex

Migration Paths

Development → Production

Start with Development Stack (all file-based)
Migrate to Minimal Stack (PostgreSQL-based)
Scale to Balanced Stack (add specialized vector DB)
Optimize with High-Performance Stack (full specialization)
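
Each step on this path replaces only some of the four storage types, and those are the ones whose data has to be moved. The sketch below compares consecutive stacks as defined in this report; `STACKS` and `changed_storage` are illustrative names, not LightRAG functions.

```python
# Stack definitions as given in "Recommended Stack Configurations".
STACKS = {
    "Development": {"graph": "NetworkXStorage", "vector": "NanoVectorDBStorage",
                    "kv": "JsonKVStorage", "doc_status": "JsonDocStatusStorage"},
    "Minimal": {"graph": "NetworkXStorage", "vector": "PGVectorStorage",
                "kv": "PGKVStorage", "doc_status": "PGDocStatusStorage"},
    "Balanced": {"graph": "NetworkXStorage", "vector": "QdrantVectorDBStorage",
                 "kv": "RedisKVStorage", "doc_status": "PGDocStatusStorage"},
    "High-Performance": {"graph": "Neo4JStorage", "vector": "MilvusVectorDBStorage",
                         "kv": "RedisKVStorage", "doc_status": "PGDocStatusStorage"},
}


def changed_storage(source, target):
    """Map each storage type that differs to its (old, new) implementation."""
    old, new = STACKS[source], STACKS[target]
    return {key: (old[key], new[key]) for key in old if old[key] != new[key]}
```

Moving from Minimal to Balanced changes only the vector and KV storage, so the graph files and the document status tables stay in place.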

Data Migration Tools

Database-specific: Use native tools (pg_dump, neo4j-admin, etc.)
LightRAG native: Built-in export/import capabilities
Cross-platform: JSON export for universal compatibility
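
As an illustration of the cross-platform option, the sketch below round-trips key-value records through JSON using only the Python standard library. It assumes the records are already available as a JSON-serializable dictionary; `export_kv`, `import_kv`, and `round_trip` are hypothetical helpers and do not represent LightRAG's built-in export/import interface.

```python
import io
import json


def export_kv(records, fp):
    """Write records as UTF-8 friendly, key-sorted JSON to a file object."""
    json.dump(records, fp, ensure_ascii=False, indent=2, sort_keys=True)


def import_kv(fp):
    """Read records back from a file object written by export_kv."""
    return json.load(fp)


def round_trip(records):
    """Export to an in-memory buffer and import again, for verification."""
    buffer = io.StringIO()
    export_kv(records, buffer)
    buffer.seek(0)
    return import_kv(buffer)
```

Comparing `round_trip(records)` with the original records is a cheap check that nothing in the data falls outside what JSON can represent.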

Recommendations by Use Case

📚 Documentation/Knowledge Base

Small (<10K docs): Development Stack
Medium (10K-100K docs): Minimal Stack
Large (>100K docs): Balanced Stack
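
The size thresholds above translate directly into a lookup. This is a minimal sketch: `recommend_stack` is a hypothetical helper, and treating exactly 100K documents as "Large" is an assumption, since the boundary value is not specified here.

```python
def recommend_stack(doc_count):
    """Map a document count to the stack suggested for knowledge bases."""
    if doc_count < 10_000:
        return "Development"
    if doc_count < 100_000:
        return "Minimal"
    # Assumption: the 100K boundary itself is treated as "Large".
    return "Balanced"
```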

🔬 Research/Analytics

Graph-heavy: High-Performance Stack with Neo4j
Vector-heavy: Balanced Stack with Milvus in place of Qdrant
Mixed workload: Balanced Stack

💼 Enterprise

High Availability: High-Performance Stack with clustering
Budget Conscious: Minimal Stack with PostgreSQL
Regulatory: On-premises deployment for full control over where data is stored

🚀 Startups/SMBs

MVP: Development Stack
Growing: Minimal Stack
Scaling: Balanced Stack

Conclusion

The Minimal Stack (PostgreSQL + NetworkX) provides the best balance of performance, complexity, and cost for most use cases. It offers:

✅ Production-ready reliability
✅ Reasonable performance for medium-scale deployments
✅ Low operational overhead
✅ Clear upgrade path to specialized components

For specialized needs:

High graph complexity → Add Neo4j
High vector performance → Add Qdrant/Milvus
High concurrency KV → Add Redis

The modular architecture allows gradual optimization based on actual performance bottlenecks rather than premature optimization.
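
The gradual approach described above amounts to starting from the Minimal Stack and swapping one storage type per observed bottleneck. The sketch below shows that idea; `MINIMAL`, `UPGRADES`, and `upgrade` are illustrative names, and Qdrant is picked for the vector slot only as an example (Milvus is the alternative named above).

```python
# The Minimal Stack as defined in this report.
MINIMAL = {
    "graph": "NetworkXStorage",
    "vector": "PGVectorStorage",
    "kv": "PGKVStorage",
    "doc_status": "PGDocStatusStorage",
}

# Bottleneck -> specialized replacement, following the list above.
UPGRADES = {
    "graph": "Neo4JStorage",
    "vector": "QdrantVectorDBStorage",  # example choice; Milvus is the alternative
    "kv": "RedisKVStorage",
}


def upgrade(stack, bottlenecks):
    """Return a copy of `stack` with each bottlenecked storage type replaced."""
    upgraded = dict(stack)
    for storage_type in bottlenecks:
        upgraded[storage_type] = UPGRADES[storage_type]
    return upgraded
```

Upgrading the vector and KV slots of the Minimal Stack reproduces the Balanced Stack, and the original dictionary is left untouched.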

Report generated based on LightRAG v1.3.7 implementation analysis
