# LightRAG Storage Backends

> Complete guide to storage backend configuration and implementation

**Version**: 1.4.9.1 | **Last Updated**: December 2025

---

## Table of Contents

1. [Overview](#overview)
2. [Storage Types](#storage-types)
3. [Backend Comparison](#backend-comparison)
4. [PostgreSQL Backend](#postgresql-backend)
5. [MongoDB Backend](#mongodb-backend)
6. [Neo4j Backend](#neo4j-backend)
7. [Redis Backend](#redis-backend)
8. [File-Based Backends](#file-based-backends)
9. [Vector Databases](#vector-databases)
10. [Configuration Reference](#configuration-reference)
11. [Multi-Tenant Data Isolation](#multi-tenant-data-isolation)
12. [Migration Between Backends](#migration-between-backends)
13. [Best Practices](#best-practices)

---

## Overview

LightRAG uses four types of storage, each with multiple backend options:

```
                 Storage Architecture

                  ┌───────────────┐
                  │ LightRAG Core │
                  └───────┬───────┘
        ┌─────────────────┼─────────────────┐
        ▼                 ▼                 ▼
┌───────────────┐ ┌───────────────┐ ┌───────────────┐
│  KV Storage   │ │ Vector Store  │ │  Graph Store  │
│  (Documents,  │ │ (Embeddings)  │ │  (KG Nodes    │
│   Chunks,     │ │               │ │   & Edges)    │
│   Cache)      │ │               │ │               │
└───────┬───────┘ └───────┬───────┘ └───────┬───────┘
        ▼                 ▼                 ▼
┌───────────────────────────────────────────────────┐
│              Backend Implementations              │
│                                                   │
│ PostgreSQL │ MongoDB │ Redis │ Neo4j │ Milvus │   │
│ Qdrant │ FAISS │ JSON/File │ NetworkX │           │
│ NanoVectorDB │ Memgraph │ ...                     │
└───────────────────────────────────────────────────┘
```

---

## Storage Types

### 1. Key-Value Storage (`BaseKVStorage`)

Stores documents, chunks, and LLM cache.

| Implementation | Description | Use Case |
|----------------|-------------|----------|
| `JsonKVStorage` | File-based JSON | Development, single-node |
| `PGKVStorage` | PostgreSQL tables | Production, multi-node |
| `MongoKVStorage` | MongoDB collections | Production, flexible schema |
| `RedisKVStorage` | Redis hash maps | High-performance caching |

### 2. Vector Storage (`BaseVectorStorage`)

Stores and queries embedding vectors.

| Implementation | Description | Use Case |
|----------------|-------------|----------|
| `NanoVectorDBStorage` | In-memory, file-persisted | Development, small datasets |
| `PGVectorStorage` | PostgreSQL + pgvector | Production, unified DB |
| `MilvusVectorDBStorage` | Milvus vector DB | Large-scale production |
| `QdrantVectorDBStorage` | Qdrant vector DB | Cloud-native production |
| `FaissVectorDBStorage` | FAISS index | Local high-performance |
| `MongoVectorDBStorage` | MongoDB Atlas Vector | MongoDB ecosystem |

### 3. Graph Storage (`BaseGraphStorage`)

Stores knowledge graph nodes and edges.

| Implementation | Description | Use Case |
|----------------|-------------|----------|
| `NetworkXStorage` | In-memory NetworkX | Development, small graphs |
| `PGGraphStorage` | PostgreSQL tables | Production, unified DB |
| `Neo4JStorage` | Native graph DB | Complex graph queries |
| `MemgraphStorage` | In-memory graph DB | Real-time analytics |
| `MongoGraphStorage` | MongoDB documents | Document-graph hybrid |

### 4. Document Status Storage (`DocStatusStorage`)

Tracks document processing status.

| Implementation | Description | Use Case |
|----------------|-------------|----------|
| `JsonDocStatusStorage` | File-based JSON | Development |
| `PGDocStatusStorage` | PostgreSQL | Production |
| `MongoDocStatusStorage` | MongoDB | Production |
| `RedisDocStatusStorage` | Redis | Distributed |
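The four storage types are selected independently, so backends can be mixed and matched. As a minimal illustrative sketch (using implementation names from the tables above; whether this particular combination fits your workload depends on the deployment), a setup that keeps KV and status data in local JSON files while delegating vectors to FAISS and the graph to Neo4j might look like:

```python
from lightrag import LightRAG

# Illustrative mixed configuration: each storage type picks its own backend.
# The string values are the implementation class names listed above.
rag = LightRAG(
    working_dir="./rag_storage",
    kv_storage="JsonKVStorage",                 # documents, chunks, LLM cache
    vector_storage="FaissVectorDBStorage",      # embedding vectors
    graph_storage="Neo4JStorage",               # knowledge graph nodes & edges
    doc_status_storage="JsonDocStatusStorage",  # processing status
)
```

The per-backend sections below list the connection settings each implementation expects.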
---

## Backend Comparison

### Feature Matrix

| Feature | PG Full | Mongo | Neo4j | Mixed | File-Only |
|--------------------|---------|-------|-------|-------|-----------|
| KV Storage | ✅ | ✅ | ❌ | ✅ | ✅ |
| Vector Storage | ✅ | ✅ | ❌ | ✅ | ✅ |
| Graph Storage | ✅ | ✅ | ✅ | ✅ | ✅ |
| Doc Status | ✅ | ✅ | ❌ | ✅ | ✅ |
| Multi-tenant | ✅ | ✅ | ✅ | ✅ | ⚠️ |
| Horizontal Scale | ✅ | ✅ | ✅ | ✅ | ❌ |
| ACID Transactions | ✅ | ⚠️ | ✅ | ⚠️ | ❌ |
| Zero Dependencies | ❌ | ❌ | ❌ | ❌ | ✅ |
| Graph Queries | ⚠️ | ⚠️ | ✅ | ✅ | ⚠️ |
| Vector Search | ✅ | ✅ | ❌ | ✅ | ✅ |

Legend: ✅ Full support · ⚠️ Limited · ❌ Not supported

### Performance Characteristics

| Backend | Write Speed | Read Speed | Memory Usage | Disk Usage |
|---------|-------------|------------|--------------|------------|
| PostgreSQL Full | Fast | Fast | Medium | Compact |
| MongoDB Full | Fast | Fast | Medium | Medium |
| Neo4j + Vector | Slow | Fast (graph) | High | Medium |
| File-based | Slow | Medium | Low | Compact |
| Milvus/Qdrant | Fast | Very Fast | High | Large |

---

## PostgreSQL Backend

### Complete PostgreSQL Setup

PostgreSQL can handle all storage types (recommended for production):

```python
from lightrag import LightRAG

rag = LightRAG(
    working_dir="./rag_storage",
    # All PostgreSQL backends
    kv_storage="PGKVStorage",
    vector_storage="PGVectorStorage",
    graph_storage="PGGraphStorage",
    doc_status_storage="PGDocStatusStorage",
)
```

### Environment Variables

```bash
# Required
POSTGRES_HOST=localhost
POSTGRES_PORT=5432
POSTGRES_USER=postgres
POSTGRES_PASSWORD=your_password
POSTGRES_DATABASE=lightrag

# Optional
POSTGRES_MAX_CONNECTIONS=100
POSTGRES_SSL_MODE=prefer  # disable|allow|prefer|require|verify-ca|verify-full
POSTGRES_SSL_CERT=/path/to/cert
POSTGRES_SSL_KEY=/path/to/key
POSTGRES_SSL_ROOT_CERT=/path/to/ca

# Vector index configuration
POSTGRES_VECTOR_INDEX_TYPE=hnsw  # hnsw|ivfflat
POSTGRES_HNSW_M=16
POSTGRES_HNSW_EF=64
POSTGRES_IVFFLAT_LISTS=100
```

### Schema Overview

```sql
-- Documents table
CREATE TABLE LIGHTRAG_DOC_FULL (
    workspace VARCHAR(1024) NOT NULL,
    id VARCHAR(255) NOT NULL,
    doc_name VARCHAR(1024),
    content TEXT,
    meta JSONB,
    createtime TIMESTAMP(0) DEFAULT CURRENT_TIMESTAMP,
    updatetime TIMESTAMP(0) DEFAULT CURRENT_TIMESTAMP,
    PRIMARY KEY (workspace, id)
);

-- Chunks table
CREATE TABLE LIGHTRAG_DOC_CHUNKS (
    workspace VARCHAR(1024) NOT NULL,
    id VARCHAR(255) NOT NULL,
    full_doc_id VARCHAR(255),
    chunk_order_index INT,
    tokens INT,
    content TEXT,
    content_summary TEXT,
    file_path VARCHAR(32768),
    create_time TIMESTAMP(0) DEFAULT CURRENT_TIMESTAMP,
    update_time TIMESTAMP(0) DEFAULT CURRENT_TIMESTAMP,
    PRIMARY KEY (workspace, id)
);

-- Entity vectors (pgvector extension required)
CREATE TABLE LIGHTRAG_VDB_ENTITY (
    workspace VARCHAR(1024) NOT NULL,
    id VARCHAR(255) NOT NULL,
    entity_name VARCHAR(1024),
    content TEXT,
    content_vector VECTOR(1024),  -- Adjust dimension to match embedding
    source_id TEXT,
    file_path TEXT,
    create_time TIMESTAMP(0) DEFAULT CURRENT_TIMESTAMP,
    update_time TIMESTAMP(0) DEFAULT CURRENT_TIMESTAMP,
    PRIMARY KEY (workspace, id)
);

-- Graph nodes
CREATE TABLE LIGHTRAG_GRAPH_NODES (
    workspace VARCHAR(1024) NOT NULL,
    id VARCHAR(255) NOT NULL,
    entity_type VARCHAR(255),
    description TEXT,
    source_id TEXT,
    file_path TEXT,
    created_at INT,
    PRIMARY KEY (workspace, id)
);

-- Graph edges
CREATE TABLE LIGHTRAG_GRAPH_EDGES (
    workspace VARCHAR(1024) NOT NULL,
    source_id VARCHAR(255) NOT NULL,
    target_id VARCHAR(255) NOT NULL,
    weight FLOAT,
    description TEXT,
    keywords TEXT,
    source_chunk_id TEXT,
    file_path TEXT,
    created_at INT,
    PRIMARY KEY (workspace, source_id, target_id)
);
```

### pgvector Index Types

```sql
-- HNSW index (recommended for accuracy)
CREATE INDEX ON LIGHTRAG_VDB_ENTITY
USING hnsw (content_vector vector_cosine_ops)
WITH (m = 16, ef_construction = 64);

-- IVFFlat index (faster to build, less accurate)
CREATE INDEX ON LIGHTRAG_VDB_ENTITY
USING ivfflat (content_vector vector_cosine_ops)
WITH (lists = 100);
```
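To make the index choice concrete, the sketch below shows what a cosine-similarity lookup against `LIGHTRAG_VDB_ENTITY` looks like with pgvector's `<=>` (cosine distance) operator. It is illustrative only, not the exact SQL LightRAG generates; `:query_vector` and `:workspace` are placeholders for the embedded query and tenant namespace:

```sql
-- Illustrative pgvector search; not LightRAG's literal internal query.
SELECT id,
       entity_name,
       1 - (content_vector <=> :query_vector) AS cosine_similarity
FROM LIGHTRAG_VDB_ENTITY
WHERE workspace = :workspace
ORDER BY content_vector <=> :query_vector  -- ascending distance = most similar first
LIMIT 10;
```

An `ORDER BY ... LIMIT` query of this shape is what the HNSW or IVFFlat index accelerates; without an index, pgvector falls back to an exact sequential scan.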
---

## MongoDB Backend

### Complete MongoDB Setup

```python
from lightrag import LightRAG

rag = LightRAG(
    working_dir="./rag_storage",
    # All MongoDB backends
    kv_storage="MongoKVStorage",
    vector_storage="MongoVectorDBStorage",
    graph_storage="MongoGraphStorage",
    doc_status_storage="MongoDocStatusStorage",
)
```

### Environment Variables

```bash
MONGO_URI=mongodb://localhost:27017
MONGO_DATABASE=lightrag

# Atlas Vector Search (optional)
MONGO_ATLAS_CLUSTER=your-cluster
MONGO_ATLAS_API_KEY=your-api-key
```

### Collection Structure

```javascript
// Documents collection
db.lightrag_doc_full.insertOne({
  _id: "workspace:doc_id",
  workspace: "default",
  doc_id: "abc123",
  doc_name: "document.txt",
  content: "Full document text...",
  meta: { source: "upload" },
  created_at: ISODate(),
  updated_at: ISODate()
});

// Entities collection (with vector)
db.lightrag_entities.insertOne({
  _id: "workspace:entity_id",
  workspace: "default",
  entity_name: "Apple Inc.",
  entity_type: "organization",
  description: "Technology company...",
  content: "Apple Inc.\nTechnology company...",
  embedding: [0.1, 0.2, ...],  // Vector embedding
  source_id: "chunk_001,chunk_002",
  file_path: "document.txt"
});

// Graph edges collection
db.lightrag_graph_edges.insertOne({
  _id: "workspace:source:target",
  workspace: "default",
  source: "Apple Inc.",
  target: "iPhone",
  weight: 3.5,
  description: "Produces the iPhone",
  keywords: "technology,smartphone"
});
```

### Vector Search Index (Atlas)

```javascript
// Create vector search index
db.lightrag_entities.createSearchIndex({
  name: "vector_index",
  definition: {
    mappings: {
      dynamic: true,
      fields: {
        embedding: {
          type: "knnVector",
          dimensions: 1024,
          similarity: "cosine"
        }
      }
    }
  }
});
```
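Once the `knnVector` index above exists, Atlas Search can answer approximate nearest-neighbour queries through the aggregation pipeline. The sketch below uses the legacy `knnBeta` operator that pairs with `knnVector` mappings; newer Atlas deployments typically define a `vectorSearch`-type index and use the `$vectorSearch` stage instead. `queryVector` is a placeholder for a 1024-dimensional query embedding, and this is not necessarily the query LightRAG issues internally:

```javascript
// Illustrative Atlas Search kNN query against the "vector_index" defined above.
// queryVector: 1024-dimensional embedding of the user's query (placeholder).
db.lightrag_entities.aggregate([
  {
    $search: {
      index: "vector_index",
      knnBeta: { vector: queryVector, path: "embedding", k: 10 }
    }
  },
  {
    $project: {
      entity_name: 1,
      description: 1,
      score: { $meta: "searchScore" }  // similarity score assigned by Atlas Search
    }
  }
]);
```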
---

## Neo4j Backend

### Neo4j for Graph Storage

Neo4j provides native graph storage with Cypher queries:

```python
from lightrag import LightRAG

rag = LightRAG(
    working_dir="./rag_storage",
    # Neo4j for graph, other backends for KV/Vector
    kv_storage="PGKVStorage",          # or JsonKVStorage
    vector_storage="PGVectorStorage",  # or other vector DB
    graph_storage="Neo4JStorage",      # Neo4j graph
    doc_status_storage="PGDocStatusStorage",
)
```

### Environment Variables

```bash
NEO4J_URI=bolt://localhost:7687
NEO4J_USERNAME=neo4j
NEO4J_PASSWORD=your_password

# Optional
NEO4J_DATABASE=neo4j
NEO4J_ENCRYPTED=false
```

### Graph Schema

```cypher
// Entity nodes
CREATE (e:Entity {
  entity_id: "Apple Inc.",
  entity_type: "organization",
  description: "Technology company...",
  source_id: "chunk_001",
  workspace: "default"
})

// Relationship
MATCH (a:Entity {entity_id: "Apple Inc."})
MATCH (b:Entity {entity_id: "iPhone"})
CREATE (a)-[r:RELATED_TO {
  weight: 3.5,
  description: "Produces",
  keywords: "technology"
}]->(b)
```

### Cypher Queries Used

```cypher
// Get node with edges
MATCH (n:Entity {entity_id: $entity_id, workspace: $workspace})
OPTIONAL MATCH (n)-[r]-(m)
RETURN n, r, m

// Get knowledge graph (BFS)
MATCH path = (start:Entity {entity_id: $label})-[*1..3]-(connected)
WHERE start.workspace = $workspace
RETURN path
LIMIT $max_nodes

// Search nodes
MATCH (n:Entity)
WHERE n.workspace = $workspace
  AND toLower(n.entity_id) CONTAINS toLower($query)
RETURN n.entity_id
ORDER BY n.degree DESC
LIMIT $limit
```

---

## Redis Backend

### Redis for KV and Doc Status

```python
from lightrag import LightRAG

rag = LightRAG(
    working_dir="./rag_storage",
    kv_storage="RedisKVStorage",
    vector_storage="NanoVectorDBStorage",  # Redis doesn't have vector
    graph_storage="NetworkXStorage",       # Redis doesn't have graph
    doc_status_storage="RedisDocStatusStorage",
)
```

### Environment Variables

```bash
REDIS_URI=redis://localhost:6379
# or with auth
REDIS_URI=redis://user:password@localhost:6379/0
```

### Key Structure

```
# Document storage
lightrag:{workspace}:full_docs:{doc_id}      -> JSON document

# Chunks storage
lightrag:{workspace}:text_chunks:{chunk_id}  -> JSON chunk

# LLM cache
lightrag:{workspace}:llm_cache:{cache_key}   -> JSON response

# Document status
lightrag:{workspace}:doc_status:{doc_id}     -> JSON status
```

---

## File-Based Backends

### Zero-Dependency Setup

Best for development and small-scale usage:

```python
from lightrag import LightRAG

rag = LightRAG(
    working_dir="./rag_storage",
    # All file-based (default)
    kv_storage="JsonKVStorage",
    vector_storage="NanoVectorDBStorage",
    graph_storage="NetworkXStorage",
    doc_status_storage="JsonDocStatusStorage",
)
```

### File Structure

```
./rag_storage/
├── full_docs.json                        # Complete documents
├── text_chunks.json                      # Document chunks
├── llm_response_cache.json               # LLM cache
├── full_entities.json                    # Entity metadata
├── full_relations.json                   # Relation metadata
├── vdb_entities.json                     # Entity vectors
├── vdb_relationships.json                # Relation vectors
├── vdb_chunks.json                       # Chunk vectors
├── graph_chunk_entity_relation.graphml   # Knowledge graph
└── doc_status.json                       # Processing status
```

### NanoVectorDB Format

```json
{
  "data": {
    "ent-abc123": {
      "__id__": "ent-abc123",
      "__vector__": [0.1, 0.2, 0.3, ...],
      "entity_name": "Apple Inc.",
      "content": "Apple Inc.\nTechnology company",
      "source_id": "chunk_001"
    }
  },
  "matrix": [[0.1, 0.2, ...], ...],
  "index_to_id": ["ent-abc123", ...]
}
```
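Because every file-based store is an ordinary file on disk, the data can be inspected without going through LightRAG at all. For example, the GraphML file holding the knowledge graph can be opened directly with NetworkX (a read-only inspection sketch; the path matches the file layout above):

```python
import networkx as nx

# Load the persisted knowledge graph for ad-hoc inspection (read-only).
G = nx.read_graphml("./rag_storage/graph_chunk_entity_relation.graphml")

print(f"{G.number_of_nodes()} entities, {G.number_of_edges()} relations")

# Show the five most-connected entities in the knowledge graph.
for node, degree in sorted(G.degree, key=lambda kv: kv[1], reverse=True)[:5]:
    print(f"{node}: {degree} relations")
```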
---

## Vector Databases

### Milvus

```python
rag = LightRAG(
    vector_storage="MilvusVectorDBStorage",
    vector_db_storage_cls_kwargs={
        "host": "localhost",
        "port": 19530,
        "collection_name": "lightrag_vectors"
    }
)
```

```bash
# Environment variables
MILVUS_HOST=localhost
MILVUS_PORT=19530
MILVUS_TOKEN=your_token  # For Zilliz Cloud
```

### Qdrant

```python
rag = LightRAG(
    vector_storage="QdrantVectorDBStorage",
    vector_db_storage_cls_kwargs={
        "collection_name": "lightrag"
    }
)
```

```bash
QDRANT_URL=http://localhost:6333
QDRANT_API_KEY=your_api_key  # Optional
```

### FAISS

```python
rag = LightRAG(
    vector_storage="FaissVectorDBStorage",
    vector_db_storage_cls_kwargs={
        "index_type": "IVF_FLAT",  # or HNSW
        "nlist": 100
    }
)
```

---

## Configuration Reference

### Complete Environment Variables

```bash
# Storage Selection
KV_STORAGE=PGKVStorage
VECTOR_STORAGE=PGVectorStorage
GRAPH_STORAGE=PGGraphStorage
DOC_STATUS_STORAGE=PGDocStatusStorage

# PostgreSQL
POSTGRES_HOST=localhost
POSTGRES_PORT=5432
POSTGRES_USER=postgres
POSTGRES_PASSWORD=secret
POSTGRES_DATABASE=lightrag
POSTGRES_MAX_CONNECTIONS=100
POSTGRES_SSL_MODE=prefer

# MongoDB
MONGO_URI=mongodb://localhost:27017
MONGO_DATABASE=lightrag

# Neo4j
NEO4J_URI=bolt://localhost:7687
NEO4J_USERNAME=neo4j
NEO4J_PASSWORD=password

# Redis
REDIS_URI=redis://localhost:6379

# Milvus
MILVUS_HOST=localhost
MILVUS_PORT=19530

# Qdrant
QDRANT_URL=http://localhost:6333
QDRANT_API_KEY=

# Memgraph
MEMGRAPH_URI=bolt://localhost:7687
```

### Programmatic Configuration

```python
from lightrag import LightRAG

rag = LightRAG(
    # Working directory
    working_dir="./rag_storage",
    workspace="my_project",  # Multi-tenant namespace

    # Storage backends
    kv_storage="PGKVStorage",
    vector_storage="PGVectorStorage",
    graph_storage="PGGraphStorage",
    doc_status_storage="PGDocStatusStorage",

    # Vector DB options
    vector_db_storage_cls_kwargs={
        "cosine_better_than_threshold": 0.2,
        # Backend-specific options...
    },

    # Processing
    chunk_token_size=1200,
    chunk_overlap_token_size=100,
)
```

---

## Multi-Tenant Data Isolation

All storage backends support multi-tenant isolation:

```python
# Workspace creates isolated namespace
rag = LightRAG(
    working_dir="./rag_storage",
    workspace="tenant_a:kb_prod",  # Composite namespace
)

# Or with explicit tenant context
from lightrag.tenant_rag_manager import TenantRAGManager

manager = TenantRAGManager(
    base_working_dir="./rag_storage",
    tenant_service=tenant_service,
    template_rag=template_rag,
)

# Get tenant-specific instance
rag = await manager.get_rag_instance(
    tenant_id="tenant_a",
    kb_id="kb_prod"
)
```

### Isolation Pattern

```
PostgreSQL:  WHERE workspace = 'tenant_a:kb_prod:default'
MongoDB:     { workspace: "tenant_a:kb_prod:default" }
Redis:       lightrag:tenant_a:kb_prod:default:{key}
Neo4j:       MATCH (n {workspace: $workspace})
File:        ./rag_storage/tenant_a:kb_prod/
```
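As a concrete sketch of what the pattern above means in practice (assuming both instances point at the same PostgreSQL database through the environment variables shown earlier), two tenants can share one backend while every read and write stays scoped to its own workspace:

```python
from lightrag import LightRAG

# Both instances share one PostgreSQL database; every row each instance writes
# is tagged with its own workspace value, so neither tenant sees the other's data.
tenant_a = LightRAG(
    working_dir="./rag_storage/tenant_a",
    workspace="tenant_a:kb_prod",
    kv_storage="PGKVStorage",
    vector_storage="PGVectorStorage",
    graph_storage="PGGraphStorage",
    doc_status_storage="PGDocStatusStorage",
)

tenant_b = LightRAG(
    working_dir="./rag_storage/tenant_b",
    workspace="tenant_b:kb_prod",
    kv_storage="PGKVStorage",
    vector_storage="PGVectorStorage",
    graph_storage="PGGraphStorage",
    doc_status_storage="PGDocStatusStorage",
)
# Queries issued through tenant_a only ever match rows in its own workspace.
```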
---

## Migration Between Backends

### Export/Import Pattern

```python
# Export from source
source_rag = LightRAG(
    kv_storage="JsonKVStorage",
    vector_storage="NanoVectorDBStorage",
    graph_storage="NetworkXStorage",
)

# Initialize source
await source_rag.initialize_storages()

# Get all data
docs = await source_rag.full_docs.get_all()
chunks = await source_rag.text_chunks.get_all()
# ... export other data

# Import to target
target_rag = LightRAG(
    kv_storage="PGKVStorage",
    vector_storage="PGVectorStorage",
    graph_storage="PGGraphStorage",
)
await target_rag.initialize_storages()

await target_rag.full_docs.upsert(docs)
await target_rag.text_chunks.upsert(chunks)
# ... import other data

await target_rag.finalize_storages()
```

---

## Best Practices

### Production Recommendations

1. **Use the PostgreSQL full stack** for simplicity and reliability
2. **Enable connection pooling** for high concurrency
3. **Create indexes** on frequently queried columns
4. **Monitor storage growth** and plan capacity
5. **Take regular backups** with point-in-time recovery
6. **Use SSL/TLS** for database connections

### Performance Tuning

```bash
# PostgreSQL tuning
POSTGRES_MAX_CONNECTIONS=200
POSTGRES_VECTOR_INDEX_TYPE=hnsw
POSTGRES_HNSW_M=32
POSTGRES_HNSW_EF=128

# LightRAG tuning
MAX_PARALLEL_INSERT=4
EMBEDDING_BATCH_NUM=20
MAX_ASYNC=8
```

---

**Version**: 1.4.9.1 | **License**: MIT