# LightRAG Multi-Tenant Stack with PostgreSQL A complete, production-ready multi-tenant RAG (Retrieval-Augmented Generation) system using LightRAG with PostgreSQL as the backend. ## πŸš€ Quick Start ```bash # 1. Initialize environment (first time only) make setup # 2. Start all services make up # 3. Initialize database schema make init-db # 4. View service status make status # 5. Access the application # WebUI: http://localhost:3001 # API Server: http://localhost:8000 # API Docs: http://localhost:8000/docs ## πŸ” Demo credentials (local/dev only) Use the following defaults when running the stack locally for demonstrations or testing. These come from `starter/env.example` β€” change them in `.env` for any shared or production deployments. ``` User: lightrag Password: lightrag_secure_password Database: lightrag_multitenant Host: postgres (inside Docker) Port: 5432 (internal-only; not forwarded to localhost by default) ``` ⚠️ Note: These credentials are for development/demos only. Always pick strong, unique passwords for production and avoid committing secrets to source control. ``` ## πŸ“‹ System Architecture ``` β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β” β”‚ LightRAG Multi-Tenant Stack β”‚ β”œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€ β”‚ β”‚ β”‚ β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β” β”‚ β”‚ β”‚ Web UI (React) β”‚ β”‚ β”‚ β”‚ http://localhost:3001 β”‚ β”‚ β”‚ β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜ β”‚ β”‚ β”‚ β”‚ β”‚ β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β–Όβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β” β”‚ β”‚ β”‚ LightRAG API Server (FastAPI) β”‚ β”‚ β”‚ β”‚ http://localhost:8000 β”‚ β”‚ β”‚ β”‚ β”‚ β”‚ β”‚ β”‚ Multi-Tenant Context: (tenant_id, kb_id) β”‚ β”‚ β”‚ β”‚ - Enforces data isolation at API level β”‚ β”‚ β”‚ β”‚ - Routes queries to appropriate backends β”‚ β”‚ β”‚ β”‚ - Manages document processing β”‚ β”‚ β”‚ β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜ β”‚ β”‚ β”‚ β”‚ β”‚ β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”Όβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β” β”‚ β”‚ β”‚ β”‚ β”‚ β”‚ β”‚ β”Œβ”€β”€β”€β”€β–Όβ”€β”€β”€β”€β”€β”€β” β”Œβ”€β”€β”€β”€β”€β–Όβ”€β”€β”€β”€β”€β”€β” β”Œβ”€β”€β”€β”€β–Όβ”€β”€β”€β”€β”€β”€β” β”‚ β”‚ β”‚ PostgreSQLβ”‚ β”‚ Redis β”‚ β”‚ Embeddingβ”‚ β”‚ β”‚ β”‚ Storage β”‚ β”‚ Cache β”‚ β”‚ Service β”‚ β”‚ β”‚ β”‚ β”‚ β”‚ β”‚ β”‚ (Ollama) β”‚ β”‚ β”‚ β”‚ - KV β”‚ β”‚ LLM cache β”‚ β”‚ β”‚ β”‚ β”‚ β”‚ - Documentsβ”‚ β”‚ Session β”‚ β”‚ bge-m3 β”‚ β”‚ β”‚ β”‚ - Entitiesβ”‚ β”‚ Temporary β”‚ β”‚ β”‚ β”‚ β”‚ β”‚ - Relationsβ”‚ β”‚ Data β”‚ β”‚ β”‚ β”‚ β”‚ β”‚ - Vectors β”‚ β”‚ β”‚ β”‚ β”‚ β”‚ β”‚ β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜ β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜ β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜ β”‚ β”‚ β”‚ β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜ ``` ## 🎯 Key Features ### Multi-Tenant Data Isolation - **Composite Key Pattern**: Each resource identified by `(tenant_id, kb_id, resource_id)` - **Database-Level Enforcement**: Queries automatically scoped to tenant/KB - **Cross-Tenant Access Prevention**: Impossible to retrieve data from other tenants - **Complete Isolation**: Works across all 10 storage backends ### Storage Architecture - **PostgreSQL**: Primary storage with pgvector extension - Key-Value storage (PGKVStorage) - Document metadata (PGDocStatusStorage) - Knowledge graph (PGGraphStorage) - Vector embeddings (PGVectorStorage) - **Redis**: Caching and session management - **Embedding Service**: Ollama (configurable to OpenAI, Jina, etc.) ### Supported Tenants & Knowledge Bases Default sample data (automatically created): ``` Tenant: 595ea68b-0f3a-4dbe-8a86-9276a1bbd10c └─ kb-prod (Production KB) └─ kb-dev (Development KB) Tenant: 44bf3e0d-d633-4dea-9b74-3e24140cd7e3 └─ kb-main (Main KB) └─ kb-backup (Backup KB) ``` ## πŸ“– Makefile Commands ### Setup & Configuration ```bash make help # Show all available commands make setup # Initialize .env file (first time only) make init-db # Initialize PostgreSQL database schema ``` ### Service Control ```bash make up # Start all services make down # Stop all services make restart # Restart all services make logs # Stream logs from all services make logs-api # Stream logs from API only make logs-db # Stream logs from PostgreSQL only make logs-webui # Stream logs from WebUI only make status # Show status of all running services ``` ### Database Management ```bash make db-shell # Connect to PostgreSQL interactive shell make db-backup # Create database backup make db-restore # Restore from latest backup make db-reset # Delete and reinitialize database (⚠️ WARNING) ``` ### Health & Testing ```bash make api-health # Check API health status make test # Run multi-tenant tests make test-isolation # Run tenant isolation tests ``` ### Cleanup & Maintenance ```bash make clean # Remove stopped containers and dangling images make reset # Full system reset (⚠️ WARNING: deletes all data) make prune # Prune unused Docker resources ``` ## πŸ”§ Configuration ### Environment Variables Edit `.env` file to configure: ```bash # LLM Provider (OpenAI, Ollama, Azure, etc.) LLM_BINDING=openai LLM_MODEL=gpt-4o LLM_BINDING_API_KEY=your_api_key_here # Embedding Service EMBEDDING_BINDING=ollama EMBEDDING_MODEL=bge-m3:latest EMBEDDING_BINDING_HOST=http://localhost:11434 # Database Credentials POSTGRES_USER=lightrag POSTGRES_PASSWORD=lightrag_secure_password # Multi-Tenant Settings DEFAULT_TENANT=default DEFAULT_KB=default # See env.template.example for all available options ``` ## πŸ” Security & Multi-Tenant Isolation ### Isolation Guarantees 1. **Database-Level Filtering**: Every query includes `tenant_id` and `kb_id` constraints 2. **Composite Key Constraints**: Prevents accidental ID collisions between tenants 3. **No Application-Level Trust**: Storage layer enforces isolation even if app code has bugs 4. **Audit Trail**: All operations include tenant context for traceability ### Best Practices βœ… **DO:** - Always pass tenant context to every operation - Use support module helpers for queries - Create composite indexes on (tenant_id, kb_id, ...) - Validate tenant context early in request pipeline - Log all tenant-related operations ❌ **DON'T:** - Query without tenant filtering - Hardcode tenant IDs in application code - Assume application code enforces isolation - Skip index creation after migration - Mix tenants in a single transaction ## πŸ“‹ Service Endpoints | Service | URL | Purpose | |---------|-----|---------| | **WebUI** | `http://localhost:3001` | Interactive frontend for document upload, KB visualization, queries | | **API Server** | `http://localhost:8000` | RESTful API for programmatic access | | **PostgreSQL** | `internal-only (container network)` | Database backend (not exposed to host by default) | | **Redis** | `localhost:6379` | Cache backend (internal only) | | **Health Check** | `http://localhost:8000/health` | API health status | ## πŸ§ͺ Testing Multi-Tenant Features ### Run All Multi-Tenant Tests ```bash make test ``` ### Run Specific Test Suites ```bash # Test tenant isolation make test-isolation # Test PostgreSQL backend pytest tests/test_multi_tenant_backends.py::TestPostgreSQLTenantSupport -v # Test data integrity pytest tests/test_multi_tenant_backends.py::TestDataIntegrity -v ``` ### Manual Testing 1. **Create document for tenant "595ea68b-0f3a-4dbe-8a86-9276a1bbd10c"**: ```bash curl -X POST http://localhost:8000/api/v1/insert \ -H "Content-Type: application/json" \ -H "X-Tenant-Id: 595ea68b-0f3a-4dbe-8a86-9276a1bbd10c" \ -H "X-KB-Id: kb-prod" \ -d '{"document": "Sample document"}' ``` 2. **Query as "595ea68b-0f3a-4dbe-8a86-9276a1bbd10c"**: ```bash curl "http://localhost:8000/api/v1/query" \ -H "X-Tenant-Id: 595ea68b-0f3a-4dbe-8a86-9276a1bbd10c" \ -H "X-KB-Id: kb-prod" \ -G --data-urlencode "param=test" ``` 3. **Verify isolation** - query with different tenant: ```bash curl "http://localhost:8000/api/v1/query" \ -H "X-Tenant-Id: 44bf3e0d-d633-4dea-9b74-3e24140cd7e3" \ -H "X-KB-Id: kb-main" \ -G --data-urlencode "param=test" # Should return different or empty results ``` ## πŸ“¦ Docker Services ### PostgreSQL (pgvector/pgvector:pg15-latest) - **Purpose**: Primary data storage with vector support - **Volume**: `postgres_data` (persists database files) - **Port**: 5432 (internal), configurable via `POSTGRES_PORT` - **Health Check**: Every 10 seconds ### Redis (redis:7-alpine) - **Purpose**: Caching, sessions, temporary data - **Volume**: `redis_data` (persists snapshot) - **Port**: 6379 (internal), configurable via `REDIS_PORT` - **Health Check**: Every 10 seconds ### LightRAG API - **Port**: 8621 (internal), 8000 (external/host) - **Volume**: `./data/*` (documents, storage, tiktoken cache) - **Dependencies**: PostgreSQL, Redis - **Health Check**: Every 30 seconds - **Resources**: Limited to 2 CPUs / 4GB RAM ### Web UI - **Port**: 3000 (internal), 3001 (external/host) - **Framework**: React + Vite - **Dependencies**: LightRAG API - **Health Check**: Every 30 seconds ## πŸ› Troubleshooting ### Services not starting? ```bash # Check service status make status # View detailed logs make logs # Check specific service make logs-api ``` ### Database connection error? ```bash # Verify database is ready make api-health # Check PostgreSQL directly make db-shell # Reinitialize database make db-reset ``` ### API responding slowly? ```bash # Check resource usage docker stats lightrag-api # View API logs for errors make logs-api # Restart API service docker compose -p lightrag-multitenant restart lightrag-api ``` ### Data isolation issues? ```bash # Check tenant context in logs make logs | grep -i tenant # Verify database schema make db-shell # \dt (list tables) # \di (list indexes) ``` ## πŸ“‚ Directory Structure ``` starter/ β”œβ”€β”€ Makefile # Main command interface β”œβ”€β”€ docker-compose.yml # Docker services definition β”œβ”€β”€ env.template.example # Environment variables template β”œβ”€β”€ init-postgres.sql # PostgreSQL initialization (optional) β”œβ”€β”€ README.md # This file β”œβ”€β”€ data/ β”‚ β”œβ”€β”€ inputs/ # Document input directory β”‚ β”œβ”€β”€ rag_storage/ # LightRAG storage β”‚ └── tiktoken/ # Tiktoken cache └── backups/ # Database backups (created by make db-backup) ``` ## πŸ”„ Data Migration ### Backup Database ```bash make db-backup # Backs up to: ./backups/lightrag_backup_YYYYMMDD_HHMMSS.sql ``` ### Restore Database ```bash make db-restore # Restores from latest backup in ./backups/ ``` ### Export Data for Another Tenant ```bash # Export make db-shell \COPY (SELECT * FROM documents WHERE tenant_id='acme-corp') TO 'acme-corp-export.csv' CSV HEADER; \q # Import make db-shell \COPY documents FROM 'acme-corp-export.csv' CSV HEADER; \q ``` ## πŸš€ Production Deployment For production deployments: 1. **Use strong passwords**: Update `POSTGRES_PASSWORD` and `REDIS_PASSWORD` 2. **Enable SSL**: Uncomment SSL configuration in `.env` 3. **Use external LLM provider**: Configure production API keys 4. **Set up monitoring**: Monitor logs and health endpoints 5. **Regular backups**: Schedule `make db-backup` via cron 6. **Resource limits**: Adjust resource limits in docker-compose.yml 7. **Network isolation**: Use only internal networks, expose via proxy ## πŸ“ API Usage Examples ### Using Multi-Tenant Context ```python import requests BASE_URL = "http://localhost:8000" # Headers with tenant context headers = { "X-Tenant-Id": "acme-corp", "X-KB-Id": "kb-prod", "Content-Type": "application/json" } # Insert document response = requests.post( f"{BASE_URL}/api/v1/insert", headers=headers, json={"document": "Company policy document"} ) # Query with tenant isolation response = requests.get( f"{BASE_URL}/api/v1/query", headers=headers, params={"param": "policy"}, params={"top_k": 5} ) # Results are automatically isolated to acme-corp/kb-prod print(response.json()) ``` ### Python SDK Example ```python from lightrag import LightRAG # Initialize with tenant context rag = LightRAG( tenant_id="acme-corp", kb_id="kb-prod", storage_type="PostgreSQL", llm_model_name="gpt-4o", embedding_model_name="bge-m3:latest" ) # Insert document (automatically scoped to tenant/kb) rag.insert("Company documentation", source="internal") # Query (automatically scoped to tenant/kb) results = rag.query("What is the company policy?") print(results) ``` ## πŸ“š Documentation References - **Multi-Tenant Architecture**: See `docs/0001-multi-tenant-architecture.md` - **LightRAG Documentation**: https://github.com/HKUDS/LightRAG - **PostgreSQL Vector Extension**: https://github.com/pgvector/pgvector - **Docker Compose Documentation**: https://docs.docker.com/compose/ ## πŸ†˜ Support & Issues ### Common Issues **Q: Port already in use?** ```bash # Change port in .env WEBUI_PORT=3001 API_PORT=9622 POSTGRES_PORT=5433 ``` **Q: Out of memory?** ```bash # Reduce resource limits in docker-compose.yml or adjust system resources ``` **Q: API not responding?** ```bash # Check if services are running make ps # View logs make logs # Restart services make down && make up ``` **Q: Database errors?** ```bash # Connect to database shell make db-shell # Check table structure \d documents # Check indexes \di ``` ## πŸ“„ License LightRAG is licensed under MIT License. See LICENSE file for details. ## πŸ™‹ Contributing Contributions are welcome! Please refer to the main LightRAG repository for contribution guidelines. --- **Last Updated**: November 20, 2025 **Status**: Production Ready **Version**: 1.0 For more information about multi-tenant features, see the architecture documentation in `docs/0001-multi-tenant-architecture.md`