LightRAG Multi-Tenant Stack with PostgreSQL
A complete, production-ready multi-tenant RAG (Retrieval-Augmented Generation) system using LightRAG with PostgreSQL as the backend.
🚀 Quick Start
# 1. Initialize environment (first time only)
make setup
# 2. Start all services
make up
# 3. Initialize database schema
make init-db
# 4. View service status
make status
# 5. Access the application
# WebUI: http://localhost:3001
# API Server: http://localhost:8000
# API Docs: http://localhost:8000/docs
🔐 Demo credentials (local/dev only)
Use the following defaults when running the stack locally for demonstrations or testing. These come from `starter/env.example` — change them in `.env` for any shared or production deployments.
- User: lightrag
- Password: lightrag_secure_password
- Database: lightrag_multitenant
- Host: postgres (inside Docker)
- Port: 5432 (internal-only; not forwarded to localhost by default)
⚠️ Note: These credentials are for development/demos only. Always pick strong, unique passwords for production and avoid committing secrets to source control.
📋 System Architecture
┌─────────────────────────────────────────────────────────────────┐
│ LightRAG Multi-Tenant Stack │
├─────────────────────────────────────────────────────────────────┤
│ │
│ ┌──────────────────────────────────────────────────────────┐ │
│ │ Web UI (React) │ │
│ │ http://localhost:3001 │ │
│ └────────────────────┬─────────────────────────────────────┘ │
│ │ │
│ ┌────────────────────▼─────────────────────────────────────┐ │
│ │ LightRAG API Server (FastAPI) │ │
│ │ http://localhost:8000 │ │
│ │ │ │
│ │ Multi-Tenant Context: (tenant_id, kb_id) │ │
│ │ - Enforces data isolation at API level │ │
│ │ - Routes queries to appropriate backends │ │
│ │ - Manages document processing │ │
│ └────────────────────┬────────────────────────────────────┘ │
│ │ │
│ ┌───────────────┼───────────────┐ │
│ │ │ │ │
│ ┌────▼──────┐ ┌─────▼──────┐ ┌────▼──────┐ │
│ │ PostgreSQL│ │ Redis │ │ Embedding│ │
│ │ Storage │ │ Cache │ │ Service │ │
│ │ │ │ │ │ (Ollama) │ │
│ │ - KV │ │ LLM cache │ │ │ │
│ │ - Documents│ │ Session │ │ bge-m3 │ │
│ │ - Entities│ │ Temporary │ │ │ │
│ │ - Relations│ │ Data │ │ │ │
│ │ - Vectors │ │ │ │ │ │
│ └───────────┘ └────────────┘ └───────────┘ │
│ │
└─────────────────────────────────────────────────────────────────┘
🎯 Key Features
Multi-Tenant Data Isolation
- Composite Key Pattern: Each resource is identified by (tenant_id, kb_id, resource_id)
- Database-Level Enforcement: Queries are automatically scoped to tenant/KB
- Cross-Tenant Access Prevention: Scoped queries cannot return data that belongs to another tenant
- Complete Isolation: Works across all 10 storage backends
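The composite key pattern can be illustrated with a small in-memory sketch. `ResourceKey` and `ScopedStore` are hypothetical names used only for illustration; they are not LightRAG classes, and the real storage backends are PostgreSQL-based.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ResourceKey:
    """Composite key: a resource is addressable only with its full tenant context."""
    tenant_id: str
    kb_id: str
    resource_id: str


class ScopedStore:
    """Toy in-memory store (hypothetical, for illustration only)."""

    def __init__(self) -> None:
        self._rows = {}

    def put(self, tenant_id: str, kb_id: str, resource_id: str, value: str) -> None:
        self._rows[ResourceKey(tenant_id, kb_id, resource_id)] = value

    def get(self, tenant_id: str, kb_id: str, resource_id: str) -> Optional[str]:
        # The lookup key always includes the caller's tenant and KB, so a
        # matching resource_id stored under another tenant is never returned.
        return self._rows.get(ResourceKey(tenant_id, kb_id, resource_id))


store = ScopedStore()
store.put("tenant-a", "kb-prod", "doc-1", "Tenant A handbook")
store.put("tenant-b", "kb-main", "doc-1", "Tenant B handbook")

print(store.get("tenant-a", "kb-prod", "doc-1"))  # Tenant A's own document
print(store.get("tenant-b", "kb-prod", "doc-1"))  # wrong tenant/KB pair: nothing found
```

Because the tenant and KB are part of the key itself, the same `resource_id` can exist under two tenants without collision.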
Storage Architecture
- PostgreSQL: Primary storage with pgvector extension
- Key-Value storage (PGKVStorage)
- Document metadata (PGDocStatusStorage)
- Knowledge graph (PGGraphStorage)
- Vector embeddings (PGVectorStorage)
- Redis: Caching and session management
- Embedding Service: Ollama (configurable to OpenAI, Jina, etc.)
Supported Tenants & Knowledge Bases
Default sample data (automatically created):
Tenant: 595ea68b-0f3a-4dbe-8a86-9276a1bbd10c
├─ kb-prod (Production KB)
└─ kb-dev (Development KB)
Tenant: 44bf3e0d-d633-4dea-9b74-3e24140cd7e3
├─ kb-main (Main KB)
└─ kb-backup (Backup KB)
📖 Makefile Commands
Setup & Configuration
make help # Show all available commands
make setup # Initialize .env file (first time only)
make init-db # Initialize PostgreSQL database schema
Service Control
make up # Start all services
make down # Stop all services
make restart # Restart all services
make logs # Stream logs from all services
make logs-api # Stream logs from API only
make logs-db # Stream logs from PostgreSQL only
make logs-webui # Stream logs from WebUI only
make status # Show status of all running services
Database Management
make db-shell # Connect to PostgreSQL interactive shell
make db-backup # Create database backup
make db-restore # Restore from latest backup
make db-reset # Delete and reinitialize database (⚠️ WARNING)
Health & Testing
make api-health # Check API health status
make test # Run multi-tenant tests
make test-isolation # Run tenant isolation tests
Cleanup & Maintenance
make clean # Remove stopped containers and dangling images
make reset # Full system reset (⚠️ WARNING: deletes all data)
make prune # Prune unused Docker resources
🔧 Configuration
Environment Variables
Edit .env file to configure:
# LLM Provider (OpenAI, Ollama, Azure, etc.)
LLM_BINDING=openai
LLM_MODEL=gpt-4o
LLM_BINDING_API_KEY=your_api_key_here
# Embedding Service
EMBEDDING_BINDING=ollama
EMBEDDING_MODEL=bge-m3:latest
EMBEDDING_BINDING_HOST=http://localhost:11434
# Database Credentials
POSTGRES_USER=lightrag
POSTGRES_PASSWORD=lightrag_secure_password
# Multi-Tenant Settings
DEFAULT_TENANT=default
DEFAULT_KB=default
# See env.example for all available options
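As a rough illustration of how these variables might be consumed, the sketch below reads a subset of them into a typed settings object. `StackSettings` and `load_settings` are hypothetical helpers; the fallback values are copied from the example above, and how the API server actually parses its environment is not shown in this README.

```python
import os
from dataclasses import dataclass
from typing import Mapping


@dataclass(frozen=True)
class StackSettings:
    """Subset of the variables listed above (hypothetical container)."""
    llm_binding: str
    llm_model: str
    embedding_binding: str
    embedding_model: str
    default_tenant: str
    default_kb: str


def load_settings(env: Mapping[str, str] = os.environ) -> StackSettings:
    # Fallbacks mirror the example values in this README; they are assumptions,
    # not defaults guaranteed by the server.
    return StackSettings(
        llm_binding=env.get("LLM_BINDING", "openai"),
        llm_model=env.get("LLM_MODEL", "gpt-4o"),
        embedding_binding=env.get("EMBEDDING_BINDING", "ollama"),
        embedding_model=env.get("EMBEDDING_MODEL", "bge-m3:latest"),
        default_tenant=env.get("DEFAULT_TENANT", "default"),
        default_kb=env.get("DEFAULT_KB", "default"),
    )


settings = load_settings({"LLM_BINDING": "ollama", "LLM_MODEL": "example-model"})
print(settings)
```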
🔐 Security & Multi-Tenant Isolation
Isolation Guarantees
- Database-Level Filtering: Every query includes tenant_id and kb_id constraints
- Composite Key Constraints: Prevents accidental ID collisions between tenants
- No Application-Level Trust: Storage layer enforces isolation even if app code has bugs
- Audit Trail: All operations include tenant context for traceability
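The first two guarantees can be sketched with a composite primary key and a query helper that always binds the tenant and KB. This sketch uses Python's built-in `sqlite3` as a stand-in for PostgreSQL so that it runs without the stack; the `documents` columns other than `tenant_id` and `kb_id` are assumptions, and `fetch_documents` is a hypothetical helper.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE documents (
        tenant_id   TEXT NOT NULL,
        kb_id       TEXT NOT NULL,
        resource_id TEXT NOT NULL,
        content     TEXT NOT NULL,
        PRIMARY KEY (tenant_id, kb_id, resource_id)
    )
    """
)
rows = [
    ("tenant-a", "kb-prod", "doc-1", "Tenant A policy"),
    # Same resource_id under another tenant: allowed by the composite key.
    ("tenant-b", "kb-main", "doc-1", "Tenant B policy"),
]
conn.executemany("INSERT INTO documents VALUES (?, ?, ?, ?)", rows)


def fetch_documents(conn, tenant_id, kb_id):
    """Return (resource_id, content) rows for exactly one tenant/KB pair."""
    cursor = conn.execute(
        "SELECT resource_id, content FROM documents "
        "WHERE tenant_id = ? AND kb_id = ? ORDER BY resource_id",
        (tenant_id, kb_id),
    )
    return cursor.fetchall()


print(fetch_documents(conn, "tenant-a", "kb-prod"))
print(fetch_documents(conn, "tenant-a", "kb-main"))  # another tenant's KB: no rows
```

Routing every read through a helper like this keeps the tenant filter out of hand-written SQL, which is where it is most easily forgotten.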
Best Practices
✅ DO:
- Always pass tenant context to every operation
- Use support module helpers for queries
- Create composite indexes on (tenant_id, kb_id, ...)
- Validate tenant context early in request pipeline
- Log all tenant-related operations
❌ DON'T:
- Query without tenant filtering
- Hardcode tenant IDs in application code
- Assume application code enforces isolation
- Skip index creation after migration
- Mix tenants in a single transaction
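Validating tenant context early could look like the following minimal sketch. It assumes the context arrives in the `X-Tenant-Id` and `X-KB-Id` headers used by the curl examples in this README; `TenantContext`, `MissingTenantContext`, and `require_tenant_context` are hypothetical names, not part of the LightRAG API.

```python
from typing import Mapping, NamedTuple


class TenantContext(NamedTuple):
    tenant_id: str
    kb_id: str


class MissingTenantContext(ValueError):
    """Raised when a request does not carry a complete tenant context."""


def require_tenant_context(headers: Mapping[str, str]) -> TenantContext:
    # HTTP header names are case-insensitive, so normalise before lookup.
    lowered = {name.lower(): value.strip() for name, value in headers.items()}
    tenant_id = lowered.get("x-tenant-id", "")
    kb_id = lowered.get("x-kb-id", "")
    if not tenant_id or not kb_id:
        raise MissingTenantContext("X-Tenant-Id and X-KB-Id headers are both required")
    return TenantContext(tenant_id, kb_id)


context = require_tenant_context({"X-Tenant-Id": "tenant-a", "x-kb-id": "kb-prod"})
print(context)
```

Rejecting the request before any storage call means a missing header fails loudly instead of silently falling back to a shared or default scope.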
📋 Service Endpoints
| Service | URL | Purpose |
|---|---|---|
| WebUI | http://localhost:3001 | Interactive frontend for document upload, KB visualization, queries |
| API Server | http://localhost:8000 | RESTful API for programmatic access |
| PostgreSQL | internal-only (container network) | Database backend (not exposed to host by default) |
| Redis | localhost:6379 | Cache backend (internal only) |
| Health Check | http://localhost:8000/health | API health status |
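A script that depends on the API can poll the health endpoint before sending requests. The sketch below assumes only that `http://localhost:8000/health` answers with HTTP 200 once the API is ready; the response body is not inspected, and `http_probe` and `wait_until_healthy` are hypothetical helpers.

```python
import time
import urllib.error
import urllib.request
from typing import Callable

HEALTH_URL = "http://localhost:8000/health"


def http_probe(url: str = HEALTH_URL, timeout: float = 2.0) -> bool:
    """Return True if the endpoint answers with HTTP 200."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.status == 200
    except (urllib.error.URLError, OSError):
        # Connection refused, timeout, or a non-2xx status: not healthy yet.
        return False


def wait_until_healthy(probe: Callable[[], bool] = http_probe,
                       attempts: int = 30, delay: float = 1.0) -> bool:
    """Call probe() up to `attempts` times, sleeping `delay` seconds between tries."""
    for attempt in range(attempts):
        if probe():
            return True
        if attempt < attempts - 1:
            time.sleep(delay)
    return False


if __name__ == "__main__":
    print("API ready" if wait_until_healthy() else "API did not become healthy")
```

The probe is passed in as a callable so the retry loop can be exercised without a running stack.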
🧪 Testing Multi-Tenant Features
Run All Multi-Tenant Tests
make test
Run Specific Test Suites
# Test tenant isolation
make test-isolation
# Test PostgreSQL backend
pytest tests/test_multi_tenant_backends.py::TestPostgreSQLTenantSupport -v
# Test data integrity
pytest tests/test_multi_tenant_backends.py::TestDataIntegrity -v
Manual Testing
- Create document for tenant "595ea68b-0f3a-4dbe-8a86-9276a1bbd10c":
curl -X POST http://localhost:8000/api/v1/insert \
-H "Content-Type: application/json" \
-H "X-Tenant-Id: 595ea68b-0f3a-4dbe-8a86-9276a1bbd10c" \
-H "X-KB-Id: kb-prod" \
-d '{"document": "Sample document"}'
- Query as "595ea68b-0f3a-4dbe-8a86-9276a1bbd10c":
curl "http://localhost:8000/api/v1/query" \
-H "X-Tenant-Id: 595ea68b-0f3a-4dbe-8a86-9276a1bbd10c" \
-H "X-KB-Id: kb-prod" \
-G --data-urlencode "param=test"
- Verify isolation - query with different tenant:
curl "http://localhost:8000/api/v1/query" \
-H "X-Tenant-Id: 44bf3e0d-d633-4dea-9b74-3e24140cd7e3" \
-H "X-KB-Id: kb-main" \
-G --data-urlencode "param=test"
# Should return different or empty results
📦 Docker Services
PostgreSQL (pgvector/pgvector:pg15-latest)
- Purpose: Primary data storage with vector support
- Volume: postgres_data (persists database files)
- Port: 5432 (internal), configurable via POSTGRES_PORT
- Health Check: Every 10 seconds
Redis (redis:7-alpine)
- Purpose: Caching, sessions, temporary data
- Volume: redis_data (persists snapshot)
- Port: 6379 (internal), configurable via REDIS_PORT
- Health Check: Every 10 seconds
LightRAG API
- Port: 8621 (internal), 8000 (external/host)
- Volume: ./data/* (documents, storage, tiktoken cache)
- Dependencies: PostgreSQL, Redis
- Health Check: Every 30 seconds
- Resources: Limited to 2 CPUs / 4GB RAM
Web UI
- Port: 3000 (internal), 3001 (external/host)
- Framework: React + Vite
- Dependencies: LightRAG API
- Health Check: Every 30 seconds
🐛 Troubleshooting
Services not starting?
# Check service status
make status
# View detailed logs
make logs
# Check specific service
make logs-api
Database connection error?
# Verify database is ready
make api-health
# Check PostgreSQL directly
make db-shell
# Reinitialize database
make db-reset
API responding slowly?
# Check resource usage
docker stats lightrag-api
# View API logs for errors
make logs-api
# Restart API service
docker compose -p lightrag-multitenant restart lightrag-api
Data isolation issues?
# Check tenant context in logs
make logs | grep -i tenant
# Verify database schema
make db-shell
# \dt (list tables)
# \di (list indexes)
📂 Directory Structure
starter/
├── Makefile # Main command interface
├── docker-compose.yml # Docker services definition
├── env.example # Environment variables template
├── init-postgres.sql # PostgreSQL initialization (optional)
├── README.md # This file
├── data/
│ ├── inputs/ # Document input directory
│ ├── rag_storage/ # LightRAG storage
│ └── tiktoken/ # Tiktoken cache
└── backups/ # Database backups (created by make db-backup)
🔄 Data Migration
Backup Database
make db-backup
# Backs up to: ./backups/lightrag_backup_YYYYMMDD_HHMMSS.sql
Restore Database
make db-restore
# Restores from latest backup in ./backups/
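Selecting the latest backup can be sketched from the file naming shown above (`lightrag_backup_YYYYMMDD_HHMMSS.sql`). How `make db-restore` actually picks a file is not shown in this README, so `latest_backup` and `latest_backup_in` are hypothetical helpers that only illustrate the idea.

```python
import re
from datetime import datetime
from pathlib import Path
from typing import Iterable, Optional

BACKUP_PATTERN = re.compile(r"^lightrag_backup_(\d{8}_\d{6})\.sql$")


def latest_backup(names: Iterable[str]) -> Optional[str]:
    """Return the file name with the newest embedded timestamp, or None."""
    dated = []
    for name in names:
        match = BACKUP_PATTERN.match(name)
        if not match:
            continue  # ignore files that do not follow the backup naming scheme
        try:
            stamp = datetime.strptime(match.group(1), "%Y%m%d_%H%M%S")
        except ValueError:
            continue  # digits present but not a valid date/time
        dated.append((stamp, name))
    return max(dated)[1] if dated else None


def latest_backup_in(directory: str = "./backups") -> Optional[str]:
    """Apply latest_backup() to the files in a backup directory."""
    folder = Path(directory)
    if not folder.is_dir():
        return None
    return latest_backup(path.name for path in folder.iterdir())


example = [
    "lightrag_backup_20251119_235959.sql",
    "lightrag_backup_20251120_000001.sql",
    "notes.txt",
]
print(latest_backup(example))
```

Parsing the timestamp from the name, rather than relying on file modification time, keeps the choice stable when backups are copied between machines.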
Export Data for Another Tenant
# Export
make db-shell
\COPY (SELECT * FROM documents WHERE tenant_id='acme-corp') TO 'acme-corp-export.csv' CSV HEADER;
\q
# Import
make db-shell
\COPY documents FROM 'acme-corp-export.csv' CSV HEADER;
\q
🚀 Production Deployment
For production deployments:
- Use strong passwords: Update POSTGRES_PASSWORD and REDIS_PASSWORD
- Enable SSL: Uncomment SSL configuration in .env
- Use external LLM provider: Configure production API keys
- Set up monitoring: Monitor logs and health endpoints
- Regular backups: Schedule make db-backup via cron
- Resource limits: Adjust resource limits in docker-compose.yml
- Network isolation: Use only internal networks, expose via proxy
📝 API Usage Examples
Using Multi-Tenant Context
import requests
BASE_URL = "http://localhost:8000"
# Headers with tenant context
headers = {
"X-Tenant-Id": "acme-corp",
"X-KB-Id": "kb-prod",
"Content-Type": "application/json"
}
# Insert document
response = requests.post(
f"{BASE_URL}/api/v1/insert",
headers=headers,
json={"document": "Company policy document"}
)
# Query with tenant isolation
response = requests.get(
f"{BASE_URL}/api/v1/query",
headers=headers,
# A keyword argument may appear only once, so both query parameters share one dict
params={"param": "policy", "top_k": 5}
)
# Results are automatically isolated to acme-corp/kb-prod
print(response.json())
Python SDK Example
from lightrag import LightRAG
# Initialize with tenant context
rag = LightRAG(
tenant_id="acme-corp",
kb_id="kb-prod",
storage_type="PostgreSQL",
llm_model_name="gpt-4o",
embedding_model_name="bge-m3:latest"
)
# Insert document (automatically scoped to tenant/kb)
rag.insert("Company documentation", source="internal")
# Query (automatically scoped to tenant/kb)
results = rag.query("What is the company policy?")
print(results)
📚 Documentation References
- Multi-Tenant Architecture: See docs/0001-multi-tenant-architecture.md
- LightRAG Documentation: https://github.com/HKUDS/LightRAG
- PostgreSQL Vector Extension: https://github.com/pgvector/pgvector
- Docker Compose Documentation: https://docs.docker.com/compose/
🆘 Support & Issues
Common Issues
Q: Port already in use?
# Change port in .env
WEBUI_PORT=3001
API_PORT=9622
POSTGRES_PORT=5433
Q: Out of memory?
# Reduce resource limits in docker-compose.yml or adjust system resources
Q: API not responding?
# Check if services are running
make status
# View logs
make logs
# Restart services
make down && make up
Q: Database errors?
# Connect to database shell
make db-shell
# Check table structure
\d documents
# Check indexes
\di
📄 License
LightRAG is licensed under MIT License. See LICENSE file for details.
🙋 Contributing
Contributions are welcome! Please refer to the main LightRAG repository for contribution guidelines.
Last Updated: November 20, 2025
Status: Production Ready
Version: 1.0
For more information about multi-tenant features, see the architecture documentation in docs/0001-multi-tenant-architecture.md