Multi-Tenancy Implementation Plan
Goal: Upgrade LightRAG to a battle-tested, production-grade multi-tenant architecture.
Phase 1: Tenant Identification & Middleware
- Step 1.1: Create `lightrag/api/middleware/tenant.py`.
  - Implement `TenantMiddleware` to extract tenant from subdomain (optional) and JWT.
  - Use Redis to cache `subdomain -> tenant_id` resolution.
  - Set `request.state.tenant_id`.
- Step 1.2: Update `lightrag/api/dependencies.py`.
  - Update `get_tenant_context` to read from `request.state`.
  - Remove reliance on `X-Tenant-ID` header when subdomain/JWT is present (enforce source of truth).
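The resolution order in Phase 1 can be sketched in plain Python, independent of any web framework. This is a minimal sketch under stated assumptions, not the planned `TenantMiddleware`: `resolve_tenant`, `subdomain_of`, `TenantResolutionError`, the `tenant_id` claim name, and the `example.com` base domain are all hypothetical. The `cache` mapping stands in for Redis, and `jwt_claims` is assumed to come from a token whose signature has already been verified.

```python
from typing import Callable, Mapping, MutableMapping, Optional


class TenantResolutionError(Exception):
    """Raised when no tenant can be determined or two sources disagree."""


def subdomain_of(host: str, base_domain: str) -> Optional[str]:
    """Return the label(s) in front of base_domain, or None if there are none."""
    hostname = host.split(":", 1)[0].lower()  # drop an optional port
    suffix = "." + base_domain.lower()
    if not hostname.endswith(suffix):
        return None
    return hostname[: -len(suffix)] or None


def resolve_tenant(
    host: str,
    jwt_claims: Mapping[str, str],
    cache: MutableMapping[str, str],  # stand-in for the Redis cache
    lookup: Callable[[str], Optional[str]],  # slow path, e.g. a database query
    header_tenant: Optional[str] = None,  # value of X-Tenant-ID, if sent
    base_domain: str = "example.com",  # assumption: deployment-specific
) -> str:
    from_jwt = jwt_claims.get("tenant_id")

    # Subdomain is optional; consult the cache before the slow lookup.
    from_subdomain = None
    sub = subdomain_of(host, base_domain)
    if sub is not None:
        from_subdomain = cache.get(sub)
        if from_subdomain is None:
            from_subdomain = lookup(sub)
            if from_subdomain is not None:
                cache[sub] = from_subdomain

    if from_jwt and from_subdomain and from_jwt != from_subdomain:
        raise TenantResolutionError("subdomain and JWT name different tenants")

    # The header is only a fallback when neither trusted source is present.
    tenant_id = from_jwt or from_subdomain or header_tenant
    if not tenant_id:
        raise TenantResolutionError("no tenant could be resolved")
    return tenant_id
```

In a real middleware the returned value would be stored on `request.state.tenant_id`, and `get_tenant_context` would read it from there. Whether an unknown subdomain should be rejected outright, rather than ignored as here, is a policy choice the plan leaves open.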
Phase 2: PostgreSQL Row-Level Security (RLS)
- Step 2.1: Update `lightrag/kg/postgres_tenant_support.py`.
  - Add SQL to enable RLS on tables: `ALTER TABLE ... ENABLE ROW LEVEL SECURITY`.
  - Add SQL to create policies: `CREATE POLICY ... USING (tenant_id = current_setting('app.tenant_id'))`.
- Step 2.2: Update Database Connection Logic.
  - In `lightrag/kg/postgres_impl.py` (or equivalent), ensure `app.tenant_id` is set for each session/connection.
  - Use `SET LOCAL app.tenant_id = ...` at the start of transactions.
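The SQL for Phase 2 could be generated along the following lines. This is a sketch under assumptions: `rls_statements` and `set_tenant_query` are hypothetical helpers, the policy name `tenant_isolation` is made up, the `tenant_id` column is assumed to be `text`, and the `$1` placeholder assumes an asyncpg-style driver. Two PostgreSQL details shape it. `SET LOCAL` does not accept bind parameters, so the sketch uses `set_config(name, value, true)`, which has the same transaction-local effect. Table owners bypass RLS unless `FORCE ROW LEVEL SECURITY` is also set, which goes beyond what the plan lists.

```python
import re
from typing import List, Tuple

# Table and policy names cannot be bound as parameters, so validate them.
_IDENT = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")


def rls_statements(table: str, policy: str = "tenant_isolation") -> List[str]:
    """Build the DDL that enables tenant-scoped RLS on one table."""
    for name in (table, policy):
        if not _IDENT.fullmatch(name):
            raise ValueError(f"unsafe SQL identifier: {name!r}")
    return [
        f"ALTER TABLE {table} ENABLE ROW LEVEL SECURITY",
        # Not in the plan: without FORCE, the table owner is exempt from RLS.
        f"ALTER TABLE {table} FORCE ROW LEVEL SECURITY",
        f"CREATE POLICY {policy} ON {table} "
        "USING (tenant_id = current_setting('app.tenant_id'))",
    ]


def set_tenant_query(tenant_id: str) -> Tuple[str, Tuple[str]]:
    """Return (sql, args) that scopes the current transaction to one tenant.

    The third argument `true` makes the setting transaction-local,
    like SET LOCAL, while still allowing tenant_id to be a bound parameter.
    """
    return "SELECT set_config('app.tenant_id', $1, true)", (tenant_id,)
```

The returned query would be executed as the first statement inside each transaction. Because `current_setting('app.tenant_id')` raises an error when the setting was never defined, a connection that skips this step fails instead of silently seeing every tenant's rows.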
Phase 3: MongoDB Strict Scoping
- Step 3.1: Create `lightrag/kg/mongo_repo.py`.
  - Implement `MongoTenantRepo` class.
  - It should take `tenant_id` in `__init__`.
  - Override `find`, `find_one`, `insert_one`, etc., to automatically inject `tenant_id`.
- Step 3.2: Refactor `lightrag/kg/mongo_impl.py`.
  - Use `MongoTenantRepo` instead of raw `motor` collection.
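One possible shape for the repository in Phase 3 is shown below. It is a sketch, not the planned implementation: only the three methods named in the plan are wrapped, and `RecordingCollection` and `demo` are hypothetical stand-ins so the example runs without a MongoDB server. The wrapped object is assumed to follow the `motor` collection interface, where `find` is synchronous and returns a cursor while `find_one` and `insert_one` are coroutines.

```python
import asyncio
from typing import Any, Dict, List, Optional


class MongoTenantRepo:
    """Wraps a motor-style collection and forces tenant_id into every call."""

    def __init__(self, collection: Any, tenant_id: str) -> None:
        if not tenant_id:
            raise ValueError("tenant_id is required")
        self._collection = collection
        self._tenant_id = tenant_id

    def _scoped(self, doc: Optional[Dict[str, Any]]) -> Dict[str, Any]:
        # tenant_id is written last so a caller-supplied value cannot override it.
        return {**(doc or {}), "tenant_id": self._tenant_id}

    def find(self, query=None, *args, **kwargs):
        return self._collection.find(self._scoped(query), *args, **kwargs)

    async def find_one(self, query=None, *args, **kwargs):
        return await self._collection.find_one(self._scoped(query), *args, **kwargs)

    async def insert_one(self, document, *args, **kwargs):
        return await self._collection.insert_one(self._scoped(document), *args, **kwargs)


class RecordingCollection:
    """Hypothetical stand-in for a motor collection; records what it receives."""

    def __init__(self) -> None:
        self.calls: List[tuple] = []

    async def find_one(self, query, *args, **kwargs):
        self.calls.append(("find_one", query))
        return None

    async def insert_one(self, document, *args, **kwargs):
        self.calls.append(("insert_one", document))
        return None


async def demo() -> List[tuple]:
    collection = RecordingCollection()
    repo = MongoTenantRepo(collection, tenant_id="tenant-a")
    await repo.insert_one({"_id": 1, "text": "hello"})
    # A caller trying to read another tenant's data is silently re-scoped.
    await repo.find_one({"_id": 1, "tenant_id": "tenant-b"})
    return collection.calls
```

Because top-level keys in a MongoDB filter are combined with an implicit AND, injecting `tenant_id` at the top level also constrains filters that contain `$or` or other operators. Update, delete, and aggregation methods would need the same treatment before the raw collection can be retired.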
Phase 4: Graph Database Session Wrapper (Neo4j, Memgraph)
- Step 4.1: Create `lightrag/kg/graph_session.py`.
  - Implement `GraphTenantSession` abstract base class.
  - Implement `Neo4jTenantSession` and `MemgraphTenantSession`.
  - Wrap `run` method to inject `tenant_id` parameter and append `WHERE n.tenant_id = $tenant_id` if missing (or rely on strict parameterized queries).
- Step 4.2: Refactor `lightrag/kg/neo4j_impl.py` and `memgraph_impl.py`.
  - Use `GraphTenantSession`.
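The sketch below takes the second option from Step 4.1, strict parameterized queries, because appending a `WHERE` clause to arbitrary Cypher by string manipulation is fragile. It is an illustration under assumptions: the `_execute` hook is hypothetical, the wrapped session is assumed to expose a synchronous `run(query, parameters)` as the Neo4j Python driver does, and Memgraph is assumed to be reached through the same Bolt-compatible driver. An async session would need `async` variants of these methods.

```python
from abc import ABC, abstractmethod
from typing import Any, Dict, Optional


class GraphTenantSession(ABC):
    """Wraps a driver session so every query carries the tenant parameter."""

    def __init__(self, session: Any, tenant_id: str) -> None:
        if not tenant_id:
            raise ValueError("tenant_id is required")
        self._session = session
        self._tenant_id = tenant_id

    def run(self, query: str, parameters: Optional[Dict[str, Any]] = None) -> Any:
        # Crude guard: refuse queries that never mention the tenant parameter.
        # A substring check cannot prove the filter is applied correctly.
        if "$tenant_id" not in query:
            raise ValueError("query must filter on $tenant_id")
        # The wrapper's tenant_id wins over anything the caller passed.
        params = {**(parameters or {}), "tenant_id": self._tenant_id}
        return self._execute(query, params)

    @abstractmethod
    def _execute(self, query: str, params: Dict[str, Any]) -> Any:
        """Send the query to the backend-specific session."""


class Neo4jTenantSession(GraphTenantSession):
    def _execute(self, query: str, params: Dict[str, Any]) -> Any:
        return self._session.run(query, params)


class MemgraphTenantSession(GraphTenantSession):
    # Assumption: same driver interface as Neo4j; kept separate so
    # backend-specific behaviour has a place to live.
    def _execute(self, query: str, params: Dict[str, Any]) -> Any:
        return self._session.run(query, params)
```

Rejecting a query is noisier than rewriting it, but it surfaces unscoped queries during the refactor in Step 4.2 instead of hiding them.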
Phase 5: Vector Database Strict Scoping
- Step 5.1: Create `lightrag/kg/vector_repo.py`.
  - Implement `VectorTenantRepo` abstract base class.
  - Implement specific repositories for Qdrant, Milvus, FAISS, Nano.
    - Qdrant: Automatically add `must` filter for `tenant_id` and `kb_id` to all searches.
    - Milvus: Automatically append `tenant_id == "..."` to expressions.
    - FAISS: Manage tenant-specific indices (e.g., `index_tenant_kb`) to avoid scanning all vectors.
    - Nano: Enforce metadata filtering.
- Step 5.2: Refactor Vector Implementations.
  - Update `qdrant_impl.py`, `milvus_impl.py`, `faiss_impl.py`, `nano_vector_db_impl.py` to use the new repositories.
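The per-backend scoping rules in Step 5.1 reduce to small pure functions, sketched here without any vector-database client. The helper names are hypothetical. The Qdrant filter is written as a plain dict in the JSON shape of Qdrant's filter API, and the FAISS naming scheme `index_<tenant>_<kb>` is one reading of the plan's `index_tenant_kb` example. Identifiers are restricted to letters, digits, and hyphens, which is an assumption made here so that values can be embedded safely in a Milvus expression and so that the underscore stays an unambiguous separator in index names.

```python
import re
from typing import Any, Dict, List, Optional

_SAFE_ID = re.compile(r"[A-Za-z0-9-]+")


def _check_id(value: str) -> str:
    """Reject ids that could break out of an expression or an index name."""
    if not _SAFE_ID.fullmatch(value):
        raise ValueError(f"unsafe identifier: {value!r}")
    return value


def qdrant_filter(
    tenant_id: str, kb_id: str, extra_must: Optional[List[Dict[str, Any]]] = None
) -> Dict[str, Any]:
    """Build a filter whose `must` list always starts with tenant and KB."""
    must: List[Dict[str, Any]] = [
        {"key": "tenant_id", "match": {"value": tenant_id}},
        {"key": "kb_id", "match": {"value": kb_id}},
    ]
    must.extend(extra_must or [])
    return {"must": must}


def milvus_expr(tenant_id: str, expr: Optional[str] = None) -> str:
    """AND the tenant condition onto an optional caller expression."""
    scoped = f'tenant_id == "{_check_id(tenant_id)}"'
    # Parenthesize the caller's expression so an `or` inside it cannot
    # escape the tenant condition.
    return f"({expr}) and {scoped}" if expr else scoped


def faiss_index_name(tenant_id: str, kb_id: str) -> str:
    """Name of the dedicated index for one tenant and knowledge base."""
    return f"index_{_check_id(tenant_id)}_{_check_id(kb_id)}"
```

For example, `milvus_expr("t1", "score > 0.5 or pinned == true")` yields `(score > 0.5 or pinned == true) and tenant_id == "t1"`, so the caller's `or` cannot widen the result set beyond the tenant.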
Phase 6: Redis Strict Prefixing
- Step 6.1: Enforce `RedisTenantNamespace`.
  - Ensure all Redis interactions in `lightrag/kg/redis_impl.py` use the namespace wrapper.
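The idea behind strict prefixing can be illustrated with a small wrapper. This is not the existing `RedisTenantNamespace`, whose interface is not described here. `TenantKeyspace` and the `tenant:<id>:<key>` layout are hypothetical, and the wrapped client is assumed to offer synchronous `get`, `set`, and `delete` in the style of redis-py. An asyncio client would need `async` methods instead.

```python
from typing import Any


class TenantKeyspace:
    """Hypothetical wrapper that prefixes every key with the tenant id."""

    def __init__(self, client: Any, tenant_id: str) -> None:
        # ":" is the separator, so it must not appear inside the tenant id;
        # otherwise tenant "a:b" could address keys of tenant "a".
        if not tenant_id or ":" in tenant_id:
            raise ValueError(f"invalid tenant_id: {tenant_id!r}")
        self._client = client
        self._tenant_id = tenant_id

    def _key(self, key: str) -> str:
        return f"tenant:{self._tenant_id}:{key}"

    def get(self, key: str) -> Any:
        return self._client.get(self._key(key))

    def set(self, key: str, value: Any) -> Any:
        return self._client.set(self._key(key), value)

    def delete(self, *keys: str) -> Any:
        return self._client.delete(*(self._key(k) for k in keys))
```

Code that holds only a `TenantKeyspace` has no way to name another tenant's keys, which is the property Step 6.1 asks `redis_impl.py` to enforce. Commands that take patterns, such as key scans, need the prefix applied to the pattern as well.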
Phase 7: Verification
- Step 7.1: Create tests in `tests/test_multi_tenant_security.py`.
  - Test RLS: Try to access another tenant's data via raw SQL.
  - Test Middleware: Verify subdomain resolution.
  - Test Isolation: Verify data separation across all backends (SQL, NoSQL, Graph, Vector).
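The isolation test could be written once and reused for every backend, as in the sketch below. Everything in it is hypothetical: the `TenantStore` protocol with `put` and `get`, the `check_isolation` helper, and the in-memory factory, which only demonstrates the contract and would be replaced by thin adapters over the real SQL, MongoDB, graph, and vector storages.

```python
from typing import Callable, Dict, Optional, Protocol, Tuple


class TenantStore(Protocol):
    """Minimal interface an adapter must expose to be checked."""

    def put(self, key: str, value: str) -> None: ...

    def get(self, key: str) -> Optional[str]: ...


def check_isolation(open_store: Callable[[str], TenantStore]) -> None:
    """Fail with AssertionError if one tenant can see or clobber another's data."""
    store_a = open_store("tenant-a")
    store_b = open_store("tenant-b")

    store_a.put("doc-1", "secret-a")
    assert store_a.get("doc-1") == "secret-a", "owner cannot read its own data"
    assert store_b.get("doc-1") is None, "tenant-b can read tenant-a data"

    # Writing under the same key as another tenant must not overwrite it.
    store_b.put("doc-1", "secret-b")
    assert store_a.get("doc-1") == "secret-a", "tenant-b overwrote tenant-a data"


def in_memory_factory() -> Callable[[str], TenantStore]:
    """Reference backend: one shared dict keyed by (tenant_id, key)."""
    rows: Dict[Tuple[str, str], str] = {}

    class _Store:
        def __init__(self, tenant_id: str) -> None:
            self._tenant_id = tenant_id

        def put(self, key: str, value: str) -> None:
            rows[(self._tenant_id, key)] = value

        def get(self, key: str) -> Optional[str]:
            return rows.get((self._tenant_id, key))

    return _Store
```

A test module would call `check_isolation` once per backend factory. The RLS case still needs its own test that issues raw SQL on a connection scoped to a different tenant, since that path bypasses any application-level adapter.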