* feat: Implement multi-tenant architecture with tenant and knowledge base models
  - Added data models for tenants, knowledge bases, and related configurations.
  - Introduced role and permission management for users in the multi-tenant system.
  - Created a service layer for managing tenants and knowledge bases, including CRUD operations.
  - Developed a tenant-aware instance manager for LightRAG with caching and isolation features (an illustrative sketch appears after this commit list).
  - Added a migration script to transition existing workspace-based deployments to the new multi-tenant architecture.
* chore: ignore lightrag/api/webui/assets/ directory
* chore: stop tracking lightrag/api/webui/assets (ignore in .gitignore)
* feat: Initialize LightRAG Multi-Tenant Stack with PostgreSQL
  - Added README.md for project overview, setup instructions, and architecture details.
  - Created docker-compose.yml to define services: PostgreSQL, Redis, LightRAG API, and Web UI.
  - Introduced env.example for environment variable configuration.
  - Implemented init-postgres.sql for PostgreSQL schema initialization with multi-tenant support.
  - Added reproduce_issue.py for testing default tenant access via API.
* feat: Enhance TenantSelector and update related components for improved multi-tenant support
* feat: Enhance testing capabilities and update documentation
  - Updated Makefile to include new test commands for various modes (compatibility, isolation, multi-tenant, security, coverage, and dry-run).
  - Modified API health check endpoint in Makefile to reflect new port configuration.
  - Updated QUICK_START.md and README.md to reflect changes in service URLs and ports.
  - Added environment variables for testing modes in env.example.
  - Introduced run_all_tests.sh script to automate testing across different modes.
  - Created conftest.py for pytest configuration, including database fixtures and mock services.
  - Implemented database helper functions for streamlined database operations in tests.
  - Added test collection hooks to skip tests based on the current MULTITENANT_MODE (see the conftest sketch after this commit list).
* feat: Implement multi-tenant support with demo mode enabled by default
  - Added multi-tenant configuration to the environment and Docker setup.
  - Created pre-configured demo tenants (acme-corp and techstart) for testing.
  - Updated API endpoints to support tenant-specific data access.
  - Enhanced Makefile commands for better service management and database operations.
  - Introduced user-tenant membership system with role-based access control.
  - Added comprehensive documentation for multi-tenant setup and usage.
  - Fixed issues with document visibility in multi-tenant environments.
  - Implemented necessary database migrations for user memberships and legacy support.
* feat(audit): Add final audit report for multi-tenant implementation
  - Documented overall assessment, architecture overview, test results, security findings, and recommendations.
  - Included detailed findings on critical security issues and architectural concerns.

  fix(security): Implement security fixes based on audit findings
  - Removed global RAG fallback and enforced strict tenant context.
  - Configured super-admin access and required user authentication for tenant access.
  - Cleared localStorage on logout and improved error handling in WebUI.

  chore(logs): Create task logs for audit and security fixes implementation
  - Documented actions, decisions, and next steps for both audit and security fixes.
  - Summarized test results and remaining recommendations.

  chore(scripts): Enhance development stack management scripts
  - Added scripts for cleaning, starting, and stopping the development stack.
  - Improved output messages and ensured graceful shutdown of services.

  feat(starter): Initialize PostgreSQL with AGE extension support
  - Created initialization scripts for PostgreSQL extensions including uuid-ossp, vector, and AGE.
  - Ensured successful installation and verification of extensions.
* feat: Implement auto-select for first tenant and KB on initial load in WebUI
  - Removed WEBUI_INITIAL_STATE_FIX.md as the issue is resolved.
  - Added useTenantInitialization hook to automatically select the first available tenant and KB on app load.
  - Integrated the new hook into the Root component of the WebUI.
  - Updated RetrievalTesting component to ensure a KB is selected before allowing user interaction.
  - Created end-to-end tests for multi-tenant isolation and real service interactions.
  - Added scripts for starting, stopping, and cleaning the development stack.
  - Enhanced API and tenant routes to support tenant-specific pipeline status initialization.
  - Updated constants for backend URL to reflect the correct port.
  - Improved error handling and logging in various components.
* feat: Add multi-tenant support with enhanced E2E testing scripts and client functionality
* update client
* Add integration and unit tests for multi-tenant API, models, security, and storage
  - Implement integration tests for tenant and knowledge base management endpoints in `test_tenant_api_routes.py`.
  - Create unit tests for tenant isolation, model validation, and role permissions in `test_tenant_models.py`.
  - Add security tests to enforce role-based permissions and context validation in `test_tenant_security.py`.
  - Develop tests for tenant-aware storage operations and context isolation in `test_tenant_storage_phase3.py`.
* feat(e2e): Implement OpenAI model support and database reset functionality
* Add comprehensive test suite for gpt-5-nano compatibility
  - Introduced tests for parameter normalization, embeddings, and entity extraction.
  - Implemented direct API testing for gpt-5-nano.
  - Validated .env configuration loading and OpenAI API connectivity.
  - Analyzed reasoning token overhead with various token limits.
  - Documented test procedures and expected outcomes in README files.
  - Ensured all tests pass for production readiness.
* kg(postgres_impl): ensure AGE extension is loaded in session and configure graph initialization (a minimal session-setup sketch follows this list)
* dev: add hybrid dev helper scripts, Makefile, docker-compose.dev-db and local development docs
* feat(dev): add dev helper scripts and local development documentation for hybrid setup
* feat(multi-tenant): add detailed specifications and logs for multi-tenant improvements, including UX, backend handling, and ingestion pipeline
* feat(migration): add generated tenant/kb columns, indexes, triggers; drop unused tables; update schema and docs
* test(backward-compat): adapt tests to new StorageNameSpace/TenantService APIs (use concrete dummy storages)
* chore: multi-tenant and UX updates — docs, webui, storage, tenant service adjustments
* tests: stabilize integration tests + skip external services; fix multi-tenant API behavior and idempotency
  - gpt5_nano_compatibility: add pytest-asyncio markers, skip when OPENAI key missing, prevent module-level asyncio.run collection, add conftest
  - Ollama tests: add server availability check and skip markers; avoid pytest collection warnings by renaming helper classes
  - Graph storage tests: rename interactive test functions to avoid pytest collection
  - Document & Tenant routes: support external_ids for idempotency; ensure HTTPExceptions are re-raised
  - LightRAG core: support external_ids in apipeline_enqueue_documents and idempotent logic
  - Tests updated to match API changes (tenant routes & document routes)
  - Add logs and scripts for inspection and audit
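The tenant-aware instance manager mentioned in the first commit is, at its core, a cache of per-(tenant, knowledge base) RAG instances with isolated storage. The sketch below is only a minimal illustration of that idea; the class name, method signatures, and working-directory layout are assumptions, not the PR's actual code.

```python
# Illustrative sketch of a tenant-aware instance cache keyed by (tenant_id, kb_id).
# Names and structure are hypothetical; the real manager in this PR may differ.
import asyncio
from typing import Dict, Tuple


class TenantInstanceManager:
    """Caches one RAG instance per (tenant_id, kb_id), each with its own working dir."""

    def __init__(self, base_dir: str):
        self.base_dir = base_dir
        self._instances: Dict[Tuple[str, str], object] = {}
        self._lock = asyncio.Lock()

    async def get_instance(self, tenant_id: str, kb_id: str):
        key = (tenant_id, kb_id)
        async with self._lock:
            if key not in self._instances:
                # Isolation: each tenant/KB pair gets a dedicated storage namespace.
                working_dir = f"{self.base_dir}/{tenant_id}/{kb_id}"
                self._instances[key] = self._build_rag(working_dir)
            return self._instances[key]

    def _build_rag(self, working_dir: str):
        # Placeholder for constructing a LightRAG instance bound to working_dir.
        return {"working_dir": working_dir}
```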
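The test-collection hooks that skip tests based on MULTITENANT_MODE can be pictured as a small conftest.py hook like the one below. The marker names, the set of modes, and the default value are assumptions drawn from the commit descriptions; the actual conftest.py may organize this differently.

```python
# Hypothetical conftest.py sketch: skip tests whose mode marker does not match
# the current MULTITENANT_MODE environment variable.
import os
import pytest

KNOWN_MODES = {"compatibility", "isolation", "multitenant", "security"}


def pytest_collection_modifyitems(config, items):
    current_mode = os.getenv("MULTITENANT_MODE", "compatibility")
    skip_other = pytest.mark.skip(
        reason=f"test not applicable in MULTITENANT_MODE={current_mode}"
    )
    for item in items:
        # A test opts into one or more modes with markers, e.g. @pytest.mark.isolation
        test_modes = KNOWN_MODES & {m.name for m in item.iter_markers()}
        if test_modes and current_mode not in test_modes:
            item.add_marker(skip_other)
```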
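The kg(postgres_impl) commit ensures the Apache AGE extension is loaded in each session before graph queries run. A minimal sketch of that session setup is shown below using asyncpg; the DSN, graph name, and function name are placeholders, and the real postgres_impl wiring in LightRAG likely differs.

```python
# Minimal sketch, assuming asyncpg and a locally reachable PostgreSQL with AGE installed.
import asyncio
import asyncpg


async def prepare_age_session(dsn: str, graph_name: str = "lightrag_graph") -> None:
    conn = await asyncpg.connect(dsn)
    try:
        # AGE must be loaded into every session and placed on the search_path
        # before cypher() calls can resolve.
        await conn.execute("CREATE EXTENSION IF NOT EXISTS age")
        await conn.execute("LOAD 'age'")
        await conn.execute('SET search_path = ag_catalog, "$user", public')
        # Create the graph once; ignore the error if it already exists.
        try:
            await conn.execute(f"SELECT create_graph('{graph_name}')")
        except asyncpg.PostgresError:
            pass
    finally:
        await conn.close()


if __name__ == "__main__":
    asyncio.run(prepare_age_session("postgresql://postgres:postgres@localhost:5432/lightrag"))
```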
migrate_workspace_to_tenant.py — 342 lines · 11 KiB · Python
#!/usr/bin/env python
"""
Workspace-to-Tenant Migration Script

Migrates existing single-tenant workspace-based deployments to multi-tenant architecture.

This script:
1. Scans existing workspace directories
2. Creates a default tenant for each workspace
3. Creates a default knowledge base within each tenant
4. Preserves all existing data structure for backward compatibility

Usage:
    python migrate_workspace_to_tenant.py --working-dir /path/to/rag_storage
    python migrate_workspace_to_tenant.py --working-dir /path/to/rag_storage --dry-run
    python migrate_workspace_to_tenant.py --working-dir /path/to/rag_storage --skip-backup
"""

import asyncio
import argparse
import os
import sys
import shutil
from pathlib import Path
from datetime import datetime
from typing import List, Dict, Optional

from lightrag.services.tenant_service import TenantService
from lightrag.models.tenant import Tenant, TenantConfig
from lightrag.utils import logger


class WorkspaceToTenantMigrator:
    """
    Handles migration from workspace-based to multi-tenant architecture.
    """

    def __init__(self, working_dir: str, dry_run: bool = False, backup: bool = True):
        """
        Initialize the migrator.

        Args:
            working_dir: Root directory containing workspace folders
            dry_run: If True, simulate migration without making changes
            backup: If True, create backup before migration
        """
        self.working_dir = Path(working_dir)
        self.dry_run = dry_run
        self.backup = backup
        self.tenant_service = TenantService()
        self.migration_log: List[str] = []
        self.error_log: List[str] = []

    def validate_working_dir(self) -> bool:
        """Validate that working directory exists."""
        if not self.working_dir.exists():
            self.error_log.append(f"Working directory does not exist: {self.working_dir}")
            return False

        if not self.working_dir.is_dir():
            self.error_log.append(f"Path is not a directory: {self.working_dir}")
            return False

        return True

    def discover_workspaces(self) -> List[str]:
        """
        Discover existing workspace directories.

        Workspaces are identified by common RAG storage files like:
        - kv_store_*.json
        - doc_status_storage.json
        - rag_storage.db

        Returns:
            List of workspace directory names
        """
        workspaces = []

        # Check for common RAG storage files
        for item in self.working_dir.iterdir():
            if not item.is_dir():
                continue

            # Skip special directories
            if item.name.startswith(('.', '__')):
                continue

            # Check if directory contains RAG storage files
            has_rag_files = any([
                (item / f"kv_store_{name}.json").exists()
                for name in ["full_docs", "text_chunks", "entities", "relations"]
            ]) or (item / "doc_status_storage.json").exists()

            if has_rag_files or item.name.startswith("workspace_"):
                workspaces.append(item.name)

        return sorted(workspaces)

    def backup_working_dir(self) -> Optional[Path]:
        """
        Create a backup of the working directory.

        Returns:
            Path to backup directory, or None if backup failed
        """
        if not self.backup:
            return None

        backup_dir = self.working_dir.parent / f"{self.working_dir.name}_backup_{datetime.now().strftime('%Y%m%d_%H%M%S')}"

        try:
            msg = f"Creating backup at {backup_dir}"
            logger.info(msg)
            self.migration_log.append(msg)

            if not self.dry_run:
                shutil.copytree(self.working_dir, backup_dir)

            return backup_dir
        except Exception as e:
            msg = f"Failed to create backup: {e}"
            logger.error(msg)
            self.error_log.append(msg)
            return None

    async def migrate_workspace(self, workspace_name: str) -> bool:
        """
        Migrate a single workspace to multi-tenant structure.

        Args:
            workspace_name: Name of the workspace to migrate

        Returns:
            True if migration successful, False otherwise
        """
        try:
            msg = f"\nMigrating workspace: {workspace_name}"
            logger.info(msg)
            self.migration_log.append(msg)

            # Create tenant from workspace
            tenant_name = workspace_name if workspace_name != "" else "default"

            if not self.dry_run:
                tenant = await self.tenant_service.create_tenant(
                    tenant_name=tenant_name,
                    config=None  # Use default config
                )

                msg = f"  ✓ Created tenant '{tenant_name}' with ID: {tenant.tenant_id}"
                logger.info(msg)
                self.migration_log.append(msg)

                # Create default knowledge base
                kb = await self.tenant_service.create_knowledge_base(
                    tenant_id=tenant.tenant_id,
                    kb_name="default",
                    description="Default knowledge base (migrated from workspace)"
                )

                msg = f"  ✓ Created default KB with ID: {kb.kb_id}"
                logger.info(msg)
                self.migration_log.append(msg)
            else:
                msg = f"  [DRY RUN] Would create tenant '{tenant_name}' with default KB"
                logger.info(msg)
                self.migration_log.append(msg)

            return True

        except Exception as e:
            msg = f"  ✗ Failed to migrate workspace '{workspace_name}': {e}"
            logger.error(msg)
            self.error_log.append(msg)
            return False

    async def migrate_all_workspaces(self, workspaces: List[str]) -> Dict[str, bool]:
        """
        Migrate all discovered workspaces.

        Args:
            workspaces: List of workspace names to migrate

        Returns:
            Dictionary mapping workspace name to migration status
        """
        results = {}

        for workspace in workspaces:
            success = await self.migrate_workspace(workspace)
            results[workspace] = success

        return results

    def generate_report(self, workspaces: List[str], results: Dict[str, bool]) -> str:
        """
        Generate a migration report.

        Args:
            workspaces: List of workspaces processed
            results: Migration results

        Returns:
            Formatted report string
        """
        successful = sum(1 for v in results.values() if v)
        failed = len(workspaces) - successful

        report = f"""
╔══════════════════════════════════════════════════════════════╗
║            WORKSPACE-TO-TENANT MIGRATION REPORT               ║
╚══════════════════════════════════════════════════════════════╝

Working Directory:      {self.working_dir}
Dry Run Mode:           {self.dry_run}
Workspaces Processed:   {len(workspaces)}
Successfully Migrated:  {successful}
Failed:                 {failed}

Migration Log:
"""
        for line in self.migration_log:
            report += f"\n{line}"

        if self.error_log:
            report += "\n\nErrors Encountered:"
            for error in self.error_log:
                report += f"\n{error}"

        report += "\n"
        return report

    async def run(self) -> bool:
        """
        Execute the migration process.

        Returns:
            True if migration completed successfully, False otherwise
        """
        # Validate setup
        if not self.validate_working_dir():
            logger.error("Validation failed")
            return False

        # Discover workspaces
        workspaces = self.discover_workspaces()

        if not workspaces:
            msg = "No workspaces found to migrate"
            logger.warning(msg)
            self.migration_log.append(msg)
            return True

        msg = f"Discovered {len(workspaces)} workspace(s): {', '.join(workspaces)}"
        logger.info(msg)
        self.migration_log.append(msg)

        # Create backup if not dry-run
        if not self.dry_run:
            backup_path = self.backup_working_dir()
            if not backup_path and self.backup:
                logger.warning("Backup failed but continuing with migration")

        # Migrate workspaces
        results = await self.migrate_all_workspaces(workspaces)

        # Generate and display report
        report = self.generate_report(workspaces, results)
        print(report)

        # Save report to file
        report_path = self.working_dir / f"migration_report_{datetime.now().strftime('%Y%m%d_%H%M%S')}.txt"
        try:
            if not self.dry_run:
                with open(report_path, 'w') as f:
                    f.write(report)
                logger.info(f"Migration report saved to {report_path}")
        except Exception as e:
            logger.error(f"Failed to save migration report: {e}")

        # Return success if no failures
        return all(results.values())


def main():
    """Main entry point for migration script."""
    parser = argparse.ArgumentParser(
        description="Migrate workspace-based deployment to multi-tenant architecture",
        formatter_class=argparse.RawDescriptionHelpFormatter,
        epilog="""
Examples:
  # Perform actual migration
  python migrate_workspace_to_tenant.py --working-dir /path/to/rag_storage

  # Preview what would be migrated without making changes
  python migrate_workspace_to_tenant.py --working-dir /path/to/rag_storage --dry-run

  # Migrate without creating backup
  python migrate_workspace_to_tenant.py --working-dir /path/to/rag_storage --skip-backup
"""
    )

    parser.add_argument(
        "--working-dir",
        required=True,
        help="Path to the working directory containing workspaces"
    )

    parser.add_argument(
        "--dry-run",
        action="store_true",
        help="Simulate migration without making actual changes"
    )

    parser.add_argument(
        "--skip-backup",
        action="store_true",
        help="Skip creating a backup of the working directory"
    )

    args = parser.parse_args()

    # Create migrator
    migrator = WorkspaceToTenantMigrator(
        working_dir=args.working_dir,
        dry_run=args.dry_run,
        backup=not args.skip_backup
    )

    # Run migration
    try:
        success = asyncio.run(migrator.run())
        sys.exit(0 if success else 1)
    except KeyboardInterrupt:
        logger.warning("Migration interrupted by user")
        sys.exit(1)
    except Exception as e:
        logger.error(f"Migration failed: {e}", exc_info=True)
        sys.exit(1)


if __name__ == "__main__":
    main()