* feat: Implement multi-tenant architecture with tenant and knowledge base models
  - Added data models for tenants, knowledge bases, and related configurations.
  - Introduced role and permission management for users in the multi-tenant system.
  - Created a service layer for managing tenants and knowledge bases, including CRUD operations.
  - Developed a tenant-aware instance manager for LightRAG with caching and isolation features.
  - Added a migration script to transition existing workspace-based deployments to the new multi-tenant architecture.
* chore: ignore lightrag/api/webui/assets/ directory
* chore: stop tracking lightrag/api/webui/assets (ignore in .gitignore)
* feat: Initialize LightRAG Multi-Tenant Stack with PostgreSQL
  - Added README.md for project overview, setup instructions, and architecture details.
  - Created docker-compose.yml to define services: PostgreSQL, Redis, LightRAG API, and Web UI.
  - Introduced env.example for environment variable configuration.
  - Implemented init-postgres.sql for PostgreSQL schema initialization with multi-tenant support.
  - Added reproduce_issue.py for testing default tenant access via API.
* feat: Enhance TenantSelector and update related components for improved multi-tenant support
* feat: Enhance testing capabilities and update documentation
  - Updated Makefile to include new test commands for various modes (compatibility, isolation, multi-tenant, security, coverage, and dry-run).
  - Modified API health check endpoint in Makefile to reflect new port configuration.
  - Updated QUICK_START.md and README.md to reflect changes in service URLs and ports.
  - Added environment variables for testing modes in env.example.
  - Introduced run_all_tests.sh script to automate testing across different modes.
  - Created conftest.py for pytest configuration, including database fixtures and mock services.
  - Implemented database helper functions for streamlined database operations in tests.
  - Added test collection hooks to skip tests based on the current MULTITENANT_MODE.
* feat: Implement multi-tenant support with demo mode enabled by default
  - Added multi-tenant configuration to the environment and Docker setup.
  - Created pre-configured demo tenants (acme-corp and techstart) for testing.
  - Updated API endpoints to support tenant-specific data access.
  - Enhanced Makefile commands for better service management and database operations.
  - Introduced user-tenant membership system with role-based access control.
  - Added comprehensive documentation for multi-tenant setup and usage.
  - Fixed issues with document visibility in multi-tenant environments.
  - Implemented necessary database migrations for user memberships and legacy support.
* feat(audit): Add final audit report for multi-tenant implementation
  - Documented overall assessment, architecture overview, test results, security findings, and recommendations.
  - Included detailed findings on critical security issues and architectural concerns.

  fix(security): Implement security fixes based on audit findings
  - Removed global RAG fallback and enforced strict tenant context.
  - Configured super-admin access and required user authentication for tenant access.
  - Cleared localStorage on logout and improved error handling in WebUI.

  chore(logs): Create task logs for audit and security fixes implementation
  - Documented actions, decisions, and next steps for both audit and security fixes.
  - Summarized test results and remaining recommendations.

  chore(scripts): Enhance development stack management scripts
  - Added scripts for cleaning, starting, and stopping the development stack.
  - Improved output messages and ensured graceful shutdown of services.

  feat(starter): Initialize PostgreSQL with AGE extension support
  - Created initialization scripts for PostgreSQL extensions including uuid-ossp, vector, and AGE.
  - Ensured successful installation and verification of extensions.
* feat: Implement auto-select for first tenant and KB on initial load in WebUI
  - Removed WEBUI_INITIAL_STATE_FIX.md as the issue is resolved.
  - Added useTenantInitialization hook to automatically select the first available tenant and KB on app load.
  - Integrated the new hook into the Root component of the WebUI.
  - Updated RetrievalTesting component to ensure a KB is selected before allowing user interaction.
  - Created end-to-end tests for multi-tenant isolation and real service interactions.
  - Added scripts for starting, stopping, and cleaning the development stack.
  - Enhanced API and tenant routes to support tenant-specific pipeline status initialization.
  - Updated constants for backend URL to reflect the correct port.
  - Improved error handling and logging in various components.
* feat: Add multi-tenant support with enhanced E2E testing scripts and client functionality
* update client
* Add integration and unit tests for multi-tenant API, models, security, and storage
  - Implement integration tests for tenant and knowledge base management endpoints in `test_tenant_api_routes.py`.
  - Create unit tests for tenant isolation, model validation, and role permissions in `test_tenant_models.py`.
  - Add security tests to enforce role-based permissions and context validation in `test_tenant_security.py`.
  - Develop tests for tenant-aware storage operations and context isolation in `test_tenant_storage_phase3.py`.
* feat(e2e): Implement OpenAI model support and database reset functionality
* Add comprehensive test suite for gpt-5-nano compatibility
  - Introduced tests for parameter normalization, embeddings, and entity extraction.
  - Implemented direct API testing for gpt-5-nano.
  - Validated .env configuration loading and OpenAI API connectivity.
  - Analyzed reasoning token overhead with various token limits.
  - Documented test procedures and expected outcomes in README files.
  - Ensured all tests pass for production readiness.
* kg(postgres_impl): ensure AGE extension is loaded in session and configure graph initialization
* dev: add hybrid dev helper scripts, Makefile, docker-compose.dev-db and local development docs
* feat(dev): add dev helper scripts and local development documentation for hybrid setup
* feat(multi-tenant): add detailed specifications and logs for multi-tenant improvements, including UX, backend handling, and ingestion pipeline
* feat(migration): add generated tenant/kb columns, indexes, triggers; drop unused tables; update schema and docs
* test(backward-compat): adapt tests to new StorageNameSpace/TenantService APIs (use concrete dummy storages)
* chore: multi-tenant and UX updates (docs, webui, storage, tenant service adjustments)
* tests: stabilize integration tests + skip external services; fix multi-tenant API behavior and idempotency
  - gpt5_nano_compatibility: add pytest-asyncio markers, skip when OPENAI key missing, prevent module-level asyncio.run collection, add conftest
  - Ollama tests: add server availability check and skip markers; avoid pytest collection warnings by renaming helper classes
  - Graph storage tests: rename interactive test functions to avoid pytest collection
  - Document & Tenant routes: support external_ids for idempotency; ensure HTTPExceptions are re-raised
  - LightRAG core: support external_ids in apipeline_enqueue_documents and idempotent logic
  - Tests updated to match API changes (tenant routes & document routes)
  - Add logs and scripts for inspection and audit
351 lines
12 KiB
Python
"""
Pytest configuration and fixtures for multi-tenant testing.

Provides:
- Database fixtures for different testing modes
- Tenant and KB context fixtures
- Mock LLM and embedding services
- Multi-tenant test utilities
"""

import asyncio
import os
from contextlib import contextmanager
from typing import Dict, List, Optional
from unittest.mock import MagicMock

import psycopg2
import pytest


# ============================================================================
# Environment and Mode Detection
# ============================================================================

# MULTITENANT_MODE selects which mode-specific tests run:
# "off" (compatibility), "on" (single-tenant), or "demo" (multi-tenant demo).
MULTITENANT_MODE = os.getenv("MULTITENANT_MODE", "demo")
POSTGRES_HOST = os.getenv("POSTGRES_HOST", "localhost")
POSTGRES_PORT = int(os.getenv("POSTGRES_PORT", "5432"))
POSTGRES_USER = os.getenv("POSTGRES_USER", "lightrag")
POSTGRES_PASSWORD = os.getenv("POSTGRES_PASSWORD", "lightrag_secure_password")
POSTGRES_DATABASE = os.getenv("POSTGRES_DATABASE", "lightrag_multitenant")
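The mode is read from the environment without validation. A typo such as `MULTITENANT_MODE=multi` would therefore silently skip every mode-specific test. The sketch below shows stricter parsing. `read_multitenant_mode` and `VALID_MODES` are hypothetical names. The accepted values (`off`, `on`, `demo`) come from the mode fixtures and the collection hook in this file.

```python
import os
from typing import Mapping, Optional

# Values recognized by the mode fixtures and the collection hook in this conftest.
VALID_MODES = ("off", "on", "demo")


def read_multitenant_mode(env: Optional[Mapping[str, str]] = None) -> str:
    """Hypothetical helper: return a normalized MULTITENANT_MODE or fail loudly."""
    source = os.environ if env is None else env
    # Tolerate stray whitespace and capitalization, but nothing else.
    mode = source.get("MULTITENANT_MODE", "demo").strip().lower()
    if mode not in VALID_MODES:
        raise ValueError(
            f"MULTITENANT_MODE must be one of {VALID_MODES}, got {mode!r}"
        )
    return mode
```

Using it as `MULTITENANT_MODE = read_multitenant_mode()` would make a misconfigured run fail while the conftest loads. The tests would then never start instead of being quietly skipped.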


# ============================================================================
# Database Connection Management
# ============================================================================

@pytest.fixture(scope="session")
def db_connection_string():
    """Generate a PostgreSQL connection URL from the environment settings."""
    from urllib.parse import quote

    # Percent-encode credentials so passwords containing '@', ':' or '/' stay valid.
    user = quote(POSTGRES_USER, safe="")
    password = quote(POSTGRES_PASSWORD, safe="")
    return f"postgresql://{user}:{password}@{POSTGRES_HOST}:{POSTGRES_PORT}/{POSTGRES_DATABASE}"


@pytest.fixture(scope="session")
def postgres_connection():
    """Open one PostgreSQL connection for the whole test session.

    Tests that depend on this fixture are skipped if the database is unreachable.
    """
    try:
        conn = psycopg2.connect(
            host=POSTGRES_HOST,
            port=POSTGRES_PORT,
            user=POSTGRES_USER,
            password=POSTGRES_PASSWORD,
            dbname=POSTGRES_DATABASE,
        )
    except psycopg2.Error as e:
        pytest.skip(f"PostgreSQL not available: {e}")

    conn.autocommit = False
    try:
        yield conn
    finally:
        # Always release the connection at the end of the session.
        conn.close()


@contextmanager
def database_transaction(connection):
    """Yield a cursor; commit on success, roll back and re-raise on error."""
    cursor = connection.cursor()
    try:
        yield cursor
        connection.commit()
    except Exception:
        connection.rollback()
        raise
    finally:
        cursor.close()
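`database_transaction` commits when its block finishes and rolls back when the block raises. The sketch below runs the same pattern against an in-memory `sqlite3` database, so it works without PostgreSQL. `sqlite3` is only an illustrative stand-in for psycopg2, and the two-column `documents` table is a toy schema, not the project's.

```python
import sqlite3
from contextlib import contextmanager


@contextmanager
def database_transaction(connection):
    """Same commit/rollback pattern as the conftest helper, for any DB-API connection."""
    cursor = connection.cursor()
    try:
        yield cursor
        connection.commit()
    except Exception:
        connection.rollback()
        raise
    finally:
        cursor.close()


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE documents (id TEXT PRIMARY KEY, tenant_id TEXT)")
conn.commit()

# Successful block: the insert is committed.
with database_transaction(conn) as cur:
    cur.execute("INSERT INTO documents VALUES (?, ?)", ("doc-1", "acme-corp"))

# Failing block: the insert is rolled back before the error propagates.
try:
    with database_transaction(conn) as cur:
        cur.execute("INSERT INTO documents VALUES (?, ?)", ("doc-2", "techstart"))
        raise RuntimeError("simulated test failure")
except RuntimeError:
    pass

rows = conn.execute("SELECT id FROM documents ORDER BY id").fetchall()
print(rows)  # [('doc-1',)]
```

Because the exception is re-raised after the rollback, a failing assertion inside the block still fails the test rather than being swallowed.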


# ============================================================================
# Mode-Specific Fixtures
# ============================================================================

@pytest.fixture
def testing_mode():
    """Return current testing mode."""
    return MULTITENANT_MODE


@pytest.fixture
def is_compatibility_mode():
    """Check if running in compatibility mode (MULTITENANT_MODE=off)."""
    return MULTITENANT_MODE == "off"


@pytest.fixture
def is_single_tenant_mode():
    """Check if running in single-tenant mode (MULTITENANT_MODE=on)."""
    return MULTITENANT_MODE == "on"


@pytest.fixture
def is_demo_mode():
    """Check if running in demo mode (MULTITENANT_MODE=demo)."""
    return MULTITENANT_MODE == "demo"


# ============================================================================
# Tenant and KB Fixtures
# ============================================================================

@pytest.fixture
def demo_tenant_acme():
    """Acme Corp tenant for demo mode."""
    return {
        "tenant_id": "acme-corp",
        "name": "Acme Corporation",
        "kbs": ["kb-prod", "kb-dev"],
    }


@pytest.fixture
def demo_tenant_techstart():
    """TechStart tenant for demo mode."""
    return {
        "tenant_id": "techstart",
        "name": "TechStart Inc",
        "kbs": ["kb-main", "kb-backup"],
    }


@pytest.fixture
def default_tenant():
    """Default tenant for compatibility (off) and single-tenant (on) modes."""
    return {
        "tenant_id": "default",
        "name": "Default Tenant",
        "kbs": ["default"],
    }


@pytest.fixture
def test_tenant_1():
    """Test tenant 1 for single-tenant mode."""
    return {
        "tenant_id": "tenant-1",
        "name": "Test Tenant 1",
        "kbs": ["kb-default", "kb-secondary", "kb-experimental"],
    }


# ============================================================================
# Test Data Fixtures
# ============================================================================

@pytest.fixture
def sample_document():
    """Sample document for testing."""
    return {
        "title": "Test Document",
        "content": "This is a test document for LightRAG multi-tenant testing.",
        "file_type": "text",
        "metadata": {
            "source": "test",
            "version": "1.0",
        },
    }


@pytest.fixture
def sample_entity():
    """Sample entity for testing."""
    return {
        "name": "TestEntity",
        "type": "Person",
        "description": "A test entity for multi-tenant isolation testing",
        "metadata": {
            "test": True,
            "created_by": "pytest",
        },
    }


@pytest.fixture
def sample_relation():
    """Sample relation for testing."""
    return {
        "source_entity": "Entity1",
        "target_entity": "Entity2",
        "relation_type": "knows",
        "description": "Test relationship between entities",
        "weight": 0.8,
    }


# ============================================================================
# Database Query Helpers
# ============================================================================

class DatabaseHelper:
    """Helper class for database operations in tests.

    Table and column names are interpolated directly into the SQL text, so pass
    only trusted, hard-coded identifiers. Values are always bound as parameters.
    """

    def __init__(self, connection):
        self.connection = connection

    def execute_query(self, query: str, params: tuple = ()) -> List[Dict]:
        """Execute a SELECT query and return the rows as dicts."""
        with database_transaction(self.connection) as cursor:
            cursor.execute(query, params)
            if cursor.description is None:
                # The statement produced no result set (e.g. an UPDATE).
                return []
            columns = [desc[0] for desc in cursor.description]
            return [dict(zip(columns, row)) for row in cursor.fetchall()]

    def execute_insert(self, table: str, data: Dict) -> None:
        """Insert a single row into a table."""
        columns = ", ".join(data.keys())
        placeholders = ", ".join(["%s"] * len(data))
        query = f"INSERT INTO {table} ({columns}) VALUES ({placeholders})"
        with database_transaction(self.connection) as cursor:
            cursor.execute(query, tuple(data.values()))

    def execute_delete(self, table: str, where: Dict) -> int:
        """Delete rows matching all `where` conditions; return the deleted row count."""
        if not where:
            # An empty dict would produce invalid SQL ("... WHERE ").
            raise ValueError("execute_delete requires at least one condition")
        where_clause = " AND ".join(f"{k} = %s" for k in where)
        query = f"DELETE FROM {table} WHERE {where_clause}"
        with database_transaction(self.connection) as cursor:
            cursor.execute(query, tuple(where.values()))
            return cursor.rowcount

    def count_documents(self, tenant_id: str, kb_id: str) -> int:
        """Count documents for a tenant/KB."""
        query = "SELECT COUNT(*) as count FROM documents WHERE tenant_id = %s AND kb_id = %s"
        result = self.execute_query(query, (tenant_id, kb_id))
        return result[0]["count"] if result else 0

    def count_entities(self, tenant_id: str, kb_id: str) -> int:
        """Count entities for a tenant/KB."""
        query = "SELECT COUNT(*) as count FROM entities WHERE tenant_id = %s AND kb_id = %s"
        result = self.execute_query(query, (tenant_id, kb_id))
        return result[0]["count"] if result else 0

    def get_all_documents(self, tenant_id: str, kb_id: str) -> List[Dict]:
        """Get all documents for a tenant/KB, newest first."""
        query = "SELECT * FROM documents WHERE tenant_id = %s AND kb_id = %s ORDER BY created_at DESC"
        return self.execute_query(query, (tenant_id, kb_id))

    def get_all_entities(self, tenant_id: str, kb_id: str) -> List[Dict]:
        """Get all entities for a tenant/KB, newest first."""
        query = "SELECT * FROM entities WHERE tenant_id = %s AND kb_id = %s ORDER BY created_at DESC"
        return self.execute_query(query, (tenant_id, kb_id))

    def verify_tenant_isolation(self, tenant_id: str) -> bool:
        """Return True if no document ID of this tenant also appears under another tenant."""
        # Count rows owned by other tenants whose id collides with one of this tenant's documents.
        query = """
            SELECT COUNT(*) as count FROM documents
            WHERE tenant_id != %s AND EXISTS (
                SELECT 1 FROM documents d2
                WHERE d2.tenant_id = %s AND d2.id = documents.id
            )
        """
        result = self.execute_query(query, (tenant_id, tenant_id))
        return result[0]["count"] == 0 if result else True

    def clear_tenant_data(self, tenant_id: str, kb_id: Optional[str] = None) -> None:
        """Delete all rows for a tenant, or only for one of its KBs if kb_id is given."""
        tables = ["document_status", "embeddings", "documents", "entities", "relations"]

        for table in tables:
            if kb_id:
                where = {"tenant_id": tenant_id, "kb_id": kb_id}
            else:
                where = {"tenant_id": tenant_id}
            self.execute_delete(table, where)


@pytest.fixture
def db_helper(postgres_connection):
    """Provide database helper for tests."""
    return DatabaseHelper(postgres_connection)
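`execute_insert` and `execute_delete` splice table and column names into SQL with f-strings. That is safe only while every identifier is a hard-coded test constant. The sketch below shows one defensive option: validate identifiers against a conservative pattern before building the statement. `build_delete` and `SAFE_IDENTIFIER` are hypothetical names, and the lowercase snake_case rule is an assumed policy. With psycopg2 itself, `psycopg2.sql.Identifier` is the library's purpose-built alternative.

```python
import re
from typing import Dict, Tuple

# Hypothetical policy: plain lowercase snake_case names only (no quotes, dots, spaces).
SAFE_IDENTIFIER = re.compile(r"[a-z_][a-z0-9_]*")


def _check(name: str) -> str:
    # fullmatch (not match + "$") so a trailing newline cannot sneak through.
    if not SAFE_IDENTIFIER.fullmatch(name):
        raise ValueError(f"unsafe SQL identifier: {name!r}")
    return name


def build_delete(table: str, where: Dict[str, object]) -> Tuple[str, tuple]:
    """Return a parameterized DELETE statement plus its bound values."""
    if not where:
        # Refuse to build a statement without conditions.
        raise ValueError("refusing to build DELETE without a WHERE clause")
    clause = " AND ".join(f"{_check(col)} = %s" for col in where)
    return f"DELETE FROM {_check(table)} WHERE {clause}", tuple(where.values())


query, params = build_delete("documents", {"tenant_id": "acme-corp", "kb_id": "kb-dev"})
print(query)   # DELETE FROM documents WHERE tenant_id = %s AND kb_id = %s
print(params)  # ('acme-corp', 'kb-dev')
```

Values still travel as `%s` parameters, so only the identifiers need checking. The result could be passed straight to `cursor.execute(query, params)`.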


# ============================================================================
# Mock Services
# ============================================================================

@pytest.fixture
def mock_llm_service():
    """Mock LLM service for testing."""
    mock = MagicMock()
    mock.generate = MagicMock(return_value="Mock LLM response")
    mock.extract_entities = MagicMock(return_value=["Entity1", "Entity2"])
    return mock


@pytest.fixture
def mock_embedding_service():
    """Mock embedding service for testing."""
    mock = MagicMock()
    mock.embed_text = MagicMock(return_value=[0.1] * 1024)  # 1024-dim vector
    mock.embed_batch = MagicMock(return_value=[[0.1] * 1024 for _ in range(10)])
    return mock
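`embed_batch` above always returns ten vectors, whatever it receives. A test that embeds three chunks therefore gets seven extra embeddings back. The sketch below instead uses `MagicMock(side_effect=...)` to return one vector per input. `make_mock_embedding_service` and `EMBEDDING_DIM` are hypothetical names. The method names `embed_text` and `embed_batch` mirror the mock above and are not necessarily LightRAG's real embedding interface.

```python
from typing import List
from unittest.mock import MagicMock

EMBEDDING_DIM = 1024  # Same width as the vectors in the fixture above.


def make_mock_embedding_service(dim: int = EMBEDDING_DIM) -> MagicMock:
    """Build a mock whose batch output tracks the size of each call's input."""
    mock = MagicMock()
    mock.embed_text = MagicMock(return_value=[0.1] * dim)

    def _embed_batch(texts: List[str]) -> List[List[float]]:
        # One constant vector per input text.
        return [[0.1] * dim for _ in texts]

    # With a callable side_effect, the mock returns whatever the callable returns.
    mock.embed_batch = MagicMock(side_effect=_embed_batch)
    return mock


service = make_mock_embedding_service()
vectors = service.embed_batch(["chunk a", "chunk b", "chunk c"])
print(len(vectors), len(vectors[0]))  # 3 1024
```

The call-recording features of `MagicMock` still work, so a test can also assert `service.embed_batch.assert_called_once_with([...])`.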


# ============================================================================
# Async Event Loop
# ============================================================================

@pytest.fixture(scope="session")
def event_loop():
    """Create event loop for async tests."""
    loop = asyncio.get_event_loop_policy().new_event_loop()
    yield loop
    loop.close()


# ============================================================================
# Markers and Parametrization
# ============================================================================

def pytest_configure(config):
    """Register custom pytest markers."""
    config.addinivalue_line(
        "markers", "compatibility: mark test to run only in compatibility mode"
    )
    config.addinivalue_line(
        "markers", "single_tenant: mark test to run only in single-tenant mode"
    )
    config.addinivalue_line(
        "markers", "multi_tenant: mark test to run only in demo/multi-tenant mode"
    )
    config.addinivalue_line(
        "markers", "database: mark test that requires database connection"
    )
    config.addinivalue_line(
        "markers", "isolation: mark test that verifies data isolation"
    )


# ============================================================================
# Test Collection Hooks
# ============================================================================

def pytest_collection_modifyitems(config, items):
    """Skip tests based on testing mode."""
    skip_compatibility = pytest.mark.skip(reason="Not in compatibility mode")
    skip_single_tenant = pytest.mark.skip(reason="Not in single-tenant mode")
    skip_multi_tenant = pytest.mark.skip(reason="Not in multi-tenant mode")

    for item in items:
        if "compatibility" in item.keywords and MULTITENANT_MODE != "off":
            item.add_marker(skip_compatibility)
        if "single_tenant" in item.keywords and MULTITENANT_MODE != "on":
            item.add_marker(skip_single_tenant)
        if "multi_tenant" in item.keywords and MULTITENANT_MODE != "demo":
            item.add_marker(skip_multi_tenant)