LightRAG

Author	SHA1	Message	Date
Raphaël MANSUY	31ceebf8a8	cherry-pick `fc9f7c70`	2025-12-04 19:19:05 +08:00
Raphaël MANSUY	8357b7795d	cherry-pick `c1ec657c`	2025-12-04 19:19:05 +08:00
Raphaël MANSUY	5f36666ac1	cherry-pick `1874cfaf`	2025-12-04 19:19:04 +08:00
Raphaël MANSUY	c62aab61f0	cherry-pick `0c47d1a2`	2025-12-04 19:19:01 +08:00
Raphaël MANSUY	d87bbe0b40	cherry-pick `6190fa89`	2025-12-04 19:18:14 +08:00
Raphaël MANSUY	1a167fb7f7	cherry-pick `cca0800e`	2025-12-04 19:15:03 +08:00
Raphaël MANSUY	60b6b6bbae	cherry-pick `70cc2419`	2025-12-04 19:15:03 +08:00
Raphaël MANSUY	68cc386456	cherry-pick `a990c1d4`	2025-12-04 19:14:31 +08:00
yangdx	429cd6a66f	Fix top_n behavior with chunking to limit documents not chunks - Disable API-level top_n when chunking - Apply top_n to aggregated documents - Add comprehensive test coverage (cherry picked from commit `9009abed3e`)	2025-12-04 19:11:22 +08:00
copilot-swe-agent[bot]	85f21aecd5	Fix chunking infinite loop when overlap_tokens >= max_tokens Co-authored-by: netbrah <162479981+netbrah@users.noreply.github.com> (cherry picked from commit `1d6ea0c5f7`)	2025-12-04 19:11:22 +08:00
netbrah	b65ef37569	Add Cohere reranker config, chunking, and tests (cherry picked from commit `a05bbf105e`)	2025-12-04 19:11:22 +08:00
yangdx	8a8bdba8f4	Add comprehensive chunking tests with multi-token tokenizer edge cases • Add MultiTokenCharacterTokenizer for testing • Test token vs character counting accuracy • Verify delimiter splitting precision • Test overlap with distinctive content • Add decode content preservation tests (cherry picked from commit `fec7c67f45`)	2025-12-04 19:11:22 +08:00
yangdx	7f7574c8b7	Add token limit validation for character-only chunking - Add ChunkTokenLimitExceededError exception - Validate chunks against token limits - Include chunk preview in error messages - Add comprehensive test coverage - Log warnings for oversized chunks (cherry picked from commit `f988a22652`)	2025-12-04 19:11:22 +08:00
yangdx	326acbf19b	Add comprehensive tests for chunking with recursive splitting - Test recursive split mode - Add edge case coverage - Test parameter combinations - Verify chunk order indexing - Add integration test scenarios (cherry picked from commit `5733292557`)	2025-12-04 19:11:21 +08:00
yangdx	ce702ccb2f	Add workspace parameter and remove chunk-based query unit tests - Add workspace param to test storage init - Remove get_nodes_by_chunk_ids tests - Remove get_edges_by_chunk_ids tests - Clean up batch operations test function (cherry picked from commit `6b0f9795be`)	2025-12-04 19:11:20 +08:00
yangdx	7e7c86601e	Improve workspace isolation tests with better parallelism checks and cleanup • Add finalize_share_data cleanup • Refactor lock timing measurement • Add timeline overlap validation • Include purpose/scope documentation • Fix tokenizer integration (cherry picked from commit `21ad990e36`)	2025-12-04 19:11:18 +08:00
yangdx	94ae13a037	Refactor workspace handling to use default workspace and namespace locks - Remove DB-specific workspace configs - Add default workspace auto-setting - Replace global locks with namespace locks - Simplify pipeline status management - Remove redundant graph DB locking (cherry picked from commit `926960e957`)	2025-12-04 19:11:17 +08:00
yangdx	d56b4c856e	Fix trailing whitespace and update test mocking for rerank module • Remove trailing whitespace • Fix TiktokenTokenizer import patch • Add async context manager mocks • Update aiohttp.ClientSession patch • Improve test reliability (cherry picked from commit `561ba4e4b5`)	2025-12-04 19:11:14 +08:00
yangdx	fd76e0f7ce	Enhance workspace isolation test with distinct mock data and persistence • Use different mock LLM per workspace • Add persistent test directory • Create workspace-specific responses • Skip cleanup for inspection (cherry picked from commit `99262adaaa`)	2025-12-04 19:11:13 +08:00
yangdx	4da291468d	Rename test classes to prevent warning from pytest • TestResult → ExecutionResult • TestStats → ExecutionStats • Update class docstrings • Update type hints • Update variable references (cherry picked from commit `7e9c8ed1e8`)	2025-12-04 19:11:12 +08:00
yangdx	60520e0188	test: add concurrent execution to workspace isolation test • Add async sleep to mock functions • Test concurrent ainsert operations • Use asyncio.gather for parallel exec • Measure concurrent execution time (cherry picked from commit `6ae0c14438`)	2025-12-04 19:11:12 +08:00
yangdx	668b842862	Standardize test directory creation and remove tempfile dependency • Remove unused tempfile import • Use consistent project temp/ structure • Clean up existing directories first • Create directories with os.makedirs • Use descriptive test directory names (cherry picked from commit `4fef731f37`)	2025-12-04 19:11:12 +08:00
yangdx	660ccc7ada	Add GitHub CI workflow and test markers for offline/integration tests - Add GitHub Actions workflow for CI - Mark integration tests requiring services - Add offline test markers for isolated tests - Skip integration tests by default - Configure pytest markers and collection (cherry picked from commit `4ea2124001`)	2025-12-04 19:11:12 +08:00
yangdx	d790a660cd	Fix test to use default workspace parameter behavior (cherry picked from commit `41bf6d0283`)	2025-12-04 19:11:12 +08:00
yangdx	d011a1c0e7	Refactor test configuration to use pytest fixtures and CLI options • Add pytest command-line options • Create session-scoped fixtures • Remove hardcoded environment vars • Update test function signatures • Improve configuration priority (cherry picked from commit `1fe05df211`)	2025-12-04 19:11:12 +08:00
yangdx	97cf689dfb	Remove unused variables from workspace isolation test * Remove initial_ok check * Remove both_set verification (cherry picked from commit `cf73cb4d24`)	2025-12-04 19:11:11 +08:00
BukeLy	6559dc4fed	test: Add comprehensive workspace isolation test suite for PR #2366 Why this change is needed: PR #2366 introduces critical workspace isolation functionality to resolve multi-instance concurrency issues, but lacks comprehensive automated tests to validate the implementation. Without proper test coverage, we cannot ensure the feature works correctly across all scenarios mentioned in the PR. What this test suite covers: 1. Pipeline Status Isolation: Verifies different workspaces maintain independent pipeline status without interference 2. Lock Mechanism: Validates the new keyed lock system works correctly - Different workspaces can acquire locks in parallel - Same workspace locks serialize properly - No deadlocks occur 3. Backward Compatibility: Ensures legacy code without workspace parameters continues to work using default workspace 4. Multi-Workspace Concurrency: Confirms multiple LightRAG instances with different workspaces can run concurrently without data interference Testing approach: - All tests are automated and deterministic - Uses timing assertions to verify parallel vs serial lock behavior - Validates data isolation through direct namespace data inspection - Comprehensive error handling and detailed test output Test results: All 9 test cases passed successfully, confirming the workspace isolation feature is working correctly across all key scenarios. Impact: Provides confidence that PR #2366's workspace isolation feature is production-ready and won't introduce regressions. (cherry picked from commit `4742fc8efa`)	2025-12-04 19:11:11 +08:00
BukeLy	f1fa1cd340	test: Enhance E2E workspace isolation detection with content verification Add specific content assertions to detect cross-contamination between workspaces. Previously only checked that workspaces had different data, now verifies: - Each workspace contains only its own text content - Each workspace does NOT contain the other workspace's content - Cross-contamination would be immediately detected This ensures the test can find problems, not just pass. Changes: - Add assertions for "Artificial Intelligence" and "Machine Learning" in project_a - Add assertions for "Deep Learning" and "Neural Networks" in project_b - Add negative assertions to verify data leakage doesn't occur - Add detailed output messages showing what was verified Testing: - pytest tests/test_workspace_isolation.py::test_lightrag_end_to_end_workspace_isolation - Test passes with proper content isolation verified (cherry picked from commit `3ec736932e`)	2025-12-04 19:11:11 +08:00
BukeLy	f2771cc953	test: Add real integration and E2E tests for workspace isolation Implemented two critical test scenarios: Test 10 - JsonKVStorage Integration Test: - Instantiate two JsonKVStorage instances with different workspaces - Write different data to each instance (entity1, entity2) - Read back and verify complete data isolation - Verify workspace directories are created correctly - Result: Data correctly isolated, no mixing between workspaces Test 11 - LightRAG End-to-End Test: - Instantiate two LightRAG instances with different workspaces - Insert different documents to each instance - Verify workspace directory structure (project_a/, project_b/) - Verify file separation and data isolation - Result: All 8 storage files created separately per workspace - Document data correctly isolated between workspaces Test Results: 23/23 passed - 19 unit tests - 2 integration tests (JsonKVStorage data + file structure) - 2 E2E tests (LightRAG file structure + data isolation) Coverage: 100% - Unit, Integration, and E2E validated (cherry picked from commit `3e759f46d1`)	2025-12-04 19:11:11 +08:00
BukeLy	00cf52b0bf	test: Convert test_workspace_isolation.py to pytest style Why this change is needed: The test file was using a custom TestResults class for tracking test execution and results, which is not standard practice for pytest-based test suites. This makes the tests harder to integrate with CI/CD pipelines and reduces compatibility with pytest plugins and tooling. How it solves it: - Removed custom TestResults class and manual result tracking - Added @pytest.mark.asyncio decorator to all async test functions - Converted all results.add() calls to standard pytest assert statements - Added pytest fixture (setup_shared_data) for common test setup - Removed custom main() runner (pytest handles test discovery/execution) - Kept all test logic, assertions, and debugging print statements intact Impact: - All 11 test functions maintain identical behavior and coverage - Tests now follow pytest conventions and integrate with pytest ecosystem - Test output is cleaner and more informative with pytest's reporting - Easier to run selective tests using pytest's filtering options Testing: Verified by running: uv run pytest tests/test_workspace_isolation.py -v Result: All 11 tests passed in 2.41s (cherry picked from commit `288498ccdc`)	2025-12-04 19:11:11 +08:00
BukeLy	d5a67ea888	docs: Update test file docstring to reflect all 11 test scenarios Previous docstring mentioned only 4 scenarios but the file actually contains 11 comprehensive test cases. Updated to list all scenarios: 1. Pipeline Status Isolation 2. Lock Mechanism (Parallel/Serial) 3. Backward Compatibility 4. Multi-Workspace Concurrency 5. NamespaceLock Re-entrance Protection 6. Different Namespace Lock Isolation 7. Error Handling 8. Update Flags Workspace Isolation 9. Empty Workspace Standardization 10. JsonKVStorage Workspace Isolation 11. LightRAG End-to-End Workspace Isolation This makes the file header accurately describe its contents. (cherry picked from commit `1a1837028a`)	2025-12-04 19:11:11 +08:00
yangdx	e138c3a11e	Add test script for aquery_data endpoint validation (cherry picked from commit `91387628ff`)	2025-12-04 19:11:07 +08:00
copilot-swe-agent[bot]	b28a701532	Improve edge case handling for max_tokens=1 Co-authored-by: netbrah <162479981+netbrah@users.noreply.github.com> (cherry picked from commit `8835fc244a`)	2025-12-04 19:09:07 +08:00
BukeLy	c52c1aea69	test: Enhance workspace isolation test suite to 100% coverage Why this enhancement is needed: The initial test suite covered the 4 core scenarios from PR #2366, but lacked comprehensive coverage of edge cases and implementation details. This update adds 5 additional test scenarios to achieve complete validation of the workspace isolation feature. What was added: Test 5 - NamespaceLock Re-entrance Protection (2 sub-tests): - Verifies re-entrance in same coroutine raises RuntimeError - Confirms same NamespaceLock instance works in concurrent coroutines Test 6 - Different Namespace Lock Isolation: - Validates locks with same workspace but different namespaces are independent Test 7 - Error Handling (2 sub-tests): - Tests None workspace conversion to empty string - Validates empty workspace creates correct namespace format Test 8 - Update Flags Workspace Isolation (3 sub-tests): - set_all_update_flags isolation between workspaces - clear_all_update_flags isolation between workspaces - get_all_update_flags_status workspace filtering Test 9 - Empty Workspace Standardization (2 sub-tests): - Empty workspace namespace format verification - Empty vs non-empty workspace independence Test Results: All 19 test cases passed (previously 9/9, now 19/19) - 4 core PR requirements: 100% coverage - 5 additional scenarios: 100% coverage - Total coverage: 100% of workspace isolation implementation Testing approach improvements: - Proper initialization of update flags using get_update_flag() - Correct handling of flag objects (.value property) - Updated error handling tests to match actual implementation behavior - All edge cases and boundary conditions validated Impact: Provides complete confidence in the workspace isolation feature with comprehensive test coverage of all implementation details, edge cases, and error handling paths. (cherry picked from commit `436e41439e`)	2025-12-04 19:09:05 +08:00
yangdx	ed79218550	Optimize JSON write with fast/slow path to reduce memory usage - Fast path for clean data (no sanitization) - Slow path sanitizes during encoding - Reload shared memory after sanitization - Custom encoder avoids deep copies - Comprehensive test coverage (cherry picked from commit `777c987371`)	2025-12-04 19:09:04 +08:00
yangdx	d1ab42bb36	Translate graph storage test from Chinese to English (cherry picked from commit `f3b2ba8152`)	2025-12-04 19:09:03 +08:00
yangdx	cea34d6691	Initialize shared storage for all graph storage types in graph unit test (cherry picked from commit `36501b82f5`)	2025-12-04 19:09:03 +08:00
yangdx	17106225dd	Add PostgreSQL connection retry mechanism with comprehensive error handling • Implement connection retry with backoff • Add transient error detection • Pool management with timeout guards (cherry picked from commit `e758204ab2`)	2025-12-04 19:08:58 +08:00
yangdx	8f924d6f21	Add PostgreSQL connection retry configuration options - Add retry environment variables - Fix asyncpg import in retry tests (cherry picked from commit `bd535e3e7a`)	2025-12-04 19:08:57 +08:00
yangdx	60a695539a	Refactor PostgreSQL retry config to use centralized configuration • Move retry config to ClientManager • Remove env var parsing from PostgreSQLDB • Add config params to test setup (cherry picked from commit `b3ed264707`)	2025-12-04 19:08:57 +08:00
yangdx	de2713ca93	Add PostgreSQL connection retry mechanism with comprehensive error handling • Implement connection retry with backoff • Add transient error detection • Pool management with timeout guards (cherry picked from commit `e758204ab2`)	2025-12-04 19:06:30 +08:00
yangdx	39ad057384	Refactor PostgreSQL retry config to use centralized configuration • Move retry config to ClientManager • Remove env var parsing from PostgreSQLDB • Add config params to test setup (cherry picked from commit `b3ed264707`)	2025-12-04 19:06:06 +08:00
Raphael MANSUY	fe9b8ec02a	tests: stabilize integration tests + skip external services; fix multi-tenant API behavior and idempotency (#4) * feat: Implement multi-tenant architecture with tenant and knowledge base models - Added data models for tenants, knowledge bases, and related configurations. - Introduced role and permission management for users in the multi-tenant system. - Created a service layer for managing tenants and knowledge bases, including CRUD operations. - Developed a tenant-aware instance manager for LightRAG with caching and isolation features. - Added a migration script to transition existing workspace-based deployments to the new multi-tenant architecture. * chore: ignore lightrag/api/webui/assets/ directory * chore: stop tracking lightrag/api/webui/assets (ignore in .gitignore) * feat: Initialize LightRAG Multi-Tenant Stack with PostgreSQL - Added README.md for project overview, setup instructions, and architecture details. - Created docker-compose.yml to define services: PostgreSQL, Redis, LightRAG API, and Web UI. - Introduced env.example for environment variable configuration. - Implemented init-postgres.sql for PostgreSQL schema initialization with multi-tenant support. - Added reproduce_issue.py for testing default tenant access via API. * feat: Enhance TenantSelector and update related components for improved multi-tenant support * feat: Enhance testing capabilities and update documentation - Updated Makefile to include new test commands for various modes (compatibility, isolation, multi-tenant, security, coverage, and dry-run). - Modified API health check endpoint in Makefile to reflect new port configuration. - Updated QUICK_START.md and README.md to reflect changes in service URLs and ports. - Added environment variables for testing modes in env.example. - Introduced run_all_tests.sh script to automate testing across different modes. - Created conftest.py for pytest configuration, including database fixtures and mock services.
- Implemented database helper functions for streamlined database operations in tests. - Added test collection hooks to skip tests based on the current MULTITENANT_MODE. * feat: Implement multi-tenant support with demo mode enabled by default - Added multi-tenant configuration to the environment and Docker setup. - Created pre-configured demo tenants (acme-corp and techstart) for testing. - Updated API endpoints to support tenant-specific data access. - Enhanced Makefile commands for better service management and database operations. - Introduced user-tenant membership system with role-based access control. - Added comprehensive documentation for multi-tenant setup and usage. - Fixed issues with document visibility in multi-tenant environments. - Implemented necessary database migrations for user memberships and legacy support. * feat(audit): Add final audit report for multi-tenant implementation - Documented overall assessment, architecture overview, test results, security findings, and recommendations. - Included detailed findings on critical security issues and architectural concerns. fix(security): Implement security fixes based on audit findings - Removed global RAG fallback and enforced strict tenant context. - Configured super-admin access and required user authentication for tenant access. - Cleared localStorage on logout and improved error handling in WebUI. chore(logs): Create task logs for audit and security fixes implementation - Documented actions, decisions, and next steps for both audit and security fixes. - Summarized test results and remaining recommendations. chore(scripts): Enhance development stack management scripts - Added scripts for cleaning, starting, and stopping the development stack. - Improved output messages and ensured graceful shutdown of services. feat(starter): Initialize PostgreSQL with AGE extension support - Created initialization scripts for PostgreSQL extensions including uuid-ossp, vector, and AGE. 
- Ensured successful installation and verification of extensions. * feat: Implement auto-select for first tenant and KB on initial load in WebUI - Removed WEBUI_INITIAL_STATE_FIX.md as the issue is resolved. - Added useTenantInitialization hook to automatically select the first available tenant and KB on app load. - Integrated the new hook into the Root component of the WebUI. - Updated RetrievalTesting component to ensure a KB is selected before allowing user interaction. - Created end-to-end tests for multi-tenant isolation and real service interactions. - Added scripts for starting, stopping, and cleaning the development stack. - Enhanced API and tenant routes to support tenant-specific pipeline status initialization. - Updated constants for backend URL to reflect the correct port. - Improved error handling and logging in various components. * feat: Add multi-tenant support with enhanced E2E testing scripts and client functionality * update client * Add integration and unit tests for multi-tenant API, models, security, and storage - Implement integration tests for tenant and knowledge base management endpoints in `test_tenant_api_routes.py`. - Create unit tests for tenant isolation, model validation, and role permissions in `test_tenant_models.py`. - Add security tests to enforce role-based permissions and context validation in `test_tenant_security.py`. - Develop tests for tenant-aware storage operations and context isolation in `test_tenant_storage_phase3.py`. * feat(e2e): Implement OpenAI model support and database reset functionality * Add comprehensive test suite for gpt-5-nano compatibility - Introduced tests for parameter normalization, embeddings, and entity extraction. - Implemented direct API testing for gpt-5-nano. - Validated .env configuration loading and OpenAI API connectivity. - Analyzed reasoning token overhead with various token limits. - Documented test procedures and expected outcomes in README files. 
- Ensured all tests pass for production readiness. * kg(postgres_impl): ensure AGE extension is loaded in session and configure graph initialization * dev: add hybrid dev helper scripts, Makefile, docker-compose.dev-db and local development docs * feat(dev): add dev helper scripts and local development documentation for hybrid setup * feat(multi-tenant): add detailed specifications and logs for multi-tenant improvements, including UX, backend handling, and ingestion pipeline * feat(migration): add generated tenant/kb columns, indexes, triggers; drop unused tables; update schema and docs * test(backward-compat): adapt tests to new StorageNameSpace/TenantService APIs (use concrete dummy storages) * chore: multi-tenant and UX updates — docs, webui, storage, tenant service adjustments * tests: stabilize integration tests + skip external services; fix multi-tenant API behavior and idempotency - gpt5_nano_compatibility: add pytest-asyncio markers, skip when OPENAI key missing, prevent module-level asyncio.run collection, add conftest - Ollama tests: add server availability check and skip markers; avoid pytest collection warnings by renaming helper classes - Graph storage tests: rename interactive test functions to avoid pytest collection - Document & Tenant routes: support external_ids for idempotency; ensure HTTPExceptions are re-raised - LightRAG core: support external_ids in apipeline_enqueue_documents and idempotent logic - Tests updated to match API changes (tenant routes & document routes) - Add logs and scripts for inspection and audit	2025-12-04 16:04:21 +08:00
yangdx	46187b2507	Fix conditional logic in streaming response parser of unit test • Change elif to if for response field • Change elif to if for error field • Allow multiple data types per chunk • Fix mutually exclusive conditions • Enable concurrent field processing	2025-09-27 21:43:46 +08:00
yangdx	bcf30a4c8a	Add comprehensive reference testing for query endpoints - Add reference format validation - Test streaming response parsing - Check reference consistency - Support references enable/disable - Add --references-only test mode	2025-09-25 16:56:09 +08:00
yangdx	5eb4a4b799	feat: simplify citations, add reference merging, and restructure API response format	2025-09-24 14:30:10 +08:00
yangdx	c0d5abba6b	Fix linting	2025-09-15 02:59:21 +08:00
yangdx	b1c8206346	Add aquery_data endpoint for structured retrieval without LLM generation - Add QueryDataResponse model - Implement /query/data endpoint - Add aquery_data method to LightRAG - Return entities, relationships, chunks	2025-09-15 02:15:14 +08:00
yangdx	a69194c079	Merge branch 'main' into add-Memgraph-graph-db	2025-07-04 23:53:07 +08:00
yangdx	f15e67c82c	Update comments	2025-06-29 21:53:05 +08:00


63 commits