Commit graph

49 commits

Author SHA1 Message Date
yangdx
ce702ccb2f Add workspace parameter and remove chunk-based query unit tests
- Add workspace param to test storage init
- Remove get_nodes_by_chunk_ids tests
- Remove get_edges_by_chunk_ids tests
- Clean up batch operations test function

(cherry picked from commit 6b0f9795be)
2025-12-04 19:11:20 +08:00
yangdx
7e7c86601e Improve workspace isolation tests with better parallelism checks and cleanup
• Add finalize_share_data cleanup
• Refactor lock timing measurement
• Add timeline overlap validation
• Include purpose/scope documentation
• Fix tokenizer integration

(cherry picked from commit 21ad990e36)
2025-12-04 19:11:18 +08:00
yangdx
94ae13a037 Refactor workspace handling to use default workspace and namespace locks
- Remove DB-specific workspace configs
- Add default workspace auto-setting
- Replace global locks with namespace locks
- Simplify pipeline status management
- Remove redundant graph DB locking

(cherry picked from commit 926960e957)
2025-12-04 19:11:17 +08:00
yangdx
d56b4c856e Fix trailing whitespace and update test mocking for rerank module
• Remove trailing whitespace
• Fix TiktokenTokenizer import patch
• Add async context manager mocks
• Update aiohttp.ClientSession patch
• Improve test reliability

(cherry picked from commit 561ba4e4b5)
2025-12-04 19:11:14 +08:00
yangdx
fd76e0f7ce Enhance workspace isolation test with distinct mock data and persistence
• Use different mock LLM per workspace
• Add persistent test directory
• Create workspace-specific responses
• Skip cleanup for inspection

(cherry picked from commit 99262adaaa)
2025-12-04 19:11:13 +08:00
yangdx
4da291468d Rename test classes to prevent warning from pytest
• TestResult → ExecutionResult
• TestStats → ExecutionStats
• Update class docstrings
• Update type hints
• Update variable references

(cherry picked from commit 7e9c8ed1e8)
2025-12-04 19:11:12 +08:00
yangdx
60520e0188 test: add concurrent execution to workspace isolation test
• Add async sleep to mock functions
• Test concurrent ainsert operations
• Use asyncio.gather for parallel exec
• Measure concurrent execution time

(cherry picked from commit 6ae0c14438)
2025-12-04 19:11:12 +08:00
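The concurrency check this commit describes can be sketched roughly as below. This is an illustration only, not the repository's test: `mock_insert` is a hypothetical stand-in for a mocked `ainsert` call, and the workspace names and delay are arbitrary.

```python
import asyncio
import time


async def mock_insert(workspace: str, delay: float) -> str:
    """Hypothetical stand-in for a mocked ainsert call."""
    # The async sleep yields to the event loop so other coroutines can run.
    await asyncio.sleep(delay)
    return f"{workspace}:done"


async def run_concurrently(workspaces, delay: float = 0.2):
    """Run one mock insert per workspace in parallel and time the batch."""
    start = time.perf_counter()
    results = await asyncio.gather(*(mock_insert(w, delay) for w in workspaces))
    elapsed = time.perf_counter() - start
    return results, elapsed


results, elapsed = asyncio.run(run_concurrently(["project_a", "project_b"]))
```

If the two inserts really overlap, `elapsed` stays close to one delay rather than the sum of both, which is the property a timing assertion can check.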
yangdx
668b842862 Standardize test directory creation and remove tempfile dependency
• Remove unused tempfile import
• Use consistent project temp/ structure
• Clean up existing directories first
• Create directories with os.makedirs
• Use descriptive test directory names

(cherry picked from commit 4fef731f37)
2025-12-04 19:11:12 +08:00
yangdx
660ccc7ada Add GitHub CI workflow and test markers for offline/integration tests
- Add GitHub Actions workflow for CI
- Mark integration tests requiring services
- Add offline test markers for isolated tests
- Skip integration tests by default
- Configure pytest markers and collection

(cherry picked from commit 4ea2124001)
2025-12-04 19:11:12 +08:00
yangdx
d790a660cd Fix test to use default workspace parameter behavior
(cherry picked from commit 41bf6d0283)
2025-12-04 19:11:12 +08:00
yangdx
d011a1c0e7 Refactor test configuration to use pytest fixtures and CLI options
• Add pytest command-line options
• Create session-scoped fixtures
• Remove hardcoded environment vars
• Update test function signatures
• Improve configuration priority

(cherry picked from commit 1fe05df211)
2025-12-04 19:11:12 +08:00
yangdx
97cf689dfb Remove unused variables from workspace isolation test
* Remove initial_ok check
* Remove both_set verification

(cherry picked from commit cf73cb4d24)
2025-12-04 19:11:11 +08:00
BukeLy
6559dc4fed test: Add comprehensive workspace isolation test suite for PR #2366
Why this change is needed:
PR #2366 introduces critical workspace isolation functionality to resolve
multi-instance concurrency issues, but lacks comprehensive automated tests
to validate the implementation. Without proper test coverage, we cannot
ensure the feature works correctly across all scenarios mentioned in the PR.

What this test suite covers:
1. Pipeline Status Isolation: Verifies different workspaces maintain
   independent pipeline status without interference
2. Lock Mechanism: Validates the new keyed lock system works correctly
   - Different workspaces can acquire locks in parallel
   - Same workspace locks serialize properly
   - No deadlocks occur
3. Backward Compatibility: Ensures legacy code without workspace parameters
   continues to work using default workspace
4. Multi-Workspace Concurrency: Confirms multiple LightRAG instances with
   different workspaces can run concurrently without data interference

Testing approach:
- All tests are automated and deterministic
- Uses timing assertions to verify parallel vs serial lock behavior
- Validates data isolation through direct namespace data inspection
- Comprehensive error handling and detailed test output

Test results:
All 9 test cases passed successfully, confirming the workspace isolation
feature is working correctly across all key scenarios.

Impact:
Provides confidence that PR #2366's workspace isolation feature is
production-ready and won't introduce regressions.

(cherry picked from commit 4742fc8efa)
2025-12-04 19:11:11 +08:00
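The parallel-versus-serial timing idea in the "Lock Mechanism" item above might look like this in miniature. It assumes a keyed lock is simply one `asyncio.Lock` per (workspace, namespace) pair; `get_keyed_lock` and `hold_lock` are hypothetical helpers, not LightRAG functions.

```python
import asyncio
import time
from collections import defaultdict

# Hypothetical registry: one asyncio.Lock per (workspace, namespace) key.
_locks = defaultdict(asyncio.Lock)


def get_keyed_lock(workspace: str, namespace: str) -> asyncio.Lock:
    return _locks[(workspace, namespace)]


async def hold_lock(workspace: str, namespace: str, hold: float) -> None:
    """Acquire the keyed lock and keep it for `hold` seconds."""
    async with get_keyed_lock(workspace, namespace):
        await asyncio.sleep(hold)


async def timed(*jobs) -> float:
    """Run the jobs concurrently and return the wall-clock duration."""
    start = time.perf_counter()
    await asyncio.gather(*jobs)
    return time.perf_counter() - start


async def main(hold: float = 0.2):
    # Different workspaces use different locks, so the holds overlap.
    parallel = await timed(
        hold_lock("ws_a", "pipeline", hold), hold_lock("ws_b", "pipeline", hold)
    )
    # The same workspace shares one lock, so the holds run back to back.
    serial = await timed(
        hold_lock("ws_a", "pipeline", hold), hold_lock("ws_a", "pipeline", hold)
    )
    return parallel, serial


parallel, serial = asyncio.run(main())
```

A test built this way asserts that `parallel` is near one hold time and `serial` is near two; the thresholds need some slack because scheduler timing is not exact.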
BukeLy
f1fa1cd340 test: Enhance E2E workspace isolation detection with content verification
Add specific content assertions to detect cross-contamination between workspaces.
Previously only checked that workspaces had different data, now verifies:

- Each workspace contains only its own text content
- Each workspace does NOT contain the other workspace's content
- Cross-contamination would be immediately detected

This ensures the test can find problems, not just pass.

Changes:
- Add assertions for "Artificial Intelligence" and "Machine Learning" in project_a
- Add assertions for "Deep Learning" and "Neural Networks" in project_b
- Add negative assertions to verify data leakage doesn't occur
- Add detailed output messages showing what was verified

Testing:
- pytest tests/test_workspace_isolation.py::test_lightrag_end_to_end_workspace_isolation
- Test passes with proper content isolation verified

(cherry picked from commit 3ec736932e)
2025-12-04 19:11:11 +08:00
BukeLy
f2771cc953 test: Add real integration and E2E tests for workspace isolation
Implemented two critical test scenarios:

Test 10 - JsonKVStorage Integration Test:
- Instantiate two JsonKVStorage instances with different workspaces
- Write different data to each instance (entity1, entity2)
- Read back and verify complete data isolation
- Verify workspace directories are created correctly
- Result: Data correctly isolated, no mixing between workspaces

Test 11 - LightRAG End-to-End Test:
- Instantiate two LightRAG instances with different workspaces
- Insert different documents to each instance
- Verify workspace directory structure (project_a/, project_b/)
- Verify file separation and data isolation
- Result: All 8 storage files created separately per workspace
- Document data correctly isolated between workspaces

Test Results: 23/23 passed
- 19 unit tests
- 2 integration tests (JsonKVStorage data + file structure)
- 2 E2E tests (LightRAG file structure + data isolation)

Coverage: 100% - Unit, Integration, and E2E validated
(cherry picked from commit 3e759f46d1)
2025-12-04 19:11:11 +08:00
BukeLy
00cf52b0bf test: Convert test_workspace_isolation.py to pytest style
Why this change is needed:
The test file was using a custom TestResults class for tracking test
execution and results, which is not standard practice for pytest-based
test suites. This makes the tests harder to integrate with CI/CD pipelines
and reduces compatibility with pytest plugins and tooling.

How it solves it:
- Removed custom TestResults class and manual result tracking
- Added @pytest.mark.asyncio decorator to all async test functions
- Converted all results.add() calls to standard pytest assert statements
- Added pytest fixture (setup_shared_data) for common test setup
- Removed custom main() runner (pytest handles test discovery/execution)
- Kept all test logic, assertions, and debugging print statements intact

Impact:
- All 11 test functions maintain identical behavior and coverage
- Tests now follow pytest conventions and integrate with pytest ecosystem
- Test output is cleaner and more informative with pytest's reporting
- Easier to run selective tests using pytest's filtering options

Testing:
Verified by running: uv run pytest tests/test_workspace_isolation.py -v
Result: All 11 tests passed in 2.41s

(cherry picked from commit 288498ccdc)
2025-12-04 19:11:11 +08:00
BukeLy
d5a67ea888 docs: Update test file docstring to reflect all 11 test scenarios
Previous docstring mentioned only 4 scenarios but the file actually contains
11 comprehensive test cases. Updated to list all scenarios:

1. Pipeline Status Isolation
2. Lock Mechanism (Parallel/Serial)
3. Backward Compatibility
4. Multi-Workspace Concurrency
5. NamespaceLock Re-entrance Protection
6. Different Namespace Lock Isolation
7. Error Handling
8. Update Flags Workspace Isolation
9. Empty Workspace Standardization
10. JsonKVStorage Workspace Isolation
11. LightRAG End-to-End Workspace Isolation

This makes the file header accurately describe its contents.

(cherry picked from commit 1a1837028a)
2025-12-04 19:11:11 +08:00
yangdx
e138c3a11e Add test script for aquery_data endpoint validation
(cherry picked from commit 91387628ff)
2025-12-04 19:11:07 +08:00
copilot-swe-agent[bot]
b28a701532 Improve edge case handling for max_tokens=1
Co-authored-by: netbrah <162479981+netbrah@users.noreply.github.com>
(cherry picked from commit 8835fc244a)
2025-12-04 19:09:07 +08:00
BukeLy
c52c1aea69 test: Enhance workspace isolation test suite to 100% coverage
Why this enhancement is needed:
The initial test suite covered the 4 core scenarios from PR #2366, but
lacked comprehensive coverage of edge cases and implementation details.
This update adds 5 additional test scenarios to achieve complete validation
of the workspace isolation feature.

What was added:
Test 5 - NamespaceLock Re-entrance Protection (2 sub-tests):
  - Verifies re-entrance in same coroutine raises RuntimeError
  - Confirms same NamespaceLock instance works in concurrent coroutines

Test 6 - Different Namespace Lock Isolation:
  - Validates locks with same workspace but different namespaces are independent

Test 7 - Error Handling (2 sub-tests):
  - Tests None workspace conversion to empty string
  - Validates empty workspace creates correct namespace format

Test 8 - Update Flags Workspace Isolation (3 sub-tests):
  - set_all_update_flags isolation between workspaces
  - clear_all_update_flags isolation between workspaces
  - get_all_update_flags_status workspace filtering

Test 9 - Empty Workspace Standardization (2 sub-tests):
  - Empty workspace namespace format verification
  - Empty vs non-empty workspace independence

Test Results:
All 19 test cases passed (previously 9/9, now 19/19)
- 4 core PR requirements: 100% coverage
- 5 additional scenarios: 100% coverage
- Total coverage: 100% of workspace isolation implementation

Testing approach improvements:
- Proper initialization of update flags using get_update_flag()
- Correct handling of flag objects (.value property)
- Updated error handling tests to match actual implementation behavior
- All edge cases and boundary conditions validated

Impact:
Provides complete confidence in the workspace isolation feature with
comprehensive test coverage of all implementation details, edge cases,
and error handling paths.

(cherry picked from commit 436e41439e)
2025-12-04 19:09:05 +08:00
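The re-entrance behaviour exercised by Test 5 can be illustrated with a small sketch. `ReentranceGuardedLock` is a hypothetical class written for this example; it is not LightRAG's `NamespaceLock`, and tracking the owner through `asyncio.current_task()` is an assumption about one way to get this behaviour.

```python
import asyncio


class ReentranceGuardedLock:
    """Hypothetical lock that refuses re-entrance from the task holding it."""

    def __init__(self) -> None:
        self._lock = asyncio.Lock()
        self._owner = None  # task that currently holds the lock, if any

    async def __aenter__(self):
        if self._owner is asyncio.current_task():
            # Waiting here would deadlock, so fail loudly instead.
            raise RuntimeError("lock is already held by this coroutine")
        await self._lock.acquire()
        self._owner = asyncio.current_task()
        return self

    async def __aexit__(self, exc_type, exc, tb) -> None:
        self._owner = None
        self._lock.release()


async def demo():
    lock = ReentranceGuardedLock()

    # Case 1: nested use inside one coroutine raises RuntimeError.
    reentry_blocked = False
    async with lock:
        try:
            async with lock:
                pass
        except RuntimeError:
            reentry_blocked = True

    # Case 2: the same instance serializes separate concurrent coroutines.
    order = []

    async def worker(name: str) -> None:
        async with lock:
            order.append(name)
            await asyncio.sleep(0.01)

    await asyncio.gather(worker("a"), worker("b"))
    return reentry_blocked, order


reentry_blocked, order = asyncio.run(demo())
```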
yangdx
ed79218550 Optimize JSON write with fast/slow path to reduce memory usage
- Fast path for clean data (no sanitization)
- Slow path sanitizes during encoding
- Reload shared memory after sanitization
- Custom encoder avoids deep copies
- Comprehensive test coverage

(cherry picked from commit 777c987371)
2025-12-04 19:09:04 +08:00
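A rough sketch of the fast/slow path idea follows. The commit does not say what "sanitization" covers; this example assumes it means replacing characters that cannot be encoded as UTF-8 (such as lone surrogates), and it does not use a custom encoder. `dump_json_bytes` is a hypothetical helper.

```python
import json


def dump_json_bytes(data) -> bytes:
    """Serialize to UTF-8 bytes, sanitizing only when encoding fails."""
    text = json.dumps(data, ensure_ascii=False)
    try:
        # Fast path: clean data encodes directly with no extra pass.
        return text.encode("utf-8")
    except UnicodeEncodeError:
        # Slow path (assumed behaviour): replace unencodable characters
        # during encoding instead of deep-copying and cleaning the input.
        return text.encode("utf-8", errors="replace")


clean = dump_json_bytes({"a": "ok"})
dirty = dump_json_bytes({"a": "bad\ud800"})
```

Because the input object is never copied or modified here, a caller that needs the in-memory data to match what was written would have to reload it afterwards.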
yangdx
d1ab42bb36 Translate graph storage test from Chinese to English
(cherry picked from commit f3b2ba8152)
2025-12-04 19:09:03 +08:00
yangdx
cea34d6691 Initialize shared storage for all graph storage types in graph unit test
(cherry picked from commit 36501b82f5)
2025-12-04 19:09:03 +08:00
yangdx
17106225dd Add PostgreSQL connection retry mechanism with comprehensive error handling
• Implement connection retry with backoff
• Add transient error detection
• Pool management with timeout guards

(cherry picked from commit e758204ab2)
2025-12-04 19:08:58 +08:00
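Retry with backoff and transient-error detection, as named in this commit, generally follows the pattern below. This is a generic sketch, not the PostgreSQL implementation: `connect_with_retry`, its parameters, and the choice of which exceptions count as transient are all assumptions.

```python
import asyncio

# Assumption: these built-in exceptions stand in for "transient" errors.
TRANSIENT_ERRORS = (ConnectionError, TimeoutError)


async def connect_with_retry(connect, attempts=3, base_delay=0.01, max_delay=1.0):
    """Call `connect()` until it succeeds or the attempts run out."""
    delay = base_delay
    for attempt in range(1, attempts + 1):
        try:
            return await connect()
        except TRANSIENT_ERRORS:
            if attempt == attempts:
                raise  # out of attempts: surface the last error
            await asyncio.sleep(delay)
            delay = min(delay * 2, max_delay)  # exponential backoff, capped


calls = []


async def flaky_connect() -> str:
    """Hypothetical connector that fails twice before succeeding."""
    calls.append(1)
    if len(calls) < 3:
        raise ConnectionError("temporarily unavailable")
    return "connected"


result = asyncio.run(connect_with_retry(flaky_connect))
```

Errors outside `TRANSIENT_ERRORS` are not caught, so they propagate on the first attempt instead of being retried.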
yangdx
8f924d6f21 Add PostgreSQL connection retry configuration options
- Add retry environment variables
- Fix asyncpg import in retry tests

(cherry picked from commit bd535e3e7a)
2025-12-04 19:08:57 +08:00
yangdx
60a695539a Refactor PostgreSQL retry config to use centralized configuration
• Move retry config to ClientManager
• Remove env var parsing from PostgreSQLDB
• Add config params to test setup

(cherry picked from commit b3ed264707)
2025-12-04 19:08:57 +08:00
yangdx
de2713ca93 Add PostgreSQL connection retry mechanism with comprehensive error handling
• Implement connection retry with backoff
• Add transient error detection
• Pool management with timeout guards

(cherry picked from commit e758204ab2)
2025-12-04 19:06:30 +08:00
yangdx
39ad057384 Refactor PostgreSQL retry config to use centralized configuration
• Move retry config to ClientManager
• Remove env var parsing from PostgreSQLDB
• Add config params to test setup

(cherry picked from commit b3ed264707)
2025-12-04 19:06:06 +08:00
Raphael MANSUY
fe9b8ec02a tests: stabilize integration tests + skip external services; fix multi-tenant API behavior and idempotency (#4)
* feat: Implement multi-tenant architecture with tenant and knowledge base models

- Added data models for tenants, knowledge bases, and related configurations.
- Introduced role and permission management for users in the multi-tenant system.
- Created a service layer for managing tenants and knowledge bases, including CRUD operations.
- Developed a tenant-aware instance manager for LightRAG with caching and isolation features.
- Added a migration script to transition existing workspace-based deployments to the new multi-tenant architecture.

* chore: ignore lightrag/api/webui/assets/ directory

* chore: stop tracking lightrag/api/webui/assets (ignore in .gitignore)

* feat: Initialize LightRAG Multi-Tenant Stack with PostgreSQL

- Added README.md for project overview, setup instructions, and architecture details.
- Created docker-compose.yml to define services: PostgreSQL, Redis, LightRAG API, and Web UI.
- Introduced env.example for environment variable configuration.
- Implemented init-postgres.sql for PostgreSQL schema initialization with multi-tenant support.
- Added reproduce_issue.py for testing default tenant access via API.

* feat: Enhance TenantSelector and update related components for improved multi-tenant support

* feat: Enhance testing capabilities and update documentation

- Updated Makefile to include new test commands for various modes (compatibility, isolation, multi-tenant, security, coverage, and dry-run).
- Modified API health check endpoint in Makefile to reflect new port configuration.
- Updated QUICK_START.md and README.md to reflect changes in service URLs and ports.
- Added environment variables for testing modes in env.example.
- Introduced run_all_tests.sh script to automate testing across different modes.
- Created conftest.py for pytest configuration, including database fixtures and mock services.
- Implemented database helper functions for streamlined database operations in tests.
- Added test collection hooks to skip tests based on the current MULTITENANT_MODE.

* feat: Implement multi-tenant support with demo mode enabled by default

- Added multi-tenant configuration to the environment and Docker setup.
- Created pre-configured demo tenants (acme-corp and techstart) for testing.
- Updated API endpoints to support tenant-specific data access.
- Enhanced Makefile commands for better service management and database operations.
- Introduced user-tenant membership system with role-based access control.
- Added comprehensive documentation for multi-tenant setup and usage.
- Fixed issues with document visibility in multi-tenant environments.
- Implemented necessary database migrations for user memberships and legacy support.

* feat(audit): Add final audit report for multi-tenant implementation

- Documented overall assessment, architecture overview, test results, security findings, and recommendations.
- Included detailed findings on critical security issues and architectural concerns.

fix(security): Implement security fixes based on audit findings

- Removed global RAG fallback and enforced strict tenant context.
- Configured super-admin access and required user authentication for tenant access.
- Cleared localStorage on logout and improved error handling in WebUI.

chore(logs): Create task logs for audit and security fixes implementation

- Documented actions, decisions, and next steps for both audit and security fixes.
- Summarized test results and remaining recommendations.

chore(scripts): Enhance development stack management scripts

- Added scripts for cleaning, starting, and stopping the development stack.
- Improved output messages and ensured graceful shutdown of services.

feat(starter): Initialize PostgreSQL with AGE extension support

- Created initialization scripts for PostgreSQL extensions including uuid-ossp, vector, and AGE.
- Ensured successful installation and verification of extensions.

* feat: Implement auto-select for first tenant and KB on initial load in WebUI

- Removed WEBUI_INITIAL_STATE_FIX.md as the issue is resolved.
- Added useTenantInitialization hook to automatically select the first available tenant and KB on app load.
- Integrated the new hook into the Root component of the WebUI.
- Updated RetrievalTesting component to ensure a KB is selected before allowing user interaction.
- Created end-to-end tests for multi-tenant isolation and real service interactions.
- Added scripts for starting, stopping, and cleaning the development stack.
- Enhanced API and tenant routes to support tenant-specific pipeline status initialization.
- Updated constants for backend URL to reflect the correct port.
- Improved error handling and logging in various components.

* feat: Add multi-tenant support with enhanced E2E testing scripts and client functionality

* update client

* Add integration and unit tests for multi-tenant API, models, security, and storage

- Implement integration tests for tenant and knowledge base management endpoints in `test_tenant_api_routes.py`.
- Create unit tests for tenant isolation, model validation, and role permissions in `test_tenant_models.py`.
- Add security tests to enforce role-based permissions and context validation in `test_tenant_security.py`.
- Develop tests for tenant-aware storage operations and context isolation in `test_tenant_storage_phase3.py`.

* feat(e2e): Implement OpenAI model support and database reset functionality

* Add comprehensive test suite for gpt-5-nano compatibility

- Introduced tests for parameter normalization, embeddings, and entity extraction.
- Implemented direct API testing for gpt-5-nano.
- Validated .env configuration loading and OpenAI API connectivity.
- Analyzed reasoning token overhead with various token limits.
- Documented test procedures and expected outcomes in README files.
- Ensured all tests pass for production readiness.

* kg(postgres_impl): ensure AGE extension is loaded in session and configure graph initialization

* dev: add hybrid dev helper scripts, Makefile, docker-compose.dev-db and local development docs

* feat(dev): add dev helper scripts and local development documentation for hybrid setup

* feat(multi-tenant): add detailed specifications and logs for multi-tenant improvements, including UX, backend handling, and ingestion pipeline

* feat(migration): add generated tenant/kb columns, indexes, triggers; drop unused tables; update schema and docs

* test(backward-compat): adapt tests to new StorageNameSpace/TenantService APIs (use concrete dummy storages)

* chore: multi-tenant and UX updates — docs, webui, storage, tenant service adjustments

* tests: stabilize integration tests + skip external services; fix multi-tenant API behavior and idempotency

- gpt5_nano_compatibility: add pytest-asyncio markers, skip when OPENAI key missing, prevent module-level asyncio.run collection, add conftest
- Ollama tests: add server availability check and skip markers; avoid pytest collection warnings by renaming helper classes
- Graph storage tests: rename interactive test functions to avoid pytest collection
- Document & Tenant routes: support external_ids for idempotency; ensure HTTPExceptions are re-raised
- LightRAG core: support external_ids in apipeline_enqueue_documents and idempotent logic
- Tests updated to match API changes (tenant routes & document routes)
- Add logs and scripts for inspection and audit
2025-12-04 16:04:21 +08:00
yangdx
46187b2507 Fix conditional logic in streaming response parser of unit test
• Change elif to if for response field
• Change elif to if for error field
• Allow multiple data types per chunk
• Fix mutually exclusive conditions
• Enable concurrent field processing
2025-09-27 21:43:46 +08:00
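The `elif` to `if` change matters when one streamed chunk carries more than one field. The sketch below assumes newline-delimited JSON chunks and a `references` field alongside `response` and `error`; `parse_chunk` and the state layout are hypothetical.

```python
import json


def parse_chunk(line: str, state: dict) -> None:
    """Fold one JSON line from the stream into the accumulated state."""
    data = json.loads(line)
    # Independent `if` checks: with `elif`, a chunk holding both
    # "references" and "response" would lose its "response" text.
    if "references" in data:
        state["references"] = data["references"]
    if "response" in data:
        state["response"] += data["response"]
    if "error" in data:
        state["errors"].append(data["error"])


state = {"references": None, "response": "", "errors": []}
stream = [
    '{"references": [{"id": 1}], "response": "Hel"}',
    '{"response": "lo"}',
]
for line in stream:
    parse_chunk(line, state)
```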
yangdx
bcf30a4c8a Add comprehensive reference testing for query endpoints
- Add reference format validation
- Test streaming response parsing
- Check reference consistency
- Support references enable/disable
- Add --references-only test mode
2025-09-25 16:56:09 +08:00
yangdx
5eb4a4b799 feat: simplify citations, add reference merging, and restructure API response format
2025-09-24 14:30:10 +08:00
yangdx
c0d5abba6b Fix linting
2025-09-15 02:59:21 +08:00
yangdx
b1c8206346 Add aquery_data endpoint for structured retrieval without LLM generation
- Add QueryDataResponse model
- Implement /query/data endpoint
- Add aquery_data method to LightRAG
- Return entities, relationships, chunks
2025-09-15 02:15:14 +08:00
yangdx
a69194c079 Merge branch 'main' into add-Memgraph-graph-db
2025-07-04 23:53:07 +08:00
yangdx
f15e67c82c Update comments
2025-06-29 21:53:05 +08:00
DavIvek
c0a3638d01 fix memgraph_impl.py according to test_graph_storage.py
2025-06-27 15:35:20 +02:00
Ken Chen
a3865caaea Implement get_nodes_by_chunk_ids and get_edges_by_chunk_ids,
2025-06-25 22:17:17 +08:00
yangdx
e9dcac7caf Update graph db test
2025-04-17 23:09:01 +08:00
yangdx
09cca6dbe6 Update graph db unit test
2025-04-17 22:58:49 +08:00
yangdx
54f720cb27 Fix linting
2025-04-16 14:55:54 +08:00
yangdx
d370c0ae12 Fix graph unit test edge direction problem
2025-04-16 14:33:25 +08:00
yangdx
2a950f3ff9 Fix linting
2025-04-16 14:07:22 +08:00
yangdx
e6b2a035ea Update graph unit test
2025-04-16 14:06:05 +08:00
yangdx
1de74c9228 Fix linting
2025-04-15 12:34:04 +08:00
yangdx
262c93d8da Add batch query unit test for graph storage
2025-04-13 01:07:39 +08:00
yangdx
394a6063ba Fix linting
2025-04-04 03:41:05 +08:00
yangdx
99cce237df Add graph storage unit test
2025-04-04 03:40:46 +08:00
Yannick Stephan
55cd900e8e clean comments and unused libs
2025-02-18 21:12:06 +01:00