Commit graph

5309 commits

Author SHA1 Message Date
BukeLy
6559dc4fed test: Add comprehensive workspace isolation test suite for PR #2366
Why this change is needed:
PR #2366 introduces critical workspace isolation functionality to resolve
multi-instance concurrency issues, but lacks comprehensive automated tests
to validate the implementation. Without proper test coverage, we cannot
ensure the feature works correctly across all scenarios mentioned in the PR.

What this test suite covers:
1. Pipeline Status Isolation: Verifies different workspaces maintain
   independent pipeline status without interference
2. Lock Mechanism: Validates the new keyed lock system works correctly
   - Different workspaces can acquire locks in parallel
   - Same workspace locks serialize properly
   - No deadlocks occur
3. Backward Compatibility: Ensures legacy code without workspace parameters
   continues to work using default workspace
4. Multi-Workspace Concurrency: Confirms multiple LightRAG instances with
   different workspaces can run concurrently without data interference

Testing approach:
- All tests are automated and deterministic
- Uses timing assertions to verify parallel vs serial lock behavior
- Validates data isolation through direct namespace data inspection
- Comprehensive error handling and detailed test output

Test results:
All 9 test cases passed successfully, confirming the workspace isolation
feature is working correctly across all key scenarios.

Impact:
Provides confidence that PR #2366's workspace isolation feature is
production-ready and won't introduce regressions.

(cherry picked from commit 4742fc8efa)
2025-12-04 19:11:11 +08:00
BukeLy
f1fa1cd340 test: Enhance E2E workspace isolation detection with content verification
Add specific content assertions to detect cross-contamination between workspaces.
Previously only checked that workspaces had different data, now verifies:

- Each workspace contains only its own text content
- Each workspace does NOT contain the other workspace's content
- Cross-contamination would be immediately detected

This ensures the test can find problems, not just pass.

Changes:
- Add assertions for "Artificial Intelligence" and "Machine Learning" in project_a
- Add assertions for "Deep Learning" and "Neural Networks" in project_b
- Add negative assertions to verify data leakage doesn't occur
- Add detailed output messages showing what was verified

Testing:
- pytest tests/test_workspace_isolation.py::test_lightrag_end_to_end_workspace_isolation
- Test passes with proper content isolation verified

(cherry picked from commit 3ec736932e)
2025-12-04 19:11:11 +08:00
BukeLy
f2771cc953 test: Add real integration and E2E tests for workspace isolation
Implemented two critical test scenarios:

Test 10 - JsonKVStorage Integration Test:
- Instantiate two JsonKVStorage instances with different workspaces
- Write different data to each instance (entity1, entity2)
- Read back and verify complete data isolation
- Verify workspace directories are created correctly
- Result: Data correctly isolated, no mixing between workspaces

Test 11 - LightRAG End-to-End Test:
- Instantiate two LightRAG instances with different workspaces
- Insert different documents to each instance
- Verify workspace directory structure (project_a/, project_b/)
- Verify file separation and data isolation
- Result: All 8 storage files created separately per workspace
- Document data correctly isolated between workspaces

Test Results: 23/23 passed
- 19 unit tests
- 2 integration tests (JsonKVStorage data + file structure)
- 2 E2E tests (LightRAG file structure + data isolation)

Coverage: 100% - Unit, Integration, and E2E validated
(cherry picked from commit 3e759f46d1)
2025-12-04 19:11:11 +08:00
BukeLy
00cf52b0bf test: Convert test_workspace_isolation.py to pytest style
Why this change is needed:
The test file was using a custom TestResults class for tracking test
execution and results, which is not standard practice for pytest-based
test suites. This makes the tests harder to integrate with CI/CD pipelines
and reduces compatibility with pytest plugins and tooling.

How it solves it:
- Removed custom TestResults class and manual result tracking
- Added @pytest.mark.asyncio decorator to all async test functions
- Converted all results.add() calls to standard pytest assert statements
- Added pytest fixture (setup_shared_data) for common test setup
- Removed custom main() runner (pytest handles test discovery/execution)
- Kept all test logic, assertions, and debugging print statements intact

Impact:
- All 11 test functions maintain identical behavior and coverage
- Tests now follow pytest conventions and integrate with pytest ecosystem
- Test output is cleaner and more informative with pytest's reporting
- Easier to run selective tests using pytest's filtering options

Testing:
Verified by running: uv run pytest tests/test_workspace_isolation.py -v
Result: All 11 tests passed in 2.41s

(cherry picked from commit 288498ccdc)
2025-12-04 19:11:11 +08:00
BukeLy
d5a67ea888 docs: Update test file docstring to reflect all 11 test scenarios
Previous docstring mentioned only 4 scenarios but the file actually contains
11 comprehensive test cases. Updated to list all scenarios:

1. Pipeline Status Isolation
2. Lock Mechanism (Parallel/Serial)
3. Backward Compatibility
4. Multi-Workspace Concurrency
5. NamespaceLock Re-entrance Protection
6. Different Namespace Lock Isolation
7. Error Handling
8. Update Flags Workspace Isolation
9. Empty Workspace Standardization
10. JsonKVStorage Workspace Isolation
11. LightRAG End-to-End Workspace Isolation

This makes the file header accurately describe its contents.

(cherry picked from commit 1a1837028a)
2025-12-04 19:11:11 +08:00
yangdx
9cf7476dd4 Improve docling integration with macOS compatibility and CLI flag
- Add --docling CLI flag for easier setup
- Add numpy version constraints
- Exclude docling on macOS (fork-safety)

(cherry picked from commit c246eff725)
2025-12-04 19:11:10 +08:00
yangdx
95d47566c1 Improve docling integration with macOS compatibility and CLI flag
- Add --docling CLI flag for easier setup
- Add numpy version constraints
- Exclude docling on macOS (fork-safety)

(cherry picked from commit a24d8181c2)
2025-12-04 19:11:10 +08:00
yangdx
033ee5c0f5 Refactor keyword_extraction from kwargs to explicit parameter
• Add keyword_extraction param to functions
• Remove kwargs.pop() calls
• Update function signatures
• Improve parameter documentation
• Make parameter handling consistent

(cherry picked from commit 2f16065256)
2025-12-04 19:11:09 +08:00
yangdx
35dd68d767 Refine gitignore to only exclude root-level test files
(cherry picked from commit a790f081dc)
2025-12-04 19:11:09 +08:00
anouarbm
8650307e65 feat(evaluation): Add sample documents for reproducible RAGAS testing
Add 5 markdown documents that users can index to reproduce evaluation results.

Changes:
- Add sample_documents/ folder with 5 markdown files covering LightRAG features
- Update sample_dataset.json with 3 improved, specific test questions
- Shorten and correct evaluation README (removed outdated info about mock responses)
- Add sample_documents reference with expected ~95% RAGAS score

Test Results with sample documents:
- Average RAGAS Score: 95.28%
- Faithfulness: 100%, Answer Relevance: 96.67%
- Context Recall: 88.89%, Context Precision: 95.56%

(cherry picked from commit a172cf893d)
2025-12-04 19:11:09 +08:00
yangdx
cc33728c10 Improve Langfuse integration and stream response cleanup handling
• Check env vars before enabling Langfuse
• Move imports after env check logic
• Handle wrapper client aclose() issues
• Add debug logs for cleanup failures

(cherry picked from commit 10f6e6955f)
2025-12-04 19:11:08 +08:00
anouarbm
ccdd3c2786 fixed ruff format of csv path
(cherry picked from commit b12b693a81)
2025-12-04 19:11:08 +08:00
anouarbm
949bfc4228 fix: Apply ruff formatting and rename test_dataset to sample_dataset
**Lint Fixes (ruff)**:
- Sort imports alphabetically (I001)
- Add blank line after import traceback (E302)
- Add trailing comma to dict literals (COM812)
- Reformat writer.writerow for readability (E501)

**Rename test_dataset.json → sample_dataset.json**:
- Avoids .gitignore pattern conflict (test_* is ignored)
- More descriptive name - it's a sample/template, not actual test data
- Updated all references in eval_rag_quality.py and README.md

Resolves lint-and-format CI check failure.
Addresses reviewer feedback about test dataset naming.

(cherry picked from commit 5cdb4b0ef2)
2025-12-04 19:11:08 +08:00
anouarbm
a934becfcc feat: add optional Langfuse observability integration
This contribution adds optional Langfuse support for LLM observability and tracing.
Langfuse provides a drop-in replacement for the OpenAI client that automatically
tracks all LLM interactions without requiring code changes.

Features:
- Optional Langfuse integration with graceful fallback
- Automatic LLM request/response tracing
- Token usage tracking
- Latency metrics
- Error tracking
- Zero code changes required for existing functionality

Implementation:
- Modified lightrag/llm/openai.py to conditionally use Langfuse's AsyncOpenAI
- Falls back to standard OpenAI client if Langfuse is not installed
- Logs observability status on import

Configuration:
To enable Langfuse tracing, install the observability extras and set environment variables:

```bash
pip install lightrag-hku[observability]

export LANGFUSE_PUBLIC_KEY="your_public_key"
export LANGFUSE_SECRET_KEY="your_secret_key"
export LANGFUSE_HOST="https://cloud.langfuse.com"  # or your self-hosted instance
```

If Langfuse is not installed or environment variables are not set, LightRAG
will use the standard OpenAI client without any functionality changes.

Changes:
- Modified lightrag/llm/openai.py (added optional Langfuse import)
- Updated pyproject.toml with optional 'observability' dependencies

Dependencies (optional):
- langfuse>=3.8.1

(cherry picked from commit 626b42bc40)
2025-12-04 19:11:08 +08:00
xiaojunxiang
355aa2593c fix(docs): correct typo "acivate" → "activate"
(cherry picked from commit 9e5004e24f)
2025-12-04 19:11:08 +08:00
yangdx
7404f76d8c Optimize chat performance by reducing animations in inactive tabs
• Add isTabActive prop to ChatMessage
• Disable spinner in inactive tabs
• Reduce opacity for inactive content
• Hide loading indicator when inactive
• Pass tab state from RetrievalTesting

(cherry picked from commit dab1c35834)
2025-12-04 19:11:08 +08:00
yangdx
e138c3a11e Add test script for aquery_data endpoint validation
(cherry picked from commit 91387628ff)
2025-12-04 19:11:07 +08:00
zl7261
cbdba2a744 web_ui: check node source and target
(cherry picked from commit 6a8de2edb2)
2025-12-04 19:11:07 +08:00
Raphaël MANSUY
ed73def994 fix: sync core modules with upstream for compatibility 2025-12-04 19:10:46 +08:00
yangdx
395b2d82de Add name to lint-and-format job in GitHub workflow
(cherry picked from commit 445adfc9cb)
2025-12-04 19:09:09 +08:00
yangdx
84cb2f792d Add testing workflow guidelines to basic development rules
* Define pytest marker patterns
* Document CI/CD test execution
* Specify offline vs integration tests
* Add test isolation best practices
* Reference testing guidelines doc

(cherry picked from commit a11912ffa5)
2025-12-04 19:09:09 +08:00
yangdx
64415ae187 Rename GitHub workflow from "Tests" to "Offline Unit Tests"
(cherry picked from commit 656025b75e)
2025-12-04 19:09:09 +08:00
yangdx
7386a21c0e Add reminder note to manual Docker build workflow
(cherry picked from commit e6332ce512)
2025-12-04 19:09:09 +08:00
yangdx
4e94f3a67c Improve Docker build workflow with automated multi-arch script and docs
(cherry picked from commit 0e0b4a94dc)
2025-12-04 19:09:08 +08:00
yangdx
7ce3680ca5 Add retry decorators to Neo4j read operations for resilience
(cherry picked from commit 7aaa51cda9)
2025-12-04 19:09:08 +08:00
yangdx
8bb23b9bfa Update qdrant-client minimum version from 1.7.0 to 1.11.0
• Bump qdrant-client to >=1.11.0
• Update pyproject.toml dependency
• Update requirements files
• Sync uv.lock with new version
• Maintain <2.0.0 upper bound

(cherry picked from commit e8f5f57ec7)
2025-12-04 19:09:08 +08:00
yangdx
00d51f9dba Fix dimension type comparison in Milvus vector field validation
• Convert dimensions to int for comparison
• Handle string vs int type mismatches

(cherry picked from commit 0fa9a2eee3)
2025-12-04 19:09:08 +08:00
yangdx
0594a5a049 Update pymilvus dependency from 2.5.2 to >=2.6.2
(cherry picked from commit baab992431)
2025-12-04 19:09:07 +08:00
yangdx
7525c7836c Update pymilvus to >=2.6.2 and add protobuf compatibility constraint
(cherry picked from commit 49197fbfc0)
2025-12-04 19:09:07 +08:00
yangdx
de011c99a4 Add CASCADE to AGE extension creation in PostgreSQL implementation
- Add CASCADE option to CREATE EXTENSION
- Ensure dependencies are installed
- Fix potential AGE setup issues

(cherry picked from commit d6019c82af)
2025-12-04 19:09:07 +08:00
yangdx
411290a013 Add mhchem extension support for chemistry formulas in ChatMessage
(cherry picked from commit aeaa0b32f9)
2025-12-04 19:09:07 +08:00
yangdx
bd93f13012 Refactor: Extract retry decorator to reduce code duplication in Neo4J storage
• Define READ_RETRY_EXCEPTIONS constant
• Create reusable READ_RETRY decorator
• Replace 11 duplicate retry decorators
• Improve code maintainability
• Add missing retry to edge_degrees_batch

(cherry picked from commit 8c4d7a00ad)
2025-12-04 19:09:07 +08:00
copilot-swe-agent[bot]
b28a701532 Improve edge case handling for max_tokens=1
Co-authored-by: netbrah <162479981+netbrah@users.noreply.github.com>
(cherry picked from commit 8835fc244a)
2025-12-04 19:09:07 +08:00
yangdx
26602f3e20 Update postgreSQL docker image link
(cherry picked from commit 1e415cff95)
2025-12-04 19:09:06 +08:00
yangdx
155cb2a1d2 Expand AGENTS.md with testing controls and automation guidelines
- Add pytest marker and CLI toggle docs
- Document automation workflow rules
- Clarify integration test setup
- Add agent-specific best practices
- Update testing command examples

(cherry picked from commit 5cc916861f)
2025-12-04 19:09:06 +08:00
wmsnp
ae5cd9262b fix: add logger to configure_vchordrq() and format code
(cherry picked from commit f4bf5d279c)
2025-12-04 19:09:06 +08:00
wmsnp
3954bb6579 feat(postgres_impl): add vchordrq vector index support and unify vector index creation logic
(cherry picked from commit d07023c962)
2025-12-04 19:09:06 +08:00
yangdx
1cbe0ba885 Reduce log level and improve workspace mismatch message clarity
• Change warning to info level
• Simplify workspace mismatch wording

(cherry picked from commit 6cef8df159)
2025-12-04 19:09:06 +08:00
yangdx
0ac858d3e2 fix(postgres): allow vchordrq.epsilon config when probes is empty
Previously, configure_vchordrq would fail silently when probes was empty
(the default), preventing epsilon from being configured. Now each parameter
is handled independently with conditional execution, and configuration
errors fail-fast instead of being swallowed.

This fixes the documented epsilon setting being impossible to use in the
default configuration.

(cherry picked from commit 3096f844fb)
2025-12-04 19:09:06 +08:00
yangdx
5bd1320a1d Refactor storage classes to use namespace instead of final_namespace
(cherry picked from commit fd486bc922)
2025-12-04 19:09:06 +08:00
yangdx
ed46d375fb Auto-initialize pipeline status in LightRAG.initialize_storages()
• Remove manual initialize_pipeline_status calls
• Auto-init in initialize_storages method
• Update error messages for clarity
• Warn on workspace conflicts

(cherry picked from commit e22ac52ebc)
2025-12-04 19:09:05 +08:00
yangdx
961c87a6e5 Standardize empty workspace handling from "_" to "" across storage
* Unify empty workspace behavior by changing workspace from "_" to ""
* Fixed incorrect empty workspace detection in get_all_update_flags_status()

(cherry picked from commit d54d0d55d9)
2025-12-04 19:09:05 +08:00
yangdx
6b0c0ef815 Refactor namespace lock to support reusable async context manager
• Add NamespaceLock class wrapper
• Fix lock re-entrance issues
• Enable concurrent lock usage
• Fresh context per async with block
• Update get_namespace_lock API

(cherry picked from commit 7deb9a64b9)
2025-12-04 19:09:05 +08:00
yangdx
708f80f43d Add _default_workspace to shared storage finalization
- Add _default_workspace to global vars
- Set _default_workspace to None on cleanup
- Ensure complete resource cleanup
- Fix missing workspace finalization

(cherry picked from commit 6d6716e9f8)
2025-12-04 19:09:05 +08:00
BukeLy
c52c1aea69 test: Enhance workspace isolation test suite to 100% coverage
Why this enhancement is needed:
The initial test suite covered the 4 core scenarios from PR #2366, but
lacked comprehensive coverage of edge cases and implementation details.
This update adds 5 additional test scenarios to achieve complete validation
of the workspace isolation feature.

What was added:
Test 5 - NamespaceLock Re-entrance Protection (2 sub-tests):
  - Verifies re-entrance in same coroutine raises RuntimeError
  - Confirms same NamespaceLock instance works in concurrent coroutines

Test 6 - Different Namespace Lock Isolation:
  - Validates locks with same workspace but different namespaces are independent

Test 7 - Error Handling (2 sub-tests):
  - Tests None workspace conversion to empty string
  - Validates empty workspace creates correct namespace format

Test 8 - Update Flags Workspace Isolation (3 sub-tests):
  - set_all_update_flags isolation between workspaces
  - clear_all_update_flags isolation between workspaces
  - get_all_update_flags_status workspace filtering

Test 9 - Empty Workspace Standardization (2 sub-tests):
  - Empty workspace namespace format verification
  - Empty vs non-empty workspace independence

Test Results:
All 19 test cases passed (previously 9/9, now 19/19)
- 4 core PR requirements: 100% coverage
- 5 additional scenarios: 100% coverage
- Total coverage: 100% of workspace isolation implementation

Testing approach improvements:
- Proper initialization of update flags using get_update_flag()
- Correct handling of flag objects (.value property)
- Updated error handling tests to match actual implementation behavior
- All edge cases and boundary conditions validated

Impact:
Provides complete confidence in the workspace isolation feature with
comprehensive test coverage of all implementation details, edge cases,
and error handling paths.

(cherry picked from commit 436e41439e)
2025-12-04 19:09:05 +08:00
yangdx
67007ed9a6 Improve LightRAG initialization checker tool with better usage docs
• Add workspace param to get_namespace_data
• Update docstring with proper usage example
• Simplify demo to show correct workflow
• Remove confusing before/after comparison
• Clarify tool should run after init

(cherry picked from commit 393f880311)
2025-12-04 19:09:05 +08:00
yangdx
dcf88a8273 Refactor exception handling in MemgraphStorage label methods
(cherry picked from commit 4401f86f07)
2025-12-04 19:09:04 +08:00
yangdx
ed79218550 Optimize JSON write with fast/slow path to reduce memory usage
- Fast path for clean data (no sanitization)
- Slow path sanitizes during encoding
- Reload shared memory after sanitization
- Custom encoder avoids deep copies
- Comprehensive test coverage

(cherry picked from commit 777c987371)
2025-12-04 19:09:04 +08:00
yangdx
7632805cd0 Add concurrency warning for JsonKVStorage in cleanup tool
(cherry picked from commit 913fa1e415)
2025-12-04 19:09:04 +08:00
yangdx
db508954d1 Add uv package manager support to installation docs
(cherry picked from commit 7bc6ccea19)
2025-12-04 19:09:04 +08:00