Commit graph

5358 commits

Author SHA1 Message Date
yangdx
17a9771cfb Add chunk tracking support to entity merge functionality
- Pass chunk storages to merge function
- Merge relation chunk tracking data
- Merge entity chunk tracking data
- Delete old chunk tracking records
- Persist chunk storage updates

(cherry picked from commit 2c09adb8d3)
2025-12-04 19:11:19 +08:00
yangdx
450f969430 Add chunk tracking cleanup to entity/relation deletion and creation
• Clean up chunk storage on delete
• Track chunks in create operations
• Normalize relation keys consistently

(cherry picked from commit a3370b024d)
2025-12-04 19:11:19 +08:00
yangdx
7e0f12c28e Enhance entity/relation editing with chunk tracking synchronization
• Add chunk storage sync to edit ops
• Implement incremental chunk ID updates
• Support entity renaming migrations
• Normalize relation keys consistently
• Preserve chunk references on edits

(cherry picked from commit 3fbd704bf9)
2025-12-04 19:11:19 +08:00
yangdx
488f67e5b2 Fix entity and relation chunk cleanup in deletion pipeline
• Delete from entity_chunks storage
• Delete from relation_chunks storage

(cherry picked from commit 29bf593663)
2025-12-04 19:11:19 +08:00
yangdx
cb5451faf8 Add entity/relation chunk tracking with configurable source ID limits
- Add entity_chunks & relation_chunks storage
- Implement KEEP/FIFO limit strategies
- Update env.example with new settings
- Add migration for chunk tracking data
- Support all KV storage

(cherry picked from commit dc62c78f98)
2025-12-04 19:11:19 +08:00
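The KEEP/FIFO limit strategies named in this commit can be illustrated roughly as follows. This is a minimal sketch under an assumed reading of the two names: KEEP retains the oldest chunk IDs and ignores newer ones once the limit is reached, while FIFO evicts the oldest IDs to make room for the newest. The function name `apply_source_id_limit` and its parameters are hypothetical and not taken from the repository.

```python
def apply_source_id_limit(source_ids: list, limit: int, method: str = "KEEP") -> list:
    """Trim an ordered list of chunk IDs (oldest first) to at most `limit` items.

    Hypothetical helper; the strategy semantics are an assumption:
      KEEP -> keep the oldest IDs, drop anything added after the limit is hit
      FIFO -> keep the newest IDs, evict the oldest ones first
    """
    if limit < 1:
        raise ValueError("limit must be at least 1")
    if len(source_ids) <= limit:
        return list(source_ids)
    if method == "KEEP":
        return source_ids[:limit]
    if method == "FIFO":
        return source_ids[-limit:]
    raise ValueError(f"unknown limit method: {method!r}")


chunks = ["chunk-1", "chunk-2", "chunk-3", "chunk-4"]
print(apply_source_id_limit(chunks, 3, "KEEP"))
print(apply_source_id_limit(chunks, 3, "FIFO"))
```

Under this reading, KEEP never rewrites an entry that is already full, whereas FIFO keeps references fresh at the cost of rewriting the list on each overflow.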
yangdx
7248e09fc4 Allow missing related chunks in knowledge graph queries
(cherry picked from commit 35cd567c9e)
2025-12-04 19:11:18 +08:00
yangdx
851b45f726 Add pipeline status lock function for legacy compatibility
- Add get_pipeline_status_lock function
- Return NamespaceLock for consistency
- Support workspace parameter
- Enable logging option
- Legacy code compatibility

(cherry picked from commit 93d445dfdd)
2025-12-04 19:11:18 +08:00
yangdx
402d2f9a98 Fix namespace parsing when workspace contains colons
• Use rsplit instead of split
• Handle colons in workspace names

(cherry picked from commit f8dd2e0724)
2025-12-04 19:11:18 +08:00
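The difference between `split` and `rsplit` that motivates this fix can be seen in a small sketch. It assumes the combined key has the form `<workspace>:<namespace>` and that the namespace part itself never contains a colon; the helper name `split_namespace` is hypothetical.

```python
def split_namespace(final_namespace: str) -> tuple:
    """Split "<workspace>:<namespace>" into (workspace, namespace).

    Hypothetical helper. Splitting from the right keeps any colons inside
    the workspace name intact; a bare namespace yields an empty workspace.
    """
    if ":" not in final_namespace:
        return "", final_namespace
    workspace, namespace = final_namespace.rsplit(":", 1)
    return workspace, namespace


key = "tenant:eu:pipeline_status"
print(key.split(":", 1))      # left split cuts the workspace name in half
print(split_namespace(key))   # right split keeps "tenant:eu" together
```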
yangdx
6ba35f81df Fix: auto-acquire pipeline when idle in document deletion
• Track if we acquired the pipeline lock
• Auto-acquire pipeline when idle
• Only release if we acquired it
• Prevent concurrent deletion conflicts
• Improve deletion job validation

(cherry picked from commit 4048fc4b89)
2025-12-04 19:11:18 +08:00
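The "only release if we acquired it" rule above can be sketched as follows. This is an illustration, not the repository's code: the function `delete_document`, the `busy` and `job_name` keys, and the rule that a busy pipeline is acceptable only when its job name starts with "deleting" are all assumptions made for the example.

```python
import asyncio


async def delete_document(doc_id, pipeline_status, status_lock, store):
    """Delete one document, taking the pipeline only if nobody else holds it."""
    we_acquired = False
    async with status_lock:
        if not pipeline_status.get("busy"):
            # Pipeline is idle: claim it for this deletion.
            pipeline_status.update(busy=True, job_name="deleting document")
            we_acquired = True
        elif not pipeline_status.get("job_name", "").startswith("deleting"):
            # Busy with something that is not a deletion job: refuse.
            raise RuntimeError("pipeline is busy with another job")
    try:
        store.pop(doc_id, None)
    finally:
        # Release only what this call claimed; a caller that marked the
        # pipeline busy beforehand stays responsible for clearing it.
        if we_acquired:
            async with status_lock:
                pipeline_status["busy"] = False


async def main():
    status, store = {"busy": False}, {"doc-1": "text"}
    await delete_document("doc-1", status, asyncio.Lock(), store)
    return status, store


print(asyncio.run(main()))
```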
yangdx
7e7c86601e Improve workspace isolation tests with better parallelism checks and cleanup
• Add finalize_share_data cleanup
• Refactor lock timing measurement
• Add timeline overlap validation
• Include purpose/scope documentation
• Fix tokenizer integration

(cherry picked from commit 21ad990e36)
2025-12-04 19:11:18 +08:00
yangdx
5febb88824 Fix missing workspace parameter in update flags status call
(cherry picked from commit 1745b30a5f)
2025-12-04 19:11:18 +08:00
yangdx
dc4c10c346 Fix NamespaceLock context variable timing to prevent lock bricking
* Acquire lock before setting ContextVar
* Prevent state corruption on cancellation
* Fix permanent lock brick scenario
* Store context only after success
* Handle acquisition failure properly

(cherry picked from commit e8383df3b8)
2025-12-04 19:11:17 +08:00
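The ordering described in this commit (acquire first, record state only after success) can be sketched with a small wrapper. `ScopedLock` is a hypothetical stand-in, not the project's `NamespaceLock`, and it assumes the wrapper tracks the held lock in a `ContextVar`.

```python
import asyncio
from contextvars import ContextVar
from typing import Optional


class ScopedLock:
    """Hypothetical reusable async lock wrapper with a re-entrance check."""

    def __init__(self):
        self._lock = asyncio.Lock()
        self._held: ContextVar[Optional[asyncio.Lock]] = ContextVar("held", default=None)

    async def __aenter__(self):
        if self._held.get() is not None:
            raise RuntimeError("lock already held in this context")
        # Acquire first. If this await is cancelled or fails, nothing has
        # been recorded yet, so a later attempt starts from a clean state.
        await self._lock.acquire()
        # Record the lock only after the acquisition succeeded.
        self._held.set(self._lock)
        return self

    async def __aexit__(self, exc_type, exc, tb):
        lock = self._held.get()
        self._held.set(None)
        if lock is not None:
            lock.release()


async def main():
    lock = ScopedLock()
    async with lock:
        pass
    async with lock:  # reusable because state was cleared on exit
        return "reacquired"


print(asyncio.run(main()))
```

If the `ContextVar` were set before the `await`, a cancellation during acquisition would leave the marker set without the lock being held, and every later attempt in that context would trip the re-entrance check.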
yangdx
87561f8b28 Remove manual initialize_pipeline_status() calls across codebase
- Auto-init pipeline status in storages
- Remove redundant import statements
- Simplify initialization pattern
- Update docs and examples

(cherry picked from commit cdd53ee875)
2025-12-04 19:11:17 +08:00
yangdx
1e7bd654d8 Fix NamespaceLock concurrent coroutine safety with ContextVar
- Use ContextVar for per-coroutine storage
- Prevent state interference between coroutines
- Add re-entrance protection check

(cherry picked from commit b6a5a90eaf)
2025-12-04 19:11:17 +08:00
yangdx
f6a45245bd Add pipeline status validation before document deletion
(cherry picked from commit 9d7b7981ce)
2025-12-04 19:11:17 +08:00
yangdx
94ae13a037 Refactor workspace handling to use default workspace and namespace locks
- Remove DB-specific workspace configs
- Add default workspace auto-setting
- Replace global locks with namespace locks
- Simplify pipeline status management
- Remove redundant graph DB locking

(cherry picked from commit 926960e957)
2025-12-04 19:11:17 +08:00
yangdx
c01cfc3649 Fix workspace filtering logic in get_all_update_flags_status
• Handle namespaces with/without prefixes
• Fix workspace matching logic

(cherry picked from commit 7ed0eac4c9)
2025-12-04 19:11:16 +08:00
yangdx
50f8ddd933 Fix pipeline status namespace check to handle root case
- Add check for bare "pipeline_status"
- Handle namespace without prefix

(cherry picked from commit 78689e8837)
2025-12-04 19:11:16 +08:00
yangdx
dfab175c16 Fix workspace isolation for pipeline status across all operations
- Fix final_namespace error in get_namespace_data()
- Fix get_workspace_from_request return type
- Add workspace param to pipeline status calls

(cherry picked from commit 52c812b9a0)
2025-12-04 19:11:16 +08:00
BukeLy
fe1576943f fix: Add default workspace support for backward compatibility
Fixes two compatibility issues in workspace isolation:

1. Problem: lightrag_server.py calls initialize_pipeline_status()
   without a workspace parameter, causing the pipeline to initialize in
   the global namespace instead of the rag instance's workspace.

   Solution: Add set_default_workspace() mechanism in shared_storage.
   LightRAG.initialize_storages() now sets default workspace, which
   initialize_pipeline_status() uses when called without parameters.

2. Problem: The /health endpoint is hardcoded to use "pipeline_status", so
   it cannot return workspace-specific status or support frontend
   workspace selection.

   Solution: Add LIGHTRAG-WORKSPACE header support. Endpoint now
   extracts workspace from header or falls back to server default,
   returning correct workspace-specific pipeline status.

Changes:
- lightrag/kg/shared_storage.py: Add set/get_default_workspace()
- lightrag/lightrag.py: Call set_default_workspace() in initialize_storages()
- lightrag/api/lightrag_server.py: Add get_workspace_from_request() helper,
  update /health endpoint to support LIGHTRAG-WORKSPACE header

Testing:
- Backward compatibility: Old code works without modification
- Multi-instance safety: Explicit workspace passing preserved
- /health endpoint: Supports both default and header-specified workspaces

Related: #2353
(cherry picked from commit 18a4870229)
2025-12-04 19:11:16 +08:00
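The fallback order described in this commit (header value first, then the default workspace) can be sketched without a web framework. The names `set_default_workspace()` and `get_default_workspace()` and the `LIGHTRAG-WORKSPACE` header come from the commit message; their bodies here, and the helper `workspace_from_headers`, are assumptions for illustration only.

```python
from typing import Mapping, Optional

_default_workspace: Optional[str] = None


def set_default_workspace(workspace: str) -> None:
    """Remember the workspace to use when a caller passes none."""
    global _default_workspace
    _default_workspace = workspace


def get_default_workspace() -> str:
    """Return the stored default, or an empty string if none was set."""
    return _default_workspace or ""


def workspace_from_headers(headers: Mapping[str, str]) -> str:
    """Hypothetical helper: prefer the request header, else the default.

    A plain dict lookup is case-sensitive; real HTTP frameworks usually
    expose a case-insensitive header mapping.
    """
    value = headers.get("LIGHTRAG-WORKSPACE", "").strip()
    return value or get_default_workspace()


set_default_workspace("tenant_a")
print(workspace_from_headers({}))
print(workspace_from_headers({"LIGHTRAG-WORKSPACE": "tenant_b"}))
```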
BukeLy
f7b500bca2 feat: Add workspace isolation support for pipeline status
Problem:
In multi-tenant scenarios, different workspaces share a single global
pipeline_status namespace, causing pipelines from different tenants to
block each other, severely impacting concurrent processing performance.

Solution:
- Extended get_namespace_data() to recognize workspace-specific pipeline
  namespaces with pattern "{workspace}:pipeline" (following GraphDB pattern)
- Added workspace parameter to initialize_pipeline_status() for per-tenant
  isolated pipeline namespaces
- Updated all 7 call sites to use workspace-aware locks:
  * lightrag.py: process_document_queue(), aremove_document()
  * document_routes.py: background_delete_documents(), clear_documents(),
    cancel_pipeline(), get_pipeline_status(), delete_documents()

Impact:
- Different workspaces can process documents concurrently without blocking
- Backward compatible: empty workspace defaults to "pipeline_status"
- Maintains fail-fast: uninitialized pipeline raises clear error
- Expected N× performance improvement for N concurrent tenants

Bug fixes:
- Fixed AttributeError by using self.workspace instead of self.global_config
- Fixed pipeline status endpoint to show workspace-specific status
- Fixed delete endpoint to check workspace-specific busy flag

Code changes: 4 files, 141 insertions(+), 28 deletions(-)

Testing: All syntax checks passed, comprehensive workspace isolation tests completed
(cherry picked from commit eb52ec94d7)
2025-12-04 19:11:16 +08:00
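The namespace scheme described above can be sketched as a small in-memory registry. Only the `"{workspace}:pipeline"` pattern, the `"pipeline_status"` fallback for an empty workspace, and the fail-fast behaviour are taken from the commit message; the function names and the registry itself are hypothetical simplifications.

```python
_namespaces = {}  # hypothetical in-memory registry: namespace -> status dict


def pipeline_namespace(workspace: str = "") -> str:
    """Empty workspace keeps the legacy global name for backward compatibility."""
    return f"{workspace}:pipeline" if workspace else "pipeline_status"


def init_pipeline_status(workspace: str = "") -> dict:
    """Create the status record for a workspace if it does not exist yet."""
    return _namespaces.setdefault(pipeline_namespace(workspace), {"busy": False})


def get_pipeline_status(workspace: str = "") -> dict:
    """Fail fast with a clear error when the namespace was never initialized."""
    namespace = pipeline_namespace(workspace)
    if namespace not in _namespaces:
        raise RuntimeError(f"pipeline namespace {namespace!r} is not initialized")
    return _namespaces[namespace]


init_pipeline_status("tenant_a")
init_pipeline_status()  # legacy caller without a workspace
get_pipeline_status("tenant_a")["busy"] = True
print(get_pipeline_status("tenant_a"), get_pipeline_status())
```

Because each workspace maps to its own record, marking one tenant busy leaves every other tenant's pipeline free to start.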
yangdx
4cc6388742 Add auto-refresh of popular labels when pipeline completes
• Monitor pipeline busy->idle transitions
• Reload labels on dropdown open if needed
• Add onBeforeOpen callback to AsyncSelect
• Clear refresh flags after processing
• Improve label sync with backend state

(cherry picked from commit 58c83f9da5)
2025-12-04 19:11:15 +08:00
yangdx
a7330f0b95 Remove redundant await call in file extraction pipeline
(cherry picked from commit c36afecba4)
2025-12-04 19:11:15 +08:00
yangdx
537db072e0 Add Qdrant legacy collection migration with workspace support
- Add QdrantMigrationError exception
- Implement automatic data migration
- Support workspace-based partitioning
- Add migration verification logic
- Update collection naming scheme

(cherry picked from commit 5f4a280458)
2025-12-04 19:11:15 +08:00
yangdx
46c13e23f0 Add confirmation dialog for pipeline cancellation
(cherry picked from commit 81e3496aa4)
2025-12-04 19:11:15 +08:00
yangdx
74d0a22020 Add pipeline cancellation feature with UI and i18n support
- Add cancelPipeline API endpoint
- Add cancel button to status dialog
- Update status response type
- Add cancellation UI translations
- Handle cancellation request states

(cherry picked from commit f89b5ab101)
2025-12-04 19:11:15 +08:00
yangdx
687d2b6b13 Improve error handling and add cancellation checks in pipeline
(cherry picked from commit 77336e50b6)
2025-12-04 19:11:15 +08:00
yangdx
a471f1ca0e Add pipeline cancellation feature for graceful processing termination
• Add cancel_pipeline API endpoint
• Implement PipelineCancelledException
• Add cancellation checks in main loop
• Handle task cancellation gracefully
• Mark cancelled docs as FAILED

(cherry picked from commit 743aefc655)
2025-12-04 19:11:15 +08:00
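A cooperative cancellation loop of the kind listed above might look like the sketch below. Only the exception name `PipelineCancelledException` and the FAILED status come from the commit message; the `cancellation_requested` flag, the `run_pipeline` function, and the `process` callback are hypothetical.

```python
import asyncio


class PipelineCancelledException(Exception):
    """Raised inside the loop when a cancellation request is observed."""


async def run_pipeline(doc_ids, pipeline_status, process):
    statuses = {doc_id: "PENDING" for doc_id in doc_ids}
    try:
        for doc_id in doc_ids:
            # Check the shared flag before starting each document.
            if pipeline_status.get("cancellation_requested"):
                raise PipelineCancelledException("cancelled by user")
            await process(doc_id)
            statuses[doc_id] = "PROCESSED"
    except PipelineCancelledException:
        # Documents that never ran are marked FAILED rather than left pending.
        for doc_id, state in statuses.items():
            if state == "PENDING":
                statuses[doc_id] = "FAILED"
    return statuses


async def demo():
    status = {"cancellation_requested": False}

    async def process(doc_id):
        await asyncio.sleep(0)
        if doc_id == "doc-2":  # simulate a cancel request arriving mid-run
            status["cancellation_requested"] = True

    return await run_pipeline(["doc-1", "doc-2", "doc-3"], status, process)


print(asyncio.run(demo()))
```

Checking a flag between documents lets in-flight work finish cleanly, which is what makes the termination graceful compared with cancelling the task outright.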
yangdx
37d48bafb6 Simplify skip logging and reduce pipeline status updates
(cherry picked from commit a5253244f9)
2025-12-04 19:11:14 +08:00
yangdx
d56b4c856e Fix trailing whitespace and update test mocking for rerank module
• Remove trailing whitespace
• Fix TiktokenTokenizer import patch
• Add async context manager mocks
• Update aiohttp.ClientSession patch
• Improve test reliability

(cherry picked from commit 561ba4e4b5)
2025-12-04 19:11:14 +08:00
yangdx
f6c20faa16 Configure Dependabot schedule with specific times and timezone
- Set Monday 2AM for GitHub Actions
- Set Wednesday 2AM for Python deps
- Set Friday 2AM for web UI deps
- Use Asia/Shanghai timezone
- Spread updates across weekdays

(cherry picked from commit 6476021619)
2025-12-04 19:11:14 +08:00
yangdx
a32d201f17 Refactor dependencies and add test extra in pyproject.toml
• Pin httpx version in api extra
• Extract test dependencies to new extra
• Move httpx pin from evaluation to api
• Add api dependency to evaluation extra
• Separate test from evaluation concerns

(cherry picked from commit 268e4ff6f1)
2025-12-04 19:11:14 +08:00
yangdx
ea421295a6 Drop Python 3.10 and 3.11 from CI test matrix
(cherry picked from commit 1f8751225d)
2025-12-04 19:11:14 +08:00
yangdx
9068c629c6 Configure comprehensive Dependabot for Python and frontend dependencies
- Add pip ecosystem with grouping
- Add bun ecosystem for webui
- Set weekly update schedule
- Configure cooldown periods
- Ignore numpy breaking changes

(cherry picked from commit 0f19f80fdb)
2025-12-04 19:11:13 +08:00
dependabot[bot]
a5ca3b13cc Bump the github-actions group with 7 updates
Bumps the github-actions group with 7 updates:

| Package | From | To |
| --- | --- | --- |
| [actions/checkout](https://github.com/actions/checkout) | `2` | `6` |
| [actions/setup-python](https://github.com/actions/setup-python) | `2` | `6` |
| [docker/build-push-action](https://github.com/docker/build-push-action) | `5` | `6` |
| [oven-sh/setup-bun](https://github.com/oven-sh/setup-bun) | `1` | `2` |
| [actions/upload-artifact](https://github.com/actions/upload-artifact) | `4` | `5` |
| [actions/download-artifact](https://github.com/actions/download-artifact) | `4` | `6` |
| [actions/stale](https://github.com/actions/stale) | `9` | `10` |

Updates `actions/checkout` from 2 to 6
- [Release notes](https://github.com/actions/checkout/releases)
- [Changelog](https://github.com/actions/checkout/blob/main/CHANGELOG.md)
- [Commits](https://github.com/actions/checkout/compare/v2...v6)

Updates `actions/setup-python` from 2 to 6
- [Release notes](https://github.com/actions/setup-python/releases)
- [Commits](https://github.com/actions/setup-python/compare/v2...v6)

Updates `docker/build-push-action` from 5 to 6
- [Release notes](https://github.com/docker/build-push-action/releases)
- [Commits](https://github.com/docker/build-push-action/compare/v5...v6)

Updates `oven-sh/setup-bun` from 1 to 2
- [Release notes](https://github.com/oven-sh/setup-bun/releases)
- [Commits](https://github.com/oven-sh/setup-bun/compare/v1...v2)

Updates `actions/upload-artifact` from 4 to 5
- [Release notes](https://github.com/actions/upload-artifact/releases)
- [Commits](https://github.com/actions/upload-artifact/compare/v4...v5)

Updates `actions/download-artifact` from 4 to 6
- [Release notes](https://github.com/actions/download-artifact/releases)
- [Commits](https://github.com/actions/download-artifact/compare/v4...v6)

Updates `actions/stale` from 9 to 10
- [Release notes](https://github.com/actions/stale/releases)
- [Changelog](https://github.com/actions/stale/blob/main/CHANGELOG.md)
- [Commits](https://github.com/actions/stale/compare/v9...v10)

---
updated-dependencies:
- dependency-name: actions/checkout
  dependency-version: '6'
  dependency-type: direct:production
  update-type: version-update:semver-major
  dependency-group: github-actions
- dependency-name: actions/setup-python
  dependency-version: '6'
  dependency-type: direct:production
  update-type: version-update:semver-major
  dependency-group: github-actions
- dependency-name: docker/build-push-action
  dependency-version: '6'
  dependency-type: direct:production
  update-type: version-update:semver-major
  dependency-group: github-actions
- dependency-name: oven-sh/setup-bun
  dependency-version: '2'
  dependency-type: direct:production
  update-type: version-update:semver-major
  dependency-group: github-actions
- dependency-name: actions/upload-artifact
  dependency-version: '5'
  dependency-type: direct:production
  update-type: version-update:semver-major
  dependency-group: github-actions
- dependency-name: actions/download-artifact
  dependency-version: '6'
  dependency-type: direct:production
  update-type: version-update:semver-major
  dependency-group: github-actions
- dependency-name: actions/stale
  dependency-version: '10'
  dependency-type: direct:production
  update-type: version-update:semver-major
  dependency-group: github-actions
...

Signed-off-by: dependabot[bot] <support@github.com>
(cherry picked from commit 88357675ea)
2025-12-04 19:11:13 +08:00
Christian Clauss
ae84bd8ff5 Keep GitHub Actions up to date with GitHub's Dependabot
* [Keeping your software supply chain secure with Dependabot](https://docs.github.com/en/code-security/dependabot)
* [Keeping your actions up to date with Dependabot](https://docs.github.com/en/code-security/dependabot/working-with-dependabot/keeping-your-actions-up-to-date-with-dependabot)
* [Configuration options for the `dependabot.yml` file - package-ecosystem](https://docs.github.com/en/code-security/dependabot/dependabot-version-updates/configuration-options-for-the-dependabot.yml-file#package-ecosystem)

To see all GitHub Actions dependencies, type:
% `git grep 'uses: ' .github/workflows/`

(cherry picked from commit 90e38c20ca)
2025-12-04 19:11:13 +08:00
yangdx
97a2c8e8b9 Add ruff as dependency to pytest and evaluation extras
(cherry picked from commit 5f91063c7a)
2025-12-04 19:11:13 +08:00
yangdx
322ff19f72 Remove ascii_colors dependency and fix stream handling errors
• Remove ascii_colors.trace_exception calls
• Add SafeStreamHandler for closed streams
• Patch ascii_colors console handler
• Prevent ValueError on stream close
• Improve logging error handling

(cherry picked from commit 0fb2925c6a)
2025-12-04 19:11:13 +08:00
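One way a handler can tolerate a closed stream is sketched below. The class name `SafeStreamHandler` comes from the commit message, but this body is an assumption about the approach, built only on the standard `logging.StreamHandler` API.

```python
import io
import logging


class SafeStreamHandler(logging.StreamHandler):
    """StreamHandler that silently skips output once its stream is closed."""

    def emit(self, record):
        stream = self.stream
        if stream is None or getattr(stream, "closed", False):
            return  # nothing to write to; drop the record quietly
        super().emit(record)

    def flush(self):
        try:
            super().flush()
        except ValueError:
            # Flushing a closed file raises "I/O operation on closed file".
            pass


buffer = io.StringIO()
handler = SafeStreamHandler(buffer)
logger = logging.getLogger("safe_stream_demo")
logger.propagate = False
logger.setLevel(logging.INFO)
logger.addHandler(handler)

logger.info("written")
captured = buffer.getvalue()
buffer.close()
logger.info("dropped")  # no exception and no traceback on stderr
handler.flush()
print(repr(captured))
```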
yangdx
fd76e0f7ce Enhance workspace isolation test with distinct mock data and persistence
• Use different mock LLM per workspace
• Add persistent test directory
• Create workspace-specific responses
• Skip cleanup for inspection

(cherry picked from commit 99262adaaa)
2025-12-04 19:11:13 +08:00
yangdx
4da291468d Rename test classes to prevent warning from pytest
• TestResult → ExecutionResult
• TestStats → ExecutionStats
• Update class docstrings
• Update type hints
• Update variable references

(cherry picked from commit 7e9c8ed1e8)
2025-12-04 19:11:12 +08:00
yangdx
60520e0188 test: add concurrent execution to workspace isolation test
• Add async sleep to mock functions
• Test concurrent ainsert operations
• Use asyncio.gather for parallel exec
• Measure concurrent execution time

(cherry picked from commit 6ae0c14438)
2025-12-04 19:11:12 +08:00
yangdx
9cf1629117 Add pre-commit to pytest dependencies and format test code
• Add pre-commit to pytest extra deps
• Update lock file dependencies

(cherry picked from commit 5da82bb096)
2025-12-04 19:11:12 +08:00
yangdx
668b842862 Standardize test directory creation and remove tempfile dependency
• Remove unused tempfile import
• Use consistent project temp/ structure
• Clean up existing directories first
• Create directories with os.makedirs
• Use descriptive test directory names

(cherry picked from commit 4fef731f37)
2025-12-04 19:11:12 +08:00
yangdx
660ccc7ada Add GitHub CI workflow and test markers for offline/integration tests
- Add GitHub Actions workflow for CI
- Mark integration tests requiring services
- Add offline test markers for isolated tests
- Skip integration tests by default
- Configure pytest markers and collection

(cherry picked from commit 4ea2124001)
2025-12-04 19:11:12 +08:00
yangdx
a6fc87d50e Replace pytest group reference with explicit dependencies in evaluation
• Remove pytest group dependency
• Add explicit pytest>=8.4.2
• Add pytest-asyncio>=1.2.0
• Add pre-commit directly
• Fix potential circular dependency

(cherry picked from commit 472b498ade)
2025-12-04 19:11:12 +08:00
yangdx
d790a660cd Fix test to use default workspace parameter behavior
(cherry picked from commit 41bf6d0283)
2025-12-04 19:11:12 +08:00
yangdx
d011a1c0e7 Refactor test configuration to use pytest fixtures and CLI options
• Add pytest command-line options
• Create session-scoped fixtures
• Remove hardcoded environment vars
• Update test function signatures
• Improve configuration priority

(cherry picked from commit 1fe05df211)
2025-12-04 19:11:12 +08:00
yangdx
97cf689dfb Remove unused variables from workspace isolation test
* Remove initial_ok check
* Remove both_set verification

(cherry picked from commit cf73cb4d24)
2025-12-04 19:11:11 +08:00
yangdx
a5b3be1f5a Refactor pytest dependencies into separate optional group
- Extract pytest deps to own group
- Reference pytest group in evaluation
- Add pytest config to pyproject.toml
- Update uv.lock with new structure

(cherry picked from commit b7b8d15632)
2025-12-04 19:11:11 +08:00
BukeLy
6559dc4fed test: Add comprehensive workspace isolation test suite for PR #2366
Why this change is needed:
PR #2366 introduces critical workspace isolation functionality to resolve
multi-instance concurrency issues, but lacks comprehensive automated tests
to validate the implementation. Without proper test coverage, we cannot
ensure the feature works correctly across all scenarios mentioned in the PR.

What this test suite covers:
1. Pipeline Status Isolation: Verifies different workspaces maintain
   independent pipeline status without interference
2. Lock Mechanism: Validates the new keyed lock system works correctly
   - Different workspaces can acquire locks in parallel
   - Same workspace locks serialize properly
   - No deadlocks occur
3. Backward Compatibility: Ensures legacy code without workspace parameters
   continues to work using default workspace
4. Multi-Workspace Concurrency: Confirms multiple LightRAG instances with
   different workspaces can run concurrently without data interference

Testing approach:
- All tests are automated and deterministic
- Uses timing assertions to verify parallel vs serial lock behavior
- Validates data isolation through direct namespace data inspection
- Comprehensive error handling and detailed test output

Test results:
All 9 test cases passed successfully, confirming the workspace isolation
feature is working correctly across all key scenarios.

Impact:
Provides confidence that PR #2366's workspace isolation feature is
production-ready and won't introduce regressions.

(cherry picked from commit 4742fc8efa)
2025-12-04 19:11:11 +08:00
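The lock behaviour this suite checks (parallel across workspaces, serialized within one) can be illustrated with a small keyed-lock sketch that compares recorded time spans. `KeyedLocks`, `hold`, and `overlaps` are hypothetical names; the actual test suite and lock implementation may differ.

```python
import asyncio
import time


class KeyedLocks:
    """Hypothetical registry handing out one asyncio.Lock per key."""

    def __init__(self):
        self._locks = {}

    def get(self, key: str) -> asyncio.Lock:
        if key not in self._locks:
            self._locks[key] = asyncio.Lock()
        return self._locks[key]


async def hold(locks, key, spans, hold_for=0.05):
    # Record the interval during which this coroutine held the key's lock.
    async with locks.get(key):
        start = time.monotonic()
        await asyncio.sleep(hold_for)
        spans.append((key, start, time.monotonic()))


def overlaps(a, b) -> bool:
    """True when two (key, start, end) spans intersect in time."""
    return a[1] < b[2] and b[1] < a[2]


async def run(keys):
    locks, spans = KeyedLocks(), []
    await asyncio.gather(*(hold(locks, key, spans) for key in keys))
    return spans


same = asyncio.run(run(["ws_a", "ws_a"]))
different = asyncio.run(run(["ws_a", "ws_b"]))
print("same key overlaps:", overlaps(*same))
print("different keys overlap:", overlaps(*different))
```

Comparing spans rather than total elapsed time makes the serialization check independent of how fast the machine is; the overlap check for different keys still relies on both coroutines starting within the hold window.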