Commit graph

631 commits

Author SHA1 Message Date
Raphaël MANSUY
cf75f71291 ci: fix lint/pre-commit issues and apply autoformat 2025-12-05 14:31:13 +08:00
Raphaël MANSUY
0f7b8ff0a3 Fix: reset document status endpoint (dict access) and add UI 'Reset to Pending' + error handler improvements and translations 2025-12-05 13:07:44 +08:00
Raphaël MANSUY
0a78b4273a cherry-pick 9c10c875 2025-12-04 19:19:05 +08:00
Raphaël MANSUY
b064daa2d2 cherry-pick 702cfd29 2025-12-04 19:19:05 +08:00
Raphaël MANSUY
026041a39c cherry-pick 04ed709b 2025-12-04 19:19:01 +08:00
Raphaël MANSUY
4a5924fe42 cherry-pick afb5e5c1 2025-12-04 19:19:00 +08:00
Raphaël MANSUY
f7c8804a52 cherry-pick 3fa79026 2025-12-04 19:18:40 +08:00
Raphaël MANSUY
2a451c4e22 cherry-pick 5155edd8 2025-12-04 19:18:39 +08:00
Raphaël MANSUY
14413cacbc cherry-pick a9bc3484 2025-12-04 19:18:38 +08:00
Raphaël MANSUY
803315e60c cherry-pick 97a2ee4e 2025-12-04 19:18:38 +08:00
Raphaël MANSUY
96715f92e1 cherry-pick d7e2527e 2025-12-04 19:18:38 +08:00
Raphaël MANSUY
0002bb63db cherry-pick b76350a3 2025-12-04 19:18:38 +08:00
Raphaël MANSUY
b38177de80 cherry-pick a9fec267 2025-12-04 19:18:36 +08:00
Raphaël MANSUY
9b1579f2df cherry-pick 29bac49f 2025-12-04 19:18:35 +08:00
Raphaël MANSUY
a3d7f4b985 cherry-pick 17c2a929 2025-12-04 19:18:35 +08:00
Raphaël MANSUY
0a6e4616b2 cherry-pick 130b4959 2025-12-04 19:18:16 +08:00
Raphaël MANSUY
d4f33ef9b2 cherry-pick 074f0c8b 2025-12-04 19:18:15 +08:00
Raphaël MANSUY
93778770ab fix: sync core modules with upstream after Wave 2 2025-12-04 19:14:52 +08:00
Raphaël MANSUY
7fa3cab355 cherry-pick 162370b6 2025-12-04 19:14:29 +08:00
Raphaël MANSUY
ccca95d392 cherry-pick f0254773 2025-12-04 19:14:27 +08:00
Raphaël MANSUY
bef82fde1c cherry-pick de4412dd 2025-12-04 19:14:27 +08:00
Raphaël MANSUY
759980e522 cherry-pick ab4d7ac2 2025-12-04 19:14:27 +08:00
Raphaël MANSUY
2fbc5972f8 cherry-pick 39b49e92 2025-12-04 19:14:27 +08:00
Raphaël MANSUY
1c00dbfa56 cherry-pick 2fb57e76 2025-12-04 19:14:27 +08:00
Raphaël MANSUY
c83a76786a cherry-pick 14a6c24e 2025-12-04 19:14:27 +08:00
Raphaël MANSUY
f7f9a9e6cf fix: sync all core modules with upstream after Wave 1 2025-12-04 19:13:48 +08:00
EightyOliveira
b8dc5de81a refactor(chunking): rename params and improve docstring for chunking_by_token_size
(cherry picked from commit dacca334e0)
2025-12-04 19:11:21 +08:00
yangdx
d769a446d1 Support async chunking functions in LightRAG processing pipeline
- Add Awaitable and Union type imports
- Update chunking_func type annotation
- Handle coroutine results with await
- Add return type validation
- Update docstring for async support

(cherry picked from commit 940bec0b31)
2025-12-04 19:11:21 +08:00
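
The async chunking support described in the commits above can be sketched roughly as follows. This is a minimal illustration, not the LightRAG implementation: `run_chunking`, `sync_chunker`, `async_chunker`, and the chunk shape (a list of dicts with a `content` key) are hypothetical names chosen for the example. It shows the three steps the commit message lists: accept either a plain or an awaitable result, await it when needed, and validate the return type.

```python
import asyncio
import inspect
from typing import Any, Awaitable, Callable, Union

# Hypothetical chunk shape for this sketch: a list of dicts.
ChunkList = list[dict[str, Any]]
ChunkingFunc = Callable[[str], Union[ChunkList, Awaitable[ChunkList]]]


async def run_chunking(chunking_func: ChunkingFunc, text: str) -> ChunkList:
    """Call a sync or async chunking function and return its chunks."""
    result = chunking_func(text)
    # An async function returns a coroutine; await it to get the real value.
    if inspect.isawaitable(result):
        result = await result
    # Return type validation: fail early on anything that is not a sequence.
    if not isinstance(result, (list, tuple)):
        raise TypeError(
            f"chunking_func must return a list or tuple, got {type(result).__name__}"
        )
    return list(result)


def sync_chunker(text: str) -> ChunkList:
    # Toy chunker (assumption): split on sentence boundaries.
    return [{"content": part} for part in text.split(". ")]


async def async_chunker(text: str) -> ChunkList:
    # Stands in for a heavy chunker that yields control while it works.
    await asyncio.sleep(0)
    return sync_chunker(text)
```

Both `asyncio.run(run_chunking(sync_chunker, text))` and `asyncio.run(run_chunking(async_chunker, text))` go through the same code path, so callers do not need to know which kind of function the user supplied.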
Tong Da
877f2c01d3 easier version: detect whether the chunking_func result is a coroutine
(cherry picked from commit 245df75d9c)
2025-12-04 19:11:21 +08:00
Tong Da
8a43e16f6e Support async chunking_func to improve processing performance when the user passes in a heavy chunking_func
(cherry picked from commit 7740500693)
2025-12-04 19:11:20 +08:00
yangdx
70ba7cd787 Fix: Remove redundant entity/relation chunk deletions
(cherry picked from commit ea141e2779)
2025-12-04 19:11:20 +08:00
yangdx
8f16f6fe31 Fix entity and relationship deletion when no chunk references remain
(cherry picked from commit c81a56a113)
2025-12-04 19:11:19 +08:00
yangdx
17a9771cfb Add chunk tracking support to entity merge functionality
- Pass chunk storages to merge function
- Merge relation chunk tracking data
- Merge entity chunk tracking data
- Delete old chunk tracking records
- Persist chunk storage updates

(cherry picked from commit 2c09adb8d3)
2025-12-04 19:11:19 +08:00
yangdx
7e0f12c28e Enhance entity/relation editing with chunk tracking synchronization
• Add chunk storage sync to edit ops
• Implement incremental chunk ID updates
• Support entity renaming migrations
• Normalize relation keys consistently
• Preserve chunk references on edits

(cherry picked from commit 3fbd704bf9)
2025-12-04 19:11:19 +08:00
yangdx
488f67e5b2 Fix entity and relation chunk cleanup in deletion pipeline
• Delete from entity_chunks storage
• Delete from relation_chunks storage

(cherry picked from commit 29bf593663)
2025-12-04 19:11:19 +08:00
yangdx
cb5451faf8 Add entity/relation chunk tracking with configurable source ID limits
- Add entity_chunks & relation_chunks storage
- Implement KEEP/FIFO limit strategies
- Update env.example with new settings
- Add migration for chunk tracking data
- Support all KV storage backends

(cherry picked from commit dc62c78f98)
2025-12-04 19:11:19 +08:00
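
The KEEP/FIFO limit strategies named in the commit above might behave along these lines. This is a sketch under stated assumptions, not the project's code: the function name `apply_source_id_limit` is hypothetical, and the semantics are assumed to be that KEEP retains the oldest IDs and ignores newer ones once the limit is reached, while FIFO evicts the oldest IDs to make room for the newest.

```python
from typing import Sequence


def apply_source_id_limit(
    source_ids: Sequence[str], limit: int, method: str = "KEEP"
) -> list[str]:
    """Trim an ordered list of chunk IDs (oldest first) to at most `limit`."""
    # Assumption: a non-positive limit means "no limit".
    if limit <= 0 or len(source_ids) <= limit:
        return list(source_ids)
    if method == "KEEP":
        # Keep the earliest IDs; later ones are dropped.
        return list(source_ids[:limit])
    if method == "FIFO":
        # First in, first out: the oldest IDs are evicted.
        return list(source_ids[-limit:])
    raise ValueError(f"unknown limit method: {method!r}")
```

With four chunk IDs and a limit of two, KEEP would retain the first two IDs and FIFO the last two.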
yangdx
6ba35f81df Fix: auto-acquire pipeline when idle in document deletion
• Track if we acquired the pipeline lock
• Auto-acquire pipeline when idle
• Only release if we acquired it
• Prevent concurrent deletion conflicts
• Improve deletion job validation

(cherry picked from commit 4048fc4b89)
2025-12-04 19:11:18 +08:00
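
The acquire-when-idle pattern in the commit above can be illustrated with a small sketch. The names here (`delete_with_pipeline_guard`, the `busy` key, the `delete_job` callback) are assumptions made for the example and are not taken from the repository. The simplification is deliberate: if the pipeline is already busy, this version refuses to run rather than joining an existing job.

```python
import asyncio
from typing import Awaitable, Callable


async def delete_with_pipeline_guard(
    pipeline_status: dict,
    lock: asyncio.Lock,
    delete_job: Callable[[], Awaitable[None]],
) -> bool:
    """Run delete_job only if this caller could mark the pipeline busy."""
    acquired_here = False
    async with lock:
        # Auto-acquire: claim the pipeline only when it is idle.
        if not pipeline_status.get("busy", False):
            pipeline_status["busy"] = True
            acquired_here = True
    if not acquired_here:
        # Someone else owns the pipeline; do not touch its state.
        return False
    try:
        await delete_job()
        return True
    finally:
        # Release only what this caller acquired.
        if acquired_here:
            async with lock:
                pipeline_status["busy"] = False
```

Tracking `acquired_here` separately from the shared `busy` flag is what prevents one job from clearing a flag that another job set.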
yangdx
f6a45245bd Add pipeline status validation before document deletion
(cherry picked from commit 9d7b7981ce)
2025-12-04 19:11:17 +08:00
yangdx
94ae13a037 Refactor workspace handling to use default workspace and namespace locks
- Remove DB-specific workspace configs
- Add default workspace auto-setting
- Replace global locks with namespace locks
- Simplify pipeline status management
- Remove redundant graph DB locking

(cherry picked from commit 926960e957)
2025-12-04 19:11:17 +08:00
yangdx
dfab175c16 Fix workspace isolation for pipeline status across all operations
- Fix final_namespace error in get_namespace_data()
- Fix get_workspace_from_request return type
- Add workspace param to pipeline status calls

(cherry picked from commit 52c812b9a0)
2025-12-04 19:11:16 +08:00
BukeLy
fe1576943f fix: Add default workspace support for backward compatibility
Fixes two compatibility issues in workspace isolation:

1. Problem: lightrag_server.py calls initialize_pipeline_status()
   without workspace parameter, causing pipeline to initialize in
   global namespace instead of rag's workspace.

   Solution: Add set_default_workspace() mechanism in shared_storage.
   LightRAG.initialize_storages() now sets default workspace, which
   initialize_pipeline_status() uses when called without parameters.

2. Problem: /health endpoint hardcoded to use "pipeline_status",
   cannot return workspace-specific status or support frontend
   workspace selection.

   Solution: Add LIGHTRAG-WORKSPACE header support. Endpoint now
   extracts workspace from header or falls back to server default,
   returning correct workspace-specific pipeline status.

Changes:
- lightrag/kg/shared_storage.py: Add set/get_default_workspace()
- lightrag/lightrag.py: Call set_default_workspace() in initialize_storages()
- lightrag/api/lightrag_server.py: Add get_workspace_from_request() helper,
  update /health endpoint to support LIGHTRAG-WORKSPACE header

Testing:
- Backward compatibility: Old code works without modification
- Multi-instance safety: Explicit workspace passing preserved
- /health endpoint: Supports both default and header-specified workspaces

Related: #2353
(cherry picked from commit 18a4870229)
2025-12-04 19:11:16 +08:00
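
The header fallback described in item 2 of the commit above could look roughly like this. Only the `LIGHTRAG-WORKSPACE` header name comes from the commit message; `resolve_workspace` and its parameters are hypothetical, and the sketch takes a plain mapping of headers instead of a real web framework request object.

```python
from typing import Mapping


def resolve_workspace(headers: Mapping[str, str], server_default: str = "") -> str:
    """Return the workspace named in the request header, else the server default."""
    for name, value in headers.items():
        # HTTP header names are case-insensitive.
        if name.lower() == "lightrag-workspace" and value.strip():
            return value.strip()
    return server_default
```

A request without the header, or with an empty header value, falls back to the server default, which matches the backward-compatibility goal stated in the commit.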
BukeLy
f7b500bca2 feat: Add workspace isolation support for pipeline status
Problem:
In multi-tenant scenarios, different workspaces share a single global
pipeline_status namespace, causing pipelines from different tenants to
block each other, severely impacting concurrent processing performance.

Solution:
- Extended get_namespace_data() to recognize workspace-specific pipeline
  namespaces with pattern "{workspace}:pipeline" (following GraphDB pattern)
- Added workspace parameter to initialize_pipeline_status() for per-tenant
  isolated pipeline namespaces
- Updated all 7 call sites to use workspace-aware locks:
  * lightrag.py: process_document_queue(), aremove_document()
  * document_routes.py: background_delete_documents(), clear_documents(),
    cancel_pipeline(), get_pipeline_status(), delete_documents()

Impact:
- Different workspaces can process documents concurrently without blocking
- Backward compatible: empty workspace defaults to "pipeline_status"
- Maintains fail-fast: uninitialized pipeline raises clear error
- Expected throughput improvement of up to N× for N concurrent tenants

Bug fixes:
- Fixed AttributeError by using self.workspace instead of self.global_config
- Fixed pipeline status endpoint to show workspace-specific status
- Fixed delete endpoint to check workspace-specific busy flag

Code changes: 4 files, 141 insertions(+), 28 deletions(-)

Testing: All syntax checks passed, comprehensive workspace isolation tests completed
(cherry picked from commit eb52ec94d7)
2025-12-04 19:11:16 +08:00
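
The namespace scheme in the commit above can be sketched as a toy registry. The `"{workspace}:pipeline"` pattern and the `"pipeline_status"` default for an empty workspace come from the commit message; the function names (`pipeline_namespace`, `init_status`, `get_status`) and the in-memory dict are assumptions for illustration and do not mirror the shared-storage module.

```python
_registry: dict[str, dict] = {}


def pipeline_namespace(workspace: str = "") -> str:
    """Map a workspace to its pipeline-status namespace."""
    # Empty workspace keeps the legacy global name for backward compatibility.
    return f"{workspace}:pipeline" if workspace else "pipeline_status"


def init_status(workspace: str = "") -> None:
    """Create the status record for a workspace if it does not exist yet."""
    _registry.setdefault(pipeline_namespace(workspace), {"busy": False})


def get_status(workspace: str = "") -> dict:
    """Fail fast when a workspace's pipeline was never initialized."""
    namespace = pipeline_namespace(workspace)
    if namespace not in _registry:
        raise KeyError(f"pipeline namespace {namespace!r} is not initialized")
    return _registry[namespace]
```

Because each tenant gets its own record, setting `busy` for one workspace leaves the others untouched, which is the isolation property the commit is after.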
yangdx
a7330f0b95 Remove redundant await call in file extraction pipeline
(cherry picked from commit c36afecba4)
2025-12-04 19:11:15 +08:00
yangdx
687d2b6b13 Improve error handling and add cancellation checks in pipeline
(cherry picked from commit 77336e50b6)
2025-12-04 19:11:15 +08:00
yangdx
a471f1ca0e Add pipeline cancellation feature for graceful processing termination
• Add cancel_pipeline API endpoint
• Implement PipelineCancelledException
• Add cancellation checks in main loop
• Handle task cancellation gracefully
• Mark cancelled docs as FAILED

(cherry picked from commit 743aefc655)
2025-12-04 19:11:15 +08:00
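
The cancellation flow listed in the commit above might be structured like this. `PipelineCancelledException` is named in the commit message, but everything else is assumed for the sketch: the `cancellation_requested` flag, the `process_documents` function, and the status strings are hypothetical, and the loop is synchronous for brevity.

```python
from typing import Callable, Sequence


class PipelineCancelledException(Exception):
    """Raised when a cancellation request is observed in the main loop."""


def process_documents(
    doc_ids: Sequence[str],
    pipeline_status: dict,
    handle: Callable[[str], None],
) -> dict[str, str]:
    """Process documents in order, stopping cleanly if cancellation is requested."""
    statuses: dict[str, str] = {}
    for index, doc_id in enumerate(doc_ids):
        try:
            # Cancellation check at the top of each loop iteration.
            if pipeline_status.get("cancellation_requested", False):
                raise PipelineCancelledException("cancelled by user")
            handle(doc_id)
            statuses[doc_id] = "PROCESSED"
        except PipelineCancelledException:
            # Mark the current and all remaining documents as FAILED.
            for remaining in doc_ids[index:]:
                statuses[remaining] = "FAILED"
            break
    return statuses
```

Checking the flag between documents lets work already in progress finish, so cancellation never leaves a document half processed in this sketch.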
Raphaël MANSUY
ed73def994 fix: sync core modules with upstream for compatibility 2025-12-04 19:10:46 +08:00
yangdx
1cbe0ba885 Reduce log level and improve workspace mismatch message clarity
• Change warning to info level
• Simplify workspace mismatch wording

(cherry picked from commit 6cef8df159)
2025-12-04 19:09:06 +08:00
yangdx
ed46d375fb Auto-initialize pipeline status in LightRAG.initialize_storages()
• Remove manual initialize_pipeline_status calls
• Auto-init in initialize_storages method
• Update error messages for clarity
• Warn on workspace conflicts

(cherry picked from commit e22ac52ebc)
2025-12-04 19:09:05 +08:00
yangdx
a42222d7f9 Resolve lock leakage issue during user cancellation handling
• Change default log level to INFO
• Force enable error logging output
• Add lock cleanup rollback protection
• Handle LLM cache persistence errors
• Fix async task exception handling

(cherry picked from commit a9ec15e669)
2025-12-04 19:09:01 +08:00
yangdx
c0ca40e366 Add doc_name field to full docs storage
- Store file_path in full_docs storage
- Update PostgreSQL implementation by mapping file_path to doc_name
- Other storage implementations automatically handle the new field

(cherry picked from commit 457d51952e)
2025-12-04 19:05:14 +08:00