Commit graph

893 commits

Author SHA1 Message Date
Raphaël MANSUY
40209a5e76 cherry-pick 3276b7a4 2025-12-04 19:19:01 +08:00
Raphaël MANSUY
51170bcb4a cherry-pick 155f5975 2025-12-04 19:19:01 +08:00
Raphaël MANSUY
3f309105b0 cherry-pick d5bcd14c 2025-12-04 19:18:40 +08:00
Raphaël MANSUY
8ebd98ae8e cherry-pick 72b29659 2025-12-04 19:18:40 +08:00
Raphaël MANSUY
64050aeb1e cherry-pick 813f4af9 2025-12-04 19:18:36 +08:00
Raphaël MANSUY
3ae2043e7b cherry-pick fdcb034d 2025-12-04 19:18:14 +08:00
Raphaël MANSUY
0bc702127f cherry-pick d8a9617c 2025-12-04 19:18:14 +08:00
Raphaël MANSUY
1a167fb7f7 cherry-pick cca0800e 2025-12-04 19:15:03 +08:00
Raphaël MANSUY
107b32aa8d cherry-pick 95e1fb16 2025-12-04 19:14:31 +08:00
yangdx
211dbc3f78 Remove unused chunk-based node/edge retrieval methods
(cherry picked from commit 807d2461d3)
2025-12-04 19:11:20 +08:00
yangdx
cb5451faf8 Add entity/relation chunk tracking with configurable source ID limits
- Add entity_chunks & relation_chunks storage
- Implement KEEP/FIFO limit strategies
- Update env.example with new settings
- Add migration for chunk tracking data
- Support all KV storage

(cherry picked from commit dc62c78f98)
2025-12-04 19:11:19 +08:00
yangdx
851b45f726 Add pipeline status lock function for legacy compatibility
- Add get_pipeline_status_lock function
- Return NamespaceLock for consistency
- Support workspace parameter
- Enable logging option
- Legacy code compatibility

(cherry picked from commit 93d445dfdd)
2025-12-04 19:11:18 +08:00
yangdx
402d2f9a98 Fix namespace parsing when workspace contains colons
• Use rsplit instead of split
• Handle colons in workspace names

(cherry picked from commit f8dd2e0724)
2025-12-04 19:11:18 +08:00
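The rsplit change above can be illustrated with a minimal sketch. It assumes keys of the form "workspace:namespace"; the helper name `parse_final_namespace` and the exact key layout are hypothetical and not taken from the repository.

```python
def parse_final_namespace(final_namespace: str) -> tuple[str, str]:
    """Split a 'workspace:namespace' key on the LAST colon only (assumed format)."""
    if ":" not in final_namespace:
        # Bare namespace such as "pipeline_status": no workspace prefix.
        return "", final_namespace
    # rsplit with maxsplit=1 keeps any colons inside the workspace name intact.
    workspace, namespace = final_namespace.rsplit(":", 1)
    return workspace, namespace


# A plain split(":", 1) would cut at the first colon and mangle the workspace.
naive = "tenant:a:pipeline_status".split(":", 1)
robust = parse_final_namespace("tenant:a:pipeline_status")
```

Here `naive` splits into "tenant" and "a:pipeline_status", while `robust` keeps "tenant:a" as the workspace.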
yangdx
dc4c10c346 Fix NamespaceLock context variable timing to prevent lock bricking
* Acquire lock before setting ContextVar
* Prevent state corruption on cancellation
* Fix permanent lock brick scenario
* Store context only after success
* Handle acquisition failure properly

(cherry picked from commit e8383df3b8)
2025-12-04 19:11:17 +08:00
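The ordering described in this commit (acquire first, record context state only after success) might look roughly like the sketch below. `SketchNamespaceLock` is a hypothetical stand-in built on `asyncio.Lock` and a boolean `ContextVar`; it is not the project's NamespaceLock, which may store different state.

```python
import asyncio
from contextvars import ContextVar


class SketchNamespaceLock:
    """Hypothetical wrapper: per-context 'held' flag around a shared asyncio.Lock."""

    def __init__(self, lock: asyncio.Lock) -> None:
        self._lock = lock
        self._held: ContextVar[bool] = ContextVar("sketch_lock_held", default=False)

    async def __aenter__(self) -> "SketchNamespaceLock":
        if self._held.get():
            # Re-entrance protection: the same context already holds the lock.
            raise RuntimeError("lock already held in this context")
        # Acquire FIRST. If this await is cancelled, the flag was never set,
        # so a later 'async with' on this object is not rejected as re-entrant.
        await self._lock.acquire()
        # Record state only after the acquisition succeeded.
        self._held.set(True)
        return self

    async def __aexit__(self, exc_type, exc, tb) -> None:
        self._held.set(False)
        self._lock.release()
```

Setting the flag before `acquire()` would leave it stuck at True after a cancelled wait, which is the kind of permanent "brick" the commit message refers to.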
yangdx
1e7bd654d8 Fix NamespaceLock concurrent coroutine safety with ContextVar
- Use ContextVar for per-coroutine storage
- Prevent state interference between coroutines
- Add re-entrance protection check

(cherry picked from commit b6a5a90eaf)
2025-12-04 19:11:17 +08:00
yangdx
94ae13a037 Refactor workspace handling to use default workspace and namespace locks
- Remove DB-specific workspace configs
- Add default workspace auto-setting
- Replace global locks with namespace locks
- Simplify pipeline status management
- Remove redundant graph DB locking

(cherry picked from commit 926960e957)
2025-12-04 19:11:17 +08:00
yangdx
c01cfc3649 Fix workspace filtering logic in get_all_update_flags_status
• Handle namespaces with/without prefixes
• Fix workspace matching logic

(cherry picked from commit 7ed0eac4c9)
2025-12-04 19:11:16 +08:00
yangdx
50f8ddd933 Fix pipeline status namespace check to handle root case
- Add check for bare "pipeline_status"
- Handle namespace without prefix

(cherry picked from commit 78689e8837)
2025-12-04 19:11:16 +08:00
yangdx
dfab175c16 Fix workspace isolation for pipeline status across all operations
- Fix final_namespace error in get_namespace_data()
- Fix get_workspace_from_request return type
- Add workspace param to pipeline status calls

(cherry picked from commit 52c812b9a0)
2025-12-04 19:11:16 +08:00
BukeLy
fe1576943f fix: Add default workspace support for backward compatibility
Fixes two compatibility issues in workspace isolation:

1. Problem: lightrag_server.py calls initialize_pipeline_status()
   without workspace parameter, causing pipeline to initialize in
   global namespace instead of rag's workspace.

   Solution: Add set_default_workspace() mechanism in shared_storage.
   LightRAG.initialize_storages() now sets default workspace, which
   initialize_pipeline_status() uses when called without parameters.

2. Problem: /health endpoint hardcoded to use "pipeline_status",
   cannot return workspace-specific status or support frontend
   workspace selection.

   Solution: Add LIGHTRAG-WORKSPACE header support. Endpoint now
   extracts workspace from header or falls back to server default,
   returning correct workspace-specific pipeline status.

Changes:
- lightrag/kg/shared_storage.py: Add set/get_default_workspace()
- lightrag/lightrag.py: Call set_default_workspace() in initialize_storages()
- lightrag/api/lightrag_server.py: Add get_workspace_from_request() helper,
  update /health endpoint to support LIGHTRAG-WORKSPACE header

Testing:
- Backward compatibility: Old code works without modification
- Multi-instance safety: Explicit workspace passing preserved
- /health endpoint: Supports both default and header-specified workspaces

Related: #2353
(cherry picked from commit 18a4870229)
2025-12-04 19:11:16 +08:00
BukeLy
f7b500bca2 feat: Add workspace isolation support for pipeline status
Problem:
In multi-tenant scenarios, different workspaces share a single global
pipeline_status namespace, causing pipelines from different tenants to
block each other, severely impacting concurrent processing performance.

Solution:
- Extended get_namespace_data() to recognize workspace-specific pipeline
  namespaces with pattern "{workspace}:pipeline" (following GraphDB pattern)
- Added workspace parameter to initialize_pipeline_status() for per-tenant
  isolated pipeline namespaces
- Updated all 7 call sites to use workspace-aware locks:
  * lightrag.py: process_document_queue(), aremove_document()
  * document_routes.py: background_delete_documents(), clear_documents(),
    cancel_pipeline(), get_pipeline_status(), delete_documents()

Impact:
- Different workspaces can process documents concurrently without blocking
- Backward compatible: empty workspace defaults to "pipeline_status"
- Maintains fail-fast: uninitialized pipeline raises clear error
- Expected N× performance improvement for N concurrent tenants

Bug fixes:
- Fixed AttributeError by using self.workspace instead of self.global_config
- Fixed pipeline status endpoint to show workspace-specific status
- Fixed delete endpoint to check workspace-specific busy flag

Code changes: 4 files, 141 insertions(+), 28 deletions(-)

Testing: All syntax checks passed, comprehensive workspace isolation tests completed
(cherry picked from commit eb52ec94d7)
2025-12-04 19:11:16 +08:00
yangdx
537db072e0 Add Qdrant legacy collection migration with workspace support
- Add QdrantMigrationError exception
- Implement automatic data migration
- Support workspace-based partitioning
- Add migration verification logic
- Update collection naming scheme

(cherry picked from commit 5f4a280458)
2025-12-04 19:11:15 +08:00
yangdx
7ce3680ca5 Add retry decorators to Neo4j read operations for resilience
(cherry picked from commit 7aaa51cda9)
2025-12-04 19:09:08 +08:00
yangdx
00d51f9dba Fix dimension type comparison in Milvus vector field validation
• Convert dimensions to int for comparison
• Handle string vs int type mismatches

(cherry picked from commit 0fa9a2eee3)
2025-12-04 19:09:08 +08:00
yangdx
0594a5a049 Update pymilvus dependency from 2.5.2 to >=2.6.2
(cherry picked from commit baab992431)
2025-12-04 19:09:07 +08:00
yangdx
de011c99a4 Add CASCADE to AGE extension creation in PostgreSQL implementation
- Add CASCADE option to CREATE EXTENSION
- Ensure dependencies are installed
- Fix potential AGE setup issues

(cherry picked from commit d6019c82af)
2025-12-04 19:09:07 +08:00
yangdx
bd93f13012 Refactor: Extract retry decorator to reduce code duplication in Neo4j storage
• Define READ_RETRY_EXCEPTIONS constant
• Create reusable READ_RETRY decorator
• Replace 11 duplicate retry decorators
• Improve code maintainability
• Add missing retry to edge_degrees_batch

(cherry picked from commit 8c4d7a00ad)
2025-12-04 19:09:07 +08:00
wmsnp
ae5cd9262b fix: add logger to configure_vchordrq() and format code
(cherry picked from commit f4bf5d279c)
2025-12-04 19:09:06 +08:00
wmsnp
3954bb6579 feat(postgres_impl): add vchordrq vector index support and unify vector index creation logic
(cherry picked from commit d07023c962)
2025-12-04 19:09:06 +08:00
yangdx
0ac858d3e2 fix(postgres): allow vchordrq.epsilon config when probes is empty
Previously, configure_vchordrq would fail silently when probes was empty
(the default), preventing epsilon from being configured. Now each parameter
is handled independently with conditional execution, and configuration
errors fail-fast instead of being swallowed.

This fixes the documented epsilon setting being impossible to use in the
default configuration.

(cherry picked from commit 3096f844fb)
2025-12-04 19:09:06 +08:00
yangdx
5bd1320a1d Refactor storage classes to use namespace instead of final_namespace
(cherry picked from commit fd486bc922)
2025-12-04 19:09:06 +08:00
yangdx
961c87a6e5 Standardize empty workspace handling from "_" to "" across storage
* Unify empty workspace behavior by changing workspace from "_" to ""
* Fixed incorrect empty workspace detection in get_all_update_flags_status()

(cherry picked from commit d54d0d55d9)
2025-12-04 19:09:05 +08:00
yangdx
6b0c0ef815 Refactor namespace lock to support reusable async context manager
• Add NamespaceLock class wrapper
• Fix lock re-entrance issues
• Enable concurrent lock usage
• Fresh context per async with block
• Update get_namespace_lock API

(cherry picked from commit 7deb9a64b9)
2025-12-04 19:09:05 +08:00
yangdx
708f80f43d Add _default_workspace to shared storage finalization
- Add _default_workspace to global vars
- Set _default_workspace to None on cleanup
- Ensure complete resource cleanup
- Fix missing workspace finalization

(cherry picked from commit 6d6716e9f8)
2025-12-04 19:09:05 +08:00
yangdx
dcf88a8273 Refactor exception handling in MemgraphStorage label methods
(cherry picked from commit 4401f86f07)
2025-12-04 19:09:04 +08:00
yangdx
ed79218550 Optimize JSON write with fast/slow path to reduce memory usage
- Fast path for clean data (no sanitization)
- Slow path sanitizes during encoding
- Reload shared memory after sanitization
- Custom encoder avoids deep copies
- Comprehensive test coverage

(cherry picked from commit 777c987371)
2025-12-04 19:09:04 +08:00
yangdx
5a5e583b9c Improve storage config validation and add config.ini fallback support
• Add MongoDB env requirements
• Support config.ini fallback
• Warn on missing env vars
• Check available storage count
• Show config source info

(cherry picked from commit 1a91bcdb5f)
2025-12-04 19:09:03 +08:00
yangdx
8c7b0017df Remove enable_logging parameter from get_data_init_lock call in MilvusVectorDBStorage
(cherry picked from commit 0692175c7b)
2025-12-04 19:09:01 +08:00
Anush008
e86aa091f4 refactor: Qdrant Multi-tenancy (Include staged)
Signed-off-by: Anush008 <anushshetty90@gmail.com>
(cherry picked from commit 8584980e3a)
2025-12-04 19:09:01 +08:00
yangdx
a42222d7f9 Resolve lock leakage issue during user cancellation handling
• Change default log level to INFO
• Force enable error logging output
• Add lock cleanup rollback protection
• Handle LLM cache persistence errors
• Fix async task exception handling

(cherry picked from commit a9ec15e669)
2025-12-04 19:09:01 +08:00
yangdx
8b6fdef965 Optimize PostgreSQL graph queries to avoid Cypher overhead and complexity
• Replace Cypher with native SQL queries
• Fix O(N²) to O(E) performance issue
• Add error handling for parse failures
• Use direct table access pattern
• Eliminate Cartesian product joins

(cherry picked from commit a97e5dad4c)
2025-12-04 19:09:01 +08:00
Yasiru Rangana
8a72135a32 Optimize PostgreSQL initialization performance
- Batch index existence checks into single query (16+ queries -> 1 query)
- Batch timestamp column checks into single query (8 queries -> 1 query)
- Batch field length checks into single query (5 queries -> 1 query)

Performance improvement: ~70-80% faster initialization (35s -> 5-10s)

Key optimizations:
1. check_tables(): Use ANY($1) to check all indexes at once
2. _migrate_timestamp_columns(): Batch all column type checks
3. _migrate_field_lengths(): Batch all field definition checks

All changes are backward compatible with no schema or API changes.
Reduces database round-trips by batching information_schema queries.

(cherry picked from commit 2f22336ace)
2025-12-04 19:09:00 +08:00
Lucky Verma
12ebc9f2a9 Refactor SQL queries and improve input handling in PGKVStorage and PGDocStatusStorage
(cherry picked from commit 917e41aa78)
2025-12-04 19:09:00 +08:00
yangdx
e19a4be0af Preserve ordering in get_by_ids methods across all storage implementations
- Fix result ordering in vector stores
- Update KV storage get_by_ids methods
- Maintain order in doc status storage
- Return None for missing IDs

(cherry picked from commit 9be22dd666)
2025-12-04 19:08:58 +08:00
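The behaviour listed in this commit (results in request order, None for missing IDs) can be sketched as follows. The helper `order_by_ids` and the "id" field name are assumptions for illustration; the actual storage classes may structure their rows differently.

```python
from typing import Any, Optional


def order_by_ids(
    ids: list[str],
    rows: list[dict[str, Any]],
    key: str = "id",
) -> list[Optional[dict[str, Any]]]:
    """Return rows in the order of `ids`, with None where an ID has no row."""
    # Index the backend's rows once; backends often return them in arbitrary order.
    by_id = {row[key]: row for row in rows}
    # Walk the requested IDs so the output lines up position-by-position.
    return [by_id.get(item_id) for item_id in ids]
```

With this shape, callers can zip the requested IDs against the result without a second lookup.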
yangdx
17106225dd Add PostgreSQL connection retry mechanism with comprehensive error handling
• Implement connection retry with backoff
• Add transient error detection
• Pool management with timeout guards

(cherry picked from commit e758204ab2)
2025-12-04 19:08:58 +08:00
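A retry loop with backoff and transient error detection, as named in this commit, could be sketched like this. The helper `with_retry`, its parameters, and the choice of `ConnectionError` and `TimeoutError` as "transient" are all assumptions; the real PostgreSQL implementation likely checks driver-specific exceptions.

```python
import asyncio
from typing import Awaitable, Callable, TypeVar

T = TypeVar("T")

# Assumed set of transient errors; a real driver would add its own exception types.
TRANSIENT_ERRORS = (ConnectionError, TimeoutError)


async def with_retry(
    op: Callable[[], Awaitable[T]],
    attempts: int = 3,
    base_delay: float = 0.1,
    max_delay: float = 2.0,
) -> T:
    """Run `op`, retrying transient failures with capped exponential backoff."""
    for attempt in range(1, attempts + 1):
        try:
            return await op()
        except TRANSIENT_ERRORS:
            if attempt == attempts:
                # Out of attempts: surface the last transient error to the caller.
                raise
            # Delay doubles each attempt: base, 2*base, 4*base, ... capped at max_delay.
            delay = min(max_delay, base_delay * 2 ** (attempt - 1))
            await asyncio.sleep(delay)
    raise ValueError("attempts must be at least 1")
```

Errors outside `TRANSIENT_ERRORS` propagate immediately, so non-recoverable failures are not retried.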
yangdx
60a695539a Refactor PostgreSQL retry config to use centralized configuration
• Move retry config to ClientManager
• Remove env var parsing from PostgreSQLDB
• Add config params to test setup

(cherry picked from commit b3ed264707)
2025-12-04 19:08:57 +08:00
yangdx
c6433edb23 Make PostgreSQL statement_cache_size configuration optional
• Remove forced int conversion
• Allow None values for cache size
• Add conditional parameter setting

(cherry picked from commit f2c0b41e78)
2025-12-04 19:08:57 +08:00
kevinnkansah
c8c73ab114 fix: renamed PostgreSQL options env variable and allowed LRU cache to be an optional env variable
(cherry picked from commit 22a7b482c5)
2025-12-04 19:08:56 +08:00
kevinnkansah
7ce46bacb6 feat: add options for PostgreSQL connection
(cherry picked from commit 108cdbe133)
2025-12-04 19:08:56 +08:00
Lucky Verma
80dcbc696a Refactor SQL queries and improve input handling in PGKVStorage and PGDocStatusStorage
(cherry picked from commit 917e41aa78)
2025-12-04 19:08:41 +08:00