Commit graph

901 commits

Author SHA1 Message Date
yangdx
d6019c82af Add CASCADE to AGE extension creation in PostgreSQL implementation
- Add CASCADE option to CREATE EXTENSION
- Ensure dependencies are installed
- Fix potential AGE setup issues
2025-12-02 00:17:41 +08:00
yangdx
93d445dfdd Add pipeline status lock function for legacy compatibility
- Add get_pipeline_status_lock function
- Return NamespaceLock for consistency
- Support workspace parameter
- Enable logging option
- Legacy code compatibility
2025-11-25 18:24:39 +08:00
yangdx
8c4d7a00ad Refactor: Extract retry decorator to reduce code duplication in Neo4J storage
• Define READ_RETRY_EXCEPTIONS constant
• Create reusable READ_RETRY decorator
• Replace 11 duplicate retry decorators
• Improve code maintainability
• Add missing retry to edge_degrees_batch
2025-11-25 01:35:21 +08:00
yangdx
7aaa51cda9 Add retry decorators to Neo4j read operations for resilience 2025-11-24 22:28:15 +08:00
yangdx
dbae327a17 Merge branch 'main' into dev-postgres-vchordrq 2025-11-18 22:13:27 +08:00
yangdx
3096f844fb fix(postgres): allow vchordrq.epsilon config when probes is empty
Previously, configure_vchordrq would fail silently when probes was empty
(the default), preventing epsilon from being configured. Now each parameter
is handled independently with conditional execution, and configuration
errors fail-fast instead of being swallowed.

This fixes the documented epsilon setting being impossible to use in the
default configuration.
2025-11-18 21:58:36 +08:00
yangdx
f8dd2e0724 Fix namespace parsing when workspace contains colons
• Use rsplit instead of split
• Handle colons in workspace names
2025-11-18 12:23:05 +08:00
wmsnp
d07023c962
feat(postgres_impl): add vchordrq vector index support and unify vector index creation logic 2025-11-18 11:45:16 +08:00
yangdx
6d6716e9f8 Add _default_workspace to shared storage finalization
- Add _default_workspace to global vars
- Set _default_workspace to None on cleanup
- Ensure complete resource cleanup
- Fix missing workspace finalization
2025-11-17 13:46:46 +08:00
yangdx
e8383df3b8 Fix NamespaceLock context variable timing to prevent lock bricking
* Acquire lock before setting ContextVar
* Prevent state corruption on cancellation
* Fix permanent lock brick scenario
* Store context only after success
* Handle acquisition failure properly
2025-11-17 12:54:33 +08:00
yangdx
95e1fb1612 Remove final_namespace attribute for in-memory storage and use namespace in clean_llm_query_cache.py 2025-11-17 12:54:33 +08:00
yangdx
7ed0eac4c9 Fix workspace filtering logic in get_all_update_flags_status
• Handle namespaces with/without prefixes
• Fix workspace matching logic
2025-11-17 12:54:33 +08:00
yangdx
78689e8837 Fix pipeline status namespace check to handle root case
- Add check for bare "pipeline_status"
- Handle namespace without prefix
2025-11-17 12:54:33 +08:00
yangdx
d54d0d55d9 Standardize empty workspace handling from "_" to "" across storage
* Unify empty workspace behavior by changing workspace from "_" to ""
* Fixed incorrect empty workspace detection in get_all_update_flags_status()
2025-11-17 12:54:33 +08:00
yangdx
b6a5a90eaf Fix NamespaceLock concurrent coroutine safety with ContextVar
- Use ContextVar for per-coroutine storage
- Prevent state interference between coroutines
- Add re-entrance protection check
2025-11-17 12:54:33 +08:00
yangdx
fd486bc922 Refactor storage classes to use namespace instead of final_namespace 2025-11-17 12:54:33 +08:00
yangdx
01814bfc7a Fix missing function call parentheses in get_all_update_flags_status 2025-11-17 12:54:33 +08:00
yangdx
7deb9a64b9 Refactor namespace lock to support reusable async context manager
• Add NamespaceLock class wrapper
• Fix lock re-entrance issues
• Enable concurrent lock usage
• Fresh context per async with block
• Update get_namespace_lock API
2025-11-17 12:54:33 +08:00
yangdx
52c812b9a0 Fix workspace isolation for pipeline status across all operations
- Fix final_namespace error in get_namespace_data()
- Fix get_workspace_from_request return type
- Add workspace param to pipeline status calls
2025-11-17 12:54:33 +08:00
yangdx
926960e957 Refactor workspace handling to use default workspace and namespace locks
- Remove DB-specific workspace configs
- Add default workspace auto-setting
- Replace global locks with namespace locks
- Simplify pipeline status management
- Remove redundant graph DB locking
2025-11-17 12:54:33 +08:00
yangdx
8283c86bce Refactor exception handling in MemgraphStorage label methods 2025-11-17 12:54:32 +08:00
yangdx
423e4e927a Fix null reference errors in graph database error handling
- Initialize result vars to None
- Add null checks before consume calls
- Prevent crashes in except blocks
- Apply fix to both Neo4J and Memgraph
2025-11-17 12:54:32 +08:00
yangdx
a08bc72635 Fix empty dict handling after JSON sanitization
• Replace truthy checks with `is not None`
• Handle empty dict edge case properly
• Prevent data reload failures
• Add comprehensive test coverage
• Fix JsonKVStorage and DocStatusStorage
2025-11-17 12:54:32 +08:00
yangdx
cca0800ed4 Fix migration to reload sanitized data and prevent memory corruption
• Reload cleaned data after sanitization
• Update shared memory with clean data
• Add specific surrogate char tests
• Test migration sanitization flow
• Prevent dirty data in memory
2025-11-17 12:54:32 +08:00
yangdx
f289cf6225 Optimize JSON write with fast/slow path to reduce memory usage
- Fast path for clean data (no sanitization)
- Slow path sanitizes during encoding
- Reload shared memory after sanitization
- Custom encoder avoids deep copies
- Comprehensive test coverage
2025-11-17 12:54:32 +08:00
BukeLy
18a4870229 fix: Add default workspace support for backward compatibility
Fixes two compatibility issues in workspace isolation:

1. Problem: lightrag_server.py calls initialize_pipeline_status()
   without workspace parameter, causing pipeline to initialize in
   global namespace instead of rag's workspace.

   Solution: Add set_default_workspace() mechanism in shared_storage.
   LightRAG.initialize_storages() now sets default workspace, which
   initialize_pipeline_status() uses when called without parameters.

2. Problem: /health endpoint hardcoded to use "pipeline_status",
   cannot return workspace-specific status or support frontend
   workspace selection.

   Solution: Add LIGHTRAG-WORKSPACE header support. Endpoint now
   extracts workspace from header or falls back to server default,
   returning correct workspace-specific pipeline status.

Changes:
- lightrag/kg/shared_storage.py: Add set/get_default_workspace()
- lightrag/lightrag.py: Call set_default_workspace() in initialize_storages()
- lightrag/api/lightrag_server.py: Add get_workspace_from_request() helper,
  update /health endpoint to support LIGHTRAG-WORKSPACE header

Testing:
- Backward compatibility: Old code works without modification
- Multi-instance safety: Explicit workspace passing preserved
- /health endpoint: Supports both default and header-specified workspaces

Related: #2353
2025-11-17 12:54:20 +08:00
BukeLy
eb52ec94d7 feat: Add workspace isolation support for pipeline status
Problem:
In multi-tenant scenarios, different workspaces share a single global
pipeline_status namespace, causing pipelines from different tenants to
block each other, severely impacting concurrent processing performance.

Solution:
- Extended get_namespace_data() to recognize workspace-specific pipeline
  namespaces with pattern "{workspace}:pipeline" (following GraphDB pattern)
- Added workspace parameter to initialize_pipeline_status() for per-tenant
  isolated pipeline namespaces
- Updated all 7 call sites to use workspace-aware locks:
  * lightrag.py: process_document_queue(), aremove_document()
  * document_routes.py: background_delete_documents(), clear_documents(),
    cancel_pipeline(), get_pipeline_status(), delete_documents()

Impact:
- Different workspaces can process documents concurrently without blocking
- Backward compatible: empty workspace defaults to "pipeline_status"
- Maintains fail-fast: uninitialized pipeline raises clear error
- Expected N× performance improvement for N concurrent tenants

Bug fixes:
- Fixed AttributeError by using self.workspace instead of self.global_config
- Fixed pipeline status endpoint to show workspace-specific status
- Fixed delete endpoint to check workspace-specific busy flag

Code changes: 4 files, 141 insertions(+), 28 deletions(-)

Testing: All syntax checks passed, comprehensive workspace isolation tests completed
2025-11-17 12:53:44 +08:00
yangdx
4401f86f07 Refactor exception handling in MemgraphStorage label methods 2025-11-14 11:01:26 +08:00
yangdx
1ccef2b932 Fix null reference errors in graph database error handling
- Initialize result vars to None
- Add null checks before consume calls
- Prevent crashes in except blocks
- Apply fix to both Neo4J and Memgraph
2025-11-14 10:39:04 +08:00
yangdx
70cc2419f2 Fix empty dict handling after JSON sanitization
• Replace truthy checks with `is not None`
• Handle empty dict edge case properly
• Prevent data reload failures
• Add comprehensive test coverage
• Fix JsonKVStorage and DocStatusStorage
2025-11-12 16:40:57 +08:00
yangdx
dcf1d28681 Fix migration to reload sanitized data and prevent memory corruption
• Reload cleaned data after sanitization
• Update shared memory with clean data
• Add specific surrogate char tests
• Test migration sanitization flow
• Prevent dirty data in memory
2025-11-12 16:16:28 +08:00
yangdx
777c987371 Optimize JSON write with fast/slow path to reduce memory usage
- Fast path for clean data (no sanitization)
- Slow path sanitizes during encoding
- Reload shared memory after sanitization
- Custom encoder avoids deep copies
- Comprehensive test coverage
2025-11-12 13:48:56 +08:00
yangdx
1a91bcdb5f Improve storage config validation and add config.ini fallback support
• Add MongoDB env requirements
• Support config.ini fallback
• Warn on missing env vars
• Check available storage count
• Show config source info
2025-11-08 22:48:49 +08:00
yangdx
3276b7a49d Fix linting 2025-11-06 20:48:51 +08:00
yangdx
155f59759b Fix node ID normalization and improve batch operation consistency
• Remove premature ID normalization
• Add lookup mapping for node resolution
• Filter results by requested nodes only
• Improve error logging with workspace
2025-11-06 20:34:53 +08:00
yangdx
807d2461d3 Remove unused chunk-based node/edge retrieval methods 2025-11-06 18:17:10 +08:00
yangdx
5f4a280458 Add Qdrant legacy collection migration with workspace support
- Add QdrantMigrationError exception
- Implement automatic data migration
- Support workspace-based partitioning
- Add migration verification logic
- Update collection naming scheme
2025-10-30 19:16:33 +08:00
yangdx
f610fdaf9b Merge branch 'main' into Anush008/main 2025-10-30 11:07:39 +08:00
yangdx
d5bcd14c6f Refactor service deployment to use direct process execution
- Remove bash wrapper script
- Update systemd service configuration
- Improve process management for gunicorn
- Simplify shared storage cleanup logic
- Update documentation for deployment
2025-10-29 18:55:47 +08:00
yangdx
6489aaa7f0 Remove worker_exit hook and improve cleanup logging
• Remove unreliable worker_exit function
• Add debug logs for cleanup modes
• Move DEBUG_LOCKS to top of file
2025-10-29 15:15:13 +08:00
yangdx
72b29659c9 Fix worker process cleanup to prevent shared resource conflicts
• Add worker_exit hook in gunicorn config
• Add shutdown_manager parameter in finalize_share_data of share_storage
• Prevent Manager shutdown in workers
• Remove custom signal handlers
2025-10-29 13:33:21 +08:00
yangdx
0692175c7b Remove enable_logging parameter from get_data_init_lock call in MilvusVectorDBStorage 2025-10-29 09:49:59 +08:00
yangdx
411e92e6b9 Fix vector deletion logging to show actual deleted count 2025-10-27 14:22:16 +08:00
Anush008
8584980e3a
refactor: Qdrant Multi-tenancy (Include staged)
Signed-off-by: Anush008 <anushshetty90@gmail.com>
2025-10-26 09:58:24 +05:30
yangdx
a97e5dad4c Optimize PostgreSQL graph queries to avoid Cypher overhead and complexity
• Replace Cypher with native SQL queries
• Fix O(N²) to O(E) performance issue
• Add error handling for parse failures
• Use direct table access pattern
• Eliminate Cartesian product joins
2025-10-25 14:37:18 +08:00
yangdx
083b163c1f Improve lock logging with consistent messaging and debug levels 2025-10-25 11:04:21 +08:00
yangdx
a9ec15e669 Resolve lock leakage issue during user cancellation handling
• Change default log level to INFO
• Force enable error logging output
• Add lock cleanup rollback protection
• Handle LLM cache persistence errors
• Fix async task exception handling
2025-10-25 03:06:45 +08:00
yangdx
0fa9a2eee3 Fix dimension type comparison in Milvus vector field validation
• Convert dimensions to int for comparison
• Handle string vs int type mismatches
2025-10-22 23:37:49 +08:00
Daniel.y
907204714b
Merge pull request #2237 from yrangana/feat/optimize-postgres-initialization
Optimize PostgreSQL initialization performance
2025-10-21 22:17:46 +08:00
yangdx
e5e16b7bd1 Fix Redis data migration error
• Use proper Redis connection context
• Fix namespace pattern for key scanning
• Propagate storage check exceptions
• Remove defensive error swallowing
2025-10-21 16:27:04 +08:00