Commit graph

870 commits

Author SHA1 Message Date
chengjie
5d31412bd7 feat: add workspace isolation support to unified lock functions
Why this change is needed:
The current locking system uses global locks shared across all users
and workspaces, causing blocking issues in multi-tenant scenarios.
When one tenant performs document indexing, all other tenants are
blocked waiting for the same global lock. This severely limits
the system's ability to serve multiple users concurrently.

How it solves it:
- Add optional `workspace` parameter to 5 lock functions
- Implement lazy creation of workspace-specific locks with proper synchronization
- Store workspace locks in new `_sync_locks` dictionary
- Support both multi-process (RLock) and single-process (asyncio.Lock) modes
- Empty workspace parameter uses global lock for backward compatibility
- Extract common logic into `_get_workspace_lock()` to eliminate duplication

Impact:
- Enables concurrent operations across different workspaces
- Foundation for PR2 (pipeline status isolation)
- Zero impact on existing code (all parameters optional with defaults)
- Each workspace now has independent lock instances
- Thread-safe lazy creation using _registry_guard in multiprocess mode
- Automatic creation of async_locks for workspace locks in multiprocess mode

Code Quality Improvements (Linus review feedback):
- Fixed race condition: lazy creation protected by _registry_guard
- Eliminated code duplication: common logic extracted to _get_workspace_lock()
- Added async_lock support: workspace locks now have companion async_locks
- Handles None workspace parameter gracefully
- Clear separation of concerns: one function handles all workspace logic

Testing:
- 17 new test cases covering:
  - Basic functionality and naming
  - Workspace isolation and independence
  - Backward compatibility with empty workspace
  - Concurrent operations (3 workspaces in parallel)
  - Performance (1000 workspace lock creation <2s)
  - Edge cases (special characters, unicode, long names)
- All existing tests pass (21/21 excluding env issues)
- Verified lock serialization within workspace
- Verified lock independence across workspaces

Files modified:
- lightrag/kg/shared_storage.py: refactored lock functions + synchronization
- tests/test_workspace_locks.py: comprehensive test suite
2025-11-10 22:51:49 +08:00
yangdx
1a91bcdb5f Improve storage config validation and add config.ini fallback support
• Add MongoDB env requirements
• Support config.ini fallback
• Warn on missing env vars
• Check available storage count
• Show config source info
2025-11-08 22:48:49 +08:00
yangdx
3276b7a49d Fix linting 2025-11-06 20:48:51 +08:00
yangdx
155f59759b Fix node ID normalization and improve batch operation consistency
• Remove premature ID normalization
• Add lookup mapping for node resolution
• Filter results by requested nodes only
• Improve error logging with workspace
2025-11-06 20:34:53 +08:00
yangdx
807d2461d3 Remove unused chunk-based node/edge retrieval methods 2025-11-06 18:17:10 +08:00
yangdx
5f4a280458 Add Qdrant legacy collection migration with workspace support
- Add QdrantMigrationError exception
- Implement automatic data migration
- Support workspace-based partitioning
- Add migration verification logic
- Update collection naming scheme
2025-10-30 19:16:33 +08:00
yangdx
f610fdaf9b Merge branch 'main' into Anush008/main 2025-10-30 11:07:39 +08:00
yangdx
d5bcd14c6f Refactor service deployment to use direct process execution
- Remove bash wrapper script
- Update systemd service configuration
- Improve process management for gunicorn
- Simplify shared storage cleanup logic
- Update documentation for deployment
2025-10-29 18:55:47 +08:00
yangdx
6489aaa7f0 Remove worker_exit hook and improve cleanup logging
• Remove unreliable worker_exit function
• Add debug logs for cleanup modes
• Move DEBUG_LOCKS to top of file
2025-10-29 15:15:13 +08:00
yangdx
72b29659c9 Fix worker process cleanup to prevent shared resource conflicts
• Add worker_exit hook in gunicorn config
• Add shutdown_manager parameter in finalize_share_data of share_storage
• Prevent Manager shutdown in workers
• Remove custom signal handlers
2025-10-29 13:33:21 +08:00
yangdx
0692175c7b Remove enable_logging parameter from get_data_init_lock call in MilvusVectorDBStorage 2025-10-29 09:49:59 +08:00
yangdx
411e92e6b9 Fix vector deletion logging to show actual deleted count 2025-10-27 14:22:16 +08:00
Anush008
8584980e3a
refactor: Qdrant Multi-tenancy (Include staged)
Signed-off-by: Anush008 <anushshetty90@gmail.com>
2025-10-26 09:58:24 +05:30
yangdx
a97e5dad4c Optimize PostgreSQL graph queries to avoid Cypher overhead and complexity
• Replace Cypher with native SQL queries
• Fix O(N²) to O(E) performance issue
• Add error handling for parse failures
• Use direct table access pattern
• Eliminate Cartesian product joins
2025-10-25 14:37:18 +08:00
yangdx
083b163c1f Improve lock logging with consistent messaging and debug levels 2025-10-25 11:04:21 +08:00
yangdx
a9ec15e669 Resolve lock leakage issue during user cancellation handling
• Change default log level to INFO
• Force enable error logging output
• Add lock cleanup rollback protection
• Handle LLM cache persistence errors
• Fix async task exception handling
2025-10-25 03:06:45 +08:00
yangdx
0fa9a2eee3 Fix dimension type comparison in Milvus vector field validation
• Convert dimensions to int for comparison
• Handle string vs int type mismatches
2025-10-22 23:37:49 +08:00
Daniel.y
907204714b
Merge pull request #2237 from yrangana/feat/optimize-postgres-initialization
Optimize PostgreSQL initialization performance
2025-10-21 22:17:46 +08:00
yangdx
e5e16b7bd1 Fix Redis data migration error
• Use proper Redis connection context
• Fix namespace pattern for key scanning
• Propagate storage check exceptions
• Remove defensive error swallowing
2025-10-21 16:27:04 +08:00
Yasiru Rangana
2f22336ace Optimize PostgreSQL initialization performance
- Batch index existence checks into single query (16+ queries -> 1 query)
- Batch timestamp column checks into single query (8 queries -> 1 query)
- Batch field length checks into single query (5 queries -> 1 query)

Performance improvement: ~70-80% faster initialization (35s -> 5-10s)

Key optimizations:
1. check_tables(): Use ANY($1) to check all indexes at once
2. _migrate_timestamp_columns(): Batch all column type checks
3. _migrate_field_lengths(): Batch all field definition checks

All changes are backward compatible with no schema or API changes.
Reduces database round-trips by batching information_schema queries.
2025-10-21 01:09:48 +11:00
yangdx
dc62c78f98 Add entity/relation chunk tracking with configurable source ID limits
- Add entity_chunks & relation_chunks storage
- Implement KEEP/FIFO limit strategies
- Update env.example with new settings
- Add migration for chunk tracking data
- Support all KV storage
2025-10-20 15:24:15 +08:00
yangdx
813f4af9d7 Fix linting 2025-10-18 11:44:48 +08:00
Lucky Verma
917e41aa78 Refactor SQL queries and improve input handling in PGKVStorage and PGDocStatusStorage 2025-10-17 15:40:32 -05:00
yangdx
baab992431 Update pymilvus dependency from 2.5.2 to >=2.6.2 2025-10-11 22:42:02 +08:00
yangdx
e1e4f1b02c Fix get_by_ids to return None for missing records consistently 2025-10-11 13:34:26 +08:00
yangdx
9be22dd666 Preserve ordering in get_by_ids methods across all storage implementations
- Fix result ordering in vector stores
- Update KV storage get_by_ids methods
- Maintain order in doc status storage
- Return None for missing IDs
2025-10-11 12:37:59 +08:00
yangdx
b3ed264707 Refactor PostgreSQL retry config to use centralized configuration
• Move retry config to ClientManager
• Remove env var parsing from PostgreSQLDB
• Add config params to test setup
2025-10-10 03:44:13 +08:00
yangdx
e758204ab2 Add PostgreSQL connection retry mechanism with comprehensive error handling
• Implement connection retry with backoff
• Add transient error detection
• Pool management with timeout guards
2025-10-10 03:06:01 +08:00
yangdx
f1e0110716 Merge branch 'kevinnkansah/main' 2025-10-07 23:04:59 +08:00
yangdx
f2c0b41e78 Make PostgreSQL statement_cache_size configuration optional
• Remove forced int conversion
• Allow None values for cache size
• Add conditional parameter setting
2025-10-07 22:57:21 +08:00
Aleks Vujić
dd8f44e621
Fixed typo in log message when creating new graph file 2025-10-07 14:30:05 +02:00
kevinnkansah
fdcb034da0 chore: distinguish settings 2025-10-06 12:01:40 +02:00
kevinnkansah
22a7b482c5 fix: renamed PostGreSQL options env variable and allowed LRU cache to be an optional env variable 2025-10-06 11:56:09 +02:00
kevinnkansah
d8a9617c0e fix: fix: asyncpg bouncer connection pool error
Prepared statement caching is disabled by setting
`statement_cache_size=0` in the `asyncpg` connection pool parameters.
This is necessary to prevent
`asyncpg.exceptions.InvalidSQLStatementNameError` when using
transaction-level connection poolers like Supabase Supavisor or
pgbouncer, which do not support prepared statements.
2025-10-06 00:36:25 +02:00
kevinnkansah
108cdbe133 feat: add options for PostGres connection 2025-10-05 23:29:04 +02:00
yangdx
457d51952e Add doc_name field to full docs storage
- Store file_path in full_docs storage
- Update PostgreSQL implementation by map file_path to doc_name
- Other storage implementation automatically handles the new field
2025-10-05 11:44:27 +08:00
yangdx
f99c4a3738 Fix graph truncation logic for depth-limited traversals
• Only set truncated flag for node limit
• Keep depth limit info logging
• Improve log message clarity
• Fix false truncation detection
2025-09-24 18:03:11 +08:00
yangdx
2adb8efdc7 Add duplicate document detection and skip processed files in scanning
- Add get_doc_by_file_path to all storages
- Skip processed files in scan operation
- Check duplicates in upload endpoints
- Check duplicates in text insert APIs
- Return status info in duplicate responses
2025-09-23 17:30:54 +08:00
yangdx
6b3a341977 Increase default PostgreSQL max connections from 20 to 50 2025-09-22 18:11:28 +08:00
yangdx
040b0c8620 Fix Neo4J index creation to check state instead of analyzer
• Check index state not analyzer
• Skip if index is ONLINE
• Recreate if state not ONLINE
• Simplify recreation logic
2025-09-20 23:51:50 +08:00
yangdx
5da1df3b19 Fix linting 2025-09-20 15:30:27 +08:00
yangdx
8e2a1fa59e Enhance Neo4j fulltext search with Chinese language support
• Add CJK analyzer for Chinese text
• Auto-detect Chinese characters
• Recreate index if needed
• Separate Chinese/Latin search logic
• Improve fallback for Chinese queries
2025-09-20 15:19:22 +08:00
yangdx
9330ccb14e Fix graph truncation logging to correctly identify truncation cause 2025-09-20 13:33:19 +08:00
yangdx
1dd164a122 Fix graph truncation detection for depth-limited BFS
- Track unexplored neighbors at max depth
- Improve truncation flag accuracy
2025-09-20 13:12:25 +08:00
yangdx
3296bcb553 Add high-performance label search methods to PostgreSQL graph storage
- Add get_popular_labels() method
- Add search_labels() with fuzzy matching
- Use native SQL for better performance
- Include proper scoring and ranking
2025-09-20 12:39:53 +08:00
yangdx
6f85bd6b19 Add workspace-aware MongoDB indexing and Atlas Search support
• Add workspace attribute to storage classes
• Use workspace-specific index names
• Implement Atlas Search with fallbacks
• Add entity search and popular labels
• Improve index migration strategy
2025-09-20 12:38:41 +08:00
yangdx
223397a247 Add label search and popularity methods to MemgraphStorage
• Get popular labels by node degree
• Search labels with fuzzy matching
• Sort by relevance and connection count
2025-09-20 12:38:04 +08:00
yangdx
e14cee69a3 Fix Neo4j typo and add fulltext search with performance optimizations
- Fix NEO4J_DATABASE typo in env.example
- Add fulltext index for entity searches
- Implement get_popular_labels method
- Add search_labels with fuzzy matching
- Simplify B-Tree index creation logic
2025-09-20 12:37:13 +08:00
yangdx
9db8f2fce5 feat: Add popular labels and search APIs with history management
- Add popular/search label endpoints
- Implement SearchHistoryManager utility
- Replace client-side with server search
- Add graph data version tracking
- Update UI for better label discovery
2025-09-20 02:03:47 +08:00
yangdx
43f6fcea6c Fix linting 2025-09-12 17:00:53 +08:00