LightRAG

Author	SHA1	Message	Date
chengjie	9e3c64df03	fix: critical bugs in workspace lock multiprocess synchronization Bug 1a - RuntimeError when _registry_guard is None: - Added explicit check for _registry_guard initialization - Now raises clear RuntimeError instead of cryptic TypeError - Helps users understand they need to call initialize_share_data() first Bug 1b - Workspace async_locks not visible across processes: - Created new _workspace_async_locks dict for per-process storage - Fixed issue where async_locks modifications in one process were invisible to others - This is correct design since asyncio.Lock objects cannot be pickled/shared Why per-process async_locks: - asyncio.Lock objects cannot be shared across processes - Each process needs its own asyncio.Lock instances for coroutine sync - Cross-process sync is handled by Manager.RLock() in _sync_locks - Within-process async sync is handled by per-process asyncio.Lock Testing: - All 17 existing workspace lock tests pass - Added 3 new tests specifically for bug verification - Total 20 tests passing Impact: - Fixes potential race conditions in multiprocess scenarios - Ensures proper synchronization both across and within processes - Maintains backward compatibility	2025-11-11 00:15:06 +08:00
chengjie	27de78113d	style: apply code formatting to pass pre-commit checks - Split long function calls across multiple lines - Split long function definitions across multiple lines - Add blank line after docstring in test function These changes are purely formatting to comply with the project's linting standards (black/ruff). No functional changes.	2025-11-11 00:10:54 +08:00
chengjie	5d31412bd7	feat: add workspace isolation support to unified lock functions Why this change is needed: The current locking system uses global locks shared across all users and workspaces, causing blocking issues in multi-tenant scenarios. When one tenant performs document indexing, all other tenants are blocked waiting for the same global lock. This severely limits the system's ability to serve multiple users concurrently. How it solves it: - Add optional `workspace` parameter to 5 lock functions - Implement lazy creation of workspace-specific locks with proper synchronization - Store workspace locks in new `_sync_locks` dictionary - Support both multi-process (RLock) and single-process (asyncio.Lock) modes - Empty workspace parameter uses global lock for backward compatibility - Extract common logic into `_get_workspace_lock()` to eliminate duplication Impact: - Enables concurrent operations across different workspaces - Foundation for PR2 (pipeline status isolation) - Zero impact on existing code (all parameters optional with defaults) - Each workspace now has independent lock instances - Thread-safe lazy creation using _registry_guard in multiprocess mode - Automatic creation of async_locks for workspace locks in multiprocess mode Code Quality Improvements (Linus review feedback): - Fixed race condition: lazy creation protected by _registry_guard - Eliminated code duplication: common logic extracted to _get_workspace_lock() - Added async_lock support: workspace locks now have companion async_locks - Handles None workspace parameter gracefully - Clear separation of concerns: one function handles all workspace logic Testing: - 17 new test cases covering: - Basic functionality and naming - Workspace isolation and independence - Backward compatibility with empty workspace - Concurrent operations (3 workspaces in parallel) - Performance (1000 workspace lock creation <2s) - Edge cases (special characters, unicode, long names) - All existing tests pass (21/21 excluding env issues) - Verified lock serialization within workspace - Verified lock independence across workspaces Files modified: - lightrag/kg/shared_storage.py: refactored lock functions + synchronization - tests/test_workspace_locks.py: comprehensive test suite	2025-11-10 22:51:49 +08:00
yangdx	1a91bcdb5f	Improve storage config validation and add config.ini fallback support • Add MongoDB env requirements • Support config.ini fallback • Warn on missing env vars • Check available storage count • Show config source info	2025-11-08 22:48:49 +08:00
yangdx	3276b7a49d	Fix linting	2025-11-06 20:48:51 +08:00
yangdx	155f59759b	Fix node ID normalization and improve batch operation consistency • Remove premature ID normalization • Add lookup mapping for node resolution • Filter results by requested nodes only • Improve error logging with workspace	2025-11-06 20:34:53 +08:00
yangdx	807d2461d3	Remove unused chunk-based node/edge retrieval methods	2025-11-06 18:17:10 +08:00
yangdx	5f4a280458	Add Qdrant legacy collection migration with workspace support - Add QdrantMigrationError exception - Implement automatic data migration - Support workspace-based partitioning - Add migration verification logic - Update collection naming scheme	2025-10-30 19:16:33 +08:00
yangdx	f610fdaf9b	Merge branch 'main' into Anush008/main	2025-10-30 11:07:39 +08:00
yangdx	d5bcd14c6f	Refactor service deployment to use direct process execution - Remove bash wrapper script - Update systemd service configuration - Improve process management for gunicorn - Simplify shared storage cleanup logic - Update documentation for deployment	2025-10-29 18:55:47 +08:00
yangdx	6489aaa7f0	Remove worker_exit hook and improve cleanup logging • Remove unreliable worker_exit function • Add debug logs for cleanup modes • Move DEBUG_LOCKS to top of file	2025-10-29 15:15:13 +08:00
yangdx	72b29659c9	Fix worker process cleanup to prevent shared resource conflicts • Add worker_exit hook in gunicorn config • Add shutdown_manager parameter to finalize_share_data in shared_storage • Prevent Manager shutdown in workers • Remove custom signal handlers	2025-10-29 13:33:21 +08:00
yangdx	0692175c7b	Remove enable_logging parameter from get_data_init_lock call in MilvusVectorDBStorage	2025-10-29 09:49:59 +08:00
yangdx	411e92e6b9	Fix vector deletion logging to show actual deleted count	2025-10-27 14:22:16 +08:00
Anush008	8584980e3a	refactor: Qdrant Multi-tenancy (Include staged) Signed-off-by: Anush008 <anushshetty90@gmail.com>	2025-10-26 09:58:24 +05:30
yangdx	a97e5dad4c	Optimize PostgreSQL graph queries to avoid Cypher overhead and complexity • Replace Cypher with native SQL queries • Fix O(N²) to O(E) performance issue • Add error handling for parse failures • Use direct table access pattern • Eliminate Cartesian product joins	2025-10-25 14:37:18 +08:00
yangdx	083b163c1f	Improve lock logging with consistent messaging and debug levels	2025-10-25 11:04:21 +08:00
yangdx	a9ec15e669	Resolve lock leakage issue during user cancellation handling • Change default log level to INFO • Force enable error logging output • Add lock cleanup rollback protection • Handle LLM cache persistence errors • Fix async task exception handling	2025-10-25 03:06:45 +08:00
yangdx	0fa9a2eee3	Fix dimension type comparison in Milvus vector field validation • Convert dimensions to int for comparison • Handle string vs int type mismatches	2025-10-22 23:37:49 +08:00
Daniel.y	907204714b	Merge pull request #2237 from yrangana/feat/optimize-postgres-initialization Optimize PostgreSQL initialization performance	2025-10-21 22:17:46 +08:00
yangdx	e5e16b7bd1	Fix Redis data migration error • Use proper Redis connection context • Fix namespace pattern for key scanning • Propagate storage check exceptions • Remove defensive error swallowing	2025-10-21 16:27:04 +08:00
Yasiru Rangana	2f22336ace	Optimize PostgreSQL initialization performance - Batch index existence checks into single query (16+ queries -> 1 query) - Batch timestamp column checks into single query (8 queries -> 1 query) - Batch field length checks into single query (5 queries -> 1 query) Performance improvement: ~70-80% faster initialization (35s -> 5-10s) Key optimizations: 1. check_tables(): Use ANY($1) to check all indexes at once 2. _migrate_timestamp_columns(): Batch all column type checks 3. _migrate_field_lengths(): Batch all field definition checks All changes are backward compatible with no schema or API changes. Reduces database round-trips by batching information_schema queries.	2025-10-21 01:09:48 +11:00
yangdx	dc62c78f98	Add entity/relation chunk tracking with configurable source ID limits - Add entity_chunks & relation_chunks storage - Implement KEEP/FIFO limit strategies - Update env.example with new settings - Add migration for chunk tracking data - Support all KV storage	2025-10-20 15:24:15 +08:00
yangdx	813f4af9d7	Fix linting	2025-10-18 11:44:48 +08:00
Lucky Verma	917e41aa78	Refactor SQL queries and improve input handling in PGKVStorage and PGDocStatusStorage	2025-10-17 15:40:32 -05:00
yangdx	baab992431	Update pymilvus dependency from 2.5.2 to >=2.6.2	2025-10-11 22:42:02 +08:00
yangdx	e1e4f1b02c	Fix get_by_ids to return None for missing records consistently	2025-10-11 13:34:26 +08:00
yangdx	9be22dd666	Preserve ordering in get_by_ids methods across all storage implementations - Fix result ordering in vector stores - Update KV storage get_by_ids methods - Maintain order in doc status storage - Return None for missing IDs	2025-10-11 12:37:59 +08:00
yangdx	b3ed264707	Refactor PostgreSQL retry config to use centralized configuration • Move retry config to ClientManager • Remove env var parsing from PostgreSQLDB • Add config params to test setup	2025-10-10 03:44:13 +08:00
yangdx	e758204ab2	Add PostgreSQL connection retry mechanism with comprehensive error handling • Implement connection retry with backoff • Add transient error detection • Pool management with timeout guards	2025-10-10 03:06:01 +08:00
yangdx	f1e0110716	Merge branch 'kevinnkansah/main'	2025-10-07 23:04:59 +08:00
yangdx	f2c0b41e78	Make PostgreSQL statement_cache_size configuration optional • Remove forced int conversion • Allow None values for cache size • Add conditional parameter setting	2025-10-07 22:57:21 +08:00
Aleks Vujić	dd8f44e621	Fixed typo in log message when creating new graph file	2025-10-07 14:30:05 +02:00
kevinnkansah	fdcb034da0	chore: distinguish settings	2025-10-06 12:01:40 +02:00
kevinnkansah	22a7b482c5	fix: renamed PostgreSQL options env variable and allowed LRU cache to be an optional env variable	2025-10-06 11:56:09 +02:00
kevinnkansah	d8a9617c0e	fix: asyncpg bouncer connection pool error. Prepared statement caching is disabled by setting `statement_cache_size=0` in the `asyncpg` connection pool parameters. This is necessary to prevent `asyncpg.exceptions.InvalidSQLStatementNameError` when using transaction-level connection poolers like Supabase Supavisor or pgbouncer, which do not support prepared statements.	2025-10-06 00:36:25 +02:00
kevinnkansah	108cdbe133	feat: add options for PostgreSQL connection	2025-10-05 23:29:04 +02:00
yangdx	457d51952e	Add doc_name field to full docs storage - Store file_path in full_docs storage - Update PostgreSQL implementation by mapping file_path to doc_name - Other storage implementations automatically handle the new field	2025-10-05 11:44:27 +08:00
yangdx	f99c4a3738	Fix graph truncation logic for depth-limited traversals • Only set truncated flag for node limit • Keep depth limit info logging • Improve log message clarity • Fix false truncation detection	2025-09-24 18:03:11 +08:00
yangdx	2adb8efdc7	Add duplicate document detection and skip processed files in scanning - Add get_doc_by_file_path to all storages - Skip processed files in scan operation - Check duplicates in upload endpoints - Check duplicates in text insert APIs - Return status info in duplicate responses	2025-09-23 17:30:54 +08:00
yangdx	6b3a341977	Increase default PostgreSQL max connections from 20 to 50	2025-09-22 18:11:28 +08:00
yangdx	040b0c8620	Fix Neo4j index creation to check state instead of analyzer • Check index state not analyzer • Skip if index is ONLINE • Recreate if state not ONLINE • Simplify recreation logic	2025-09-20 23:51:50 +08:00
yangdx	5da1df3b19	Fix linting	2025-09-20 15:30:27 +08:00
yangdx	8e2a1fa59e	Enhance Neo4j fulltext search with Chinese language support • Add CJK analyzer for Chinese text • Auto-detect Chinese characters • Recreate index if needed • Separate Chinese/Latin search logic • Improve fallback for Chinese queries	2025-09-20 15:19:22 +08:00
yangdx	9330ccb14e	Fix graph truncation logging to correctly identify truncation cause	2025-09-20 13:33:19 +08:00
yangdx	1dd164a122	Fix graph truncation detection for depth-limited BFS - Track unexplored neighbors at max depth - Improve truncation flag accuracy	2025-09-20 13:12:25 +08:00
yangdx	3296bcb553	Add high-performance label search methods to PostgreSQL graph storage - Add get_popular_labels() method - Add search_labels() with fuzzy matching - Use native SQL for better performance - Include proper scoring and ranking	2025-09-20 12:39:53 +08:00
yangdx	6f85bd6b19	Add workspace-aware MongoDB indexing and Atlas Search support • Add workspace attribute to storage classes • Use workspace-specific index names • Implement Atlas Search with fallbacks • Add entity search and popular labels • Improve index migration strategy	2025-09-20 12:38:41 +08:00
yangdx	223397a247	Add label search and popularity methods to MemgraphStorage • Get popular labels by node degree • Search labels with fuzzy matching • Sort by relevance and connection count	2025-09-20 12:38:04 +08:00
yangdx	e14cee69a3	Fix Neo4j typo and add fulltext search with performance optimizations - Fix NEO4J_DATABASE typo in env.example - Add fulltext index for entity searches - Implement get_popular_labels method - Add search_labels with fuzzy matching - Simplify B-Tree index creation logic	2025-09-20 12:37:13 +08:00
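
The workspace lock commits in this log (5d31412bd7 and 9e3c64df03) describe lazily created, per-workspace locks, with asyncio.Lock instances kept per process because they cannot be pickled or shared. A minimal single-process sketch of that idea follows. It is not the code in lightrag/kg/shared_storage.py: the names `_guard`, `_async_locks`, and `get_workspace_lock`, and the `workspace:name` key format, are assumptions made for illustration, and the cross-process Manager.RLock() layer is left out.

```python
import asyncio
import threading
from typing import Dict, Optional

# Hypothetical stand-ins for illustration; not the actual LightRAG internals.
_guard = threading.Lock()  # protects lazy creation of registry entries
_async_locks: Dict[str, asyncio.Lock] = {}  # per-process registry of asyncio locks


def get_workspace_lock(name: str, workspace: Optional[str] = "") -> asyncio.Lock:
    """Return the lock for `name`, scoped to `workspace` when one is given."""
    # Assumed key format; an empty or None workspace maps to the global lock.
    key = f"{workspace}:{name}" if workspace else name
    with _guard:
        if key not in _async_locks:
            _async_locks[key] = asyncio.Lock()
        return _async_locks[key]


async def main() -> None:
    lock_a = get_workspace_lock("storage", "tenant_a")
    lock_b = get_workspace_lock("storage", "tenant_b")
    async with lock_a:
        # Holding tenant_a's lock leaves tenant_b's lock free.
        print(lock_a.locked(), lock_b.locked())  # prints: True False


asyncio.run(main())
```

Because the registry is keyed by workspace, work in one workspace does not wait on another, while callers that pass no workspace keep sharing a single lock.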

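Commits 9be22dd666 and e1e4f1b02c state that get_by_ids should preserve the order of the requested IDs and return None for missing records. A minimal sketch of that contract, assuming the backend hands back a list of dicts with an `id` field; the helper name `align_to_requested_ids` is hypothetical and does not come from the repository.

```python
from typing import Any, Dict, List, Optional


def align_to_requested_ids(
    ids: List[str], rows: List[Dict[str, Any]], id_field: str = "id"
) -> List[Optional[Dict[str, Any]]]:
    """Reorder backend rows to match `ids`, using None for IDs with no row."""
    by_id = {row[id_field]: row for row in rows}
    return [by_id.get(requested) for requested in ids]


# A backend may return rows in any order and omit missing IDs.
rows = [{"id": "c", "text": "third"}, {"id": "a", "text": "first"}]
print(align_to_requested_ids(["a", "b", "c"], rows))
# [{'id': 'a', 'text': 'first'}, None, {'id': 'c', 'text': 'third'}]
```

The result always has the same length as the request, so callers can pair each ID with its record by position.
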

872 commits