LightRAG

Author	SHA1	Message	Date
BukeLy	18a4870229	fix: Add default workspace support for backward compatibility Fixes two compatibility issues in workspace isolation: 1. Problem: lightrag_server.py calls initialize_pipeline_status() without workspace parameter, causing pipeline to initialize in global namespace instead of rag's workspace. Solution: Add set_default_workspace() mechanism in shared_storage. LightRAG.initialize_storages() now sets default workspace, which initialize_pipeline_status() uses when called without parameters. 2. Problem: /health endpoint hardcoded to use "pipeline_status", cannot return workspace-specific status or support frontend workspace selection. Solution: Add LIGHTRAG-WORKSPACE header support. Endpoint now extracts workspace from header or falls back to server default, returning correct workspace-specific pipeline status. Changes: - lightrag/kg/shared_storage.py: Add set/get_default_workspace() - lightrag/lightrag.py: Call set_default_workspace() in initialize_storages() - lightrag/api/lightrag_server.py: Add get_workspace_from_request() helper, update /health endpoint to support LIGHTRAG-WORKSPACE header Testing: - Backward compatibility: Old code works without modification - Multi-instance safety: Explicit workspace passing preserved - /health endpoint: Supports both default and header-specified workspaces Related: #2353	2025-11-17 12:54:20 +08:00
BukeLy	eb52ec94d7	feat: Add workspace isolation support for pipeline status Problem: In multi-tenant scenarios, different workspaces share a single global pipeline_status namespace, causing pipelines from different tenants to block each other, severely impacting concurrent processing performance. Solution: - Extended get_namespace_data() to recognize workspace-specific pipeline namespaces with pattern "{workspace}:pipeline" (following GraphDB pattern) - Added workspace parameter to initialize_pipeline_status() for per-tenant isolated pipeline namespaces - Updated all 7 call sites to use workspace-aware locks: * lightrag.py: process_document_queue(), aremove_document() * document_routes.py: background_delete_documents(), clear_documents(), cancel_pipeline(), get_pipeline_status(), delete_documents() Impact: - Different workspaces can process documents concurrently without blocking - Backward compatible: empty workspace defaults to "pipeline_status" - Maintains fail-fast: uninitialized pipeline raises clear error - Expected N× performance improvement for N concurrent tenants Bug fixes: - Fixed AttributeError by using self.workspace instead of self.global_config - Fixed pipeline status endpoint to show workspace-specific status - Fixed delete endpoint to check workspace-specific busy flag Code changes: 4 files, 141 insertions(+), 28 deletions(-) Testing: All syntax checks passed, comprehensive workspace isolation tests completed	2025-11-17 12:53:44 +08:00
yangdx	1a91bcdb5f	Improve storage config validation and add config.ini fallback support • Add MongoDB env requirements • Support config.ini fallback • Warn on missing env vars • Check available storage count • Show config source info	2025-11-08 22:48:49 +08:00
yangdx	3276b7a49d	Fix linting	2025-11-06 20:48:51 +08:00
yangdx	155f59759b	Fix node ID normalization and improve batch operation consistency • Remove premature ID normalization • Add lookup mapping for node resolution • Filter results by requested nodes only • Improve error logging with workspace	2025-11-06 20:34:53 +08:00
yangdx	807d2461d3	Remove unused chunk-based node/edge retrieval methods	2025-11-06 18:17:10 +08:00
yangdx	5f4a280458	Add Qdrant legacy collection migration with workspace support - Add QdrantMigrationError exception - Implement automatic data migration - Support workspace-based partitioning - Add migration verification logic - Update collection naming scheme	2025-10-30 19:16:33 +08:00
yangdx	f610fdaf9b	Merge branch 'main' into Anush008/main	2025-10-30 11:07:39 +08:00
yangdx	d5bcd14c6f	Refactor service deployment to use direct process execution - Remove bash wrapper script - Update systemd service configuration - Improve process management for gunicorn - Simplify shared storage cleanup logic - Update documentation for deployment	2025-10-29 18:55:47 +08:00
yangdx	6489aaa7f0	Remove worker_exit hook and improve cleanup logging • Remove unreliable worker_exit function • Add debug logs for cleanup modes • Move DEBUG_LOCKS to top of file	2025-10-29 15:15:13 +08:00
yangdx	72b29659c9	Fix worker process cleanup to prevent shared resource conflicts • Add worker_exit hook in gunicorn config • Add shutdown_manager parameter in finalize_share_data of shared_storage • Prevent Manager shutdown in workers • Remove custom signal handlers	2025-10-29 13:33:21 +08:00
yangdx	0692175c7b	Remove enable_logging parameter from get_data_init_lock call in MilvusVectorDBStorage	2025-10-29 09:49:59 +08:00
yangdx	411e92e6b9	Fix vector deletion logging to show actual deleted count	2025-10-27 14:22:16 +08:00
Anush008	8584980e3a	refactor: Qdrant Multi-tenancy (Include staged) Signed-off-by: Anush008 <anushshetty90@gmail.com>	2025-10-26 09:58:24 +05:30
yangdx	a97e5dad4c	Optimize PostgreSQL graph queries to avoid Cypher overhead and complexity • Replace Cypher with native SQL queries • Fix O(N²) to O(E) performance issue • Add error handling for parse failures • Use direct table access pattern • Eliminate Cartesian product joins	2025-10-25 14:37:18 +08:00
yangdx	083b163c1f	Improve lock logging with consistent messaging and debug levels	2025-10-25 11:04:21 +08:00
yangdx	a9ec15e669	Resolve lock leakage issue during user cancellation handling • Change default log level to INFO • Force enable error logging output • Add lock cleanup rollback protection • Handle LLM cache persistence errors • Fix async task exception handling	2025-10-25 03:06:45 +08:00
yangdx	0fa9a2eee3	Fix dimension type comparison in Milvus vector field validation • Convert dimensions to int for comparison • Handle string vs int type mismatches	2025-10-22 23:37:49 +08:00
Daniel.y	907204714b	Merge pull request #2237 from yrangana/feat/optimize-postgres-initialization Optimize PostgreSQL initialization performance	2025-10-21 22:17:46 +08:00
yangdx	e5e16b7bd1	Fix Redis data migration error • Use proper Redis connection context • Fix namespace pattern for key scanning • Propagate storage check exceptions • Remove defensive error swallowing	2025-10-21 16:27:04 +08:00
Yasiru Rangana	2f22336ace	Optimize PostgreSQL initialization performance - Batch index existence checks into single query (16+ queries -> 1 query) - Batch timestamp column checks into single query (8 queries -> 1 query) - Batch field length checks into single query (5 queries -> 1 query) Performance improvement: ~70-80% faster initialization (35s -> 5-10s) Key optimizations: 1. check_tables(): Use ANY($1) to check all indexes at once 2. _migrate_timestamp_columns(): Batch all column type checks 3. _migrate_field_lengths(): Batch all field definition checks All changes are backward compatible with no schema or API changes. Reduces database round-trips by batching information_schema queries.	2025-10-21 01:09:48 +11:00
yangdx	dc62c78f98	Add entity/relation chunk tracking with configurable source ID limits - Add entity_chunks & relation_chunks storage - Implement KEEP/FIFO limit strategies - Update env.example with new settings - Add migration for chunk tracking data - Support all KV storage	2025-10-20 15:24:15 +08:00
yangdx	813f4af9d7	Fix linting	2025-10-18 11:44:48 +08:00
Lucky Verma	917e41aa78	Refactor SQL queries and improve input handling in PGKVStorage and PGDocStatusStorage	2025-10-17 15:40:32 -05:00
yangdx	baab992431	Update pymilvus dependency from 2.5.2 to >=2.6.2	2025-10-11 22:42:02 +08:00
yangdx	e1e4f1b02c	Fix get_by_ids to return None for missing records consistently	2025-10-11 13:34:26 +08:00
yangdx	9be22dd666	Preserve ordering in get_by_ids methods across all storage implementations - Fix result ordering in vector stores - Update KV storage get_by_ids methods - Maintain order in doc status storage - Return None for missing IDs	2025-10-11 12:37:59 +08:00
yangdx	b3ed264707	Refactor PostgreSQL retry config to use centralized configuration • Move retry config to ClientManager • Remove env var parsing from PostgreSQLDB • Add config params to test setup	2025-10-10 03:44:13 +08:00
yangdx	e758204ab2	Add PostgreSQL connection retry mechanism with comprehensive error handling • Implement connection retry with backoff • Add transient error detection • Pool management with timeout guards	2025-10-10 03:06:01 +08:00
yangdx	f1e0110716	Merge branch 'kevinnkansah/main'	2025-10-07 23:04:59 +08:00
yangdx	f2c0b41e78	Make PostgreSQL statement_cache_size configuration optional • Remove forced int conversion • Allow None values for cache size • Add conditional parameter setting	2025-10-07 22:57:21 +08:00
Aleks Vujić	dd8f44e621	Fixed typo in log message when creating new graph file	2025-10-07 14:30:05 +02:00
kevinnkansah	fdcb034da0	chore: distinguish settings	2025-10-06 12:01:40 +02:00
kevinnkansah	22a7b482c5	fix: renamed PostgreSQL options env variable and allowed LRU cache to be an optional env variable	2025-10-06 11:56:09 +02:00
kevinnkansah	d8a9617c0e	fix: asyncpg bouncer connection pool error. Prepared statement caching is disabled by setting `statement_cache_size=0` in the `asyncpg` connection pool parameters. This is necessary to prevent `asyncpg.exceptions.InvalidSQLStatementNameError` when using transaction-level connection poolers like Supabase Supavisor or pgbouncer, which do not support prepared statements.	2025-10-06 00:36:25 +02:00
kevinnkansah	108cdbe133	feat: add options for PostgreSQL connection	2025-10-05 23:29:04 +02:00
yangdx	457d51952e	Add doc_name field to full docs storage - Store file_path in full_docs storage - Update PostgreSQL implementation by mapping file_path to doc_name - Other storage implementations automatically handle the new field	2025-10-05 11:44:27 +08:00
yangdx	f99c4a3738	Fix graph truncation logic for depth-limited traversals • Only set truncated flag for node limit • Keep depth limit info logging • Improve log message clarity • Fix false truncation detection	2025-09-24 18:03:11 +08:00
yangdx	2adb8efdc7	Add duplicate document detection and skip processed files in scanning - Add get_doc_by_file_path to all storages - Skip processed files in scan operation - Check duplicates in upload endpoints - Check duplicates in text insert APIs - Return status info in duplicate responses	2025-09-23 17:30:54 +08:00
yangdx	6b3a341977	Increase default PostgreSQL max connections from 20 to 50	2025-09-22 18:11:28 +08:00
yangdx	040b0c8620	Fix Neo4J index creation to check state instead of analyzer • Check index state not analyzer • Skip if index is ONLINE • Recreate if state not ONLINE • Simplify recreation logic	2025-09-20 23:51:50 +08:00
yangdx	5da1df3b19	Fix linting	2025-09-20 15:30:27 +08:00
yangdx	8e2a1fa59e	Enhance Neo4j fulltext search with Chinese language support • Add CJK analyzer for Chinese text • Auto-detect Chinese characters • Recreate index if needed • Separate Chinese/Latin search logic • Improve fallback for Chinese queries	2025-09-20 15:19:22 +08:00
yangdx	9330ccb14e	Fix graph truncation logging to correctly identify truncation cause	2025-09-20 13:33:19 +08:00
yangdx	1dd164a122	Fix graph truncation detection for depth-limited BFS - Track unexplored neighbors at max depth - Improve truncation flag accuracy	2025-09-20 13:12:25 +08:00
yangdx	3296bcb553	Add high-performance label search methods to PostgreSQL graph storage - Add get_popular_labels() method - Add search_labels() with fuzzy matching - Use native SQL for better performance - Include proper scoring and ranking	2025-09-20 12:39:53 +08:00
yangdx	6f85bd6b19	Add workspace-aware MongoDB indexing and Atlas Search support • Add workspace attribute to storage classes • Use workspace-specific index names • Implement Atlas Search with fallbacks • Add entity search and popular labels • Improve index migration strategy	2025-09-20 12:38:41 +08:00
yangdx	223397a247	Add label search and popularity methods to MemgraphStorage • Get popular labels by node degree • Search labels with fuzzy matching • Sort by relevance and connection count	2025-09-20 12:38:04 +08:00
yangdx	e14cee69a3	Fix Neo4j typo and add fulltext search with performance optimizations - Fix NEO4J_DATABASE typo in env.example - Add fulltext index for entity searches - Implement get_popular_labels method - Add search_labels with fuzzy matching - Simplify B-Tree index creation logic	2025-09-20 12:37:13 +08:00
yangdx	9db8f2fce5	feat: Add popular labels and search APIs with history management - Add popular/search label endpoints - Implement SearchHistoryManager utility - Replace client-side with server search - Add graph data version tracking - Update UI for better label discovery	2025-09-20 02:03:47 +08:00
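
Commits 9be22dd666 and e1e4f1b02c describe get_by_ids returning results in the order the IDs were requested, with None for any ID that has no record. A minimal sketch of that contract follows, assuming a plain in-memory dict as the backing store; the function below is illustrative and is not the project's implementation.

```python
from typing import Any, Optional


def get_by_ids(
    store: dict[str, dict[str, Any]], ids: list[str]
) -> list[Optional[dict[str, Any]]]:
    """Return one entry per requested ID, in request order.

    Missing IDs yield None instead of being dropped, so the result
    always has the same length as `ids` and can be zipped with it.
    """
    return [store.get(doc_id) for doc_id in ids]


# Hypothetical data for illustration only.
store = {"a": {"content": "alpha"}, "c": {"content": "gamma"}}
result = get_by_ids(store, ["c", "b", "a"])
```

Keeping the result aligned with the request lets callers pair each ID with its record by position, which is not possible when missing records are silently skipped.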

871 commits
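
The workspace isolation commits (eb52ec94d7 and 18a4870229) describe a naming rule for pipeline status namespaces: "{workspace}:pipeline" for a named workspace, "pipeline_status" when the workspace is empty, and a default workspace that is used when none is passed. The sketch below shows that resolution logic only. The set_default_workspace and get_default_workspace names come from the commit message, but their bodies here are assumptions, and pipeline_namespace is a hypothetical helper that does not appear in the log.

```python
from typing import Optional

# Assumed module-level state standing in for whatever shared_storage
# records when set_default_workspace() is called.
_default_workspace: Optional[str] = None


def set_default_workspace(workspace: str) -> None:
    """Record the workspace to use when callers do not pass one."""
    global _default_workspace
    _default_workspace = workspace


def get_default_workspace() -> Optional[str]:
    """Return the recorded default workspace, or None if unset."""
    return _default_workspace


def pipeline_namespace(workspace: Optional[str] = None) -> str:
    """Hypothetical helper: resolve the pipeline status namespace.

    - workspace omitted: fall back to the default workspace
    - empty workspace: legacy global name "pipeline_status"
    - otherwise: "{workspace}:pipeline"
    """
    if workspace is None:
        workspace = get_default_workspace()
    if not workspace:
        return "pipeline_status"
    return f"{workspace}:pipeline"
```

Under these assumptions, code that never sets a workspace keeps resolving to "pipeline_status", which matches the backward compatibility described in the commit messages, while an explicitly passed workspace always takes precedence over the default.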