Commit graph

790 commits

Author SHA1 Message Date
yangdx
2a781dfb91 Update Neo4j database naming in env.example 2025-08-15 19:14:38 +08:00
yangdx
3a227e37b8 Add get_vectors_by_ids method to MongoVectorDBStorage 2025-08-15 16:53:14 +08:00
yangdx
7a7385a200 Add efficient vector retrieval by IDs to PGVectorStorage 2025-08-15 16:51:41 +08:00
yangdx
8f7031b882 Add get_vectors_by_ids method to QdrantVectorDBStorage 2025-08-15 16:46:52 +08:00
yangdx
a71499a180 Add get_vectors_by_ids method to MilvusVectorDBStorage 2025-08-15 16:36:50 +08:00
yangdx
1e2d5252d7 Add get_vectors_by_ids method and filter out vector data from query results 2025-08-15 16:32:26 +08:00
yangdx
f85e2aa4bf Merge branch 'main' into pick-trunk-by-vector 2025-08-14 03:54:26 +08:00
yangdx
0b22ffb252 Refac: uniformly protected with the get_data_init_lock for all storage initializations 2025-08-14 03:46:19 +08:00
yangdx
f1dafa0d01 feat: KG related chunks selection by vector similarity
- Add env switch to toggle weighted polling vs vector-similarity strategy
- Implement similarity-based sorting with fallback to weighted
- Introduce batch vector read API for vector storage
- Implement vector store and retrive funtion for Nanovector DB
- Preserve default behavior (weighted polling selection method)
2025-08-13 18:16:42 +08:00
yangdx
578bdaa410 Pin pymilvus version to 2.5.2 to avoid Protobuf version warning 2025-08-12 18:22:00 +08:00
yangdx
5d1bc8b49d Relocate client creation to the initialize method to prevent race conditions in multi-process mode. 2025-08-12 18:20:56 +08:00
yangdx
74783d7781 Remove redundant debug logging for Qdrant operations 2025-08-12 17:29:05 +08:00
yangdx
41f8ef05b9 Restore thread safety to MongoDB client manager
- Protected client creation with lock
- Protected client release with lock
2025-08-12 16:42:53 +08:00
yangdx
0b2c3d06c7 - Remove redundant collection listing check 2025-08-12 15:24:06 +08:00
yangdx
fc8ca1a706 Fix: add muti-process lock for initialize and drop method for all storage 2025-08-12 04:25:09 +08:00
yangdx
ca00b9c8ee Fix: Resolve workspace isolation problem for PostgreSQL with multiple LightRAG instances 2025-08-12 01:27:05 +08:00
yangdx
d9c1f935f5 Fix: Resolve workspace isolation issues in in-memory database with multiple LightRAG instances 2025-08-12 01:26:09 +08:00
yangdx
095e0cbfa2 Refac: Add workspace infomation to all logger output for all storage type 2025-08-12 01:19:09 +08:00
yangdx
16c9a81f4c feat: support config.ini for PostgreSQL vector index settings
- Add support for reading vector_index_type, hnsw_m, hnsw_ef, and ivfflat_lists from config.ini
- Maintain backward compatibility with environment variables
- Update config.ini.example with new PostgreSQL vector index options
- Follow existing configuration priority: env vars > config.ini > defaults
2025-08-08 02:55:49 +08:00
yangdx
dec4148075 Merge branch 'main' into Matt23-star/main 2025-08-08 02:24:34 +08:00
yangdx
f38e10559e Update PostgreSQL vector index configuration
- Remove FLAT index support
- Standardize on HNSW as default
- Add dimension validation
- Improve error logging
- Clean up index creation code
2025-08-08 02:21:06 +08:00
yangdx
f4ef254de2 fix(neo4j): enhance connection lifecycle management to prevent timeout errors
- Add max_connection_lifetime, liveness_check_timeout, keep_alive parameters
- Extend retry mechanisms for connection reset scenarios
- Update config examples with new Neo4j connection options
- Resolves ClientTimeoutException during data insertion operations
2025-08-08 01:07:45 +08:00
Matt23-star
727ca43d3c feat: add vector index creation functionality for PostgreSQL 2025-08-07 23:07:18 +08:00
yangdx
a04c11a598 Remove deprecated storage 2025-08-06 00:02:50 +08:00
yangdx
cc1f7118e7 Remove deprecated cache_by_modes functionality from all storage 2025-08-05 23:20:26 +08:00
yangdx
8294d6d1b7 Remove deprecated mode field from LLM cache schema
- Drop mode column from LLM cache table
- Update primary key to exclude mode
- Remove mode from all SQL queries
- Deprecate mode-related methods
- Update schema migration logic
2025-08-05 23:18:54 +08:00
yangdx
0463963520 fix: include all query parameters in LLM cache hash key generation
- Add missing query parameters (top_k, enable_rerank, max_tokens, etc.) to cache key generation in kg_query, naive_query, and extract_keywords_only functions
- Add queryparam field to CacheData structure and PostgreSQL storage for debugging
- Update PostgreSQL schema with automatic migration for queryparam JSONB column
- Prevent incorrect cache hits between queries with different parameters

Fixes issue where different query parameters incorrectly shared the same cached results.
2025-08-05 18:03:10 +08:00
yangdx
7b3a9c09ca Fix: add missing colume to LLM cache of PostgreSQL implementation 2025-08-04 11:12:59 +08:00
yangdx
5513155808 Fix namespace tablename translate error
- Reorder namespace table map for PostgreSQL
- Ensure specific namespaces come first
2025-08-04 00:21:20 +08:00
yangdx
952d1feb07 feat: Add support for KV_STORE_FULL_ENTITIES and KV_STORE_FULL_RELATIONS namespaces in PGKVStorage
- Add LIGHTRAG_FULL_ENTITIES and LIGHTRAG_FULL_RELATIONS table schemas
- Implement complete CRUD operations for both namespaces
- Add automatic table creation and migration support
- Add SQL templates and namespace mappings
- Ensure workspace isolation and proper indexing
2025-08-03 22:54:56 +08:00
yangdx
d2dd137f83 feat: implement get_all_nodes and get_all_edges methods for graph storage backends
Add get_all_nodes() and get_all_edges() methods to Neo4JStorage, PGGraphStorage, MongoGraphStorage, and MemgraphStorage classes. These methods return all nodes and edges in the graph with consistent formatting matching NetworkXStorage for compatibility across different storage backends.
2025-08-03 11:02:37 +08:00
yangdx
091f2b42c3 feat(performance): Optimize document deletion with entity/relation index
- Introduces an index mapping documents to their corresponding entities and relations. This significantly speeds up `adelete_by_doc_id` by replacing slow graph traversal with a fast key-value lookup.
- Refactors the ingestion pipeline (`merge_nodes_and_edges`) to populate this new index. Adds a one-time data migration script to backfill the index for existing data.
2025-08-03 09:19:02 +08:00
yangdx
2f0aa7ed12 Optimize graph query by simplifying MATCH pattern
- Simplify MATCH clause to ()-[r]-()
- Remove node type constraints
- Improve query performance
2025-08-02 12:54:22 +08:00
yangdx
e00690b41b Fix: workspace isolation problem for json KV storage
- Use workspace+namespace as final_namespace identifier
- Update all related storage operations
- Maintain backward compatibility
2025-08-02 11:30:19 +08:00
yangdx
9a8f58826d fix: Add safe handling for missing file_path and metadata in PostgreSQL doc status functions
- Add null-safe file_path handling with "no-file-path" fallback in get_docs_by_status and get_docs_by_track_id
- Enhance metadata validation to ensure dict type after JSON parsing
- Align PostgreSQL implementation with JSON implementation safety patterns
- Prevent KeyError exceptions when database records have missing fields
2025-07-31 18:07:53 +08:00
yangdx
41de51a4db fix: add missing await in MongoDB get_all_status_counts aggregation
Resolves 'coroutine' object has no attribute 'to_list' error in document pagination endpoint by adding missing await keyword before self._data.aggregate() call.
2025-07-31 02:27:16 +08:00
yangdx
2af8a93dc7 fix: resolve _sort_key error in Redis get_docs_paginated function 2025-07-31 02:16:56 +08:00
yangdx
0eac1a883a Feat: add file path sorting for document manager
- Add file_path sorting support to all database backends (JSON, Redis, PostgreSQL, MongoDB)
- Implement smart column header switching between "ID" and "File Name" based on display mode
- Add automatic sort field switching when toggling between ID and file name display
- Create composite indexes for workspace+file_path in PostgreSQL and MongoDB for better query performance
- Update frontend to maintain sort state when switching display modes
- Add internationalization support for "fileName" in English and Chinese locales

This enhancement improves user experience by providing intuitive file-based sorting
while maintaining performance through optimized database indexes.
2025-07-30 18:46:55 +08:00
yangdx
74eecc46e5 feat(pagination): Implement document list pagination backends and frontend UI
- Add pagination support to BaseDocStatusStorage interface and all implementations (PostgreSQL, MongoDB, Redis, JSON)
- Implement RESTful API endpoints for paginated document queries and status counts
- Create reusable pagination UI components with internationalization support
- Optimize performance with database-level pagination and efficient in-memory processing
- Maintain backward compatibility while adding configurable page sizes (10-200 items)
2025-07-30 17:58:32 +08:00
yangdx
30f71c8acf Remove _id field and improve index handling in MongoDB
- Remove MongoDB _id field from documents
- Improve index existence check and creation
2025-07-30 04:17:26 +08:00
yangdx
cfb7117dd6 Fix track_id missing for query in PostgreSQL 2025-07-30 03:44:20 +08:00
yangdx
75de799353 Remove deprecated content field from doc status storage
- Remove content field from JSON storage
- Remove content field from MongoDB storage
- Remove content field from Redis storage
2025-07-30 01:00:06 +08:00
yangdx
93afa7d8a7 feat: add processing time tracking to document status with metadata field
- Add metadata field to DocProcessingStatus with start_time and end_time tracking
- Record processing timestamps using Unix time format (seconds precision)
- Update all storage backends (JSON, MongoDB, Redis, PostgreSQL) for new field support
- Maintain backward compatibility with default values for existing data
- Add error_msg field for better error tracking during document processing
2025-07-29 23:42:33 +08:00
yangdx
7206c07468 Remove deprecated content field from doc status
- Drop content column from LIGHTRAG_DOC_STATUS
- Clean up doc status handling code
- Maintain backward compatibility
2025-07-29 23:19:36 +08:00
yangdx
1e1adcb64a Add index on track_id column in doc status table of PostgreSQL 2025-07-29 23:03:09 +08:00
yangdx
6014b9bf73 feat: add track_id support for document processing progress monitoring
- Add get_docs_by_track_id() method to all storage backends (MongoDB, PostgreSQL, Redis, JSON)
- Implement automatic track_id generation with upload_/insert_ prefixes
- Add /track_status/{track_id} API endpoint for frontend progress queries
- Create database indexes for efficient track_id lookups
- Enable real-time document processing status tracking across all storage types
2025-07-29 22:24:21 +08:00
yangdx
dafdf92715 Remove content fallback logic in get_docs_by_status from Redis 2025-07-29 19:13:07 +08:00
yangdx
92bbb7a1b3 Remove content fallback and standardize doc status handling
- Remove content_summary fallback logic
- Standardize doc status processing
- Handle missing file_path consistently
2025-07-29 16:13:51 +08:00
yangdx
24c36d876c Remove content field from DocProcessingStatus, update MongoDB and PostgreSQL implementation 2025-07-29 14:52:45 +08:00
yangdx
c8c3545454 refactor: extract file path length limit to shared constant
• Add DEFAULT_MAX_FILE_PATH_LENGTH constant
• Replace hardcoded 4090 in Milvus impl
2025-07-26 10:45:03 +08:00