Commit graph

767 commits

Author SHA1 Message Date
yangdx
a04c11a598 Remove deprecated storage 2025-08-06 00:02:50 +08:00
yangdx
cc1f7118e7 Remove deprecated cache_by_modes functionality from all storage 2025-08-05 23:20:26 +08:00
yangdx
8294d6d1b7 Remove deprecated mode field from LLM cache schema
- Drop mode column from LLM cache table
- Update primary key to exclude mode
- Remove mode from all SQL queries
- Deprecate mode-related methods
- Update schema migration logic
2025-08-05 23:18:54 +08:00
yangdx
0463963520 fix: include all query parameters in LLM cache hash key generation
- Add missing query parameters (top_k, enable_rerank, max_tokens, etc.) to cache key generation in kg_query, naive_query, and extract_keywords_only functions
- Add queryparam field to CacheData structure and PostgreSQL storage for debugging
- Update PostgreSQL schema with automatic migration for queryparam JSONB column
- Prevent incorrect cache hits between queries with different parameters

Fixes issue where different query parameters incorrectly shared the same cached results.
2025-08-05 18:03:10 +08:00
yangdx
7b3a9c09ca Fix: add missing colume to LLM cache of PostgreSQL implementation 2025-08-04 11:12:59 +08:00
yangdx
5513155808 Fix namespace tablename translate error
- Reorder namespace table map for PostgreSQL
- Ensure specific namespaces come first
2025-08-04 00:21:20 +08:00
yangdx
952d1feb07 feat: Add support for KV_STORE_FULL_ENTITIES and KV_STORE_FULL_RELATIONS namespaces in PGKVStorage
- Add LIGHTRAG_FULL_ENTITIES and LIGHTRAG_FULL_RELATIONS table schemas
- Implement complete CRUD operations for both namespaces
- Add automatic table creation and migration support
- Add SQL templates and namespace mappings
- Ensure workspace isolation and proper indexing
2025-08-03 22:54:56 +08:00
yangdx
d2dd137f83 feat: implement get_all_nodes and get_all_edges methods for graph storage backends
Add get_all_nodes() and get_all_edges() methods to Neo4JStorage, PGGraphStorage, MongoGraphStorage, and MemgraphStorage classes. These methods return all nodes and edges in the graph with consistent formatting matching NetworkXStorage for compatibility across different storage backends.
2025-08-03 11:02:37 +08:00
yangdx
091f2b42c3 feat(performance): Optimize document deletion with entity/relation index
- Introduces an index mapping documents to their corresponding entities and relations. This significantly speeds up `adelete_by_doc_id` by replacing slow graph traversal with a fast key-value lookup.
- Refactors the ingestion pipeline (`merge_nodes_and_edges`) to populate this new index. Adds a one-time data migration script to backfill the index for existing data.
2025-08-03 09:19:02 +08:00
yangdx
2f0aa7ed12 Optimize graph query by simplifying MATCH pattern
- Simplify MATCH clause to ()-[r]-()
- Remove node type constraints
- Improve query performance
2025-08-02 12:54:22 +08:00
yangdx
e00690b41b Fix: workspace isolation problem for json KV storage
- Use workspace+namespace as final_namespace identifier
- Update all related storage operations
- Maintain backward compatibility
2025-08-02 11:30:19 +08:00
yangdx
9a8f58826d fix: Add safe handling for missing file_path and metadata in PostgreSQL doc status functions
- Add null-safe file_path handling with "no-file-path" fallback in get_docs_by_status and get_docs_by_track_id
- Enhance metadata validation to ensure dict type after JSON parsing
- Align PostgreSQL implementation with JSON implementation safety patterns
- Prevent KeyError exceptions when database records have missing fields
2025-07-31 18:07:53 +08:00
yangdx
41de51a4db fix: add missing await in MongoDB get_all_status_counts aggregation
Resolves 'coroutine' object has no attribute 'to_list' error in document pagination endpoint by adding missing await keyword before self._data.aggregate() call.
2025-07-31 02:27:16 +08:00
yangdx
2af8a93dc7 fix: resolve _sort_key error in Redis get_docs_paginated function 2025-07-31 02:16:56 +08:00
yangdx
0eac1a883a Feat: add file path sorting for document manager
- Add file_path sorting support to all database backends (JSON, Redis, PostgreSQL, MongoDB)
- Implement smart column header switching between "ID" and "File Name" based on display mode
- Add automatic sort field switching when toggling between ID and file name display
- Create composite indexes for workspace+file_path in PostgreSQL and MongoDB for better query performance
- Update frontend to maintain sort state when switching display modes
- Add internationalization support for "fileName" in English and Chinese locales

This enhancement improves user experience by providing intuitive file-based sorting
while maintaining performance through optimized database indexes.
2025-07-30 18:46:55 +08:00
yangdx
74eecc46e5 feat(pagination): Implement document list pagination backends and frontend UI
- Add pagination support to BaseDocStatusStorage interface and all implementations (PostgreSQL, MongoDB, Redis, JSON)
- Implement RESTful API endpoints for paginated document queries and status counts
- Create reusable pagination UI components with internationalization support
- Optimize performance with database-level pagination and efficient in-memory processing
- Maintain backward compatibility while adding configurable page sizes (10-200 items)
2025-07-30 17:58:32 +08:00
yangdx
30f71c8acf Remove _id field and improve index handling in MongoDB
- Remove MongoDB _id field from documents
- Improve index existence check and creation
2025-07-30 04:17:26 +08:00
yangdx
cfb7117dd6 Fix track_id missing for query in PostgreSQL 2025-07-30 03:44:20 +08:00
yangdx
75de799353 Remove deprecated content field from doc status storage
- Remove content field from JSON storage
- Remove content field from MongoDB storage
- Remove content field from Redis storage
2025-07-30 01:00:06 +08:00
yangdx
93afa7d8a7 feat: add processing time tracking to document status with metadata field
- Add metadata field to DocProcessingStatus with start_time and end_time tracking
- Record processing timestamps using Unix time format (seconds precision)
- Update all storage backends (JSON, MongoDB, Redis, PostgreSQL) for new field support
- Maintain backward compatibility with default values for existing data
- Add error_msg field for better error tracking during document processing
2025-07-29 23:42:33 +08:00
yangdx
7206c07468 Remove deprecated content field from doc status
- Drop content column from LIGHTRAG_DOC_STATUS
- Clean up doc status handling code
- Maintain backward compatibility
2025-07-29 23:19:36 +08:00
yangdx
1e1adcb64a Add index on track_id column in doc status table of PostgreSQL 2025-07-29 23:03:09 +08:00
yangdx
6014b9bf73 feat: add track_id support for document processing progress monitoring
- Add get_docs_by_track_id() method to all storage backends (MongoDB, PostgreSQL, Redis, JSON)
- Implement automatic track_id generation with upload_/insert_ prefixes
- Add /track_status/{track_id} API endpoint for frontend progress queries
- Create database indexes for efficient track_id lookups
- Enable real-time document processing status tracking across all storage types
2025-07-29 22:24:21 +08:00
yangdx
dafdf92715 Remove content fallback logic in get_docs_by_status from Redis 2025-07-29 19:13:07 +08:00
yangdx
92bbb7a1b3 Remove content fallback and standardize doc status handling
- Remove content_summary fallback logic
- Standardize doc status processing
- Handle missing file_path consistently
2025-07-29 16:13:51 +08:00
yangdx
24c36d876c Remove content field from DocProcessingStatus, update MongoDB and PostgreSQL implementation 2025-07-29 14:52:45 +08:00
yangdx
c8c3545454 refactor: extract file path length limit to shared constant
• Add DEFAULT_MAX_FILE_PATH_LENGTH constant
• Replace hardcoded 4090 in Milvus impl
2025-07-26 10:45:03 +08:00
xuewei
55e2678a1e Improve file_path FieldSchema 4090 2025-07-26 00:22:25 +08:00
yangdx
44b7ce222e feat: add default storage dependencies and optimize imports
- Add nano-vectordb and networkx to pyproject.toml dependencies
- Replace dynamic imports with direct imports for 4 default storage implementations
- Improve startup performance while maintaining backward compatibility
2025-07-24 16:14:26 +08:00
yangdx
f57ed21593 Merge branch 'main' into context-builder 2025-07-24 14:07:05 +08:00
yangdx
5574a30856 fix(postgres): handle ssl_mode="allow" in _create_ssl_context
Add "allow" to the list of recognized SSL modes in PostgreSQL connection helper. Previously, ssl_mode="allow" would fall through to "Unknown SSL mode" warning. Now it's properly handled alongside "require" and "prefer" modes.
2025-07-24 12:45:13 +08:00
yangdx
5a5d32dc32 Optimize logger message 2025-07-24 02:13:39 +08:00
yangdx
df8b4202f3 feat: Add SSL support for PostgreSQL database connections
- Add SSL configuration options (ssl_mode, ssl_cert, ssl_key, ssl_root_cert, ssl_crl)
- Support all PostgreSQL SSL modes (disable, allow, prefer, require, verify-ca, verify-full)
- Add SSL context creation with certificate validation
- Update initdb() method to handle SSL connection parameters
- Add SSL environment variables to env.example
- Maintain backward compatibility with existing non-SSL configurations
2025-07-21 02:03:06 +08:00
yangdx
19a38d9310 Feat: add PostgreSQL extensions for vector and AGE
- Ensure VECTOR extension is available when PostgreSQL init
- Ensure AGE extension is available when PGGraphStorage init
2025-07-21 01:46:41 +08:00
yangdx
2c7d2b3f5f Increase Neo4j connection pool size and timeouts
- Bump default connection pool size to 100
- Add new Neo4j timeout env variables to env.example
2025-07-19 13:27:34 +08:00
yangdx
9f5399c2f1 Replace tenacity retries with manual Memgraph transaction retries
- Implement manual retry logic
- Add exponential backoff with jitter
- Improve error handling for transient errors
2025-07-19 11:31:21 +08:00
yangdx
99e58ac752 fix: add retry mechanism for Memgraph transient errors
- Implement exponential backoff retry for transaction conflicts
- Add tenacity-based retry decorator with 5 attempts
- Handle TransientError in upsert_node and upsert_edge operations
- Resolve "Cannot resolve conflicting transactions" errors
- Improve system reliability under concurrent load
2025-07-19 10:34:35 +08:00
yangdx
96b94acc83 Enhance Redis connection handling with retries and timeouts
- Added Redis connection timeout configurations
- Implemented retry logic for Redis operations
- Updated error handling for timeout cases
- Improved connection pool management
- Added environment variable support
2025-07-19 10:15:26 +08:00
yangdx
f033fd6f87 fix(postgres): improve AGE agtype parsing and simplify error logging
- Fix JSON parsing errors caused by :: characters in data content
- Implement precise agtype string parsing using rfind() to separate JSON content from type identifiers
- Add robust error handling for malformed JSON in graph data
2025-07-18 08:50:47 +08:00
yangdx
14d9fe49b0 refactor(milvus): remove entity_type and weight fields from schema
- Remove entity_type field from entities collections
- Remove weight field from relationships collections
- Update schema definitions and index creation logic
- Maintain backward compatibility with existing data via dynamic fields
2025-07-17 12:08:35 +08:00
yangdx
7184c7b3ab fix: change default edge weight from 0.0 to 1.0 in entity extraction and graph storage
- Update extract_entities function in operate.py to use 1.0 as default weight
- Fix Neo4j implementation to use 1.0 instead of 0.0 for missing edge weights
- Fix Memgraph implementation to use 1.0 instead of 0.0 for missing edge weights
- Ensures consistent non-zero default weights across all graph storage backends
2025-07-17 11:30:49 +08:00
yangdx
57c8c19628 Add datetime format migration for doc status table 2025-07-16 22:21:51 +08:00
yangdx
c7b566f6d5 Fix cache migration MD5 error for PostgreSQL 2025-07-16 19:24:57 +08:00
yangdx
80f7e37168 Fix default workspace name for PostgreSQL AGE graph storage 2025-07-16 19:16:22 +08:00
yangdx
bab2803953 Optimize PostgreSQL database migrations for LLM cache
- Combine column migration into single operation
- Optimize LLM cache key migration query
- Improve migration error handling
- Add conflict detection for cache migration
2025-07-16 17:32:53 +08:00
yangdx
bd340fece6 Fix timestamp column migration comment typos
- Correct timezone-related comments
- Fix typo in debug log message
- Update migration success message
- Maintain same migration logic
2025-07-16 14:27:52 +08:00
yangdx
45d38fa083 Fix JSON error logging in Redis storage implementations 2025-07-16 01:35:07 +08:00
yangdx
6d66cde4ac Reorder query settings in web UI 2025-07-15 18:06:00 +08:00
yangdx
47341d3a71 Merge branch 'main' into rerank 2025-07-15 16:12:33 +08:00
Daniel.y
6d1260aafa
Merge pull request #1766 from HKUDS/fix-memgraph-max-nodes-issue
Fix Memgraph get_knowledge_graph issues
2025-07-15 16:07:04 +08:00