Commit graph

3459 commits

Author SHA1 Message Date
yangdx
a3370b024d Add chunk tracking cleanup to entity/relation deletion and creation
• Clean up chunk storage on delete
• Track chunks in create operations
• Normalize relation keys consistently
2025-10-26 17:06:16 +08:00
yangdx
bf1897a67e Normalize entity order for undirected graph consistency
• Normalize entity pairs for storage
• Update API docs for undirected edges
2025-10-26 15:53:31 +08:00
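A minimal sketch of what "normalize entity pairs for storage" can look like for an undirected graph. It assumes a lexicographic sort as the canonical order, so A→B and B→A resolve to the same key. The helper name `normalize_edge_key` is hypothetical and does not come from the codebase.

```python
def normalize_edge_key(src: str, tgt: str) -> tuple[str, str]:
    """Hypothetical helper: return (src, tgt) in a canonical (sorted) order."""
    return (src, tgt) if src <= tgt else (tgt, src)


# Both directions of an undirected relation collapse onto one storage key.
edges: dict[tuple[str, str], list[str]] = {}
for s, t, desc in [("Alice", "Bob", "knows"), ("Bob", "Alice", "works with")]:
    edges.setdefault(normalize_edge_key(s, t), []).append(desc)

print(edges)  # {('Alice', 'Bob'): ['knows', 'works with']}
```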
yangdx
3fbd704bf9 Enhance entity/relation editing with chunk tracking synchronization
• Add chunk storage sync to edit ops
• Implement incremental chunk ID updates
• Support entity renaming migrations
• Normalize relation keys consistently
• Preserve chunk references on edits
2025-10-26 14:34:56 +08:00
yangdx
29bf593663 Fix entity and relation chunk cleanup in deletion pipeline
• Delete from entity_chunks storage
• Delete from relation_chunks storage
2025-10-25 22:32:27 +08:00
yangdx
5ee9a2f8c6 Fix entity consistency in knowledge graph rebuilding and merging
• Sort src/tgt for consistent ordering
• Create missing nodes before edges
• Update entity chunks storage
• Pass entity_vdb to rebuild function
• Ensure entities exist in all storages
2025-10-25 21:37:03 +08:00
yangdx
a97e5dad4c Optimize PostgreSQL graph queries to avoid Cypher overhead and complexity
• Replace Cypher with native SQL queries
• Reduce query complexity from O(N²) to O(E)
• Add error handling for parse failures
• Use direct table access pattern
• Eliminate Cartesian product joins
2025-10-25 14:37:18 +08:00
yangdx
a9bc348446 Remove enable_logging parameter from data init lock call 2025-10-25 11:48:14 +08:00
Daniel.y
c82485d94d Merge pull request #2253 from Mobious/main
Allow users to provide keywords with QueryRequest
2025-10-25 11:26:54 +08:00
yangdx
97a2ee4ef1 Rename rebuild function and improve relationship logging format 2025-10-25 11:17:43 +08:00
yangdx
083b163c1f Improve lock logging with consistent messaging and debug levels 2025-10-25 11:04:21 +08:00
yangdx
3eb3a07544 Bump core version to 1.4.9.5 and API version to 0245 2025-10-25 04:23:57 +08:00
yangdx
a9ec15e669 Resolve lock leakage issue during user cancellation handling
• Change default log level to INFO
• Force enable error logging output
• Add lock cleanup rollback protection
• Handle LLM cache persistence errors
• Fix async task exception handling
2025-10-25 03:06:45 +08:00
yangdx
77336e50b6 Improve error handling and add cancellation checks in pipeline 2025-10-24 17:54:17 +08:00
yangdx
78ad8873b8 Add cancellation check in delete loop 2025-10-24 14:47:20 +08:00
yangdx
743aefc655 Add pipeline cancellation feature for graceful processing termination
• Add cancel_pipeline API endpoint
• Implement PipelineCancelledException
• Add cancellation checks in main loop
• Handle task cancellation gracefully
• Mark cancelled docs as FAILED
2025-10-24 14:08:12 +08:00
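The cancellation commit describes cooperative cancellation: a flag set by an API call, a checkpoint in the main loop, a dedicated exception, and unfinished documents marked FAILED. The sketch below shows that pattern under stated assumptions. `PipelineStatus`, `run_pipeline` and `simulated_worker` are hypothetical names, and only `PipelineCancelledException` is taken from the commit message. It is not the project's actual implementation.

```python
import asyncio
from dataclasses import dataclass, field


class PipelineCancelledException(Exception):
    """Raised when a cancellation request is observed at a safe checkpoint."""


@dataclass
class PipelineStatus:
    # Hypothetical shared state; a cancel endpoint would flip this flag.
    cancellation_requested: bool = False
    doc_status: dict = field(default_factory=dict)


async def run_pipeline(doc_ids, status, worker):
    try:
        for doc_id in doc_ids:
            # Checkpoint: stop only between documents, never mid-write.
            if status.cancellation_requested:
                raise PipelineCancelledException("pipeline cancelled by user")
            await worker(doc_id, status)
            status.doc_status[doc_id] = "PROCESSED"
    except PipelineCancelledException:
        # Every document that did not finish is marked FAILED.
        for doc_id in doc_ids:
            status.doc_status.setdefault(doc_id, "FAILED")


async def simulated_worker(doc_id, status):
    await asyncio.sleep(0)  # stand-in for chunking and extraction
    if doc_id == "doc-2":
        status.cancellation_requested = True  # simulate a user pressing cancel


status = PipelineStatus()
asyncio.run(run_pipeline(["doc-1", "doc-2", "doc-3"], status, simulated_worker))
print(status.doc_status)  # {'doc-1': 'PROCESSED', 'doc-2': 'PROCESSED', 'doc-3': 'FAILED'}
```

Checking the flag only at loop boundaries keeps storage writes for a single document atomic from the pipeline's point of view. The trade-off is that cancellation waits for the current document to finish.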
Mobious
f24a261613 Allow users to provide keywords with QueryRequest 2025-10-23 12:53:19 -10:00
yangdx
fdf0fe048b Bump API version to 0244 2025-10-22 23:39:02 +08:00
yangdx
0fa9a2eee3 Fix dimension type comparison in Milvus vector field validation
• Convert dimensions to int for comparison
• Handle string vs int type mismatches
2025-10-22 23:37:49 +08:00
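A minimal sketch of the dimension check described above. It assumes the stored schema may report the vector dimension as a string (for example `"1024"`) while the configured embedding dimension is an int. The helper name `dims_match` is hypothetical.

```python
def dims_match(existing_dim, expected_dim) -> bool:
    """Hypothetical helper: compare dimensions that may be str or int."""
    try:
        # Normalize both sides to int before comparing.
        return int(existing_dim) == int(expected_dim)
    except (TypeError, ValueError):
        # Missing or non-numeric metadata is treated as a mismatch.
        return False


print(dims_match("1024", 1024))  # True
```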
yangdx
8dc23eeff2 Fix RAG-Anything compatibility problem
• Use "preprocessed" to indicate multimodal processing is required
• Update DocProcessingStatus to handle status conversion automatically
• Remove multimodal_processed from DocStatus enum value
• Update UI filter logic
2025-10-22 20:15:29 +08:00
yangdx
00aa5e53a7 Improve entity identifier truncation warning message format 2025-10-22 15:56:19 +08:00
Daniel.y
cf2174b9d7 Merge pull request #2245 from danielaskdd/entity-name-len
Refact: Add Entity Identifier Length Truncation to Prevent Storage Failures
2025-10-22 15:02:02 +08:00
Daniel.y
3ba1d75c97 Merge pull request #2243 from xiaojunxiang2023/main
fix(docs): correct typo "acivate" → "activate"
2025-10-22 14:39:00 +08:00
yangdx
904b1f46f9 Add entity name length truncation with configurable limit 2025-10-22 14:02:30 +08:00
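A sketch of entity-name truncation with a configurable limit and a warning. The function name `truncate_entity_identifier` and the default of 256 characters are assumptions made for illustration. The real limit and its configuration key are not stated in this log.

```python
import logging

logger = logging.getLogger(__name__)

DEFAULT_MAX_IDENTIFIER_LEN = 256  # hypothetical default, not from the source


def truncate_entity_identifier(name: str, limit: int = DEFAULT_MAX_IDENTIFIER_LEN) -> str:
    """Hypothetical helper: cut over-long identifiers so storage writes do not fail."""
    if len(name) <= limit:
        return name
    logger.warning("Entity identifier truncated from %d to %d chars", len(name), limit)
    return name[:limit]


print(truncate_entity_identifier("A" * 300, limit=10))  # AAAAAAAAAA
```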
Daniel.y
20edd32950 Merge pull request #2244 from danielaskdd/del-doc-cache
Feat: Add Optional LLM Cache Deletion for Document Deletion
2025-10-22 12:58:09 +08:00
yangdx
b76350a3bc Fix linting 2025-10-22 12:53:42 +08:00
yangdx
d7e2527e1a Handle cache deletion errors gracefully instead of raising exceptions 2025-10-22 12:53:19 +08:00
yangdx
1101562eaf Bump API version to 0243 2025-10-22 12:30:22 +08:00
yangdx
162370b6e6 Add optional LLM cache deletion when deleting documents
• Add delete_llm_cache parameter to API
• Collect cache IDs from text chunks
• Delete cache after graph operations
• Update UI with new checkbox option
• Add i18n translations for cache option
2025-10-22 12:19:23 +08:00
Daniel.y
d392db7b4a Fix typo in 'equipment' in prompt.py 2025-10-22 11:13:22 +08:00
xiaojunxiang
04d9fe0293 Merge branch 'HKUDS:main' into main 2025-10-22 11:01:36 +08:00
xiaojunxiang
9e5004e24f fix(docs): correct typo "acivate" → "activate" 2025-10-22 03:00:47 +00:00
Daniel.y
907204714b Merge pull request #2237 from yrangana/feat/optimize-postgres-initialization
Optimize PostgreSQL initialization performance
2025-10-21 22:17:46 +08:00
yangdx
a809245aed Preserve file path order by using lists instead of sets 2025-10-21 18:57:54 +08:00
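Why lists instead of sets: a Python `set` does not preserve insertion order, while a `dict` does (guaranteed since Python 3.7). The sketch below de-duplicates paths but keeps first-seen order. `merge_file_paths` is a hypothetical name.

```python
def merge_file_paths(existing: list[str], new: list[str]) -> list[str]:
    """Hypothetical helper: merge path lists, dropping duplicates, keeping order."""
    # dict.fromkeys keeps the first occurrence of each key in insertion order.
    return list(dict.fromkeys([*existing, *new]))


print(merge_file_paths(["b.txt", "a.txt"], ["a.txt", "c.txt"]))  # ['b.txt', 'a.txt', 'c.txt']
```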
yangdx
fe890fca15 Improve formatting of limit method info in rebuild functions 2025-10-21 18:34:06 +08:00
yangdx
88a45523e2 Increase default max file paths from 30 to 100 and improve documentation
- Bump DEFAULT_MAX_FILE_PATHS to 100
- Add clarifying comment about display
2025-10-21 17:33:00 +08:00
yangdx
e5e16b7bd1 Fix Redis data migration error
• Use proper Redis connection context
• Fix namespace pattern for key scanning
• Propagate storage check exceptions
• Remove defensive error swallowing
2025-10-21 16:27:04 +08:00
yangdx
3ed2abd82c Improve logging to show source ID ratios when skipping entities/edges 2025-10-21 16:20:34 +08:00
yangdx
3ad616be4f Change default source IDs limit method from KEEP to FIFO 2025-10-21 16:12:11 +08:00
yangdx
80668aae22 Improve file path truncation labels and UI consistency
• Standardize FIFO/KEEP truncation labels
• Update UI truncation text format
2025-10-21 15:39:31 +08:00
yangdx
be3d274a0b Refactor node and edge merging logic with improved code structure
• Add numbered steps for clarity
• Improve early return handling
• Enhance file path limiting logic
2025-10-21 15:16:47 +08:00
yangdx
a5253244f9 Simplify skip logging and reduce pipeline status updates 2025-10-21 06:33:34 +08:00
yangdx
1248b3ab04 Increase default limits for source IDs and file paths in metadata
• Entity source IDs: 3 → 300
• Relation source IDs: 3 → 300
• File paths: 2 → 30
2025-10-21 05:30:09 +08:00
yangdx
cd1c48beaf Standardize placeholder format to use colon separator consistently 2025-10-21 05:03:57 +08:00
yangdx
1154c5683f Refactor deduplication calculation and remove unused variables 2025-10-21 04:41:15 +08:00
yangdx
665f60b90f Refactor entity/relation merge to consolidate VDB operations within functions
• Move VDB upserts into merge functions
• Fix early return data structure issues
• Update status messages (IGNORE_NEW → KEEP)
• Consolidate error handling paths
• Improve relationship content format
2025-10-21 03:19:34 +08:00
yangdx
e01c998ee9 Track placeholders in file paths for accurate source count display
• Add has_placeholder tracking variable
• Detect placeholder patterns in paths
• Show + sign for truncated counts
2025-10-20 23:48:04 +08:00
yangdx
637b850ec5 Add truncation indicator and update property labels in graph view
• Add truncate tooltip to source_id field
• Add visual truncation indicator (†)
• Bump API version to 0242
2025-10-20 23:03:01 +08:00
Yasiru Rangana
2f22336ace Optimize PostgreSQL initialization performance
- Batch index existence checks into single query (16+ queries -> 1 query)
- Batch timestamp column checks into single query (8 queries -> 1 query)
- Batch field length checks into single query (5 queries -> 1 query)

Performance improvement: ~70-80% faster initialization (35s -> 5-10s)

Key optimizations:
1. check_tables(): Use ANY($1) to check all indexes at once
2. _migrate_timestamp_columns(): Batch all column type checks
3. _migrate_field_lengths(): Batch all field definition checks

All changes are backward compatible with no schema or API changes.
Reduces database round-trips by batching information_schema queries.
2025-10-21 01:09:48 +11:00
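A sketch of the batched index check that `check_tables()` is described as using: one `ANY($1)` query against the standard `pg_indexes` view instead of one query per index. It assumes an asyncpg-style connection exposing `fetch(query, *args)` (for example one from `asyncpg.connect(dsn)`). `find_missing_indexes`, `StubConnection` and the index names are hypothetical. The stub only exists so the sketch runs without a database.

```python
import asyncio

# One round-trip for all index names instead of one query per index.
# pg_indexes is a standard PostgreSQL catalog view.
BATCH_INDEX_QUERY = """
    SELECT indexname
    FROM pg_indexes
    WHERE schemaname = current_schema()
      AND indexname = ANY($1::text[])
"""


async def find_missing_indexes(conn, index_names):
    """Return the names in index_names that do not exist yet."""
    rows = await conn.fetch(BATCH_INDEX_QUERY, index_names)
    existing = {row["indexname"] for row in rows}
    return [name for name in index_names if name not in existing]


class StubConnection:
    """In-memory stand-in for an asyncpg connection."""

    def __init__(self, existing):
        self.existing = set(existing)

    async def fetch(self, query, names):
        return [{"indexname": n} for n in names if n in self.existing]


missing = asyncio.run(
    find_missing_indexes(StubConnection({"idx_docs_id"}), ["idx_docs_id", "idx_chunks_id"])
)
print(missing)  # ['idx_chunks_id']
```

Only the missing names then need a `CREATE INDEX` statement, so the number of round-trips no longer grows with the number of indexes checked.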
yangdx
e0fd31a60d Fix logging message formatting 2025-10-20 22:09:09 +08:00
yangdx
a9fec26798 Add file path limit configuration for entities and relations
• Add MAX_FILE_PATHS env variable
• Implement file path count limiting
• Support KEEP/FIFO strategies
• Add truncation placeholder
• Remove old build_file_path function
2025-10-20 20:12:53 +08:00
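A sketch of the two limiting strategies named above. Per commit 665f60b90f, KEEP replaced IGNORE_NEW, so KEEP is read here as "retain existing entries and ignore the overflow". FIFO is read as "evict the oldest entries". The function name and the placeholder text `...{method}:{dropped}...` are hypothetical. The log only says a truncation placeholder exists and that placeholders use a colon separator.

```python
def limit_file_paths(existing, new, max_paths, method="FIFO"):
    """Hypothetical helper: cap a merged path list using KEEP or FIFO."""
    if max_paths < 1:
        raise ValueError("max_paths must be at least 1")
    # Order-preserving de-duplication of old + new paths.
    merged = list(dict.fromkeys([*existing, *new]))
    dropped = len(merged) - max_paths
    if dropped <= 0:
        return merged
    if method == "KEEP":
        kept = merged[:max_paths]  # keep the oldest, ignore new overflow
    elif method == "FIFO":
        kept = merged[-max_paths:]  # first in, first out: drop the oldest
    else:
        raise ValueError(f"unknown limit method: {method}")
    # Hypothetical placeholder recording the strategy and how many were cut.
    return kept + [f"...{method}:{dropped}..."]


print(limit_file_paths(["a", "b"], ["c", "d"], 3, "FIFO"))  # ['b', 'c', 'd', '...FIFO:1...']
```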