Commit graph

3445 commits

Author SHA1 Message Date
yangdx
78ad8873b8 Add cancellation check in delete loop 2025-10-24 14:47:20 +08:00
yangdx
743aefc655 Add pipeline cancellation feature for graceful processing termination
• Add cancel_pipeline API endpoint
• Implement PipelineCancelledException
• Add cancellation checks in main loop
• Handle task cancellation gracefully
• Mark cancelled docs as FAILED
2025-10-24 14:08:12 +08:00
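The cancellation flow described in this commit can be sketched roughly as follows. Only the name `PipelineCancelledException` and the FAILED status come from the commit message; the `cancellation_requested` flag, the `process_documents` function, and the `work` callback are hypothetical names used for illustration, not the project's actual code.

```python
import asyncio


class PipelineCancelledException(Exception):
    """Signals that a cancellation request was observed (body is assumed)."""


async def process_documents(doc_ids, pipeline_status, work):
    """Hypothetical main loop that checks a shared flag before each document."""
    results = {}
    try:
        for doc_id in doc_ids:
            # Cooperative check: stop before starting the next document.
            if pipeline_status.get("cancellation_requested"):
                raise PipelineCancelledException("cancelled by user")
            await work(doc_id)  # stand-in for the real processing step
            results[doc_id] = "PROCESSED"
    except PipelineCancelledException:
        # Documents that never finished are marked FAILED.
        for doc_id in doc_ids:
            results.setdefault(doc_id, "FAILED")
    return results
```

Checking the flag only at loop boundaries lets a document that is already in progress finish before the pipeline stops.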
yangdx
fdf0fe048b Bump API version to 0244 2025-10-22 23:39:02 +08:00
yangdx
0fa9a2eee3 Fix dimension type comparison in Milvus vector field validation
• Convert dimensions to int for comparison
• Handle string vs int type mismatches
2025-10-22 23:37:49 +08:00
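A minimal sketch of the kind of comparison this fix describes, assuming the stored dimension may come back as a string while the configured one is an int. The function name `dimensions_match` is hypothetical and does not reflect the actual Milvus storage code.

```python
def dimensions_match(existing_dim, expected_dim):
    """Compare two dimension values that may arrive as int or str."""
    try:
        # Normalize both sides to int so "1024" and 1024 compare equal.
        return int(existing_dim) == int(expected_dim)
    except (TypeError, ValueError):
        # Missing or non-numeric values are treated as a mismatch.
        return False
```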
yangdx
8dc23eeff2 Fix RayAnything compatibility problem
• Use "preprocessed" to indicate multimodal processing is required
• Update DocProcessingStatus to process status conversion automatically
• Remove multimodal_processed from DocStatus enum value
• Update UI filter logic
2025-10-22 20:15:29 +08:00
yangdx
00aa5e53a7 Improve entity identifier truncation warning message format 2025-10-22 15:56:19 +08:00
Daniel.y
cf2174b9d7
Merge pull request #2245 from danielaskdd/entity-name-len
Refact: Add Entity Identifier Length Truncation to Prevent Storage Failures
2025-10-22 15:02:02 +08:00
Daniel.y
3ba1d75c97
Merge pull request #2243 from xiaojunxiang2023/main
fix(docs): correct typo "acivate" → "activate"
2025-10-22 14:39:00 +08:00
yangdx
904b1f46f9 Add entity name length truncation with configurable limit 2025-10-22 14:02:30 +08:00
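As an illustration of length truncation with a configurable limit, the sketch below cuts an over-long identifier and logs a warning. The default of 256, the logger name, and the function name are assumptions for this example only; the commit does not state the actual limit or setting name.

```python
import logging

logger = logging.getLogger("example")

DEFAULT_MAX_NAME_LENGTH = 256  # assumed value, for illustration only


def truncate_identifier(name, limit=DEFAULT_MAX_NAME_LENGTH):
    """Return name unchanged if it fits, otherwise its first `limit` characters."""
    if len(name) <= limit:
        return name
    logger.warning(
        "Entity identifier truncated: %d -> %d chars", len(name), limit
    )
    return name[:limit]
```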
Daniel.y
20edd32950
Merge pull request #2244 from danielaskdd/del-doc-cache
Feat: Add Optional LLM Cache Deletion for Document Deletion
2025-10-22 12:58:09 +08:00
yangdx
b76350a3bc Fix linting 2025-10-22 12:53:42 +08:00
yangdx
d7e2527e1a Handle cache deletion errors gracefully instead of raising exceptions 2025-10-22 12:53:19 +08:00
yangdx
1101562eaf Bump API version to 0243 2025-10-22 12:30:22 +08:00
yangdx
162370b6e6 Add optional LLM cache deletion when deleting documents
• Add delete_llm_cache parameter to API
• Collect cache IDs from text chunks
• Delete cache after graph operations
• Update UI with new checkbox option
• Add i18n translations for cache option
2025-10-22 12:19:23 +08:00
Daniel.y
d392db7b4a
Fix typo in 'equipment' in prompt.py 2025-10-22 11:13:22 +08:00
xiaojunxiang
04d9fe0293
Merge branch 'HKUDS:main' into main 2025-10-22 11:01:36 +08:00
xiaojunxiang
9e5004e24f fix(docs): correct typo "acivate" → "activate" 2025-10-22 03:00:47 +00:00
Daniel.y
907204714b
Merge pull request #2237 from yrangana/feat/optimize-postgres-initialization
Optimize PostgreSQL initialization performance
2025-10-21 22:17:46 +08:00
yangdx
a809245aed Preserve file path order by using lists instead of sets 2025-10-21 18:57:54 +08:00
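A set discards insertion order, which is presumably why this commit moves to lists. One common way to deduplicate while keeping first-seen order looks like this; `unique_in_order` is a hypothetical helper, not the project's function.

```python
def unique_in_order(paths):
    """Drop duplicate paths while keeping the order of first appearance."""
    seen = set()      # used only for fast membership checks
    ordered = []      # carries the result in original order
    for path in paths:
        if path not in seen:
            seen.add(path)
            ordered.append(path)
    return ordered
```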
yangdx
fe890fca15 Improve formatting of limit method info in rebuild functions 2025-10-21 18:34:06 +08:00
yangdx
88a45523e2 Increase default max file paths from 30 to 100 and improve documentation
- Bump DEFAULT_MAX_FILE_PATHS to 100
- Add clarifying comment about display
2025-10-21 17:33:00 +08:00
yangdx
e5e16b7bd1 Fix Redis data migration error
• Use proper Redis connection context
• Fix namespace pattern for key scanning
• Propagate storage check exceptions
• Remove defensive error swallowing
2025-10-21 16:27:04 +08:00
yangdx
3ed2abd82c Improve logging to show source ID ratios when skipping entities/edges 2025-10-21 16:20:34 +08:00
yangdx
3ad616be4f Change default source IDs limit method from KEEP to FIFO 2025-10-21 16:12:11 +08:00
yangdx
80668aae22 Improve file path truncation labels and UI consistency
• Standardize FIFO/KEEP truncation labels
• Update UI truncation text format
2025-10-21 15:39:31 +08:00
yangdx
be3d274a0b Refactor node and edge merging logic with improved code structure
• Add numbered steps for clarity
• Improve early return handling
• Enhance file path limiting logic
2025-10-21 15:16:47 +08:00
yangdx
a5253244f9 Simplify skip logging and reduce pipeline status updates 2025-10-21 06:33:34 +08:00
yangdx
1248b3ab04 Increase default limits for source IDs and file paths in metadata
• Entity source IDs: 3 → 300
• Relation source IDs: 3 → 300
• File paths: 2 → 30
2025-10-21 05:30:09 +08:00
yangdx
cd1c48beaf Standardize placeholder format to use colon separator consistently 2025-10-21 05:03:57 +08:00
yangdx
1154c5683f Refactor deduplication calculation and remove unused variables 2025-10-21 04:41:15 +08:00
yangdx
665f60b90f Refactor entity/relation merge to consolidate VDB operations within functions
• Move VDB upserts into merge functions
• Fix early return data structure issues
• Update status messages (IGNORE_NEW → KEEP)
• Consolidate error handling paths
• Improve relationship content format
2025-10-21 03:19:34 +08:00
yangdx
e01c998ee9 Track placeholders in file paths for accurate source count display
• Add has_placeholder tracking variable
• Detect placeholder patterns in paths
• Show + sign for truncated counts
2025-10-20 23:48:04 +08:00
yangdx
637b850ec5 Add truncation indicator and update property labels in graph view
• Add truncate tooltip to source_id field
• Add visual truncation indicator (†)
• Bump API version to 0242
2025-10-20 23:03:01 +08:00
Yasiru Rangana
2f22336ace Optimize PostgreSQL initialization performance
- Batch index existence checks into single query (16+ queries -> 1 query)
- Batch timestamp column checks into single query (8 queries -> 1 query)
- Batch field length checks into single query (5 queries -> 1 query)

Performance improvement: ~70-80% faster initialization (35s -> 5-10s)

Key optimizations:
1. check_tables(): Use ANY($1) to check all indexes at once
2. _migrate_timestamp_columns(): Batch all column type checks
3. _migrate_field_lengths(): Batch all field definition checks

All changes are backward compatible with no schema or API changes.
Reduces database round-trips by batching information_schema queries.
2025-10-21 01:09:48 +11:00
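The batching idea in this commit (one `ANY($1)` query instead of one query per index) might look roughly like the sketch below. It assumes an async `fetch(query, *args)` callable with the shape of asyncpg's `Connection.fetch` and uses PostgreSQL's `pg_indexes` view; the commit does not say which catalog the real code queries, and `find_missing_indexes` is a hypothetical name.

```python
import asyncio

# Single round-trip: the whole list of names is bound to $1 as an array.
INDEX_QUERY = "SELECT indexname FROM pg_indexes WHERE indexname = ANY($1)"


async def find_missing_indexes(fetch, expected_names):
    """Return the expected index names that the database does not report."""
    rows = await fetch(INDEX_QUERY, list(expected_names))
    existing = {row["indexname"] for row in rows}
    # Keep the caller's ordering for the names that still need creating.
    return [name for name in expected_names if name not in existing]
```

With a real connection this would be called as `await find_missing_indexes(conn.fetch, names)`, so the number of round-trips stays at one regardless of how many indexes are checked.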
yangdx
e0fd31a60d Fix logging message formatting 2025-10-20 22:09:09 +08:00
yangdx
a9fec26798 Add file path limit configuration for entities and relations
• Add MAX_FILE_PATHS env variable
• Implement file path count limiting
• Support KEEP/FIFO strategies
• Add truncation placeholder
• Remove old build_file_path function
2025-10-20 20:12:53 +08:00
yangdx
dc62c78f98 Add entity/relation chunk tracking with configurable source ID limits
- Add entity_chunks & relation_chunks storage
- Implement KEEP/FIFO limit strategies
- Update env.example with new settings
- Add migration for chunk tracking data
- Support all KV storage
2025-10-20 15:24:15 +08:00
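The two limit strategies named here can be sketched as follows. The semantics are an assumption inferred from the labels in this log (KEEP was previously reported as IGNORE_NEW): KEEP retains the oldest IDs and ignores newer ones, FIFO evicts the oldest to make room for the newest. The function name and signature are hypothetical.

```python
def apply_source_id_limit(source_ids, limit, method="FIFO"):
    """Cap a chronologically ordered list of source IDs at `limit` entries."""
    if limit < 0:
        raise ValueError("limit must be non-negative")
    if len(source_ids) <= limit:
        return list(source_ids)
    if method == "KEEP":
        # Keep the earliest IDs; anything newer is ignored.
        return list(source_ids[:limit])
    if method == "FIFO":
        # Drop the oldest IDs so the most recent ones survive.
        return list(source_ids[len(source_ids) - limit:])
    raise ValueError(f"unknown limit method: {method}")
```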
yangdx
bdadaa6750 Merge branch 'main' into limit-vdb-metadata-size 2025-10-18 11:47:10 +08:00
yangdx
c0f69395c7 Merge branch 'security/fix-sql-injection-postgres' 2025-10-18 11:45:13 +08:00
yangdx
813f4af9d7 Fix linting 2025-10-18 11:44:48 +08:00
yangdx
012aaada22 Update Swagger API key status description text 2025-10-18 09:40:44 +08:00
Lucky Verma
917e41aa78 Refactor SQL queries and improve input handling in PGKVStorage and PGDocStatusStorage 2025-10-17 15:40:32 -05:00
yangdx
03333d63f3 Merge branch 'main' into limit-vdb-metadata-size 2025-10-17 21:36:06 +08:00
yangdx
7bf9d1e8dc Bump API version to 0241 2025-10-17 21:19:03 +08:00
yangdx
f555824064 Fix tuple delimiter corruption handling in regex patterns 2025-10-17 18:43:45 +08:00
yangdx
46ac5dac53 Improve API description formatting and add ReDoc link 2025-10-17 16:24:01 +08:00
yangdx
9f49e56a44 Merge branch 'main' into feat-entity-size-caps 2025-10-17 15:59:44 +08:00
yangdx
c18762e34a Simplify Docker deployment documentation and improve clarity 2025-10-17 15:00:53 +08:00
yangdx
f45dce347a Fix cache control error of index.html
• Return no-cache for all HTML responses, not just .html files
• Avoid needing a forced browser refresh after a front-end rebuild
2025-10-17 12:43:04 +08:00
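One way to apply no-cache to every HTML response, rather than only to paths ending in `.html`, is to key the decision on the response content type. The helper below is a framework-independent sketch under that assumption; `cache_headers_for` is a hypothetical name and the actual fix may be implemented differently.

```python
def cache_headers_for(content_type):
    """Return extra headers for a response based on its Content-Type value."""
    # Strip parameters such as "; charset=utf-8" before comparing.
    media_type = content_type.split(";", 1)[0].strip().lower()
    if media_type == "text/html":
        # HTML is revalidated on every load so a rebuilt front end is picked up.
        return {"Cache-Control": "no-cache"}
    return {}
```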
yangdx
35cd567c9e Allow related chunks missing in knowledge graph queries 2025-10-17 00:19:30 +08:00