Commit graph

2666 commits

Author SHA1 Message Date
yangdx
ccc2a20071 feat: remove deprecated MAX_TOKEN_SUMMARY parameter to prevent LLM output truncation
- Remove MAX_TOKEN_SUMMARY parameter and related configurations
- Eliminate forced token-based truncation in entity/relationship descriptions
- Switch to fragment-count based summarization logic using FORCE_LLM_SUMMARY_ON_MERGE
- Update FORCE_LLM_SUMMARY_ON_MERGE default from 6 to 4 for better summarization
- Clean up documentation, environment examples, and API display code
- Preserve backward compatibility by graceful parameter removal

This change resolves issues where LLMs were forcibly truncating entity relationship
descriptions mid-sentence, leading to incomplete and potentially inaccurate knowledge
graph content. The new approach allows LLMs to generate complete descriptions while
still providing summarization when multiple fragments need to be merged.

Breaking Change: None - parameter removal is backward compatible
Fixes: Entity relationship description truncation issues
2025-07-15 12:26:33 +08:00
yangdx
7e988158a9 Fix: Resolve timezone handling problem in PostgreSQL storage
- Changed timestamp columns to naive UTC
- Added datetime formatting utilities
- Updated SQL templates for timestamp extraction
- Simplified timestamp migration logic
2025-07-14 04:12:52 +08:00
yangdx
b03bb48e24 feat: Refine summary logic and add dedicated Ollama num_ctx config
- Refactor the trigger condition for LLM-based summarization of entities and relations. Instead of relying on character length, the summary is now triggered when the number of merged description fragments exceeds a configured threshold. This provides a more robust and logical condition for consolidation.
- Introduce the `OLLAMA_NUM_CTX` environment variable to explicitly configure the context window size (`num_ctx`) for Ollama models. This decouples the model's context length from the `MAX_TOKENS` parameter, which is now specifically used to limit input for summary generation, making the configuration clearer and more flexible.
- Updated `README` files, `env.example`, and default values to reflect these changes.
2025-07-14 01:55:04 +08:00
yangdx
e8b3dfcf90 Bump api verion to 0182 2025-07-14 00:29:48 +08:00
yangdx
157fb4c871 Increase field lengths for entity and file paths for PostgreSQL
- Expand entity_name length to 512 chars
- Increase source/target ID lengths
- Convert file_path to TEXT type
- Add migration logic
2025-07-14 00:24:54 +08:00
yangdx
187a623125 Increase max length limits for Milvus storage fields
- Extended entity_name max_length to 512
- Increased entity_type max_length to 128
- Expanded file_path limits to 1024
- Raised src_id/tgt_id limits to 512
2025-07-13 23:13:45 +08:00
yangdx
6730a89d7c Hotfix: Resolves connection pool bugs for Redis
- The previous implementation of the shared Redis connection pool had a critical issue where any Redis storage instance would disconnect the global shared pool upon closing. This caused `ConnectionError` exceptions for other instances still using the pool.
- This commit resolves the issue by introducing a reference counting mechanism in `RedisConnectionManager`.
2025-07-13 22:54:34 +08:00
yangdx
f185b3fb38 Optimize async task limits for graph processing
- Increased concurrency for graph operations
- Renamed variables for clarity
- Updated status messages
2025-07-13 21:51:19 +08:00
yangdx
85cd1178a1 fix: prevent premature lock cleanup in multiprocess mode
- Change cleanup condition from count == 1 to count == 0 to properly
remove reused locks from cleanup list
- Fix RuntimeError: Attempting to release lock for xxxx more times than it was acquired
2025-07-13 13:51:48 +08:00
yangdx
03b40937f7 Reduce embedding concurrency limit from 16 to 8 2025-07-13 03:13:52 +08:00
yangdx
a2eeae9661 Fixes incorrect cleanup count 2025-07-13 02:38:36 +08:00
yangdx
582e952020 Disable direct logging by default for shared storage module 2025-07-13 01:58:50 +08:00
yangdx
cbf544b3c1 Remvoe redundant log message 2025-07-13 01:51:30 +08:00
yangdx
e4aef36977 Update webui assets 2025-07-13 01:36:25 +08:00
yangdx
efc359c411 Update webui assets 2025-07-13 00:57:41 +08:00
yangdx
0e3aaa318f Feat: Add keyed lock cleanup and status monitoring 2025-07-13 00:09:00 +08:00
yangdx
e4bf4d19a0 Optimize knowledge graph rebuild with parallel processing
- Add parallel processing for KG rebuild
- Implement keyed locks for data consistency
2025-07-12 13:22:56 +08:00
yangdx
a85d7054d4 fix: move node existence check inside lock to prevent race condition
Move knowledge_graph_inst.has_node check inside get_storage_keyed_lock
in _merge_edges_then_upsert to ensure atomic check-then-act operations
and prevent duplicate node creation during concurrent updates.
2025-07-12 12:22:32 +08:00
yangdx
2ade3067f8 Refac: Generalize keyed lock with namespace support
Refactored the `KeyedUnifiedLock` to be generic and support dynamic namespaces. This decouples the locking mechanism from a specific "GraphDB" implementation, allowing it to be reused across different components and workspaces safely.

Key changes:
- `KeyedUnifiedLock` now takes a `namespace` parameter on lock acquisition.
- Renamed `_graph_db_lock_keyed` to a more generic _storage_keyed_lock`
- Replaced `get_graph_db_lock_keyed` with get_storage_keyed_lock` to support namespaces
2025-07-12 12:10:12 +08:00
yangdx
f2d875f8ab Update comments 2025-07-12 11:05:25 +08:00
yangdx
943ead8b1d Bump api version to 0181 2025-07-12 05:59:13 +08:00
yangdx
5ee509e671 Fix linting 2025-07-12 05:17:44 +08:00
yangdx
964293f21b Optimize lock cleanup with time tracking and intervals
- Add cleanup time tracking variables
- Implement minimum cleanup intervals
- Track earliest cleanup times
- Handle time rollback cases
- Improve cleanup logging
2025-07-12 04:34:26 +08:00
yangdx
39965d7ded Move merging stage back controled by max parallel insert semhore 2025-07-12 03:32:08 +08:00
yangdx
7490a18481 Optimize lock cleanup parameters 2025-07-12 03:10:03 +08:00
yangdx
3d8e6924bc Show lock clean up message 2025-07-12 02:58:05 +08:00
yangdx
22c36f2fd2 Optimize log messages 2025-07-12 02:41:31 +08:00
yangdx
a64c767298 optimize: improve lock cleanup performance with threshold-based strategy
- Add CLEANUP_THRESHOLD constant (100) to control cleanup frequency
- Modify _release_shared_raw_mp_lock to only scan when cleanup list exceeds threshold
- Modify _release_async_lock to only scan when cleanup list exceeds threshold
2025-07-11 23:43:40 +08:00
yangdx
ad99d9ba5a Improve code organization and comments 2025-07-11 22:13:02 +08:00
yangdx
c52c451cf7 Fix linting 2025-07-11 20:40:50 +08:00
yangdx
3afdd1b67c Fix initial count error for multi-process lock with key 2025-07-11 20:39:08 +08:00
yangdx
c47747da9e Merge branch 'main' into merge_lock_with_key 2025-07-11 16:37:10 +08:00
yangdx
ef4870fda5 Combined entity and edge processing tasks and optimize merging with semaphore 2025-07-11 16:34:54 +08:00
zrguo
b0479c078a fix process_chunks_unified() 2025-07-09 15:55:38 +08:00
yangdx
9aa2ed0837 Merge branch 'main' into rerank 2025-07-09 15:33:39 +08:00
zrguo
e1541caea9 Update webui setting 2025-07-09 12:10:06 +08:00
yangdx
207f0a7f2a Merge branch 'main' into merge_lock_with_key 2025-07-09 09:25:28 +08:00
yangdx
cb3bfc0e5b Release semphore before merge stage 2025-07-09 09:24:44 +08:00
Anton Vice
b192f8c9a3 Fix: Handle NoneType error when processing documents without a file path
The document processing pipeline would crash with a TypeError when a document was submitted as raw text via the API, as the file_path attribute would be None. This change adds a check to handle the None case gracefully, preventing the crash and allowing text-based documents to be indexed correctly.
2025-07-08 19:35:22 -03:00
yangdx
4705a22861 Bump core version to 1.4.0 2025-07-09 04:43:20 +08:00
yangdx
2056c3c809 Increase default CHUNK_TOP_K from 5 to 15 2025-07-09 04:41:51 +08:00
yangdx
e9c3503f77 Update logger info 2025-07-09 04:36:52 +08:00
yangdx
5d4484882a Merge branch 'main' into rerank 2025-07-09 03:59:04 +08:00
yangdx
14d51518dd Merge branch 'add-Memgraph-graph-db' into memgraph 2025-07-09 03:38:07 +08:00
DavIvek
08eb68b8ed run pre-commit 2025-07-08 20:21:20 +02:00
yangdx
75ce636084 Merge branch 'main' into add-Memgraph-graph-db 2025-07-09 02:09:35 +08:00
DavIvek
4438897b6b add changes based on review 2025-07-08 16:28:06 +02:00
zrguo
d4651d59c1 Add rerank to server 2025-07-08 21:44:20 +08:00
yangdx
b6ab69e25d Merge branch 'main' into fix-issue-1746 2025-07-08 18:20:02 +08:00
yangdx
2a0cff3ed6 Fix linting 2025-07-08 18:17:21 +08:00