yangdx
b5ae84fac6
fix: Add data consistency validation to document processing pipeline
...
- Add _validate_and_fix_document_consistency() method to detect and fix documents with missing content in full_docs storage
- Integrate consistency check into apipeline_process_enqueue_documents() to automatically mark inconsistent documents as FAILED before processing
- Prevent processing errors caused by documents having status records but missing actual content data
2025-08-14 06:18:34 +08:00
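The consistency check described in this commit can be pictured with a minimal sketch. It assumes both stores behave like plain dicts keyed by document ID; the helper names, the `content` field, and the `error_msg` field are hypothetical, not the project's actual `_validate_and_fix_document_consistency()` implementation.

```python
from typing import Dict, List


def find_inconsistent_docs(doc_status: Dict[str, dict], full_docs: Dict[str, dict]) -> List[str]:
    """Return IDs that have a status record but no stored content."""
    return [
        doc_id
        for doc_id in doc_status
        # A missing entry, a None entry, or an empty "content" field all count as missing.
        if not (full_docs.get(doc_id) or {}).get("content")
    ]


def mark_inconsistent_as_failed(doc_status: Dict[str, dict], full_docs: Dict[str, dict]) -> List[str]:
    """Mark inconsistent documents as FAILED so later stages skip them."""
    bad_ids = find_inconsistent_docs(doc_status, full_docs)
    for doc_id in bad_ids:
        doc_status[doc_id]["status"] = "FAILED"
        doc_status[doc_id]["error_msg"] = "content missing from full_docs"  # hypothetical field
    return bad_ids
```

Running the check before the processing loop means a document without content is reported once as FAILED instead of raising midway through the pipeline.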
Daniel.y
dc76ae02d6
Merge pull request #1952 from danielaskdd/fix-pipeline
...
Fixes crash when processing files with UTF-8 encoding error
2025-08-14 05:33:08 +08:00
yangdx
fd0ae4646f
Fixes crash when processing files with UTF-8 encoding error
...
- Fix TypeError "cannot unpack non-iterable bool object" in document processing
- Change all error returns from `False` to `(False, "")` for consistency
- Ensure pipeline_enqueue_file always returns tuple (bool, str)
- Add missing return statement for no-content-extracted case
- Improve error handling for UTF-8 encoding issues and unsupported file types
2025-08-14 05:31:38 +08:00
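The TypeError named above arises when a caller unpacks `ok, value = ...` but one code path returns a bare `False`. A minimal sketch of the fixed shape follows; the function name and the meaning of the second tuple element are assumptions for illustration, not the real `pipeline_enqueue_file` body.

```python
from typing import Tuple


def enqueue_file_sketch(raw: bytes) -> Tuple[bool, str]:
    """Every return path yields (bool, str), so callers can always unpack two values."""
    try:
        text = raw.decode("utf-8")
    except UnicodeDecodeError:
        # Previously a bare `False` here would break `ok, value = ...` in the caller.
        return False, ""
    if not text.strip():
        # The no-content case also needs an explicit return.
        return False, ""
    # Hypothetical identifier standing in for whatever string the real function returns.
    return True, f"doc-{len(text)}"
```

With this shape, `ok, value = enqueue_file_sketch(data)` is safe for malformed input, empty input, and valid input alike.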
yangdx
3ccd10f1e4
Update webui assets
2025-08-14 05:03:43 +08:00
yangdx
6969038fd5
Update mermaid version to 11.9.0
2025-08-14 05:02:53 +08:00
yangdx
160a40dc04
Bump api version to 0201
2025-08-14 05:02:20 +08:00
yangdx
ae517181ad
Bump api version to 0200
2025-08-14 05:01:13 +08:00
Daniel.y
2bbb19143a
Merge pull request #1951 from danielaskdd/main
...
Refac: protect all storage initializations uniformly with get_data_init_lock
2025-08-14 03:52:37 +08:00
yangdx
0b22ffb252
Refac: protect all storage initializations uniformly with get_data_init_lock
2025-08-14 03:46:19 +08:00
Daniel.y
1be1649f75
Merge pull request #1949 from danielaskdd/main
...
Fix: remove query params from cache key generation for keyword extraction
2025-08-14 03:09:09 +08:00
yangdx
7fb11193b0
Fix linting
2025-08-14 03:07:29 +08:00
yangdx
331dcf0509
Remove query params from cache key generation for keyword extraction
2025-08-14 02:57:39 +08:00
yangdx
9a62101e9d
Add OpenAI frequency penalty sample env params
2025-08-14 02:57:23 +08:00
Daniel.y
5b0e26d9da
Merge pull request #1941 from HKUDS/add-final-namespace
...
Fix: Resolve workspace isolation issues across multiple storage implementations
2025-08-12 20:17:53 +08:00
Daniel.y
203e420b51
Merge pull request #1931 from danielaskdd/fix-first-stage-tasks-missing
...
Fix: Initialize first_stage_tasks and entity_relation_task to prevent empty-task cancel errors
2025-08-12 19:19:04 +08:00
yangdx
578bdaa410
Pin pymilvus version to 2.5.2 to avoid Protobuf version warning
2025-08-12 18:22:00 +08:00
yangdx
5d1bc8b49d
Relocate client creation to the initialize method to prevent race conditions in multi-process mode.
2025-08-12 18:20:56 +08:00
yangdx
74783d7781
Remove redundant debug logging for Qdrant operations
2025-08-12 17:29:05 +08:00
zrguo
f1c7233763
Avoid UTF-8 BOM
2025-08-12 17:06:54 +08:00
yangdx
41f8ef05b9
Restore thread safety to MongoDB client manager
...
- Protected client creation with lock
- Protected client release with lock
2025-08-12 16:42:53 +08:00
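A minimal sketch of lock-protected client creation and release, as the two bullets above describe. The class name, the `factory` argument, and the reference counting are assumptions for illustration; the actual MongoDB client manager may differ.

```python
import threading


class ClientManager:
    """Shares one client per process; creation and release both happen under a lock."""

    _lock = threading.Lock()
    _client = None
    _ref_count = 0

    @classmethod
    def get_client(cls, factory):
        with cls._lock:
            # Only the first caller creates the client; later callers reuse it.
            if cls._client is None:
                cls._client = factory()
            cls._ref_count += 1
            return cls._client

    @classmethod
    def release_client(cls):
        with cls._lock:
            if cls._ref_count > 0:
                cls._ref_count -= 1
            # Drop the shared client once the last user has released it.
            if cls._ref_count == 0:
                cls._client = None
```

Without the lock, two threads could both observe `_client is None` and each create a client, leaving one of them untracked.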
yangdx
0b2c3d06c7
Remove redundant collection listing check
2025-08-12 15:24:06 +08:00
yangdx
fc8ca1a706
Fix: add multi-process lock for initialize and drop methods for all storage
2025-08-12 04:25:09 +08:00
yangdx
ca00b9c8ee
Fix: Resolve workspace isolation problem for PostgreSQL with multiple LightRAG instances
2025-08-12 01:27:05 +08:00
yangdx
d9c1f935f5
Fix: Resolve workspace isolation issues in in-memory database with multiple LightRAG instances
2025-08-12 01:26:09 +08:00
yangdx
095e0cbfa2
Refac: Add workspace information to all logger output for all storage types
2025-08-12 01:19:09 +08:00
yangdx
44204abef7
Fix linting
2025-08-10 10:59:32 +08:00
yangdx
eb2320e556
Fix: Initialize first_stage_tasks and entity_relation_task to prevent empty-task cancel errors
...
- Initialize first_stage_tasks = [] and entity_relation_task = None at coroutine start
- Ensure cancel block safely handles no-op when tasks lists are empty
2025-08-10 10:45:41 +08:00
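The fix above can be illustrated with a small asyncio sketch. Only the two variable names come from the commit message; the coroutine, the simulated failure, and the return strings are hypothetical.

```python
import asyncio
from typing import List, Optional


async def process(fail_early: bool) -> str:
    # Initialize up front so the cancel block never touches undefined names.
    first_stage_tasks: List[asyncio.Task] = []
    entity_relation_task: Optional[asyncio.Task] = None
    try:
        if fail_early:
            raise RuntimeError("failed before any task was created")
        first_stage_tasks.append(asyncio.create_task(asyncio.sleep(0)))
        entity_relation_task = asyncio.create_task(asyncio.sleep(0))
        await asyncio.gather(*first_stage_tasks, entity_relation_task)
        return "done"
    except RuntimeError:
        # Cancel block: a safe no-op when no tasks were ever created.
        pending = list(first_stage_tasks)
        if entity_relation_task is not None:
            pending.append(entity_relation_task)
        for task in pending:
            if not task.done():
                task.cancel()
        return "cancelled cleanly"
```

If the two variables were assigned only inside the `try` body, an early failure would turn the cleanup itself into a `NameError`.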
Daniel.y
f1c6a4ed94
Merge pull request #1928 from danielaskdd/main
...
Fix: Update OpenAI embedding handling for both list and base64 embeddings
2025-08-09 08:44:21 +08:00
yangdx
ffb642a5ce
Fix linting
2025-08-09 08:41:41 +08:00
yangdx
ecd7777e61
Update OpenAI embedding handling for both list and base64 embeddings
...
- Fix OpenAI embedding array parsing
- Improve embedding data type safety
2025-08-09 08:40:33 +08:00
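A minimal sketch of accepting both embedding shapes. It assumes the base64 form encodes little-endian float32 values; the function name is hypothetical and this is not the project's actual parsing code.

```python
import base64
import struct
from typing import List, Union


def to_float_list(embedding: Union[str, List[float]]) -> List[float]:
    """Normalize an embedding given either as a list of numbers or as a base64 string."""
    if isinstance(embedding, str):
        raw = base64.b64decode(embedding)
        count = len(raw) // 4  # 4 bytes per float32 (assumed encoding)
        return list(struct.unpack(f"<{count}f", raw))
    # Already a sequence of numbers: coerce each element to float for type safety.
    return [float(x) for x in embedding]
```

Normalizing at the boundary lets the rest of the embedding path work with a single type.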
yangdx
cf064579ce
Remove deprecated keyword extraction query methods
...
- Delete query_with_keywords function
- Remove kg_query_with_keywords helper
- Drop separate keyword extraction methods
2025-08-08 14:59:39 +08:00
yangdx
f5ac6a9f4b
Add default Ollama embedding context length
...
- Set default context length to 8192
- Override the default context length for LLM in binding_options.py
2025-08-08 13:51:25 +08:00
yangdx
c2eefec707
Merge branch 'postgres-vector-index'
2025-08-08 03:01:34 +08:00
yangdx
16c9a81f4c
feat: support config.ini for PostgreSQL vector index settings
...
- Add support for reading vector_index_type, hnsw_m, hnsw_ef, and ivfflat_lists from config.ini
- Maintain backward compatibility with environment variables
- Update config.ini.example with new PostgreSQL vector index options
- Follow existing configuration priority: env vars > config.ini > defaults
2025-08-08 02:55:49 +08:00
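The stated priority (env vars > config.ini > defaults) can be sketched with the standard library. The helper name, the environment variable name, and the `[postgres]` section name used in the test values are assumptions for illustration only.

```python
import configparser
import os


def resolve_setting(env_name: str, config: configparser.ConfigParser,
                    section: str, option: str, default: str) -> str:
    """Resolve one setting: environment variable first, then config.ini, then the default."""
    if env_name in os.environ:
        return os.environ[env_name]
    # has_option() returns False when the section itself is missing.
    if config.has_option(section, option):
        return config.get(section, option)
    return default
```

Checking the environment first keeps existing env-based deployments working unchanged, which matches the backward-compatibility bullet above.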
yangdx
dec4148075
Merge branch 'main' into Matt23-star/main
2025-08-08 02:24:34 +08:00
yangdx
f38e10559e
Update PostgreSQL vector index configuration
...
- Remove FLAT index support
- Standardize on HNSW as default
- Add dimension validation
- Improve error logging
- Clean up index creation code
2025-08-08 02:21:06 +08:00
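A rough sketch of building an HNSW index statement with a dimension check, using pgvector's `CREATE INDEX ... USING hnsw` syntax. The function name, index naming scheme, default parameter values, and the specific checks are assumptions; the commit does not say what its dimension validation covers. The 2000-dimension limit is pgvector's documented ceiling for indexing the `vector` type.

```python
def hnsw_index_sql(table: str, column: str, dim: int,
                   m: int = 16, ef_construction: int = 64) -> str:
    """Build a pgvector HNSW index statement after validating the embedding dimension."""
    if not isinstance(dim, int) or dim <= 0:
        raise ValueError(f"embedding dimension must be a positive integer, got {dim!r}")
    if dim > 2000:
        raise ValueError(f"dimension {dim} exceeds the 2000-dimension index limit")
    # Table and column names are assumed to be trusted identifiers, not user input.
    return (
        f"CREATE INDEX IF NOT EXISTS idx_{table}_hnsw ON {table} "
        f"USING hnsw ({column} vector_cosine_ops) "
        f"WITH (m = {m}, ef_construction = {ef_construction})"
    )
```

Failing before the statement is issued gives a clearer error than a database-side rejection during index creation.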
Daniel.y
2f289f6e25
Merge pull request #1924 from danielaskdd/neo4j-connection-lifetime
...
Refact: Enhanced Neo4j Connection Lifecycle Management
2025-08-08 01:16:42 +08:00
yangdx
f4ef254de2
fix(neo4j): enhance connection lifecycle management to prevent timeout errors
...
- Add max_connection_lifetime, liveness_check_timeout, keep_alive parameters
- Extend retry mechanisms for connection reset scenarios
- Update config examples with new Neo4j connection options
- Resolves ClientTimeoutException during data insertion operations
2025-08-08 01:07:45 +08:00
Daniel.y
c8a44f5657
Merge pull request #1923 from danielaskdd/fix-context-format
...
Fix: Unify document chunks context format in only_need_context query
2025-08-08 00:05:26 +08:00
yangdx
eded6d1187
Unify document chunks context format in only_need_context query
...
- Update Document Chunks label to include (DC) abbreviation
2025-08-08 00:02:53 +08:00
Matt23-star
727ca43d3c
feat: add vector index creation functionality for PostgreSQL
2025-08-07 23:07:18 +08:00
yangdx
7780776af6
Update env.example
2025-08-06 18:50:58 +08:00
Daniel.y
a6ef29cef6
Merge pull request #1915 from danielaskdd/optimize-llm-cache
...
Refact: Optimized LLM Cache Hash Key Generation by Including All Query Parameters
2025-08-06 01:04:02 +08:00
yangdx
2dab4e321d
Bump api version to 0199
2025-08-06 01:03:35 +08:00
yangdx
a04c11a598
Remove deprecated storage
2025-08-06 00:02:50 +08:00
yangdx
c22315ea6d
refactor: remove selective LLM cache clearing functionality
...
- Remove optional 'modes' parameter from aclear_cache() and clear_cache() methods
- Replace deprecated drop_cache_by_modes() with drop() method for complete cache clearing
- Update API endpoint to ignore mode-specific parameters and clear all cache
- Simplify frontend clearCache() function to send empty request body
This change ensures all LLM cache is cleared together.
2025-08-05 23:51:51 +08:00
yangdx
cc1f7118e7
Remove deprecated cache_by_modes functionality from all storage
2025-08-05 23:20:26 +08:00
yangdx
8294d6d1b7
Remove deprecated mode field from LLM cache schema
...
- Drop mode column from LLM cache table
- Update primary key to exclude mode
- Remove mode from all SQL queries
- Deprecate mode-related methods
- Update schema migration logic
2025-08-05 23:18:54 +08:00
yangdx
0b5c708660
Update storage implementation documentation
...
- Add detailed storage type descriptions
- Remove Chroma from vector storage options
- Include recommended PostgreSQL version
- Add Memgraph to graph storage options
- Update performance comparison notes
2025-08-05 18:03:51 +08:00
yangdx
0463963520
fix: include all query parameters in LLM cache hash key generation
...
- Add missing query parameters (top_k, enable_rerank, max_tokens, etc.) to cache key generation in kg_query, naive_query, and extract_keywords_only functions
- Add queryparam field to CacheData structure and PostgreSQL storage for debugging
- Update PostgreSQL schema with automatic migration for queryparam JSONB column
- Prevent incorrect cache hits between queries with different parameters
Fixes issue where different query parameters incorrectly shared the same cached results.
2025-08-05 18:03:10 +08:00
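The problem this commit addresses, different query parameters sharing one cache entry, can be shown with a minimal hashing sketch. The function name, the payload layout, and the use of MD5 are assumptions for illustration; only the idea of including every query parameter in the key comes from the commit message.

```python
import hashlib
import json


def cache_key(query: str, mode: str, params: dict) -> str:
    """Derive a cache key from the query text, the mode, and every query parameter."""
    # sort_keys makes the key independent of the order in which parameters were set.
    payload = json.dumps({"query": query, "mode": mode, "params": params}, sort_keys=True)
    return hashlib.md5(payload.encode("utf-8")).hexdigest()
```

Two calls that differ only in `top_k` now hash to different keys, so one can no longer return the other's cached result.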