yangdx
5885637ebf
Add specialized JSON string sanitizer to prevent UTF-8 encoding errors
...
• Remove surrogate characters (U+D800-DFFF)
• Filter Unicode non-characters
• Direct char-by-char filtering
2025-11-17 12:54:32 +08:00
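Note on 5885637ebf: a minimal sketch of the char-by-char filtering the bullets describe, not the committed code. The function name `sanitize_json_string` is hypothetical, and the non-character ranges (U+FDD0-U+FDEF plus the last two code points of every plane) are taken from the Unicode standard rather than from the commit.

```python
def sanitize_json_string(text: str) -> str:
    """Drop surrogates and Unicode non-characters, one character at a time.

    Hypothetical helper: the name and exact ranges are assumptions.
    """
    kept = []
    for ch in text:
        cp = ord(ch)
        # Lone surrogates (U+D800-U+DFFF) cannot be encoded as UTF-8.
        if 0xD800 <= cp <= 0xDFFF:
            continue
        # Non-characters: U+FDD0-U+FDEF and U+xxFFFE / U+xxFFFF in every plane.
        if 0xFDD0 <= cp <= 0xFDEF or (cp & 0xFFFE) == 0xFFFE:
            continue
        kept.append(ch)
    return "".join(kept)


if __name__ == "__main__":
    dirty = "ok\ud800 text\ufffe"
    clean = sanitize_json_string(dirty)
    # Encoding the cleaned string no longer raises UnicodeEncodeError.
    print(clean.encode("utf-8"))
```

Ordinary whitespace and non-ASCII text pass through untouched; only code points that are invalid or reserved are removed.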
yangdx
23cbb9c9b2
Add data sanitization to JSON writing to prevent UTF-8 encoding errors
...
• Add _sanitize_json_data helper function
• Recursively clean strings in data
• Sanitize before JSON serialization
• Prevent encoding-related crashes
• Use existing sanitize_text_for_encoding
2025-11-17 12:54:32 +08:00
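Note on 23cbb9c9b2: a rough sketch of recursively cleaning strings before JSON serialization. The commit names `_sanitize_json_data` and `sanitize_text_for_encoding`, but their signatures are not shown here, so `sanitize_data` and `strip_surrogates` below are hypothetical stand-ins, and cleaning dict keys as well as values is an assumption.

```python
import json
from typing import Any, Callable


def strip_surrogates(text: str) -> str:
    """Stand-in cleaner (assumption): remove lone surrogate code points."""
    return "".join(ch for ch in text if not 0xD800 <= ord(ch) <= 0xDFFF)


def sanitize_data(data: Any, clean: Callable[[str], str] = strip_surrogates) -> Any:
    """Return a copy of `data` with every string passed through `clean`."""
    if isinstance(data, str):
        return clean(data)
    if isinstance(data, dict):
        # Keys are cleaned too; non-string keys fall through unchanged.
        return {sanitize_data(k, clean): sanitize_data(v, clean) for k, v in data.items()}
    if isinstance(data, (list, tuple)):
        # Tuples become lists, which is what json.dumps would emit anyway.
        return [sanitize_data(item, clean) for item in data]
    return data


if __name__ == "__main__":
    payload = {"name\ud800": ["value\udfff", 1, None]}
    text = json.dumps(sanitize_data(payload), ensure_ascii=False)
    # Safe to write as UTF-8 after sanitization.
    print(text.encode("utf-8"))
```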
yangdx
c14f25b7f8
Add mandatory dimension parameter handling for Jina API compliance
2025-11-07 21:08:34 +08:00
yangdx
33a1482f7f
Add optional embedding dimension parameter control via env var
...
* Add EMBEDDING_SEND_DIM environment variable
* Update Jina/OpenAI embed functions
* Add send_dimensions to EmbeddingFunc
* Auto-inject embedding_dim when enabled
* Add parameter validation warnings
2025-11-07 20:46:40 +08:00
yangdx
5f49cee20f
Merge branch 'main' into VOXWAVE-FOUNDRY/main
2025-11-06 15:37:35 +08:00
yangdx
3fbd704bf9
Enhance entity/relation editing with chunk tracking synchronization
...
• Add chunk storage sync to edit ops
• Implement incremental chunk ID updates
• Support entity renaming migrations
• Normalize relation keys consistently
• Preserve chunk references on edits
2025-10-26 14:34:56 +08:00
yangdx
a9fec26798
Add file path limit configuration for entities and relations
...
• Add MAX_FILE_PATHS env variable
• Implement file path count limiting
• Support KEEP/FIFO strategies
• Add truncation placeholder
• Remove old build_file_path function
2025-10-20 20:12:53 +08:00
Humphry
0b3d31507e
extended to use gemini, switched to use gemini-flash-latest
2025-10-20 13:17:16 +03:00
yangdx
dc62c78f98
Add entity/relation chunk tracking with configurable source ID limits
...
- Add entity_chunks & relation_chunks storage
- Implement KEEP/FIFO limit strategies
- Update env.example with new settings
- Add migration for chunk tracking data
- Support all KV storage
2025-10-20 15:24:15 +08:00
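Note on dc62c78f98: one plausible reading of the KEEP/FIFO limit strategies, as a sketch. It assumes KEEP retains the earliest IDs and ignores later ones, while FIFO evicts the oldest so the newest remain; the commit message does not spell out either behavior, and `apply_source_id_limit` is a hypothetical name.

```python
from typing import List


def apply_source_id_limit(ids: List[str], limit: int, method: str = "KEEP") -> List[str]:
    """Cap a list of chunk/source IDs (ordered oldest to newest) at `limit` entries."""
    if limit <= 0 or len(ids) <= limit:
        # Treat a non-positive limit as "no limit" (assumption).
        return list(ids)
    if method == "KEEP":
        # Assumed: keep the oldest entries, drop anything added later.
        return ids[:limit]
    if method == "FIFO":
        # Assumed: first in, first out, so the oldest entries are evicted.
        return ids[-limit:]
    raise ValueError(f"unknown limit method: {method!r}")


if __name__ == "__main__":
    chunk_ids = ["chunk-1", "chunk-2", "chunk-3", "chunk-4"]
    print(apply_source_id_limit(chunk_ids, 2, "KEEP"))
    print(apply_source_id_limit(chunk_ids, 2, "FIFO"))
```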
yangdx
03333d63f3
Merge branch 'main' into limit-vdb-metadata-size
2025-10-17 21:36:06 +08:00
yangdx
f555824064
Fix tuple delimiter corruption handling in regex patterns
2025-10-17 18:43:45 +08:00
DivinesLight
c06522b927
Get max source ID config from .env and LightRAG init
2025-10-15 18:24:38 +05:00
haseebuchiha
d52c3377b4
Import from env, use default if none, and remove unused import
2025-10-14 16:14:03 +05:00
DivinesLight
54f0a7d1ca
Quick fix to limit source_id ballooning while inserting nodes
2025-10-14 14:47:04 +05:00
NeelM0906
f6d1fb98ac
Fix linting errors
2025-10-09 16:52:22 -04:00
yangdx
a528213210
Fix logging filter logic
...
• Fix boolean operator precedence in filter
• Consolidate GET/POST condition logic
2025-09-26 19:42:33 +08:00
yangdx
8cd4139cbf
refactor: fix double query problem by adding aquery_llm function for consistent response handling
...
- Add new aquery_llm/query_llm methods providing structured responses
- Consolidate /query and /query/stream endpoints to use unified aquery_llm
- Optimize cache handling by moving cache checks before LLM calls
2025-09-26 19:05:03 +08:00
yangdx
5eb4a4b799
feat: simplify citations, add reference merging, and restructure API response format
2025-09-24 14:30:10 +08:00
yangdx
6e2eab5c23
Add ID fields to entities, relations, and chunks in raw data query results
2025-09-21 23:31:35 +08:00
yangdx
18e886d7e9
Improve context item identification with meaningful IDs
...
- Add EN prefix to entity IDs
- Add RE prefix to relation IDs
- Add DC prefix to chunk IDs
- Enhance traceability across contexts
2025-09-21 20:19:14 +08:00
yangdx
37d01e2df8
fix: Ensures complete metadata (source_id, created_at, file_path) is preserved in aquery_data responses
2025-09-15 03:45:09 +08:00
yangdx
e71229698d
refactor: centralize metadata generation in query functions
...
- Remove processing_info generation from _convert_to_user_format function
- Move all metadata generation (keywords, processing_info) to kg_query and naive_query functions
- Simplify _convert_to_user_format to focus only on data format conversion
2025-09-15 03:11:07 +08:00
yangdx
b1c8206346
Add aquery_data endpoint for structured retrieval without LLM generation
...
- Add QueryDataResponse model
- Implement /query/data endpoint
- Add aquery_data method to LightRAG
- Return entities, relationships, chunks
2025-09-15 02:15:14 +08:00
yangdx
82a67354d0
Code formatting improvements and style consistency fixes
...
* Remove trailing whitespace
* Fix function signature ellipsis style
2025-09-14 17:49:02 +08:00
yangdx
87bb8a023b
Fix tuple delimiter regex patterns and add debug logging
...
- Add debug logs for malformed records
- Fix regex for consecutive delimiters
- Handle missing closing brackets
2025-09-14 17:29:27 +08:00
yangdx
70fee5bbeb
Fix syntax warning by removing examples from fix_tuple_delimiter_corruption docstring
2025-09-14 12:37:21 +08:00
yangdx
20c5127c7c
Merge branch 'optimize-extraction' into return-data-only
2025-09-14 12:33:37 +08:00
yangdx
ff705a2323
Fix tuple delimiter corruption when the closing bracket is missing; handle <|#: -> <|#|> pattern
2025-09-14 11:44:21 +08:00
yangdx
1dc96f3959
Merge branch 'optimize-extraction' into return-data-only
2025-09-14 05:37:48 +08:00
yangdx
2686fc526e
Change entity type from CreativeWork to Content and update delimiter
...
• Replace CreativeWork with Content type
• Improve LLM output error messages
• Update prompt for binary relationships
• Fix delimiter corruption examples
2025-09-14 00:55:15 +08:00
yangdx
0ffb5d5f2d
Replace search API with aquery_data for consistent raw data retrieval, mirroring aquery results
...
• Reuse existing query logic paths and remove kg_search function entirely
• Update kg_query/naive_query to return raw data as needed
2025-09-13 15:30:29 +08:00
yangdx
8088b7e07a
Fix tuple delimiter corruption handling and update documentation
2025-09-12 18:03:37 +08:00
yangdx
8a3e2c03a9
Fix tuple delimiter corruption patterns with pipes and brackets
...
- Handle <||S||> malformed delimiters
- Fix <||> empty pipe sequences
- Repair <|| incomplete patterns
- Process ||S|| missing brackets
- Improve delimiter normalization
2025-09-12 17:45:32 +08:00
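Note on 8a3e2c03a9: an illustrative sketch of normalizing three of the listed corruptions back to the `<|S|>` delimiter in use at the time. The regexes and the name `normalize_tuple_delimiters` are assumptions, not the committed patterns, and the incomplete `<||` case is left out because the intended repair is not stated.

```python
import re

DELIMITER = "<|S|>"


def normalize_tuple_delimiters(record: str) -> str:
    """Repair a few malformed variants of the <|S|> tuple delimiter."""
    # <||S||>  ->  <|S|>   (extra pipes inside the brackets)
    record = re.sub(r"<\|+S\|+>", DELIMITER, record)
    # <||>     ->  <|S|>   (pipes only, delimiter core missing)
    record = re.sub(r"<\|{2,}>", DELIMITER, record)
    # ||S||    ->  <|S|>   (both angle brackets missing)
    record = re.sub(r"(?<!<)\|+S\|+(?!>)", DELIMITER, record)
    return record


if __name__ == "__main__":
    broken = "entity<||S||>Alice<||>person||S||A researcher"
    print(normalize_tuple_delimiters(broken))
```

The lookbehind and lookahead in the last pattern keep it from touching delimiters that are already well formed.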
yangdx
0221213b9b
Improve entity summarization with JSONL format and fix tuple delimiters
...
• Convert descriptions to JSONL format
• Add token-based truncation helper
• Enhance entity name consistency rules
• Improve summarization prompt clarity
• Fix tuple delimiter corruption patterns
2025-09-12 12:32:08 +08:00
yangdx
1892ed23cc
Change tuple delimiter from <|SEP|> to <|S|> across codebase
...
• Update prompt instruction clarity
• Correct utility function examples
• Update regex pattern comments
2025-09-12 08:57:46 +08:00
yangdx
c07bcbff44
Fix tuple delimiter corruption patterns and add missing edge cases
2025-09-12 08:35:37 +08:00
yangdx
8660bf34e4
Add timestamp tracking for LLM responses and entity/relationship data
...
- Track timestamps for cache hits/misses
- Add timestamp to entity/relationship objects
- Sort descriptions by timestamp order
- Preserve temporal ordering in merges
2025-09-12 04:34:12 +08:00
yangdx
40688def20
Refactor tuple delimiter corruption fix into reusable utility function
...
- Extract regex fixes to utils module
- Add case-insensitive delimiter handling
2025-09-12 04:10:14 +08:00
yangdx
a49c8e4a0d
Refactor JSON serialization to use newline-separated format
...
- Replace json.dumps with line-by-line format
- Apply to entities, relations, text units
- Update truncation key functions
- Maintain ensure_ascii=False setting
- Improve context readability
2025-09-10 11:59:25 +08:00
yangdx
2dd143c935
Refactor conversation history handling to use LLM native message format
...
• Remove get_conversation_turns utility
• Pass history_messages to LLM directly
• Clean up prompt template formatting
2025-09-10 11:56:58 +08:00
yangdx
09abb656b8
Improve log message formatting for better readability
2025-09-09 17:41:09 +08:00
yangdx
d218f15a62
Refactor entity extraction with system prompts and output limits
...
- Add system/user prompt separation
- Set max tokens for endless output fix
- Improve extraction error logging
- Update cache type from extract to summary
2025-09-08 15:20:45 +08:00
yangdx
c87eb2cfcf
Increase timeout buffers for async function calls
...
• Extend execution timeout buffer to 150s
• Extend task duration buffer to 180s
• Account for low-level retry delays
• Improve health check phase handling
• Reduce timeout-related failures
2025-09-06 23:56:24 +08:00
yangdx
6be462511f
Add error prefixing for better debugging context in async operations
...
* Add create_prefixed_exception utility
* Prefix entity processing errors
* Prefix relationship processing errors
* Prefix chunk extraction progress info
* Maintain original exception chains
2025-09-05 21:28:00 +08:00
yangdx
2c551cb5db
Add support for Chinese book title marks in normalize_extracted_info
2025-09-04 18:51:57 +08:00
yangdx
9b516a8a53
Hot Fix: Preserve whitespace chars in text sanitization
...
• Keep \t, \n, \r in control char removal
2025-09-04 10:58:29 +08:00
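Note on 9b516a8a53: a small sketch of control-character removal that keeps \t, \n and \r, as the bullet describes. The character class and the name `strip_control_chars` are assumptions; the real sanitizer may cover a different range.

```python
import re

# C0 control characters and DEL, excluding \t (0x09), \n (0x0A) and \r (0x0D).
_CONTROL_CHARS = re.compile(r"[\x00-\x08\x0b\x0c\x0e-\x1f\x7f]")


def strip_control_chars(text: str) -> str:
    """Remove control characters while preserving tab, newline and carriage return."""
    return _CONTROL_CHARS.sub("", text)


if __name__ == "__main__":
    raw = "col1\tcol2\x00\nnext line\x1f\r\n"
    print(repr(strip_control_chars(raw)))
```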
yangdx
a25ce7f078
Fix linting
2025-09-03 21:58:30 +08:00
yangdx
7ef2f0dff6
Add VDB error handling with retries for data consistency
...
- Add safe_vdb_operation_with_exception util
- Wrap VDB ops in entity/relationship code
- Ensure exceptions propagate on failure
- Add retry logic with configurable delays
2025-09-03 21:15:09 +08:00
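Note on 7ef2f0dff6: a generic sketch of retrying an async storage operation with a configurable delay and re-raising on final failure. The commit names `safe_vdb_operation_with_exception`, but its signature is not shown here; `retry_async` and its parameters are hypothetical.

```python
import asyncio
from typing import Awaitable, Callable, Optional


async def retry_async(
    operation: Callable[[], Awaitable[None]],
    max_retries: int = 3,
    retry_delay: float = 0.2,
) -> None:
    """Run `operation`, retrying on any exception; raise if every attempt fails."""
    last_error: Optional[BaseException] = None
    for attempt in range(1, max_retries + 1):
        try:
            await operation()
            return
        except Exception as exc:
            last_error = exc
            if attempt < max_retries:
                # Wait before the next attempt; no sleep after the last one.
                await asyncio.sleep(retry_delay)
    # Propagate the failure so callers cannot silently lose data.
    raise RuntimeError(f"operation failed after {max_retries} attempts") from last_error


if __name__ == "__main__":
    attempts = []

    async def flaky_upsert() -> None:
        attempts.append(1)
        if len(attempts) < 2:
            raise ConnectionError("transient failure")

    asyncio.run(retry_async(flaky_upsert, max_retries=3, retry_delay=0.0))
    print(f"succeeded after {len(attempts)} attempts")
```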
yangdx
5b2deccbef
Improve text normalization and add entity type capitalization
...
- Capitalize entity types with .title()
- Add non-breaking space handling
- Add narrow non-breaking space regex
2025-09-02 02:51:41 +08:00
yangdx
e95622ca7b
fix(utils): enhance remove_think_tags to handle orphaned </think> closing tags
...
The function now properly handles cases where text contains </think> closing tags
without corresponding <think> opening tags, which can occur due to content
truncation or processing errors.
2025-09-01 07:17:30 +08:00
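Note on e95622ca7b: a sketch of how orphaned </think> tags might be handled. It assumes that any text preceding an unmatched closing tag is leftover reasoning and can be discarded; the function name `remove_think_tags_sketch` is hypothetical and the committed implementation may differ.

```python
import re


def remove_think_tags_sketch(text: str) -> str:
    """Strip <think>...</think> blocks, including a closing tag with no opener."""
    # Remove well-formed reasoning blocks first.
    text = re.sub(r"<think>.*?</think>", "", text, flags=re.DOTALL)
    # An orphaned </think> means the opening tag was truncated away:
    # drop everything up to and including the last closing tag (assumption).
    text = re.sub(r"^.*</think>", "", text, flags=re.DOTALL)
    return text.strip()


if __name__ == "__main__":
    print(remove_think_tags_sketch("<think>plan the reply</think>Final answer"))
    print(remove_think_tags_sketch("truncated reasoning</think>Final answer"))
```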