LightRAG/lightrag
yangdx 99e28e815b fix: prevent document processing failures from UTF-8 surrogate characters
- Change sanitize_text_for_encoding to fail-fast instead of returning error placeholders
- Add strict UTF-8 cleaning pipeline to entity/relationship extraction
- Skip problematic entities/relationships instead of corrupting data

Fixes document processing crashes when encountering surrogate characters (U+D800-U+DFFF)
2025-08-27 23:52:39 +08:00
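
The fail-fast and skip behavior described in the commit message above can be illustrated with a minimal sketch. It is not the repository's code: `ensure_utf8_encodable` and `keep_clean_names` are hypothetical names chosen for this example, and the real `sanitize_text_for_encoding` may clean or validate text differently. The sketch only assumes that a Python `str` containing a lone surrogate (U+D800-U+DFFF) cannot be encoded as UTF-8.

```python
def ensure_utf8_encodable(text: str) -> str:
    """Return text unchanged, or raise ValueError if it cannot be UTF-8 encoded.

    Hypothetical helper: raising (fail-fast) replaces the older pattern of
    returning an error placeholder string that would then be stored as data.
    """
    try:
        text.encode("utf-8")  # lone surrogates raise UnicodeEncodeError here
    except UnicodeEncodeError as exc:
        raise ValueError(f"text is not UTF-8 encodable: {exc}") from exc
    return text


def keep_clean_names(names: list[str]) -> list[str]:
    """Keep only names that pass validation; skip the rest.

    Hypothetical caller: a bad entity name is dropped instead of being
    written to storage in a corrupted or placeholder form.
    """
    kept = []
    for name in names:
        try:
            kept.append(ensure_utf8_encodable(name))
        except ValueError:
            continue  # skip the problematic item, keep processing the others
    return kept


if __name__ == "__main__":
    # "Bo\ud800b" contains a lone high surrogate and is skipped.
    print(keep_clean_names(["Alice", "Bo\ud800b", "Carol"]))
```

Raising in the validator and catching at the call site keeps the decision about what to do with bad text (skip, log, abort) in the caller, which is one way to avoid a single malformed string failing a whole document.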
api Unify SUMMARY_LANGUANGE and ENTITY_TYPES implementation method 2025-08-27 12:23:22 +08:00
kg fix mismatch of 'error' and 'error_msg' in MongoDB 2025-08-26 10:43:56 +08:00
llm feat: Add extra_body parameter support for OpenRouter/vLLM compatibility 2025-08-21 13:06:28 +08:00
tools Fix linting 2025-08-23 02:39:12 +08:00
__init__.py Bump core version to 1.4.7 and api version to 0198 2025-08-04 10:55:41 +08:00
base.py Rename ENABLE_RERANK to RERANK_BY_DEFAULT and update default to true 2025-08-23 09:46:51 +08:00
constants.py Restore default entity types 2025-08-27 12:51:18 +08:00
exceptions.py Rename allow_create to first_initialization for clarity 2025-08-23 02:34:39 +08:00
lightrag.py Unify SUMMARY_LANGUANGE and ENTITY_TYPES implementation method 2025-08-27 12:23:22 +08:00
namespace.py Refac: Add workspace information to all logger output for all storage types 2025-08-12 01:19:09 +08:00
operate.py fix: prevent document processing failures from UTF-8 surrogate characters 2025-08-27 23:52:39 +08:00
prompt.py Unify SUMMARY_LANGUANGE and ENTITY_TYPES implementation method 2025-08-27 12:23:22 +08:00
rerank.py fix: Resolve default rerank config problem when env var missing 2025-08-23 01:07:59 +08:00
types.py
utils.py fix: prevent document processing failures from UTF-8 surrogate characters 2025-08-27 23:52:39 +08:00
utils_graph.py Fix GRAPH_FIELD_SEP import typo 2025-06-29 01:28:39 +05:00