- Change sanitize_text_for_encoding to fail fast instead of returning error placeholders
- Add strict UTF-8 cleaning pipeline to entity/relationship extraction
- Skip problematic entities/relationships instead of corrupting data

Fixes document processing crashes when encountering surrogate characters (U+D800-U+DFFF).

| Name | | |
|---|---|---|
| api | ||
| kg | ||
| llm | ||
| tools | ||
| __init__.py | ||
| base.py | ||
| constants.py | ||
| exceptions.py | ||
| lightrag.py | ||
| namespace.py | ||
| operate.py | ||
| prompt.py | ||
| rerank.py | ||
| types.py | ||
| utils.py | ||
| utils_graph.py | ||
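
The fail-fast and skip behavior described in the commit message can be illustrated with a minimal sketch. This is not the repository's code: `ensure_utf8` and `keep_encodable` are hypothetical names chosen for illustration, and the real `sanitize_text_for_encoding` may take different arguments and do more cleaning. The sketch only assumes that a lone surrogate makes a string impossible to encode as UTF-8, and shows raising an error for such text and dropping the affected item rather than storing a placeholder.

```python
from typing import Iterable, List


def ensure_utf8(text: str) -> str:
    """Hypothetical fail-fast check: raise instead of returning a placeholder."""
    try:
        # Lone surrogates (U+D800-U+DFFF) cannot be encoded as UTF-8.
        text.encode("utf-8")
    except UnicodeEncodeError as exc:
        raise ValueError(f"text is not UTF-8 encodable: {exc.reason}") from exc
    return text


def keep_encodable(names: Iterable[str]) -> List[str]:
    """Hypothetical caller: skip items that fail the check, keep the rest."""
    kept = []
    for name in names:
        try:
            kept.append(ensure_utf8(name))
        except ValueError:
            # Skip the problematic item rather than store corrupted data.
            continue
    return kept


if __name__ == "__main__":
    print(keep_encodable(["Alice", "bad\ud800name", "Bob"]))
```

Raising in the checker and catching in the caller keeps the decision about what to do with bad input (skip, log, abort) out of the low-level helper, which is one plausible reading of the change described above.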