ragflow/rag
kaiyuan Zhang ead5f7aba9
Fix infinite recursion in RagTokenizer when processing repetitive characters (#6109)
### What problem does this PR solve?
Fixes #6085.
RagTokenizer's dfs_() function falls into infinite recursion when
processing text with repetitive Chinese characters (e.g.,
"一一一一一十一十一十一..." or "一一一一一一十十十十十十十二十二十二..."), causing memory leaks.
### Type of change
Implemented three optimizations to the dfs_() function:
1. Added memoization with a _memo dictionary to cache computed results
2. Added recursion depth limiting with a _depth parameter (max 10 levels)
3. Implemented special handling for repetitive character sequences
- [x] Bug Fix (non-breaking change which fixes an issue)
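
The three optimizations can be illustrated with a minimal, self-contained sketch. This is not the actual `dfs_()` implementation from RagTokenizer: the function `segment`, the `vocab` set, `max_word_len`, and the `MIN_RUN` threshold are hypothetical names chosen for illustration. Only the ideas of a memo dictionary, a depth limit of 10, and special handling of repeated characters come from the description above.

```python
MAX_DEPTH = 10  # depth limit cited in the commit message
MIN_RUN = 5     # assumption: a run this long counts as "repetitive"


def segment(text, vocab, max_word_len=4):
    """Split text into dictionary words, preferring the fewest tokens.

    Hypothetical sketch: unknown single characters are always allowed
    as tokens, so a segmentation always exists.
    """
    memo = {}  # (start, depth) -> best token list for text[start:]

    def dfs(start, depth):
        if start == len(text):
            return []
        # Depth limit: stop branching, emit the rest character by character.
        if depth > MAX_DEPTH:
            return list(text[start:])
        key = (start, depth)
        if key in memo:
            return memo[key]

        # Repetitive run: consume identical characters without branching.
        run_end = start
        while run_end < len(text) and text[run_end] == text[start]:
            run_end += 1
        if run_end - start >= MIN_RUN:
            result = [text[start]] * (run_end - start) + dfs(run_end, depth + 1)
            memo[key] = result
            return result

        # Ordinary case: try every candidate word that starts here.
        best = None
        last_end = min(len(text), start + max_word_len)
        for end in range(start + 1, last_end + 1):
            piece = text[start:end]
            if end - start > 1 and piece not in vocab:
                continue  # multi-character pieces must be dictionary words
            candidate = [piece] + dfs(end, depth + 1)
            if best is None or len(candidate) < len(best):
                best = candidate
        memo[key] = best
        return best

    return dfs(0, 0)


if __name__ == "__main__":
    vocab = {"十一", "十二"}
    print(segment("一一一一一十一十一", vocab))
```

Keying the memo on `(start, depth)` keeps the cache valid even though the depth limit changes what a call returns. The number of distinct states is bounded by the text length times the depth limit, so the search cannot blow up on repetitive input.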

Co-authored-by: Kevin Hu <kevinhu.sh@gmail.com>
2025-04-01 13:59:52 +08:00
| Name | Last commit | Date |
| --- | --- | --- |
| app | Fix: point in tag issue. (#6436) | 2025-03-24 10:45:29 +08:00 |
| llm | Feat: support vision llm for gpustack (#6636) | 2025-03-31 15:33:52 +08:00 |
| nlp | Fix infinite recursion in RagTokenizer when processing repetitive characters (#6109) | 2025-04-01 13:59:52 +08:00 |
| res | | |
| svr | Optimize graphrag again (#6513) | 2025-03-26 15:34:42 +08:00 |
| utils | Feat: extend S3 storage compatibility and add knowledge base ID prefix (#6355) | 2025-03-31 16:09:43 +08:00 |
| \_\_init\_\_.py | Update comments (#4569) | 2025-01-21 20:52:28 +08:00 |
| benchmark.py | Refactor embedding batch_size (#3825) | 2024-12-03 16:22:39 +08:00 |
| prompts.py | Feat: add VLM-boosted PDF parser (#6278) | 2025-03-20 09:39:32 +08:00 |
| raptor.py | Refactor graphrag to remove redis lock (#5828) | 2025-03-10 15:15:06 +08:00 |
| settings.py | Fix: optimize setting config initialization to resolve Minio initialization error (#6282) | 2025-03-20 10:45:40 +08:00 |