ragflow/rag/nlp
kaiyuan Zhang ead5f7aba9
Fix infinite recursion in RagTokenizer when processing repetitive characters (#6109)
### What problem does this PR solve?
Fixes #6085.
RagTokenizer's dfs_() function falls into infinite recursion when
processing text with repetitive Chinese characters (e.g.,
"一一一一一十一十一十一..." or "一一一一一一十十十十十十十二十二十二..."), causing memory leaks.
Implemented three optimizations to the dfs_() function:
1. Added memoization with a _memo dictionary to cache computed results
2. Added recursion depth limiting with a _depth parameter (max 10 levels)
3. Implemented special handling for repetitive character sequences
### Type of change
- [x] Bug Fix (non-breaking change which fixes an issue)
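
The three optimizations listed above can be illustrated with a minimal, standalone sketch. This is not the actual `dfs_()` code from `rag_tokenizer.py`: the function name `dfs_segment`, the `vocab` set, and the `MIN_RUN` and `MAX_WORD_LEN` thresholds are assumptions made for illustration. Only the `_memo` and `_depth` names and the limit of 10 levels come from the commit message.

```python
from typing import Dict, List, Optional, Set

MAX_DEPTH = 10    # depth cap named in the commit message
MIN_RUN = 5       # hypothetical: run length treated as "repetitive"
MAX_WORD_LEN = 4  # hypothetical: longest dictionary word considered


def dfs_segment(
    text: str,
    vocab: Set[str],
    start: int = 0,
    _depth: int = 0,
    _memo: Optional[Dict[int, List[str]]] = None,
) -> List[str]:
    """Segment text[start:] into as few tokens as this bounded search finds."""
    if _memo is None:
        _memo = {}
    if start >= len(text):
        return []
    # 1. Memoization: reuse the result already computed for this position.
    #    A cached entry may come from a depth-limited call; it is still a
    #    valid segmentation, which is good enough for this sketch.
    if start in _memo:
        return _memo[start]

    # 2. Depth limit: stop branching and emit the rest character by character.
    if _depth > MAX_DEPTH:
        return list(text[start:])

    # 3. Repetitive run: consume a long run of one character without branching.
    end = start
    while end < len(text) and text[end] == text[start]:
        end += 1
    if end - start >= MIN_RUN:
        tail = dfs_segment(text, vocab, end, _depth + 1, _memo)
        result = [text[start]] * (end - start) + tail
        _memo[start] = result
        return result

    # Ordinary case: try the single character and every dictionary word here.
    best: Optional[List[str]] = None
    last = min(len(text), start + MAX_WORD_LEN)
    for stop in range(start + 1, last + 1):
        piece = text[start:stop]
        if stop - start > 1 and piece not in vocab:
            continue
        candidate = [piece] + dfs_segment(text, vocab, stop, _depth + 1, _memo)
        if best is None or len(candidate) < len(best):
            best = candidate
    assert best is not None  # the single-character candidate always exists
    _memo[start] = best
    return best


if __name__ == "__main__":
    print(dfs_segment("一一一一一十一十一", {"十一", "十二"}))
```

In this sketch the recursion depth stays bounded however long the input is: a run of identical characters costs one call, and once `_depth` exceeds `MAX_DEPTH` the remainder is emitted without further recursion.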

Co-authored-by: Kevin Hu <kevinhu.sh@gmail.com>
2025-04-01 13:59:52 +08:00
..
__init__.py Feat: text file support position retaining. (#6231) 2025-03-18 16:55:11 +08:00
query.py Refa: token similarity calculations. (#6614) 2025-03-28 09:33:08 +08:00
rag_tokenizer.py Fix infinite recursion in RagTokenizer when processing repetitive characters (#6109) 2025-04-01 13:59:52 +08:00
search.py Refa: token similarity calculations. (#6614) 2025-03-28 09:33:08 +08:00
surname.py Update info (#1005) 2024-05-31 09:53:04 +08:00
synonym.py Fix too many clause while searching. (#5119) 2025-02-19 13:18:39 +08:00
term_weight.py Fix errors detected by Ruff (#3918) 2024-12-08 14:21:12 +08:00