ragflow

liuzhenghua 2f768b96e8 perf: optimze figure parser (#7392 )	2025-05-06 14:39:45 +08:00

### What problem does this PR solve?

When parsing documents containing images, the current code uses a single-threaded approach to call the VL model, resulting in extremely slow parsing speed (e.g., parsing a Word document with dozens of images takes over 20 minutes). By switching to a multithreaded approach to call the VL model, the parsing speed can be improved to an acceptable level.

### Type of change

- [x] Performance Improvement

Co-authored-by: liuzhenghua-jk <liuzhenghua-jk@360shuke.com>
resume	Fix:when start with source code not in docker env report 'UnicodeDec… (#5802 )	2025-03-10 11:22:06 +08:00
__init__.py	Update comments (#4569 )	2025-01-21 20:52:28 +08:00
docx_parser.py	Update comments (#4569 )	2025-01-21 20:52:28 +08:00
excel_parser.py	Fix: When Excel is a formula, the parsed result is a formula, but cannot be correctly parsed as a value type (#6613 )	2025-03-28 09:33:49 +08:00
figure_parser.py	perf: optimze figure parser (#7392 )	2025-05-06 14:39:45 +08:00
html_parser.py	Update comments (#4569 )	2025-01-21 20:52:28 +08:00
json_parser.py	Update comments (#4569 )	2025-01-21 20:52:28 +08:00
markdown_parser.py	Feat：Optimize the table extraction logic in the Markdown parser: (#5663 )	2025-03-07 17:02:35 +08:00
pdf_parser.py	fix RAGFlowPdfParser AttributeError: 'PdfReader' object has no attribute 'close' err (#6859 )	2025-04-14 09:40:13 +08:00
ppt_parser.py	Refa: Optimize pptx shape extraction to reduce content loss (#6703 )	2025-04-22 10:16:24 +08:00
txt_parser.py	Fix: delimiter issue. (#5720 )	2025-03-06 17:51:22 +08:00
utils.py	Update comments (#4569 )	2025-01-21 20:52:28 +08:00
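
The latest commit (#7392) describes replacing sequential VL model calls with multithreaded ones when a document contains many images. The sketch below illustrates that general idea only; it is not the code in figure_parser.py. The names `describe_figures`, `describe`, `max_workers`, and `fake_vl_model` are hypothetical, and the stand-in model function simply returns a string instead of making a network call.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List, Sequence


def describe_figures(
    figures: Sequence[bytes],
    describe: Callable[[bytes], str],
    max_workers: int = 10,  # assumed default; tune to the model endpoint's limits
) -> List[str]:
    """Call `describe` on every figure concurrently, keeping input order."""
    if not figures:
        return []
    workers = min(max_workers, len(figures))
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # Executor.map returns results in input order, even when the
        # underlying calls finish out of order.
        return list(pool.map(describe, figures))


def fake_vl_model(image: bytes) -> str:
    # Hypothetical stand-in for a slow, I/O-bound vision-language model call.
    return f"figure with {len(image)} bytes"


if __name__ == "__main__":
    for caption in describe_figures([b"a", b"bcd"], fake_vl_model):
        print(caption)
```

Threads suit this case because each call mostly waits on I/O, so the waits overlap instead of adding up. If `describe` raises for any figure, the exception surfaces when the results are collected.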