ragflow/deepdoc/parser
liuzhenghua 2f768b96e8
perf: optimize figure parser (#7392)
### What problem does this PR solve?

When parsing documents that contain images, the current code calls the
vision-language (VL) model from a single thread, one image at a time, so
parsing is extremely slow (for example, parsing a Word document with
dozens of images takes over 20 minutes).

Calling the VL model from multiple threads instead brings parsing speed
up to an acceptable level.
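The idea can be illustrated with a minimal sketch. It is not the code in `figure_parser.py`; it assumes each VL-model request is a blocking, I/O-bound call, represented here by a hypothetical `describe_figure` function that simulates latency with `time.sleep`. The helper names and the default of 10 workers are illustrative assumptions.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def describe_figure(image: bytes) -> str:
    """Hypothetical stand-in for one blocking VL-model request.

    A real call would send the image to a model endpoint and wait for the
    reply; here the wait is simulated with time.sleep.
    """
    time.sleep(0.05)  # simulated network/model latency
    return f"description of {len(image)}-byte image"


def describe_figures_sequential(images: list[bytes]) -> list[str]:
    # One request at a time: total time grows linearly with the image count.
    return [describe_figure(img) for img in images]


def describe_figures_parallel(images: list[bytes], max_workers: int = 10) -> list[str]:
    # Threads fit this workload because each call mostly waits on I/O;
    # blocking calls such as socket reads and time.sleep release the GIL,
    # so other threads can issue their requests in the meantime.
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # Executor.map yields results in input order, so each description
        # still lines up with its figure.
        return list(pool.map(describe_figure, images))


if __name__ == "__main__":
    figures = [bytes(n) for n in range(1, 21)]  # 20 dummy "images"

    start = time.perf_counter()
    seq = describe_figures_sequential(figures)
    t_seq = time.perf_counter() - start

    start = time.perf_counter()
    par = describe_figures_parallel(figures)
    t_par = time.perf_counter() - start

    assert seq == par  # same descriptions, same order
    print(f"sequential: {t_seq:.2f}s, parallel: {t_par:.2f}s")
```

With the simulated latency, the parallel version should finish in roughly a tenth of the sequential time, since up to 10 requests wait concurrently. In practice `max_workers` also caps how many requests hit the model server at once, so it would likely need tuning against the provider's rate limits.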

### Type of change

- [x] Performance Improvement

---------

Co-authored-by: liuzhenghua-jk <liuzhenghua-jk@360shuke.com>
2025-05-06 14:39:45 +08:00
| Name | Last commit | Date |
|---|---|---|
| resume/ | Fix:when start with source code not in docker env report 'UnicodeDec… (#5802) | 2025-03-10 11:22:06 +08:00 |
| __init__.py | Update comments (#4569) | 2025-01-21 20:52:28 +08:00 |
| docx_parser.py | Update comments (#4569) | 2025-01-21 20:52:28 +08:00 |
| excel_parser.py | Fix: When Excel is a formula, the parsed result is a formula, but cannot be correctly parsed as a value type (#6613) | 2025-03-28 09:33:49 +08:00 |
| figure_parser.py | perf: optimize figure parser (#7392) | 2025-05-06 14:39:45 +08:00 |
| html_parser.py | Update comments (#4569) | 2025-01-21 20:52:28 +08:00 |
| json_parser.py | Update comments (#4569) | 2025-01-21 20:52:28 +08:00 |
| markdown_parser.py | Feat:Optimize the table extraction logic in the Markdown parser: (#5663) | 2025-03-07 17:02:35 +08:00 |
| pdf_parser.py | fix RAGFlowPdfParser AttributeError: 'PdfReader' object has no attribute 'close' err (#6859) | 2025-04-14 09:40:13 +08:00 |
| ppt_parser.py | Refa: Optimize pptx shape extraction to reduce content loss (#6703) | 2025-04-22 10:16:24 +08:00 |
| txt_parser.py | Fix: delimiter issue. (#5720) | 2025-03-06 17:51:22 +08:00 |
| utils.py | Update comments (#4569) | 2025-01-21 20:52:28 +08:00 |