### What problem does this PR solve? When parsing documents containing images, the current code uses a single-threaded approach to call the VL model, resulting in extremely slow parsing speed (e.g., parsing a Word document with dozens of images takes over 20 minutes). By switching to a multithreaded approach to call the VL model, the parsing speed can be improved to an acceptable level. ### Type of change - [x] Performance Improvement --------- Co-authored-by: liuzhenghua-jk <liuzhenghua-jk@360shuke.com> |
||
|---|---|---|
| .. | ||
| resume | ||
| __init__.py | ||
| docx_parser.py | ||
| excel_parser.py | ||
| figure_parser.py | ||
| html_parser.py | ||
| json_parser.py | ||
| markdown_parser.py | ||
| pdf_parser.py | ||
| ppt_parser.py | ||
| txt_parser.py | ||
| utils.py | ||