ragflow/rag
liuzhenghua 2f768b96e8
perf: optimize figure parser (#7392)
### What problem does this PR solve?

When parsing documents that contain images, the current code calls the
VL (vision-language) model from a single thread, one image at a time,
which makes parsing extremely slow (e.g., parsing a Word document with
dozens of images takes over 20 minutes).

By switching to a multithreaded approach to call the VL model, the
parsing speed can be improved to an acceptable level.
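
The idea can be illustrated with a minimal sketch. This is not the code from this PR: `describe_image` is a hypothetical stand-in for the blocking VL model request, and `describe_images` and `max_workers` are names assumed here for illustration. Because each call mostly waits on the model, a thread pool lets the requests overlap.

```python
from concurrent.futures import ThreadPoolExecutor


def describe_image(image: bytes) -> str:
    """Hypothetical stand-in for one blocking request to a VL model."""
    return f"description of {len(image)}-byte image"


def describe_images(images: list[bytes], max_workers: int = 8) -> list[str]:
    """Describe all images concurrently, keeping the input order."""
    if not images:
        return []
    # Never start more threads than there are images to process.
    workers = min(max_workers, len(images))
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # Executor.map returns results in input order, so each
        # description still lines up with the figure it belongs to.
        return list(pool.map(describe_image, images))
```

In a real parser the worker count would likely need to respect the rate limits of the model endpoint; the default of 8 above is an arbitrary assumption.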

### Type of change

- [x] Performance Improvement

---------

Co-authored-by: liuzhenghua-jk <liuzhenghua-jk@360shuke.com>
2025-05-06 14:39:45 +08:00
app Fix: Add title_tks for Pictures (#7365) 2025-04-28 13:35:34 +08:00
llm Fix:Set CUDA_VISIBLE_DEVICES In DefaultEmbedding (#7465) 2025-05-06 14:38:36 +08:00
nlp Refa: similarity calculations. (#7381) 2025-04-28 19:17:11 +08:00
res Format file format from Windows/dos to Unix (#1949) 2024-08-15 09:17:36 +08:00
svr perf: optimize figure parser (#7392) 2025-05-06 14:39:45 +08:00
utils Fix: whole knowledge graph lost after removing any document in the knowledge base (#7151) 2025-04-30 09:43:17 +08:00
__init__.py Update comments (#4569) 2025-01-21 20:52:28 +08:00
benchmark.py Refactor embedding batch_size (#3825) 2024-12-03 16:22:39 +08:00
prompts.py Fix: LLM generated tag issue. (#7301) 2025-04-25 14:38:34 +08:00
raptor.py <think> tag is missing. (#7256) 2025-04-24 11:44:10 +08:00
settings.py Feat: Adds OpenSearch2.19.1 as the vector_database support (#7140) 2025-04-24 16:03:31 +08:00