ragflow/rag/app
Yongteng Lei 5200711441
Feat: add support for multi-column PDF parsing (#10475)
### What problem does this PR solve?

Add support for multi-column PDF parsing. Related issues: #9878, #9919.

Two-column sample:
<img width="1885" height="1020" alt="image"
src="https://github.com/user-attachments/assets/0270c028-2db8-4ca6-a4b7-cd5830882d28"
/>

Three-column sample: 
<img width="1881" height="992" alt="image"
src="https://github.com/user-attachments/assets/9ee88844-d5b1-4927-9e4e-3bd810d6e03a"
/>

Single-column sample:
<img width="1883" height="1042" alt="image"
src="https://github.com/user-attachments/assets/e93d3d18-43c3-4067-b5fa-e454ed0ab093"
/>



### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
- [x] New Feature (non-breaking change which adds functionality)
2025-10-11 18:46:09 +08:00

| Name | Last commit message | Last commit date |
| --- | --- | --- |
| __init__.py | Update comments (#4569) | 2025-01-21 20:52:28 +08:00 |
| audio.py | Refa: OpenAI whisper-1 (#9552) | 2025-08-19 16:41:18 +08:00 |
| book.py | Feat: Redesign and refactor agent module (#9113) | 2025-07-30 19:41:09 +08:00 |
| email.py | Feat: Use data pipeline to visualize the parsing configuration of the knowledge base (#10423) | 2025-10-09 12:36:19 +08:00 |
| laws.py | Add tree_merge for law parsers, significantly outperforming hierarchical_merge (#10202) | 2025-09-22 16:33:21 +08:00 |
| manual.py | Feat: Redesign and refactor agent module (#9113) | 2025-07-30 19:41:09 +08:00 |
| naive.py | Feat: add support for multi-column PDF parsing (#10475) | 2025-10-11 18:46:09 +08:00 |
| one.py | Feat: Redesign and refactor agent module (#9113) | 2025-07-30 19:41:09 +08:00 |
| paper.py | Feat: Redesign and refactor agent module (#9113) | 2025-07-30 19:41:09 +08:00 |
| picture.py | Refactor: Improve the buffer close for vision_llm_chunk (#9845) | 2025-09-02 10:31:37 +08:00 |
| presentation.py | Fix: PlainParser using fix in presentation (#9239) | 2025-08-05 17:48:18 +08:00 |
| qa.py | Fix: Solve the OOM issue when passing large PDF files while using QA chunking method. (#8464) | 2025-06-25 10:25:45 +08:00 |
| resume.py | Update comments (#4569) | 2025-01-21 20:52:28 +08:00 |
| table.py | Support the case of one cell split by multiple columns. (#9225) | 2025-08-11 17:17:56 +08:00 |
| tag.py | Fix typos: retrievaler -> retriever (#10372) | 2025-10-10 09:17:36 +08:00 |