ragflow

zhudongwork 10432a1be7 Refa: Optimize pptx shape extraction to reduce content loss (#6703)	2025-04-22 10:16:24 +08:00

### What problem does this PR solve?

When parsing pptx files, some shapes do not contain the `shape_type` attribute, which causes the original code to throw an exception during extraction, leading to failure in content extraction. This optimization introduces handling logic for such anomalous shapes, providing a safer and more robust processing mechanism.

### Type of change

- [ ] Bug Fix (non-breaking change which fixes an issue)
- [ ] New Feature (non-breaking change which adds functionality)
- [ ] Documentation Update
- [x] Refactoring
- [x] Performance Improvement
- [ ] Other (please describe):
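
The handling described above can be illustrated with a minimal sketch. It is not the code from `ppt_parser.py`. The helper names `safe_shape_type` and `collect_texts` are hypothetical, and the shapes are treated as duck-typed objects that may or may not expose `shape_type` and `text`. The idea is only that a failed type lookup yields a default instead of aborting the extraction for the whole slide.

```python
from typing import Any, Iterable, List, Optional, Tuple


def safe_shape_type(shape: Any, default: Optional[Any] = None) -> Any:
    """Return shape.shape_type, or `default` if reading it fails.

    Hypothetical helper: assumes a failed lookup surfaces as an exception
    (for example, the attribute is missing or the type is not recognized).
    """
    try:
        return shape.shape_type
    except Exception:
        return default


def collect_texts(shapes: Iterable[Any]) -> List[Tuple[Any, str]]:
    """Collect (shape_type, text) pairs, keeping shapes whose type is unknown."""
    results = []
    for shape in shapes:
        # None stands for "type unknown" in this sketch.
        kind = safe_shape_type(shape)
        try:
            text = (getattr(shape, "text", "") or "").strip()
        except Exception:
            # Reading the text itself failed; skip this shape only.
            continue
        if text:
            results.append((kind, text))
    return results
```

Returning a default rather than re-raising keeps the text of an anomalous shape in the output, which is the "reduce content loss" goal stated in the commit title. A real parser would likely also branch on the shape type (tables, groups, pictures), which this sketch leaves out.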
resume	Fix:when start with source code not in docker env report 'UnicodeDec… (#5802)	2025-03-10 11:22:06 +08:00
__init__.py	Update comments (#4569)	2025-01-21 20:52:28 +08:00
docx_parser.py	Update comments (#4569)	2025-01-21 20:52:28 +08:00
excel_parser.py	Fix: When Excel is a formula, the parsed result is a formula, but cannot be correctly parsed as a value type (#6613)	2025-03-28 09:33:49 +08:00
figure_parser.py	Feat: add VLM-boosted DocX parser (#6307)	2025-03-20 11:24:44 +08:00
html_parser.py	Update comments (#4569)	2025-01-21 20:52:28 +08:00
json_parser.py	Update comments (#4569)	2025-01-21 20:52:28 +08:00
markdown_parser.py	Feat: Optimize the table extraction logic in the Markdown parser: (#5663)	2025-03-07 17:02:35 +08:00
pdf_parser.py	fix RAGFlowPdfParser AttributeError: 'PdfReader' object has no attribute 'close' err (#6859)	2025-04-14 09:40:13 +08:00
ppt_parser.py	Refa: Optimize pptx shape extraction to reduce content loss (#6703)	2025-04-22 10:16:24 +08:00
txt_parser.py	Fix: delimiter issue. (#5720)	2025-03-06 17:51:22 +08:00
utils.py	Update comments (#4569)	2025-01-21 20:52:28 +08:00