ragflow/deepdoc/parser
zhudongwork 10432a1be7
Refa: Optimize pptx shape extraction to reduce content loss (#6703)
### What problem does this PR solve?

When parsing pptx files, some shapes do not expose a usable `shape_type`
attribute. The original code raised an exception on these shapes, so
content extraction failed. This change adds handling for such anomalous
shapes, so extraction skips past them safely instead of aborting and less
content is lost.
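
The idea can be illustrated with a minimal sketch. This is not the code from `ppt_parser.py`. The helper names `safe_shape_type` and `extract_shape_text`, the `GROUP` marker, and the stand-in objects are all hypothetical and chosen only for illustration. The sketch assumes a shape is any object that may or may not provide `shape_type`, `text`, and `shapes`.

```python
from types import SimpleNamespace

# Hypothetical marker for a group shape; a real parser would compare
# against the constant provided by its pptx library instead.
GROUP = "group"


def safe_shape_type(shape):
    """Return shape.shape_type, or None if it is missing or raises."""
    try:
        return shape.shape_type
    except Exception:  # e.g. AttributeError or NotImplementedError
        return None


def extract_shape_text(shape):
    """Best-effort text extraction that never fails on odd shapes."""
    if safe_shape_type(shape) == GROUP:
        # Recurse into child shapes and keep only non-empty results.
        parts = [extract_shape_text(s) for s in getattr(shape, "shapes", [])]
        return "\n".join(p for p in parts if p)
    text = getattr(shape, "text", None)
    return text.strip() if isinstance(text, str) else ""


# Stand-in objects (assumptions, not real pptx shapes).
no_type = SimpleNamespace(text="  Title  ")          # no shape_type at all
group = SimpleNamespace(
    shape_type=GROUP,
    shapes=[no_type, SimpleNamespace(shape_type="picture")],  # picture: no text
)

print(extract_shape_text(no_type))
print(extract_shape_text(group))
```

Falling back to `None` for an unknown type lets the shape pass through the generic text path, so a shape with readable text still contributes content even when its type cannot be determined.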

### Type of change

- [ ] Bug Fix (non-breaking change which fixes an issue)
- [ ] New Feature (non-breaking change which adds functionality)
- [ ] Documentation Update
- [x] Refactoring
- [x] Performance Improvement
- [ ] Other (please describe):
2025-04-22 10:16:24 +08:00
resume
__init__.py
docx_parser.py
excel_parser.py Fix: When Excel is a formula, the parsed result is a formula, but cannot be correctly parsed as a value type (#6613) 2025-03-28 09:33:49 +08:00
figure_parser.py Feat: add VLM-boosted DocX parser (#6307) 2025-03-20 11:24:44 +08:00
html_parser.py
json_parser.py
markdown_parser.py
pdf_parser.py fix RAGFlowPdfParser AttributeError: 'PdfReader' object has no attribute 'close' err (#6859) 2025-04-14 09:40:13 +08:00
ppt_parser.py Refa: Optimize pptx shape extraction to reduce content loss (#6703) 2025-04-22 10:16:24 +08:00
txt_parser.py
utils.py