Claude
|
1dcc9a870b
|
docs: Add detailed PDF parser processing steps documentation
Created comprehensive documentation for the RAGFlowPdfParser processing pipeline:
- 10 major processing steps with code references
- Complete data flow diagrams
- Algorithm explanations (K-Means column detection, text merging)
- Box data structure evolution through pipeline
- Position tag format specification
- Line-by-line code analysis for key methods:
  - __init__ (model loading)
  - __images__ (OCR processing)
  - _layouts_rec (layout detection)
  - _table_transformer_job (table structure)
  - _assign_column (column detection)
  - _text_merge (horizontal merge)
  - _naive_vertical_merge (vertical merge)
  - _filter_forpages (cleanup)
  - _extract_table_figure (extraction)
  - __filterout_scraps (final output)
|
2025-11-27 06:29:12 +00:00 |
|