docs: Add detailed PDF parser processing steps documentation

Add comprehensive documentation for the RAGFlowPdfParser processing pipeline:

- 10 major processing steps with code references
- Complete data flow diagrams
- Algorithm explanations (K-Means column detection, text merging)
- Box data structure evolution through pipeline
- Position tag format specification
- Line-by-line code analysis for key methods:
  - __init__ (model loading)
  - __images__ (OCR processing)
  - _layouts_rec (layout detection)
  - _table_transformer_job (table structure)
  - _assign_column (column detection)
  - _text_merge (horizontal merge)
  - _naive_vertical_merge (vertical merge)
  - _filter_forpages (cleanup)
  - _extract_table_figure (extraction)
  - __filterout_scraps (final output)
Claude 2025-11-27 06:29:12 +00:00
parent 6d4dbbfe2c
commit 1dcc9a870b
