Commit graph

1 commit

Author SHA1 Message Date
hsparks.codes
e2404d728b feat: Add image extraction capability to Excel parser
Implements image extraction from Excel files with metadata.

Features:
- Extract embedded images from all sheets in Excel workbook
- Capture image metadata: format, position (anchor cell), description, size
- Base64 encode images for easy storage and transmission
- Support multiple image formats (PNG, JPEG, GIF, BMP, etc.)
- Handle images across multiple sheets
- Include comprehensive unit tests (6 tests, all passing)

Implementation:
- Add extract_images() method to RAGFlowExcelParser
- Use openpyxl's built-in image handling (_images property)
- Convert column numbers to Excel letters (A, B, AA, etc.)
- Extract alt text/descriptions when available
- Return structured image data with position information

Tests:
- test_extract_images_from_excel: Basic extraction
- test_extract_images_from_excel_without_images: Empty file handling
- test_extract_images_multiple_sheets: Multi-sheet support
- test_column_letter_conversion: Position calculation
- test_extract_images_with_description: Metadata extraction
- test_extract_images_with_size: Size information

Fixes #11618
2025-12-03 11:51:07 +01:00