fix: Remove custom pdf handling and rely on filetype library (#1694)


## Description
Remove the custom PDF matcher (`CustomPdfMatcher`) and let the filetype library's built-in detection handle PDF documents.
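
The removed matcher tested `b"PDF-" in buf`, which is true for any buffer that contains those bytes anywhere, not only at the start. The sketch below, using only the standard library, contrasts that with a check anchored at the start of the buffer. The function names and sample buffers are hypothetical, and the prefix check is an assumption about how signature-based detection typically works, not a copy of the filetype implementation.

```python
def substring_match(buf: bytes) -> bool:
    """Mirror the removed CustomPdfMatcher.match: signature anywhere in buf."""
    return b"PDF-" in buf


def prefix_match(buf: bytes) -> bool:
    """Hypothetical stricter check: the '%PDF-' header must start the buffer."""
    return buf.startswith(b"%PDF-")


# A buffer that begins like a real PDF file.
pdf_header = b"%PDF-1.7\n%\xe2\xe3\xcf\xd3\n"
# Plain text that merely mentions the signature bytes.
text_mentioning_pdf = b"See the attached PDF-1.7 specification for details."

print(substring_match(pdf_header), prefix_match(pdf_header))
print(substring_match(text_mentioning_pdf), prefix_match(text_mentioning_pdf))
```

Under these assumptions, the substring check accepts both buffers, while the anchored check accepts only the one that starts with the PDF header.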

## Type of Change
<!-- Please check the relevant option -->
- [x] Bug fix (non-breaking change that fixes an issue)
- [ ] New feature (non-breaking change that adds functionality)
- [ ] Breaking change (fix or feature that would cause existing
functionality to change)
- [ ] Documentation update
- [ ] Code refactoring
- [ ] Performance improvement
- [ ] Other (please specify):

## Screenshots/Videos (if applicable)
<!-- Add screenshots or videos to help explain your changes -->

## Pre-submission Checklist
<!-- Please check all boxes that apply before submitting your PR -->
- [ ] **I have tested my changes thoroughly before submitting this PR**
- [ ] **This PR contains minimal changes necessary to address the
issue/feature**
- [ ] My code follows the project's coding standards and style
guidelines
- [ ] I have added tests that prove my fix is effective or that my
feature works
- [ ] I have added necessary documentation (if applicable)
- [ ] All new and existing tests pass
- [ ] I have searched existing PRs to ensure this change hasn't been
submitted already
- [ ] I have linked any relevant issues in the description
- [ ] My commits have clear and descriptive messages

## DCO Affirmation
I affirm that all code in every commit of this pull request conforms to
the terms of the Topoteretes Developer Certificate of Origin.
Vasilije 2025-10-29 14:48:29 +01:00 committed by GitHub
commit 76396d5d27

@@ -58,53 +58,6 @@ txt_file_type = TxtFileType()
 filetype.add_type(txt_file_type)


-class CustomPdfMatcher(filetype.Type):
-    """
-    Match PDF file types based on MIME type and extension.
-
-    Public methods:
-    - match
-
-    Instance variables:
-    - MIME: The MIME type of the PDF.
-    - EXTENSION: The file extension of the PDF.
-    """
-
-    MIME = "application/pdf"
-    EXTENSION = "pdf"
-
-    def __init__(self):
-        super(CustomPdfMatcher, self).__init__(
-            mime=CustomPdfMatcher.MIME, extension=CustomPdfMatcher.EXTENSION
-        )
-
-    def match(self, buf):
-        """
-        Determine if the provided buffer is a PDF file.
-
-        This method checks for the presence of the PDF signature in the buffer.
-
-        Raises:
-        - TypeError: If the buffer is not of bytes type.
-
-        Parameters:
-        -----------
-
-        - buf: The buffer containing the data to be checked.
-
-        Returns:
-        --------
-
-        Returns True if the buffer contains a PDF signature, otherwise returns False.
-        """
-        return b"PDF-" in buf
-
-
-custom_pdf_matcher = CustomPdfMatcher()
-
-filetype.add_type(custom_pdf_matcher)
-
-
 def guess_file_type(file: BinaryIO) -> filetype.Type:
     """
     Guess the file type from the given binary file stream.