• Enhanced UTF-8 validation for text files
• Added content validation checks
• Better handling of binary data
• Added logging for ignored document IDs
• Improved document ID filtering
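
The changes listed above could fit together roughly as in the following sketch. It is an illustration only, not the project's actual code: the names `decode_text`, `filter_documents`, `ignored_ids`, and the logger name `doc_filter` are all hypothetical, and the NUL-byte test is an assumed heuristic for detecting binary data.

```python
import logging
from typing import Dict, Optional, Set

# Hypothetical logger name, chosen for this example only.
logger = logging.getLogger("doc_filter")


def decode_text(data: bytes) -> Optional[str]:
    """Return the decoded text, or None if the data looks binary or is not valid UTF-8."""
    # Assumed heuristic: NUL bytes almost never appear in real text files.
    if b"\x00" in data:
        return None
    try:
        # bytes.decode is strict by default and raises on malformed UTF-8.
        return data.decode("utf-8")
    except UnicodeDecodeError:
        return None


def filter_documents(docs: Dict[str, bytes], ignored_ids: Set[str]) -> Dict[str, str]:
    """Keep documents that are not ignored and that pass content validation."""
    kept: Dict[str, str] = {}
    for doc_id, data in docs.items():
        if doc_id in ignored_ids:
            # Log each ignored ID so that filtering decisions can be traced.
            logger.info("Ignoring document ID %s", doc_id)
            continue
        text = decode_text(data)
        # Content validation: reject undecodable, binary, or whitespace-only content.
        if text is None or not text.strip():
            logger.warning("Skipping document %s: failed content validation", doc_id)
            continue
        kept[doc_id] = text
    return kept
```

Calling `filter_documents(docs, ignored_ids)` with a mapping of document IDs to raw bytes returns only the entries that decode cleanly and are not in the ignore set. Every skipped ID is logged, so dropped documents remain traceable.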