ragflow/rag/app
chrysanthemum-boy 72384b191d
Add .doc file parser. (#497)
### What problem does this PR solve?
Add a `.doc` file parser that extracts text with Apache Tika, via the `tika` Python package.
```
pip install tika
```
```
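from io import BytesIO

from tika import parser


def extract_text_from_doc_bytes(doc_bytes):
    """Return the plain text of a .doc file given its raw bytes."""
    # Wrap the bytes in a file-like object and hand it to Tika.
    file_like_object = BytesIO(doc_bytes)
    parsed = parser.from_buffer(file_like_object)
    # "content" holds the extracted text.
    return parsed["content"]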
```
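The extracted text still has to be turned into chunkable pieces before a parser can use it. The following is a minimal sketch of that step, not the code from this PR: `text_to_sections`, `parse_doc`, and the `extract` parameter are hypothetical names, and the sketch assumes the extractor may return `None` when no text is found. Passing the extractor in as an argument keeps the sketch runnable without a Tika server.

```python
from typing import Callable, List, Optional


def text_to_sections(content: Optional[str]) -> List[str]:
    """Split extracted text into stripped, non-empty lines.

    Assumption: the extractor may return None when it finds no text,
    so None and "" both map to an empty list.
    """
    if not content:
        return []
    return [line.strip() for line in content.splitlines() if line.strip()]


def parse_doc(doc_bytes: bytes,
              extract: Callable[[bytes], Optional[str]]) -> List[str]:
    """Run a text extractor over raw .doc bytes and return its sections.

    `extract` is a hypothetical hook; in this PR's setting it would be
    something like extract_text_from_doc_bytes from the snippet above.
    """
    return text_to_sections(extract(doc_bytes))
```

With `tika` installed, a call might look like `parse_doc(binary, extract_text_from_doc_bytes)`, where `binary` holds the uploaded file's bytes.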
### Type of change

- [x] New Feature (non-breaking change which adds functionality)

---------

Co-authored-by: chrysanthemum-boy <fannc@qq.com>
2024-04-23 15:31:43 +08:00
| File | Last commit | Date |
| --- | --- | --- |
| `__init__.py` | Some document API refined. (#53) | 2024-02-02 19:21:37 +08:00 |
| `book.py` | Add .doc file parser. (#497) | 2024-04-23 15:31:43 +08:00 |
| `laws.py` | Add .doc file parser. (#497) | 2024-04-23 15:31:43 +08:00 |
| `manual.py` | enlarge docker memory usage (#501) | 2024-04-23 14:41:10 +08:00 |
| `naive.py` | Add .doc file parser. (#497) | 2024-04-23 15:31:43 +08:00 |
| `one.py` | Add .doc file parser. (#497) | 2024-04-23 15:31:43 +08:00 |
| `paper.py` | enlarge docker memory usage (#501) | 2024-04-23 14:41:10 +08:00 |
| `picture.py` | refine OpenAi Api (#159) | 2024-03-27 17:55:45 +08:00 |
| `presentation.py` | apply pep8 formalize (#155) | 2024-03-27 11:33:46 +08:00 |
| `qa.py` | Fit a lot of encodings for text file. (#458) | 2024-04-19 18:02:53 +08:00 |
| `resume.py` | apply pep8 formalize (#155) | 2024-03-27 11:33:46 +08:00 |
| `table.py` | Fit a lot of encodings for text file. (#458) | 2024-04-19 18:02:53 +08:00 |