21 commits

Author SHA1 Message Date
Zhedong Cen
b3f782b3d3
Fix dependency conflict (#1293)
### What problem does this PR solve?

Fix dependency conflict

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
2024-06-27 14:36:49 +08:00
Zhedong Cen
b75bb1d8d3
Support displaying tables in the chunks of pdf file when using QA parser (#1263)
### What problem does this PR solve?

Support displaying tables in the chunks of pdf file when using QA parser

### Type of change

- [x] New Feature (non-breaking change which adds functionality)
2024-06-24 19:02:18 +08:00
KevinHuSh
81d1c5a695
Update requirements.txt
2024-06-19 08:50:01 +08:00
rickywu
f04fb36c26
Upgrade dependency versions to fix security bugs (#1173)
### What problem does this PR solve?

Several dependencies have known security vulnerabilities and need to be upgraded; see the table below.


### Type of change

- [x] Other (please describe):

Name | Version | CVE | Upgrade to
-- | -- | -- | --
PyMySQL | 1.1.0 | CVE-2024-36039 | 1.1.1
Werkzeug | 3.0.1 | CVE-2024-34069 | 3.0.3
aiohttp | 3.9.3 | CVE-2024-30251 | 3.9.4
pillow | 10.2.0 | CVE-2024-28219 | 10.3.0
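
A minimal sketch of how the table above could be checked against an environment. The helper names (`parse_version`, `packages_needing_upgrade`) and the `installed` mapping are assumptions made for illustration and are not part of this PR; only the package names and fixed versions come from the table.

```python
def parse_version(text):
    """Turn a dotted release string such as '3.9.4' into a tuple of ints."""
    return tuple(int(part) for part in text.split("."))


# Fixed versions, copied from the table above.
FIXED_VERSIONS = {
    "PyMySQL": "1.1.1",
    "Werkzeug": "3.0.3",
    "aiohttp": "3.9.4",
    "pillow": "10.3.0",
}


def packages_needing_upgrade(installed):
    """Return names whose installed version is older than the fixed one.

    `installed` (hypothetical input) maps package name -> version string;
    packages absent from it are skipped.
    """
    return sorted(
        name
        for name, fixed in FIXED_VERSIONS.items()
        if name in installed
        and parse_version(installed[name]) < parse_version(fixed)
    )
```

Versions are compared as integer tuples rather than strings, since a plain string comparison would rank `10.2.0` below `3.0.3`. The sketch only handles purely numeric releases; pre-release or post-release tags would need a full version parser such as `packaging.version.Version`.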
2024-06-17 10:51:48 +08:00
Zhedong Cen
90975460af
Add pdf support for QA parser (#1155)
### What problem does this PR solve?

Support extracting questions and answers from PDF files

### Type of change

- [x] New Feature (non-breaking change which adds functionality)
2024-06-14 15:12:39 +08:00
Fakai Zhao
7eb69fe6d9
Supports obtaining PDF documents from web pages (#1107)
### What problem does this PR solve?

Knowledge base management supports crawling information from web pages
and generating PDF documents

### Type of change
- [x] New Feature (Support document from web pages)
2024-06-11 10:45:19 +08:00
KevinHuSh
b6980d8a16
add version to package volcengine (#1062)
### What problem does this PR solve?

#992 

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
2024-06-05 12:18:36 +08:00
Zhedong Cen
8dd45459be
Add support for HTML file (#973)
### What problem does this PR solve?

Add support for HTML file

### Type of change

- [x] New Feature (non-breaking change which adds functionality)
2024-05-30 09:12:55 +08:00
KevinHuSh
7eee193956
fix #917 #915 (#946)
### What problem does this PR solve?

#917 
#915

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
2024-05-28 11:13:02 +08:00
KevinHuSh
46454362d7
fix raptor bugs (#928)
### What problem does this PR solve?

#922 
### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
2024-05-27 11:01:20 +08:00
dashi6174
9e3a0e4d03
The fasttext library is missing, and it is used in the operators.py file. (#925)
### What problem does this PR solve?


### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
2024-05-27 08:18:47 +08:00
KevinHuSh
a6e4b74d94
remove unused dependency (#664)
### What problem does this PR solve?

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
2024-05-07 19:46:17 +08:00
Fakai Zhao
de839fc3f0
optimize srv broker and executor logic (#630)
### What problem does this PR solve?

Optimize the task broker and executor to reduce memory usage and deployment
complexity.

### Type of change
- [x] Performance Improvement
- [x] Refactoring

### Change Log
- Enhance the Redis utilities to support a message queue (using streams)
- Modify the task broker logic to work via the message queue (1. get parse
events from the message queue; 2. run them asynchronously with a
ThreadPoolExecutor)
- Rename the table column of document and task (process_duation ->
process_duration; probably just a spelling mistake)
- Reformat some code style (only where I noticed it)
- Add requirement_dev.txt for developers
- Add a Redis container to docker compose
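
The broker pattern described in the change log (take parse events from a message queue, then run them on a `ThreadPoolExecutor`) can be sketched roughly as follows. This is not the code from this PR: an in-memory `queue.Queue` stands in for the Redis stream so the example runs without a server, and `parse_document` and `drain_events` are hypothetical names.

```python
import queue
from concurrent.futures import ThreadPoolExecutor


def parse_document(event):
    """Hypothetical task handler; a real one would parse the document."""
    return f"parsed:{event['doc_id']}"


def drain_events(events, max_workers=4):
    """Pull every pending event off the queue and run it on a thread pool."""
    futures = []
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        while True:
            try:
                event = events.get_nowait()
            except queue.Empty:
                break
            futures.append(pool.submit(parse_document, event))
    # Leaving the `with` block waits for all submitted tasks to finish.
    return [future.result() for future in futures]


events = queue.Queue()  # stand-in for the Redis stream (assumption)
for doc_id in ("a", "b", "c"):
    events.put({"doc_id": doc_id})
results = drain_events(events)
```

Results are collected in submission order, so `results` lines up with the order in which events were queued even when the worker threads finish out of order.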

---------

Co-authored-by: Kevin Hu <kevinhu.sh@gmail.com>
2024-05-07 11:43:33 +08:00
Moonlit
c89f3c3cdb
Fix missing 'ollama' package in requirements.txt (#621)
### What problem does this PR solve?

This commit resolves an issue where the 'ollama' package was
inadvertently omitted from the requirements.txt file. The package has
now been added to ensure all dependencies are correctly installed for
the project.

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
2024-04-30 16:29:46 +08:00
Moonlit
5d7f573379
Fix: missing 'redis' package in requirements.txt (#622)
### What problem does this PR solve?

This commit resolves an issue where the 'redis' package was
inadvertently omitted from the requirements.txt file. The package has
now been added to ensure all dependencies are correctly installed for
the project.
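
Omissions like the missing 'redis' and 'ollama' entries can be caught by comparing a requirements list with what is actually installed. The sketch below uses only the standard library; `requirement_names` and `missing_packages` are hypothetical helpers, not part of this repository, and the parsing covers only simple name-based lines (no URLs or editable installs).

```python
import re
from importlib import metadata


def requirement_names(lines):
    """Extract distribution names from requirements.txt-style lines."""
    names = []
    for line in lines:
        line = line.split("#", 1)[0].strip()  # drop comments and whitespace
        if not line or line.startswith("-"):  # skip blanks and pip options
            continue
        match = re.match(r"[A-Za-z0-9][A-Za-z0-9._-]*", line)
        if match:
            names.append(match.group(0))
    return names


def missing_packages(lines):
    """Return requirement names that have no installed distribution."""
    missing = []
    for name in requirement_names(lines):
        try:
            metadata.version(name)
        except metadata.PackageNotFoundError:
            missing.append(name)
    return missing
```

Calling `missing_packages(open("requirements.txt"))` in a fresh environment would list the names still to be installed. The reverse problem seen here, a package that is imported but never listed, needs a scan of the source imports instead.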

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
2024-04-30 16:29:27 +08:00
KevinHuSh
cab274f560
remove PyMuPDF (#618)
### What problem does this PR solve?
#613 

### Type of change


- [x] Other (please describe):
2024-04-30 12:38:09 +08:00
chrysanthemum-boy
72384b191d
Add .doc file parser. (#497)
### What problem does this PR solve?
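Add a `.doc` file parser, using tika.
```shell
pip install tika
```
```python
from io import BytesIO

from tika import parser


def extract_text_from_doc_bytes(doc_bytes):
    """Return the text Tika extracts from the raw bytes of a .doc file."""
    # Wrap the bytes in a file-like object before handing them to Tika.
    file_like_object = BytesIO(doc_bytes)
    parsed = parser.from_buffer(file_like_object)
    # "content" holds the extracted text.
    return parsed["content"]
```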
### Type of change

- [x] New Feature (non-breaking change which adds functionality)

---------

Co-authored-by: chrysanthemum-boy <fannc@qq.com>
2024-04-23 15:31:43 +08:00
KevinHuSh
890561703b
Add bce-embedding and fastembed (#383)
### What problem does this PR solve?


Issue link:#326

### Type of change

- [x] New Feature (non-breaking change which adds functionality)
2024-04-16 16:42:19 +08:00
jie yang
a7be5d4e8b
build ragflow image from scratch (#376)
### What problem does this PR solve?

issue: #205 

### Type of change

- [x] New Feature (non-breaking change which adds functionality)
2024-04-16 12:29:58 +08:00
Anush
826ad6a33a
feat: FastEmbed embedding support (#291)
### Description

Following up on https://github.com/infiniflow/ragflow/pull/275, this PR
adds support for FastEmbed model configurations.

The options are not exhaustive. You can find the full list
[here](https://qdrant.github.io/fastembed/examples/Supported_Models/).

P.S. I ran into OOM issues when building the image.

### Type of change

- [x] New Feature (non-breaking change which adds functionality)

---------

Co-authored-by: KevinHuSh <kevinhu.sh@gmail.com>
2024-04-15 15:58:06 +08:00
KevinHuSh
71fe314955
refine page ranges (#147)
2024-03-25 13:11:57 +08:00