Commit graph

69 commits

Author SHA1 Message Date
YngvarHuang
81eb03d230
Support uploading encrypted files to object storage (#11837) (#11838)
### What problem does this PR solve?

Support uploading encrypted files to object storage.

### Type of change

- [x] New Feature (non-breaking change which adds functionality)

---------

Co-authored-by: virgilwong <hyhvirgil@gmail.com>
2025-12-15 09:45:18 +08:00
Yongteng Lei
0f0fb53256
Refa: refactor metadata filter (#11907)
### What problem does this PR solve?

Refactor metadata filter.

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
- [x] Refactoring

---------

Co-authored-by: Kevin Hu <kevinhu.sh@gmail.com>
2025-12-12 17:12:38 +08:00
Magicbook1108
7db9045b74
Feat: Add box connector (#11845)
### What problem does this PR solve?

Feat: Add box connector

### Type of change

- [x] New Feature (non-breaking change which adds functionality)
2025-12-12 10:23:40 +08:00
buua436
e3cfe8e848
Fix:async issue and sensitive logging (#11895)
### What problem does this PR solve?

change:
async issue and sensitive logging

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
2025-12-11 13:54:47 +08:00
buua436
3cb72377d7
Refa:remove sensitive information (#11873)
### What problem does this PR solve?

change:
remove sensitive information

### Type of change

- [x] Refactoring
2025-12-10 19:08:45 +08:00
Lynn
a1164b9c89
Feat/memory (#11812)
### What problem does this PR solve?

Manage and display memory datasets.

### Type of change


- [x] New Feature (non-breaking change which adds functionality)
2025-12-10 13:34:08 +08:00
buua436
65a5a56d95
Refa:replace trio with asyncio (#11831)
### What problem does this PR solve?

change:
replace trio with asyncio

### Type of change
- [x] Refactoring
2025-12-09 19:23:14 +08:00
Yongteng Lei
a94b3b9df2
Refa: treat MinerU as an OCR model (#11849)
### What problem does this PR solve?

 Treat MinerU as an OCR model.

### Type of change

- [x] New Feature (non-breaking change which adds functionality)
- [x] Refactoring
2025-12-09 18:54:14 +08:00
Zhichang Yu
bb6022477e
Bump infinity to v0.6.11. Requires python>=3.11 (#11814)
### What problem does this PR solve?

Bump infinity to v0.6.11. Requires python>=3.11

### Type of change

- [x] Refactoring
2025-12-09 16:23:37 +08:00
sjIlll
1777620ea5
fix: set default embedding model for TEI profile in Docker deployment (#11824)
## What's changed
fix: unify embedding model fallback logic for both TEI and non-TEI
Docker deployments

> This fix targets **Docker / `docker-compose` deployments**, ensuring a
valid default embedding model is always set—regardless of the compose
profile used.

##  Changes

| Scenario | New Behavior |
|--------|--------------|
| **Non-`tei-` profile** (e.g., default deployment) | `EMBEDDING_MDL` is
now correctly initialized from `EMBEDDING_CFG` (derived from
`user_default_llm`), ensuring custom defaults like `bge-m3@Ollama` are
properly applied to new tenants. |
| **`tei-` profile** (`COMPOSE_PROFILES` contains `tei-`) | Still
respects the `TEI_MODEL` environment variable. If unset, falls back to
`EMBEDDING_CFG`. Only when both are empty does it use the built-in
default (`BAAI/bge-small-en-v1.5`), preventing an empty embedding model.
|

##  Why This Change?

- **In non-TEI mode**: The previous logic would reset `EMBEDDING_MDL` to
an empty string, causing pre-configured defaults (e.g., `bge-m3@Ollama`
in the Docker image) to be ignored—leading to tenant initialization
failures or silent misconfigurations.
- **In TEI mode**: Users need the ability to override the model via
`TEI_MODEL`, but without a safe fallback, missing configuration could
break the system. The new logic adopts a **“config-first,
env-var-override”** strategy for robustness in containerized
environments.

##  Implementation

- Updated the assignment logic for `EMBEDDING_MDL` in
`rag/common/settings.py` to follow a unified fallback chain:

EMBEDDING_CFG → TEI_MODEL (if tei- profile active) → built-in default


##  Testing

Verified in Docker deployments:

1. **`COMPOSE_PROFILES=`** (no TEI)  
 → New tenants get `bge-m3@Ollama` as the default embedding model 
2. **`COMPOSE_PROFILES=tei-gpu` with no `TEI_MODEL` set**  
 → Falls back to `BAAI/bge-small-en-v1.5`   
3. **`COMPOSE_PROFILES=tei-gpu` with `TEI_MODEL=my-model`**  
 → New tenants use `my-model` as the embedding model   

Closes #8916 
fix #11522 
fix #11306
2025-12-09 09:38:44 +08:00
Levi
f3a03b06b2
fix: align http client proxy kwarg (#11818)
### What problem does this PR solve?

Our HTTP wrapper still passed proxies to httpx.Client/AsyncClient, which
expect proxy. As a result, configured proxies were ignored and calls
could fail with ValueError("Failed to fetch OIDC metadata:
Client.__init__() got an unexpected keyword argument 'proxies'"). This
PR switches to the correct proxy kwarg so proxies are honored and the
runtime error is resolved.

### Type of change

- [X] Bug Fix (non-breaking change which fixes an issue)
---

Contribution during my time at RAGcon GmbH.
2025-12-09 09:35:03 +08:00
Jin Hai
43f51baa96
Fix errors (#11804)
### What problem does this PR solve?

1. typos
2. grammar errors.

### Type of change

- [x] Refactoring

Signed-off-by: Jin Hai <haijin.chn@gmail.com>
2025-12-08 12:21:18 +08:00
Jin Hai
6546f86b4e
Fix errors (#11795)
### What problem does this PR solve?

- typos
- IDE warnings

### Type of change

- [x] Refactoring

---------

Signed-off-by: Jin Hai <haijin.chn@gmail.com>
2025-12-08 09:42:10 +08:00
Jonah Hartmann
6587acef88
Feat: use filepath for files with the same name (#11752)
### What problem does this PR solve?

When there are multiple files with the same name the file would just
duplicate, making it hard to distinguish between the different files.
Now if there are multiple files with the same name, they will be named
after their folder path in the webdav storage unit.

The same could be done for the other connectors, too, since most of them
will have similars issues, when iterating through the folder paths.

### Type of change

- [x] New Feature (non-breaking change which adds functionality)

Contribution by RAGcon GmbH, visit us [here](https://www.ragcon.ai/)
2025-12-05 10:10:26 +08:00
Wiratama
cbdacf21f6
feat(gcs): Add support for Google Cloud Storage (GCS) integration (#11718)
### What problem does this PR solve?

This Pull Request introduces native support for Google Cloud Storage
(GCS) as an optional object storage backend.

Currently, RAGFlow relies on a limited set of storage options. This
feature addresses the need for seamless integration with GCP
environments, allowing users to leverage a fully managed, highly
durable, and scalable storage service (GCS) instead of needing to deploy
and maintain third-party object storage solutions. This simplifies
deployment, especially for users running on GCP infrastructure like GKE
or Cloud Run.

The implementation uses a single GCS bucket defined via configuration,
mapping RAGFlow's internal logical storage units (or "buckets") to
folder prefixes within that GCS container to maintain data separation.
This architectural choice avoids the operational complexities associated
with dynamically creating and managing unique GCS buckets for every
logical unit.

### Type of change
- [x] New Feature (non-breaking change which adds functionality)
2025-12-04 10:44:05 +08:00
Jin Hai
3c50c7d3ac
Refactor code (#11694)
### What problem does this PR solve?

Rename function and refactor log message

### Type of change

- [x] Refactoring

Signed-off-by: Jin Hai <haijin.chn@gmail.com>
2025-12-03 15:15:00 +08:00
Billy Bao
6fc7def562
Feat: optimize the information displayed when .doc preview is unavailable (#11684)
### What problem does this PR solve?

Feat: optimize the information displayed when .doc preview is
unavailable #11605

### Type of change

- [X] New Feature (non-breaking change which adds functionality)


#### Performance (Before)
<img width="700" alt="image"
src="https://github.com/user-attachments/assets/15cf69ee-3698-4e18-8e8f-bb75c321334d"
/>

#### Performance (After)

![img_v3_02sk_c0fcaf74-4a26-4b6c-b0e0-8f8929426d9g](https://github.com/user-attachments/assets/8c8eea3e-2c8e-457c-ab2b-5ef205806f42)
2025-12-03 12:22:01 +08:00
Levi
962bd5f5df
feat: improve Moodle connector functionality (#11665)
### What problem does this PR solve?

Add metadata from moodle data source.

### Type of change

- [x] New Feature (non-breaking change which adds functionality)
2025-12-02 19:12:43 +08:00
Billy Bao
c946858328
Feat: add mineru auto installer (#11649)
### What problem does this PR solve?

Feat: add mineru auto installer

### Type of change

- [x] New Feature (non-breaking change which adds functionality)
2025-12-02 17:29:26 +08:00
Yongteng Lei
b6c4722687
Refa: make RAGFlow more asynchronous (#11601)
### What problem does this PR solve?

Try to make this more asynchronous. Verified in chat and agent
scenarios, reducing blocking behavior. #11551, #11579.

However, the impact of these changes still requires further
investigation to ensure everything works as expected.

### Type of change

- [x] Refactoring
2025-12-01 14:24:06 +08:00
Billy Bao
cf7fdd274b
Feat: add gmail connector (#11549)
### What problem does this PR solve?

_Briefly describe what this PR aims to solve. Include background context
that will help reviewers understand the purpose of the PR._

### Type of change

- [x] New Feature (non-breaking change which adds functionality)
2025-11-28 13:09:40 +08:00
天海蒼灆
a9259917c6
fix(files): replace hard coded status codes with constants (#11544)
### What problem does this PR solve?

To solve the problem of error reporting caused by type errors when
various types of exception returns are triggered

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
2025-11-27 09:41:24 +08:00
Levi
12979a3f21
feat: improve metadata handling in connector service (#11421)
### What problem does this PR solve?

- Update sync data source to handle metadata properly

### Type of change

- [x] New Feature (non-breaking change which adds functionality)

---------

Co-authored-by: Kevin Hu <kevinhu.sh@gmail.com>
2025-11-26 19:55:48 +08:00
Jonah Hartmann
2fd5ac1031
Feat: Add Webdav storage as data source (#11422)
### What problem does this PR solve?

This PR adds webdav storage as data source for data sync service.

### Type of change

- [x] New Feature (non-breaking change which adds functionality)
2025-11-26 14:14:42 +08:00
Zhichang Yu
40e84ca41a
Use Infinity single-field-multi-index (#11444)
### What problem does this PR solve?

Use Infinity single-field-multi-index

### Type of change

- [x] Refactoring
- [x] Performance Improvement
2025-11-26 11:06:37 +08:00
Kinopio
a793dd2ea8
Feat: add addressing style config for S3-compatible storage (#11510)
### Type of change
* [x]  New Feature (non-breaking change which adds functionality)


Add support for Virtual Hosted Style and Path Style URL addressing in
S3_COMPATIBLE storage connector. Default to Virtual Hosted Style for
better compatibility with COS and other S3-compatible services.

- Add addressing_style field to credentials (virtual/path)
- Update frontend form with selection dropdown
- Add validation and tooltips for S3 Compatible endpoint URL

<img width="703" height="875" alt="image"
src="https://github.com/user-attachments/assets/af5ba7ca-f160-47fa-8ba1-32eace8f5fdf"
/>

<img width="1620" height="788" alt="image"
src="https://github.com/user-attachments/assets/6012b5ce-8bcb-478e-a9cb-425f886d5046"
/>

---------

Co-authored-by: Kevin Hu <kevinhu.sh@gmail.com>
2025-11-25 16:24:14 +08:00
Yongteng Lei
d1744aaaf3
Feat: add datasource Dropbox (#11488)
### What problem does this PR solve?

Add datasource Dropbox.

### Type of change

- [x] New Feature (non-breaking change which adds functionality)
2025-11-25 09:40:03 +08:00
FoolFu
3c41159d26
Update logging for auto-generated SECRET_KEY (#11458)
Remove the code that exposes the generated key in the log, as it poses a
security risk.
 
<img width="1170" height="269" alt="image"
src="https://github.com/user-attachments/assets/03c42516-af1a-49a4-ade2-4ef3ee4b3cdd"
/>

---------

Co-authored-by: Kevin Hu <kevinhu.sh@gmail.com>
2025-11-24 10:21:06 +08:00
Levi
f0a14f5fce
Add Moodle data source integration (#11325)
### What problem does this PR solve?

This PR adds a native Moodle connector to sync content (courses,
resources, forums, assignments, pages, books) into RAGFlow.

### Type of change

- [x] New Feature (non-breaking change which adds functionality)

---------

Co-authored-by: Kevin Hu <kevinhu.sh@gmail.com>
2025-11-21 19:58:49 +08:00
Yongteng Lei
065917bf1c
Feat: enriches Notion connector (#11414)
### What problem does this PR solve?

Enriches rich text (links, mentions, equations), flags to-do blocks with
[x]/[ ], captures block-level equations, builds table HTML, downloads
attachments.

### Type of change

- [x] New Feature (non-breaking change which adds functionality)
2025-11-20 19:51:37 +08:00
张哲芳
ecf0322165
fix(llm): handle None response in total_token_count_from_response (#10941)
### What problem does this PR solve?

Fixes #10933

This PR fixes a `TypeError` in the Gemini model provider where the
`total_token_count_from_response()` function could receive a `None`
response object, causing the error:

TypeError: argument of type 'NoneType' is not iterable

**Root Cause:**
The function attempted to use the `in` operator to check dictionary keys
(lines 48, 54, 60) without first validating that `resp` was not `None`.
When Gemini's `chat_streamly()` method returns `None`, this triggers the
error.

**Solution:**
1. Added a null check at the beginning of the function to return `0` if
`resp is None`
2. Added `isinstance(resp, dict)` checks before all `in` operations to
ensure type safety
3. This defensive programming approach prevents the TypeError while
maintaining backward compatibility

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)

### Changes Made

**File:** `rag/utils/__init__.py`

- Line 36-38: Added `if resp is None: return 0` check
- Line 52: Added `isinstance(resp, dict)` before `'usage' in resp`
- Line 58: Added `isinstance(resp, dict)` before `'usage' in resp`  
- Line 64: Added `isinstance(resp, dict)` before `'meta' in resp`

### Testing

- [x] Code compiles without errors
- [x] Follows existing code style and conventions
- [x] Change is minimal and focused on the specific issue

### Additional Notes

This fix ensures robust handling of various response types from LLM
providers, particularly Gemini, w

---------

Signed-off-by: Zhang Zhefang <zhangzhefang@example.com>
2025-11-20 10:04:03 +08:00
He Wang
38234aca53
feat: add OceanBase doc engine (#11228)
### What problem does this PR solve?

Add OceanBase doc engine. Close #5350

### Type of change

- [x] New Feature (non-breaking change which adds functionality)
2025-11-20 10:00:14 +08:00
Kevin Hu
c43bf1dcf5
Fix: refine error msg. (#11380)
### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
2025-11-19 19:10:45 +08:00
Kevin Hu
d1716d865a
Feat: Alter flask to Quart for async API serving. (#11275)
### What problem does this PR solve?

#11277

### Type of change

- [x] New Feature (non-breaking change which adds functionality)
2025-11-18 17:05:16 +08:00
Jonah Hartmann
89e8818dda
Feat: add s3-compatible storage boxes (#11313)
### What problem does this PR solve?

PR for implementing s3 compatible storage units #11240 

### Type of change

- [x] New Feature (non-breaking change which adds functionality)
2025-11-18 09:39:25 +08:00
Jin Hai
bd4bc57009
Refactor: move mcp connection utilities to common (#11304)
### What problem does this PR solve?

As title

### Type of change

- [x] Refactoring

---------

Signed-off-by: Jin Hai <haijin.chn@gmail.com>
2025-11-17 15:34:17 +08:00
Yongteng Lei
13e212c856
Feat: add Jira connector (#11285)
### What problem does this PR solve?

Add Jira connector.

<img width="978" height="925" alt="image"
src="https://github.com/user-attachments/assets/78bb5c77-2710-4569-a76e-9087ca23b227"
/>

---

<img width="1903" height="489" alt="image"
src="https://github.com/user-attachments/assets/193bc5c5-f751-4bd5-883a-2173282c2b96"
/>

---

<img width="1035" height="925" alt="image"
src="https://github.com/user-attachments/assets/1a0aec19-30eb-4ada-9283-61d1c915f59d"
/>

---

<img width="1905" height="601" alt="image"
src="https://github.com/user-attachments/assets/3dde1062-3f27-4717-8e09-fd5fd5e64171"
/>

### Type of change

- [x] New Feature (non-breaking change which adds functionality)
2025-11-17 09:38:04 +08:00
Yongteng Lei
908450509f
Feat: add fault-tolerant mechanism to RAPTOR (#11206)
### What problem does this PR solve?

Add fault-tolerant mechanism to RAPTOR.

### Type of change

- [x] New Feature (non-breaking change which adds functionality)
2025-11-13 18:48:07 +08:00
Yongteng Lei
2bd7abadd3
Fix: Confluence cannot retrieve updated files (#11182)
### What problem does this PR solve?

Confluence cannot retrieve updated files。

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
2025-11-12 09:37:32 +08:00
Yongteng Lei
d81e4095de
Feat: Google drive supports web-based credentials (#11173)
### What problem does this PR solve?

 Google drive supports web-based credentials.

<img width="1204" height="612" alt="image"
src="https://github.com/user-attachments/assets/70291c63-a2dd-4a80-ae20-807fe034cdbc"
/>


### Type of change

- [x] New Feature (non-breaking change which adds functionality)
2025-11-11 17:21:08 +08:00
Yongteng Lei
df16a80f25
Feat: add initial Google Drive connector support (#11147)
### What problem does this PR solve?

This feature is primarily ported from the
[Onyx](https://github.com/onyx-dot-app/onyx) project with necessary
modifications. Thanks for such a brilliant project.

Minor: consistently use `google_drive` rather than `google_driver`.

<img width="566" height="731" alt="image"
src="https://github.com/user-attachments/assets/6f64e70e-881e-42c7-b45f-809d3e0024a4"
/>

<img width="904" height="830" alt="image"
src="https://github.com/user-attachments/assets/dfa7d1ef-819a-4a82-8c52-0999f48ed4a6"
/>

<img width="911" height="869" alt="image"
src="https://github.com/user-attachments/assets/39e792fb-9fbe-4f3d-9b3c-b2265186bc22"
/>

<img width="947" height="323" alt="image"
src="https://github.com/user-attachments/assets/27d70e96-d9c0-42d9-8c89-276919b6d61d"
/>


### Type of change

- [x] New Feature (non-breaking change which adds functionality)
2025-11-10 19:15:02 +08:00
Kevin Hu
dd1c8c5779
Feat: add auto parse to connector. (#11099)
### What problem does this PR solve?

#10953

### Type of change

- [x] New Feature (non-breaking change which adds functionality)
2025-11-07 16:49:29 +08:00
Kevin Hu
34283d4db4
Feat: add data source to pipleline logs . (#11075)
### What problem does this PR solve?

#10953

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
2025-11-07 11:43:59 +08:00
Jin Hai
af98763e27
Admin: add 'show version' (#11079)
### What problem does this PR solve?

```
admin> show version;
show_version
+-----------------------+
| version               |
+-----------------------+
| v0.21.0-241-gc6cf58d5 |
+-----------------------+
admin> \q
Goodbye!

```

### Type of change

- [x] New Feature (non-breaking change which adds functionality)

---------

Signed-off-by: Jin Hai <haijin.chn@gmail.com>
2025-11-06 19:24:46 +08:00
Kevin Hu
3bd1fefe1f
Feat: debug sync data. (#11073)
### What problem does this PR solve?

#10953 

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
2025-11-06 16:48:04 +08:00
Yongteng Lei
23b81eae77
Feat: GraphRAG handle cancel gracefully (#11061)
### What problem does this PR solve?

 GraghRAG handle cancel gracefully. #10997.

### Type of change

- [x] New Feature (non-breaking change which adds functionality)
2025-11-06 16:12:20 +08:00
Jin Hai
f98b24c9bf
Move api.settings to common.settings (#11036)
### What problem does this PR solve?

As title

### Type of change

- [x] Refactoring

---------

Signed-off-by: Jin Hai <haijin.chn@gmail.com>
2025-11-06 09:36:38 +08:00
Jin Hai
02d10f8eda
Move var from rag.settings to common.globals (#11022)
### What problem does this PR solve?

As title.

### Type of change

- [x] Refactoring

---------

Signed-off-by: Jin Hai <haijin.chn@gmail.com>
2025-11-05 15:48:50 +08:00
Jin Hai
1a9215bc6f
Move some vars to globals (#11017)
### What problem does this PR solve?

As title.

### Type of change

- [x] Refactoring

---------

Signed-off-by: Jin Hai <haijin.chn@gmail.com>
2025-11-05 14:14:38 +08:00
Jin Hai
96c015fb85
Fix and refactor imports (#11010)
### What problem does this PR solve?

1. Move EMBEDDING_CFG to common.globals
2. Fix error imports
3. Move signal handles to common/signal_utils.py

### Type of change

- [x] Refactoring

---------

Signed-off-by: Jin Hai <haijin.chn@gmail.com>
2025-11-05 11:07:54 +08:00