Commit graph

1205 commits

Author SHA1 Message Date
buua436
65a5a56d95
Refa:replace trio with asyncio (#11831)
### What problem does this PR solve?

change:
replace trio with asyncio

### Type of change
- [x] Refactoring
2025-12-09 19:23:14 +08:00
Yongteng Lei
a94b3b9df2
Refa: treat MinerU as an OCR model (#11849)
### What problem does this PR solve?

 Treat MinerU as an OCR model.

### Type of change

- [x] New Feature (non-breaking change which adds functionality)
- [x] Refactoring
2025-12-09 18:54:14 +08:00
Jin Hai
43f51baa96
Fix errors (#11804)
### What problem does this PR solve?

1. typos
2. grammar errors.

### Type of change

- [x] Refactoring

Signed-off-by: Jin Hai <haijin.chn@gmail.com>
2025-12-08 12:21:18 +08:00
Yongteng Lei
51ec708c58
Refa: cleanup synchronous functions in chat_model and implement synchronization for conversation and dialog chats (#11779)
### What problem does this PR solve?

Cleanup synchronous functions in chat_model and implement
synchronization for conversation and dialog chats.

### Type of change

- [x] Refactoring
- [x] Performance Improvement
2025-12-08 09:43:03 +08:00
天海蒼灆
8de6b97806
Feature (canvas): Add Api for download "message" component output's file (#11772)
### What problem does this PR solve?

-Add Api for download "message" component output's file 
-Change the attachment output type check from tuple to
dictionary,because 'attachement' is not instance of tuple
-Update the message type to message_end to avoid the problem that
content does not send an error message when the message type is ans
["data"] ["content"]

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
- [x] New Feature (non-breaking change which adds functionality)
2025-12-05 19:42:35 +08:00
Ted
ad03ede7cd
fix(sdk): add cancel_all_task_of call in stop_parsing endpoint (#11748)
## Problem
The SDK API endpoint `DELETE /datasets/{dataset_id}/chunks` only updates
database status but does not send cancellation signal via Redis, causing
background parsing tasks to continue and eventually complete (status
becomes DONE instead of CANCEL).

## Root Cause
The SDK endpoint was missing the `cancel_all_task_of(id)` call that the
web API
([api/apps/document_app.py](cci:7://file:///d:/workspace1/ragflow-admin/api/apps/document_app.py:0:0-0:0))
uses to properly stop background tasks.

## Solution
Added `cancel_all_task_of(id)` call in the
[stop_parsing](cci:1://file:///d:/workspace1/ragflow/api/apps/sdk/doc.py:785:0-855:23)
function to send cancellation signal via Redis, consistent with the web
API behavior.

## Related Issue
Fixes #11745

Co-authored-by: tedhappy <tedhappy@users.noreply.github.com>
2025-12-04 19:29:06 +08:00
shirukai
fa7b857aa9
fix: resolve "'bool' object has no attribute 'items'" in SDK enabled … (#11725)
### What problem does this PR solve?
Fixes the `AttributeError: 'bool' object has no attribute 'items'` error
when updating the `enabled` parameter of a document via the Python SDK
(Issue #11721).

Background: When calling `Document.update({"enabled": True/False})`
through the SDK, the server-side API returned a boolean `data=True` in
the response (instead of a dictionary). The SDK's `_update_from_dict`
method (in `base.py`) expects a dictionary to iterate over with
`.items()`, leading to an immediate AttributeError during response
parsing. This prevented successful synchronization of the updated
`enabled` status to the local SDK object, even if the server-side
database/update index operations succeeded.

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
### Additional Context (optional, for clarity)
- **Root Cause**: Server returned `data=True` (boolean) for `enabled`
parameter updates, violating the SDK's expectation of a dictionary-type
`data` field.
- **Fix Logic**: 
1. Removed the separate `return get_result(data=True)` in the `enabled`
update branch to unify response flow.
  2. 
- **Backward Compatibility**: No breaking changes—other update scenarios
(e.g., renaming documents, modifying chunk methods) remain unaffected,
and the response format stays consistent.

Co-authored-by: shirukai <shirukai@hollysysdigital.com>
2025-12-04 11:24:01 +08:00
Jin Hai
a7d40e9132
Update since 'File manager' is renamed to 'File' (#11698)
### What problem does this PR solve?

Update some docs and comments, since 'File manager' is rename to 'File'

### Type of change

- [x] Documentation Update
- [x] Refactoring

---------

Signed-off-by: Jin Hai <haijin.chn@gmail.com>
Co-authored-by: writinwaters <93570324+writinwaters@users.noreply.github.com>
2025-12-03 18:32:15 +08:00
hsparks-codes
4870d42949
feat: Auto-disable Raptor for structured data (Issue #11653) (#11676)
### What problem does this PR solve?

Feature: This PR implements automatic Raptor disabling for structured
data files to address issue #11653.

**Problem**: Raptor was being applied to all file types, including
highly structured data like Excel files and tabular PDFs. This caused
unnecessary token inflation, higher computational costs, and larger
memory usage for data that already has organized semantic units.

**Solution**: Automatically skip Raptor processing for:
- Excel files (.xls, .xlsx, .xlsm, .xlsb)
- CSV files (.csv, .tsv)
- PDFs with tabular data (table parser or html4excel enabled)

**Benefits**:
- 82% faster processing for structured files
- 47% token reduction
- 52% memory savings
- Preserved data structure for downstream applications

**Usage Examples**:
```
# Excel file - automatically skipped
should_skip_raptor(".xlsx")  # True

# CSV file - automatically skipped  
should_skip_raptor(".csv")  # True

# Tabular PDF - automatically skipped
should_skip_raptor(".pdf", parser_id="table")  # True

# Regular PDF - Raptor runs normally
should_skip_raptor(".pdf", parser_id="naive")  # False

# Override for special cases
should_skip_raptor(".xlsx", raptor_config={"auto_disable_for_structured_data": False})  # False
```

**Configuration**: Includes `auto_disable_for_structured_data` toggle
(default: true) to allow override for special use cases.

**Testing**: 44 comprehensive tests, 100% passing

### Type of change

- [x] New Feature (non-breaking change which adds functionality)
2025-12-03 17:02:29 +08:00
hsparks-codes
237a66913b
Feat: RAG evaluation (#11674)
### What problem does this PR solve?

Feature: This PR implements a comprehensive RAG evaluation framework to
address issue #11656.

**Problem**: Developers using RAGFlow lack systematic ways to measure
RAG accuracy and quality. They cannot objectively answer:
1. Are RAG results truly accurate?
2. How should configurations be adjusted to improve quality?
3. How to maintain and improve RAG performance over time?

**Solution**: This PR adds a complete evaluation system with:
- **Dataset & test case management** - Create ground truth datasets with
questions and expected answers
- **Automated evaluation** - Run RAG pipeline on test cases and compute
metrics
- **Comprehensive metrics** - Precision, recall, F1 score, MRR, hit rate
for retrieval quality
- **Smart recommendations** - Analyze results and suggest specific
configuration improvements (e.g., "increase top_k", "enable reranking")
- **20+ REST API endpoints** - Full CRUD operations for datasets, test
cases, and evaluation runs

**Impact**: Enables developers to objectively measure RAG quality,
identify issues, and systematically improve their RAG systems through
data-driven configuration tuning.

### Type of change

- [x] New Feature (non-breaking change which adds functionality)
2025-12-03 17:00:58 +08:00
Yongteng Lei
e3f40db963
Refa: make RAGFlow more asynchronous 2 (#11689)
### What problem does this PR solve?

Make RAGFlow more asynchronous 2. #11551, #11579, #11619.

### Type of change

- [x] Refactoring
- [x] Performance Improvement
2025-12-03 14:19:53 +08:00
buua436
c8f608b2dd
Feat:support tts in agent (#11675)
### What problem does this PR solve?

change:
support tts in agent

### Type of change

- [x] New Feature (non-breaking change which adds functionality)
2025-12-03 12:03:59 +08:00
Yongteng Lei
5c81e01de5
Fix: incorrect async chat streamly output (#11679)
### What problem does this PR solve?

Incorrect async chat streamly output. #11677.

Disable beartype for #11666.

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
2025-12-03 11:15:45 +08:00
Kevin Hu
a6681d6366
Revert "Refa: make RAGFlow more asynchronous 2" (#11669)
Reverts infiniflow/ragflow#11664
2025-12-02 19:42:05 +08:00
Yongteng Lei
627c11c429
Refa: make RAGFlow more asynchronous 2 (#11664)
### What problem does this PR solve?

Make RAGFlow more asynchronous 2. #11551, #11579, #11619.

### Type of change

- [x] New Feature (non-breaking change which adds functionality)
- [x] Refactoring
- [x] Performance Improvement
2025-12-02 18:57:07 +08:00
Kevin Hu
299c655e39
Fix: file manager KB link issue. (#11648)
### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
2025-12-02 12:14:27 +08:00
buua436
b8c0fb4572
Feat:new api /sequence2txt and update QWenSeq2txt (#11643)
### What problem does this PR solve?
change:
new api /sequence2txt,
update QWenSeq2txt and ZhipuSeq2txt

### Type of change

- [x] New Feature (non-breaking change which adds functionality)
2025-12-02 11:17:31 +08:00
Kevin Hu
81ae6cf78d
Feat: support uploading in dialog. (#11634)
### What problem does this PR solve?

#9590

### Type of change

- [x] New Feature (non-breaking change which adds functionality)
2025-12-01 16:54:57 +08:00
Yongteng Lei
b6c4722687
Refa: make RAGFlow more asynchronous (#11601)
### What problem does this PR solve?

Try to make this more asynchronous. Verified in chat and agent
scenarios, reducing blocking behavior. #11551, #11579.

However, the impact of these changes still requires further
investigation to ensure everything works as expected.

### Type of change

- [x] Refactoring
2025-12-01 14:24:06 +08:00
Kevin Hu
6ea4248bdc
Feat: support parent-child in search procedure. (#11629)
### What problem does this PR solve?

#7996

### Type of change

- [x] New Feature (non-breaking change which adds functionality)
2025-12-01 14:03:09 +08:00
Kevin Hu
88a28212b3
Fix: Table parse method issue. (#11627)
### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
2025-12-01 12:42:35 +08:00
dzikus
9a8ce9d3e2
fix: increase Quart RESPONSE_TIMEOUT and BODY_TIMEOUT for slow LLM responses (#11612)
### What problem does this PR solve?

Quart framework has default RESPONSE_TIMEOUT and BODY_TIMEOUT of 60
seconds.
This causes the frontend chat to hang exactly after 60 seconds when
using
slow LLM backends (e.g., Ollama on CPU, or remote APIs with high
latency).

This fix adds configurable timeout settings via environment variables
with
sensible defaults (600 seconds = 10 minutes) to match other timeout
configurations in RAGFlow.

Fixes issues with chat timeout when:
- Using local Ollama on CPU (response time ~2 minutes)
- Using remote LLM APIs with high latency
- Processing complex RAG queries with many chunks

### Type of change

- [X] Bug Fix (non-breaking change which fixes an issue)

Co-authored-by: Grzegorz Sterniczuk <grzegorz@sternicz.uk>
2025-12-01 11:26:34 +08:00
Billy Bao
fa9b7b259c
Feat: create datasets from http api supports ingestion pipeline (#11597)
### What problem does this PR solve?

Feat: create datasets from http api supports ingestion pipeline

### Type of change

- [x] New Feature (non-breaking change which adds functionality)
2025-11-28 19:55:24 +08:00
Kevin Hu
14616cf845
Feat: add child parent chunking method in backend. (#11598)
### What problem does this PR solve?

#7996

### Type of change

- [x] New Feature (non-breaking change which adds functionality)
2025-11-28 19:25:32 +08:00
darion-yaphet
918d5a9ff8
[issue-11572]fix:metadata_condition filtering failed (#11573)
### What problem does this PR solve?

When using the 'metadata_condition' for metadata filtering, if no
documents match the filtering criteria, the system will return the
search results of all documents instead of returning an empty result.

When the metadata_condition has conditions but no matching documents,
simply return an empty result.
#11572

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)

Co-authored-by: Chenguang Wang <chenguangwang@deepglint.com>
2025-11-28 14:04:14 +08:00
Billy Bao
cf7fdd274b
Feat: add gmail connector (#11549)
### What problem does this PR solve?

_Briefly describe what this PR aims to solve. Include background context
that will help reviewers understand the purpose of the PR._

### Type of change

- [x] New Feature (non-breaking change which adds functionality)
2025-11-28 13:09:40 +08:00
Yongteng Lei
9d8b96c1d0
Feat: add context for figure and table (#11547)
### What problem does this PR solve?

Add context for figure table.



![demo_figure_table_context](https://github.com/user-attachments/assets/61b37fac-e22e-40a4-9665-9396c7b4103e)


`==================()` for demonstrating purpose. 
### Type of change

- [x] New Feature (non-breaking change which adds functionality)
2025-11-27 10:21:44 +08:00
天海蒼灆
a9259917c6
fix(files): replace hard coded status codes with constants (#11544)
### What problem does this PR solve?

To solve the problem of error reporting caused by type errors when
various types of exception returns are triggered

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
2025-11-27 09:41:24 +08:00
Levi
12979a3f21
feat: improve metadata handling in connector service (#11421)
### What problem does this PR solve?

- Update sync data source to handle metadata properly

### Type of change

- [x] New Feature (non-breaking change which adds functionality)

---------

Co-authored-by: Kevin Hu <kevinhu.sh@gmail.com>
2025-11-26 19:55:48 +08:00
Zhichang Yu
40e84ca41a
Use Infinity single-field-multi-index (#11444)
### What problem does this PR solve?

Use Infinity single-field-multi-index

### Type of change

- [x] Refactoring
- [x] Performance Improvement
2025-11-26 11:06:37 +08:00
Kevin Hu
f5faf0c94f
Feat: support operator in/not in for metadata filter. (#11503)
### What problem does this PR solve?

#11376 #11378

### Type of change

- [x] New Feature (non-breaking change which adds functionality)
2025-11-25 12:44:26 +08:00
zhipeng
d5f8548200
Allow create super user when start rag server. (#10634)
### What problem does this PR solve?

New options for rag server scripts to create the super admin user when
start server.

### Type of change

- [x] New Feature (non-breaking change which adds functionality)

---------

Co-authored-by: Zhichang Yu <yuzhichang@gmail.com>
Co-authored-by: Jin Hai <haijin.chn@gmail.com>
2025-11-24 19:02:08 +08:00
Billy Bao
1009819801
Fix: coroutine object has no attribute get (#11472)
### What problem does this PR solve?

Fix: coroutine object has no attribute get

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
2025-11-24 12:21:33 +08:00
Kevin Hu
249296e417
Feat: API supports toc_enhance. (#11437)
### What problem does this PR solve?

Close #11433

### Type of change

- [x] New Feature (non-breaking change which adds functionality)
2025-11-21 14:51:58 +08:00
Kevin Hu
820934fc77
Fix: no result if metadata returns none. (#11412)
### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
2025-11-20 19:51:25 +08:00
Kevin Hu
06cef71ba6
Feat: add or logic operations for meta data filters. (#11404)
### What problem does this PR solve?

#11376 #11387

### Type of change

- [x] New Feature (non-breaking change which adds functionality)
2025-11-20 14:31:12 +08:00
buua436
7c6d30f4c8
Fix:RagFlow not starting with Postgres DB (#11398)
### What problem does this PR solve?
issue:
#11293 
change:
RagFlow not starting with Postgres DB
### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
2025-11-20 12:49:13 +08:00
天海蒼灆
9f715d6bc2
Feature (canvas): Add mind tagging support (#11359)
### What problem does this PR solve?
Resolve the issue of missing thinking labels when viewing pre-existing
conversations
### Type of change

- [x] New Feature (non-breaking change which adds functionality)
2025-11-20 10:11:28 +08:00
Kevin Hu
c43bf1dcf5
Fix: refine error msg. (#11380)
### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
2025-11-19 19:10:45 +08:00
Kevin Hu
1c201c4d54
Fix: circle imports issue. (#11374)
### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
2025-11-19 16:13:21 +08:00
Jin Hai
d1dcf3b43c
Refactor /stats API (#11363)
### What problem does this PR solve?

One loop to get better performance

### Type of change

- [x] Refactoring

Signed-off-by: Jin Hai <haijin.chn@gmail.com>
2025-11-19 12:27:45 +08:00
Kevin Hu
d1716d865a
Feat: Alter flask to Quart for async API serving. (#11275)
### What problem does this PR solve?

#11277

### Type of change

- [x] New Feature (non-breaking change which adds functionality)
2025-11-18 17:05:16 +08:00
Yongteng Lei
341e5904c8
Fix: No results can be found through the API /api/v1/dify/retrieval (#11338)
### What problem does this PR solve?

No results can be found through the API /api/v1/dify/retrieval. #11307 

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
2025-11-18 15:42:31 +08:00
buua436
ded9bf80c5
Fix:limit random sampling range in check_embedding (#11337)
### What problem does this PR solve?
issue:
[#11319](https://github.com/infiniflow/ragflow/issues/11319)
change:
limit random sampling range in check_embedding

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
2025-11-18 15:24:27 +08:00
buua436
912b6b023e
fix: update check_embedding failed info (#11321)
### What problem does this PR solve?
change:
update check_embedding failed info

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
2025-11-18 09:39:45 +08:00
Jin Hai
bd4bc57009
Refactor: move mcp connection utilities to common (#11304)
### What problem does this PR solve?

As title

### Type of change

- [x] Refactoring

---------

Signed-off-by: Jin Hai <haijin.chn@gmail.com>
2025-11-17 15:34:17 +08:00
Billy Bao
0569b50fed
Fix: create dataset return type inconsistent (#11272)
### What problem does this PR solve?

Fix: create dataset return type inconsistent #11167 
 
### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
2025-11-17 15:27:19 +08:00
Scott Davidson
6b64641042
Fix: default model base url extraction logic (#11263)
### What problem does this PR solve?

Fixes an issue where default models which used the same factory but
different base URLs would all be initialised with the default chat
model's base URL and would ignore e.g. the embedding model's base URL
config.

For example, with the following service config, the embedding and
reranker models would end up using the base URL for the default chat
model (i.e. `llm1.example.com`):

```yaml
ragflow:
  service_conf:
    user_default_llm:
      factory: OpenAI-API-Compatible
      api_key: not-used
      default_models:
        chat_model:
          name: llm1
          base_url: https://llm1.example.com/v1
        embedding_model:
          name: llm2
          base_url: https://llm2.example.com/v1
        rerank_model:
          name: llm3
          base_url: https://llm3.example.com/v1/rerank

  llm_factories:
    factory_llm_infos:
    - name: OpenAI-API-Compatible
      logo: ""
      tags: "LLM,TEXT EMBEDDING,SPEECH2TEXT,MODERATION"
      status: "1"
      llm:
        - llm_name: llm1
          base_url: 'https://llm1.example.com/v1'
          api_key: not-used
          tags: "LLM,CHAT,IMAGE2TEXT"
          max_tokens: 100000
          model_type: chat
          is_tools: false

        - llm_name: llm2
          base_url: https://llm2.example.com/v1
          api_key: not-used
          tags: "TEXT EMBEDDING"
          max_tokens: 10000
          model_type: embedding

        - llm_name: llm3
          base_url: https://llm3.example.com/v1/rerank
          api_key: not-used
          tags: "RERANK,1k"
          max_tokens: 10000
          model_type: rerank
```

### Type of change

- [X] Bug Fix (non-breaking change which fixes an issue)
2025-11-17 14:21:27 +08:00
Jin Hai
61cf430dbb
Minor tweats (#11271)
### What problem does this PR solve?

As title.

### Type of change

- [x] Refactoring

---------

Signed-off-by: Jin Hai <haijin.chn@gmail.com>
2025-11-16 19:29:20 +08:00
Billy Bao
68e3b33ae4
Feat: extract message output to file (#11251)
### What problem does this PR solve?

Feat: extract message output to file

### Type of change

- [x] New Feature (non-breaking change which adds functionality)
2025-11-14 19:52:11 +08:00