Commit graph

1234 commits

Author SHA1 Message Date
aka James4u
3779ca5a87
Merge 6e5dbbe925 into 2a0f835ffe 2025-12-15 11:43:56 +08:00
Kevin Hu
44dec89f1f
Fix: aspose-slide issue. (#11935)
### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
2025-12-12 20:16:18 +08:00
Yongteng Lei
0f0fb53256
Refa: refactor metadata filter (#11907)
### What problem does this PR solve?

Refactor metadata filter.

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
- [x] Refactoring

---------

Co-authored-by: Kevin Hu <kevinhu.sh@gmail.com>
2025-12-12 17:12:38 +08:00
Magicbook1108
50715ba332
Fix: forget-reset password (#11927)
### What problem does this PR solve?

Fix: forget-reset password

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
2025-12-12 16:16:17 +08:00
Yongteng Lei
6560388f2b
Fix: correct metadata update behavior (#11919)
### What problem does this PR solve?

Correct metadata update behavior. #11912

When update `value` is omitted, the corresponding keys are updated to
`"value"` regardless of their current values.

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
2025-12-12 12:50:17 +08:00
Magicbook1108
7db9045b74
Feat: Add box connector (#11845)
### What problem does this PR solve?

Feat: Add box connector

### Type of change

- [x] New Feature (non-breaking change which adds functionality)
2025-12-12 10:23:40 +08:00
aka James4u
6e5dbbe925
Merge branch 'main' into feature/websocket-streaming-api 2025-12-11 04:22:48 -08:00
Kevin Hu
ea4a5cd665
Fix: tokenizer issue. (#11902)
#11786
### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
2025-12-11 17:38:17 +08:00
Yongteng Lei
e9710b7aa9
Refa: treat MinerU as an OCR model 2 (#11905)
### What problem does this PR solve?

Treat MinerU as an OCR model 2. #11903

### Type of change

- [x] Refactoring
2025-12-11 17:33:12 +08:00
TeslaZY
bd0eff2954
Add DeepseekV3.2 of Tongyi-Qianwen and remove unused code (#11898)
### What problem does this PR solve?

Add DeepseekV3.2 of Tongyi-Qianwen and remove unused code

### Type of change

- [x] New Feature (non-breaking change which adds functionality)
2025-12-11 13:55:01 +08:00
buua436
e3cfe8e848
Fix:async issue and sensitive logging (#11895)
### What problem does this PR solve?

change:
async issue and sensitive logging

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
2025-12-11 13:54:47 +08:00
TeslaZY
c610bb605a
Added semi-automatic mode to the metadata filter (#11886)
### What problem does this PR solve?

Retrieval metadata filtering adds semi-automatic mode, and users can
manually check the metadata key that participates in LLM to generate
filter conditions.
### Type of change

- [x] New Feature (non-breaking change which adds functionality)
2025-12-11 10:45:21 +08:00
Yongteng Lei
8370bc61b7
Feat: enhance metadata operation (#11874)
### What problem does this PR solve?

Add metadata condition in document list.
Add metadata bulk update.
Add metadata summary.

### Type of change

- [x] New Feature (non-breaking change which adds functionality)
- [x] Documentation Update
2025-12-11 09:59:15 +08:00
N0bodycan
74eb894453
Fix RuntimeError: asyncio.run() cannot be called from a running event loop when calling mindmap endpoint. (#11880)
### What problem does this PR solve?

Fix RuntimeError when calling mindmap endpoint by converting
`gen_mindmap()` to async function and using `await` instead of
`asyncio.run()`.

### Type of change

- [ ] Bug Fix (non-breaking change which fixes an issue)

---------

Co-authored-by: Kevin Hu <kevinhu.sh@gmail.com>
2025-12-11 09:47:44 +08:00
aka James4u
1fcdf2e621
Merge branch 'main' into feature/websocket-streaming-api 2025-12-10 04:16:24 -08:00
buua436
3cb72377d7
Refa:remove sensitive information (#11873)
### What problem does this PR solve?

change:
remove sensitive information

### Type of change

- [x] Refactoring
2025-12-10 19:08:45 +08:00
Lynn
a1164b9c89
Feat/memory (#11812)
### What problem does this PR solve?

Manage and display memory datasets.

### Type of change


- [x] New Feature (non-breaking change which adds functionality)
2025-12-10 13:34:08 +08:00
aka James4u
1315abf348
Merge branch 'main' into feature/websocket-streaming-api 2025-12-09 06:27:29 -08:00
buua436
65a5a56d95
Refa:replace trio with asyncio (#11831)
### What problem does this PR solve?

change:
replace trio with asyncio

### Type of change
- [x] Refactoring
2025-12-09 19:23:14 +08:00
Yongteng Lei
a94b3b9df2
Refa: treat MinerU as an OCR model (#11849)
### What problem does this PR solve?

 Treat MinerU as an OCR model.

### Type of change

- [x] New Feature (non-breaking change which adds functionality)
- [x] Refactoring
2025-12-09 18:54:14 +08:00
SmartDever02
1e10287a03 Fix ImportError about completion 2025-12-09 06:07:19 -03:00
SmartDever02
5ee639fa5a Fix the test issue 2025-12-08 16:42:55 -03:00
SmartDever02
e03df5b93a Merge branch 'feature/websocket-streaming-api' of github.com-james:SmartDever02/ragflow into feature/websocket-streaming-api 2025-12-08 16:22:26 -03:00
SmartDever02
22ba48e89d fix: Remove f-string prefixes from logging statements without placeholders - Line 309, 330, 344 in api_utils.py - Fixes ruff F541 linting errors 2025-12-08 16:12:25 -03:00
aka James4u
7c2c6f574c
Merge branch 'main' into feature/websocket-streaming-api 2025-12-08 11:08:47 -08:00
Jin Hai
43f51baa96
Fix errors (#11804)
### What problem does this PR solve?

1. typos
2. grammar errors.

### Type of change

- [x] Refactoring

Signed-off-by: Jin Hai <haijin.chn@gmail.com>
2025-12-08 12:21:18 +08:00
Yongteng Lei
51ec708c58
Refa: cleanup synchronous functions in chat_model and implement synchronization for conversation and dialog chats (#11779)
### What problem does this PR solve?

Cleanup synchronous functions in chat_model and implement
synchronization for conversation and dialog chats.

### Type of change

- [x] Refactoring
- [x] Performance Improvement
2025-12-08 09:43:03 +08:00
天海蒼灆
8de6b97806
Feature (canvas): Add Api for download "message" component output's file (#11772)
### What problem does this PR solve?

-Add Api for download "message" component output's file 
-Change the attachment output type check from tuple to
dictionary,because 'attachement' is not instance of tuple
-Update the message type to message_end to avoid the problem that
content does not send an error message when the message type is ans
["data"] ["content"]

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
- [x] New Feature (non-breaking change which adds functionality)
2025-12-05 19:42:35 +08:00
SmartDever02
081f7f7b74 refactor: Move WebSocket to SDK pattern with /ws/ prefix - Moved to api/apps/sdk/websocket.py following session.py pattern - Added ws_token_required decorator - WebSocket endpoints: /ws/chats/<id>/completions and /ws/agents/<id>/completions - Prevents routing conflicts with HTTP endpoints 2025-12-05 01:32:32 -03:00
SmartDever02
9ce780fefd refactor: Move WebSocket API to SDK pattern following session.py
- Moved websocket_app.py to api/apps/sdk/websocket.py
- Follows same structure as session.py for SDK endpoints
- Added ws_token_required decorator in api_utils.py (mirrors token_required)
- WebSocket endpoints now use SDK pattern:
  * @manager.websocket('/chats/<chat_id>/completions')
  * @manager.websocket('/agents/<agent_id>/completions')
- Removed old api/apps/websocket_app.py
- Added websockets>=14.0 and pytest-asyncio>=0.24.0 to test dependencies

Addresses reviewer feedback: websocket_app.py should mimic session.py in /api/sdk
for third-party calls, with /agents/<agent_id>/completions and
/chats/<chat_id>/completions endpoints similar to those in session.py
2025-12-05 01:32:00 -03:00
aka James4u
82d621c111
Merge branch 'main' into feature/websocket-streaming-api 2025-12-04 05:41:37 -08:00
Ted
ad03ede7cd
fix(sdk): add cancel_all_task_of call in stop_parsing endpoint (#11748)
## Problem
The SDK API endpoint `DELETE /datasets/{dataset_id}/chunks` only updates
database status but does not send cancellation signal via Redis, causing
background parsing tasks to continue and eventually complete (status
becomes DONE instead of CANCEL).

## Root Cause
The SDK endpoint was missing the `cancel_all_task_of(id)` call that the
web API
([api/apps/document_app.py](cci:7://file:///d:/workspace1/ragflow-admin/api/apps/document_app.py:0:0-0:0))
uses to properly stop background tasks.

## Solution
Added `cancel_all_task_of(id)` call in the
[stop_parsing](cci:1://file:///d:/workspace1/ragflow/api/apps/sdk/doc.py:785:0-855:23)
function to send cancellation signal via Redis, consistent with the web
API behavior.

## Related Issue
Fixes #11745

Co-authored-by: tedhappy <tedhappy@users.noreply.github.com>
2025-12-04 19:29:06 +08:00
shirukai
fa7b857aa9
fix: resolve "'bool' object has no attribute 'items'" in SDK enabled … (#11725)
### What problem does this PR solve?
Fixes the `AttributeError: 'bool' object has no attribute 'items'` error
when updating the `enabled` parameter of a document via the Python SDK
(Issue #11721).

Background: When calling `Document.update({"enabled": True/False})`
through the SDK, the server-side API returned a boolean `data=True` in
the response (instead of a dictionary). The SDK's `_update_from_dict`
method (in `base.py`) expects a dictionary to iterate over with
`.items()`, leading to an immediate AttributeError during response
parsing. This prevented successful synchronization of the updated
`enabled` status to the local SDK object, even if the server-side
database/update index operations succeeded.

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
### Additional Context (optional, for clarity)
- **Root Cause**: Server returned `data=True` (boolean) for `enabled`
parameter updates, violating the SDK's expectation of a dictionary-type
`data` field.
- **Fix Logic**: 
1. Removed the separate `return get_result(data=True)` in the `enabled`
update branch to unify response flow.
  2. 
- **Backward Compatibility**: No breaking changes—other update scenarios
(e.g., renaming documents, modifying chunk methods) remain unaffected,
and the response format stays consistent.

Co-authored-by: shirukai <shirukai@hollysysdigital.com>
2025-12-04 11:24:01 +08:00
aka James4u
70200e430c
Merge branch 'main' into feature/websocket-streaming-api 2025-12-03 03:48:47 -08:00
Jin Hai
a7d40e9132
Update since 'File manager' is renamed to 'File' (#11698)
### What problem does this PR solve?

Update some docs and comments, since 'File manager' is rename to 'File'

### Type of change

- [x] Documentation Update
- [x] Refactoring

---------

Signed-off-by: Jin Hai <haijin.chn@gmail.com>
Co-authored-by: writinwaters <93570324+writinwaters@users.noreply.github.com>
2025-12-03 18:32:15 +08:00
SmartDever02
eaa38d6a7f fix the CLI issue 2025-12-03 07:14:40 -03:00
SmartDever02
e300873288 feat: Add WebSocket API for streaming responses
- Add WebSocket endpoint at /v1/ws/chat for real-time streaming
- Support multiple authentication methods (API token, user session, query params)
- Enable bidirectional communication for platforms like WeChat Mini Programs
- Implement streaming chat completions with incremental responses
- Add comprehensive error handling and connection management
- Include extensive inline documentation and comments

New files:
- api/apps/websocket_app.py: Main WebSocket API implementation
- docs/guides/websocket_api.md: Complete API documentation
- example/websocket/python_client.py: Python example client
- example/websocket/index.html: Web-based demo client
- example/websocket/README.md: Examples documentation

Features:
- Persistent WebSocket connections for multi-turn conversations
- Session management for conversation continuity
- Real-time streaming with low latency
- Compatible with WeChat Mini Programs and mobile apps
- Health check endpoint for connectivity testing
- Backward compatible with existing SSE endpoints

Resolves: #11683
2025-12-03 06:49:36 -03:00
hsparks-codes
4870d42949
feat: Auto-disable Raptor for structured data (Issue #11653) (#11676)
### What problem does this PR solve?

Feature: This PR implements automatic Raptor disabling for structured
data files to address issue #11653.

**Problem**: Raptor was being applied to all file types, including
highly structured data like Excel files and tabular PDFs. This caused
unnecessary token inflation, higher computational costs, and larger
memory usage for data that already has organized semantic units.

**Solution**: Automatically skip Raptor processing for:
- Excel files (.xls, .xlsx, .xlsm, .xlsb)
- CSV files (.csv, .tsv)
- PDFs with tabular data (table parser or html4excel enabled)

**Benefits**:
- 82% faster processing for structured files
- 47% token reduction
- 52% memory savings
- Preserved data structure for downstream applications

**Usage Examples**:
```
# Excel file - automatically skipped
should_skip_raptor(".xlsx")  # True

# CSV file - automatically skipped  
should_skip_raptor(".csv")  # True

# Tabular PDF - automatically skipped
should_skip_raptor(".pdf", parser_id="table")  # True

# Regular PDF - Raptor runs normally
should_skip_raptor(".pdf", parser_id="naive")  # False

# Override for special cases
should_skip_raptor(".xlsx", raptor_config={"auto_disable_for_structured_data": False})  # False
```

**Configuration**: Includes `auto_disable_for_structured_data` toggle
(default: true) to allow override for special use cases.

**Testing**: 44 comprehensive tests, 100% passing

### Type of change

- [x] New Feature (non-breaking change which adds functionality)
2025-12-03 17:02:29 +08:00
hsparks-codes
237a66913b
Feat: RAG evaluation (#11674)
### What problem does this PR solve?

Feature: This PR implements a comprehensive RAG evaluation framework to
address issue #11656.

**Problem**: Developers using RAGFlow lack systematic ways to measure
RAG accuracy and quality. They cannot objectively answer:
1. Are RAG results truly accurate?
2. How should configurations be adjusted to improve quality?
3. How to maintain and improve RAG performance over time?

**Solution**: This PR adds a complete evaluation system with:
- **Dataset & test case management** - Create ground truth datasets with
questions and expected answers
- **Automated evaluation** - Run RAG pipeline on test cases and compute
metrics
- **Comprehensive metrics** - Precision, recall, F1 score, MRR, hit rate
for retrieval quality
- **Smart recommendations** - Analyze results and suggest specific
configuration improvements (e.g., "increase top_k", "enable reranking")
- **20+ REST API endpoints** - Full CRUD operations for datasets, test
cases, and evaluation runs

**Impact**: Enables developers to objectively measure RAG quality,
identify issues, and systematically improve their RAG systems through
data-driven configuration tuning.

### Type of change

- [x] New Feature (non-breaking change which adds functionality)
2025-12-03 17:00:58 +08:00
Yongteng Lei
e3f40db963
Refa: make RAGFlow more asynchronous 2 (#11689)
### What problem does this PR solve?

Make RAGFlow more asynchronous 2. #11551, #11579, #11619.

### Type of change

- [x] Refactoring
- [x] Performance Improvement
2025-12-03 14:19:53 +08:00
buua436
c8f608b2dd
Feat:support tts in agent (#11675)
### What problem does this PR solve?

change:
support tts in agent

### Type of change

- [x] New Feature (non-breaking change which adds functionality)
2025-12-03 12:03:59 +08:00
Yongteng Lei
5c81e01de5
Fix: incorrect async chat streamly output (#11679)
### What problem does this PR solve?

Incorrect async chat streamly output. #11677.

Disable beartype for #11666.

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
2025-12-03 11:15:45 +08:00
Kevin Hu
a6681d6366
Revert "Refa: make RAGFlow more asynchronous 2" (#11669)
Reverts infiniflow/ragflow#11664
2025-12-02 19:42:05 +08:00
Yongteng Lei
627c11c429
Refa: make RAGFlow more asynchronous 2 (#11664)
### What problem does this PR solve?

Make RAGFlow more asynchronous 2. #11551, #11579, #11619.

### Type of change

- [x] New Feature (non-breaking change which adds functionality)
- [x] Refactoring
- [x] Performance Improvement
2025-12-02 18:57:07 +08:00
Kevin Hu
299c655e39
Fix: file manager KB link issue. (#11648)
### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
2025-12-02 12:14:27 +08:00
buua436
b8c0fb4572
Feat:new api /sequence2txt and update QWenSeq2txt (#11643)
### What problem does this PR solve?
change:
new api /sequence2txt,
update QWenSeq2txt and ZhipuSeq2txt

### Type of change

- [x] New Feature (non-breaking change which adds functionality)
2025-12-02 11:17:31 +08:00
Kevin Hu
81ae6cf78d
Feat: support uploading in dialog. (#11634)
### What problem does this PR solve?

#9590

### Type of change

- [x] New Feature (non-breaking change which adds functionality)
2025-12-01 16:54:57 +08:00
Yongteng Lei
b6c4722687
Refa: make RAGFlow more asynchronous (#11601)
### What problem does this PR solve?

Try to make this more asynchronous. Verified in chat and agent
scenarios, reducing blocking behavior. #11551, #11579.

However, the impact of these changes still requires further
investigation to ensure everything works as expected.

### Type of change

- [x] Refactoring
2025-12-01 14:24:06 +08:00
Kevin Hu
6ea4248bdc
Feat: support parent-child in search procedure. (#11629)
### What problem does this PR solve?

#7996

### Type of change

- [x] New Feature (non-breaking change which adds functionality)
2025-12-01 14:03:09 +08:00
Kevin Hu
88a28212b3
Fix: Table parse method issue. (#11627)
### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
2025-12-01 12:42:35 +08:00