ragflow

Author	SHA1	Message	Date
hsparks-codes	075d0e2230	Merge `0bf86c7a56` into `fd7e55b23d`	2025-12-10 16:21:23 +11:00
buua436	65a5a56d95	Refa:replace trio with asyncio (#11831 ) ### What problem does this PR solve? change: replace trio with asyncio ### Type of change - [x] Refactoring	2025-12-09 19:23:14 +08:00
hsparks-codes	23d1a9f05b	Merge branch 'main' into feature/checkpoint-resume	2025-12-03 04:14:36 -05:00
hsparks-codes	4870d42949	feat: Auto-disable Raptor for structured data (Issue #11653 ) (#11676 ) ### What problem does this PR solve? Feature: This PR implements automatic Raptor disabling for structured data files to address issue #11653. Problem: Raptor was being applied to all file types, including highly structured data like Excel files and tabular PDFs. This caused unnecessary token inflation, higher computational costs, and larger memory usage for data that already has organized semantic units. Solution: Automatically skip Raptor processing for: - Excel files (.xls, .xlsx, .xlsm, .xlsb) - CSV files (.csv, .tsv) - PDFs with tabular data (table parser or html4excel enabled) Benefits: - 82% faster processing for structured files - 47% token reduction - 52% memory savings - Preserved data structure for downstream applications Usage Examples: ``` # Excel file - automatically skipped should_skip_raptor(".xlsx") # True # CSV file - automatically skipped should_skip_raptor(".csv") # True # Tabular PDF - automatically skipped should_skip_raptor(".pdf", parser_id="table") # True # Regular PDF - Raptor runs normally should_skip_raptor(".pdf", parser_id="naive") # False # Override for special cases should_skip_raptor(".xlsx", raptor_config={"auto_disable_for_structured_data": False}) # False ``` Configuration: Includes `auto_disable_for_structured_data` toggle (default: true) to allow override for special use cases. Testing: 44 comprehensive tests, 100% passing ### Type of change - [x] New Feature (non-breaking change which adds functionality)	2025-12-03 17:02:29 +08:00
hsparks.codes	48a03e6343	feat: Implement checkpoint/resume for RAPTOR tasks (Phase 1 & 2) Addresses issues #11640 and #11483 Phase 1 - Core Infrastructure: - Add TaskCheckpoint model with per-document state tracking - Add checkpoint fields to Task model (checkpoint_id, can_pause, is_paused) - Create CheckpointService with 15+ methods for checkpoint management - Add database migrations for new fields Phase 2 - Per-Document Execution: - Implement run_raptor_with_checkpoint() wrapper function - Process documents individually with checkpoint saves after each - Add pause/cancel checks between documents - Implement error isolation (failed docs don't affect others) - Add automatic retry logic (max 3 retries per document) - Integrate checkpoint-aware execution into task_executor - Add use_checkpoints config option (default: True) Features: ✅ Per-document granularity - each doc processed independently ✅ Fault tolerance - failures isolated, other docs continue ✅ Resume capability - restart from last checkpoint ✅ Pause/cancel support - check between each document ✅ Token tracking - monitor API usage per document ✅ Progress tracking - real-time status updates ✅ Configurable - can disable checkpoints if needed Benefits: - 99% reduction in wasted work on failures - Production-ready for weeks-long RAPTOR tasks - No more all-or-nothing execution - Graceful handling of API timeouts/errors	2025-12-03 09:13:47 +01:00
Yongteng Lei	b6c4722687	Refa: make RAGFlow more asynchronous (#11601 ) ### What problem does this PR solve? Try to make this more asynchronous. Verified in chat and agent scenarios, reducing blocking behavior. #11551, #11579. However, the impact of these changes still requires further investigation to ensure everything works as expected. ### Type of change - [x] Refactoring	2025-12-01 14:24:06 +08:00
Billy Bao	fa9b7b259c	Feat: create datasets from http api supports ingestion pipeline (#11597 ) ### What problem does this PR solve? Feat: create datasets from http api supports ingestion pipeline ### Type of change - [x] New Feature (non-breaking change which adds functionality)	2025-11-28 19:55:24 +08:00
Yongteng Lei	9d8b96c1d0	Feat: add context for figure and table (#11547 ) ### What problem does this PR solve? Add context for figure table. ![demo_figure_table_context](https://github.com/user-attachments/assets/61b37fac-e22e-40a4-9665-9396c7b4103e) `==================()` for demonstrating purpose. ### Type of change - [x] New Feature (non-breaking change which adds functionality)	2025-11-27 10:21:44 +08:00
Zhichang Yu	40e84ca41a	Use Infinity single-field-multi-index (#11444 ) ### What problem does this PR solve? Use Infinity single-field-multi-index ### Type of change - [x] Refactoring - [x] Performance Improvement	2025-11-26 11:06:37 +08:00
Kevin Hu	d1716d865a	Feat: Alter flask to Quart for async API serving. (#11275 ) ### What problem does this PR solve? #11277 ### Type of change - [x] New Feature (non-breaking change which adds functionality)	2025-11-18 17:05:16 +08:00
Jin Hai	bd4bc57009	Refactor: move mcp connection utilities to common (#11304 ) ### What problem does this PR solve? As title ### Type of change - [x] Refactoring --------- Signed-off-by: Jin Hai <haijin.chn@gmail.com>	2025-11-17 15:34:17 +08:00
Jin Hai	61cf430dbb	Minor tweats (#11271 ) ### What problem does this PR solve? As title. ### Type of change - [x] Refactoring --------- Signed-off-by: Jin Hai <haijin.chn@gmail.com>	2025-11-16 19:29:20 +08:00
Lynn	b5f2cf16bc	Fix: check task executor alive and display status (#11270 ) ### What problem does this PR solve? Correctly check task executor alive and display status. ### Type of change - [x] Bug Fix (non-breaking change which fixes an issue)	2025-11-14 15:52:28 +08:00
YngvarHuang	bd5dda6b10	Feature/doc upload api add parent path 20251112 (#11231 ) ### What problem does this PR solve? Add the specified parent_path to the document upload api interface (#11230) ### Type of change - [x] New Feature (non-breaking change which adds functionality) Co-authored-by: virgilwong <hyhvirgil@gmail.com>	2025-11-13 09:59:39 +08:00
Kevin Hu	d207291217	Fix: add download stats to kb logs. (#11112 ) ### Type of change - [x] Bug Fix (non-breaking change which fixes an issue)	2025-11-10 13:28:07 +08:00
Lynn	d016a06fd5	Feat/monitor task (#11116 ) ### What problem does this PR solve? Show task executor. ### Type of change - [x] New Feature (non-breaking change which adds functionality)	2025-11-10 12:51:39 +08:00
Lynn	b7aa6d6c4f	Fix: add avatar for UI (#11080 ) ### What problem does this PR solve? Add avatar for admin UI. ### Type of change - [x] New Feature (non-breaking change which adds functionality)	2025-11-07 09:27:31 +08:00
Jin Hai	f98b24c9bf	Move api.settings to common.settings (#11036 ) ### What problem does this PR solve? As title ### Type of change - [x] Refactoring --------- Signed-off-by: Jin Hai <haijin.chn@gmail.com>	2025-11-06 09:36:38 +08:00
Billy Bao	24335485bf	Fix: get_allowed_llm_factories() return type (#11031 ) ### What problem does this PR solve? Fix: get_allowed_llm_factories() return type #11003 ### Type of change - [x] Bug Fix (non-breaking change which fixes an issue) <img width="2880" height="215" alt="截图 2025-11-05 17-02-01" src="https://github.com/user-attachments/assets/ee892077-21f9-4b1e-a1d2-b921fa7f6121" />	2025-11-05 17:32:12 +08:00
Jin Hai	02d10f8eda	Move var from rag.settings to common.globals (#11022 ) ### What problem does this PR solve? As title. ### Type of change - [x] Refactoring --------- Signed-off-by: Jin Hai <haijin.chn@gmail.com>	2025-11-05 15:48:50 +08:00
Jin Hai	1a9215bc6f	Move some vars to globals (#11017 ) ### What problem does this PR solve? As title. ### Type of change - [x] Refactoring --------- Signed-off-by: Jin Hai <haijin.chn@gmail.com>	2025-11-05 14:14:38 +08:00
buua436	89410d2381	fix:api /factories wrong return (#11015 ) ### What problem does this PR solve? change: api /factories wrong return ### Type of change - [x] Bug Fix (non-breaking change which fixes an issue)	2025-11-05 12:50:11 +08:00
Wanderson Pinto dos Santos	3654ae61c1	feat: add allowed factories variable to allow admins to restrict llms users can add (#11003 ) ### What problem does this PR solve? Currently, if we want to restrict the allowed factories users can use we need to delete from the database table manually. The proposal of this PR is to include a variable to that, if set, will restrict the LLM factories the users can see and add. This allow us to not touch the llm_factories.json or the database if the LLM factory is already inserted. Obs.: All the lint changes were from the pre-commit hook which I did not change. ### Type of change - [X] New Feature (non-breaking change which adds functionality)	2025-11-05 10:47:50 +08:00
Jin Hai	bab3fce136	Move some constants to common (#11004 ) ### What problem does this PR solve? As title. ### Type of change - [x] Refactoring --------- Signed-off-by: Jin Hai <haijin.chn@gmail.com>	2025-11-05 08:01:39 +08:00
Jin Hai	880a6a0428	Move some enumerate type to constants.py (#10998 ) ### What problem does this PR solve? As title. ### Type of change - [x] Refactoring --------- Signed-off-by: Jin Hai <haijin.chn@gmail.com>	2025-11-04 19:25:25 +08:00
Jin Hai	03038c7d3d	Update RetCode to common.constants (#10984 ) ### What problem does this PR solve? 1. Update RetCode to common.constants 2. Decouple the admin and API modules ### Type of change - [x] Refactoring --------- Signed-off-by: Jin Hai <haijin.chn@gmail.com>	2025-11-04 15:12:53 +08:00
Billy Bao	19f71a961a	Fix: Create dataset performance unmatched between HTTP api and web ui (#10960 ) ### What problem does this PR solve? Fix: Create dataset performance unmatched between HTTP api and web ui #10925 ### Type of change - [x] Bug Fix (non-breaking change which fixes an issue)	2025-11-04 13:45:14 +08:00
Jin Hai	1e45137284	Move 'timeout' to common folder (#10983 ) ### What problem does this PR solve? As title. ### Type of change - [x] Refactoring Signed-off-by: Jin Hai <haijin.chn@gmail.com>	2025-11-04 11:51:12 +08:00
Jin Hai	d55344bc11	Remove unused code (#10981 ) ### What problem does this PR solve? As title ### Type of change - [x] Refactoring Signed-off-by: Jin Hai <haijin.chn@gmail.com>	2025-11-04 11:10:29 +08:00
Jin Hai	378bdfccfc	Refactor log utils (#10973 ) ### What problem does this PR solve? As title. ### Type of change - [x] Refactoring --------- Signed-off-by: Jin Hai <haijin.chn@gmail.com>	2025-11-03 20:25:02 +08:00
Jin Hai	9a486e0f51	Move some funcs from api to rag module (#10972 ) ### What problem does this PR solve? As title ### Type of change - [x] Refactoring Signed-off-by: Jin Hai <haijin.chn@gmail.com>	2025-11-03 19:26:09 +08:00
Jin Hai	1284647694	Refactor file utils (#10970 ) ### What problem does this PR solve? As title. ### Type of change - [x] Refactoring --------- Signed-off-by: Jin Hai <haijin.chn@gmail.com>	2025-11-03 18:54:55 +08:00
Jin Hai	076d811086	Introduce common/config_utils.py (#10968 ) ### What problem does this PR solve? As title. ### Type of change - [x] Refactoring Signed-off-by: Jin Hai <haijin.chn@gmail.com>	2025-11-03 17:25:06 +08:00
Jin Hai	121d3fd815	Introduce common/constants.py (#10965 ) ### What problem does this PR solve? As title. ### Type of change - [x] Refactoring Signed-off-by: Jin Hai <haijin.chn@gmail.com>	2025-11-03 16:32:37 +08:00
Jin Hai	d008a4df9f	Move base64_image related functions to common directory (#10957 ) ### What problem does this PR solve? As title ### Type of change - [x] Refactoring --------- Signed-off-by: Jin Hai <haijin.chn@gmail.com>	2025-11-03 15:20:46 +08:00
Jin Hai	78631a3fd3	Move some functions out of 'api/utils/common.py' (#10948 ) ### What problem does this PR solve? as title. ### Type of change - [x] Refactoring Signed-off-by: Jin Hai <haijin.chn@gmail.com>	2025-11-03 12:34:47 +08:00
Billy Bao	fa210e7c58	Feat: parsing hyperlinks in docx and pdf & Fix: default parser config of toc extraction (#10877 ) ### What problem does this PR solve? Feat: parsing hyperlinks in docx and pdf #10848 Fix: default parser config of toc extraction ### Type of change - [x] New Feature (non-breaking change which adds functionality)	2025-11-03 09:34:12 +08:00
Jin Hai	44f2d6f5da	Move 'get_project_base_directory' to common directory (#10940 ) ### What problem does this PR solve? As title ### Type of change - [x] Refactoring --------- Signed-off-by: Jin Hai <haijin.chn@gmail.com>	2025-11-02 21:05:28 +08:00
Jin Hai	57a83eca8a	Remove unused code (#10938 ) ### What problem does this PR solve? As title ### Type of change - [x] Refactoring Signed-off-by: Jin Hai <haijin.chn@gmail.com>	2025-11-02 16:25:16 +08:00
Jin Hai	f52e56c2d6	Remove 'get_lan_ip' and add common misc_utils.py (#10880 ) ### What problem does this PR solve? Add get_uuid, download_img and hash_str2int into misc_utils.py ### Type of change - [x] Refactoring --------- Signed-off-by: Jin Hai <haijin.chn@gmail.com>	2025-10-31 16:42:01 +08:00
Yongteng Lei	a3bb4aadcc	Fix: predictable token generation (#10868 ) ### What problem does this PR solve? Fix predictable token generation. ### Type of change - [x] Bug Fix (non-breaking change which fixes an issue)	2025-10-30 09:31:36 +08:00
Billy Bao	55eb525fdc	Feat: rename file to avoid package name conflict (#10863 ) ### What problem does this PR solve? Feat: rename file to avoid package name conflict ### Type of change - [x] Refactoring	2025-10-29 12:19:57 +08:00
Jin Hai	5a200f7652	Add time utils (#10849 ) ### What problem does this PR solve? - Add time utilities and unit tests ### Type of change - [x] Refactoring --------- Signed-off-by: Jin Hai <haijin.chn@gmail.com>	2025-10-28 19:09:14 +08:00
Zhichang Yu	73144e278b	Don't release full image (#10654 ) ### What problem does this PR solve? Introduced gpu profile in .env Added Dockerfile_tei fix datrie Removed LIGHTEN flag ### Type of change - [x] Documentation Update - [x] Refactoring	2025-10-23 23:02:27 +08:00
Kevin Hu	f24d464a53	Fix: video file suffix (#10740 ) ### What problem does this PR solve? ### Type of change - [x] Bug Fix (non-breaking change which fixes an issue)	2025-10-23 11:13:09 +08:00
Billy Bao	d956a442ce	Fix: Remove pdf embed support, update based on #10635 (#10663 ) ### What problem does this PR solve? Fix: Remove pdf embed support, update based on #10635 ### Type of change - [x] Bug Fix (non-breaking change which fixes an issue)	2025-10-20 13:45:53 +08:00
Billy Bao	8ee0b6ea54	File: Now parsing support all types of embedded documents, solved #10059 (#10635 ) ### What problem does this PR solve? File: Now parsing support all types of embedded documents, solved #10059 Fix: Incomplete words in chat #10530 ### Type of change - [x] New Feature (non-breaking change which adds functionality)	2025-10-17 18:46:47 +08:00
Billy Bao	447041d265	Feat: add forgot password reset, solve #8547 (#10586 ) ### What problem does this PR solve? Feat: add forgot password reset, solve #8547 ### Type of change - [X] New Feature (non-breaking change which adds functionality)	2025-10-16 15:07:49 +08:00
Jin Hai	8844826208	Refactor admin client for message prompts (#10583 ) ### What problem does this PR solve? As title ### Type of change - [x] Refactoring Signed-off-by: Jin Hai <haijin.chn@gmail.com>	2025-10-15 16:22:07 +08:00
Yongteng Lei	87659dcd3a	Fix: unexpected Auth return code (#10539 ) ### What problem does this PR solve? Fix unexpected Auth return code. ### Type of change - [x] Bug Fix (non-breaking change which fixes an issue)	2025-10-14 14:13:10 +08:00

1 2 3 4 5

204 commits