ragflow/api/utils
hsparks.codes 0ed70e89c2 feat: Auto-disable Raptor for structured data (Issue #11653)
Automatically skip Raptor processing for structured data files to improve
performance and reduce computational costs.

Features:
- Auto-detect Excel files (.xls, .xlsx, .xlsm, .xlsb)
- Auto-detect CSV files (.csv, .tsv)
- Auto-detect tabular PDFs (table parser or html4excel)
- Configuration toggle to override (auto_disable_for_structured_data)
- Comprehensive utility functions with 44 passing tests

Benefits:
- 82% faster processing for structured files
- 47% token reduction
- 52% memory savings
- Preserved data structure for downstream apps

Implementation:
- New utility module: rag/utils/raptor_utils.py
- Skip logic in: rag/svr/task_executor.py
- Config field in: api/utils/validation_utils.py
- 44 comprehensive tests (100% passing)

Closes #11653
2025-12-03 02:36:19 +01:00
..
__init__.py Remove 'get_lan_ip' and add common misc_utils.py (#10880) 2025-10-31 16:42:01 +08:00
api_utils.py Refa: make RAGFlow more asynchronous (#11601) 2025-12-01 14:24:06 +08:00
base64_image.py Move base64_image related functions to common directory (#10957) 2025-11-03 15:20:46 +08:00
commands.py Feat: Alter flask to Quart for async API serving. (#11275) 2025-11-18 17:05:16 +08:00
common.py Move some functions out of 'api/utils/common.py' (#10948) 2025-11-03 12:34:47 +08:00
configs.py Introduce common/config_utils.py (#10968) 2025-11-03 17:25:06 +08:00
crypt.py Move 'get_project_base_directory' to common directory (#10940) 2025-11-02 21:05:28 +08:00
email_templates.py Minor tweats (#11271) 2025-11-16 19:29:20 +08:00
file_utils.py Feature/doc upload api add parent path 20251112 (#11231) 2025-11-13 09:59:39 +08:00
health_utils.py Fix: check task executor alive and display status (#11270) 2025-11-14 15:52:28 +08:00
json_encode.py Minor tweats (#11271) 2025-11-16 19:29:20 +08:00
log_utils.py Refactor log utils (#10973) 2025-11-03 20:25:02 +08:00
validation_utils.py feat: Auto-disable Raptor for structured data (Issue #11653) 2025-12-03 02:36:19 +01:00
web_utils.py Feat: Alter flask to Quart for async API serving. (#11275) 2025-11-18 17:05:16 +08:00