ragflow/rag/utils
hsparks.codes 0ed70e89c2 feat: Auto-disable Raptor for structured data (Issue #11653)
Automatically skip Raptor processing for structured data files to improve
performance and reduce computational costs.

Features:
- Auto-detect Excel files (.xls, .xlsx, .xlsm, .xlsb)
- Auto-detect CSV files (.csv, .tsv)
- Auto-detect tabular PDFs (table parser or html4excel)
- Configuration toggle to override (auto_disable_for_structured_data)
- Comprehensive utility functions with 44 passing tests

Benefits:
- 82% faster processing for structured files
- 47% token reduction
- 52% memory savings
- Preserved data structure for downstream apps

Implementation:
- New utility module: rag/utils/raptor_utils.py
- Skip logic in: rag/svr/task_executor.py
- Config field in: api/utils/validation_utils.py
- 44 comprehensive tests (100% passing)

Closes #11653
2025-12-03 02:36:19 +01:00
..
__init__.py Move token related functions to common (#10942) 2025-11-03 08:50:05 +08:00
azure_sas_conn.py Move api.settings to common.settings (#11036) 2025-11-06 09:36:38 +08:00
azure_spn_conn.py Refactor function name (#11210) 2025-11-12 19:00:15 +08:00
base64_image.py Move some vars to globals (#11017) 2025-11-05 14:14:38 +08:00
doc_store_conn.py Refactor function name (#11210) 2025-11-12 19:00:15 +08:00
es_conn.py Fix: Table parse method issue. (#11627) 2025-12-01 12:42:35 +08:00
file_utils.py Move some funcs from api to rag module (#10972) 2025-11-03 19:26:09 +08:00
infinity_conn.py Fix ft_title_rag_fine (#11555) 2025-11-27 10:26:08 +08:00
minio_conn.py Refactor function name (#11210) 2025-11-12 19:00:15 +08:00
ob_conn.py feat: add OceanBase doc engine (#11228) 2025-11-20 10:00:14 +08:00
opendal_conn.py Refactor function name (#11210) 2025-11-12 19:00:15 +08:00
opensearch_conn.py Refactor function name (#11210) 2025-11-12 19:00:15 +08:00
oss_conn.py Refactor function name (#11210) 2025-11-12 19:00:15 +08:00
raptor_utils.py feat: Auto-disable Raptor for structured data (Issue #11653) 2025-12-03 02:36:19 +01:00
redis_conn.py feat: add Redis username support (#11608) 2025-12-01 11:26:20 +08:00
s3_conn.py Refactor function name (#11210) 2025-11-12 19:00:15 +08:00
storage_factory.py Move api.settings to common.settings (#11036) 2025-11-06 09:36:38 +08:00
tavily_conn.py Remove 'get_lan_ip' and add common misc_utils.py (#10880) 2025-10-31 16:42:01 +08:00