ragflow/rag/svr
hsparks.codes 0ed70e89c2 feat: Auto-disable Raptor for structured data (Issue #11653)
Automatically skip Raptor processing for structured data files to improve
performance and reduce computational costs.

Features:
- Auto-detect Excel files (.xls, .xlsx, .xlsm, .xlsb)
- Auto-detect CSV files (.csv, .tsv)
- Auto-detect tabular PDFs (table parser or html4excel)
- Configuration toggle to override (auto_disable_for_structured_data)
- Comprehensive utility functions with 44 passing tests

Benefits:
- 82% faster processing for structured files
- 47% token reduction
- 52% memory savings
- Preserved data structure for downstream apps

Implementation:
- New utility module: rag/utils/raptor_utils.py
- Skip logic in: rag/svr/task_executor.py
- Config field in: api/utils/validation_utils.py
- 44 comprehensive tests (100% passing)

Closes #11653
2025-12-03 02:36:19 +01:00
..
cache_file_svr.py Refactor function name (#11210) 2025-11-12 19:00:15 +08:00
discord_svr.py Use consistent log file names, introduced initLogger (#3403) 2024-11-14 17:13:48 +08:00
sync_data_source.py Refactor: better describe how to get prefix for sync data source (#11636) 2025-12-01 17:46:44 +08:00
task_executor.py feat: Auto-disable Raptor for structured data (Issue #11653) 2025-12-03 02:36:19 +01:00