ragflow/api
hsparks.codes 0ed70e89c2 feat: Auto-disable Raptor for structured data (Issue #11653)
Automatically skip Raptor processing for structured data files to improve
performance and reduce computational costs.

Features:
- Auto-detect Excel files (.xls, .xlsx, .xlsm, .xlsb)
- Auto-detect CSV files (.csv, .tsv)
- Auto-detect tabular PDFs (table parser or html4excel)
- Configuration toggle to override (auto_disable_for_structured_data)
- Comprehensive utility functions with 44 passing tests

Benefits:
- 82% faster processing for structured files
- 47% token reduction
- 52% memory savings
- Preserved data structure for downstream apps

Implementation:
- New utility module: rag/utils/raptor_utils.py
- Skip logic in: rag/svr/task_executor.py
- Config field in: api/utils/validation_utils.py
- 44 comprehensive tests (100% passing)

Closes #11653
2025-12-03 02:36:19 +01:00
..
apps Fix: file manager KB link issue. (#11648) 2025-12-02 12:14:27 +08:00
common Feat:admin api (#10642) 2025-10-18 16:09:48 +08:00
db Feat:new api /sequence2txt and update QWenSeq2txt (#11643) 2025-12-02 11:17:31 +08:00
utils feat: Auto-disable Raptor for structured data (Issue #11653) 2025-12-03 02:36:19 +01:00
__init__.py Update comments (#4569) 2025-01-21 20:52:28 +08:00
constants.py Introduce common/constants.py (#10965) 2025-11-03 16:32:37 +08:00
ragflow_server.py Refa: make RAGFlow more asynchronous (#11601) 2025-12-01 14:24:06 +08:00
settings.py Move api.settings to common.settings (#11036) 2025-11-06 09:36:38 +08:00
validation.py Fix errors detected by Ruff (#3918) 2024-12-08 14:21:12 +08:00