ragflow/docs/guides
cutiechi 8f9bcb1c74
Feat: make document parsing and embedding batch sizes configurable via environment variables (#8266)
### Description

This PR introduces two new environment variables, `DOC_BULK_SIZE` and
`EMBEDDING_BATCH_SIZE`, to allow flexible tuning of batch sizes for
document parsing and embedding vectorization in RAGFlow. By making these
parameters configurable, users can optimize performance and resource
usage according to their hardware capabilities and workload
requirements.

### What problem does this PR solve?

Previously, the batch sizes for document parsing and embedding were
hardcoded, limiting the ability to adjust throughput and memory
consumption. This PR enables users to set these values via environment
variables (in `.env`, the Helm chart, or directly in the deployment
environment), improving flexibility and scalability for both small and
large deployments.

- `DOC_BULK_SIZE`: Controls how many document chunks are processed in a
single batch during document parsing (default: 4).
- `EMBEDDING_BATCH_SIZE`: Controls how many text chunks are processed
in a single batch during embedding vectorization (default: 16).
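
As a rough illustration of how such variables are typically consumed, the sketch below reads both values with the defaults stated above and splits a list of chunks into batches. The helpers `env_int` and `batched` are hypothetical names chosen for this example; they are not taken from the RAGFlow codebase, and the actual code paths may differ.

```python
import os
from typing import Iterator, Sequence, TypeVar

T = TypeVar("T")


def env_int(name: str, default: int) -> int:
    """Hypothetical helper: read a positive integer from the environment.

    Falls back to `default` when the variable is unset, not an integer,
    or not positive.
    """
    raw = os.environ.get(name, "")
    try:
        value = int(raw)
    except ValueError:
        return default
    return value if value > 0 else default


def batched(items: Sequence[T], size: int) -> Iterator[Sequence[T]]:
    """Hypothetical helper: yield consecutive slices of at most `size` items."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


# Defaults mirror the values stated in this PR description.
DOC_BULK_SIZE = env_int("DOC_BULK_SIZE", 4)
EMBEDDING_BATCH_SIZE = env_int("EMBEDDING_BATCH_SIZE", 16)

if __name__ == "__main__":
    chunks = [f"chunk-{i}" for i in range(40)]
    for batch in batched(chunks, EMBEDDING_BATCH_SIZE):
        # A real pipeline would send `batch` to the embedding model here.
        print(len(batch))
```

Under these assumptions, a larger batch size means fewer calls per document at the cost of more memory held per call, which is the trade-off this PR exposes for tuning.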

This change updates the codebase, documentation, and configuration files
to reflect the new options.

### Type of change

- [ ] Bug Fix (non-breaking change which fixes an issue)
- [x] New Feature (non-breaking change which adds functionality)
- [x] Documentation Update
- [ ] Refactoring
- [x] Performance Improvement
- [ ] Other (please describe):

### Additional context
- Updated `.env`, `helm/values.yaml`, and documentation to describe
the new variables.
- Modified relevant code paths to use the environment variables instead
of hardcoded values.
- Users can now tune these parameters to achieve better throughput or
reduce memory usage as needed.

Before (default value):

<img width="643" alt="image"
src="https://github.com/user-attachments/assets/086e1173-18f3-419d-a0f5-68394f63866a"
/>

After (10x):

<img width="777" alt="image"
src="https://github.com/user-attachments/assets/5722bbc0-0bcb-4536-b928-077031e550f1"
/>
2025-06-16 13:40:47 +08:00
agent Feat: make document parsing and embedding batch sizes configurable via environment variables (#8266) 2025-06-16 13:40:47 +08:00
chat Docs: Miscellaneous editorial updates (#7865) 2025-05-26 19:36:35 +08:00
dataset Docs: Miscellaneous editorial updates (#8237) 2025-06-13 09:46:24 +08:00
models Docs: RAGFlow does not support batch metadata setting (#7795) 2025-05-22 17:02:23 +08:00
team Docs: Added a guide on switching document engine (#7692) 2025-05-16 19:02:36 +08:00
_category_.json Added release notes for v0.13.0 (#3691) 2024-11-27 19:26:03 +08:00
ai_search.md Docs: Miscellaneous UI updates (#8031) 2025-06-04 09:31:41 +08:00
manage_files.md Docs: update for v0.19.0 (#7823) 2025-05-23 18:25:47 +08:00
run_health_check.md Miscellaneous doc updates and refactored team management doc. (#6730) 2025-04-01 19:05:30 +08:00
tracing.mdx Docs: update for v0.19.0 (#7823) 2025-05-23 18:25:47 +08:00
upgrade_ragflow.mdx Docs: update for v0.19.0 (#7823) 2025-05-23 18:25:47 +08:00