From 8905f8fab993995d87cac97c74814632f86348b8 Mon Sep 17 00:00:00 2001
From: Mendon Kissling <59585235+mendonk@users.noreply.github.com>
Date: Wed, 1 Oct 2025 14:48:23 -0400
Subject: [PATCH] add-config-yaml-section-combine-env-vars
---
docs/docs/reference/configuration.mdx | 235 +++++++++++++++++---------
docs/sidebars.js | 2 +-
2 files changed, 156 insertions(+), 81 deletions(-)
diff --git a/docs/docs/reference/configuration.mdx b/docs/docs/reference/configuration.mdx
index 105cb7c5..cb29105f 100644
--- a/docs/docs/reference/configuration.mdx
+++ b/docs/docs/reference/configuration.mdx
@@ -1,90 +1,177 @@
---
-title: Environment variables and configuration values
+title: Environment variables
slug: /reference/configuration
---
-OpenRAG supports multiple configuration methods with the following priority, from highest to lowest:
+import Icon from "@site/src/components/icon/icon";
+import Tabs from '@theme/Tabs';
+import TabItem from '@theme/TabItem';
-1. [Configuration file (`config.yaml`)](#openrag-config-variables) - The `config.yaml` file is generated with values input during [Application onboarding](/install#application-onboarding), and controls the [OpenRAG configuration variables](#openrag-config-variables).
-2. [Environment variables](#environment-variables) - Environment variables control how OpenRAG connects to services. Environment variables in the `.env` control underlying services such as Langflow authentication, Oauth settings, and OpenSearch security.
-3. [Langflow runtime overrides](#langflow-runtime-overrides)
-4. [Default or fallback values](#default-values-and-fallbacks)
+OpenRAG recognizes [supported environment variables](#supported-environment-variables) from the following sources:
-## OpenRAG configuration variables {#openrag-config-variables}
+* **[Environment variables](#supported-environment-variables)** - Values set in the `.env` or `docker-compose.yml` file.
+* **[Configuration file variables (`config.yaml`)](#configuration-file)** - Values generated during application onboarding and saved to `config.yaml`.
+* **[Langflow runtime overrides](#langflow-runtime-overrides)** - Langflow components can override environment variables at runtime.
+* **[Default or fallback values](#default-values-and-fallbacks)** - Values that OpenRAG uses when it doesn't find a value in any other source.
-These values control what the OpenRAG application does.
+## Configure environment variables
-### Provider settings
+Environment variables can be set in a `.env` or `docker-compose.yml` file.
-| Variable | Description | Default |
-| -------------------- | ---------------------------------------- | -------- |
-| `MODEL_PROVIDER` | Model provider (openai, anthropic, etc.) | `openai` |
-| `PROVIDER_API_KEY` | API key for the model provider. | |
-| `PROVIDER_ENDPOINT` | Custom provider endpoint. Only used for IBM or Ollama providers. | |
-| `PROVIDER_PROJECT_ID`| Project ID for providers. Only required for the IBM watsonx.ai provider. | |
-| `OPENAI_API_KEY` | OpenAI API key. | |
+### Precedence
-### Knowledge settings
+Environment variables take precedence over other sources, with one exception: when the same variable exists in both [`config.yaml`](#configuration-file) and the `.env` file, the value in `config.yaml` takes precedence.
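+
+As a hypothetical illustration of this rule, suppose the `.env` file sets `CHUNK_SIZE` as shown here, and `config.yaml` sets the same variable to a different value. Under the precedence described above, OpenRAG would be expected to use the `config.yaml` value. The value `500` is an assumption chosen only for this example.
+
+```bash
+# .env (illustrative value)
+CHUNK_SIZE=1000
+# If config.yaml also defines the chunk size, for example as 500 (assumed),
+# the config.yaml value takes precedence and the 1000 above is not used.
+```
+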
-| Variable | Description | Default |
-| ------------------------------ | --------------------------------------- | ------------------------ |
-| `EMBEDDING_MODEL` | Embedding model for vector search. | `text-embedding-3-small` |
-| `CHUNK_SIZE` | Text chunk size for document processing. | `1000` |
-| `CHUNK_OVERLAP` | Overlap between chunks. | `200` |
-| `OCR_ENABLED` | Enable OCR for image processing. | `true` |
-| `PICTURE_DESCRIPTIONS_ENABLED` | Enable picture descriptions. | `false` |
+### Set environment variables
-### Agent settings
+To set environment variables, do the following:
-| Variable | Description | Default |
-| --------------- | --------------------------------- | ------------------------ |
-| `LLM_MODEL` | Language model for the chat agent. | `gpt-4o-mini` |
-| `SYSTEM_PROMPT` | System prompt for the agent. | "You are a helpful AI assistant with access to a knowledge base. Answer questions based on the provided context." |
+
+
-## Environment variables
+Stop OpenRAG, set the values in the `.env` file, and then start OpenRAG.
+```bash
+OPENAI_API_KEY=your-api-key-here
+EMBEDDING_MODEL=text-embedding-3-small
+CHUNK_SIZE=1000
+```
+
-## Required variables
+
+Stop OpenRAG, set the values in the `docker-compose.yml` file, and then start OpenRAG.
+```yaml
+environment:
+ - OPENAI_API_KEY=your-api-key-here
+ - EMBEDDING_MODEL=text-embedding-3-small
+ - CHUNK_SIZE=1000
+```
-| Variable | Description |
-| ----------------------------- | ------------------------------------------- |
-| `OPENAI_API_KEY` | Your OpenAI API key |
-| `OPENSEARCH_PASSWORD` | Password for OpenSearch admin user |
-| `LANGFLOW_SUPERUSER` | Langflow admin username |
-| `LANGFLOW_SUPERUSER_PASSWORD` | Langflow admin password |
-| `LANGFLOW_CHAT_FLOW_ID` | This value is pre-filled. The default value is found in [.env.example](https://github.com/langflow-ai/openrag/blob/main/.env.example). |
-| `LANGFLOW_INGEST_FLOW_ID` | This value is pre-filled. The default value is found in [.env.example](https://github.com/langflow-ai/openrag/blob/main/.env.example). |
-| `NUDGES_FLOW_ID` | This value is pre-filled. The default value is found in [.env.example](https://github.com/langflow-ai/openrag/blob/main/.env.example). |
+
+
-## Ingestion configuration
+## Supported environment variables
-| Variable | Description |
-| ------------------------------ | ------------------------------------------------------ |
-| `DISABLE_INGEST_WITH_LANGFLOW` | Disable Langflow ingestion pipeline. Default: `false`. |
+All OpenRAG configuration can be controlled through environment variables.
-- `false` or unset: Uses Langflow pipeline (upload → ingest → delete).
-- `true`: Uses traditional OpenRAG processor for document ingestion.
+### AI provider settings
-## Optional variables
+Configure which AI models and providers OpenRAG uses for language processing and embeddings.
+For more information, see [Application onboarding](/install#application-onboarding).
-| Variable | Description |
-| ------------------------------------------------------------------------- | ------------------------------------------------------------------ |
-| `OPENSEARCH_HOST` | OpenSearch host (default: `localhost`) |
-| `OPENSEARCH_PORT` | OpenSearch port (default: `9200`) |
-| `OPENSEARCH_USERNAME` | OpenSearch username (default: `admin`) |
-| `LANGFLOW_URL` | Langflow URL (default: `http://localhost:7860`) |
-| `LANGFLOW_PUBLIC_URL` | Public URL for Langflow (default: `http://localhost:7860`) |
-| `GOOGLE_OAUTH_CLIENT_ID` / `GOOGLE_OAUTH_CLIENT_SECRET` | Google OAuth authentication |
-| `MICROSOFT_GRAPH_OAUTH_CLIENT_ID` / `MICROSOFT_GRAPH_OAUTH_CLIENT_SECRET` | Microsoft OAuth |
-| `WEBHOOK_BASE_URL` | Base URL for webhook endpoints |
-| `AWS_ACCESS_KEY_ID` / `AWS_SECRET_ACCESS_KEY` | AWS integrations |
-| `SESSION_SECRET` | Session management (default: auto-generated, change in production) |
-| `LANGFLOW_KEY` | Explicit Langflow API key (auto-generated if not provided) |
-| `LANGFLOW_SECRET_KEY` | Secret key for Langflow internal operations |
-| `DOCLING_OCR_ENGINE` | OCR engine for document processing |
-| `LANGFLOW_AUTO_LOGIN` | Enable auto-login for Langflow (default: `False`) |
-| `LANGFLOW_NEW_USER_IS_ACTIVE` | New users are active by default (default: `False`) |
-| `LANGFLOW_ENABLE_SUPERUSER_CLI` | Enable superuser CLI (default: `False`) |
-| `OPENRAG_DOCUMENTS_PATHS` | Document paths for ingestion (default: `./documents`) |
+| Variable | Default | Description |
+|----------|---------|-------------|
+| `EMBEDDING_MODEL` | `text-embedding-3-small` | Embedding model for vector search. |
+| `LLM_MODEL` | `gpt-4o-mini` | Language model for the chat agent. |
+| `MODEL_PROVIDER` | `openai` | Model provider, such as OpenAI or IBM watsonx.ai. |
+| `OPENAI_API_KEY` | - | Your OpenAI API key. Required. |
+| `PROVIDER_API_KEY` | - | API key for the model provider. |
+| `PROVIDER_ENDPOINT` | - | Custom provider endpoint. Only used for IBM or Ollama providers. |
+| `PROVIDER_PROJECT_ID` | - | Project ID for providers. Only required for the IBM watsonx.ai provider. |
+
+### Document processing
+
+Control how OpenRAG processes and ingests documents into your knowledge base.
+For more information, see [Ingestion](/core-components/ingestion).
+
+| Variable | Default | Description |
+|----------|---------|-------------|
+| `CHUNK_OVERLAP` | `200` | Overlap between chunks. |
+| `CHUNK_SIZE` | `1000` | Text chunk size for document processing. |
+| `DISABLE_INGEST_WITH_LANGFLOW` | `false` | Disable the Langflow ingestion pipeline. When `true`, OpenRAG uses its traditional processor for document ingestion. |
+| `DOCLING_OCR_ENGINE` | - | OCR engine for document processing. |
+| `OCR_ENABLED` | `false` | Enable OCR for image processing. |
+| `OPENRAG_DOCUMENTS_PATHS` | `./documents` | Document paths for ingestion. |
+| `PICTURE_DESCRIPTIONS_ENABLED` | `false` | Enable picture descriptions. |
+
+### Langflow settings
+
+Configure Langflow authentication, URLs, flow IDs, and the agent system prompt.
+
+| Variable | Default | Description |
+|----------|---------|-------------|
+| `LANGFLOW_AUTO_LOGIN` | `False` | Enable auto-login for Langflow. |
+| `LANGFLOW_CHAT_FLOW_ID` | pre-filled | This value is pre-filled. The default value is found in [.env.example](https://github.com/langflow-ai/openrag/blob/main/.env.example). |
+| `LANGFLOW_ENABLE_SUPERUSER_CLI` | `False` | Enable superuser CLI. |
+| `LANGFLOW_INGEST_FLOW_ID` | pre-filled | This value is pre-filled. The default value is found in [.env.example](https://github.com/langflow-ai/openrag/blob/main/.env.example). |
+| `LANGFLOW_KEY` | auto-generated | Explicit Langflow API key. |
+| `LANGFLOW_NEW_USER_IS_ACTIVE` | `False` | New users are active by default. |
+| `LANGFLOW_PUBLIC_URL` | `http://localhost:7860` | Public URL for Langflow. |
+| `LANGFLOW_SECRET_KEY` | - | Secret key for Langflow internal operations. |
+| `LANGFLOW_SUPERUSER` | - | Langflow admin username. Required. |
+| `LANGFLOW_SUPERUSER_PASSWORD` | - | Langflow admin password. Required. |
+| `LANGFLOW_URL` | `http://localhost:7860` | Langflow URL. |
+| `NUDGES_FLOW_ID` | pre-filled | This value is pre-filled. The default value is found in [.env.example](https://github.com/langflow-ai/openrag/blob/main/.env.example). |
+| `SYSTEM_PROMPT` | "You are a helpful AI assistant with access to a knowledge base. Answer questions based on the provided context." | System prompt for the Langflow agent. |
+
+
+### OAuth provider settings
+
+Configure OAuth providers and external service integrations.
+
+| Variable | Default | Description |
+|----------|---------|-------------|
+| `AWS_ACCESS_KEY_ID` / `AWS_SECRET_ACCESS_KEY` | - | AWS integrations. |
+| `GOOGLE_OAUTH_CLIENT_ID` / `GOOGLE_OAUTH_CLIENT_SECRET` | - | Google OAuth authentication. |
+| `MICROSOFT_GRAPH_OAUTH_CLIENT_ID` / `MICROSOFT_GRAPH_OAUTH_CLIENT_SECRET` | - | Microsoft OAuth. |
+| `WEBHOOK_BASE_URL` | - | Base URL for webhook endpoints. |
+
+### OpenSearch settings
+
+Configure OpenSearch database authentication.
+
+| Variable | Default | Description |
+|----------|---------|-------------|
+| `OPENSEARCH_HOST` | `localhost` | OpenSearch host. |
+| `OPENSEARCH_PASSWORD` | - | Password for OpenSearch admin user. Required. |
+| `OPENSEARCH_PORT` | `9200` | OpenSearch port. |
+| `OPENSEARCH_USERNAME` | `admin` | OpenSearch username. |
+
+### System settings
+
+Configure general system components, session management, and logging.
+
+| Variable | Default | Description |
+|----------|---------|-------------|
+| `LANGFLOW_KEY_RETRIES` | `15` | Number of retries for Langflow key generation. |
+| `LANGFLOW_KEY_RETRY_DELAY` | `2.0` | Delay between retries in seconds. |
+| `LOG_FORMAT` | - | Log format (set to "json" for JSON output). |
+| `LOG_LEVEL` | `INFO` | Logging level (DEBUG, INFO, WARNING, ERROR). |
+| `MAX_WORKERS` | - | Maximum number of workers for document processing. |
+| `SERVICE_NAME` | `openrag` | Service name for logging. |
+| `SESSION_SECRET` | auto-generated | Session management. |
+
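+As a minimal sketch, a `.env` file that sets only the variables marked **Required** in the preceding tables might look like the following. All values are placeholders, not defaults.
+
+```bash
+# Placeholder values; replace each with your own.
+OPENAI_API_KEY=your-api-key-here
+OPENSEARCH_PASSWORD=your-opensearch-password
+LANGFLOW_SUPERUSER=your-langflow-admin-username
+LANGFLOW_SUPERUSER_PASSWORD=your-langflow-admin-password
+```
+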
+## Configuration file (`config.yaml`) {#configuration-file}
+
+The `config.yaml` file is generated from the values you enter during [Application onboarding](/install#application-onboarding), and it contains some of the same variables that you can set as environment variables. When a variable is set in both places, the value in `config.yaml` takes precedence.
+
+
+Which variables can `config.yaml` override?
+
+* `CHUNK_OVERLAP`
+* `CHUNK_SIZE`
+* `EMBEDDING_MODEL`
+* `LLM_MODEL`
+* `MODEL_PROVIDER`
+* `OCR_ENABLED`
+* `OPENAI_API_KEY` (for backward compatibility)
+* `PICTURE_DESCRIPTIONS_ENABLED`
+* `PROVIDER_API_KEY`
+* `PROVIDER_ENDPOINT`
+* `PROVIDER_PROJECT_ID`
+* `SYSTEM_PROMPT`
+
+
+### Edit the `config.yaml` file
+
+To manually edit the `config.yaml` file, do the following:
+1. Stop OpenRAG.
+2. In the `config.yaml` file, change the value `edited: false` to `edited: true`.
+3. Make your changes, and then save your file.
+4. Start OpenRAG.
+
+The `MODEL_PROVIDER` value set in `config.yaml` **cannot** be changed after onboarding.
+If you change this value in `config.yaml`, the change has no effect when OpenRAG restarts.
+To change your `MODEL_PROVIDER`, you must [delete the OpenRAG containers](/tui#status), delete `config.yaml`, and [install OpenRAG](/install) again.
## Langflow runtime overrides
@@ -101,20 +188,8 @@ These values can be found in the code base at the following locations.
### OpenRAG configuration defaults
-These values are are defined in `src/config/config_manager.py`.
+These values are defined in [`config_manager.py` in the OpenRAG repository](https://github.com/langflow-ai/openrag/blob/main/src/config/config_manager.py).
### System configuration defaults
-These fallback values are defined in `src/config/settings.py`.
-
-### TUI default values
-
-These values are defined in `src/tui/managers/env_manager.py`.
-
-### Frontend default values
-
-These values are defined in `frontend/src/lib/constants.ts`.
-
-### Docling preset configurations
-
-These values are defined in `src/api/settings.py`.
\ No newline at end of file
+These fallback values are defined in [`settings.py` in the OpenRAG repository](https://github.com/langflow-ai/openrag/blob/main/src/config/settings.py).
\ No newline at end of file
diff --git a/docs/sidebars.js b/docs/sidebars.js
index 6fd9a177..7f038137 100644
--- a/docs/sidebars.js
+++ b/docs/sidebars.js
@@ -75,7 +75,7 @@ const sidebars = {
{
type: "doc",
id: "reference/configuration",
- label: "Environment Variables and Configuration File"
+ label: "Environment variables"
},
],
},