add-config-yaml-section-combine-env-vars

Mendon Kissling 2025-10-01 14:48:23 -04:00
parent ccedc06ede
commit 8905f8fab9
2 changed files with 156 additions and 81 deletions


@ -1,90 +1,177 @@
---
title: Environment variables
slug: /reference/configuration
---
import Icon from "@site/src/components/icon/icon";
import Tabs from '@theme/Tabs';
import TabItem from '@theme/TabItem';

OpenRAG recognizes [supported environment variables](#supported-environment-variables) from the following sources:

* **[Environment variables](#supported-environment-variables)** - Values set in a `.env` or `docker-compose.yml` file.
* **[Configuration file variables (`config.yaml`)](#configuration-file)** - Values generated during application onboarding and saved to `config.yaml`.
* **[Langflow runtime overrides](#langflow-runtime-overrides)** - Langflow components may override environment variables at runtime.
* **[Default or fallback values](#default-values-and-fallbacks)** - Values OpenRAG falls back to when no other source provides one.

## Configure environment variables

Environment variables can be set in a `.env` or `docker-compose.yml` file.
### Precedence

Environment variables take precedence over most other configuration sources, except when the same variable exists in both [config.yaml](#configuration-file) and the `.env` file. In that case, the value in `config.yaml` takes precedence.

### Set environment variables

To set environment variables, do the following:
<Tabs>
<TabItem value="env-file" label=".env file" default>

Stop OpenRAG, set the values in the `.env` file, and then start OpenRAG.

```bash
OPENAI_API_KEY=your-api-key-here
EMBEDDING_MODEL=text-embedding-3-small
CHUNK_SIZE=1000
```

</TabItem>
<TabItem value="docker" label="Docker Compose">

Stop OpenRAG, set the values in the `docker-compose.yml` file, and then start OpenRAG.

```yaml
environment:
  - OPENAI_API_KEY=your-api-key-here
  - EMBEDDING_MODEL=text-embedding-3-small
  - CHUNK_SIZE=1000
```

</TabItem>
</Tabs>
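The precedence rules described above can be sketched in a few lines. This is an illustrative sketch only, not OpenRAG's actual implementation, and Langflow runtime overrides are omitted for brevity: a setting is resolved by checking `config.yaml`, then the environment, then the defaults.

```python
# Illustrative sketch (not OpenRAG's actual code): resolve a setting by
# checking each configuration source in precedence order, where config.yaml
# beats the environment and the environment beats built-in defaults.
def resolve(name, config_yaml, env, defaults):
    for source in (config_yaml, env, defaults):
        if name in source:
            return source[name]
    return None

config_yaml = {"LLM_MODEL": "gpt-4o"}
env = {"LLM_MODEL": "gpt-4o-mini", "CHUNK_SIZE": "1500"}
defaults = {"LLM_MODEL": "gpt-4o-mini", "CHUNK_SIZE": "1000", "CHUNK_OVERLAP": "200"}

print(resolve("LLM_MODEL", config_yaml, env, defaults))     # config.yaml wins
print(resolve("CHUNK_SIZE", config_yaml, env, defaults))    # set only in the environment
print(resolve("CHUNK_OVERLAP", config_yaml, env, defaults)) # falls back to the default
```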
## Supported environment variables

All OpenRAG configuration can be controlled through environment variables.

### AI provider settings

Configure which AI models and providers OpenRAG uses for language processing and embeddings.
For more information, see [Application onboarding](/install#application-onboarding).
| Variable | Default | Description |
|----------|---------|-------------|
| `EMBEDDING_MODEL` | `text-embedding-3-small` | Embedding model for vector search. |
| `LLM_MODEL` | `gpt-4o-mini` | Language model for the chat agent. |
| `MODEL_PROVIDER` | `openai` | Model provider, such as OpenAI or IBM watsonx.ai. |
| `OPENAI_API_KEY` | - | Your OpenAI API key. Required. |
| `PROVIDER_API_KEY` | - | API key for the model provider. |
| `PROVIDER_ENDPOINT` | - | Custom provider endpoint. Only used for IBM or Ollama providers. |
| `PROVIDER_PROJECT_ID` | - | Project ID for providers. Only required for the IBM watsonx.ai provider. |
### Document processing
Control how OpenRAG processes and ingests documents into your knowledge base.
For more information, see [Ingestion](/core-components/ingestion).
| Variable | Default | Description |
|----------|---------|-------------|
| `CHUNK_OVERLAP` | `200` | Overlap between chunks. |
| `CHUNK_SIZE` | `1000` | Text chunk size for document processing. |
| `DISABLE_INGEST_WITH_LANGFLOW` | `false` | Disable the Langflow ingestion pipeline. When `false` or unset, documents are ingested through the Langflow pipeline (upload → ingest → delete); when `true`, the traditional OpenRAG processor is used. |
| `DOCLING_OCR_ENGINE` | - | OCR engine for document processing. |
| `OCR_ENABLED` | `false` | Enable OCR for image processing. |
| `OPENRAG_DOCUMENTS_PATHS` | `./documents` | Document paths for ingestion. |
| `PICTURE_DESCRIPTIONS_ENABLED` | `false` | Enable picture descriptions. |
### Langflow settings
Configure Langflow authentication.
| Variable | Default | Description |
|----------|---------|-------------|
| `LANGFLOW_AUTO_LOGIN` | `False` | Enable auto-login for Langflow. |
| `LANGFLOW_CHAT_FLOW_ID` | pre-filled | The default value is found in [.env.example](https://github.com/langflow-ai/openrag/blob/main/.env.example). |
| `LANGFLOW_ENABLE_SUPERUSER_CLI` | `False` | Enable superuser CLI. |
| `LANGFLOW_INGEST_FLOW_ID` | pre-filled | The default value is found in [.env.example](https://github.com/langflow-ai/openrag/blob/main/.env.example). |
| `LANGFLOW_KEY` | auto-generated | Explicit Langflow API key. |
| `LANGFLOW_NEW_USER_IS_ACTIVE` | `False` | New users are active by default. |
| `LANGFLOW_PUBLIC_URL` | `http://localhost:7860` | Public URL for Langflow. |
| `LANGFLOW_SECRET_KEY` | - | Secret key for Langflow internal operations. |
| `LANGFLOW_SUPERUSER` | - | Langflow admin username. Required. |
| `LANGFLOW_SUPERUSER_PASSWORD` | - | Langflow admin password. Required. |
| `LANGFLOW_URL` | `http://localhost:7860` | Langflow URL. |
| `NUDGES_FLOW_ID` | pre-filled | The default value is found in [.env.example](https://github.com/langflow-ai/openrag/blob/main/.env.example). |
| `SYSTEM_PROMPT` | "You are a helpful AI assistant with access to a knowledge base. Answer questions based on the provided context." | System prompt for the Langflow agent. |
### OAuth provider settings
Configure OAuth providers and external service integrations.
| Variable | Default | Description |
|----------|---------|-------------|
| `AWS_ACCESS_KEY_ID` / `AWS_SECRET_ACCESS_KEY` | - | AWS integrations. |
| `GOOGLE_OAUTH_CLIENT_ID` / `GOOGLE_OAUTH_CLIENT_SECRET` | - | Google OAuth authentication. |
| `MICROSOFT_GRAPH_OAUTH_CLIENT_ID` / `MICROSOFT_GRAPH_OAUTH_CLIENT_SECRET` | - | Microsoft OAuth. |
| `WEBHOOK_BASE_URL` | - | Base URL for webhook endpoints. |
### OpenSearch settings
Configure OpenSearch database authentication.
| Variable | Default | Description |
|----------|---------|-------------|
| `OPENSEARCH_HOST` | `localhost` | OpenSearch host. |
| `OPENSEARCH_PASSWORD` | - | Password for OpenSearch admin user. Required. |
| `OPENSEARCH_PORT` | `9200` | OpenSearch port. |
| `OPENSEARCH_USERNAME` | `admin` | OpenSearch username. |
### System settings
Configure general system components, session management, and logging.
| Variable | Default | Description |
|----------|---------|-------------|
| `LANGFLOW_KEY_RETRIES` | `15` | Number of retries for Langflow key generation. |
| `LANGFLOW_KEY_RETRY_DELAY` | `2.0` | Delay between retries in seconds. |
| `LOG_FORMAT` | - | Log format (set to "json" for JSON output). |
| `LOG_LEVEL` | `INFO` | Logging level (DEBUG, INFO, WARNING, ERROR). |
| `MAX_WORKERS` | - | Maximum number of workers for document processing. |
| `SERVICE_NAME` | `openrag` | Service name for logging. |
| `SESSION_SECRET` | auto-generated | Secret used for session management. Change this value in production. |
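The defaults in the tables above behave like standard environment-variable fallbacks. As an illustration only (not OpenRAG's actual code), reading a setting with its documented default looks like this:

```python
import os

# Illustrative sketch: read a setting from the environment, falling back
# to its documented default when the variable is unset.
def get_setting(name: str, default: str) -> str:
    return os.environ.get(name, default)

log_level = get_setting("LOG_LEVEL", "INFO")
service_name = get_setting("SERVICE_NAME", "openrag")
```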
## Configuration file (`config.yaml`) {#configuration-file}
A `config.yaml` file is generated from the values entered during [Application onboarding](/install#application-onboarding) and contains some of the same configuration variables as the environment variables. Variables in `config.yaml` take precedence over environment variables.
<details open>
<summary>Which variables can `config.yaml` override?</summary>
* CHUNK_OVERLAP
* CHUNK_SIZE
* EMBEDDING_MODEL
* LLM_MODEL
* MODEL_PROVIDER
* OCR_ENABLED
* OPENAI_API_KEY (backward compatibility)
* PICTURE_DESCRIPTIONS_ENABLED
* PROVIDER_API_KEY
* PROVIDER_ENDPOINT
* PROVIDER_PROJECT_ID
* SYSTEM_PROMPT
</details>
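As a hypothetical sketch of what such a file might contain, assuming flat keys matching the variables above (the key names, casing, and structure of the generated file may differ):

```yaml
# Hypothetical sketch only - the generated config.yaml may use different keys.
edited: true
MODEL_PROVIDER: openai
LLM_MODEL: gpt-4o-mini
EMBEDDING_MODEL: text-embedding-3-small
CHUNK_SIZE: 1000
CHUNK_OVERLAP: 200
SYSTEM_PROMPT: "You are a helpful AI assistant with access to a knowledge base. Answer questions based on the provided context."
```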
### Edit the `config.yaml` file
To manually edit the `config.yaml` file, do the following:
1. Stop OpenRAG.
2. In the `config.yaml` file, change the value `edited: false` to `edited: true`.
3. Make your changes, and then save the file.
4. Start OpenRAG.
The `config.yaml` value set for `MODEL_PROVIDER` **cannot** be changed after onboarding; changing it in `config.yaml` has no effect on restart.
To change your `MODEL_PROVIDER`, you must [delete the OpenRAG containers](/tui#status), delete `config.yaml`, and [install OpenRAG](/install) again.
## Langflow runtime overrides
@ -101,20 +188,8 @@ These values can be found in the code base at the following locations.
### OpenRAG configuration defaults
These values are defined in [`config_manager.py` in the OpenRAG repository](https://github.com/langflow-ai/openrag/blob/main/src/config/config_manager.py).
### System configuration defaults
These fallback values are defined in [`settings.py` in the OpenRAG repository](https://github.com/langflow-ai/openrag/blob/main/src/config/settings.py).


@ -75,7 +75,7 @@ const sidebars = {
{
type: "doc",
id: "reference/configuration",
label: "Environment variables"
},
],
},