Merge pull request #140 from langflow-ai/docs-env-vars

docs: environment variables and config.yaml
2025-10-02 17:22:49 -04:00 · 2025-10-02 17:22:49 -04:00 · 586c61ef56
commit 586c61ef56
parent d258841ce9 8d336b2475
10 changed files with 215 additions and 139 deletions
--- a/README.md
+++ b/README.md
@@ -62,7 +62,7 @@ LANGFLOW_CHAT_FLOW_ID=your_chat_flow_id
 LANGFLOW_INGEST_FLOW_ID=your_ingest_flow_id
 NUDGES_FLOW_ID=your_nudges_flow_id
 ```
-See extended configuration, including ingestion and optional variables: [docs/configure/configuration.md](docs/docs/configure/configuration.md)
+See extended configuration, including ingestion and optional variables: [docs/reference/configuration.md](docs/docs/reference/configuration.md)
 ### 3. Start OpenRAG

 ```bash
--- a/docs/docs/_partial-onboarding.mdx
+++ b/docs/docs/_partial-onboarding.mdx
@@ -5,10 +5,12 @@ import TabItem from '@theme/TabItem';

 The first time you start OpenRAG, whether using the TUI or a `.env` file, you must complete application onboarding.

-Values input during onboarding can be changed later in the OpenRAG **Settings** page, except for the language model and embedding model _provider_. 
-**Your provider can only be selected once, and you must use the same provider for your language model and embedding model.**
-The language model can be changed, but the embeddings model cannot be changed.
-To change your provider selection, you must completely reinstall OpenRAG.
+Most values from onboarding can be changed later in the OpenRAG **Settings** page, but there are important restrictions.
+
+The **language model provider** and **embeddings model provider** can only be selected at onboarding, and you must use the same provider for your language model and embedding model.
+To change your provider selection later, you must completely reinstall OpenRAG.
+
+The **language model** can be changed later in **Settings**, but the **embeddings model** cannot be changed later.

    <Tabs groupId="Provider">
    <TabItem value="OpenAI" label="OpenAI" default>
@@ -36,14 +38,12 @@ To change your provider selection, you must completely reinstall OpenRAG.
    :::   
    1. Enter your Ollama server's base URL address.
    The default Ollama server address is `http://localhost:11434`.
-    Since OpenRAG is running in a container, you may need to change `localhost` to access services outside of the container. For example, change `http://localhost:11434` to `http://host.docker.internal:11434` to connect to Ollama.
-    OpenRAG automatically sends a test connection to your Ollama server to confirm connectivity.
+    OpenRAG automatically rewrites `localhost` so that it can reach services outside of the container, and then sends a test request to your Ollama server to confirm connectivity.
    2. Select the **Embedding Model** and **Language Model** your Ollama server is running.
-    OpenRAG automatically lists the available models from your Ollama server.
+    OpenRAG retrieves the available models from your Ollama server.
    3. To load 2 sample PDFs, enable **Sample dataset**.
    This is recommended, but not required.
    4. Click **Complete**.
    5. Continue with the [Quickstart](/quickstart).        
    </TabItem>
-    </Tabs>
-
+    </Tabs>
--- a/docs/docs/configure/configuration.mdx
+++ b/docs/docs/configure/configuration.mdx
@@ -1,110 +0,0 @@
----
-title: Configuration
-slug: /configure/configuration
----
-
-import PartialExternalPreview from '@site/docs/_partial-external-preview.mdx';
-
-<PartialExternalPreview />
-
-OpenRAG supports multiple configuration methods with the following priority:
-
-1. **Environment Variables** (highest priority)
-2. **Configuration File** (`config.yaml`)
-3. **Langflow Flow Settings** (runtime override)
-4. **Default Values** (fallback)
-
-## Configuration File
-
-Create a `config.yaml` file in the project root to configure OpenRAG:
-
-```yaml
-# OpenRAG Configuration File
-provider:
-  model_provider: "openai" # openai, anthropic, azure, etc.
-  api_key: "your-api-key" # or use OPENAI_API_KEY env var
-
-knowledge:
-  embedding_model: "text-embedding-3-small"
-  chunk_size: 1000
-  chunk_overlap: 200
-  ocr: true
-  picture_descriptions: false
-
-agent:
-  llm_model: "gpt-4o-mini"
-  system_prompt: "You are a helpful AI assistant..."
-```
-
-## Environment Variables
-
-Environment variables will override configuration file settings. You can still use `.env` files:
-
-```bash
-cp .env.example .env
-```
-
-## Required Variables
-
-| Variable                      | Description                                 |
-| ----------------------------- | ------------------------------------------- |
-| `OPENAI_API_KEY`              | Your OpenAI API key                         |
-| `OPENSEARCH_PASSWORD`         | Password for OpenSearch admin user          |
-| `LANGFLOW_SUPERUSER`          | Langflow admin username                     |
-| `LANGFLOW_SUPERUSER_PASSWORD` | Langflow admin password                     |
-| `LANGFLOW_CHAT_FLOW_ID`       | ID of your Langflow chat flow               |
-| `LANGFLOW_INGEST_FLOW_ID`     | ID of your Langflow ingestion flow          |
-| `NUDGES_FLOW_ID`              | ID of your Langflow nudges/suggestions flow |
-
-## Ingestion Configuration
-
-| Variable                       | Description                                            |
-| ------------------------------ | ------------------------------------------------------ |
-| `DISABLE_INGEST_WITH_LANGFLOW` | Disable Langflow ingestion pipeline (default: `false`) |
-
-- `false` or unset: Uses Langflow pipeline (upload → ingest → delete)
-- `true`: Uses traditional OpenRAG processor for document ingestion
-
-## Optional Variables
-
-| Variable                                                                  | Description                                                        |
-| ------------------------------------------------------------------------- | ------------------------------------------------------------------ |
-| `LANGFLOW_PUBLIC_URL`                                                     | Public URL for Langflow (default: `http://localhost:7860`)         |
-| `GOOGLE_OAUTH_CLIENT_ID` / `GOOGLE_OAUTH_CLIENT_SECRET`                   | Google OAuth authentication                                        |
-| `MICROSOFT_GRAPH_OAUTH_CLIENT_ID` / `MICROSOFT_GRAPH_OAUTH_CLIENT_SECRET` | Microsoft OAuth                                                    |
-| `WEBHOOK_BASE_URL`                                                        | Base URL for webhook endpoints                                     |
-| `AWS_ACCESS_KEY_ID` / `AWS_SECRET_ACCESS_KEY`                             | AWS integrations                                                   |
-| `SESSION_SECRET`                                                          | Session management (default: auto-generated, change in production) |
-| `LANGFLOW_KEY`                                                            | Explicit Langflow API key (auto-generated if not provided)         |
-| `LANGFLOW_SECRET_KEY`                                                     | Secret key for Langflow internal operations                        |
-
-## OpenRAG Configuration Variables
-
-These environment variables override settings in `config.yaml`:
-
-### Provider Settings
-
-| Variable           | Description                              | Default  |
-| ------------------ | ---------------------------------------- | -------- |
-| `MODEL_PROVIDER`   | Model provider (openai, anthropic, etc.) | `openai` |
-| `PROVIDER_API_KEY` | API key for the model provider           |          |
-| `OPENAI_API_KEY`   | OpenAI API key (backward compatibility)  |          |
-
-### Knowledge Settings
-
-| Variable                       | Description                             | Default                  |
-| ------------------------------ | --------------------------------------- | ------------------------ |
-| `EMBEDDING_MODEL`              | Embedding model for vector search       | `text-embedding-3-small` |
-| `CHUNK_SIZE`                   | Text chunk size for document processing | `1000`                   |
-| `CHUNK_OVERLAP`                | Overlap between chunks                  | `200`                    |
-| `OCR_ENABLED`                  | Enable OCR for image processing         | `true`                   |
-| `PICTURE_DESCRIPTIONS_ENABLED` | Enable picture descriptions             | `false`                  |
-
-### Agent Settings
-
-| Variable        | Description                       | Default                  |
-| --------------- | --------------------------------- | ------------------------ |
-| `LLM_MODEL`     | Language model for the chat agent | `gpt-4o-mini`            |
-| `SYSTEM_PROMPT` | System prompt for the agent       | Default assistant prompt |
-
-See `.env.example` for a complete list with descriptions, and `docker-compose*.yml` for runtime usage.
--- a/docs/docs/core-components/agents.mdx
+++ b/docs/docs/core-components/agents.mdx
@@ -52,7 +52,7 @@ This filter is the [Knowledge filter](/knowledge#create-knowledge-filters), and

 <PartialModifyFlows />

-For an example of changing out the agent's LLM in OpenRAG, see the [Quickstart](/quickstart#change-components).
+For an example of changing out the agent's language model in OpenRAG, see the [Quickstart](/quickstart#change-components).

 To restore the flow to its initial state, in OpenRAG, click <Icon name="Settings" aria-hidden="true"/> **Settings**, and then click **Restore Flow**.
 OpenRAG warns you that this discards all custom settings. Click **Restore** to restore the flow.
--- a/docs/docs/core-components/ingestion.mdx
+++ b/docs/docs/core-components/ingestion.mdx
@@ -46,7 +46,7 @@ If OpenRAG detects that the local machine is running on macOS, OpenRAG uses the

 ## Use OpenRAG default ingestion instead of Docling serve

-If you want to use OpenRAG's built-in pipeline instead of Docling serve, set `DISABLE_INGEST_WITH_LANGFLOW=true` in [Environment variables](/configure/configuration#ingestion-configuration).
+If you want to use OpenRAG's built-in pipeline instead of Docling serve, set `DISABLE_INGEST_WITH_LANGFLOW=true` in [Environment variables](/reference/configuration#document-processing).

 The built-in pipeline still uses the Docling processor, but uses it directly without the Docling Serve API.

--- a/docs/docs/get-started/docker.mdx
+++ b/docs/docs/get-started/docker.mdx
@@ -51,12 +51,12 @@ The following values are **required** to be set:
   ```bash
   OPENSEARCH_PASSWORD=your_secure_password
   OPENAI_API_KEY=your_openai_api_key
-   
   LANGFLOW_SUPERUSER=admin
   LANGFLOW_SUPERUSER_PASSWORD=your_langflow_password
   LANGFLOW_SECRET_KEY=your_secret_key
   ```
-   For more information on configuring OpenRAG with environment variables, see [Environment variables](/configure/configuration).
+   
+   For more information on configuring OpenRAG with environment variables, see [Environment variables](/reference/configuration).

 4. Deploy OpenRAG with Docker Compose based on your deployment type.

@@ -95,12 +95,35 @@ The following values are **required** to be set:

 <PartialOnboarding />

-## Rebuild all Docker containers
+## Container management commands

-If you need to reset state and rebuild all of your containers, run the following command.
+Manage your OpenRAG containers with the following commands.
+These commands are also available in the TUI's [Status menu](/get-started/tui#status).
+
+### Upgrade containers
+
+Upgrade your containers to the latest version while preserving your data.
+
+```bash
+docker compose pull
+docker compose up -d --force-recreate
+```
+
+### Rebuild containers (destructive)
+
+Reset state by rebuilding all of your containers.
 Your OpenSearch and Langflow databases will be lost.
 Documents stored in the `./documents` directory will persist, since the directory is mounted as a volume in the OpenRAG backend container.

 ```bash
 docker compose up --build --force-recreate --remove-orphans
 ```
+
+### Remove all containers and data (destructive)
+
+Completely remove your OpenRAG installation and delete all of your data, including OpenSearch data, uploaded documents, and authentication data.
+
+```bash
+docker compose down --volumes --remove-orphans --rmi local
+docker system prune -f
+```
--- a/docs/docs/get-started/install.mdx
+++ b/docs/docs/get-started/install.mdx
@@ -5,12 +5,12 @@ slug: /install

 import Tabs from '@theme/Tabs';
 import TabItem from '@theme/TabItem';
-import PartialOnboarding from '@site/docs/_partial-onboarding.mdx';
+import PartialOnboarding from '@site/docs/_partial-onboarding.mdx'; 
 import PartialExternalPreview from '@site/docs/_partial-external-preview.mdx';

 <PartialExternalPreview />

-[Install the OpenRAG Python wheel](#install-python-wheel) and then use the [OpenRAG Terminal User Interface(TUI)](#setup) to run and configure your OpenRAG deployment with a guided setup process.
+[Install the OpenRAG Python wheel](#install-python-wheel), and then run the [OpenRAG Terminal User Interface (TUI)](#setup) to start your OpenRAG deployment with a guided setup process.

 If you prefer running Docker commands and manually editing `.env` files, see [Deploy with Docker](/get-started/docker).

@@ -29,17 +29,16 @@ If you prefer running Docker commands and manually editing `.env` files, see [De
 The `.whl` file is currently available as an internal download during public preview, and will be published to PyPI in a future release.
 :::

-The OpenRAG wheel installs the Terminal User Interface (TUI) for running and managing OpenRAG.
+The OpenRAG wheel installs the Terminal User Interface (TUI) for configuring and running OpenRAG.

-1. Create a new project with a virtual environment using `uv`.
-   This creates and activates a virtual environment for your project.
+1. Create a new project with a virtual environment using `uv init`.

   ```bash
   uv init YOUR_PROJECT_NAME
   cd YOUR_PROJECT_NAME
   ```

-   The terminal prompt won't change like it would when using `venv`, but the `uv` commands will use the project's virtual environment.
+   The terminal prompt doesn't change to show `(venv)`, but `uv` commands automatically use the project's virtual environment.
   For more information on virtual environments, see the [uv documentation](https://docs.astral.sh/uv/pip/environments).

 2. Add the local OpenRAG wheel to your project's virtual environment.
@@ -65,7 +64,9 @@ The OpenRAG wheel installs the Terminal User Interface (TUI) for running and man

 ## Set up OpenRAG with the TUI {#setup}

-**Basic Setup** completes or auto-generates most of the required values to start OpenRAG.
+The TUI creates a `.env` file in your OpenRAG directory root and starts OpenRAG.
+
+**Basic Setup** generates all of the required values except the OpenAI API key.
 **Basic Setup** does not set up OAuth connections for ingestion from Google Drive, OneDrive, or AWS.
 For OAuth setup, use **Advanced Setup**.

--- a/docs/docs/reference/configuration.mdx
+++ b/docs/docs/reference/configuration.mdx
@@ -0,0 +1,162 @@
+---
+title: Environment variables
+slug: /reference/configuration
+---
+
+import Icon from "@site/src/components/icon/icon";
+import Tabs from '@theme/Tabs';
+import TabItem from '@theme/TabItem';
+
+OpenRAG recognizes [supported environment variables](#supported-environment-variables) from the following sources:
+
+* [Environment variables](#supported-environment-variables) - Values set in the `.env` file.
+* [Langflow runtime overrides](#langflow-runtime-overrides) - Langflow components may tweak environment variables at runtime.
+* [Default or fallback values](#default-values-and-fallbacks) - Values that OpenRAG uses when it doesn't find a value from another source.
+
+## Configure environment variables
+
+Environment variables are set in a `.env` file in the root of your OpenRAG project directory.
+
+For an example `.env` file, see [`.env.example` in the OpenRAG repository](https://github.com/langflow-ai/openrag/blob/main/.env.example).
+
+The Docker Compose files are populated with values from your `.env`, so you don't need to edit the Docker Compose files manually.
+
+Values set as environment variables always take precedence over values from other sources.
+
+### Set environment variables
+
+To set environment variables, do the following.
+
+1. Stop OpenRAG.
+2. Set the values in the `.env` file:
+   ```bash
+   LOG_LEVEL=DEBUG
+   LOG_FORMAT=json
+   SERVICE_NAME=openrag-dev
+   ```
+3. Start OpenRAG.
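
The `.env` edit in step 2 can also be scripted. The following is a minimal sketch, assuming a POSIX-compatible shell; `set_env_var` is a hypothetical helper written for this illustration and is not part of OpenRAG or its TUI.

```bash
# Hypothetical helper (not part of OpenRAG): set KEY=VALUE in a .env file.
# Any existing assignment of KEY is dropped and the new one is written as
# the last line of the file.
set_env_var() {
  _file=$1 _key=$2 _value=$3
  _tmp=$(mktemp) || return 1
  if [ -f "$_file" ]; then
    # Copy every line except an existing assignment of this key.
    # grep exits non-zero when nothing is left to copy, which is fine here.
    grep -v "^${_key}=" "$_file" > "$_tmp" || true
  fi
  printf '%s=%s\n' "$_key" "$_value" >> "$_tmp"
  # Write back in place so the file keeps its original permissions.
  cat "$_tmp" > "$_file"
  rm -f "$_tmp"
}

# Example (run from the OpenRAG project root while OpenRAG is stopped):
#   set_env_var .env LOG_LEVEL DEBUG
```

Because the helper rewrites the whole file, it should only be run while OpenRAG is stopped, as the steps above describe.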
+
+Changes to provider API keys or provider endpoints in the `.env` file don't take effect after [Application onboarding](/install#application-onboarding) is complete. To change these values, do the following:
+
+1. Stop OpenRAG.
+2. Remove the containers:
+   ```bash
+   docker compose down
+   ```
+3. Update the values in your `.env` file.
+4. Start the OpenRAG containers:
+   ```bash
+   docker compose up -d
+   ```
+5. Complete [Application onboarding](/install#application-onboarding) again.
+
+## Supported environment variables
+
+All OpenRAG configuration can be controlled through environment variables.
+
+### AI provider settings
+
+Configure which AI models and providers OpenRAG uses for language processing and embeddings.
+For more information, see [Application onboarding](/install#application-onboarding).
+
+| Variable | Default | Description |
+|----------|---------|-------------|
+| `EMBEDDING_MODEL` | `text-embedding-3-small` | Embedding model for vector search. |
+| `LLM_MODEL` | `gpt-4o-mini` | Language model for the chat agent. |
+| `MODEL_PROVIDER` | `openai` | Model provider, such as OpenAI or IBM watsonx.ai. |
+| `OPENAI_API_KEY` | - | Your OpenAI API key. Required. |
+| `PROVIDER_API_KEY` | - | API key for the model provider. |
+| `PROVIDER_ENDPOINT` | - | Custom provider endpoint. Only used for IBM or Ollama providers. |
+| `PROVIDER_PROJECT_ID` | - | Project ID for providers. Only required for the IBM watsonx.ai provider. |
+
+### Document processing
+
+Control how OpenRAG processes and ingests documents into your knowledge base.
+For more information, see [Ingestion](/ingestion).
+
+| Variable | Default | Description |
+|----------|---------|-------------|
+| `CHUNK_OVERLAP` | `200` | Overlap between chunks. |
+| `CHUNK_SIZE` | `1000` | Text chunk size for document processing. |
+| `DISABLE_INGEST_WITH_LANGFLOW` | `false` | Disable Langflow ingestion pipeline. |
+| `DOCLING_OCR_ENGINE` | - | OCR engine for document processing. |
+| `OCR_ENABLED` | `false` | Enable OCR for image processing. |
+| `OPENRAG_DOCUMENTS_PATHS` | `./documents` | Document paths for ingestion. |
+| `PICTURE_DESCRIPTIONS_ENABLED` | `false` | Enable picture descriptions. |
+
+### Langflow settings
+
+Configure Langflow authentication.
+
+| Variable | Default | Description |
+|----------|---------|-------------|
+| `LANGFLOW_AUTO_LOGIN` | `False` | Enable auto-login for Langflow. |
+| `LANGFLOW_CHAT_FLOW_ID` | pre-filled | This value is pre-filled. The default value is found in [.env.example](https://github.com/langflow-ai/openrag/blob/main/.env.example). |
+| `LANGFLOW_ENABLE_SUPERUSER_CLI` | `False` | Enable superuser CLI. |
+| `LANGFLOW_INGEST_FLOW_ID` | pre-filled | This value is pre-filled. The default value is found in [.env.example](https://github.com/langflow-ai/openrag/blob/main/.env.example). |
+| `LANGFLOW_KEY` | auto-generated | Explicit Langflow API key. |
+| `LANGFLOW_NEW_USER_IS_ACTIVE` | `False` | New users are active by default. |
+| `LANGFLOW_PUBLIC_URL` | `http://localhost:7860` | Public URL for Langflow. |
+| `LANGFLOW_SECRET_KEY` | - | Secret key for Langflow internal operations. |
+| `LANGFLOW_SUPERUSER` | - | Langflow admin username. Required. |
+| `LANGFLOW_SUPERUSER_PASSWORD` | - | Langflow admin password. Required. |
+| `LANGFLOW_URL` | `http://localhost:7860` | Langflow URL. |
+| `NUDGES_FLOW_ID` | pre-filled | This value is pre-filled. The default value is found in [.env.example](https://github.com/langflow-ai/openrag/blob/main/.env.example). |
+| `SYSTEM_PROMPT` | "You are a helpful AI assistant with access to a knowledge base. Answer questions based on the provided context." | System prompt for the Langflow agent. |
+
+### OAuth provider settings
+
+Configure OAuth providers and external service integrations.
+
+| Variable | Default | Description |
+|----------|---------|-------------|
+| `AWS_ACCESS_KEY_ID` / `AWS_SECRET_ACCESS_KEY` | - | AWS integrations. |
+| `GOOGLE_OAUTH_CLIENT_ID` / `GOOGLE_OAUTH_CLIENT_SECRET` | - | Google OAuth authentication. |
+| `MICROSOFT_GRAPH_OAUTH_CLIENT_ID` / `MICROSOFT_GRAPH_OAUTH_CLIENT_SECRET` | - | Microsoft OAuth. |
+| `WEBHOOK_BASE_URL` | - | Base URL for webhook endpoints. |
+
+### OpenSearch settings
+
+Configure OpenSearch database authentication.
+
+| Variable | Default | Description |
+|----------|---------|-------------|
+| `OPENSEARCH_HOST` | `localhost` | OpenSearch host. |
+| `OPENSEARCH_PASSWORD` | - | Password for OpenSearch admin user. Required. |
+| `OPENSEARCH_PORT` | `9200` | OpenSearch port. |
+| `OPENSEARCH_USERNAME` | `admin` | OpenSearch username. |
+
+### System settings
+
+Configure general system components, session management, and logging.
+
+| Variable | Default | Description |
+|----------|---------|-------------|
+| `LANGFLOW_KEY_RETRIES` | `15` | Number of retries for Langflow key generation. |
+| `LANGFLOW_KEY_RETRY_DELAY` | `2.0` | Delay between retries in seconds. |
+| `LOG_FORMAT` | - | Log format (set to "json" for JSON output). |
+| `LOG_LEVEL` | `INFO` | Logging level (DEBUG, INFO, WARNING, ERROR). |
+| `MAX_WORKERS` | - | Maximum number of workers for document processing. |
+| `SERVICE_NAME` | `openrag` | Service name for logging. |
+| `SESSION_SECRET` | auto-generated | Secret for session management. Change the auto-generated value in production. |
+
+## Langflow runtime overrides
+
+Langflow runtime overrides allow you to modify component settings at runtime without changing the base configuration.
+
+Runtime overrides are implemented through **tweaks** - parameter modifications that are passed to specific Langflow components during flow execution.
+
+For more information on tweaks, see [Input schema (tweaks)](https://docs.langflow.org/concepts-publish#input-schema).
+
+## Default values and fallbacks
+
+When no environment variables or configuration file values are provided, OpenRAG uses default values.
+These values can be found in the code base at the following locations.
+
+### OpenRAG configuration defaults
+
+These values are defined in [`config_manager.py` in the OpenRAG repository](https://github.com/langflow-ai/openrag/blob/main/src/config/config_manager.py).
+
+### System configuration defaults
+
+These fallback values are defined in [`settings.py` in the OpenRAG repository](https://github.com/langflow-ai/openrag/blob/main/src/config/settings.py).
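
The fallback behavior described in this section can be illustrated with shell parameter expansion. This is only an analogy: OpenRAG resolves its defaults in the Python files linked above, and `resolve_setting` is a hypothetical name used for this sketch. Treating an empty value the same as an unset one is also an assumption.

```bash
# Illustration only (not OpenRAG code): print the value of the environment
# variable named by $1, or the default in $2 when it is unset or empty.
resolve_setting() {
  # eval reads the variable whose name is stored in $1 (portable indirection).
  eval "_value=\${$1:-}"
  printf '%s\n' "${_value:-$2}"
}

# Example, using defaults from the tables above:
#   resolve_setting LOG_LEVEL INFO
#   resolve_setting SERVICE_NAME openrag
```

With `LOG_LEVEL` unset, the first call prints the documented default; after `LOG_LEVEL=DEBUG` is set, it prints the environment value instead, which mirrors the rule that environment variables take precedence over defaults.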
--- a/docs/docs/support/troubleshoot.mdx
+++ b/docs/docs/support/troubleshoot.mdx
@@ -13,12 +13,12 @@ This page provides troubleshooting advice for issues you might encounter when us

 ## OpenSearch fails to start

-Check that `OPENSEARCH_PASSWORD` set in [Environment variables](/configure/configuration) meets requirements.
+Check that `OPENSEARCH_PASSWORD` set in [Environment variables](/reference/configuration) meets requirements.
 The password must contain at least 8 characters, including at least one uppercase letter, one lowercase letter, one digit, and one special character, and it must be strong.
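
The character rules above can be checked before starting OpenSearch. The following is a rough sketch, assuming a POSIX-compatible shell; `check_opensearch_password` is a hypothetical helper, and it covers only the length and character-class rules, not any additional strength check that OpenSearch itself applies.

```bash
# Hypothetical pre-flight check (not part of OpenRAG or OpenSearch).
# Returns 0 when the password has at least 8 characters and contains an
# uppercase letter, a lowercase letter, a digit, and a special character.
check_opensearch_password() {
  _pw=$1
  [ "${#_pw}" -ge 8 ] || return 1
  case $_pw in *[[:upper:]]*) ;; *) return 1 ;; esac
  case $_pw in *[[:lower:]]*) ;; *) return 1 ;; esac
  case $_pw in *[[:digit:]]*) ;; *) return 1 ;; esac
  # A "special" character is taken here to mean anything non-alphanumeric.
  case $_pw in *[![:alnum:]]*) ;; *) return 1 ;; esac
  return 0
}

# Example:
#   check_opensearch_password "$OPENSEARCH_PASSWORD" || echo "Password does not meet the requirements"
```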

 ## Langflow connection issues

-Verify the `LANGFLOW_SUPERUSER` credentials set in [Environment variables](/configure/configuration) are correct.
+Verify the `LANGFLOW_SUPERUSER` credentials set in [Environment variables](/reference/configuration) are correct.

 ## Memory errors

@@ -108,4 +108,4 @@ To reset your local containers and pull new images, do the following:

 3. In the OpenRAG TUI, click **Status**, and then click **Upgrade**.
 When the **Close** button is active, the upgrade is complete.
-Close the window and open the OpenRAG appplication. 
+Close the window and open the OpenRAG application.
--- a/docs/sidebars.js
+++ b/docs/sidebars.js
@@ -70,11 +70,11 @@ const sidebars = {
    },
    {
      type: "category",
-      label: "Configuration",
+      label: "Reference",
      items: [
        {
          type: "doc",
-          id: "configure/configuration",
+          id: "reference/configuration",
          label: "Environment variables"
        },
      ],
@@ -93,4 +93,4 @@ const sidebars = {
  ],
 };

-export default sidebars;
+export default sidebars;