diff --git a/docs/docs/core-components/knowledge.mdx b/docs/docs/core-components/knowledge.mdx
index 53900987..f8f2fc0c 100644
--- a/docs/docs/core-components/knowledge.mdx
+++ b/docs/docs/core-components/knowledge.mdx
@@ -126,10 +126,10 @@ The default embedding dimension is `1536`, and the default model is the OpenAI `
 If you want to use an unsupported model, you must manually set the model in your [OpenRAG `.env` file](/reference/configuration).
 If you use an unsupported embedding model that doesn't have defined dimensions in `settings.py`, then OpenRAG falls back to the default dimensions (1536) and logs a warning.
 OpenRAG's OpenSearch instance and flows continue to work, but [similarity search](https://www.ibm.com/think/topics/vector-search) quality can be affected if the actual model dimensions aren't 1536.
-To change the embedding model after onboarding, it is recommended that you modify the embedding model configuration on the OpenRAG **Settings** page or in your [OpenRAG `.env` file](/reference/configuration).
+To change the embedding model after onboarding, modify the embedding model configuration on the OpenRAG **Settings** page or in your [OpenRAG `.env` file](/reference/configuration).
 This ensures that all relevant [OpenRAG flows](/agents) are updated to use the new embedding model configuration.
 
-If you edit these settings in the `.env` or `docker-compose` files, you must [stop and restart the OpenRAG containers](/manage-services#stop-and-start-containers) to apply the changes.
+If you edit these settings in the `.env` file, you must [stop and restart the OpenRAG containers](/manage-services#stop-and-start-containers) to apply the changes.
 
 ### Set Docling parameters
 
@@ -137,14 +137,17 @@ OpenRAG uses [Docling](https://docling-project.github.io/docling/) for document
 
 When you [upload documents](/ingestion), Docling processes the files, splits them into chunks, and stores them as separate, structured documents in your OpenSearch knowledge base.
 
-#### Select a Docling implementation
+#### Select a Docling implementation {#select-a-docling-implementation}
 
 You can use either Docling Serve or OpenRAG's built-in Docling ingestion pipeline to process documents.
 
 * **Docling Serve ingestion**: By default, OpenRAG uses [Docling Serve](https://github.com/docling-project/docling-serve).
-This means that OpenRAG starts a `docling serve` process on your local machine and runs Docling ingestion through an API service.
+It starts a local `docling serve` process, and then runs Docling ingestion through the Docling Serve API.
 
-* **Built-in Docling ingestion**: If you want to use OpenRAG's built-in Docling ingestion pipeline instead of the separate Docling Serve service, set `DISABLE_INGEST_WITH_LANGFLOW=true` in your [OpenRAG environment variables](/reference/configuration#document-processing-settings).
+  To use a remote `docling serve` instance or your own local instance, set `DOCLING_SERVE_URL=http://**HOST_IP**:5001` in your [OpenRAG `.env` file](/reference/configuration#document-processing-settings).
+  The service must run on port 5001.
+
+* **Built-in Docling ingestion**: If you want to use OpenRAG's built-in Docling ingestion pipeline instead of the separate Docling Serve service, set `DISABLE_INGEST_WITH_LANGFLOW=true` in your [OpenRAG `.env` file](/reference/configuration#document-processing-settings).
 The built-in pipeline uses the Docling processor directly instead of through the Docling Serve API.
 For the underlying functionality, see [`processors.py`](https://github.com/langflow-ai/openrag/blob/main/src/models/processors.py#L58) in the OpenRAG repository.
 
diff --git a/docs/docs/get-started/docker.mdx b/docs/docs/get-started/docker.mdx
index 0c65e9c5..58071201 100644
--- a/docs/docs/get-started/docker.mdx
+++ b/docs/docs/get-started/docker.mdx
@@ -88,7 +88,7 @@ The following variables are required or recommended:
 
 ## Start services
 
-1. Start `docling serve` on port 5001 on the host machine:
+1. To use the default Docling Serve implementation, start `docling serve` on port 5001 on the host machine using the included script:
 
    ```bash
    uv run python scripts/docling_ctl.py start --port 5001
@@ -97,10 +97,16 @@ The following variables are required or recommended:
    Docling cannot run inside a Docker container due to system-level dependencies, so you must manage it as a separate service on the host machine.
    For more information, see [Stop, start, and inspect native services](/manage-services#start-native-services).
 
-   This port is required to deploy OpenRAG successfully; don't use a different port.
+   Port 5001 is required to deploy OpenRAG successfully; don't use a different port.
    Additionally, this enables the [MLX framework](https://opensource.apple.com/projects/mlx/) for accelerated performance on Apple Silicon Mac machines.
 
-2. Confirm `docling serve` is running:
+   :::tip
+   If you don't want to use the default Docling Serve implementation, see [Select a Docling implementation](/knowledge#select-a-docling-implementation).
+   :::
+
+2. Confirm `docling serve` is running.
+
+   The following command checks the status of the default Docling Serve implementation:
 
    ```bash
    uv run python scripts/docling_ctl.py status
diff --git a/docs/docs/reference/configuration.mdx b/docs/docs/reference/configuration.mdx
index 8ba77b8f..537def62 100644
--- a/docs/docs/reference/configuration.mdx
+++ b/docs/docs/reference/configuration.mdx
@@ -62,12 +62,15 @@ Some of these variables are immutable and can only be changed by redeploying Ope
 
 Control how OpenRAG [processes and ingests documents](/ingestion) into your knowledge base.
 
+Most of these settings can be configured on the OpenRAG **Settings** page or in the `.env` file.
+
 | Variable | Default | Description |
 |----------|---------|-------------|
 | `CHUNK_OVERLAP` | `200` | Overlap between chunks. |
 | `CHUNK_SIZE` | `1000` | Text chunk size for document processing. |
 | `DISABLE_INGEST_WITH_LANGFLOW` | `false` | Disable Langflow ingestion pipeline. |
 | `DOCLING_OCR_ENGINE` | Set by OS | OCR engine for document processing. For macOS, `ocrmac`. For any other OS, `easyocr`. |
+| `DOCLING_SERVE_URL` | `http://**HOST_IP**:5001` | URL for the [Docling Serve instance](/knowledge#select-a-docling-implementation). By default, OpenRAG starts a local `docling serve` process and auto-detects the host. To use your own local or remote Docling Serve instance, set this variable to the full path to the target instance. The service must run on port 5001. |
 | `OCR_ENABLED` | `false` | Enable OCR for image processing. |
 | `OPENRAG_DOCUMENTS_PATH` | `~/.openrag/documents` | The [local documents path](/knowledge#set-the-local-documents-path) for ingestion. |
 | `PICTURE_DESCRIPTIONS_ENABLED` | `false` | Enable picture descriptions. |
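
Review note on the new `DOCLING_SERVE_URL` row: the description asks for a complete URL and says the service must run on port 5001. The following is a minimal sketch of how a value could be sanity-checked before it goes into `.env`. The helper name `check_docling_serve_url` and the constant `REQUIRED_PORT` are hypothetical and are not part of OpenRAG; the check itself only uses Python's standard `urllib.parse`.

```python
from urllib.parse import urlsplit

# Port that the docs in this change say Docling Serve must listen on.
REQUIRED_PORT = 5001


def check_docling_serve_url(url: str) -> str:
    """Return `url` unchanged if it looks like a usable DOCLING_SERVE_URL.

    Hypothetical helper for illustration only; OpenRAG does not ship it.
    Raises ValueError when the scheme, host, or port is missing or wrong.
    """
    parts = urlsplit(url)
    # A bare "host:port" string has no http(s) scheme, so it is rejected here.
    if parts.scheme not in ("http", "https"):
        raise ValueError(f"expected an http(s) URL, got {url!r}")
    if not parts.hostname:
        raise ValueError(f"missing host in {url!r}")
    # parts.port is None when the URL omits the port entirely.
    if parts.port != REQUIRED_PORT:
        raise ValueError(f"expected port {REQUIRED_PORT}, got {parts.port!r}")
    return url
```

With this sketch, a value such as `http://192.168.1.50:5001` (a placeholder address, not one taken from the docs) passes, while `http://192.168.1.50:8080` or a bare `localhost:5001` raises `ValueError`.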