import Icon from "@site/src/components/icon/icon";

Using Ollama as your OpenRAG language model provider offers greater flexibility and configuration, but it can also be overwhelming to get started.

These recommendations are a reasonable starting point for users with at least one GPU and experience running LLMs locally.

For best performance, OpenRAG recommends OpenAI's `gpt-oss:20b` language model. However, this model requires 16 GB of RAM, so consider using Ollama Cloud or running Ollama on a remote machine.
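
If you have enough memory to run `gpt-oss:20b` locally, you can pull it before onboarding so that it appears in OpenRAG's model lists. A minimal sketch, assuming a default local Ollama install at `http://localhost:11434`:

```bash
# Download the recommended language model to the local Ollama server.
ollama pull gpt-oss:20b

# Optional: confirm the model loads and responds before connecting OpenRAG.
ollama run gpt-oss:20b "Reply with OK if you can read this."
```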

For generating embeddings, OpenRAG recommends the [`nomic-embed-text`](https://ollama.com/library/nomic-embed-text) embedding model, which provides high-quality embeddings optimized for retrieval tasks.
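
You can pull the embedding model the same way, and optionally request a test embedding through Ollama's `/api/embeddings` endpoint to confirm it is being served. A minimal sketch, assuming the same local Ollama server:

```bash
# Download the recommended embedding model.
ollama pull nomic-embed-text

# Optional: request a test embedding to confirm the model is served.
curl http://localhost:11434/api/embeddings \
  -d '{"model": "nomic-embed-text", "prompt": "OpenRAG connectivity test"}'
```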

To run models in [**Ollama Cloud**](https://docs.ollama.com/cloud), follow these steps:

1. Sign in to Ollama Cloud.

    In a terminal, enter `ollama signin` to connect your local environment with Ollama Cloud, as shown in the terminal sketch after these steps.

2. To run the model, select the `gpt-oss:20b-cloud` model in Ollama, or run `ollama run gpt-oss:20b-cloud` in a terminal.

    Ollama Cloud models are served at the same URL as your local Ollama server, `http://localhost:11434`, and are automatically offloaded to Ollama's cloud service.

3. Connect OpenRAG to the same local Ollama server as you would for local models during onboarding, using the default address of `http://localhost:11434`.

4. In the **Language model** field, select the `gpt-oss:20b-cloud` model.
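
The following terminal sketch collects the commands from steps 1 and 2, plus an optional check that the local server OpenRAG connects to is reachable. The check uses Ollama's `/api/tags` endpoint, which lists the models the server reports:

```bash
# Step 1: connect your local Ollama environment to Ollama Cloud.
ollama signin

# Step 2: run the cloud-hosted model through your local server.
ollama run gpt-oss:20b-cloud

# Optional: confirm the local server at the default address is reachable
# and see which models it reports.
curl http://localhost:11434/api/tags
```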

<br></br>

To run models on a **remote Ollama server**, follow these steps:

1. Ensure your remote Ollama server is accessible from your OpenRAG instance, as shown in the sketch after these steps.

2. In the **Ollama Base URL** field, enter your remote Ollama server's base URL, such as `http://your-remote-server:11434`.

    OpenRAG connects to the remote Ollama server and populates the model lists with the server's available models.

3. Select your **Embedding model** and **Language model** from the available options.
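
By default, Ollama listens only on localhost, so the remote server usually needs to be configured to accept connections from other machines. A minimal sketch, assuming a Linux host and the placeholder hostname `your-remote-server` used above:

```bash
# On the remote machine: make Ollama listen on all network interfaces.
OLLAMA_HOST=0.0.0.0 ollama serve

# From the machine running OpenRAG: confirm the server is reachable
# and lists the models you expect, such as nomic-embed-text.
curl http://your-remote-server:11434/api/tags
```

Because this exposes the Ollama API to the network, restrict access with a firewall, VPN, or similar controls.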