Ollama is not included with OpenRAG. To install Ollama, see the Ollama documentation.

Enter your Ollama server URL, such as http://localhost:11434. OpenRAG automatically transforms localhost to access services outside of the container, and sends a test connection to your Ollama server to confirm connectivity.

Using Ollama as your OpenRAG language model provider offers greater flexibility and configuration, but can also be overwhelming to start. These recommendations are a reasonable starting point for users with at least one GPU and experience running LLMs locally.
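The localhost rewrite described above can be sketched as a small helper. This is an illustration, not OpenRAG's actual implementation: the function name is hypothetical, and host.docker.internal is assumed here because it is Docker's standard alias for reaching the host machine from inside a container.

```python
from urllib.parse import urlparse, urlunparse

def rewrite_for_container(url: str) -> str:
    """Hypothetical sketch of rewriting a localhost URL so it resolves
    from inside a container. 'host.docker.internal' is Docker's usual
    hostname for the host machine."""
    parts = urlparse(url)
    if parts.hostname in ("localhost", "127.0.0.1"):
        port = f":{parts.port}" if parts.port else ""
        parts = parts._replace(netloc=f"host.docker.internal{port}")
    return urlunparse(parts)

# A URL entered as localhost is rewritten; any other host passes through.
print(rewrite_for_container("http://localhost:11434"))
```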
For best performance, OpenRAG recommends OpenAI's gpt-oss:20b language model. However, this model uses 16 GB of RAM, so consider using Ollama Cloud or running Ollama on a remote machine.
For generating embeddings, OpenRAG recommends the nomic-embed-text embedding model, which provides high-quality embeddings optimized for retrieval tasks.
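An embedding model like nomic-embed-text maps each text chunk to a dense vector, and retrieval then ranks chunks by similarity to the query vector, commonly cosine similarity. A minimal sketch of that comparison, using tiny made-up vectors in place of real embeddings (actual nomic-embed-text vectors have hundreds of dimensions):

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two embedding vectors, as used to
    rank retrieved chunks against a query."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

# Illustrative 4-dimensional vectors, not real embedding output.
query = [0.1, 0.3, 0.5, 0.1]
doc_close = [0.1, 0.3, 0.5, 0.1]  # same direction as the query
doc_far = [0.5, 0.1, 0.0, 0.9]
```

A vector pointing in the same direction as the query scores 1.0; less related vectors score lower, which is what makes the ranking work.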
To run models in Ollama Cloud, follow these steps:
1. Run ollama signin to connect your local environment with Ollama Cloud.
2. In OpenRAG, select the gpt-oss:20b-cloud model, or run ollama run gpt-oss:20b-cloud in a terminal.
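After signing in, a cloud model is addressed through the same local Ollama HTTP API as any other model; only the model name changes. A sketch of the request body a client would send to the server's /api/generate endpoint (the prompt text is made up):

```python
import json

# Request body for Ollama's /api/generate endpoint. Naming the
# "-cloud" model makes Ollama offload the call to Ollama Cloud,
# while the client still talks to http://localhost:11434.
payload = {
    "model": "gpt-oss:20b-cloud",
    "prompt": "Summarize the indexed documents.",  # example prompt
    "stream": False,
}
body = json.dumps(payload)
```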
Ollama Cloud models run at the same URL as your local Ollama server, http://localhost:11434, and are automatically offloaded to Ollama's cloud service.

To run models on a remote Ollama server, follow these steps:
1. Enter your remote Ollama server URL, such as http://your-remote-server:11434.
2. OpenRAG connects to the remote Ollama server and populates the model lists with the server's available models.
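Populating those lists plausibly relies on Ollama's /api/tags endpoint, which returns the models installed on a server. A sketch of that parsing step against a canned response body (the model entries and sizes below are made up, not real server output):

```python
import json

# Example /api/tags response from a remote Ollama server;
# the entries are illustrative only.
response_body = json.dumps({
    "models": [
        {"name": "gpt-oss:20b", "size": 13780173839},
        {"name": "nomic-embed-text:latest", "size": 274302450},
    ]
})

def available_models(body: str) -> list:
    """Extract model names from an /api/tags response to
    populate a model selection list."""
    return [m["name"] for m in json.loads(body).get("models", [])]
```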