---
title: Configure knowledge
slug: /knowledge
---
import Icon from "@site/src/components/icon/icon";
import Tabs from '@theme/Tabs';
import TabItem from '@theme/TabItem';
OpenRAG includes a built-in [OpenSearch](https://docs.opensearch.org/latest/) instance that serves as the underlying datastore for your _knowledge_ (documents).
This specialized database is used to store and retrieve your documents and the associated vector data (embeddings).
The documents in your OpenSearch knowledge base provide specialized context in addition to the general knowledge available to the language model that you select when you [install OpenRAG](/install) or [edit a flow](/agents).
You can [upload documents](/ingestion) from a variety of sources to populate your knowledge base with unique content, such as your own company documents, research papers, or websites.
Documents are processed by OpenRAG's knowledge ingestion flows, which use Docling to parse them before they are split and indexed.
Then, the [OpenRAG **Chat**](/chat) can run [similarity searches](https://www.ibm.com/think/topics/vector-search) against your OpenSearch database to retrieve relevant information and generate context-aware responses.
You can configure how documents are ingested and how the **Chat** interacts with your knowledge base.
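To make the retrieval step concrete, the following minimal sketch runs a k-NN similarity search against the `documents` index with the `opensearch-py` client. The connection details, credentials, and the `embedding` field name are assumptions for illustration, not OpenRAG's exact configuration.

```python
from opensearchpy import OpenSearch

# Connect to the OpenSearch instance bundled with OpenRAG.
# Host, port, credentials, and TLS settings are assumptions for this sketch.
client = OpenSearch(
    hosts=[{"host": "localhost", "port": 9200}],
    http_auth=("admin", "admin"),
    use_ssl=True,
    verify_certs=False,
)

# In practice, this vector comes from embedding the user's question with the
# same embedding model used at ingestion time (1536 dimensions by default).
query_vector = [0.0] * 1536

# k-NN similarity search against the `documents` index.
# The `embedding` field name is a hypothetical placeholder.
response = client.search(
    index="documents",
    body={
        "size": 5,
        "query": {"knn": {"embedding": {"vector": query_vector, "k": 5}}},
    },
)

for hit in response["hits"]["hits"]:
    print(hit["_score"], hit["_source"].get("text", "")[:80])
```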
## Browse knowledge {#browse-knowledge}
The **Knowledge** page lists the documents that OpenRAG has ingested into your OpenSearch database, specifically in the `documents` index.
To explore the raw contents of your knowledge base, click <Icon name="Library" aria-hidden="true"/> **Knowledge**, and then click a document to view the chunks produced by splitting that document during ingestion.
OpenRAG includes some sample documents that you can use to see how the agent references documents in the [**Chat**](/chat).
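You can also inspect raw chunks outside the UI by querying the `documents` index directly. The following sketch assumes the same connection details as the earlier example, and the exact fields in each chunk depend on OpenRAG's index mapping.

```python
from opensearchpy import OpenSearch

# Connection details are assumptions for this sketch; adjust for your instance.
client = OpenSearch(
    hosts=[{"host": "localhost", "port": 9200}],
    http_auth=("admin", "admin"),
    use_ssl=True,
    verify_certs=False,
)

# Fetch a few raw chunks from the `documents` index to see how ingested
# documents are stored after splitting.
response = client.search(
    index="documents",
    body={"size": 3, "query": {"match_all": {}}},
)

print("Total chunks:", response["hits"]["total"]["value"])
for hit in response["hits"]["hits"]:
    # The exact fields depend on OpenRAG's index mapping.
    print(hit["_id"], sorted(hit["_source"].keys()))
```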
## OpenSearch authentication and document access {#auth}
When you [install OpenRAG](/install), you can choose between two setup modes: **Basic Setup** and **Advanced Setup**.
The mode you choose determines how OpenRAG authenticates with OpenSearch and controls access to documents:
* **Basic Setup (no-auth mode)**: If you choose **Basic Setup**, then OpenRAG is installed in no-auth mode.
This mode uses a single, anonymous JWT token for OpenSearch authentication, and it doesn't differentiate between users.
All users that access your OpenRAG instance can access all documents uploaded to your OpenSearch `documents` index.
* **Advanced Setup (OAuth mode)**: If you choose **Advanced Setup**, then OpenRAG is installed in OAuth mode.
This mode uses a unique JWT token for each OpenRAG user, and each document is tagged with its owner.
Because documents are filtered by owner, users see only the documents that they uploaded or have access to, as illustrated in the sketch after this list.
You can enable OAuth mode after installation.
For more information, see [Ingest files through OAuth connectors](/ingestion#oauth-ingestion).
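To picture how owner-based filtering works, the following minimal sketch adds a filter clause to an OpenSearch query. The `owner` and `text` field names, and the way the user ID is resolved from the JWT, are assumptions for illustration rather than OpenRAG's actual schema; the query body can be passed to the `client.search` call from the earlier sketch.

```python
# Conceptual sketch of owner-based filtering in OAuth mode.
# The `owner` and `text` field names are hypothetical; OpenRAG's actual
# document metadata schema may differ.
user_id = "user-123"  # in OAuth mode, resolved from the user's JWT

filtered_query = {
    "size": 5,
    "query": {
        "bool": {
            "must": [{"match": {"text": "quarterly revenue"}}],
            "filter": [{"term": {"owner": user_id}}],
        }
    },
}
# client.search(index="documents", body=filtered_query)
```

In no-auth mode, no such filter is applied, so every query can see every document in the `documents` index.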
## Set the embedding model and dimensions {#set-the-embedding-model-and-dimensions}
When you [install OpenRAG](/install), you select an embedding model during **Application Onboarding**.
OpenRAG automatically detects and configures the appropriate vector dimensions for your selected embedding model so that the vectors stored in OpenSearch match the model's output size.
In the OpenRAG repository, you can find the complete list of supported models in [`models_service.py`](https://github.com/langflow-ai/openrag/blob/main/src/services/models_service.py) and the corresponding vector dimensions in [`settings.py`](https://github.com/langflow-ai/openrag/blob/main/src/config/settings.py).
The default embedding dimension is `1536` and the default model is `text-embedding-3-small`.
You can use any supported or unsupported embedding model by specifying the model in your OpenRAG configuration during installation.
If you use an unsupported embedding model that doesn't have defined dimensions in `settings.py`, then OpenRAG falls back to the default dimensions (1536) and logs a warning. OpenRAG's OpenSearch instance and flows continue to work, but [similarity search](https://www.ibm.com/think/topics/vector-search) quality can be affected if the actual model dimensions aren't 1536.
The embedding model setting is immutable.
To change the embedding model, you must [reinstall OpenRAG](/install#reinstall).
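Conceptually, the dimension lookup and fallback behave like the following sketch. This is illustrative only, not the actual code in `settings.py`, and the model-to-dimension table shown is a small hypothetical subset.

```python
import logging

logger = logging.getLogger(__name__)

# Illustrative subset of model-to-dimension mappings; the authoritative list
# lives in OpenRAG's models_service.py and settings.py.
KNOWN_EMBEDDING_DIMENSIONS = {
    "text-embedding-3-small": 1536,
    "text-embedding-3-large": 3072,
}
DEFAULT_EMBEDDING_DIMENSION = 1536


def resolve_embedding_dimension(model_name: str) -> int:
    """Return the vector dimension for a model, falling back to the default."""
    if model_name in KNOWN_EMBEDDING_DIMENSIONS:
        return KNOWN_EMBEDDING_DIMENSIONS[model_name]
    # Unsupported models fall back to the default dimension, which can degrade
    # similarity search quality if the model's real output size isn't 1536.
    logger.warning(
        "Unknown embedding model %s; falling back to %d dimensions",
        model_name,
        DEFAULT_EMBEDDING_DIMENSION,
    )
    return DEFAULT_EMBEDDING_DIMENSION
```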
## Set ingestion parameters
For information about modifying ingestion parameters and flows, see [Ingest knowledge](/ingestion).
## Delete knowledge
To clear your entire knowledge base, you can delete the contents of the `./opensearch-data` folder in your OpenRAG installation directory, or you can [reset the OpenRAG containers](/install#tui-container-management).
Be aware that both of these operations are destructive and cannot be undone.
In particular, resetting containers reverts your OpenRAG instance to the initial state as though it were a fresh installation.
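If you script the cleanup of the data folder, a minimal sketch such as the following removes everything under `./opensearch-data`. It assumes you run it from your OpenRAG installation directory, and it is just as irreversible as deleting the folder manually.

```python
import shutil
from pathlib import Path

# Path to the OpenSearch data directory inside your OpenRAG installation
# directory; adjust if your installation lives elsewhere.
opensearch_data = Path("./opensearch-data")

# Destructive and irreversible: this wipes the entire knowledge base.
# Stopping the OpenRAG containers first is a reasonable precaution
# (an assumption of this sketch, not a documented OpenRAG requirement).
if opensearch_data.exists():
    for child in opensearch_data.iterdir():
        if child.is_dir():
            shutil.rmtree(child)
        else:
            child.unlink()
```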
## See also
* [Ingest knowledge](/ingestion)
* [Filter knowledge](/knowledge-filters)
* [Chat with knowledge](/chat)