---
title: OpenSearch Knowledge
slug: /knowledge
---

import Icon from "@site/src/components/icon/icon";
import Tabs from '@theme/Tabs';
import TabItem from '@theme/TabItem';
import PartialModifyFlows from '@site/docs/_partial-modify-flows.mdx';

OpenRAG uses [OpenSearch](https://docs.opensearch.org/latest/) for its vector-backed knowledge store.
OpenSearch is a specialized database for storing and retrieving embeddings, which helps your agent find relevant information efficiently.
It provides hybrid search capabilities with enterprise-grade security and multi-tenancy support.

## Authentication and document access {#auth}

OpenRAG supports two authentication modes, determined by how you [install OpenRAG](/install). The mode you choose affects document access.

**No-auth mode (Basic Setup)**: This mode uses a single anonymous JWT for OpenSearch authentication, so documents uploaded to the `documents` index by one user are visible to all other users on the OpenRAG server.

**OAuth mode (Advanced Setup)**: Each OpenRAG user is granted a JWT, and each document is tagged with user ownership. Documents are filtered by user ownership, so users see only the documents that they uploaded or have access to.

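The difference between the two modes can be illustrated with a minimal Python sketch. This is not OpenRAG's implementation: the `Document` class, the `owner` and `allowed_users` fields, and the `visible_documents` function are hypothetical names used only to show the idea of ownership-based filtering.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Document:
    filename: str
    owner: str  # hypothetical ownership tag attached at upload time
    allowed_users: set[str] = field(default_factory=set)  # hypothetical sharing list


def visible_documents(docs: list[Document], user: str, oauth_mode: bool) -> list[Document]:
    """Return the documents that `user` can see in the given mode."""
    if not oauth_mode:
        # No-auth mode: every request shares one anonymous identity,
        # so all documents are visible to everyone.
        return list(docs)
    # OAuth mode: keep only documents the user owns or has access to.
    return [d for d in docs if d.owner == user or user in d.allowed_users]
```

In this sketch, a user in OAuth mode sees their own uploads plus anything explicitly shared with them, while no-auth mode returns the whole collection.
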
## Ingest knowledge

OpenRAG supports knowledge ingestion through direct file uploads and OAuth connectors.
To configure the knowledge ingestion pipeline parameters, see [Docling Ingestion](/ingestion).

### Direct file ingestion

The **Knowledge Ingest** flow uses Langflow's [**File** component](https://docs.langflow.org/components-data#file) to split and embed files loaded from your local machine into the OpenSearch database.

By default, the `./documents` folder in your OpenRAG project directory is mounted to the `/app/documents/` directory inside the Docker container. Files added in either location are visible in both. To change this location, modify the **Documents Paths** variable in either the TUI's [Advanced Setup](/install#setup) menu or the `.env` file used by Docker Compose.

To load and process a single file from the mapped location, click <Icon name="Plus" aria-hidden="true"/> **Add Knowledge**, and then click **Add File**.
The file is loaded into your OpenSearch database and appears on the **Knowledge** page.

To load and process a directory from the mapped location, click <Icon name="Plus" aria-hidden="true"/> **Add Knowledge**, and then click **Process Folder**.
The files are loaded into your OpenSearch database and appear on the **Knowledge** page.

### Ingest files through OAuth connectors {#oauth-ingestion}

OpenRAG supports Google Drive, OneDrive, and AWS S3 as OAuth connectors for seamless document synchronization.

OAuth integration allows individual users to connect their personal cloud storage accounts to OpenRAG. Each user must separately authorize OpenRAG to access their own cloud storage files. When a user connects a cloud service, they are redirected to authenticate with that service provider and grant OpenRAG permission to sync documents from their personal cloud storage.

Before users can connect their cloud storage accounts, you must configure OAuth credentials in OpenRAG. This requires registering OpenRAG as an OAuth application with a cloud provider and obtaining a client ID and secret key for each service you want to support.

To add an OAuth connector to OpenRAG, do the following.
This example uses Google OAuth.
To use a different provider, add that provider's client ID and secret key instead.

<Tabs groupId="Installation type">
<TabItem value="TUI" label="TUI" default>
1. If OpenRAG is running, stop it with **Status** > **Stop Services**.
2. Click **Advanced Setup**.
3. Add the OAuth provider's client ID and secret key in the [Advanced Setup](/install#setup) menu.
4. Click **Save Configuration**.
    The TUI generates a new `.env` file with your OAuth values.
5. Click **Start Container Services**.
</TabItem>
<TabItem value=".env" label=".env">
1. Stop the Docker deployment.
2. Add the OAuth provider's client ID and secret key in the `.env` file for Docker Compose.
    ```bash
    GOOGLE_OAUTH_CLIENT_ID='YOUR_OAUTH_CLIENT_ID'
    GOOGLE_OAUTH_CLIENT_SECRET='YOUR_OAUTH_CLIENT_SECRET'
    ```
3. Save your `.env` file.
4. Start the Docker deployment.
</TabItem>
</Tabs>

The OpenRAG frontend at `http://localhost:3000` now redirects to the login page for your OAuth provider.
A successful authentication opens OpenRAG with the required scopes for your connected storage.

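Before restarting the deployment, it can help to confirm that both Google OAuth values are present in your `.env` file. The following Python sketch is one way to do that. It is not part of OpenRAG: `parse_env` and `missing_oauth_keys` are hypothetical helpers, and the parser handles only simple `KEY=VALUE` lines.

```python
REQUIRED_KEYS = ("GOOGLE_OAUTH_CLIENT_ID", "GOOGLE_OAUTH_CLIENT_SECRET")


def parse_env(text: str) -> dict[str, str]:
    """Parse simple KEY=VALUE lines, skipping blank lines and comments."""
    values = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Remove optional surrounding single or double quotes.
        values[key.strip()] = value.strip().strip("'\"")
    return values


def missing_oauth_keys(env_text: str, required=REQUIRED_KEYS) -> list[str]:
    """Return the required keys that are absent or empty in the .env text."""
    values = parse_env(env_text)
    return [key for key in required if not values.get(key)]
```

For example, `missing_oauth_keys(open(".env").read())` returns an empty list when both values are set.
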
To add knowledge from an OAuth-connected storage provider, do the following:

1. Click <Icon name="Plus" aria-hidden="true"/> **Add Knowledge**, and then select the storage provider, for example, **Google Drive**.
    The **Add Cloud Knowledge** page opens.
2. To add files or folders from the connected storage, click <Icon name="Plus" aria-hidden="true"/> **Add Files**.
    Select the files or folders you want, and then click **Select**.
    You can select multiple files or folders.
3. When your files are selected, click **Ingest Files**.
    The ingestion process may take some time, depending on the size of your documents.
4. When ingestion is complete, your documents are available on the **Knowledge** page.

## Explore knowledge

The **Knowledge** page lists the documents OpenRAG has ingested into the OpenSearch vector database's `documents` index.

To explore your current knowledge, click <Icon name="Library" aria-hidden="true"/> **Knowledge**.
Click a document to display the chunks that were created when the document was split and stored in the vector database.

Documents are processed with the default **Knowledge Ingest** flow, so if you want to split your documents differently, edit the **Knowledge Ingest** flow.

<PartialModifyFlows />

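To make the idea of chunking concrete, the following Python sketch splits text into fixed-size, overlapping chunks. It is only an illustration: the splitting in the **Knowledge Ingest** flow is configured in Langflow and may work differently, and the `split_into_chunks` function and its `chunk_size` and `overlap` parameters are assumptions made for this example.

```python
def split_into_chunks(text: str, chunk_size: int = 1000, overlap: int = 200) -> list[str]:
    """Split text into character-based chunks that overlap by `overlap` characters."""
    if chunk_size <= 0 or not 0 <= overlap < chunk_size:
        raise ValueError("chunk_size must be positive and overlap must be in [0, chunk_size)")
    step = chunk_size - overlap  # distance between the starts of consecutive chunks
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the last chunk already reaches the end of the text
    return chunks
```

Overlap keeps some shared context between neighboring chunks, so a sentence that straddles a chunk boundary is less likely to be lost to retrieval.
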
## Create knowledge filters

OpenRAG includes a knowledge filter system for organizing and managing document collections.
Knowledge filters are saved search configurations that allow you to create custom views of your document collection. They store search queries, filter criteria, and display settings that can be reused across different parts of OpenRAG.

Knowledge filters help agents work more efficiently with large document collections by focusing their context on relevant sets of documents.

To create a knowledge filter, do the following:

1. Click <Icon name="Funnel" aria-hidden="true"/> **All Knowledge**, and then click <Icon name="Plus" aria-hidden="true"/> **Create New Filter**.
    The **Create New Knowledge Filter** pane appears.
2. Enter a **Name** and **Description**, and then click <Icon name="Save" aria-hidden="true"/> **Create Filter**.
    A new filter is created with default settings that match everything.
3. To modify the default filter, click <Icon name="Funnel" aria-hidden="true"/> **All Knowledge**, and then click your new filter to edit it in the **Knowledge Filter** pane.

    The following filter options are configurable.

    * **Search Query**: Enter text for semantic search, such as "financial reports from Q4".
    * **Data Sources**: Select specific data sources or folders to include.
    * **Document Types**: Filter by file type.
    * **Owners**: Filter by who uploaded the documents.
    * **Sources**: Filter by connector type, such as local upload or Google Drive.
    * **Result Limit**: Set the maximum number of results. The default is `10`.
    * **Score Threshold**: Set the minimum relevance score. The default is `0`.

4. When you're done editing the filter, click <Icon name="Save" aria-hidden="true"/> **Save Configuration**.

5. To apply the filter to OpenRAG globally, click <Icon name="Funnel" aria-hidden="true"/> **All Knowledge**, and then select the filter to apply.

To apply the filter to a single chat session, in the <Icon name="MessageSquare" aria-hidden="true"/> **Chat** window, click **@**, and then select the filter to apply.

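As a rough mental model, a knowledge filter can be thought of as a small record of criteria applied to search results. The Python sketch below is illustrative only and is not how OpenRAG stores or evaluates filters: the `KnowledgeFilter` class, its field names, and the shape of the result dictionaries are all assumptions. Only the defaults (`10` results, score threshold `0`) come from the options described above.

```python
from dataclasses import dataclass, field


@dataclass
class KnowledgeFilter:
    name: str
    query: str = ""
    document_types: list = field(default_factory=list)  # empty list matches every type
    owners: list = field(default_factory=list)  # empty list matches every owner
    limit: int = 10  # Result Limit default
    score_threshold: float = 0.0  # Score Threshold default


def apply_filter(results: list, flt: KnowledgeFilter) -> list:
    """Keep results that satisfy the filter, best score first, capped at `limit`."""
    kept = [
        r for r in results
        if r["score"] >= flt.score_threshold
        and (not flt.document_types or r["type"] in flt.document_types)
        and (not flt.owners or r["owner"] in flt.owners)
    ]
    kept.sort(key=lambda r: r["score"], reverse=True)
    return kept[:flt.limit]
```

A newly created `KnowledgeFilter(name="All")` leaves every list empty and the threshold at `0`, which mirrors a new filter that matches everything.
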
## OpenRAG default configuration

OpenRAG automatically detects and configures the correct vector dimensions for embedding models, which keeps the OpenSearch index compatible with the vectors that the model produces.

The complete list of supported models is available at [`models_service.py` in the OpenRAG repository](https://github.com/langflow-ai/openrag/blob/main/src/services/models_service.py).

You can use custom embedding models by specifying them in your configuration.

If you use an unknown embedding model, OpenRAG falls back to `1536` dimensions and logs a warning. The system continues to work, but search quality can be affected if the model's actual dimensions differ from `1536`.

The default embedding dimension is `1536` and the default model is `text-embedding-3-small`.

For models with known vector dimensions, see [`settings.py` in the OpenRAG repository](https://github.com/langflow-ai/openrag/blob/main/src/config/settings.py).

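The fallback behavior described above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the code in `settings.py`: the `get_embedding_dimensions` function and the `KNOWN_DIMENSIONS` table are hypothetical, and the table lists only a small, illustrative subset of models.

```python
import logging

logger = logging.getLogger("embedding_dimensions")

DEFAULT_EMBEDDING_MODEL = "text-embedding-3-small"
DEFAULT_DIMENSIONS = 1536

# Illustrative subset only; see settings.py for the models OpenRAG knows about.
KNOWN_DIMENSIONS = {
    "text-embedding-3-small": 1536,
    "text-embedding-3-large": 3072,
    "text-embedding-ada-002": 1536,
}


def get_embedding_dimensions(model: str = DEFAULT_EMBEDDING_MODEL) -> int:
    """Return the vector dimensions for a model, falling back to the default."""
    dimensions = KNOWN_DIMENSIONS.get(model)
    if dimensions is None:
        # Unknown model: warn and fall back, as described above.
        logger.warning(
            "Unknown embedding model %r; falling back to %d dimensions",
            model,
            DEFAULT_DIMENSIONS,
        )
        return DEFAULT_DIMENSIONS
    return dimensions
```

If a custom model actually produces vectors of a different size, the fallback value does not match the model's output, which is why it is worth checking the logged warning when you configure a model that is not in the known list.
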