---
title: Knowledge stored with OpenSearch
slug: /knowledge
---

import Icon from "@site/src/components/icon/icon";
import Tabs from '@theme/Tabs';
import TabItem from '@theme/TabItem';
import PartialModifyFlows from '@site/docs/_partial-modify-flows.mdx';

OpenRAG uses [OpenSearch](https://docs.opensearch.org/latest/) for its vector-backed knowledge store.
OpenSearch provides powerful hybrid search capabilities with enterprise-grade security and multi-tenancy support.

## Explore knowledge

The Knowledge page lists the documents OpenRAG has ingested into the `documents` index of the OpenSearch vector database.

To explore your current knowledge, click <Icon name="Library" aria-hidden="true"/> **Knowledge**.
Click a document to display the chunks that were created when the document was split and stored in the vector database.

Documents are processed with the default **Knowledge Ingest** flow. To split your documents differently, edit the **Knowledge Ingest** flow.

<PartialModifyFlows />

## Ingest knowledge

OpenRAG supports knowledge ingestion through direct file uploads and OAuth connectors.

### Direct file ingestion

The **Knowledge Ingest** flow uses Langflow's [**File** component](https://docs.langflow.org/components-data#file) to split and embed files loaded from your local machine into the OpenSearch database.

By default, the `./documents` folder in your OpenRAG project directory is mounted to the `/app/documents/` directory inside the Docker container. Files added in either location are visible in both. To change this location, modify the **Documents Paths** variable in either the TUI's [Advanced Setup](/install#advanced-setup) menu or the `.env` file used by Docker Compose. To add multiple paths, use a comma-separated list with no spaces, for example, `./documents,/Users/username/Documents`.

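The comma-separated format described above can be checked before you save it. The following is a minimal sketch, not part of OpenRAG: the function name `parse_documents_paths` is hypothetical, and it only assumes the rule stated here, that entries are separated by commas with no spaces around them.

```python
def parse_documents_paths(value: str) -> list[str]:
    """Split a Documents Paths value into individual paths.

    Hypothetical helper for illustration; OpenRAG may parse this value differently.
    """
    if not value:
        raise ValueError("Documents Paths value is empty")

    paths = value.split(",")
    for path in paths:
        if path == "":
            # Catches a trailing comma or two commas in a row.
            raise ValueError("Documents Paths contains an empty entry")
        if path != path.strip():
            # Catches "a, b", where a space follows or precedes a comma.
            raise ValueError(f"Remove the spaces around {path!r}")
    return paths


if __name__ == "__main__":
    print(parse_documents_paths("./documents,/Users/username/Documents"))
```

A value such as `./documents, /Users/username/Documents` (with a space after the comma) raises a `ValueError` in this sketch instead of silently producing a path that starts with a space.
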
To load and process a single file from the mapped location, click <Icon name="Plus" aria-hidden="true"/> **Add Knowledge**, and then click **Add File**.
The file is loaded into your OpenSearch database and appears on the Knowledge page.

To load and process a directory from the mapped location, click <Icon name="Plus" aria-hidden="true"/> **Add Knowledge**, and then click **Process Folder**.
The files are loaded into your OpenSearch database and appear on the Knowledge page.

### Ingest files through OAuth connectors

OpenRAG supports the following enterprise-grade OAuth connectors for document synchronization:

- **Google Drive**
- **OneDrive**
- **AWS**

OAuth integration allows your OpenRAG server to authenticate users and applications through any OAuth 2.0 compliant service. When users or applications connect to your server, they are redirected to your chosen OAuth provider to authenticate. After successful authentication, they are granted access to the connector.

Before configuring OAuth in OpenRAG, you must set up an OAuth application with an external OAuth 2.0 service provider. Register your OpenRAG server as an OAuth client, and then obtain the `client` and `secret` keys to complete the configuration in OpenRAG.

To add an OAuth connector to OpenRAG, do the following.
This example uses Google OAuth.
To use a different provider, add that provider's client and secret keys instead.

<Tabs groupId="Installation type">
<TabItem value="TUI" label="TUI" default>

1. If OpenRAG is running, stop it with **Status** > **Stop Services**.
2. Click **Advanced Setup**.
3. Add the OAuth provider's client and secret keys in the [Advanced Setup](/install#advanced-setup) menu.
4. Click **Save Configuration**.
   The TUI generates a new `.env` file with your OAuth values.
5. Click **Start Container Services**.

</TabItem>
<TabItem value=".env" label=".env">

1. Stop the Docker deployment.
2. Add the OAuth provider's client and secret keys to the `.env` file for Docker Compose.
   ```bash
   GOOGLE_OAUTH_CLIENT_ID='YOUR_OAUTH_CLIENT_ID'
   GOOGLE_OAUTH_CLIENT_SECRET='YOUR_OAUTH_CLIENT_SECRET'
   ```
3. Save your `.env` file.
4. Start the Docker deployment.

</TabItem>
</Tabs>

The OpenRAG frontend at `http://localhost:3000` now redirects to the login page for your OAuth provider.
After successful authentication, OpenRAG opens with the required scopes for your connected storage.

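If the redirect does not appear, one thing worth ruling out is a missing or empty OAuth value in `.env`. The following is a minimal sketch of such a check, not an OpenRAG feature: the function name `missing_oauth_keys` is hypothetical, it only looks for the two Google variables shown in the example above, and it assumes simple `KEY='value'` lines.

```python
# The two variable names come from the Google OAuth example in this page.
REQUIRED_KEYS = ("GOOGLE_OAUTH_CLIENT_ID", "GOOGLE_OAUTH_CLIENT_SECRET")


def missing_oauth_keys(env_text: str) -> list[str]:
    """Return the required OAuth keys that are absent or empty in .env text.

    Hypothetical helper for illustration; it handles only plain KEY=value lines.
    """
    values = {}
    for raw_line in env_text.splitlines():
        line = raw_line.strip()
        # Skip blank lines, comments, and anything that is not an assignment.
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Drop surrounding single or double quotes from the value.
        values[key.strip()] = value.strip().strip("'\"")
    return [key for key in REQUIRED_KEYS if not values.get(key)]
```

To use it, pass the contents of your `.env` file as a string. An empty list means both values are set, although the sketch cannot tell whether they are valid credentials.
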
To add knowledge from an OAuth-connected storage provider, do the following:

1. Click <Icon name="Plus" aria-hidden="true"/> **Add Knowledge**, and then select the storage provider, for example, **Google Drive**.
   The **Add Cloud Knowledge** page opens.
2. To add files or folders from the connected storage, click <Icon name="Plus" aria-hidden="true"/> **Add Files**.
   Select the files or folders you want, and then click **Select**.
   You can select multiple files or folders.
3. When your files are selected, click **Ingest Files**.
   The ingestion process may take some time, depending on the size of your documents.
4. When ingestion is complete, your documents are available on the Knowledge page.

## Knowledge filter system

OpenRAG includes a knowledge filter system for organizing and managing document collections.

## OpenRAG default configuration

OpenRAG creates a specialized OpenSearch index called `documents` with the values defined in `src/config/settings.py`.
- **Vector Dimensions**: 1536-dimensional embeddings using OpenAI's `text-embedding-3-small` model.
- **KNN Vector Type**: Uses the `knn_vector` field type with the `disk_ann` method and the `jvector` engine.
- **Distance Metric**: L2 (Euclidean) distance for vector similarity.
- **Performance Optimization**: Configured with the `ef_construction: 100` and `m: 16` parameters.

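To make these values concrete, the following sketch assembles them into an OpenSearch index body. It is an illustration under stated assumptions, not a copy of `src/config/settings.py`: the function name and the field names `text` and `chunk_embedding` are hypothetical, and only the dimension, method, engine, distance metric, and parameters come from the list above.

```python
import json


def build_documents_index_body(vector_field: str = "chunk_embedding",
                               dimension: int = 1536) -> dict:
    """Build an index body that mirrors the defaults listed in this section.

    The field names are assumptions for illustration only.
    """
    return {
        # k-NN must be enabled on the index for knn_vector fields to be searchable.
        "settings": {"index": {"knn": True}},
        "mappings": {
            "properties": {
                "text": {"type": "text"},  # hypothetical field for the chunk text
                vector_field: {
                    "type": "knn_vector",
                    "dimension": dimension,
                    "method": {
                        "name": "disk_ann",
                        "engine": "jvector",
                        "space_type": "l2",  # L2 (Euclidean) distance
                        "parameters": {"ef_construction": 100, "m": 16},
                    },
                },
            }
        },
    }


if __name__ == "__main__":
    print(json.dumps(build_documents_index_body(), indent=2))
```

The `dimension` value must match the output size of the embedding model, so switching to a model that does not produce 1536-dimensional vectors would require a different value here.
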
OpenRAG supports hybrid search, which combines semantic and keyword search.
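As a rough illustration of that idea, the sketch below builds an OpenSearch request body that pairs a `knn` clause (semantic) with a `match` clause (keyword) inside a `bool` query. This is one generic way to combine the two and is not necessarily the query OpenRAG sends; the function name and the field names `text` and `chunk_embedding` are hypothetical.

```python
from typing import Sequence


def build_hybrid_query(query_text: str,
                       query_vector: Sequence[float],
                       k: int = 10,
                       text_field: str = "text",
                       vector_field: str = "chunk_embedding") -> dict:
    """Combine a vector clause and a keyword clause in one search body.

    Hypothetical sketch; field names and scoring behavior are assumptions.
    """
    if not query_vector:
        raise ValueError("query_vector must not be empty")
    return {
        "size": k,
        "query": {
            "bool": {
                "should": [
                    # Semantic part: nearest neighbors of the query embedding.
                    {"knn": {vector_field: {"vector": list(query_vector), "k": k}}},
                    # Keyword part: standard full-text match on the chunk text.
                    {"match": {text_field: {"query": query_text}}},
                ],
                # A document qualifies if it matches either clause.
                "minimum_should_match": 1,
            }
        },
    }
```

In this form, a document that matches both clauses receives a score contribution from each, so chunks that are both semantically close and share keywords with the query tend to rank higher.
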