From 5c45395df5941684088111860e2a0c6610e300a7 Mon Sep 17 00:00:00 2001
From: April M <36110273+aimurphy@users.noreply.github.com>
Date: Fri, 16 Jan 2026 14:52:38 -0800
Subject: [PATCH] ingestion status
---
docs/docs/_partial-anonymous-user-owner.mdx | 1 +
docs/docs/_partial-gpu-mode-tip.mdx | 2 +-
docs/docs/core-components/ingestion.mdx | 51 +++++++++++--------
.../core-components/knowledge-filters.mdx | 24 ++++++---
docs/docs/core-components/knowledge.mdx | 48 +++++++++++++----
docs/docs/get-started/docker.mdx | 2 +-
6 files changed, 90 insertions(+), 38 deletions(-)
create mode 100644 docs/docs/_partial-anonymous-user-owner.mdx
diff --git a/docs/docs/_partial-anonymous-user-owner.mdx b/docs/docs/_partial-anonymous-user-owner.mdx
new file mode 100644
index 00000000..35544305
--- /dev/null
+++ b/docs/docs/_partial-anonymous-user-owner.mdx
@@ -0,0 +1 @@
+In no-auth mode, all documents are attributed to **Anonymous User** because there is no distinct document ownership and no unique JWTs. For more control over document ownership and visibility, use OAuth mode. For more information, see [OpenSearch authentication and document access](/knowledge#auth).
\ No newline at end of file
diff --git a/docs/docs/_partial-gpu-mode-tip.mdx b/docs/docs/_partial-gpu-mode-tip.mdx
index d9d229fb..f6e672db 100644
--- a/docs/docs/_partial-gpu-mode-tip.mdx
+++ b/docs/docs/_partial-gpu-mode-tip.mdx
@@ -2,4 +2,4 @@ GPU acceleration isn't required for most use cases.
OpenRAG's CPU-only deployment doesn't prevent you from using GPU acceleration in external services, such as Ollama servers.
GPU acceleration is required only for specific use cases, typically involving customization of the ingestion flows or ingestion logic.
-For example, writing alternate ingest logic in OpenRAG that uses GPUs directly in the container, or customizing the ingestion flows to use Langflow's Docling component with GPU acceleration instead of OpenRAG's `docling serve` service.
\ No newline at end of file
+For example, writing alternate ingest logic in OpenRAG that uses GPUs directly in the container, or customizing the ingestion flows to use Langflow's Docling component with GPU acceleration instead of OpenRAG's Docling Serve service.
\ No newline at end of file
diff --git a/docs/docs/core-components/ingestion.mdx b/docs/docs/core-components/ingestion.mdx
index 5f456a7e..386ae382 100644
--- a/docs/docs/core-components/ingestion.mdx
+++ b/docs/docs/core-components/ingestion.mdx
@@ -171,26 +171,37 @@ The agent can call this component to fetch web content from a given URL, and the
Like all OpenRAG flows, you can [inspect the flow in Langflow](/agents#inspect-and-modify-flows), and you can customize it.
For more information about MCP in Langflow, see the Langflow documentation on [MCP clients](https://docs.langflow.org/mcp-client) and [MCP servers](https://docs.langflow.org/mcp-tutorial).
-## Monitor ingestion
+## Monitor ingestion {#monitor-ingestion}
-Document ingestion tasks run in the background.
+Depending on the amount of data to ingest, document ingestion can take a few seconds, minutes, or longer.
+For this reason, document ingestion tasks run in the background.
In the OpenRAG user interface, a badge is shown on **Tasks** when OpenRAG tasks are active.
-Click **Tasks** to inspect and cancel tasks:
+Click **Tasks** to inspect and cancel tasks.
+Tasks are separated into two sections:
-* **Active Tasks**: All tasks that are **Pending**, **Running**, or **Processing**.
-For each active task, depending on its state, you can find the task ID, start time, duration, number of files processed, and the total files enqueued for processing.
+* The **Active Tasks** section includes all tasks that are **Pending**, **Running**, or **Processing**:
-* **Pending**: The task is queued and waiting to start.
+ * **Pending**: The task is queued and waiting to start.
+ * **Running**: The task is actively processing files.
+ * **Processing**: The task is performing ingestion operations.
-* **Running**: The task is actively processing files.
+ To stop an active task, click **Cancel**. Canceling a task stops processing immediately and marks the ingestion as failed.
-* **Processing**: The task is performing ingestion operations.
+* The **Recent Tasks** section lists recently finished tasks.
-* **Failed**: Something went wrong during ingestion, or the task was manually canceled.
-For troubleshooting advice, see [Troubleshoot ingestion](#troubleshoot-ingestion).
+ :::warning
+ **Completed** doesn't mean success.
-To stop an active task, click **Cancel**. Canceling a task stops processing immediately and marks the task as **Failed**.
+ A completed task can report successful ingestions, failed ingestions, or both, depending on the outcome for each file.
+ :::
+
+ Check the **Success** and **Failed** counts for each completed task to determine the overall success rate.
+
+ **Failed** means something went wrong during ingestion, or the task was manually canceled.
+ For more information, see [Troubleshoot ingestion](#troubleshoot-ingestion).
+
+For each task, depending on its state, you can find the task ID, start time, duration, number of files processed successfully, number of files that failed, and the number of files enqueued for processing.
### Ingestion performance expectations
@@ -247,9 +258,9 @@ The following issues can occur during document ingestion.
If an ingestion task fails, do the following:
-* Make sure you are uploading supported file types.
-* Split excessively large files into smaller files before uploading.
-* Remove unusual embedded content, such as videos or animations, before uploading. Although Docling can replace some non-text content with placeholders during ingestion, some embedded content might cause errors.
+* Make sure you uploaded only supported file types.
+* Split very large files into smaller files.
+* Remove unusual or complex embedded content, such as videos or animations. Although Docling can replace some non-text content with placeholders during ingestion, some embedded content might cause errors.
* Make sure your Podman/Docker VM has sufficient memory for the ingestion tasks.
The minimum recommendation is 8 GB of RAM.
If you regularly upload large files, more RAM is recommended.
@@ -261,17 +272,17 @@ For more information, see [Memory issue with Podman on macOS](/support/troublesh
If the OpenRAG **Chat** doesn't seem to use your documents correctly, [browse your knowledge base](/knowledge#browse-knowledge) to confirm that the documents are uploaded in full, and the chunks are correct.
If the documents are present and well-formed, check your [knowledge filters](/knowledge-filters).
-If a global filter is applied, make sure the expected documents are included in the global filter.
-If the global filter excludes any documents, the agent cannot access those documents unless you apply a chat-level filter or change the global filter.
+If you applied a filter to the chat, make sure the expected documents aren't excluded by the filter settings.
+You can test this by applying the filter when you [browse the knowledge base](/knowledge#browse-knowledge).
+If the filter excludes any documents, the agent cannot access those documents.
+Be aware that some settings create dynamic filters that don't always produce the same results, such as a **Search Query** combined with a low **Response Limit**.
-If text is missing or incorrectly processed, you need to reupload the documents after modifying the ingestion parameters or the documents themselves.
+If the document chunks have missing, incorrect, or unexpected text, you must [delete the documents](/knowledge#delete-knowledge) from your knowledge base, modify the [ingestion parameters](/knowledge#knowledge-ingestion-settings) or the documents themselves, and then reingest the documents.
For example:
* Break combined documents into separate files for better metadata context.
* Make sure scanned documents are legible enough for extraction, and enable the **OCR** option. Poorly scanned documents might require additional preparation or rescanning before ingestion.
-* Adjust the **Chunk Size** and **Chunk Overlap** settings to better suit your documents. Larger chunks provide more context but can include irrelevant information, while smaller chunks yield more precise semantic search but can lack context.
-
-For more information about modifying ingestion parameters and flows, see [Knowledge ingestion settings](/knowledge#knowledge-ingestion-settings).
+* Adjust the **Chunk size** and **Chunk overlap** settings to better suit your documents. Larger chunks provide more context but can include irrelevant information, while smaller chunks yield more precise semantic search but can lack context.
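To illustrate the chunk size and overlap tradeoff, here is a minimal, hypothetical sketch of fixed-size chunking with overlap (not OpenRAG's actual ingestion code):

```python
def chunk_text(text: str, chunk_size: int = 1000, overlap: int = 200) -> list[str]:
    """Split text into fixed-size chunks, repeating the last `overlap`
    characters of each chunk at the start of the next to preserve context."""
    if overlap >= chunk_size:
        raise ValueError("overlap must be smaller than chunk_size")
    step = chunk_size - overlap
    return [text[i:i + chunk_size] for i in range(0, len(text), step)]

# Larger chunks keep more surrounding context per chunk;
# smaller chunks make semantic search more precise but can lose context.
chunks = chunk_text("abcdefghij", chunk_size=4, overlap=2)
```

With `chunk_size=4` and `overlap=2`, adjacent chunks share two characters, so a sentence split across a boundary still appears intact in at least one chunk.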
## See also
diff --git a/docs/docs/core-components/knowledge-filters.mdx b/docs/docs/core-components/knowledge-filters.mdx
index f2a2c2b6..351dae01 100644
--- a/docs/docs/core-components/knowledge-filters.mdx
+++ b/docs/docs/core-components/knowledge-filters.mdx
@@ -4,6 +4,7 @@ slug: /knowledge-filters
---
import Icon from "@site/src/components/icon/icon";
+import PartialAnonymousUserOwner from '@site/docs/_partial-anonymous-user-owner.mdx';
OpenRAG's knowledge filters help you organize and manage your [knowledge base](/knowledge) by creating pre-defined views of your documents.
@@ -41,16 +42,25 @@ This is purely cosmetic, but it can help you visually distinguish different sets
Use the filter settings to narrow the scope of documents that the filter captures:
* **Search Query**: Enter a natural language text string for semantic search.
- When you apply a filter that has a **Search Query**, only documents matching the search query are included.
- It is recommended that you use the **Score Threshold** setting to avoid returning irrelevant documents.
+
+ When you apply a filter that has a **Search Query**, only documents matching the search query are included.
+ It is recommended that you also use the **Score Threshold** setting to avoid returning irrelevant documents.
+
* **Data Sources**: Select specific files and folders to include in the filter.
- This is useful if you want to create a filter for a specific project or topic and you know the specific documents you want to include.
- Similarly, if you upload a folder of documents, you might want to create a filter that only includes the documents from that folder.
+
+ This is useful if you want to create a filter for a specific project or topic and you know the specific documents you want to include.
+ Similarly, if you upload a folder of documents or enable an OAuth connector, you might want to create a filter that only includes the documents from that source.
+
* **Document Types**: Filter by file type.
+
* **Owners**: Filter by the user that uploaded the documents.
- **Anonymous User** means a document was uploaded in an OpenRAG environment where OAuth isn't configured.
- * **Connectors**: Filter by [upload source](/ingestion), such as the local file system or a Google Drive OAuth connector.
+
+ <PartialAnonymousUserOwner />
+
+ * **Connectors**: Filter by [upload source](/ingestion), such as the local file system or an OAuth connector.
+
* **Response Limit**: Set the maximum number of results to return from the knowledge base. The default is `10`, which means the filter returns only the top 10 most relevant documents.
+
* **Score Threshold**: Set the minimum relevance score for similarity search. The default score is `0`. A threshold is recommended to avoid returning irrelevant documents.
6. Click **Create Filter**.
@@ -65,7 +75,7 @@ On the filter settings pane, edit the filter as desired, and then click **Update
In the OpenRAG **Chat**, click **Filter**, and then select the filter to apply.
Chat filters apply to one chat session only.
-You can also use filters when [browsing the **Knowledge** page](/knowledge#browse-knowledge).
+You can also use filters when [browsing your knowledge base](/knowledge#browse-knowledge).
This is a helpful way to test filters and manage knowledge bases that have many documents.
## Delete a filter
diff --git a/docs/docs/core-components/knowledge.mdx b/docs/docs/core-components/knowledge.mdx
index 6686475c..53900987 100644
--- a/docs/docs/core-components/knowledge.mdx
+++ b/docs/docs/core-components/knowledge.mdx
@@ -5,6 +5,7 @@ slug: /knowledge
import Icon from "@site/src/components/icon/icon";
import PartialOpenSearchAuthMode from '@site/docs/_partial-opensearch-auth-mode.mdx';
+import PartialAnonymousUserOwner from '@site/docs/_partial-anonymous-user-owner.mdx';
OpenRAG includes a built-in [OpenSearch](https://docs.opensearch.org/latest/) instance that serves as the underlying datastore for your _knowledge_ (documents).
This specialized database is used to store and retrieve your documents and the associated vector data (embeddings).
@@ -24,31 +25,60 @@ The **Knowledge** page lists the documents OpenRAG has ingested into your OpenSe
To explore the raw contents of your knowledge base, click **Knowledge** to get a list of all ingested documents.
-Click a document to view the chunks produced from splitting the document during ingestion as well as technical details related to chunking.
+### Inspect knowledge
-For each document, the **Knowledge** page provides metadata, including the size, type, user that uploaded the document, the number of chunks created from the document, and the embedding model and dimensions used to embed the document.
+For each document, the **Knowledge** page provides the following information:
-The search field at the top of the **Knowledge** page allows you to search for specific documents by name, contents, or with a knowledge filter:
+* **Source**: Name of the ingested content, such as the file name.
-* To search with a text string, enter your search string in the search field, and then press Enter.
+* **Size**
-* To apply a [knowledge filter](/knowledge-filters) when browsing your knowledge base, click the filter in the **Knowledge Filters** list.
+* **Type**
+
+* **Owner**: User that uploaded the document.
+
+ <PartialAnonymousUserOwner />
+
+* **Chunks**: Number of chunks created by splitting the document during ingestion.
+
+ Click a document to view the individual chunks and technical details related to chunking.
+ If the chunks seem incorrect or incomplete, see [Troubleshoot ingestion](/ingestion#troubleshoot-ingestion).
+
+* **Avg score**: Average similarity score across all chunks of the document.
+
+ If you [search the knowledge base](#search-knowledge), the **Avg score** column shows the similarity score for your search query or filter.
+
+* **Embedding model** and **Dimensions**: The embedding model and dimensions used to embed the chunks.
+
+* **Status**: Status of document ingestion.
+ If ingestion is complete and successful, then the status is **Active**.
+ For more information, see [Monitor ingestion](/ingestion#monitor-ingestion).
+
+### Search knowledge {#search-knowledge}
+
+You can use the search field on the **Knowledge** page to find documents using semantic search and knowledge filters:
+
+To search all documents, enter a search string in the search field, and then press Enter.
+
+To apply a [knowledge filter](/knowledge-filters), select the filter from the **Knowledge Filters** list.
The filter settings pane opens, and the filter appears in the search field.
To remove the filter, close the filter settings pane or clear the filter from the search field.
- If a knowledge filter contains a search query, that query is applied in addition to any additional string you enter in the search field.
+You can use the filter alone or in combination with a search string.
+If a knowledge filter has a **Search Query**, that query is applied in addition to any text string you enter in the search field.
-When you search, the **Avg score** column shows how relevant each document is to your search query.
+Only one filter can be applied at a time.
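Because the knowledge base is backed by OpenSearch, a filter's **Response Limit** and **Score Threshold** conceptually map to a vector search request body like the following sketch, sent to the index's `_search` endpoint. The `embedding` field name and three-dimensional vector are illustrative assumptions, not OpenRAG's actual schema; real embeddings have hundreds or thousands of dimensions.

```json
{
  "size": 10,
  "min_score": 0.5,
  "query": {
    "knn": {
      "embedding": {
        "vector": [0.12, -0.34, 0.56],
        "k": 10
      }
    }
  }
}
```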
### Default documents {#default-documents}
By default, OpenRAG includes some initial documents about OpenRAG.
These documents are ingested automatically during the [application onboarding process](/install#application-onboarding).
-You can use these documents to ask OpenRAG about itself, and to test the [**Chat**](/chat) feature before uploading your own documents.
+You can use these documents to ask OpenRAG about itself, or to test the [**Chat**](/chat) feature before uploading your own documents.
-If you [delete](#delete-knowledge) these documents, you won't be able to ask OpenRAG about itself and it's own functionality.
+If you [delete these documents](#delete-knowledge), then you won't be able to ask OpenRAG about itself and its own functionality.
It is recommended that you keep these documents, and use [filters](/knowledge-filters) to separate them from your other knowledge.
+An **OpenRAG Docs** filter is created automatically for these documents.
## OpenSearch authentication and document access {#auth}
diff --git a/docs/docs/get-started/docker.mdx b/docs/docs/get-started/docker.mdx
index 1fc4e798..0c65e9c5 100644
--- a/docs/docs/get-started/docker.mdx
+++ b/docs/docs/get-started/docker.mdx
@@ -100,7 +100,7 @@ The following variables are required or recommended:
This port is required to deploy OpenRAG successfully; don't use a different port.
Additionally, this enables the [MLX framework](https://opensource.apple.com/projects/mlx/) for accelerated performance on Apple Silicon Mac machines.
-2. Confirm `docling serve` is running.
+2. Confirm `docling serve` is running:
```bash
uv run python scripts/docling_ctl.py status