From 45690da3f941c44779532297e517bba1c6d67e9b Mon Sep 17 00:00:00 2001
From: phact
Date: Tue, 7 Oct 2025 13:01:08 -0400
Subject: [PATCH 1/4] v0.1.17

---
 pyproject.toml | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/pyproject.toml b/pyproject.toml
index 6c74348b..7800ec8e 100644
--- a/pyproject.toml
+++ b/pyproject.toml
@@ -4,7 +4,7 @@ build-backend = "setuptools.build_meta"
 
 [project]
 name = "openrag"
-version = "0.1.16"
+version = "0.1.17"
 description = "Add your description here"
 readme = "README.md"
 requires-python = ">=3.13"

From f48d3e5a63ef66fb5633d5b4970f6fcab2333f3f Mon Sep 17 00:00:00 2001
From: Mendon Kissling <59585235+mendonk@users.noreply.github.com>
Date: Tue, 7 Oct 2025 13:58:42 -0400
Subject: [PATCH 2/4] knowledge-ingestion-flows

---
 docs/docs/core-components/ingestion.mdx | 27 ++++++++++++++++++++++++-
 docs/docs/core-components/knowledge.mdx |  5 +----
 2 files changed, 27 insertions(+), 5 deletions(-)

diff --git a/docs/docs/core-components/ingestion.mdx b/docs/docs/core-components/ingestion.mdx
index d3ce81b0..6f327a42 100644
--- a/docs/docs/core-components/ingestion.mdx
+++ b/docs/docs/core-components/ingestion.mdx
@@ -50,4 +50,29 @@ If you want to use OpenRAG's built-in pipeline instead of Docling serve, set `DI
 
 The built-in pipeline still uses the Docling processor, but uses it directly without the Docling Serve API.
 
-For more information, see [`processors.py` in the OpenRAG repository](https://github.com/langflow-ai/openrag/blob/main/src/models/processors.py#L58).
\ No newline at end of file
+For more information, see [`processors.py` in the OpenRAG repository](https://github.com/langflow-ai/openrag/blob/main/src/models/processors.py#L58).
+
+## Knowledge ingestion flows
+
+[Flows](https://docs.langflow.org/concepts-overview) in Langflow are functional representations of application workflows, with multiple [component](https://docs.langflow.org/concepts-components) nodes connected as single steps in a workflow.
+
+The **OpenSearch Ingestion** flow is the default knowledge ingestion flow in OpenRAG: when you **Add Knowledge** in OpenRAG, you run the OpenSearch Ingestion flow in the background. The flow ingests documents using **Docling Serve** to import and process documents.
+
+This flow contains ten components connected together to process and store documents in your knowledge base:
+
+* The [**Docling Serve** component](https://docs.langflow.org/bundles-docling) processes input documents by connecting to your instance of Docling Serve.
+* The [**Export DoclingDocument** component](https://docs.langflow.org/components-docling) exports the processed DoclingDocument to markdown format with image export mode set to placeholder. This conversion makes the structured document data into a standardized format for further processing.
+* Three [**DataFrame Operations** components](https://docs.langflow.org/components-processing#dataframe-operations) sequentially add metadata columns to the document data of `filename`, `file_size`, and `mimetype`.
+* The [**Split Text** component](https://docs.langflow.org/components-processing#split-text) splits the processed text into chunks with a chunk size of 1000 characters and an overlap of 200 characters.
+* Four **Secret Input** components provide secure access to configuration variables: `CONNECTOR_TYPE`, `OWNER`, `OWNER_EMAIL`, and `OWNER_NAME`. These are runtime variables populated from OAuth login.
+* The **Create Data** component combines the secret inputs into a structured data object that will be associated with the document embeddings.
+* The [**Embedding Model** component](https://docs.langflow.org/components-embedding-models) generates vector embeddings using OpenAI's `text-embedding-3-small` model. The embedding model is selected at [Application onboarding] and cannot be changed.
+* The [**OpenSearch** component](https://docs.langflow.org/bundles-elastic#opensearch) stores the processed documents and their embeddings in the `documents` index at `https://opensearch:9200`.
+
+
+
+
+### OpenSearch URL Ingestion flow
+
+An additional knowledge ingestion flow is included in OpenRAG.
+The **OpenSearch URL Ingestion flow**
\ No newline at end of file
diff --git a/docs/docs/core-components/knowledge.mdx b/docs/docs/core-components/knowledge.mdx
index d2a74ca4..7ff80ab6 100644
--- a/docs/docs/core-components/knowledge.mdx
+++ b/docs/docs/core-components/knowledge.mdx
@@ -18,6 +18,7 @@ OpenSearch provides powerful hybrid search capabilities with enterprise-grade se
 ## Ingest knowledge
 
 OpenRAG supports knowledge ingestion through direct file uploads and OAuth connectors.
+To configure the knowledge ingestion pipeline parameters, see [Docling Ingestion](/ingestion).
 
 ### Direct file ingestion
 
@@ -101,10 +102,6 @@ Documents are processed with the default **Knowledge Ingest** flow, so if you wa
 
 
 
-### Knowledge ingestion settings
-
-To configure the knowledge ingestion pipeline parameters, see [Docling Ingestion](/ingestion).
-
 ## Create knowledge filters
 
 OpenRAG includes a knowledge filter system for organizing and managing document collections.

From 35b76bbfe9ef8b5f310296411458970023307247 Mon Sep 17 00:00:00 2001
From: Mendon Kissling <59585235+mendonk@users.noreply.github.com>
Date: Tue, 7 Oct 2025 14:52:43 -0400
Subject: [PATCH 3/4] updated-ingestion-flow

---
 docs/docs/core-components/agents.mdx    |  2 +-
 docs/docs/core-components/ingestion.mdx | 12 +++++++-----
 2 files changed, 8 insertions(+), 6 deletions(-)

diff --git a/docs/docs/core-components/agents.mdx b/docs/docs/core-components/agents.mdx
index 3ee4617b..0a0494e9 100644
--- a/docs/docs/core-components/agents.mdx
+++ b/docs/docs/core-components/agents.mdx
@@ -34,7 +34,7 @@ In an agentic context, tools are functions that the agent can run to perform tas
 
 
 
-## Use the OpenRAG OpenSearch Agent flow
+## Use the OpenRAG OpenSearch Agent flow {flow}
 
 If you've chatted with your knowledge in OpenRAG, you've already experienced the OpenRAG OpenSearch Agent chat flow. To switch OpenRAG over to the [Langflow visual editor](https://docs.langflow.org/concepts-overview) and view the OpenRAG OpenSearch Agentflow, click