knowledge-ingestion-flows

2025-10-07 13:58:42 -04:00 · 2025-10-07 13:58:42 -04:00 · f48d3e5a63
commit f48d3e5a63
parent 8aa945b930
2 changed files with 27 additions and 5 deletions
--- a/docs/docs/core-components/ingestion.mdx
+++ b/docs/docs/core-components/ingestion.mdx
@@ -50,4 +50,29 @@ If you want to use OpenRAG's built-in pipeline instead of Docling serve, set `DI
 The built-in pipeline still uses the Docling processor, but uses it directly without the Docling Serve API.
-For more information, see [`processors.py` in the OpenRAG repository](https://github.com/langflow-ai/openrag/blob/main/src/models/processors.py#L58).
+For more information, see [`processors.py` in the OpenRAG repository](https://github.com/langflow-ai/openrag/blob/main/src/models/processors.py#L58).
 ## Knowledge ingestion flows
 [Flows](https://docs.langflow.org/concepts-overview) in Langflow are functional representations of application workflows, with multiple [component](https://docs.langflow.org/concepts-components) nodes connected as single steps in a workflow.
 The **OpenSearch Ingestion** flow is the default knowledge ingestion flow in OpenRAG: when you **Add Knowledge** in OpenRAG, you run the OpenSearch Ingestion flow in the background. The flow uses **Docling Serve** to import and process documents.
 This flow connects the following components to process and store documents in your knowledge base:
 * The [**Docling Serve** component](https://docs.langflow.org/bundles-docling) processes input documents by connecting to your instance of Docling Serve.
 * The [**Export DoclingDocument** component](https://docs.langflow.org/components-docling) exports the processed DoclingDocument to Markdown format with image export mode set to placeholder. This conversion turns the structured document data into a standardized format for further processing.
 * Three [**DataFrame Operations** components](https://docs.langflow.org/components-processing#dataframe-operations) sequentially add the metadata columns `filename`, `file_size`, and `mimetype` to the document data.
 * The [**Split Text** component](https://docs.langflow.org/components-processing#split-text) splits the processed text into chunks with a chunk size of 1000 characters and an overlap of 200 characters.
 * Four **Secret Input** components provide secure access to configuration variables: `CONNECTOR_TYPE`, `OWNER`, `OWNER_EMAIL`, and `OWNER_NAME`. These are runtime variables populated from OAuth login.
 * The **Create Data** component combines the secret inputs into a structured data object that will be associated with the document embeddings.
 * The [**Embedding Model** component](https://docs.langflow.org/components-embedding-models) generates vector embeddings using OpenAI's `text-embedding-3-small` model. The embedding model is selected at [Application onboarding] and cannot be changed.
 * The [**OpenSearch** component](https://docs.langflow.org/bundles-elastic#opensearch) stores the processed documents and their embeddings in the `documents` index at `https://opensearch:9200`.
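
The Split Text settings listed above (1000-character chunks with a 200-character overlap) can be illustrated with a minimal Python sketch. This is not the Langflow component's implementation, which may also split on separators. The helper name `split_with_overlap` is hypothetical, and the sketch assumes a plain sliding window over characters.

```python
def split_with_overlap(text: str, chunk_size: int = 1000, overlap: int = 200) -> list[str]:
    """Split text into fixed-size character windows that overlap.

    Hypothetical helper for illustration only; not the Split Text component's code.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be in [0, chunk_size)")
    # Each window starts this many characters after the previous one.
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # this window already reaches the end of the text
    return chunks


sample = "x" * 2500
print([len(chunk) for chunk in split_with_overlap(sample)])  # prints [1000, 1000, 900]
```

With these defaults, consecutive chunks share 200 characters, so a sentence that straddles a chunk boundary is more likely to appear whole in at least one chunk.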
 <PartialModifyFlows />
 ### OpenSearch URL Ingestion flow
 An additional knowledge ingestion flow is included in OpenRAG.
 The **OpenSearch URL Ingestion flow**
--- a/docs/docs/core-components/knowledge.mdx
+++ b/docs/docs/core-components/knowledge.mdx
@@ -18,6 +18,7 @@ OpenSearch provides powerful hybrid search capabilities with enterprise-grade se
 ## Ingest knowledge
 OpenRAG supports knowledge ingestion through direct file uploads and OAuth connectors.
 To configure the knowledge ingestion pipeline parameters, see [Docling Ingestion](/ingestion).
 ### Direct file ingestion
@@ -101,10 +102,6 @@ Documents are processed with the default **Knowledge Ingest** flow, so if you wa
 <PartialModifyFlows />
 ### Knowledge ingestion settings
 To configure the knowledge ingestion pipeline parameters, see [Docling Ingestion](/ingestion).
 ## Create knowledge filters
 OpenRAG includes a knowledge filter system for organizing and managing document collections.