Merge branch 'main' into feat-add-tui-llm-providers

2025-11-21 16:02:53 -05:00 · 40e076f742
commit 40e076f742
parent 4bf4e42840 512660cea0
2 changed files with 75 additions and 98 deletions
--- a/docs/docs/core-components/ingestion.mdx
+++ b/docs/docs/core-components/ingestion.mdx
@@ -78,4 +78,43 @@ If you want to use OpenRAG's built-in pipeline instead of Docling serve, set `DI

 The built-in pipeline still uses the Docling processor, but uses it directly without the Docling Serve API.

-For more information, see [`processors.py` in the OpenRAG repository](https://github.com/langflow-ai/openrag/blob/main/src/models/processors.py#L58).
+For more information, see [`processors.py` in the OpenRAG repository](https://github.com/langflow-ai/openrag/blob/main/src/models/processors.py#L58).
+
+## Performance expectations
+
+On a local VM with 7 vCPUs and 8 GiB RAM, OpenRAG ingested approximately 5.03 GB across 1,083 files in about 42 minutes.
+This equates to approximately 0.43 documents per second (about 2.3 seconds per document).
+
+You can generally expect equal or better performance on developer laptops and significantly faster on servers.
+Throughput scales with CPU cores, memory, storage speed, and configuration choices such as embedding model, chunk size and overlap, and concurrency.
+
+This test returned 12 errors (approximately 1.1%).
+All errors were file-specific, and they didn't stop the pipeline.
+
+Ingestion dataset:
+
+* Total files: 1,083 items mounted
+* Total size on disk: 5,026,474,862 bytes (approximately 5.03 GB)
+
+Hardware specifications:
+
+* Machine: Apple M4 Pro
+* Podman VM:
+  * Name: `podman-machine-default`
+  * Type: `applehv`
+  * vCPUs: 7
+  * Memory: 8 GiB
+  * Disk size: 100 GiB
+
+Test results:
+
+```text
+2025-09-24T22:40:45.542190Z /app/src/main.py:231 Ingesting default documents when ready disable_langflow_ingest=False
+2025-09-24T22:40:45.546385Z /app/src/main.py:270 Using Langflow ingestion pipeline for default documents file_count=1082
+...
+2025-09-24T23:19:44.866365Z /app/src/main.py:351 Langflow ingestion completed success_count=1070 error_count=12 total_files=1082
+```
+
+Elapsed time: ~42 minutes 15 seconds (2,535 seconds)
+
+Throughput: ~0.43 documents/second (~2.3 seconds/document)
--- a/docs/docs/get-started/what-is-openrag.mdx
+++ b/docs/docs/get-started/what-is-openrag.mdx
@@ -1,125 +1,63 @@
 ---
 title: What is OpenRAG?
 slug: /
+hide_table_of_contents: true
 ---

 OpenRAG is an open-source package for building agentic RAG systems that integrates with a wide range of orchestration tools, vector databases, and LLM providers.

 OpenRAG connects and amplifies three popular, proven open-source projects into one powerful platform:

-* [Langflow](https://docs.langflow.org): Langflow is a versatile tool for building and deploying AI agents and MCP servers. It supports all major LLMs, vector databases, and a growing library of AI tools. 
+* [Langflow](https://docs.langflow.org): Langflow is a versatile tool for building and deploying AI agents and MCP servers. It supports all major LLMs, vector databases, and a growing library of AI tools.

 * [OpenSearch](https://docs.opensearch.org/latest/): OpenSearch is a community-driven, Apache 2.0-licensed open source search and analytics suite that makes it easy to ingest, search, visualize, and analyze data.

-* [Docling](https://docling-project.github.io/docling/): Docling simplifies document processing, parsing diverse formats — including advanced PDF understanding — and providing seamless integrations with the gen AI ecosystem. 
+* [Docling](https://docling-project.github.io/docling/): Docling simplifies document processing, parsing diverse formats — including advanced PDF understanding — and providing seamless integrations with the gen AI ecosystem.

-OpenRAG builds on Langflow's familiar interface while adding OpenSearch for vector storage and Docling for simplified document parsing, with opinionated flows that serve as ready-to-use recipes for ingestion, retrieval, and generation from popular sources like Google Drive, OneDrive, and Sharepoint.
+OpenRAG builds on Langflow's familiar interface while adding OpenSearch for vector storage and Docling for simplified document parsing. It uses opinionated flows that serve as ready-to-use recipes for ingestion, retrieval, and generation from familiar sources like Google Drive, OneDrive, and SharePoint.

-What's more, every part of the stack is swappable. Write your own custom components in Python, try different language models, and customize your flows to build an agentic RAG system.
+What's more, every part of the stack is interchangeable: You can write your own custom components in Python, try different language models, and customize your flows to build a personalized agentic RAG system.

-Ready to get started? [Install OpenRAG](/install) and then run the [Quickstart](/quickstart) to create a powerful RAG pipeline.
+:::tip
+Ready to get started? Try the [quickstart](/quickstart) to install OpenRAG and start exploring in minutes.
+:::

 ## OpenRAG architecture

 OpenRAG deploys and orchestrates a lightweight, container-based architecture that combines **Langflow**, **OpenSearch**, and **Docling** into a cohesive RAG platform.

 ```mermaid
-%%{init: {'theme': 'dark', 'flowchart': {'useMaxWidth': false, 'width': '100%'}}}%%
-flowchart LR
-    %% Encapsulate the entire diagram in a rectangle with black background
-    subgraph DiagramContainer["OpenRAG Architecture"]
-        style DiagramContainer fill:#000000,stroke:#ffffff,color:white,stroke-width:2px
-
-        %% Define subgraphs for the different sections
-        subgraph LocalService["Local Service"]
-            DoclingSrv[Docling Serve]
-            style DoclingSrv fill:#a8d1ff,stroke:#0066cc,color:black,stroke-width:2px
-        end
-
-        subgraph Containers
-            Backend["OpenRAG Backend"]
-            style Backend fill:#e6ffe6,stroke:#006600,color:black,stroke-width:2px
-            Langflow
-            style Langflow fill:#e6ffe6,stroke:#006600,color:black,stroke-width:2px
-            OpenSearch
-            style OpenSearch fill:#e6ffe6,stroke:#006600,color:black,stroke-width:2px
-            Frontend["OpenRAG Frontend"]
-            style Frontend fill:#ffcc99,stroke:#ff6600,color:black,stroke-width:2px
-        end
-
-        subgraph ThirdParty["Third Party Services"]
-            GoogleDrive["Google Drive"]
-            style GoogleDrive fill:#f2e6ff,stroke:#6600cc,color:black,stroke-width:2px
-            OneDrive
-            style OneDrive fill:#f2e6ff,stroke:#6600cc,color:black,stroke-width:2px
-            SharePoint["SharePoint"]
-            style SharePoint fill:#f2e6ff,stroke:#6600cc,color:black,stroke-width:2px
-            More[...]
-            style More fill:#f2e6ff,stroke:#6600cc,color:black,stroke-width:2px
-        end
-
-        %% Define connections
-        DoclingSrv --> Backend
-        GoogleDrive --> Backend
-        OneDrive --> Backend
-        SharePoint --> Backend
-        More --> Backend
-        Backend --> Langflow
-        Langflow <--> OpenSearch
-        Backend <--> Frontend
-
-        %% Style subgraphs
-        style LocalService fill:#333333,stroke:#666666,color:white,stroke-width:2px
-        style Containers fill:#444444,stroke:#666666,color:white,stroke-width:2px
-        style ThirdParty fill:#333333,stroke:#666666,color:white,stroke-width:2px
+---
+config:
+  theme: 'base'
+  themeVariables:
+    lineColor: '#2e8555'
+---
+flowchart TD
+    subgraph Containers
+    backend[OpenRAG Backend] --> langflow[Langflow]
+    langflow <--> opensearch[OpenSearch]
+    backend <--> frontend[OpenRAG frontend]
    end
+    subgraph local [Local services]
+    docling[Docling Serve]
+    end
+    subgraph ext [External connectors]
+    drive1[Google Drive]
+    drive2[OneDrive]
+    drive3[SharePoint]
+    drive4[Others]
+    end
+    local --> backend
+    ext --> backend
 ```

-The **OpenRAG Backend** is the central orchestration service that coordinates all other components.
+* The **OpenRAG Backend** is the central orchestration service that coordinates all other components.

-**Langflow** provides a visual workflow engine for building AI agents, and connects to **OpenSearch** for vector storage and retrieval.
+* **Langflow** provides a visual workflow engine for building AI agents, and connects to **OpenSearch** for vector storage and retrieval.

-**Docling Serve** is a local document processing service managed by the **OpenRAG Backend**.
+* **Docling Serve** is a local document processing service managed by the **OpenRAG Backend**.

-**Third Party Services** like **Google Drive** connect to the **OpenRAG Backend** through OAuth authentication, allowing synchronication of cloud storage with the OpenSearch knowledge base.
+* **External connectors** integrate third-party cloud storage services through OAuth authenticated connections to the **OpenRAG Backend**, allowing synchronization of external storage with your OpenSearch knowledge base.

-The **OpenRAG Frontend** provides the user interface for interacting with the system.
-
-## Performance expectations
-
-On a local VM with 7 vCPUs and 8 GiB RAM, OpenRAG ingested approximately 5.03 GB across 1,083 files in about 42 minutes.
-This equates to approximately 2.4 documents per second.
-
-You can generally expect equal or better performance on developer laptops and significantly faster on servers.
-Throughput scales with CPU cores, memory, storage speed, and configuration choices such as embedding model, chunk size and overlap, and concurrency.
-
-This test returned 12 errors (approximately 1.1%).
-All errors were file‑specific, and they didn't stop the pipeline.
-
-Ingestion dataset:
-
-* Total files: 1,083 items mounted
-* Total size on disk: 5,026,474,862 bytes (approximately 5.03 GB)
-
-Hardware specifications:
-
-* Machine: Apple M4 Pro
-* Podman VM:
-  * Name: `podman-machine-default`
-  * Type: `applehv`
-  * vCPUs: 7
-  * Memory: 8 GiB
-  * Disk size: 100 GiB
-
-Test results:
-
-```text
-2025-09-24T22:40:45.542190Z /app/src/main.py:231 Ingesting default documents when ready disable_langflow_ingest=False
-2025-09-24T22:40:45.546385Z /app/src/main.py:270 Using Langflow ingestion pipeline for default documents file_count=1082
-...
-2025-09-24T23:19:44.866365Z /app/src/main.py:351 Langflow ingestion completed success_count=1070 error_count=12 total_files=1082
-```
-
-Elapsed time: ~42 minutes 15 seconds (2,535 seconds)
-
-Throughput: ~2.4 documents/second
+* The **OpenRAG Frontend** provides the user interface for interacting with the platform.
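
Review note: the figures in the "Performance expectations" section can be sanity-checked from the two quoted log lines alone. The sketch below is not part of the commit. It assumes that the first and last timestamps in the quoted log bracket the ingestion run, which may not hold exactly because the log is elided in the middle. The names `parse_utc` and `summarize` are hypothetical helpers introduced only for this check.

```python
from datetime import datetime

# Values copied verbatim from the quoted log lines in the diff.
START = "2025-09-24T22:40:45.542190Z"
END = "2025-09-24T23:19:44.866365Z"
TOTAL_FILES = 1082   # total_files in the completion log line
ERROR_COUNT = 12     # error_count in the completion log line


def parse_utc(ts: str) -> datetime:
    """Parse an ISO 8601 timestamp with a trailing 'Z' as UTC."""
    return datetime.fromisoformat(ts.replace("Z", "+00:00"))


def summarize(start: str, end: str, total_files: int, error_count: int) -> dict:
    """Derive elapsed time, throughput, and error rate from the log values."""
    elapsed = (parse_utc(end) - parse_utc(start)).total_seconds()
    return {
        "elapsed_seconds": elapsed,
        "docs_per_second": total_files / elapsed,
        "seconds_per_doc": elapsed / total_files,
        "error_rate": error_count / total_files,
    }


stats = summarize(START, END, TOTAL_FILES, ERROR_COUNT)
for name, value in stats.items():
    print(f"{name}: {value:.3f}")
```

With the logged values, this gives an elapsed time of roughly 2,339 seconds (about 39 minutes), a throughput of roughly 0.46 documents per second (about 2.2 seconds per document), and an error rate of about 1.1%. The error rate agrees with the prose. The elapsed time and throughput are worth comparing against the stated figures before merging.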