Merge branch 'main' into feat-add-tui-llm-providers

2025-11-21 16:02:53 -05:00
commit 40e076f742
parent 4bf4e42840 512660cea0
2 changed files with 75 additions and 98 deletions
--- a/docs/docs/core-components/ingestion.mdx
+++ b/docs/docs/core-components/ingestion.mdx
@@ -79,3 +79,42 @@ If you want to use OpenRAG's built-in pipeline instead of Docling serve, set `DI
 The built-in pipeline still uses the Docling processor, but uses it directly without the Docling Serve API.
 For more information, see [`processors.py` in the OpenRAG repository](https://github.com/langflow-ai/openrag/blob/main/src/models/processors.py#L58).
 ## Performance expectations
 On a local VM with 7 vCPUs and 8 GiB RAM, OpenRAG ingested approximately 5.03 GB across 1,083 files in about 42 minutes.
 This equates to approximately 2.3 seconds per document (about 0.43 documents per second).
 You can generally expect equal or better performance on developer laptops and significantly faster on servers.
 Throughput scales with CPU cores, memory, storage speed, and configuration choices such as embedding model, chunk size and overlap, and concurrency.
 This test returned 12 errors (approximately 1.1%).
 All errors were file-specific, and they didn't stop the pipeline.
 Ingestion dataset:
 * Total files: 1,083 items mounted
 * Total size on disk: 5,026,474,862 bytes (approximately 5.03 GB)
 Hardware specifications:
 * Machine: Apple M4 Pro
 * Podman VM:
  * Name: `podman-machine-default`
  * Type: `applehv`
  * vCPUs: 7
  * Memory: 8 GiB
  * Disk size: 100 GiB
 Test results:
 ```text
 2025-09-24T22:40:45.542190Z /app/src/main.py:231 Ingesting default documents when ready disable_langflow_ingest=False
 2025-09-24T22:40:45.546385Z /app/src/main.py:270 Using Langflow ingestion pipeline for default documents file_count=1082
 ...
 2025-09-24T23:19:44.866365Z /app/src/main.py:351 Langflow ingestion completed success_count=1070 error_count=12 total_files=1082
 ```
 Elapsed time: ~42 minutes 15 seconds (2,535 seconds)
 Throughput: ~0.43 documents/second (~2.3 seconds per document)
--- a/docs/docs/get-started/what-is-openrag.mdx
+++ b/docs/docs/get-started/what-is-openrag.mdx
@@ -1,6 +1,7 @@
 ---
 title: What is OpenRAG?
 slug: /
 hide_table_of_contents: true
 ---
 OpenRAG is an open-source package for building agentic RAG systems that integrates with a wide range of orchestration tools, vector databases, and LLM providers.
@@ -13,113 +14,50 @@ OpenRAG connects and amplifies three popular, proven open-source projects into o
 * [Docling](https://docling-project.github.io/docling/): Docling simplifies document processing, parsing diverse formats — including advanced PDF understanding — and providing seamless integrations with the gen AI ecosystem.
-OpenRAG builds on Langflow's familiar interface while adding OpenSearch for vector storage and Docling for simplified document parsing, with opinionated flows that serve as ready-to-use recipes for ingestion, retrieval, and generation from popular sources like Google Drive, OneDrive, and Sharepoint.
+OpenRAG builds on Langflow's familiar interface while adding OpenSearch for vector storage and Docling for simplified document parsing. It uses opinionated flows that serve as ready-to-use recipes for ingestion, retrieval, and generation from familiar sources like Google Drive, OneDrive, and SharePoint.
-What's more, every part of the stack is swappable. Write your own custom components in Python, try different language models, and customize your flows to build an agentic RAG system.
+What's more, every part of the stack is interchangeable: You can write your own custom components in Python, try different language models, and customize your flows to build a personalized agentic RAG system.
-Ready to get started? [Install OpenRAG](/install) and then run the [Quickstart](/quickstart) to create a powerful RAG pipeline.
+:::tip
 Ready to get started? Try the [quickstart](/quickstart) to install OpenRAG and start exploring in minutes.
 :::
 ## OpenRAG architecture
 OpenRAG deploys and orchestrates a lightweight, container-based architecture that combines **Langflow**, **OpenSearch**, and **Docling** into a cohesive RAG platform.
 ```mermaid
-%%{init: {'theme': 'dark', 'flowchart': {'useMaxWidth': false, 'width': '100%'}}}%%
+---
-flowchart LR
+config:
-    %% Encapsulate the entire diagram in a rectangle with black background
+  theme: 'base'
-    subgraph DiagramContainer["OpenRAG Architecture"]
+  themeVariables:
-        style DiagramContainer fill:#000000,stroke:#ffffff,color:white,stroke-width:2px
+    lineColor: '#2e8555'
-
+---
-        %% Define subgraphs for the different sections
+flowchart TD
-        subgraph LocalService["Local Service"]
+    subgraph Containers
-            DoclingSrv[Docling Serve]
+    backend[OpenRAG Backend] --> langflow[Langflow]
-            style DoclingSrv fill:#a8d1ff,stroke:#0066cc,color:black,stroke-width:2px
+    langflow <--> opensearch[OpenSearch]
-        end
+    backend <--> frontend[OpenRAG frontend]
        subgraph Containers
            Backend["OpenRAG Backend"]
            style Backend fill:#e6ffe6,stroke:#006600,color:black,stroke-width:2px
            Langflow
            style Langflow fill:#e6ffe6,stroke:#006600,color:black,stroke-width:2px
            OpenSearch
            style OpenSearch fill:#e6ffe6,stroke:#006600,color:black,stroke-width:2px
            Frontend["OpenRAG Frontend"]
            style Frontend fill:#ffcc99,stroke:#ff6600,color:black,stroke-width:2px
        end
        subgraph ThirdParty["Third Party Services"]
            GoogleDrive["Google Drive"]
            style GoogleDrive fill:#f2e6ff,stroke:#6600cc,color:black,stroke-width:2px
            OneDrive
            style OneDrive fill:#f2e6ff,stroke:#6600cc,color:black,stroke-width:2px
            SharePoint["SharePoint"]
            style SharePoint fill:#f2e6ff,stroke:#6600cc,color:black,stroke-width:2px
            More[...]
            style More fill:#f2e6ff,stroke:#6600cc,color:black,stroke-width:2px
        end
        %% Define connections
        DoclingSrv --> Backend
        GoogleDrive --> Backend
        OneDrive --> Backend
        SharePoint --> Backend
        More --> Backend
        Backend --> Langflow
        Langflow <--> OpenSearch
        Backend <--> Frontend
        %% Style subgraphs
        style LocalService fill:#333333,stroke:#666666,color:white,stroke-width:2px
        style Containers fill:#444444,stroke:#666666,color:white,stroke-width:2px
        style ThirdParty fill:#333333,stroke:#666666,color:white,stroke-width:2px
    end
    subgraph local [Local services]
    docling[Docling Serve]
    end
    subgraph ext [External connectors]
    drive1[Google Drive]
    drive2[OneDrive]
    drive3[SharePoint]
    drive4[Others]
    end
    local --> backend
    ext --> backend
 ```
-The **OpenRAG Backend** is the central orchestration service that coordinates all other components.
+* The **OpenRAG Backend** is the central orchestration service that coordinates all other components.
-**Langflow** provides a visual workflow engine for building AI agents, and connects to **OpenSearch** for vector storage and retrieval.
+* **Langflow** provides a visual workflow engine for building AI agents, and connects to **OpenSearch** for vector storage and retrieval.
-**Docling Serve** is a local document processing service managed by the **OpenRAG Backend**.
+* **Docling Serve** is a local document processing service managed by the **OpenRAG Backend**.
-**Third Party Services** like **Google Drive** connect to the **OpenRAG Backend** through OAuth authentication, allowing synchronication of cloud storage with the OpenSearch knowledge base.
+* **External connectors** integrate third-party cloud storage services through OAuth authenticated connections to the **OpenRAG Backend**, allowing synchronization of external storage with your OpenSearch knowledge base.
-The **OpenRAG Frontend** provides the user interface for interacting with the system.
+* The **OpenRAG Frontend** provides the user interface for interacting with the platform.
 ## Performance expectations
 On a local VM with 7 vCPUs and 8 GiB RAM, OpenRAG ingested approximately 5.03 GB across 1,083 files in about 42 minutes.
 This equates to approximately 2.3 seconds per document (about 0.43 documents per second).
 You can generally expect equal or better performance on developer laptops and significantly faster on servers.
 Throughput scales with CPU cores, memory, storage speed, and configuration choices such as embedding model, chunk size and overlap, and concurrency.
 This test returned 12 errors (approximately 1.1%).
 All errors were file‑specific, and they didn't stop the pipeline.
 Ingestion dataset:
 * Total files: 1,083 items mounted
 * Total size on disk: 5,026,474,862 bytes (approximately 5.03 GB)
 Hardware specifications:
 * Machine: Apple M4 Pro
 * Podman VM:
  * Name: `podman-machine-default`
  * Type: `applehv`
  * vCPUs: 7
  * Memory: 8 GiB
  * Disk size: 100 GiB
 Test results:
 ```text
 2025-09-24T22:40:45.542190Z /app/src/main.py:231 Ingesting default documents when ready disable_langflow_ingest=False
 2025-09-24T22:40:45.546385Z /app/src/main.py:270 Using Langflow ingestion pipeline for default documents file_count=1082
 ...
 2025-09-24T23:19:44.866365Z /app/src/main.py:351 Langflow ingestion completed success_count=1070 error_count=12 total_files=1082
 ```
 Elapsed time: ~42 minutes 15 seconds (2,535 seconds)
 Throughput: ~0.43 documents/second (~2.3 seconds per document)
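
As a rough check on the figures in the Performance expectations section, the sketch below recomputes the per-document rate two ways: from the stated file count and elapsed time (1,083 files, 2,535 seconds), and from the first and last timestamps in the log excerpt (`file_count=1082`). The helper names `parse_ts` and `rate` are illustrative only and are not part of OpenRAG. The log excerpt is elided, so the timestamp-derived interval may not cover the same window as the stated elapsed time.

```python
from datetime import datetime


def parse_ts(ts):
    """Parse an ISO 8601 timestamp with a trailing 'Z' (UTC) into a datetime."""
    # Older Python versions do not accept 'Z' in fromisoformat, so normalize it.
    return datetime.fromisoformat(ts.replace("Z", "+00:00"))


def rate(files, seconds):
    """Return (documents per second, seconds per document)."""
    return files / seconds, seconds / files


# Figures quoted in the prose: 1,083 files in 42 min 15 s (2,535 s).
stated_docs_per_s, stated_s_per_doc = rate(1083, 2535)

# Figures taken from the log excerpt: first and last timestamps, file_count=1082.
start = parse_ts("2025-09-24T22:40:45.542190Z")
end = parse_ts("2025-09-24T23:19:44.866365Z")
log_seconds = (end - start).total_seconds()
log_docs_per_s, log_s_per_doc = rate(1082, log_seconds)

# Error rate from the final log line: error_count=12, total_files=1082.
error_pct = 100 * 12 / 1082

print(f"stated: {stated_docs_per_s:.2f} docs/s, {stated_s_per_doc:.1f} s/doc")
print(f"log:    {log_docs_per_s:.2f} docs/s, {log_s_per_doc:.1f} s/doc over {log_seconds:.0f} s")
print(f"errors: {error_pct:.1f}%")
```

Both calculations land between 2 and 2.5 seconds per document, which is well under one document per second.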