# RAGFlow - Sequence Diagrams

Tài liệu này mô tả các luồng xử lý chính trong hệ thống RAGFlow thông qua sequence diagrams.

## 1. User Authentication Flow

### 1.1 User Registration

```mermaid
sequenceDiagram
    participant U as User
    participant W as Web Frontend
    participant A as API Server
    participant DB as MySQL
    participant R as Redis

    U->>W: Click Register
    W->>W: Show registration form
    U->>W: Enter email, password, nickname
    W->>A: POST /api/v1/user/register

    A->>A: Validate input data
    A->>DB: Check if email exists

    alt Email exists
        DB-->>A: User found
        A-->>W: 400 - Email already registered
        W-->>U: Show error message
    else Email not exists
        DB-->>A: No user found
        A->>A: Hash password (bcrypt)
        A->>A: Generate user ID
        A->>DB: INSERT User
        A->>DB: CREATE Tenant for user
        A->>DB: CREATE UserTenant association
        DB-->>A: Success
        A->>A: Generate JWT token
        A->>R: Store session
        A-->>W: 200 - Registration success + token
        W->>W: Store token in localStorage
        W-->>U: Redirect to dashboard
    end
```

### 1.2 User Login

```mermaid
sequenceDiagram
    participant U as User
    participant W as Web Frontend
    participant A as API Server
    participant DB as MySQL
    participant R as Redis

    U->>W: Enter email/password
    W->>A: POST /api/v1/user/login

    A->>DB: SELECT User WHERE email

    alt User not found
        DB-->>A: No user
        A-->>W: 401 - Invalid credentials
        W-->>U: Show error
    else User found
        DB-->>A: User record
        A->>A: Verify password (bcrypt)

        alt Password invalid
            A-->>W: 401 - Invalid credentials
            W-->>U: Show error
        else Password valid
            A->>A: Generate JWT (access_token)
            A->>A: Generate refresh_token
            A->>R: Store session data
            A->>DB: Update last_login_time
            A-->>W: 200 - Login success
            Note over A,W: Response: {access_token, refresh_token, user_info}
            W->>W: Store tokens
            W-->>U: Redirect to dashboard
        end
    end
```

## 2. Knowledge Base Management

### 2.1 Create Knowledge Base

```mermaid
sequenceDiagram
    participant U as User
    participant W as Web Frontend
    participant A as API Server
    participant DB as MySQL
    participant ES as Elasticsearch

    U->>W: Click "Create Knowledge Base"
    W->>W: Show KB creation modal
    U->>W: Enter name, description, settings
    W->>A: POST /api/v1/kb/create
    Note over W,A: Headers: Authorization: Bearer {token}

    A->>A: Validate JWT token
    A->>A: Extract tenant_id from token
    A->>DB: Check KB name uniqueness in tenant

    alt Name exists
        A-->>W: 400 - Name already exists
        W-->>U: Show error
    else Name unique
        A->>A: Generate KB ID
        A->>DB: INSERT Knowledgebase
        Note over A,DB: {id, name, tenant_id, embd_id, parser_id, ...}

        A->>ES: CREATE Index for KB
        Note over A,ES: Index: ragflow_{kb_id}
        ES-->>A: Index created

        DB-->>A: KB record saved
        A-->>W: 200 - KB created
        Note over A,W: {kb_id, name, created_at}
        W-->>U: Show success, refresh KB list
    end
```

### 2.2 List Knowledge Bases

```mermaid
sequenceDiagram
    participant U as User
    participant W as Web Frontend
    participant A as API Server
    participant DB as MySQL

    U->>W: Open Knowledge Base page
    W->>A: GET /api/v1/kb/list?page=1&size=10

    A->>A: Validate JWT, extract tenant_id
    A->>DB: SELECT * FROM knowledgebase WHERE tenant_id
    A->>DB: COUNT total KBs

    DB-->>A: KB list + count

    loop For each KB
        A->>DB: COUNT documents in KB
        A->>DB: SUM chunk_num for KB
    end

    A->>A: Build response with stats
    A-->>W: 200 - KB list with pagination
    Note over A,W: {data: [...], total, page, size}

    W->>W: Render KB cards
    W-->>U: Display knowledge bases
```

## 3. Document Upload & Processing

### 3.1 Document Upload Flow

```mermaid
sequenceDiagram
    participant U as User
    participant W as Web Frontend
    participant A as API Server
    participant M as MinIO
    participant DB as MySQL
    participant Q as Task Queue (Redis)

    U->>W: Select files to upload
    W->>W: Validate file types/sizes

    loop For each file
        W->>A: POST /api/v1/document/upload
        Note over W,A: multipart/form-data: file, kb_id

        A->>A: Validate file type
        A->>A: Generate file_id, doc_id

        A->>M: Upload file to bucket
        Note over A,M: Bucket: ragflow, Key: {tenant_id}/{kb_id}/{file_id}
        M-->>A: Upload success, file_key

        A->>DB: INSERT File record
        Note over A,DB: {id, name, size, location, tenant_id}

        A->>DB: INSERT Document record
        Note over A,DB: {id, kb_id, name, status: 'UNSTART'}

        A->>Q: PUSH parsing task
        Note over A,Q: {doc_id, file_location, parser_config}

        A-->>W: 200 - Upload success
        Note over A,W: {doc_id, file_id, status}
    end

    W-->>U: Show upload progress/success
```

### 3.2 Document Parsing Flow (Background Task)

```mermaid
sequenceDiagram
    participant Q as Task Queue
    participant W as Worker
    participant M as MinIO
    participant P as Parser (DeepDoc)
    participant E as Embedding Model
    participant ES as Elasticsearch
    participant DB as MySQL

    Q->>W: POP task from queue
    W->>DB: UPDATE doc status = 'RUNNING'

    W->>M: Download file
    M-->>W: File content

    W->>P: Parse document
    Note over W,P: Based on file type (PDF, DOCX, etc.)

    P->>P: Extract text content
    P->>P: Extract tables
    P->>P: Extract images (if any)
    P->>P: Layout analysis (for PDF)
    P-->>W: Parsed content

    W->>W: Apply chunking strategy
    Note over W: Token-based, sentence-based, or page-based

    W->>W: Generate chunks

    loop For each chunk batch
        W->>E: Generate embeddings
        Note over W,E: batch_size typically 32
        E-->>W: Vector embeddings [1536 dim]

        W->>ES: Bulk index chunks
        Note over W,ES: {chunk_id, content, embedding, doc_id, kb_id}
        ES-->>W: Index success

        W->>DB: INSERT Chunk records
    end

    W->>DB: UPDATE Document
    Note over W,DB: status='FINISHED', chunk_num, token_num

    W->>DB: UPDATE Task status = 'SUCCESS'
```

## 4. Chat/Dialog Flow

### 4.1 Create Chat Session

```mermaid
sequenceDiagram
    participant U as User
    participant W as Web Frontend
    participant A as API Server
    participant DB as MySQL

    U->>W: Click "New Chat"
    W->>A: POST /api/v1/dialog/create
    Note over W,A: {name, kb_ids[], llm_id, prompt_config}

    A->>A: Validate KB access
    A->>DB: INSERT Dialog record
    Note over A,DB: {id, name, tenant_id, kb_ids, llm_id, ...}

    DB-->>A: Dialog created
    A-->>W: 200 - Dialog created
    Note over A,W: {dialog_id, name, created_at}

    W-->>U: Open chat interface
```

### 4.2 Chat Message Flow (RAG)

```mermaid
sequenceDiagram
    participant U as User
    participant W as Web Frontend
    participant A as API Server
    participant ES as Elasticsearch
    participant RR as Reranker
    participant LLM as LLM Provider
    participant DB as MySQL

    U->>W: Type question
    W->>A: POST /api/v1/dialog/chat (SSE)
    Note over W,A: {dialog_id, conversation_id, question}

    A->>DB: Load dialog config
    Note over A,DB: Get kb_ids, llm_config, prompt

    A->>DB: Load conversation history

    rect rgb(200, 220, 240)
        Note over A,ES: RETRIEVAL PHASE
        A->>A: Query understanding
        A->>A: Generate query embedding

        A->>ES: Hybrid search
        Note over A,ES: Vector similarity + BM25 full-text
        ES-->>A: Top 100 candidates

        A->>RR: Rerank candidates
        Note over A,RR: Cross-encoder scoring
        RR-->>A: Top K chunks (typically 5-10)
    end

    rect rgb(220, 240, 200)
        Note over A,LLM: GENERATION PHASE
        A->>A: Build prompt with context
        Note over A: System prompt + Retrieved chunks + Question

        A->>LLM: Stream completion request

        loop Streaming response
            LLM-->>A: Token chunk
            A-->>W: SSE: data chunk
            W-->>U: Display token
        end

        LLM-->>A: [DONE]
    end

    A->>DB: Save conversation message
    Note over A,DB: {role, content, doc_ids[], conversation_id}

    A-->>W: SSE: [DONE] + sources
    W-->>U: Show sources/citations
```

### 4.3 Streaming Response Detail

```mermaid
sequenceDiagram
    participant W as Web Frontend
    participant A as API Server
    participant LLM as LLM Provider

    W->>A: POST /api/v1/dialog/chat
    Note over W,A: Accept: text/event-stream

    A->>A: Process retrieval...

    A->>LLM: POST /v1/chat/completions
    Note over A,LLM: stream: true

    loop Until complete
        LLM-->>A: data: {"choices":[{"delta":{"content":"..."}}]}
        A->>A: Extract content
        A-->>W: data: {"answer": "...", "reference": {...}}
        W->>W: Append to display
    end

    LLM-->>A: data: [DONE]
    A-->>W: data: {"answer": "", "reference": {...}, "done": true}
    W->>W: Show final state
```

## 5. Agent Workflow Execution

### 5.1 Canvas Workflow Execution

```mermaid
sequenceDiagram
    participant U as User
    participant W as Web Frontend
    participant A as API Server
    participant C as Canvas Engine
    participant Comp as Components
    participant LLM as LLM Provider
    participant Tools as External Tools

    U->>W: Run workflow
    W->>A: POST /api/v1/canvas/run
    Note over W,A: {canvas_id, input_data}

    A->>C: Initialize canvas execution
    C->>C: Parse workflow DSL
    C->>C: Build execution graph

    rect rgb(240, 220, 200)
        Note over C,Comp: BEGIN Component
        C->>Comp: Execute BEGIN
        Comp->>Comp: Initialize variables
        Comp-->>C: {user_input: "..."}
    end

    rect rgb(200, 220, 240)
        Note over C,Comp: RETRIEVAL Component
        C->>Comp: Execute RETRIEVAL
        Comp->>A: Search knowledge bases
        A-->>Comp: Retrieved chunks
        Comp-->>C: {context: [...]}
    end

    rect rgb(220, 240, 200)
        Note over C,LLM: LLM Component
        C->>Comp: Execute LLM
        Comp->>Comp: Build prompt with variables
        Comp->>LLM: Chat completion
        LLM-->>Comp: Response
        Comp-->>C: {llm_output: "..."}
    end

    rect rgb(240, 240, 200)
        Note over C,Tools: TOOL Component (optional)
        C->>Comp: Execute TOOL (e.g., Tavily)
        Comp->>Tools: API call
        Tools-->>Comp: Tool result
        Comp-->>C: {tool_output: {...}}
    end

    rect rgb(220, 220, 240)
        Note over C,Comp: CATEGORIZE Component
        C->>Comp: Execute CATEGORIZE
        Comp->>Comp: Evaluate conditions
        Comp-->>C: {next_node: "node_id"}
    end

    C->>C: Continue to next component...

    C-->>A: Workflow complete
    A-->>W: SSE: Final output
    W-->>U: Display result
```

### 5.2 Agent with Tools Flow

```mermaid
sequenceDiagram
    participant U as User
    participant A as Agent Engine
    participant LLM as LLM Provider
    participant T1 as Tavily Search
    participant T2 as Wikipedia
    participant T3 as Code Executor

    U->>A: Question requiring tools

    A->>LLM: Initial prompt + available tools
    Note over A,LLM: Tools: [tavily_search, wikipedia, code_exec]

    loop ReAct Loop
        LLM-->>A: Thought + Action
        Note over LLM,A: Action: {"tool": "tavily_search", "input": "..."}

        alt Tool: tavily_search
            A->>T1: Search query
            T1-->>A: Search results
        else Tool: wikipedia
            A->>T2: Page lookup
            T2-->>A: Wikipedia content
        else Tool: code_exec
            A->>T3: Execute code
            T3-->>A: Execution result
        end

        A->>LLM: Observation from tool

        alt LLM decides more tools needed
            LLM-->>A: Another Action
        else LLM ready to answer
            LLM-->>A: Final Answer
        end
    end

    A-->>U: Final response with sources
```

## 6. GraphRAG Flow

### 6.1 Knowledge Graph Construction

```mermaid
sequenceDiagram
    participant D as Document
    participant E as Entity Extractor
    participant LLM as LLM Provider
    participant ER as Entity Resolution
    participant G as Graph Store

    D->>E: Document chunks

    loop For each chunk
        E->>LLM: Extract entities prompt
        Note over E,LLM: "Extract entities and relationships..."
        LLM-->>E: Entities + Relations
        Note over LLM,E: [{entity, type, properties}, {src, rel, dst}]
    end

    E->>ER: All extracted entities

    ER->>ER: Cluster similar entities
    ER->>LLM: Entity resolution prompt
    Note over ER,LLM: "Are these the same entity?"
    LLM-->>ER: Resolution decisions

    ER->>ER: Merge duplicate entities
    ER-->>G: Resolved entities + relations

    G->>G: Build graph structure
    G->>G: Create entity embeddings
    G->>G: Index for search
```

### 6.2 GraphRAG Query Flow

```mermaid
sequenceDiagram
    participant U as User
    participant Q as Query Analyzer
    participant G as Graph Store
    participant V as Vector Search
    participant LLM as LLM Provider

    U->>Q: Natural language query

    Q->>LLM: Analyze query
    Note over Q,LLM: Extract entities, intent, constraints
    LLM-->>Q: Query analysis

    par Graph Search
        Q->>G: Find related entities
        G->>G: Traverse relationships
        G-->>Q: Subgraph context
    and Vector Search
        Q->>V: Semantic search
        V-->>Q: Relevant chunks
    end

    Q->>Q: Merge graph + vector results
    Q->>Q: Build unified context

    Q->>LLM: Generate with context
    Note over Q,LLM: Context includes entity relations
    LLM-->>Q: Response with graph insights

    Q-->>U: Answer + entity graph visualization
```

## 7. File Operations

### 7.1 File Download Flow

```mermaid
sequenceDiagram
    participant U as User
    participant W as Web Frontend
    participant A as API Server
    participant M as MinIO
    participant DB as MySQL

    U->>W: Click download
    W->>A: GET /api/v1/file/download/{file_id}

    A->>A: Validate JWT
    A->>DB: Get file record
    A->>A: Check user permission

    alt No permission
        A-->>W: 403 Forbidden
    else Has permission
        A->>M: Get file from storage
        M-->>A: File stream
        A-->>W: File stream with headers
        Note over A,W: Content-Disposition: attachment
        W-->>U: Download starts
    end
```

## 8. Search Operations

### 8.1 Hybrid Search Flow

```mermaid
sequenceDiagram
    participant U as User
    participant A as API Server
    participant E as Embedding Model
    participant ES as Elasticsearch

    U->>A: Search query

    A->>E: Embed query text
    E-->>A: Query vector [1536]

    A->>ES: Hybrid query
    Note over A,ES: script_score (vector) + bool (BM25)

    ES->>ES: Vector similarity search
    Note over ES: cosine_similarity on dense_vector

    ES->>ES: BM25 full-text search
    Note over ES: match on content field

    ES->>ES: Combine scores
    Note over ES: final = vector_score * weight + bm25_score * weight

    ES-->>A: Ranked results

    A->>A: Post-process results
    A->>A: Add highlights
    A->>A: Group by document

    A-->>U: Search results with snippets
```

## 9. Multi-Tenancy Flow

### 9.1 Tenant Data Isolation

```mermaid
sequenceDiagram
    participant U1 as User (Tenant A)
    participant U2 as User (Tenant B)
    participant A as API Server
    participant DB as MySQL

    U1->>A: GET /api/v1/kb/list
    A->>A: Extract tenant_id from JWT
    Note over A: tenant_id = "tenant_a"
    A->>DB: SELECT * FROM kb WHERE tenant_id = 'tenant_a'
    DB-->>A: Tenant A's KBs only
    A-->>U1: KBs for Tenant A

    U2->>A: GET /api/v1/kb/list
    A->>A: Extract tenant_id from JWT
    Note over A: tenant_id = "tenant_b"
    A->>DB: SELECT * FROM kb WHERE tenant_id = 'tenant_b'
    DB-->>A: Tenant B's KBs only
    A-->>U2: KBs for Tenant B

    Note over U1,U2: Data is completely isolated
```

## 10. Connector Integration Flow

### 10.1 Confluence Connector Sync

```mermaid
sequenceDiagram
    participant U as User
    participant A as API Server
    participant C as Confluence Connector
    participant CF as Confluence API
    participant DB as MySQL
    participant Q as Task Queue

    U->>A: Setup Confluence connector
    Note over U,A: {url, username, api_token, space_key}

    A->>C: Initialize connector
    C->>CF: Authenticate
    CF-->>C: Auth success

    A->>DB: Save connector config
    A-->>U: Connector created

    U->>A: Start sync
    A->>Q: Queue sync task

    Q->>C: Execute sync
    C->>CF: GET /wiki/rest/api/content
    CF-->>C: Content list

    loop For each page
        C->>CF: GET page content
        CF-->>C: Page HTML
        C->>C: Convert to markdown
        C->>A: Create document
        A->>Q: Queue parsing task
    end

    C->>DB: Update sync status
    C-->>A: Sync complete
    A-->>U: Show sync results
```

## Tóm Tắt

| Flow | Thành phần chính | Mô tả |
|------|-----------------|-------|
| Authentication | User, API, DB, Redis | Đăng ký, đăng nhập với JWT |
| Knowledge Base | API, MySQL, ES | CRUD knowledge bases |
| Document Upload | API, MinIO, Queue, ES | Upload và index documents |
| Chat/Dialog | API, ES, Reranker, LLM | RAG-based chat với streaming |
| Agent Workflow | Canvas Engine, Components, LLM, Tools | Visual workflow execution |
| GraphRAG | Entity Extractor, Graph Store, LLM | Knowledge graph queries |
| Search | Embedding, ES | Hybrid vector + BM25 search |
| Connectors | Connector, External API | Sync external data sources |

### Các Pattern Thiết Kế Sử Dụng

1. **Event-Driven**: Task queue cho background processing
2. **Streaming**: SSE cho real-time chat responses
3. **Hybrid Search**: Kết hợp vector và text search
4. **ReAct Pattern**: Agent reasoning với tool use
5. **Multi-Tenancy**: Data isolation per tenant