- Add directory structure analysis (01_directory_structure.md) - Add system architecture with diagrams (02_system_architecture.md) - Add sequence diagrams for main flows (03_sequence_diagrams.md) - Add detailed modules analysis (04_modules_analysis.md) - Add tech stack documentation (05_tech_stack.md) - Add source code analysis (06_source_code_analysis.md) - Add README summary for personal_analyze folder This documentation provides: - Complete codebase structure overview - System architecture diagrams (ASCII art) - Sequence diagrams for authentication, RAG, chat, agent flows - Detailed analysis of API, RAG, DeepDoc, Agent, GraphRAG modules - Full tech stack with 150+ dependencies analyzed - Source code patterns and best practices analysis
18 KiB
18 KiB
RAGFlow - Sequence Diagrams
Tài liệu này mô tả các luồng xử lý chính trong hệ thống RAGFlow thông qua sequence diagrams.
1. User Authentication Flow
1.1 User Registration
sequenceDiagram
participant U as User
participant W as Web Frontend
participant A as API Server
participant DB as MySQL
participant R as Redis
U->>W: Click Register
W->>W: Show registration form
U->>W: Enter email, password, nickname
W->>A: POST /api/v1/user/register
A->>A: Validate input data
A->>DB: Check if email exists
alt Email exists
DB-->>A: User found
A-->>W: 400 - Email already registered
W-->>U: Show error message
else Email not exists
DB-->>A: No user found
A->>A: Hash password (bcrypt)
A->>A: Generate user ID
A->>DB: INSERT User
A->>DB: CREATE Tenant for user
A->>DB: CREATE UserTenant association
DB-->>A: Success
A->>A: Generate JWT token
A->>R: Store session
A-->>W: 200 - Registration success + token
W->>W: Store token in localStorage
W-->>U: Redirect to dashboard
end
1.2 User Login
sequenceDiagram
participant U as User
participant W as Web Frontend
participant A as API Server
participant DB as MySQL
participant R as Redis
U->>W: Enter email/password
W->>A: POST /api/v1/user/login
A->>DB: SELECT User WHERE email
alt User not found
DB-->>A: No user
A-->>W: 401 - Invalid credentials
W-->>U: Show error
else User found
DB-->>A: User record
A->>A: Verify password (bcrypt)
alt Password invalid
A-->>W: 401 - Invalid credentials
W-->>U: Show error
else Password valid
A->>A: Generate JWT (access_token)
A->>A: Generate refresh_token
A->>R: Store session data
A->>DB: Update last_login_time
A-->>W: 200 - Login success
Note over A,W: Response: {access_token, refresh_token, user_info}
W->>W: Store tokens
W-->>U: Redirect to dashboard
end
end
2. Knowledge Base Management
2.1 Create Knowledge Base
sequenceDiagram
participant U as User
participant W as Web Frontend
participant A as API Server
participant DB as MySQL
participant ES as Elasticsearch
U->>W: Click "Create Knowledge Base"
W->>W: Show KB creation modal
U->>W: Enter name, description, settings
W->>A: POST /api/v1/kb/create
Note over W,A: Headers: Authorization: Bearer {token}
A->>A: Validate JWT token
A->>A: Extract tenant_id from token
A->>DB: Check KB name uniqueness in tenant
alt Name exists
A-->>W: 400 - Name already exists
W-->>U: Show error
else Name unique
A->>A: Generate KB ID
A->>DB: INSERT Knowledgebase
Note over A,DB: {id, name, tenant_id, embd_id, parser_id, ...}
A->>ES: CREATE Index for KB
Note over A,ES: Index: ragflow_{kb_id}
ES-->>A: Index created
DB-->>A: KB record saved
A-->>W: 200 - KB created
Note over A,W: {kb_id, name, created_at}
W-->>U: Show success, refresh KB list
end
2.2 List Knowledge Bases
sequenceDiagram
participant U as User
participant W as Web Frontend
participant A as API Server
participant DB as MySQL
U->>W: Open Knowledge Base page
W->>A: GET /api/v1/kb/list?page=1&size=10
A->>A: Validate JWT, extract tenant_id
A->>DB: SELECT * FROM knowledgebase WHERE tenant_id
A->>DB: COUNT total KBs
DB-->>A: KB list + count
loop For each KB
A->>DB: COUNT documents in KB
A->>DB: SUM chunk_num for KB
end
A->>A: Build response with stats
A-->>W: 200 - KB list with pagination
Note over A,W: {data: [...], total, page, size}
W->>W: Render KB cards
W-->>U: Display knowledge bases
3. Document Upload & Processing
3.1 Document Upload Flow
sequenceDiagram
participant U as User
participant W as Web Frontend
participant A as API Server
participant M as MinIO
participant DB as MySQL
participant Q as Task Queue (Redis)
U->>W: Select files to upload
W->>W: Validate file types/sizes
loop For each file
W->>A: POST /api/v1/document/upload
Note over W,A: multipart/form-data: file, kb_id
A->>A: Validate file type
A->>A: Generate file_id, doc_id
A->>M: Upload file to bucket
Note over A,M: Bucket: ragflow, Key: {tenant_id}/{kb_id}/{file_id}
M-->>A: Upload success, file_key
A->>DB: INSERT File record
Note over A,DB: {id, name, size, location, tenant_id}
A->>DB: INSERT Document record
Note over A,DB: {id, kb_id, name, status: 'UNSTART'}
A->>Q: PUSH parsing task
Note over A,Q: {doc_id, file_location, parser_config}
A-->>W: 200 - Upload success
Note over A,W: {doc_id, file_id, status}
end
W-->>U: Show upload progress/success
3.2 Document Parsing Flow (Background Task)
sequenceDiagram
participant Q as Task Queue
participant W as Worker
participant M as MinIO
participant P as Parser (DeepDoc)
participant E as Embedding Model
participant ES as Elasticsearch
participant DB as MySQL
Q->>W: POP task from queue
W->>DB: UPDATE doc status = 'RUNNING'
W->>M: Download file
M-->>W: File content
W->>P: Parse document
Note over W,P: Based on file type (PDF, DOCX, etc.)
P->>P: Extract text content
P->>P: Extract tables
P->>P: Extract images (if any)
P->>P: Layout analysis (for PDF)
P-->>W: Parsed content
W->>W: Apply chunking strategy
Note over W: Token-based, sentence-based, or page-based
W->>W: Generate chunks
loop For each chunk batch
W->>E: Generate embeddings
Note over W,E: batch_size typically 32
E-->>W: Vector embeddings [1536 dim]
W->>ES: Bulk index chunks
Note over W,ES: {chunk_id, content, embedding, doc_id, kb_id}
ES-->>W: Index success
W->>DB: INSERT Chunk records
end
W->>DB: UPDATE Document
Note over W,DB: status='FINISHED', chunk_num, token_num
W->>DB: UPDATE Task status = 'SUCCESS'
4. Chat/Dialog Flow
4.1 Create Chat Session
sequenceDiagram
participant U as User
participant W as Web Frontend
participant A as API Server
participant DB as MySQL
U->>W: Click "New Chat"
W->>A: POST /api/v1/dialog/create
Note over W,A: {name, kb_ids[], llm_id, prompt_config}
A->>A: Validate KB access
A->>DB: INSERT Dialog record
Note over A,DB: {id, name, tenant_id, kb_ids, llm_id, ...}
DB-->>A: Dialog created
A-->>W: 200 - Dialog created
Note over A,W: {dialog_id, name, created_at}
W-->>U: Open chat interface
4.2 Chat Message Flow (RAG)
sequenceDiagram
participant U as User
participant W as Web Frontend
participant A as API Server
participant ES as Elasticsearch
participant RR as Reranker
participant LLM as LLM Provider
participant DB as MySQL
U->>W: Type question
W->>A: POST /api/v1/dialog/chat (SSE)
Note over W,A: {dialog_id, conversation_id, question}
A->>DB: Load dialog config
Note over A,DB: Get kb_ids, llm_config, prompt
A->>DB: Load conversation history
rect rgb(200, 220, 240)
Note over A,ES: RETRIEVAL PHASE
A->>A: Query understanding
A->>A: Generate query embedding
A->>ES: Hybrid search
Note over A,ES: Vector similarity + BM25 full-text
ES-->>A: Top 100 candidates
A->>RR: Rerank candidates
Note over A,RR: Cross-encoder scoring
RR-->>A: Top K chunks (typically 5-10)
end
rect rgb(220, 240, 200)
Note over A,LLM: GENERATION PHASE
A->>A: Build prompt with context
Note over A: System prompt + Retrieved chunks + Question
A->>LLM: Stream completion request
loop Streaming response
LLM-->>A: Token chunk
A-->>W: SSE: data chunk
W-->>U: Display token
end
LLM-->>A: [DONE]
end
A->>DB: Save conversation message
Note over A,DB: {role, content, doc_ids[], conversation_id}
A-->>W: SSE: [DONE] + sources
W-->>U: Show sources/citations
4.3 Streaming Response Detail
sequenceDiagram
participant W as Web Frontend
participant A as API Server
participant LLM as LLM Provider
W->>A: POST /api/v1/dialog/chat
Note over W,A: Accept: text/event-stream
A->>A: Process retrieval...
A->>LLM: POST /v1/chat/completions
Note over A,LLM: stream: true
loop Until complete
LLM-->>A: data: {"choices":[{"delta":{"content":"..."}}]}
A->>A: Extract content
A-->>W: data: {"answer": "...", "reference": {...}}
W->>W: Append to display
end
LLM-->>A: data: [DONE]
A-->>W: data: {"answer": "", "reference": {...}, "done": true}
W->>W: Show final state
5. Agent Workflow Execution
5.1 Canvas Workflow Execution
sequenceDiagram
participant U as User
participant W as Web Frontend
participant A as API Server
participant C as Canvas Engine
participant Comp as Components
participant LLM as LLM Provider
participant Tools as External Tools
U->>W: Run workflow
W->>A: POST /api/v1/canvas/run
Note over W,A: {canvas_id, input_data}
A->>C: Initialize canvas execution
C->>C: Parse workflow DSL
C->>C: Build execution graph
rect rgb(240, 220, 200)
Note over C,Comp: BEGIN Component
C->>Comp: Execute BEGIN
Comp->>Comp: Initialize variables
Comp-->>C: {user_input: "..."}
end
rect rgb(200, 220, 240)
Note over C,Comp: RETRIEVAL Component
C->>Comp: Execute RETRIEVAL
Comp->>A: Search knowledge bases
A-->>Comp: Retrieved chunks
Comp-->>C: {context: [...]}
end
rect rgb(220, 240, 200)
Note over C,LLM: LLM Component
C->>Comp: Execute LLM
Comp->>Comp: Build prompt with variables
Comp->>LLM: Chat completion
LLM-->>Comp: Response
Comp-->>C: {llm_output: "..."}
end
rect rgb(240, 240, 200)
Note over C,Tools: TOOL Component (optional)
C->>Comp: Execute TOOL (e.g., Tavily)
Comp->>Tools: API call
Tools-->>Comp: Tool result
Comp-->>C: {tool_output: {...}}
end
rect rgb(220, 220, 240)
Note over C,Comp: CATEGORIZE Component
C->>Comp: Execute CATEGORIZE
Comp->>Comp: Evaluate conditions
Comp-->>C: {next_node: "node_id"}
end
C->>C: Continue to next component...
C-->>A: Workflow complete
A-->>W: SSE: Final output
W-->>U: Display result
5.2 Agent with Tools Flow
sequenceDiagram
participant U as User
participant A as Agent Engine
participant LLM as LLM Provider
participant T1 as Tavily Search
participant T2 as Wikipedia
participant T3 as Code Executor
U->>A: Question requiring tools
A->>LLM: Initial prompt + available tools
Note over A,LLM: Tools: [tavily_search, wikipedia, code_exec]
loop ReAct Loop
LLM-->>A: Thought + Action
Note over LLM,A: Action: {"tool": "tavily_search", "input": "..."}
alt Tool: tavily_search
A->>T1: Search query
T1-->>A: Search results
else Tool: wikipedia
A->>T2: Page lookup
T2-->>A: Wikipedia content
else Tool: code_exec
A->>T3: Execute code
T3-->>A: Execution result
end
A->>LLM: Observation from tool
alt LLM decides more tools needed
LLM-->>A: Another Action
else LLM ready to answer
LLM-->>A: Final Answer
end
end
A-->>U: Final response with sources
6. GraphRAG Flow
6.1 Knowledge Graph Construction
sequenceDiagram
participant D as Document
participant E as Entity Extractor
participant LLM as LLM Provider
participant ER as Entity Resolution
participant G as Graph Store
D->>E: Document chunks
loop For each chunk
E->>LLM: Extract entities prompt
Note over E,LLM: "Extract entities and relationships..."
LLM-->>E: Entities + Relations
Note over LLM,E: [{entity, type, properties}, {src, rel, dst}]
end
E->>ER: All extracted entities
ER->>ER: Cluster similar entities
ER->>LLM: Entity resolution prompt
Note over ER,LLM: "Are these the same entity?"
LLM-->>ER: Resolution decisions
ER->>ER: Merge duplicate entities
ER-->>G: Resolved entities + relations
G->>G: Build graph structure
G->>G: Create entity embeddings
G->>G: Index for search
6.2 GraphRAG Query Flow
sequenceDiagram
participant U as User
participant Q as Query Analyzer
participant G as Graph Store
participant V as Vector Search
participant LLM as LLM Provider
U->>Q: Natural language query
Q->>LLM: Analyze query
Note over Q,LLM: Extract entities, intent, constraints
LLM-->>Q: Query analysis
par Graph Search
Q->>G: Find related entities
G->>G: Traverse relationships
G-->>Q: Subgraph context
and Vector Search
Q->>V: Semantic search
V-->>Q: Relevant chunks
end
Q->>Q: Merge graph + vector results
Q->>Q: Build unified context
Q->>LLM: Generate with context
Note over Q,LLM: Context includes entity relations
LLM-->>Q: Response with graph insights
Q-->>U: Answer + entity graph visualization
7. File Operations
7.1 File Download Flow
sequenceDiagram
participant U as User
participant W as Web Frontend
participant A as API Server
participant M as MinIO
participant DB as MySQL
U->>W: Click download
W->>A: GET /api/v1/file/download/{file_id}
A->>A: Validate JWT
A->>DB: Get file record
A->>A: Check user permission
alt No permission
A-->>W: 403 Forbidden
else Has permission
A->>M: Get file from storage
M-->>A: File stream
A-->>W: File stream with headers
Note over A,W: Content-Disposition: attachment
W-->>U: Download starts
end
8. Search Operations
8.1 Hybrid Search Flow
sequenceDiagram
participant U as User
participant A as API Server
participant E as Embedding Model
participant ES as Elasticsearch
U->>A: Search query
A->>E: Embed query text
E-->>A: Query vector [1536]
A->>ES: Hybrid query
Note over A,ES: script_score (vector) + bool (BM25)
ES->>ES: Vector similarity search
Note over ES: cosine_similarity on dense_vector
ES->>ES: BM25 full-text search
Note over ES: match on content field
ES->>ES: Combine scores
Note over ES: final = vector_score * weight + bm25_score * weight
ES-->>A: Ranked results
A->>A: Post-process results
A->>A: Add highlights
A->>A: Group by document
A-->>U: Search results with snippets
9. Multi-Tenancy Flow
9.1 Tenant Data Isolation
sequenceDiagram
participant U1 as User (Tenant A)
participant U2 as User (Tenant B)
participant A as API Server
participant DB as MySQL
U1->>A: GET /api/v1/kb/list
A->>A: Extract tenant_id from JWT
Note over A: tenant_id = "tenant_a"
A->>DB: SELECT * FROM kb WHERE tenant_id = 'tenant_a'
DB-->>A: Tenant A's KBs only
A-->>U1: KBs for Tenant A
U2->>A: GET /api/v1/kb/list
A->>A: Extract tenant_id from JWT
Note over A: tenant_id = "tenant_b"
A->>DB: SELECT * FROM kb WHERE tenant_id = 'tenant_b'
DB-->>A: Tenant B's KBs only
A-->>U2: KBs for Tenant B
Note over U1,U2: Data is completely isolated
10. Connector Integration Flow
10.1 Confluence Connector Sync
sequenceDiagram
participant U as User
participant A as API Server
participant C as Confluence Connector
participant CF as Confluence API
participant DB as MySQL
participant Q as Task Queue
U->>A: Setup Confluence connector
Note over U,A: {url, username, api_token, space_key}
A->>C: Initialize connector
C->>CF: Authenticate
CF-->>C: Auth success
A->>DB: Save connector config
A-->>U: Connector created
U->>A: Start sync
A->>Q: Queue sync task
Q->>C: Execute sync
C->>CF: GET /wiki/rest/api/content
CF-->>C: Content list
loop For each page
C->>CF: GET page content
CF-->>C: Page HTML
C->>C: Convert to markdown
C->>A: Create document
A->>Q: Queue parsing task
end
C->>DB: Update sync status
C-->>A: Sync complete
A-->>U: Show sync results
Tóm Tắt
| Flow | Thành phần chính | Mô tả |
|---|---|---|
| Authentication | User, API, DB, Redis | Đăng ký, đăng nhập với JWT |
| Knowledge Base | API, MySQL, ES | CRUD knowledge bases |
| Document Upload | API, MinIO, Queue, ES | Upload và index documents |
| Chat/Dialog | API, ES, Reranker, LLM | RAG-based chat với streaming |
| Agent Workflow | Canvas Engine, Components, LLM, Tools | Visual workflow execution |
| GraphRAG | Entity Extractor, Graph Store, LLM | Knowledge graph queries |
| Search | Embedding, ES | Hybrid vector + BM25 search |
| Connectors | Connector, External API | Sync external data sources |
Các Pattern Thiết Kế Sử Dụng
- Event-Driven: Task queue cho background processing
- Streaming: SSE cho real-time chat responses
- Hybrid Search: Kết hợp vector và text search
- ReAct Pattern: Agent reasoning với tool use
- Multi-Tenancy: Data isolation per tenant