ragflow/personal_analyze/04_modules_analysis.md

# RAGFlow - Phân Tích Chi Tiết Các Module

## 1. Module API (`/api/`)

### 1.1 Tổng Quan

Module API là trung tâm xử lý tất cả HTTP requests của hệ thống. Được xây dựng trên Flask/Quart framework với kiến trúc Blueprint.

### 1.2 Cấu Trúc

```
api/
├── ragflow_server.py      # Entry point - Khởi tạo Flask app
├── settings.py            # Cấu hình server
├── constants.py           # API_VERSION = "v1"
├── validation.py          # Request validation
│
├── apps/                  # API Blueprints
├── db/                    # Database layer
└── utils/                 # Utilities
```

### 1.3 Chi Tiết Các Blueprint (API Apps)

#### 1.3.1 `kb_app.py` - Knowledge Base Management
**Chức năng**: Quản lý Knowledge Base (tạo, xóa, sửa, liệt kê)

**Endpoints chính**:
| Method | Endpoint | Mô tả |
|--------|----------|-------|
| POST | `/api/v1/kb/create` | Tạo KB mới |
| GET | `/api/v1/kb/list` | Liệt kê KBs |
| PUT | `/api/v1/kb/update` | Cập nhật KB |
| DELETE | `/api/v1/kb/delete` | Xóa KB |
| GET | `/api/v1/kb/{id}` | Chi tiết KB |

**Logic chính**:
- Validation tenant permissions
- Tạo Elasticsearch index cho mỗi KB
- Quản lý embedding model settings
- Quản lý parser configurations

#### 1.3.2 `document_app.py` - Document Management
**Chức năng**: Upload, parsing, và quản lý documents

**Endpoints chính**:
| Method | Endpoint | Mô tả |
|--------|----------|-------|
| POST | `/api/v1/document/upload` | Upload file |
| POST | `/api/v1/document/run` | Trigger parsing |
| GET | `/api/v1/document/list` | Liệt kê docs |
| DELETE | `/api/v1/document/delete` | Xóa document |
| GET | `/api/v1/document/{id}/chunks` | Lấy chunks |

**Logic chính**:
- File type validation
- MinIO storage integration
- Background task queuing
- Parsing status tracking

#### 1.3.3 `dialog_app.py` - Chat/Dialog Management
**Chức năng**: Xử lý chat conversations với RAG

**Endpoints chính**:
| Method | Endpoint | Mô tả |
|--------|----------|-------|
| POST | `/api/v1/dialog/create` | Tạo dialog |
| POST | `/api/v1/dialog/chat` | Chat (SSE streaming) |
| POST | `/api/v1/dialog/completion` | Non-streaming chat |
| GET | `/api/v1/dialog/list` | Liệt kê dialogs |

**Logic chính**:
- RAG pipeline orchestration
- Streaming response (SSE)
- Conversation history management
- Multi-KB retrieval

#### 1.3.4 `canvas_app.py` - Agent Workflow
**Chức năng**: Visual workflow builder cho AI agents

**Endpoints chính**:
| Method | Endpoint | Mô tả |
|--------|----------|-------|
| POST | `/api/v1/canvas/create` | Tạo workflow |
| POST | `/api/v1/canvas/run` | Execute workflow |
| PUT | `/api/v1/canvas/update` | Cập nhật |
| GET | `/api/v1/canvas/list` | Liệt kê |

**Logic chính**:
- DSL parsing và validation
- Component orchestration
- Tool integration
- Variable passing between nodes

#### 1.3.5 `file_app.py` - File Management
**Chức năng**: Upload, download, quản lý files

**Endpoints chính**:
| Method | Endpoint | Mô tả |
|--------|----------|-------|
| POST | `/api/v1/file/upload` | Upload file |
| GET | `/api/v1/file/download/{id}` | Download |
| GET | `/api/v1/file/list` | Liệt kê files |
| DELETE | `/api/v1/file/delete` | Xóa file |

#### 1.3.6 `search_app.py` - Search Operations
**Chức năng**: Full-text và semantic search

**Endpoints chính**:
| Method | Endpoint | Mô tả |
|--------|----------|-------|
| POST | `/api/v1/search` | Hybrid search |
| GET | `/api/v1/search/history` | Search history |

### 1.4 Database Services (`/api/db/services/`)

#### `dialog_service.py` (37KB - Service phức tạp nhất)
```python
class DialogService:
    def chat(dialog_id, question, stream=True):
        """
        Main RAG chat function
        1. Load dialog configuration
        2. Get relevant documents (retrieval)
        3. Rerank results
        4. Build prompt with context
        5. Call LLM (streaming)
        6. Save conversation
        """

    def retrieval(dialog, question):
        """
        Hybrid retrieval from Elasticsearch
        - Vector similarity search
        - BM25 full-text search
        - Score combination
        """

    def rerank(chunks, question):
        """
        Cross-encoder reranking
        - Score each chunk against question
        - Return top-k
        """
```

#### `document_service.py` (39KB)
```python
class DocumentService:
    def upload(file, kb_id):
        """Upload file to MinIO, create DB record"""

    def parse(doc_id):
        """Queue document for background parsing"""

    def chunk(doc_id, chunks):
        """Save parsed chunks to ES and DB"""

    def delete(doc_id):
        """Remove doc, chunks, and file"""
```

#### `knowledgebase_service.py` (21KB)
```python
class KnowledgebaseService:
    def create(name, embedding_model, parser_id):
        """Create KB with ES index"""

    def update_parser_config(kb_id, config):
        """Update chunking/parsing settings"""

    def get_statistics(kb_id):
        """Get doc count, chunk count, etc."""
```

### 1.5 Database Models (`/api/db/db_models.py`)

**25+ Models quan trọng**:

```python
# User & Tenant
class User(BaseModel):
    id, email, password, nickname, avatar, status, login_channel

class Tenant(BaseModel):
    id, name, public_key, llm_id, embd_id, parser_id, credit

class UserTenant(BaseModel):
    user_id, tenant_id, role  # owner, admin, member

# Knowledge Management
class Knowledgebase(BaseModel):
    id, tenant_id, name, description, embd_id, parser_id,
    similarity_threshold, vector_similarity_weight, ...

class Document(BaseModel):
    id, kb_id, name, location, size, type, parser_id,
    status, progress, chunk_num, token_num, process_duation

class File(BaseModel):
    id, tenant_id, name, size, location, type, source_type

# Chat & Dialog
class Dialog(BaseModel):
    id, tenant_id, name, description, kb_ids, llm_id,
    prompt_config, similarity_threshold, top_n, top_k

class Conversation(BaseModel):
    id, dialog_id, name, message  # JSON array of messages

# Workflow
class UserCanvas(BaseModel):
    id, tenant_id, name, dsl, avatar  # DSL is workflow definition

class CanvasTemplate(BaseModel):
    id, name, dsl, avatar  # Pre-built templates

# Integration
class APIToken(BaseModel):
    id, tenant_id, token, dialog_id  # For external API access

class MCPServer(BaseModel):
    id, tenant_id, name, host, tools  # MCP server config
```

---

## 2. Module RAG (`/rag/`)

### 2.1 Tổng Quan

Core RAG processing engine - xử lý từ document parsing đến retrieval.

### 2.2 LLM Abstractions (`/rag/llm/`)

#### `chat_model.py` - Chat LLM Interface
```python
class Base:
    """Abstract base for all chat models"""
    def chat(messages, stream=True, **kwargs):
        """Generate chat completion"""

class OpenAIChat(Base):
    """OpenAI GPT models"""

class ClaudeChat(Base):
    """Anthropic Claude models"""

class QwenChat(Base):
    """Alibaba Qwen models"""

class OllamaChat(Base):
    """Local Ollama models"""

# Factory function
def get_chat_model(model_name, api_key, base_url):
    """Return appropriate chat model instance"""
```

**Supported Providers** (20+):
- OpenAI (GPT-3.5, GPT-4, GPT-4V)
- Anthropic (Claude 3)
- Google (Gemini)
- Alibaba (Qwen, Qwen-VL)
- Groq
- Mistral
- Cohere
- DeepSeek
- Zhipu (GLM)
- Moonshot
- Ollama (local)
- NVIDIA
- Bedrock (AWS)
- Azure OpenAI
- Hugging Face
- ...

#### `embedding_model.py` - Embedding Interface
```python
class Base:
    """Abstract base for embeddings"""
    def encode(texts: List[str]) -> List[List[float]]:
        """Generate embeddings for texts"""

class OpenAIEmbed(Base):
    """text-embedding-ada-002, text-embedding-3-*"""

class BGEEmbed(Base):
    """BAAI BGE models"""

class JinaEmbed(Base):
    """Jina AI embeddings"""

# Supported embedding models:
# - OpenAI: ada-002, embedding-3-small, embedding-3-large
# - BGE: bge-base, bge-large, bge-m3
# - Jina: jina-embeddings-v2
# - Cohere: embed-english-v3
# - HuggingFace: sentence-transformers
# - Local: Ollama embeddings
```

#### `rerank_model.py` - Reranking Interface
```python
class Base:
    """Abstract base for rerankers"""
    def rerank(query: str, documents: List[str]) -> List[float]:
        """Score documents against query"""

class CohereRerank(Base):
    """Cohere rerank models"""

class JinaRerank(Base):
    """Jina AI reranker"""

class BGERerank(Base):
    """BAAI BGE reranker"""
```

### 2.3 RAG Pipeline (`/rag/flow/`)

#### Pipeline Architecture
```
Document → Parser → Tokenizer → Splitter → Embedder → Index
```

#### `parser/parser.py`
```python
def parse(file_path, parser_config):
    """
    Parse document based on file type
    Returns: List of text segments with metadata
    """
    # Supported parsers:
    # - naive: Simple text extraction
    # - paper: Academic paper structure
    # - book: Book chapter detection
    # - laws: Legal document parsing
    # - presentation: PPT parsing
    # - qa: Q&A format extraction
    # - table: Table extraction
    # - picture: Image description
    # - one: Single chunk per doc
    # - audio: Audio transcription
    # - email: Email thread parsing
```

#### `splitter/splitter.py`
```python
class Splitter:
    """Document chunking strategies"""

    def split_by_tokens(text, chunk_size=512, overlap=128):
        """Token-based splitting"""

    def split_by_sentences(text, max_sentences=10):
        """Sentence-based splitting"""

    def split_by_delimiter(text, delimiter='\n\n'):
        """Delimiter-based splitting"""

    def split_semantic(text, threshold=0.5):
        """Semantic similarity based splitting"""
```

#### `tokenizer/tokenizer.py`
```python
class Tokenizer:
    """Text tokenization"""

    def tokenize(text):
        """Convert text to tokens"""

    def count_tokens(text):
        """Count tokens in text"""

    # Uses tiktoken for OpenAI models
    # Uses model-specific tokenizers for others
```

### 2.4 RAPTOR (`/rag/raptor.py`)

**RAPTOR** = Recursive Abstractive Processing for Tree-Organized Retrieval

```python
class RAPTOR:
    """
    Hierarchical document representation
    - Clusters similar chunks
    - Creates summaries of clusters
    - Builds tree structure for retrieval
    """

    def build_tree(chunks):
        """Build RAPTOR tree from chunks"""

    def retrieve(query, tree):
        """Retrieve from tree structure"""
```

---

## 3. Module DeepDoc (`/deepdoc/`)

### 3.1 Tổng Quan

Deep document understanding với layout analysis và OCR.

### 3.2 Document Parsers (`/deepdoc/parser/`)

#### `pdf_parser.py` - PDF Processing
```python
class PdfParser:
    """
    Advanced PDF parsing with:
    - OCR for scanned pages
    - Layout analysis (tables, figures, headers)
    - Multi-column detection
    - Image extraction
    """

    def __call__(file_path):
        """Parse PDF file"""
        # 1. Extract text with PyMuPDF
        # 2. Apply OCR if needed (Tesseract)
        # 3. Analyze layout (detectron2/layoutlm)
        # 4. Extract tables (camelot/tabula)
        # 5. Extract images
        # Return structured content
```

#### `docx_parser.py` - Word Documents
```python
class DocxParser:
    """
    Parse .docx files
    - Text extraction
    - Table extraction
    - Image extraction
    - Style preservation
    """
```

#### `excel_parser.py` - Spreadsheets
```python
class ExcelParser:
    """
    Parse .xlsx/.xls files
    - Sheet-by-sheet processing
    - Table structure preservation
    - Formula evaluation
    """
```

#### `html_parser.py` - Web Pages
```python
class HtmlParser:
    """
    Parse HTML content
    - Clean HTML
    - Extract main content
    - Handle tables
    - Remove scripts/styles
    """
```

### 3.3 Vision Module (`/deepdoc/vision/`)

```python
class LayoutAnalyzer:
    """
    Document layout analysis using ML
    - Detectron2 for object detection
    - LayoutLM for document understanding
    """

    def analyze(image):
        """
        Detect document regions:
        - Title
        - Paragraph
        - Table
        - Figure
        - Header/Footer
        - List
        """
```

---

## 4. Module Agent (`/agent/`)

### 4.1 Tổng Quan

Agentic workflow system với visual canvas builder.

### 4.2 Canvas Engine (`/agent/canvas.py`)

```python
class Canvas:
    """
    Main workflow orchestrator
    - Parse DSL definition
    - Execute components in order
    - Handle branching logic
    - Manage variables
    """

    def __init__(self, dsl):
        """Initialize from DSL"""
        self.components = self._parse_dsl(dsl)
        self.graph = self._build_graph()

    def run(self, input_data):
        """Execute workflow"""
        context = {"input": input_data}

        for component in self._topological_sort():
            result = component.execute(context)
            context.update(result)

        return context["output"]
```

### 4.3 Components (`/agent/component/`)

#### `begin.py` - Workflow Start
```python
class BeginComponent:
    """
    Entry point of workflow
    - Initialize variables
    - Receive user input
    """
    def execute(self, context):
        return {"user_input": context["input"]}
```

#### `llm.py` - LLM Component
```python
class LLMComponent:
    """
    Call LLM with configured prompt
    - Template variable substitution
    - Streaming support
    - Output parsing
    """
    def execute(self, context):
        prompt = self.template.format(**context)
        response = self.llm.chat(prompt)
        return {"llm_output": response}
```

#### `retrieval.py` - Retrieval Component
```python
class RetrievalComponent:
    """
    Search knowledge bases
    - Multi-KB search
    - Configurable top_k
    - Score threshold
    """
    def execute(self, context):
        query = context["user_input"]
        results = self.search(query, self.kb_ids)
        return {"retrieved_docs": results}
```

#### `categorize.py` - Conditional Branching
```python
class CategorizeComponent:
    """
    Route to different paths based on conditions
    - LLM-based classification
    - Rule-based matching
    """
    def execute(self, context):
        category = self._classify(context)
        return {"next_node": self.routes[category]}
```

#### `agent_with_tools.py` - Tool-Using Agent
```python
class AgentWithToolsComponent:
    """
    ReAct pattern agent
    - Tool selection
    - Iterative reasoning
    - Observation handling
    """
    def execute(self, context):
        while not done:
            action = self.llm.decide_action(context)
            if action.type == "tool":
                result = self.tools[action.tool].run(action.input)
                context["observation"] = result
            else:
                return {"output": action.response}
```

### 4.4 Tools (`/agent/tools/`)

#### External Tool Integrations

| Tool | File | Chức năng |
|------|------|-----------|
| Tavily | `tavily.py` | Web search API |
| ArXiv | `arxiv.py` | Academic paper search |
| Google | `google.py` | Google search |
| Wikipedia | `wikipedia.py` | Wikipedia lookup |
| GitHub | `github.py` | GitHub API |
| Email | `email.py` | Send emails |
| Code Exec | `code_exec.py` | Execute Python code |
| DeepL | `deepl.py` | Translation |
| Jin10 | `jin10.py` | Financial news |
| TuShare | `tushare.py` | Chinese stock data |
| Yahoo Finance | `yahoofinance.py` | Stock data |
| QWeather | `qweather.py` | Weather data |

```python
class BaseTool:
    """Base class for all tools"""
    name: str
    description: str

    def run(self, input: str) -> str:
        """Execute tool and return result"""

class TavilySearch(BaseTool):
    name = "tavily_search"
    description = "Search the web for current information"

    def run(self, query):
        response = tavily.search(query)
        return format_results(response)
```

---

## 5. Module GraphRAG (`/graphrag/`)

### 5.1 Tổng Quan

Knowledge graph construction và querying.

### 5.2 Entity Resolution (`/graphrag/entity_resolution.py`)

```python
class EntityResolution:
    """
    Entity extraction và linking
    - Extract entities from text
    - Cluster similar entities
    - Resolve duplicates
    """

    def extract_entities(text):
        """Extract named entities using LLM"""
        prompt = f"Extract entities from: {text}"
        return llm.chat(prompt)

    def resolve_entities(entities):
        """Merge duplicate entities"""
        clusters = self._cluster_similar(entities)
        return self._merge_clusters(clusters)
```

### 5.3 Graph Search (`/graphrag/search.py`)

```python
class GraphSearch:
    """
    Query knowledge graph
    - Entity-based search
    - Relationship traversal
    - Subgraph extraction
    """

    def search(query):
        """Find relevant subgraph for query"""
        # 1. Extract query entities
        # 2. Find matching graph entities
        # 3. Traverse relationships
        # 4. Return context subgraph
```

---

## 6. Module Frontend (`/web/`)

### 6.1 Tổng Quan

React/TypeScript SPA với UmiJS framework.

### 6.2 Pages (`/web/src/pages/`)

| Page | Chức năng |
|------|-----------|
| `/dataset` | Knowledge base management |
| `/datasets` | Dataset list view |
| `/next-chats` | Chat interface |
| `/next-searches` | Search interface |
| `/document-viewer` | Document preview |
| `/admin` | Admin dashboard |
| `/login` | Authentication |
| `/register` | User registration |

### 6.3 Components (`/web/src/components/`)

**Core Components**:
- `file-upload-modal/` - File upload UI
- `pdf-drawer/` - PDF preview drawer
- `prompt-editor/` - Prompt template editor
- `document-preview/` - Document viewer
- `llm-setting-items/` - LLM configuration UI
- `ui/` - Shadcn/UI base components

### 6.4 State Management

```typescript
// Using Zustand for state
import { create } from 'zustand';

interface KnowledgebaseStore {
  knowledgebases: Knowledgebase[];
  currentKb: Knowledgebase | null;
  fetchKnowledgebases: () => Promise<void>;
  createKnowledgebase: (data: CreateKbRequest) => Promise<void>;
}

export const useKnowledgebaseStore = create<KnowledgebaseStore>((set) => ({
  knowledgebases: [],
  currentKb: null,
  fetchKnowledgebases: async () => {
    const data = await api.get('/kb/list');
    set({ knowledgebases: data });
  },
  // ...
}));
```

### 6.5 API Services (`/web/src/services/`)

```typescript
// API client using Axios
import { request } from 'umi';

export async function createKnowledgebase(data: CreateKbRequest) {
  return request('/api/v1/kb/create', {
    method: 'POST',
    data,
  });
}

export async function chat(dialogId: string, question: string) {
  return request('/api/v1/dialog/chat', {
    method: 'POST',
    data: { dialog_id: dialogId, question },
    responseType: 'stream',
  });
}
```

---

## 7. Module Common (`/common/`)

### 7.1 Configuration (`/common/settings.py`)

```python
# Main configuration file
class Settings:
    # Database
    MYSQL_HOST = os.getenv('MYSQL_HOST', 'localhost')
    MYSQL_PORT = int(os.getenv('MYSQL_PORT', 5455))
    MYSQL_USER = os.getenv('MYSQL_USER', 'root')
    MYSQL_PASSWORD = os.getenv('MYSQL_PASSWORD', 'infini_rag_flow')
    MYSQL_DATABASE = os.getenv('MYSQL_DATABASE', 'ragflow')

    # Elasticsearch
    ES_HOSTS = os.getenv('ES_HOSTS', 'http://localhost:9200').split(',')

    # Redis
    REDIS_HOST = os.getenv('REDIS_HOST', 'localhost')
    REDIS_PORT = int(os.getenv('REDIS_PORT', 6379))

    # MinIO
    MINIO_HOST = os.getenv('MINIO_HOST', 'localhost:9000')
    MINIO_ACCESS_KEY = os.getenv('MINIO_USER', 'rag_flow')
    MINIO_SECRET_KEY = os.getenv('MINIO_PASSWORD', 'infini_rag_flow')

    # Document Engine
    DOC_ENGINE = os.getenv('DOC_ENGINE', 'elasticsearch')  # or 'infinity'
```

### 7.2 Data Source Connectors (`/common/data_source/`)

**Supported Connectors**:

| Connector | File | Chức năng |
|-----------|------|-----------|
| Confluence | `confluence_connector.py` (81KB) | Atlassian Confluence wiki |
| Notion | `notion_connector.py` (25KB) | Notion databases |
| Slack | `slack_connector.py` (22KB) | Slack messages |
| Gmail | `gmail_connector.py` | Gmail emails |
| Discord | `discord_connector.py` | Discord channels |
| SharePoint | `sharepoint_connector.py` | Microsoft SharePoint |
| Teams | `teams_connector.py` | Microsoft Teams |
| Dropbox | `dropbox_connector.py` | Dropbox files |
| Google Drive | `google_drive/` | Google Drive |
| WebDAV | `webdav_connector.py` | WebDAV servers |
| Moodle | `moodle_connector.py` | Moodle LMS |

```python
class BaseConnector:
    """Abstract base for connectors"""

    def authenticate(credentials):
        """Authenticate with external service"""

    def list_items():
        """List available items"""

    def sync():
        """Sync data to RAGFlow"""

class ConfluenceConnector(BaseConnector):
    """Confluence integration"""

    def __init__(self, url, username, api_token):
        self.client = Confluence(url, username, api_token)

    def sync_space(space_key):
        """Sync all pages from a space"""
        pages = self.client.get_all_pages(space_key)
        for page in pages:
            content = self._convert_to_markdown(page.body)
            yield Document(content=content, metadata=page.metadata)
```

---

## 8. Module SDK (`/sdk/python/`)

### 8.1 Python SDK

```python
from ragflow import RAGFlow

# Initialize client
client = RAGFlow(
    api_key="your-api-key",
    base_url="http://localhost:9380"
)

# Create knowledge base
kb = client.create_knowledgebase(
    name="My KB",
    embedding_model="text-embedding-3-small"
)

# Upload document
doc = kb.upload_document("path/to/document.pdf")

# Wait for parsing
doc.wait_for_ready()

# Create chat
chat = client.create_chat(
    name="My Chat",
    knowledgebase_ids=[kb.id]
)

# Send message
response = chat.send_message("What is this document about?")
print(response.answer)
```

---

## 9. Tóm Tắt Module Dependencies

```
┌─────────────────────────────────────────────────────────────────┐
│                         Frontend (web/)                          │
└─────────────────────────────┬───────────────────────────────────┘
                              │ HTTP/SSE
                              ▼
┌─────────────────────────────────────────────────────────────────┐
│                          API (api/)                              │
│   ┌─────────┐  ┌─────────┐  ┌─────────┐  ┌─────────┐           │
│   │ kb_app  │  │doc_app  │  │dialog_  │  │canvas_  │           │
│   │         │  │         │  │app      │  │app      │           │
│   └────┬────┘  └────┬────┘  └────┬────┘  └────┬────┘           │
│        └────────────┴───────────┴────────────┘                  │
│                              │                                   │
│   ┌──────────────────────────┴──────────────────────────┐      │
│   │                    Services Layer                    │      │
│   │  DialogService │ DocumentService │ KBService         │      │
│   └───────────────────────────┬─────────────────────────┘      │
└───────────────────────────────┼─────────────────────────────────┘
                                │
        ┌───────────────────────┼───────────────────────┐
        │                       │                       │
        ▼                       ▼                       ▼
┌───────────────┐    ┌──────────────────┐    ┌──────────────────┐
│   RAG (rag/)  │    │  Agent (agent/)  │    │GraphRAG(graphrag)│
│               │    │                  │    │                  │
│ - LLM Models  │    │ - Canvas Engine  │    │ - Entity Res.    │
│ - Pipeline    │    │ - Components     │    │ - Graph Search   │
│ - Embeddings  │    │ - Tools          │    │                  │
└───────┬───────┘    └────────┬─────────┘    └────────┬─────────┘
        │                     │                       │
        └─────────────────────┼───────────────────────┘
                              │
                              ▼
┌─────────────────────────────────────────────────────────────────┐
│                      DeepDoc (deepdoc/)                          │
│                                                                  │
│   PDF Parser │ DOCX Parser │ HTML Parser │ Vision/OCR           │
└─────────────────────────────────────────────────────────────────┘
                              │
                              ▼
┌─────────────────────────────────────────────────────────────────┐
│                       Common (common/)                           │
│                                                                  │
│   Settings │ Utilities │ Data Source Connectors                 │
└─────────────────────────────────────────────────────────────────┘
                              │
                              ▼
┌─────────────────────────────────────────────────────────────────┐
│                      Data Stores                                 │
│                                                                  │
│   MySQL │ Elasticsearch/Infinity │ Redis │ MinIO                │
└─────────────────────────────────────────────────────────────────┘
```

## 10. Kích Thước Code Ước Tính

| Module | Lines of Code | Complexity |
|--------|--------------|------------|
| api/ | ~15,000 | High |
| rag/ | ~8,000 | High |
| deepdoc/ | ~5,000 | Medium |
| agent/ | ~6,000 | High |
| graphrag/ | ~3,000 | Medium |
| web/src/ | ~20,000 | High |
| common/ | ~5,000 | Medium |
| **Total** | **~62,000** | - |