docs: Add comprehensive RAGFlow analysis documentation

- Add directory structure analysis (01_directory_structure.md) - Add system architecture with diagrams (02_system_architecture.md) - Add sequence diagrams for main flows (03_sequence_diagrams.md) - Add detailed modules analysis (04_modules_analysis.md) - Add tech stack documentation (05_tech_stack.md) - Add source code analysis (06_source_code_analysis.md) - Add README summary for personal_analyze folder This documentation provides: - Complete codebase structure overview - System architecture diagrams (ASCII art) - Sequence diagrams for authentication, RAG, chat, agent flows - Detailed analysis of API, RAG, DeepDoc, Agent, GraphRAG modules - Full tech stack with 150+ dependencies analyzed - Source code patterns and best practices analysis
2025-11-26 10:20:05 +00:00 · 2025-11-26 10:20:05 +00:00 · c7cecf9a1f
commit c7cecf9a1f
parent 2fd5ac1031
7 changed files with 4841 additions and 0 deletions
--- a/personal_analyze/01_directory_structure.md
+++ b/personal_analyze/01_directory_structure.md
@ -0,0 +1,348 @@
+# RAGFlow - Cấu Trúc Thư Mục
+
+## Tổng Quan
+
+RAGFlow (v0.22.1) là một RAG (Retrieval-Augmented Generation) engine mã nguồn mở dựa trên deep document understanding. Dự án được xây dựng với kiến trúc full-stack bao gồm Python backend và React/TypeScript frontend.
+
+## Cấu Trúc Thư Mục Chi Tiết
+
+```
+ragflow/
+│
+├── api/                          # [BACKEND] Flask API Server
+│   ├── ragflow_server.py         # Entry point chính
+│   ├── settings.py               # Cấu hình server
+│   ├── constants.py              # Hằng số API (API_VERSION = "v1")
+│   ├── validation.py             # Request validation
+│   │
+│   ├── apps/                     # Flask Blueprints - API endpoints
+│   │   ├── kb_app.py             # Knowledge Base management
+│   │   ├── document_app.py       # Document processing
+│   │   ├── dialog_app.py         # Chat/Dialog handling
+│   │   ├── canvas_app.py         # Agent workflow canvas
+│   │   ├── file_app.py           # File upload/management
+│   │   ├── chunk_app.py          # Document chunking
+│   │   ├── conversation_app.py   # Conversation management
+│   │   ├── search_app.py         # Search functionality
+│   │   ├── system_app.py         # System configuration
+│   │   ├── llm_app.py            # LLM model management
+│   │   ├── connector_app.py      # Data source connectors
+│   │   ├── mcp_server_app.py     # MCP server integration
+│   │   ├── langfuse_app.py       # Langfuse observability
+│   │   ├── api_app.py            # API key management
+│   │   ├── plugin_app.py         # Plugin management
+│   │   ├── tenant_app.py         # Multi-tenancy
+│   │   ├── user_app.py           # User management
+│   │   │
+│   │   ├── auth/                 # Authentication modules
+│   │   │   ├── oauth.py          # OAuth base
+│   │   │   ├── github.py         # GitHub OAuth
+│   │   │   └── oidc.py           # OpenID Connect
+│   │   │
+│   │   └── sdk/                  # SDK REST API endpoints
+│   │       ├── dataset.py        # Dataset API
+│   │       ├── doc.py            # Document API
+│   │       ├── chat.py           # Chat API
+│   │       ├── session.py        # Session API
+│   │       ├── files.py          # File API
+│   │       ├── agents.py         # Agent API
+│   │       └── dify_retrieval.py # Dify integration
+│   │
+│   ├── db/                       # Database layer
+│   │   ├── db_models.py          # SQLAlchemy/Peewee models (54KB)
+│   │   ├── db_utils.py           # Database utilities
+│   │   ├── init_data.py          # Initial data seeding
+│   │   ├── runtime_config.py     # Runtime configuration
+│   │   │
+│   │   ├── services/             # Business logic services
+│   │   │   ├── user_service.py           # User operations
+│   │   │   ├── dialog_service.py         # Dialog logic (37KB)
+│   │   │   ├── document_service.py       # Document processing (39KB)
+│   │   │   ├── file_service.py           # File handling (22KB)
+│   │   │   ├── knowledgebase_service.py  # KB management (21KB)
+│   │   │   ├── task_service.py           # Task queue (20KB)
+│   │   │   ├── canvas_service.py         # Canvas logic (12KB)
+│   │   │   ├── conversation_service.py   # Conversation handling
+│   │   │   ├── connector_service.py      # Connector management
+│   │   │   ├── llm_service.py            # LLM operations
+│   │   │   ├── search_service.py         # Search operations
+│   │   │   └── api_service.py            # API token service
+│   │   │
+│   │   └── joint_services/       # Cross-service operations
+│   │
+│   └── utils/                    # API utilities
+│       ├── api_utils.py          # API helpers
+│       ├── file_utils.py         # File utilities
+│       ├── crypt.py              # Encryption
+│       └── log_utils.py          # Logging
+│
+├── rag/                          # [CORE] RAG Processing Engine
+│   ├── settings.py               # RAG configuration
+│   ├── raptor.py                 # RAPTOR algorithm
+│   ├── benchmark.py              # Performance testing
+│   │
+│   ├── llm/                      # LLM Model Abstractions
+│   │   ├── chat_model.py         # Chat LLM interface
+│   │   ├── embedding_model.py    # Embedding models
+│   │   ├── rerank_model.py       # Reranking models
+│   │   ├── cv_model.py           # Computer vision
+│   │   ├── tts_model.py          # Text-to-speech
+│   │   └── sequence2txt_model.py # Sequence to text
+│   │
+│   ├── flow/                     # RAG Pipeline
+│   │   ├── pipeline.py           # Main pipeline
+│   │   ├── file.py               # File processing
+│   │   │
+│   │   ├── parser/               # Document parsing
+│   │   │   ├── parser.py
+│   │   │   └── schema.py
+│   │   │
+│   │   ├── extractor/            # Information extraction
+│   │   │   ├── extractor.py
+│   │   │   └── schema.py
+│   │   │
+│   │   ├── tokenizer/            # Text tokenization
+│   │   │   ├── tokenizer.py
+│   │   │   └── schema.py
+│   │   │
+│   │   ├── splitter/             # Document chunking
+│   │   │   ├── splitter.py
+│   │   │   └── schema.py
+│   │   │
+│   │   └── hierarchical_merger/  # Hierarchical merging
+│   │       ├── hierarchical_merger.py
+│   │       └── schema.py
+│   │
+│   ├── app/                      # RAG application logic
+│   ├── nlp/                      # NLP utilities
+│   ├── utils/                    # RAG utilities
+│   └── prompts/                  # LLM prompt templates
+│
+├── deepdoc/                      # [DOCUMENT] Deep Document Understanding
+│   ├── parser/                   # Multi-format parsers
+│   │   ├── pdf_parser.py         # PDF with layout analysis
+│   │   ├── docx_parser.py        # Word documents
+│   │   ├── ppt_parser.py         # PowerPoint
+│   │   ├── excel_parser.py       # Excel spreadsheets
+│   │   ├── html_parser.py        # HTML pages
+│   │   ├── markdown_parser.py    # Markdown files
+│   │   ├── json_parser.py        # JSON data
+│   │   ├── txt_parser.py         # Plain text
+│   │   ├── figure_parser.py      # Image/figure extraction
+│   │   │
+│   │   └── resume/               # Resume parsing
+│   │       ├── step_one.py
+│   │       └── step_two.py
+│   │
+│   └── vision/                   # Computer vision modules
+│
+├── agent/                        # [AGENT] Agentic Workflow System
+│   ├── canvas.py                 # Canvas orchestration (25KB)
+│   ├── settings.py               # Agent configuration
+│   │
+│   ├── component/                # Workflow components
+│   │   ├── begin.py              # Workflow start
+│   │   ├── llm.py                # LLM invocation
+│   │   ├── agent_with_tools.py   # Agent with tools
+│   │   ├── retrieval.py          # Document retrieval
+│   │   ├── categorize.py         # Message categorization
+│   │   ├── message.py            # Message handling
+│   │   ├── webhook.py            # Webhook triggers
+│   │   ├── iteration.py          # Loop iteration
+│   │   └── variable_assigner.py  # Variable assignment
+│   │
+│   ├── tools/                    # External tool integrations
+│   │   ├── tavily.py             # Web search
+│   │   ├── arxiv.py              # Academic papers
+│   │   ├── github.py             # GitHub API
+│   │   ├── google.py             # Google Search
+│   │   ├── wikipedia.py          # Wikipedia
+│   │   ├── email.py              # Email sending
+│   │   ├── code_exec.py          # Code execution
+│   │   └── yahoofinance.py       # Financial data
+│   │
+│   └── templates/                # Pre-built workflows
+│
+├── graphrag/                     # [GRAPH] Knowledge Graph RAG
+│   ├── entity_resolution.py      # Entity linking (12KB)
+│   ├── search.py                 # Graph search (14KB)
+│   ├── utils.py                  # Graph utilities (23KB)
+│   ├── general/                  # General graph operations
+│   └── light/                    # Lightweight implementations
+│
+├── web/                          # [FRONTEND] React/TypeScript
+│   ├── package.json              # NPM dependencies (172 packages)
+│   ├── .umirc.ts                 # UmiJS configuration
+│   ├── tailwind.config.js        # Tailwind CSS config
+│   │
+│   └── src/
+│       ├── pages/                # UmiJS page routes
+│       │   ├── admin/            # Admin dashboard
+│       │   ├── dataset/          # Knowledge base management
+│       │   ├── datasets/         # Datasets list
+│       │   ├── knowledge/        # Knowledge management
+│       │   ├── next-chats/       # Chat interface
+│       │   ├── next-searches/    # Search interface
+│       │   ├── document-viewer/  # Document preview
+│       │   ├── login/            # Authentication
+│       │   └── register/         # User registration
+│       │
+│       ├── components/           # React components
+│       │   ├── file-upload-modal/
+│       │   ├── pdf-drawer/
+│       │   ├── prompt-editor/
+│       │   ├── document-preview/
+│       │   └── ui/               # Shadcn/UI components
+│       │
+│       ├── services/             # API client services
+│       ├── hooks/                # React hooks
+│       ├── interfaces/           # TypeScript interfaces
+│       ├── utils/                # Utility functions
+│       ├── constants/            # Constants
+│       └── locales/              # i18n translations
+│
+├── common/                       # [SHARED] Common Utilities
+│   ├── settings.py               # Main configuration (11KB)
+│   ├── config_utils.py           # Config utilities
+│   ├── connection_utils.py       # Database connections
+│   ├── constants.py              # Global constants
+│   ├── exceptions.py             # Exception definitions
+│   │
+│   ├── Utilities:
+│   │   ├── log_utils.py          # Logging setup
+│   │   ├── file_utils.py         # File operations
+│   │   ├── string_utils.py       # String utilities
+│   │   ├── token_utils.py        # Token operations
+│   │   └── time_utils.py         # Time utilities
+│   │
+│   └── data_source/              # Data source connectors
+│       ├── confluence_connector.py (81KB)
+│       ├── notion_connector.py (25KB)
+│       ├── slack_connector.py (22KB)
+│       ├── gmail_connector.py
+│       ├── discord_connector.py
+│       ├── sharepoint_connector.py
+│       ├── dropbox_connector.py
+│       └── google_drive/
+│
+├── sdk/                          # [SDK] Python Client Library
+│   └── python/
+│       └── ragflow_sdk/          # SDK implementation
+│
+├── mcp/                          # [MCP] Model Context Protocol
+│   ├── server/                   # MCP server
+│   │   └── server.py
+│   └── client/                   # MCP client
+│       └── client.py
+│
+├── admin/                        # [ADMIN] Admin Interface
+│   ├── server/                   # Admin backend
+│   └── client/                   # Admin frontend
+│
+├── plugin/                       # [PLUGIN] Plugin System
+│   ├── plugin_manager.py         # Plugin management
+│   ├── llm_tool_plugin.py        # LLM tool plugins
+│   └── embedded_plugins/         # Built-in plugins
+│
+├── docker/                       # [DEPLOYMENT] Docker Configuration
+│   ├── docker-compose.yml        # Main compose file
+│   ├── docker-compose-base.yml   # Base services
+│   ├── .env                      # Environment variables
+│   ├── entrypoint.sh             # Container entry
+│   ├── service_conf.yaml.template # Service config
+│   ├── nginx/                    # Nginx configuration
+│   │   └── nginx.conf
+│   └── init.sql                  # Database init
+│
+├── conf/                         # [CONFIG] Configuration Files
+│   ├── llm_factories.json        # LLM providers
+│   ├── mapping.json              # Field mappings
+│   ├── service_conf.yaml         # Service configuration
+│   ├── private.pem               # RSA private key
+│   └── public.pem                # RSA public key
+│
+├── test/                         # [TEST] Testing Suite
+│   ├── unit_test/                # Unit tests
+│   │   └── common/               # Common utilities tests
+│   │
+│   └── testcases/                # Integration tests
+│       ├── test_http_api/        # HTTP API tests
+│       ├── test_sdk_api/         # SDK tests
+│       └── test_web_api/         # Web API tests
+│
+├── example/                      # [EXAMPLES] Usage Examples
+│   ├── http/                     # HTTP API examples
+│   └── sdk/                      # SDK examples
+│
+├── intergrations/                # [INTEGRATIONS] Third-party
+│   ├── chatgpt-on-wechat/        # WeChat integration
+│   ├── extension_chrome/         # Chrome extension
+│   └── firecrawl/                # Web scraping
+│
+├── agentic_reasoning/            # [REASONING] Advanced reasoning
+├── sandbox/                      # [SANDBOX] Code execution
+├── helm/                         # [K8S] Kubernetes Helm charts
+├── docs/                         # [DOCS] Documentation
+│
+├── pyproject.toml                # Python project config
+├── CLAUDE.md                     # Development guidelines
+└── README.md                     # Project overview
+```
+
+## Mô Tả Chi Tiết Các Thư Mục Chính
+
+### 1. `/api/` - Backend API Server
+- **Vai trò**: Xử lý tất cả HTTP requests, authentication, và business logic
+- **Framework**: Flask/Quart (async ASGI)
+- **Port mặc định**: 9380
+- **Entry point**: `ragflow_server.py`
+
+### 2. `/rag/` - RAG Processing Engine
+- **Vai trò**: Xử lý pipeline RAG từ document parsing đến retrieval
+- **Chức năng chính**:
+  - Document parsing và extraction
+  - Text tokenization
+  - Semantic chunking
+  - Embedding generation
+  - Reranking
+
+### 3. `/deepdoc/` - Document Understanding
+- **Vai trò**: Deep document parsing với layout analysis
+- **Hỗ trợ formats**: PDF, Word, PPT, Excel, HTML, Markdown, JSON, TXT
+- **Đặc biệt**: OCR và layout analysis cho PDF
+
+### 4. `/agent/` - Agentic Workflow
+- **Vai trò**: Hệ thống workflow agent với visual canvas
+- **Components**: LLM, Retrieval, Categorize, Webhook, Iteration...
+- **Tools**: Tavily, Google, Wikipedia, GitHub, Email...
+
+### 5. `/graphrag/` - Knowledge Graph
+- **Vai trò**: Xây dựng và query knowledge graph
+- **Chức năng**: Entity resolution, graph search, relationship extraction
+
+### 6. `/web/` - Frontend
+- **Framework**: React + TypeScript + UmiJS
+- **UI**: Ant Design + Shadcn/UI + Tailwind CSS
+- **State**: Zustand
+- **Port**: 80/443 (qua Nginx)
+
+### 7. `/common/` - Shared Utilities
+- **Vai trò**: Utilities và connectors dùng chung
+- **Data sources**: Confluence, Notion, Slack, Gmail, SharePoint...
+
+### 8. `/docker/` - Deployment
+- **Services**: MySQL, Elasticsearch/Infinity, Redis, MinIO, Nginx
+- **Modes**: CPU/GPU, single/cluster
+
+## Tóm Tắt Thống Kê
+
+| Thư mục | Số files | Mô tả |
+|---------|----------|-------|
+| api/ | ~100+ | Backend API |
+| rag/ | ~50+ | RAG engine |
+| deepdoc/ | ~30+ | Document parsers |
+| agent/ | ~40+ | Agent system |
+| graphrag/ | ~20+ | Knowledge graph |
+| web/src/ | ~200+ | Frontend |
+| common/ | ~50+ | Shared utilities |
+| test/ | ~80+ | Test suite |
--- a/personal_analyze/02_system_architecture.md
+++ b/personal_analyze/02_system_architecture.md
@ -0,0 +1,567 @@
+# RAGFlow - Kiến Trúc Hệ Thống
+
+## 1. Tổng Quan Kiến Trúc
+
+RAGFlow sử dụng kiến trúc **Microservices** với các thành phần được container hóa bằng Docker. Hệ thống được thiết kế theo mô hình **3-tier architecture** kết hợp với **event-driven architecture** cho xử lý bất đồng bộ.
+
+## 2. Sơ Đồ Kiến Trúc Tổng Quan
+
+```
+┌─────────────────────────────────────────────────────────────────────────────────┐
+│                              CLIENT LAYER                                        │
+├─────────────────────────────────────────────────────────────────────────────────┤
+│                                                                                  │
+│  ┌──────────────┐  ┌──────────────┐  ┌──────────────┐  ┌──────────────┐         │
+│  │   Web App    │  │  Mobile App  │  │  Python SDK  │  │  REST API    │         │
+│  │  (React/TS)  │  │   (Future)   │  │   Client     │  │   Client     │         │
+│  └──────┬───────┘  └──────┬───────┘  └──────┬───────┘  └──────┬───────┘         │
+│         │                 │                 │                 │                  │
+└─────────┼─────────────────┼─────────────────┼─────────────────┼──────────────────┘
+          │                 │                 │                 │
+          └─────────────────┴────────┬────────┴─────────────────┘
+                                     │
+                                     ▼
+┌─────────────────────────────────────────────────────────────────────────────────┐
+│                              GATEWAY LAYER                                       │
+├─────────────────────────────────────────────────────────────────────────────────┤
+│                                                                                  │
+│  ┌─────────────────────────────────────────────────────────────────────────┐    │
+│  │                           NGINX Reverse Proxy                            │    │
+│  │                     (Load Balancing, SSL Termination)                    │    │
+│  │                          Port: 80/443                                    │    │
+│  └─────────────────────────────────────┬───────────────────────────────────┘    │
+│                                        │                                         │
+└────────────────────────────────────────┼─────────────────────────────────────────┘
+                                         │
+          ┌──────────────────────────────┼──────────────────────────────┐
+          │                              │                              │
+          ▼                              ▼                              ▼
+┌─────────────────────────────────────────────────────────────────────────────────┐
+│                           APPLICATION LAYER                                      │
+├─────────────────────────────────────────────────────────────────────────────────┤
+│                                                                                  │
+│  ┌───────────────────────┐  ┌───────────────────────┐  ┌───────────────────┐    │
+│  │   RAGFlow Server      │  │   Admin Server        │  │   MCP Server      │    │
+│  │   (Flask/Quart)       │  │   (Flask)             │  │   (MCP Protocol)  │    │
+│  │   Port: 9380          │  │   Port: 9381          │  │   Port: 9382      │    │
+│  │                       │  │                       │  │                   │    │
+│  │  ┌─────────────────┐  │  │  ┌─────────────────┐  │  │  ┌─────────────┐  │    │
+│  │  │  API Blueprints │  │  │  │  Admin APIs     │  │  │  │ MCP Handler │  │    │
+│  │  │  - kb_app       │  │  │  │  - User Mgmt    │  │  │  │ - Tools     │  │    │
+│  │  │  - document_app │  │  │  │  - System Cfg   │  │  │  │ - Resources │  │    │
+│  │  │  - dialog_app   │  │  │  │  - Monitoring   │  │  │  └─────────────┘  │    │
+│  │  │  - canvas_app   │  │  │  └─────────────────┘  │  │                   │    │
+│  │  │  - search_app   │  │  │                       │  │                   │    │
+│  │  │  - file_app     │  │  │                       │  │                   │    │
+│  │  └─────────────────┘  │  │                       │  │                   │    │
+│  └───────────┬───────────┘  └───────────────────────┘  └───────────────────┘    │
+│              │                                                                   │
+└──────────────┼───────────────────────────────────────────────────────────────────┘
+               │
+               ▼
+┌─────────────────────────────────────────────────────────────────────────────────┐
+│                            SERVICE LAYER                                         │
+├─────────────────────────────────────────────────────────────────────────────────┤
+│                                                                                  │
+│  ┌─────────────────┐  ┌─────────────────┐  ┌─────────────────┐                  │
+│  │ Business Logic  │  │  RAG Pipeline   │  │  Agent System   │                  │
+│  │   Services      │  │    Engine       │  │    Engine       │                  │
+│  │                 │  │                 │  │                 │                  │
+│  │ - UserService   │  │ - Parser        │  │ - Canvas        │                  │
+│  │ - DialogService │  │ - Tokenizer     │  │ - Components    │                  │
+│  │ - DocService    │  │ - Splitter      │  │ - Tools         │                  │
+│  │ - KBService     │  │ - Embedder      │  │ - Workflows     │                  │
+│  │ - TaskService   │  │ - Reranker      │  │                 │                  │
+│  └────────┬────────┘  └────────┬────────┘  └────────┬────────┘                  │
+│           │                    │                    │                            │
+│           └────────────────────┼────────────────────┘                            │
+│                                │                                                 │
+│  ┌─────────────────────────────┼─────────────────────────────────────────────┐  │
+│  │                    DeepDoc Processing Engine                               │  │
+│  │  ┌──────────┐ ┌──────────┐ ┌──────────┐ ┌──────────┐ ┌──────────┐        │  │
+│  │  │   PDF    │ │   DOCX   │ │   PPT    │ │  Excel   │ │   HTML   │        │  │
+│  │  │  Parser  │ │  Parser  │ │  Parser  │ │  Parser  │ │  Parser  │        │  │
+│  │  └──────────┘ └──────────┘ └──────────┘ └──────────┘ └──────────┘        │  │
+│  │                                                                           │  │
+│  │  ┌──────────────────────────────────────────────────────────────┐        │  │
+│  │  │              Vision/OCR Processing (Layout Analysis)          │        │  │
+│  │  └──────────────────────────────────────────────────────────────┘        │  │
+│  └───────────────────────────────────────────────────────────────────────────┘  │
+│                                                                                  │
+└──────────────────────────────────────────────────────────────────────────────────┘
+                                         │
+                                         ▼
+┌─────────────────────────────────────────────────────────────────────────────────┐
+│                              DATA LAYER                                          │
+├─────────────────────────────────────────────────────────────────────────────────┤
+│                                                                                  │
+│  ┌─────────────────┐  ┌─────────────────┐  ┌─────────────────┐                  │
+│  │     MySQL       │  │   Redis/Valkey  │  │     MinIO       │                  │
+│  │   (Primary DB)  │  │    (Cache)      │  │ (Object Store)  │                  │
+│  │   Port: 5455    │  │   Port: 6379    │  │ Port: 9000/9001 │                  │
+│  │                 │  │                 │  │                 │                  │
+│  │ - Users         │  │ - Sessions      │  │ - Documents     │                  │
+│  │ - Tenants       │  │ - Cache         │  │ - Files         │                  │
+│  │ - Knowledgebase │  │ - Rate Limit    │  │ - Chunks        │                  │
+│  │ - Documents     │  │ - Task Queue    │  │ - Images        │                  │
+│  │ - Dialogs       │  │                 │  │                 │                  │
+│  └─────────────────┘  └─────────────────┘  └─────────────────┘                  │
+│                                                                                  │
+│  ┌──────────────────────────────────────────────────────────────────────────┐   │
+│  │                        Vector Database Layer                              │   │
+│  │  ┌─────────────────┐  ┌─────────────────┐  ┌─────────────────┐           │   │
+│  │  │  Elasticsearch  │  │    Infinity     │  │   OpenSearch    │           │   │
+│  │  │    (Default)    │  │  (Alternative)  │  │  (Alternative)  │           │   │
+│  │  │                 │  │                 │  │                 │           │   │
+│  │  │ - Vector Search │  │ - Hybrid Search │  │ - Vector Search │           │   │
+│  │  │ - Full-text     │  │ - Full-text     │  │ - Full-text     │           │   │
+│  │  │ - BM25          │  │ - BM25          │  │ - BM25          │           │   │
+│  │  └─────────────────┘  └─────────────────┘  └─────────────────┘           │   │
+│  └──────────────────────────────────────────────────────────────────────────┘   │
+│                                                                                  │
+└──────────────────────────────────────────────────────────────────────────────────┘
+                                         │
+                                         ▼
+┌─────────────────────────────────────────────────────────────────────────────────┐
+│                            EXTERNAL SERVICES                                     │
+├─────────────────────────────────────────────────────────────────────────────────┤
+│                                                                                  │
+│  ┌─────────────────────────────────────────────────────────────────────────┐    │
+│  │                          LLM Providers                                   │    │
+│  │  ┌────────┐ ┌────────┐ ┌────────┐ ┌────────┐ ┌────────┐ ┌────────┐     │    │
+│  │  │ OpenAI │ │ Claude │ │ Gemini │ │  Qwen  │ │  Groq  │ │ Ollama │     │    │
+│  │  └────────┘ └────────┘ └────────┘ └────────┘ └────────┘ └────────┘     │    │
+│  └─────────────────────────────────────────────────────────────────────────┘    │
+│                                                                                  │
+│  ┌─────────────────────────────────────────────────────────────────────────┐    │
+│  │                       Data Source Connectors                             │    │
+│  │  ┌──────────┐ ┌──────────┐ ┌──────────┐ ┌──────────┐ ┌──────────┐      │    │
+│  │  │Confluence│ │  Notion  │ │  Slack   │ │  Gmail   │ │SharePoint│      │    │
+│  │  └──────────┘ └──────────┘ └──────────┘ └──────────┘ └──────────┘      │    │
+│  └─────────────────────────────────────────────────────────────────────────┘    │
+│                                                                                  │
+│  ┌─────────────────────────────────────────────────────────────────────────┐    │
+│  │                        Agent Tools & APIs                                │    │
+│  │  ┌────────┐ ┌─────────┐ ┌────────┐ ┌────────┐ ┌─────────┐ ┌────────┐   │    │
+│  │  │ Tavily │ │ Google  │ │ ArXiv  │ │ GitHub │ │Wikipedia│ │ Weather│   │    │
+│  │  └────────┘ └─────────┘ └────────┘ └────────┘ └─────────┘ └────────┘   │    │
+│  └─────────────────────────────────────────────────────────────────────────┘    │
+│                                                                                  │
+└──────────────────────────────────────────────────────────────────────────────────┘
+```
+
+## 3. Kiến Trúc Chi Tiết Các Thành Phần
+
+### 3.1 API Server Architecture
+
+```
+┌──────────────────────────────────────────────────────────────┐
+│                     RAGFlow API Server                        │
+│                    (ragflow_server.py)                        │
+├──────────────────────────────────────────────────────────────┤
+│                                                               │
+│  ┌─────────────────────────────────────────────────────────┐ │
+│  │                  Flask/Quart Application                 │ │
+│  │  ┌─────────────┐  ┌─────────────┐  ┌─────────────────┐  │ │
+│  │  │    CORS     │  │   Session   │  │   JWT Auth      │  │ │
+│  │  │  Middleware │  │  Middleware │  │   Middleware    │  │ │
+│  │  └─────────────┘  └─────────────┘  └─────────────────┘  │ │
+│  └─────────────────────────────────────────────────────────┘ │
+│                              │                                │
+│  ┌───────────────────────────┼───────────────────────────────┐
+│  │                    API Blueprints                         │
+│  │  ┌──────────┐ ┌──────────┐ ┌──────────┐ ┌──────────┐    │
+│  │  │  kb_app  │ │ doc_app  │ │dialog_app│ │canvas_app│    │
+│  │  └────┬─────┘ └────┬─────┘ └────┬─────┘ └────┬─────┘    │
+│  │       │            │            │            │           │
+│  │  ┌──────────┐ ┌──────────┐ ┌──────────┐ ┌──────────┐    │
+│  │  │file_app  │ │search_app│ │ llm_app  │ │ user_app │    │
+│  │  └────┬─────┘ └────┬─────┘ └────┬─────┘ └────┬─────┘    │
+│  └───────┼────────────┼────────────┼────────────┼───────────┘
+│          │            │            │            │            │
+│  ┌───────┴────────────┴────────────┴────────────┴───────────┐
+│  │                    Service Layer                          │
+│  │  ┌────────────────┐  ┌────────────────┐                  │
+│  │  │ UserService    │  │ DialogService  │                  │
+│  │  │ - register()   │  │ - chat()       │                  │
+│  │  │ - login()      │  │ - stream()     │                  │
+│  │  │ - get_user()   │  │ - completion() │                  │
+│  │  └────────────────┘  └────────────────┘                  │
+│  │                                                          │
+│  │  ┌────────────────┐  ┌────────────────┐                  │
+│  │  │ DocumentService│  │ KBService      │                  │
+│  │  │ - upload()     │  │ - create()     │                  │
+│  │  │ - parse()      │  │ - list()       │                  │
+│  │  │ - chunk()      │  │ - delete()     │                  │
+│  │  └────────────────┘  └────────────────┘                  │
+│  └──────────────────────────────────────────────────────────┘
+│                              │                                │
+│  ┌───────────────────────────┴───────────────────────────────┐
+│  │                    Database Layer (Peewee ORM)            │
+│  │  ┌─────────┐ ┌─────────┐ ┌─────────┐ ┌─────────┐        │
+│  │  │  User   │ │ Tenant  │ │Document │ │  Dialog │        │
+│  │  │  Model  │ │  Model  │ │  Model  │ │  Model  │        │
+│  │  └─────────┘ └─────────┘ └─────────┘ └─────────┘        │
+│  └──────────────────────────────────────────────────────────┘
+│                                                               │
+└──────────────────────────────────────────────────────────────┘
+```
+
+### 3.2 RAG Pipeline Architecture
+
+```
+┌──────────────────────────────────────────────────────────────────────────┐
+│                         RAG Processing Pipeline                           │
+├──────────────────────────────────────────────────────────────────────────┤
+│                                                                           │
+│  ┌─────────────────────────────────────────────────────────────────────┐ │
+│  │                         INGESTION PIPELINE                          │ │
+│  │                                                                     │ │
+│  │   ┌──────────┐    ┌──────────┐    ┌──────────┐    ┌──────────┐    │ │
+│  │   │  File    │───▶│  Parser  │───▶│Tokenizer │───▶│ Splitter │    │ │
+│  │   │  Upload  │    │          │    │          │    │ (Chunker)│    │ │
+│  │   └──────────┘    └──────────┘    └──────────┘    └────┬─────┘    │ │
+│  │                                                        │          │ │
+│  │   ┌──────────────────────────────────────────────────┘          │ │
+│  │   │                                                               │ │
+│  │   ▼                                                               │ │
+│  │   ┌──────────┐    ┌──────────┐    ┌──────────┐                   │ │
+│  │   │ Embedding│───▶│  Index   │───▶│  Store   │                   │ │
+│  │   │  Model   │    │ Creation │    │ (ES/Inf) │                   │ │
+│  │   └──────────┘    └──────────┘    └──────────┘                   │ │
+│  │                                                                   │ │
+│  └───────────────────────────────────────────────────────────────────┘ │
+│                                                                         │
+│  ┌─────────────────────────────────────────────────────────────────────┐ │
+│  │                         RETRIEVAL PIPELINE                          │ │
+│  │                                                                     │ │
+│  │   ┌──────────┐    ┌──────────┐    ┌──────────┐    ┌──────────┐    │ │
+│  │   │  Query   │───▶│  Query   │───▶│ Embedding│───▶│  Hybrid  │    │ │
+│  │   │  Input   │    │ Analysis │    │  Query   │    │  Search  │    │ │
+│  │   └──────────┘    └──────────┘    └──────────┘    └────┬─────┘    │ │
+│  │                                                        │          │ │
+│  │   ┌──────────────────────────────────────────────────┘          │ │
+│  │   │                                                               │ │
+│  │   ▼                                                               │ │
+│  │   ┌──────────┐    ┌──────────┐    ┌──────────┐    ┌──────────┐   │ │
+│  │   │ Candidate│───▶│ Reranker │───▶│  Context │───▶│   LLM    │   │ │
+│  │   │  Chunks  │    │          │    │ Building │    │ Response │   │ │
+│  │   └──────────┘    └──────────┘    └──────────┘    └──────────┘   │ │
+│  │                                                                   │ │
+│  └───────────────────────────────────────────────────────────────────┘ │
+│                                                                         │
+└─────────────────────────────────────────────────────────────────────────┘
+```
+
+### 3.3 Agent Workflow Architecture
+
+```
+┌──────────────────────────────────────────────────────────────────────────┐
+│                        Agent Canvas Architecture                          │
+├──────────────────────────────────────────────────────────────────────────┤
+│                                                                           │
+│  ┌─────────────────────────────────────────────────────────────────────┐ │
+│  │                         Canvas Orchestrator                         │ │
+│  │                          (canvas.py)                                │ │
+│  └──────────────────────────────┬──────────────────────────────────────┘ │
+│                                 │                                         │
+│     ┌───────────────────────────┼───────────────────────────┐            │
+│     │                           │                           │            │
+│     ▼                           ▼                           ▼            │
+│  ┌─────────┐              ┌─────────┐              ┌─────────┐          │
+│  │  BEGIN  │─────────────▶│   LLM   │─────────────▶│RETRIEVAL│          │
+│  │Component│              │Component│              │Component│          │
+│  └─────────┘              └─────────┘              └─────────┘          │
+│       │                        │                        │                │
+│       │    ┌───────────────────┼───────────────────────┘                │
+│       │    │                   │                                         │
+│       ▼    ▼                   ▼                                         │
+│  ┌─────────────┐         ┌─────────┐         ┌─────────────┐            │
+│  │ CATEGORIZE  │         │ MESSAGE │         │  WEBHOOK    │            │
+│  │  Component  │         │Component│         │  Component  │            │
+│  └─────────────┘         └─────────┘         └─────────────┘            │
+│       │                        │                    │                    │
+│       └────────────────────────┼────────────────────┘                    │
+│                                │                                         │
+│                                ▼                                         │
+│  ┌─────────────────────────────────────────────────────────────────────┐ │
+│  │                         TOOLS INTEGRATION                           │ │
+│  │  ┌────────┐ ┌────────┐ ┌────────┐ ┌────────┐ ┌────────┐            │ │
+│  │  │ Tavily │ │ ArXiv  │ │ GitHub │ │ Email  │ │Code    │            │ │
+│  │  │ Search │ │ Search │ │  API   │ │ Send   │ │Executor│            │ │
+│  │  └────────┘ └────────┘ └────────┘ └────────┘ └────────┘            │ │
+│  └─────────────────────────────────────────────────────────────────────┘ │
+│                                                                           │
+└───────────────────────────────────────────────────────────────────────────┘
+```
+
+## 4. Data Flow Architecture
+
+### 4.1 Document Ingestion Flow
+
+```
+┌────────────────────────────────────────────────────────────────────────────┐
+│                         Document Ingestion Flow                             │
+├────────────────────────────────────────────────────────────────────────────┤
+│                                                                             │
+│  User Upload                                                                │
+│      │                                                                      │
+│      ▼                                                                      │
+│  ┌──────────┐     ┌──────────┐     ┌──────────┐     ┌──────────┐          │
+│  │  API     │────▶│  File    │────▶│  MinIO   │────▶│  Task    │          │
+│  │ Endpoint │     │ Service  │     │  Storage │     │  Queue   │          │
+│  └──────────┘     └──────────┘     └──────────┘     └────┬─────┘          │
+│                                                          │                 │
+│                        ┌────────────────────────────────┘                 │
+│                        │                                                   │
+│                        ▼                                                   │
+│  ┌──────────────────────────────────────────────────────────────────┐     │
+│  │                    Background Task Processor                      │     │
+│  │                                                                   │     │
+│  │   ┌──────────┐    ┌──────────┐    ┌──────────┐    ┌──────────┐  │     │
+│  │   │  Parser  │───▶│Extractor │───▶│ Chunker  │───▶│ Embedder │  │     │
+│  │   │          │    │          │    │          │    │          │  │     │
+│  │   │ - PDF    │    │ - Text   │    │ - Token  │    │ - OpenAI │  │     │
+│  │   │ - DOCX   │    │ - Table  │    │ - Sent   │    │ - BGE    │  │     │
+│  │   │ - HTML   │    │ - Image  │    │ - Page   │    │ - Cohere │  │     │
+│  │   └──────────┘    └──────────┘    └──────────┘    └────┬─────┘  │     │
+│  │                                                        │        │     │
+│  └────────────────────────────────────────────────────────┼────────┘     │
+│                                                           │               │
+│                        ┌─────────────────────────────────┘               │
+│                        │                                                  │
+│                        ▼                                                  │
+│  ┌──────────────────────────────────────────────────────────────────┐    │
+│  │                      Storage Layer                                │    │
+│  │   ┌──────────────┐    ┌──────────────┐    ┌──────────────┐       │    │
+│  │   │    MySQL     │    │ Elasticsearch│    │    MinIO     │       │    │
+│  │   │  (Metadata)  │    │  (Vectors)   │    │   (Files)    │       │    │
+│  │   └──────────────┘    └──────────────┘    └──────────────┘       │    │
+│  └──────────────────────────────────────────────────────────────────┘    │
+│                                                                           │
+└───────────────────────────────────────────────────────────────────────────┘
+```
+
+### 4.2 Query Processing Flow
+
+```
+┌────────────────────────────────────────────────────────────────────────────┐
+│                          Query Processing Flow                              │
+├────────────────────────────────────────────────────────────────────────────┤
+│                                                                             │
+│  User Query: "What is the revenue for Q3 2024?"                            │
+│      │                                                                      │
+│      ▼                                                                      │
+│  ┌──────────────────────────────────────────────────────────────────────┐  │
+│  │  1. QUERY UNDERSTANDING                                               │  │
+│  │     ┌──────────────┐                                                  │  │
+│  │     │ Query Parser │──▶ Extract: entities, intent, keywords          │  │
+│  │     └──────────────┘                                                  │  │
+│  └──────────────────────────────────────────────────────────────────────┘  │
+│      │                                                                      │
+│      ▼                                                                      │
+│  ┌──────────────────────────────────────────────────────────────────────┐  │
+│  │  2. RETRIEVAL                                                         │  │
+│  │     ┌────────────┐    ┌────────────┐    ┌────────────┐               │  │
+│  │     │  Embedding │───▶│  Hybrid    │───▶│  Candidate │               │  │
+│  │     │   Query    │    │  Search    │    │   Chunks   │               │  │
+│  │     └────────────┘    │            │    │  (Top 100) │               │  │
+│  │                       │ Vector+BM25│    └────────────┘               │  │
+│  │                       └────────────┘                                  │  │
+│  └──────────────────────────────────────────────────────────────────────┘  │
+│      │                                                                      │
+│      ▼                                                                      │
+│  ┌──────────────────────────────────────────────────────────────────────┐  │
+│  │  3. RERANKING                                                         │  │
+│  │     ┌────────────┐    ┌────────────┐                                 │  │
+│  │     │  Reranker  │───▶│  Top-K     │                                 │  │
+│  │     │   Model    │    │  Chunks    │                                 │  │
+│  │     │            │    │  (Top 5)   │                                 │  │
+│  │     └────────────┘    └────────────┘                                 │  │
+│  └──────────────────────────────────────────────────────────────────────┘  │
+│      │                                                                      │
+│      ▼                                                                      │
+│  ┌──────────────────────────────────────────────────────────────────────┐  │
+│  │  4. GENERATION                                                        │  │
+│  │     ┌────────────┐    ┌────────────┐    ┌────────────┐               │  │
+│  │     │  Prompt    │───▶│    LLM     │───▶│  Response  │               │  │
+│  │     │  Builder   │    │  (GPT-4)   │    │ + Sources  │               │  │
+│  │     └────────────┘    └────────────┘    └────────────┘               │  │
+│  └──────────────────────────────────────────────────────────────────────┘  │
+│      │                                                                      │
+│      ▼                                                                      │
+│  Response: "The revenue for Q3 2024 was $X million... [source: doc.pdf]"   │
+│                                                                             │
+└─────────────────────────────────────────────────────────────────────────────┘
+```
+
+## 5. Deployment Architecture
+
+### 5.1 Docker Compose Deployment
+
+```
+┌──────────────────────────────────────────────────────────────────────────────┐
+│                        Docker Compose Deployment                              │
+├──────────────────────────────────────────────────────────────────────────────┤
+│                                                                               │
+│  ┌─────────────────────────────────────────────────────────────────────────┐ │
+│  │                           Docker Network                                 │ │
+│  │                          (ragflow-network)                               │ │
+│  └─────────────────────────────────────────────────────────────────────────┘ │
+│                                      │                                        │
+│      ┌───────────────────────────────┼───────────────────────────────┐       │
+│      │                               │                               │       │
+│      ▼                               ▼                               ▼       │
+│  ┌──────────┐                 ┌──────────┐                    ┌──────────┐   │
+│  │  nginx   │ ◀──────────────▶│ ragflow- │◀──────────────────▶│ ragflow- │   │
+│  │  :80/443 │                 │  server  │                    │  admin   │   │
+│  └──────────┘                 │  :9380   │                    │  :9381   │   │
+│       │                       └────┬─────┘                    └──────────┘   │
+│       │                            │                                          │
+│       │         ┌──────────────────┼──────────────────────┐                  │
+│       │         │                  │                      │                  │
+│       │         ▼                  ▼                      ▼                  │
+│       │    ┌──────────┐      ┌──────────┐          ┌──────────┐             │
+│       │    │  mysql   │      │  redis   │          │elasticsearch│            │
+│       │    │  :5455   │      │  :6379   │          │  :9200     │            │
+│       │    └──────────┘      └──────────┘          └──────────┘             │
+│       │                                                                      │
+│       │    ┌──────────┐      ┌──────────┐          ┌──────────┐             │
+│       │    │  minio   │      │ sandbox  │          │   tei    │             │
+│       │    │:9000/9001│      │  :9385   │          │  :6380   │             │
+│       │    └──────────┘      └──────────┘          └──────────┘             │
+│       │                                                                      │
+│  ┌────┴─────────────────────────────────────────────────────────────────┐   │
+│  │                        Volumes                                        │   │
+│  │  ┌────────────┐ ┌────────────┐ ┌────────────┐ ┌────────────┐        │   │
+│  │  │mysql_data  │ │ es_data    │ │minio_data  │ │ redis_data │        │   │
+│  │  └────────────┘ └────────────┘ └────────────┘ └────────────┘        │   │
+│  └──────────────────────────────────────────────────────────────────────┘   │
+│                                                                               │
+└───────────────────────────────────────────────────────────────────────────────┘
+```
+
+## 6. Security Architecture
+
+```
+┌──────────────────────────────────────────────────────────────────────────────┐
+│                           Security Architecture                               │
+├──────────────────────────────────────────────────────────────────────────────┤
+│                                                                               │
+│  ┌─────────────────────────────────────────────────────────────────────────┐ │
+│  │                      Authentication Layer                                │ │
+│  │                                                                         │ │
+│  │   ┌─────────────┐    ┌─────────────┐    ┌─────────────┐                │ │
+│  │   │    JWT      │    │   OAuth     │    │    API      │                │ │
+│  │   │   Tokens    │    │  (GitHub,   │    │   Tokens    │                │ │
+│  │   │             │    │   OIDC)     │    │             │                │ │
+│  │   └─────────────┘    └─────────────┘    └─────────────┘                │ │
+│  │                                                                         │ │
+│  └─────────────────────────────────────────────────────────────────────────┘ │
+│                                      │                                        │
+│                                      ▼                                        │
+│  ┌─────────────────────────────────────────────────────────────────────────┐ │
+│  │                      Authorization Layer                                 │ │
+│  │                                                                         │ │
+│  │   ┌─────────────────────────────────────────────────────────────────┐  │ │
+│  │   │                   Multi-Tenancy Model                            │  │ │
+│  │   │                                                                  │  │ │
+│  │   │   Tenant A          Tenant B          Tenant C                   │  │ │
+│  │   │   ┌──────┐          ┌──────┐          ┌──────┐                  │  │ │
+│  │   │   │Users │          │Users │          │Users │                  │  │ │
+│  │   │   │KBs   │          │KBs   │          │KBs   │                  │  │ │
+│  │   │   │Docs  │          │Docs  │          │Docs  │                  │  │ │
+│  │   │   └──────┘          └──────┘          └──────┘                  │  │ │
+│  │   │                                                                  │  │ │
+│  │   └─────────────────────────────────────────────────────────────────┘  │ │
+│  │                                                                         │ │
+│  │   ┌─────────────┐    ┌─────────────┐    ┌─────────────┐                │ │
+│  │   │  Role-Based │    │  Team       │    │  Resource   │                │ │
+│  │   │  Access     │    │  Permissions│    │  Ownership  │                │ │
+│  │   │  Control    │    │             │    │             │                │ │
+│  │   └─────────────┘    └─────────────┘    └─────────────┘                │ │
+│  │                                                                         │ │
+│  └─────────────────────────────────────────────────────────────────────────┘ │
+│                                      │                                        │
+│                                      ▼                                        │
+│  ┌─────────────────────────────────────────────────────────────────────────┐ │
+│  │                        Encryption Layer                                  │ │
+│  │                                                                         │ │
+│  │   ┌─────────────┐    ┌─────────────┐    ┌─────────────┐                │ │
+│  │   │   RSA       │    │   HTTPS     │    │  Password   │                │ │
+│  │   │ Key Pair    │    │   (TLS)     │    │   Bcrypt    │                │ │
+│  │   │ (conf/*.pem)│    │             │    │             │                │ │
+│  │   └─────────────┘    └─────────────┘    └─────────────┘                │ │
+│  │                                                                         │ │
+│  └─────────────────────────────────────────────────────────────────────────┘ │
+│                                                                               │
+└───────────────────────────────────────────────────────────────────────────────┘
+```
+
+## 7. Scalability Architecture
+
+```
+┌──────────────────────────────────────────────────────────────────────────────┐
+│                         Scalability Architecture                              │
+├──────────────────────────────────────────────────────────────────────────────┤
+│                                                                               │
+│  ┌─────────────────────────────────────────────────────────────────────────┐ │
+│  │                    Horizontal Scaling                                    │ │
+│  │                                                                         │ │
+│  │                      Load Balancer (Nginx)                              │ │
+│  │                             │                                           │ │
+│  │          ┌──────────────────┼──────────────────┐                       │ │
+│  │          │                  │                  │                       │ │
+│  │          ▼                  ▼                  ▼                       │ │
+│  │   ┌──────────────┐   ┌──────────────┐   ┌──────────────┐              │ │
+│  │   │  Server #1   │   │  Server #2   │   │  Server #N   │              │ │
+│  │   │  (Instance)  │   │  (Instance)  │   │  (Instance)  │              │ │
+│  │   └──────────────┘   └──────────────┘   └──────────────┘              │ │
+│  │                                                                         │ │
+│  └─────────────────────────────────────────────────────────────────────────┘ │
+│                                                                               │
+│  ┌─────────────────────────────────────────────────────────────────────────┐ │
+│  │                    Database Scaling                                      │ │
+│  │                                                                         │ │
+│  │   MySQL:              Elasticsearch:          Redis:                    │ │
+│  │   - Read Replicas     - Cluster Mode          - Sentinel               │ │
+│  │   - Connection Pool   - Sharding              - Cluster Mode           │ │
+│  │                       - Index Partitioning                              │ │
+│  │                                                                         │ │
+│  └─────────────────────────────────────────────────────────────────────────┘ │
+│                                                                               │
+│  ┌─────────────────────────────────────────────────────────────────────────┐ │
+│  │                    Async Processing                                      │ │
+│  │                                                                         │ │
+│  │   ┌─────────────┐    ┌─────────────┐    ┌─────────────┐                │ │
+│  │   │   Task      │───▶│   Redis     │───▶│   Worker    │                │ │
+│  │   │   Producer  │    │   Queue     │    │   Consumer  │                │ │
+│  │   └─────────────┘    └─────────────┘    └─────────────┘                │ │
+│  │                                                                         │ │
+│  │   Tasks: Document parsing, Embedding, Indexing                          │ │
+│  │                                                                         │ │
+│  └─────────────────────────────────────────────────────────────────────────┘ │
+│                                                                               │
+└───────────────────────────────────────────────────────────────────────────────┘
+```
+
+## 8. Tóm Tắt Kiến Trúc
+
+| Layer | Components | Technology |
+|-------|------------|------------|
+| Client | Web App, SDK, API | React, Python SDK, REST |
+| Gateway | Reverse Proxy | Nginx |
+| Application | API Server, Admin | Flask/Quart |
+| Service | Business Logic | Python Services |
+| Processing | RAG, DeepDoc, Agent | Python, ML Models |
+| Data | Storage, Cache, Vector | MySQL, Redis, ES, MinIO |
+| External | LLM, Connectors, Tools | OpenAI, Claude, APIs |
+
+### Đặc Điểm Nổi Bật
+
+1. **Microservices**: Các service độc lập, dễ scale
+2. **Event-Driven**: Xử lý async cho document processing
+3. **Multi-Tenant**: Hỗ trợ nhiều tenants với data isolation
+4. **Hybrid Search**: Kết hợp vector search và full-text search
+5. **Pluggable**: Hỗ trợ multiple LLM providers và vector stores
+6. **Containerized**: Full Docker deployment với orchestration
--- a/personal_analyze/03_sequence_diagrams.md
+++ b/personal_analyze/03_sequence_diagrams.md
@ -0,0 +1,700 @@
+# RAGFlow - Sequence Diagrams
+
+Tài liệu này mô tả các luồng xử lý chính trong hệ thống RAGFlow thông qua sequence diagrams.
+
+## 1. User Authentication Flow
+
+### 1.1 User Registration
+
+```mermaid
+sequenceDiagram
+    participant U as User
+    participant W as Web Frontend
+    participant A as API Server
+    participant DB as MySQL
+    participant R as Redis
+
+    U->>W: Click Register
+    W->>W: Show registration form
+    U->>W: Enter email, password, nickname
+    W->>A: POST /api/v1/user/register
+
+    A->>A: Validate input data
+    A->>DB: Check if email exists
+
+    alt Email exists
+        DB-->>A: User found
+        A-->>W: 400 - Email already registered
+        W-->>U: Show error message
+    else Email not exists
+        DB-->>A: No user found
+        A->>A: Hash password (bcrypt)
+        A->>A: Generate user ID
+        A->>DB: INSERT User
+        A->>DB: CREATE Tenant for user
+        A->>DB: CREATE UserTenant association
+        DB-->>A: Success
+        A->>A: Generate JWT token
+        A->>R: Store session
+        A-->>W: 200 - Registration success + token
+        W->>W: Store token in localStorage
+        W-->>U: Redirect to dashboard
+    end
+```
+
+### 1.2 User Login
+
+```mermaid
+sequenceDiagram
+    participant U as User
+    participant W as Web Frontend
+    participant A as API Server
+    participant DB as MySQL
+    participant R as Redis
+
+    U->>W: Enter email/password
+    W->>A: POST /api/v1/user/login
+
+    A->>DB: SELECT User WHERE email
+
+    alt User not found
+        DB-->>A: No user
+        A-->>W: 401 - Invalid credentials
+        W-->>U: Show error
+    else User found
+        DB-->>A: User record
+        A->>A: Verify password (bcrypt)
+
+        alt Password invalid
+            A-->>W: 401 - Invalid credentials
+            W-->>U: Show error
+        else Password valid
+            A->>A: Generate JWT (access_token)
+            A->>A: Generate refresh_token
+            A->>R: Store session data
+            A->>DB: Update last_login_time
+            A-->>W: 200 - Login success
+            Note over A,W: Response: {access_token, refresh_token, user_info}
+            W->>W: Store tokens
+            W-->>U: Redirect to dashboard
+        end
+    end
+```
+
+## 2. Knowledge Base Management
+
+### 2.1 Create Knowledge Base
+
+```mermaid
+sequenceDiagram
+    participant U as User
+    participant W as Web Frontend
+    participant A as API Server
+    participant DB as MySQL
+    participant ES as Elasticsearch
+
+    U->>W: Click "Create Knowledge Base"
+    W->>W: Show KB creation modal
+    U->>W: Enter name, description, settings
+    W->>A: POST /api/v1/kb/create
+    Note over W,A: Headers: Authorization: Bearer {token}
+
+    A->>A: Validate JWT token
+    A->>A: Extract tenant_id from token
+    A->>DB: Check KB name uniqueness in tenant
+
+    alt Name exists
+        A-->>W: 400 - Name already exists
+        W-->>U: Show error
+    else Name unique
+        A->>A: Generate KB ID
+        A->>DB: INSERT Knowledgebase
+        Note over A,DB: {id, name, tenant_id, embd_id, parser_id, ...}
+
+        A->>ES: CREATE Index for KB
+        Note over A,ES: Index: ragflow_{kb_id}
+        ES-->>A: Index created
+
+        DB-->>A: KB record saved
+        A-->>W: 200 - KB created
+        Note over A,W: {kb_id, name, created_at}
+        W-->>U: Show success, refresh KB list
+    end
+```
+
+### 2.2 List Knowledge Bases
+
+```mermaid
+sequenceDiagram
+    participant U as User
+    participant W as Web Frontend
+    participant A as API Server
+    participant DB as MySQL
+
+    U->>W: Open Knowledge Base page
+    W->>A: GET /api/v1/kb/list?page=1&size=10
+
+    A->>A: Validate JWT, extract tenant_id
+    A->>DB: SELECT * FROM knowledgebase WHERE tenant_id
+    A->>DB: COUNT total KBs
+
+    DB-->>A: KB list + count
+
+    loop For each KB
+        A->>DB: COUNT documents in KB
+        A->>DB: SUM chunk_num for KB
+    end
+
+    A->>A: Build response with stats
+    A-->>W: 200 - KB list with pagination
+    Note over A,W: {data: [...], total, page, size}
+
+    W->>W: Render KB cards
+    W-->>U: Display knowledge bases
+```
+
+## 3. Document Upload & Processing
+
+### 3.1 Document Upload Flow
+
+```mermaid
+sequenceDiagram
+    participant U as User
+    participant W as Web Frontend
+    participant A as API Server
+    participant M as MinIO
+    participant DB as MySQL
+    participant Q as Task Queue (Redis)
+
+    U->>W: Select files to upload
+    W->>W: Validate file types/sizes
+
+    loop For each file
+        W->>A: POST /api/v1/document/upload
+        Note over W,A: multipart/form-data: file, kb_id
+
+        A->>A: Validate file type
+        A->>A: Generate file_id, doc_id
+
+        A->>M: Upload file to bucket
+        Note over A,M: Bucket: ragflow, Key: {tenant_id}/{kb_id}/{file_id}
+        M-->>A: Upload success, file_key
+
+        A->>DB: INSERT File record
+        Note over A,DB: {id, name, size, location, tenant_id}
+
+        A->>DB: INSERT Document record
+        Note over A,DB: {id, kb_id, name, status: 'UNSTART'}
+
+        A->>Q: PUSH parsing task
+        Note over A,Q: {doc_id, file_location, parser_config}
+
+        A-->>W: 200 - Upload success
+        Note over A,W: {doc_id, file_id, status}
+    end
+
+    W-->>U: Show upload progress/success
+```
+
+### 3.2 Document Parsing Flow (Background Task)
+
+```mermaid
+sequenceDiagram
+    participant Q as Task Queue
+    participant W as Worker
+    participant M as MinIO
+    participant P as Parser (DeepDoc)
+    participant E as Embedding Model
+    participant ES as Elasticsearch
+    participant DB as MySQL
+
+    Q->>W: POP task from queue
+    W->>DB: UPDATE doc status = 'RUNNING'
+
+    W->>M: Download file
+    M-->>W: File content
+
+    W->>P: Parse document
+    Note over W,P: Based on file type (PDF, DOCX, etc.)
+
+    P->>P: Extract text content
+    P->>P: Extract tables
+    P->>P: Extract images (if any)
+    P->>P: Layout analysis (for PDF)
+    P-->>W: Parsed content
+
+    W->>W: Apply chunking strategy
+    Note over W: Token-based, sentence-based, or page-based
+
+    W->>W: Generate chunks
+
+    loop For each chunk batch
+        W->>E: Generate embeddings
+        Note over W,E: batch_size typically 32
+        E-->>W: Vector embeddings [1536 dim]
+
+        W->>ES: Bulk index chunks
+        Note over W,ES: {chunk_id, content, embedding, doc_id, kb_id}
+        ES-->>W: Index success
+
+        W->>DB: INSERT Chunk records
+    end
+
+    W->>DB: UPDATE Document
+    Note over W,DB: status='FINISHED', chunk_num, token_num
+
+    W->>DB: UPDATE Task status = 'SUCCESS'
+```
+
+## 4. Chat/Dialog Flow
+
+### 4.1 Create Chat Session
+
+```mermaid
+sequenceDiagram
+    participant U as User
+    participant W as Web Frontend
+    participant A as API Server
+    participant DB as MySQL
+
+    U->>W: Click "New Chat"
+    W->>A: POST /api/v1/dialog/create
+    Note over W,A: {name, kb_ids[], llm_id, prompt_config}
+
+    A->>A: Validate KB access
+    A->>DB: INSERT Dialog record
+    Note over A,DB: {id, name, tenant_id, kb_ids, llm_id, ...}
+
+    DB-->>A: Dialog created
+    A-->>W: 200 - Dialog created
+    Note over A,W: {dialog_id, name, created_at}
+
+    W-->>U: Open chat interface
+```
+
+### 4.2 Chat Message Flow (RAG)
+
+```mermaid
+sequenceDiagram
+    participant U as User
+    participant W as Web Frontend
+    participant A as API Server
+    participant ES as Elasticsearch
+    participant RR as Reranker
+    participant LLM as LLM Provider
+    participant DB as MySQL
+
+    U->>W: Type question
+    W->>A: POST /api/v1/dialog/chat (SSE)
+    Note over W,A: {dialog_id, conversation_id, question}
+
+    A->>DB: Load dialog config
+    Note over A,DB: Get kb_ids, llm_config, prompt
+
+    A->>DB: Load conversation history
+
+    rect rgb(200, 220, 240)
+        Note over A,ES: RETRIEVAL PHASE
+        A->>A: Query understanding
+        A->>A: Generate query embedding
+
+        A->>ES: Hybrid search
+        Note over A,ES: Vector similarity + BM25 full-text
+        ES-->>A: Top 100 candidates
+
+        A->>RR: Rerank candidates
+        Note over A,RR: Cross-encoder scoring
+        RR-->>A: Top K chunks (typically 5-10)
+    end
+
+    rect rgb(220, 240, 200)
+        Note over A,LLM: GENERATION PHASE
+        A->>A: Build prompt with context
+        Note over A: System prompt + Retrieved chunks + Question
+
+        A->>LLM: Stream completion request
+
+        loop Streaming response
+            LLM-->>A: Token chunk
+            A-->>W: SSE: data chunk
+            W-->>U: Display token
+        end
+
+        LLM-->>A: [DONE]
+    end
+
+    A->>DB: Save conversation message
+    Note over A,DB: {role, content, doc_ids[], conversation_id}
+
+    A-->>W: SSE: [DONE] + sources
+    W-->>U: Show sources/citations
+```
+
+### 4.3 Streaming Response Detail
+
+```mermaid
+sequenceDiagram
+    participant W as Web Frontend
+    participant A as API Server
+    participant LLM as LLM Provider
+
+    W->>A: POST /api/v1/dialog/chat
+    Note over W,A: Accept: text/event-stream
+
+    A->>A: Process retrieval...
+
+    A->>LLM: POST /v1/chat/completions
+    Note over A,LLM: stream: true
+
+    loop Until complete
+        LLM-->>A: data: {"choices":[{"delta":{"content":"..."}}]}
+        A->>A: Extract content
+        A-->>W: data: {"answer": "...", "reference": {...}}
+        W->>W: Append to display
+    end
+
+    LLM-->>A: data: [DONE]
+    A-->>W: data: {"answer": "", "reference": {...}, "done": true}
+    W->>W: Show final state
+```
+
+## 5. Agent Workflow Execution
+
+### 5.1 Canvas Workflow Execution
+
+```mermaid
+sequenceDiagram
+    participant U as User
+    participant W as Web Frontend
+    participant A as API Server
+    participant C as Canvas Engine
+    participant Comp as Components
+    participant LLM as LLM Provider
+    participant Tools as External Tools
+
+    U->>W: Run workflow
+    W->>A: POST /api/v1/canvas/run
+    Note over W,A: {canvas_id, input_data}
+
+    A->>C: Initialize canvas execution
+    C->>C: Parse workflow DSL
+    C->>C: Build execution graph
+
+    rect rgb(240, 220, 200)
+        Note over C,Comp: BEGIN Component
+        C->>Comp: Execute BEGIN
+        Comp->>Comp: Initialize variables
+        Comp-->>C: {user_input: "..."}
+    end
+
+    rect rgb(200, 220, 240)
+        Note over C,Comp: RETRIEVAL Component
+        C->>Comp: Execute RETRIEVAL
+        Comp->>A: Search knowledge bases
+        A-->>Comp: Retrieved chunks
+        Comp-->>C: {context: [...]}
+    end
+
+    rect rgb(220, 240, 200)
+        Note over C,LLM: LLM Component
+        C->>Comp: Execute LLM
+        Comp->>Comp: Build prompt with variables
+        Comp->>LLM: Chat completion
+        LLM-->>Comp: Response
+        Comp-->>C: {llm_output: "..."}
+    end
+
+    rect rgb(240, 240, 200)
+        Note over C,Tools: TOOL Component (optional)
+        C->>Comp: Execute TOOL (e.g., Tavily)
+        Comp->>Tools: API call
+        Tools-->>Comp: Tool result
+        Comp-->>C: {tool_output: {...}}
+    end
+
+    rect rgb(220, 220, 240)
+        Note over C,Comp: CATEGORIZE Component
+        C->>Comp: Execute CATEGORIZE
+        Comp->>Comp: Evaluate conditions
+        Comp-->>C: {next_node: "node_id"}
+    end
+
+    C->>C: Continue to next component...
+
+    C-->>A: Workflow complete
+    A-->>W: SSE: Final output
+    W-->>U: Display result
+```
+
+### 5.2 Agent with Tools Flow
+
+```mermaid
+sequenceDiagram
+    participant U as User
+    participant A as Agent Engine
+    participant LLM as LLM Provider
+    participant T1 as Tavily Search
+    participant T2 as Wikipedia
+    participant T3 as Code Executor
+
+    U->>A: Question requiring tools
+
+    A->>LLM: Initial prompt + available tools
+    Note over A,LLM: Tools: [tavily_search, wikipedia, code_exec]
+
+    loop ReAct Loop
+        LLM-->>A: Thought + Action
+        Note over LLM,A: Action: {"tool": "tavily_search", "input": "..."}
+
+        alt Tool: tavily_search
+            A->>T1: Search query
+            T1-->>A: Search results
+        else Tool: wikipedia
+            A->>T2: Page lookup
+            T2-->>A: Wikipedia content
+        else Tool: code_exec
+            A->>T3: Execute code
+            T3-->>A: Execution result
+        end
+
+        A->>LLM: Observation from tool
+
+        alt LLM decides more tools needed
+            LLM-->>A: Another Action
+        else LLM ready to answer
+            LLM-->>A: Final Answer
+        end
+    end
+
+    A-->>U: Final response with sources
+```
+
+## 6. GraphRAG Flow
+
+### 6.1 Knowledge Graph Construction
+
+```mermaid
+sequenceDiagram
+    participant D as Document
+    participant E as Entity Extractor
+    participant LLM as LLM Provider
+    participant ER as Entity Resolution
+    participant G as Graph Store
+
+    D->>E: Document chunks
+
+    loop For each chunk
+        E->>LLM: Extract entities prompt
+        Note over E,LLM: "Extract entities and relationships..."
+        LLM-->>E: Entities + Relations
+        Note over LLM,E: [{entity, type, properties}, {src, rel, dst}]
+    end
+
+    E->>ER: All extracted entities
+
+    ER->>ER: Cluster similar entities
+    ER->>LLM: Entity resolution prompt
+    Note over ER,LLM: "Are these the same entity?"
+    LLM-->>ER: Resolution decisions
+
+    ER->>ER: Merge duplicate entities
+    ER-->>G: Resolved entities + relations
+
+    G->>G: Build graph structure
+    G->>G: Create entity embeddings
+    G->>G: Index for search
+```
+
+### 6.2 GraphRAG Query Flow
+
+```mermaid
+sequenceDiagram
+    participant U as User
+    participant Q as Query Analyzer
+    participant G as Graph Store
+    participant V as Vector Search
+    participant LLM as LLM Provider
+
+    U->>Q: Natural language query
+
+    Q->>LLM: Analyze query
+    Note over Q,LLM: Extract entities, intent, constraints
+    LLM-->>Q: Query analysis
+
+    par Graph Search
+        Q->>G: Find related entities
+        G->>G: Traverse relationships
+        G-->>Q: Subgraph context
+    and Vector Search
+        Q->>V: Semantic search
+        V-->>Q: Relevant chunks
+    end
+
+    Q->>Q: Merge graph + vector results
+    Q->>Q: Build unified context
+
+    Q->>LLM: Generate with context
+    Note over Q,LLM: Context includes entity relations
+    LLM-->>Q: Response with graph insights
+
+    Q-->>U: Answer + entity graph visualization
+```
+
+## 7. File Operations
+
+### 7.1 File Download Flow
+
+```mermaid
+sequenceDiagram
+    participant U as User
+    participant W as Web Frontend
+    participant A as API Server
+    participant M as MinIO
+    participant DB as MySQL
+
+    U->>W: Click download
+    W->>A: GET /api/v1/file/download/{file_id}
+
+    A->>A: Validate JWT
+    A->>DB: Get file record
+    A->>A: Check user permission
+
+    alt No permission
+        A-->>W: 403 Forbidden
+    else Has permission
+        A->>M: Get file from storage
+        M-->>A: File stream
+        A-->>W: File stream with headers
+        Note over A,W: Content-Disposition: attachment
+        W-->>U: Download starts
+    end
+```
+
+## 8. Search Operations
+
+### 8.1 Hybrid Search Flow
+
+```mermaid
+sequenceDiagram
+    participant U as User
+    participant A as API Server
+    participant E as Embedding Model
+    participant ES as Elasticsearch
+
+    U->>A: Search query
+
+    A->>E: Embed query text
+    E-->>A: Query vector [1536]
+
+    A->>ES: Hybrid query
+    Note over A,ES: script_score (vector) + bool (BM25)
+
+    ES->>ES: Vector similarity search
+    Note over ES: cosine_similarity on dense_vector
+
+    ES->>ES: BM25 full-text search
+    Note over ES: match on content field
+
+    ES->>ES: Combine scores
+    Note over ES: final = vector_score * weight + bm25_score * weight
+
+    ES-->>A: Ranked results
+
+    A->>A: Post-process results
+    A->>A: Add highlights
+    A->>A: Group by document
+
+    A-->>U: Search results with snippets
+```
+
+## 9. Multi-Tenancy Flow
+
+### 9.1 Tenant Data Isolation
+
+```mermaid
+sequenceDiagram
+    participant U1 as User (Tenant A)
+    participant U2 as User (Tenant B)
+    participant A as API Server
+    participant DB as MySQL
+
+    U1->>A: GET /api/v1/kb/list
+    A->>A: Extract tenant_id from JWT
+    Note over A: tenant_id = "tenant_a"
+    A->>DB: SELECT * FROM kb WHERE tenant_id = 'tenant_a'
+    DB-->>A: Tenant A's KBs only
+    A-->>U1: KBs for Tenant A
+
+    U2->>A: GET /api/v1/kb/list
+    A->>A: Extract tenant_id from JWT
+    Note over A: tenant_id = "tenant_b"
+    A->>DB: SELECT * FROM kb WHERE tenant_id = 'tenant_b'
+    DB-->>A: Tenant B's KBs only
+    A-->>U2: KBs for Tenant B
+
+    Note over U1,U2: Data is completely isolated
+```
+
+## 10. Connector Integration Flow
+
+### 10.1 Confluence Connector Sync
+
+```mermaid
+sequenceDiagram
+    participant U as User
+    participant A as API Server
+    participant C as Confluence Connector
+    participant CF as Confluence API
+    participant DB as MySQL
+    participant Q as Task Queue
+
+    U->>A: Setup Confluence connector
+    Note over U,A: {url, username, api_token, space_key}
+
+    A->>C: Initialize connector
+    C->>CF: Authenticate
+    CF-->>C: Auth success
+
+    A->>DB: Save connector config
+    A-->>U: Connector created
+
+    U->>A: Start sync
+    A->>Q: Queue sync task
+
+    Q->>C: Execute sync
+    C->>CF: GET /wiki/rest/api/content
+    CF-->>C: Content list
+
+    loop For each page
+        C->>CF: GET page content
+        CF-->>C: Page HTML
+        C->>C: Convert to markdown
+        C->>A: Create document
+        A->>Q: Queue parsing task
+    end
+
+    C->>DB: Update sync status
+    C-->>A: Sync complete
+    A-->>U: Show sync results
+```
+
+## Tóm Tắt
+
+| Flow | Thành phần chính | Mô tả |
+|------|-----------------|-------|
+| Authentication | User, API, DB, Redis | Đăng ký, đăng nhập với JWT |
+| Knowledge Base | API, MySQL, ES | CRUD knowledge bases |
+| Document Upload | API, MinIO, Queue, ES | Upload và index documents |
+| Chat/Dialog | API, ES, Reranker, LLM | RAG-based chat với streaming |
+| Agent Workflow | Canvas Engine, Components, LLM, Tools | Visual workflow execution |
+| GraphRAG | Entity Extractor, Graph Store, LLM | Knowledge graph queries |
+| Search | Embedding, ES | Hybrid vector + BM25 search |
+| Connectors | Connector, External API | Sync external data sources |
+
+### Các Pattern Thiết Kế Sử Dụng
+
+1. **Event-Driven**: Task queue cho background processing
+2. **Streaming**: SSE cho real-time chat responses
+3. **Hybrid Search**: Kết hợp vector và text search
+4. **ReAct Pattern**: Agent reasoning với tool use
+5. **Multi-Tenancy**: Data isolation per tenant
--- a/personal_analyze/04_modules_analysis.md
+++ b/personal_analyze/04_modules_analysis.md
@ -0,0 +1,949 @@
+# RAGFlow - Phân Tích Chi Tiết Các Module
+
+## 1. Module API (`/api/`)
+
+### 1.1 Tổng Quan
+
+Module API là trung tâm xử lý tất cả HTTP requests của hệ thống. Được xây dựng trên Flask/Quart framework với kiến trúc Blueprint.
+
+### 1.2 Cấu Trúc
+
+```
+api/
+├── ragflow_server.py      # Entry point - Khởi tạo Flask app
+├── settings.py            # Cấu hình server
+├── constants.py           # API_VERSION = "v1"
+├── validation.py          # Request validation
+│
+├── apps/                  # API Blueprints
+├── db/                    # Database layer
+└── utils/                 # Utilities
+```
+
+### 1.3 Chi Tiết Các Blueprint (API Apps)
+
+#### 1.3.1 `kb_app.py` - Knowledge Base Management
+**Chức năng**: Quản lý Knowledge Base (tạo, xóa, sửa, liệt kê)
+
+**Endpoints chính**:
+| Method | Endpoint | Mô tả |
+|--------|----------|-------|
+| POST | `/api/v1/kb/create` | Tạo KB mới |
+| GET | `/api/v1/kb/list` | Liệt kê KBs |
+| PUT | `/api/v1/kb/update` | Cập nhật KB |
+| DELETE | `/api/v1/kb/delete` | Xóa KB |
+| GET | `/api/v1/kb/{id}` | Chi tiết KB |
+
+**Logic chính**:
+- Validation tenant permissions
+- Tạo Elasticsearch index cho mỗi KB
+- Quản lý embedding model settings
+- Quản lý parser configurations
+
+#### 1.3.2 `document_app.py` - Document Management
+**Chức năng**: Upload, parsing, và quản lý documents
+
+**Endpoints chính**:
+| Method | Endpoint | Mô tả |
+|--------|----------|-------|
+| POST | `/api/v1/document/upload` | Upload file |
+| POST | `/api/v1/document/run` | Trigger parsing |
+| GET | `/api/v1/document/list` | Liệt kê docs |
+| DELETE | `/api/v1/document/delete` | Xóa document |
+| GET | `/api/v1/document/{id}/chunks` | Lấy chunks |
+
+**Logic chính**:
+- File type validation
+- MinIO storage integration
+- Background task queuing
+- Parsing status tracking
+
+#### 1.3.3 `dialog_app.py` - Chat/Dialog Management
+**Chức năng**: Xử lý chat conversations với RAG
+
+**Endpoints chính**:
+| Method | Endpoint | Mô tả |
+|--------|----------|-------|
+| POST | `/api/v1/dialog/create` | Tạo dialog |
+| POST | `/api/v1/dialog/chat` | Chat (SSE streaming) |
+| POST | `/api/v1/dialog/completion` | Non-streaming chat |
+| GET | `/api/v1/dialog/list` | Liệt kê dialogs |
+
+**Logic chính**:
+- RAG pipeline orchestration
+- Streaming response (SSE)
+- Conversation history management
+- Multi-KB retrieval
+
+#### 1.3.4 `canvas_app.py` - Agent Workflow
+**Chức năng**: Visual workflow builder cho AI agents
+
+**Endpoints chính**:
+| Method | Endpoint | Mô tả |
+|--------|----------|-------|
+| POST | `/api/v1/canvas/create` | Tạo workflow |
+| POST | `/api/v1/canvas/run` | Execute workflow |
+| PUT | `/api/v1/canvas/update` | Cập nhật |
+| GET | `/api/v1/canvas/list` | Liệt kê |
+
+**Logic chính**:
+- DSL parsing và validation
+- Component orchestration
+- Tool integration
+- Variable passing between nodes
+
+#### 1.3.5 `file_app.py` - File Management
+**Chức năng**: Upload, download, quản lý files
+
+**Endpoints chính**:
+| Method | Endpoint | Mô tả |
+|--------|----------|-------|
+| POST | `/api/v1/file/upload` | Upload file |
+| GET | `/api/v1/file/download/{id}` | Download |
+| GET | `/api/v1/file/list` | Liệt kê files |
+| DELETE | `/api/v1/file/delete` | Xóa file |
+
+#### 1.3.6 `search_app.py` - Search Operations
+**Chức năng**: Full-text và semantic search
+
+**Endpoints chính**:
+| Method | Endpoint | Mô tả |
+|--------|----------|-------|
+| POST | `/api/v1/search` | Hybrid search |
+| GET | `/api/v1/search/history` | Search history |
+
+### 1.4 Database Services (`/api/db/services/`)
+
+#### `dialog_service.py` (37KB - Service phức tạp nhất)
+```python
+class DialogService:
+    def chat(dialog_id, question, stream=True):
+        """
+        Main RAG chat function
+        1. Load dialog configuration
+        2. Get relevant documents (retrieval)
+        3. Rerank results
+        4. Build prompt with context
+        5. Call LLM (streaming)
+        6. Save conversation
+        """
+
+    def retrieval(dialog, question):
+        """
+        Hybrid retrieval from Elasticsearch
+        - Vector similarity search
+        - BM25 full-text search
+        - Score combination
+        """
+
+    def rerank(chunks, question):
+        """
+        Cross-encoder reranking
+        - Score each chunk against question
+        - Return top-k
+        """
+```
+
+#### `document_service.py` (39KB)
+```python
+class DocumentService:
+    def upload(file, kb_id):
+        """Upload file to MinIO, create DB record"""
+
+    def parse(doc_id):
+        """Queue document for background parsing"""
+
+    def chunk(doc_id, chunks):
+        """Save parsed chunks to ES and DB"""
+
+    def delete(doc_id):
+        """Remove doc, chunks, and file"""
+```
+
+#### `knowledgebase_service.py` (21KB)
+```python
+class KnowledgebaseService:
+    def create(name, embedding_model, parser_id):
+        """Create KB with ES index"""
+
+    def update_parser_config(kb_id, config):
+        """Update chunking/parsing settings"""
+
+    def get_statistics(kb_id):
+        """Get doc count, chunk count, etc."""
+```
+
+### 1.5 Database Models (`/api/db/db_models.py`)
+
+**25+ Models quan trọng**:
+
+```python
+# User & Tenant
+class User(BaseModel):
+    id, email, password, nickname, avatar, status, login_channel
+
+class Tenant(BaseModel):
+    id, name, public_key, llm_id, embd_id, parser_id, credit
+
+class UserTenant(BaseModel):
+    user_id, tenant_id, role  # owner, admin, member
+
+# Knowledge Management
+class Knowledgebase(BaseModel):
+    id, tenant_id, name, description, embd_id, parser_id,
+    similarity_threshold, vector_similarity_weight, ...
+
+class Document(BaseModel):
+    id, kb_id, name, location, size, type, parser_id,
+    status, progress, chunk_num, token_num, process_duation
+
+class File(BaseModel):
+    id, tenant_id, name, size, location, type, source_type
+
+# Chat & Dialog
+class Dialog(BaseModel):
+    id, tenant_id, name, description, kb_ids, llm_id,
+    prompt_config, similarity_threshold, top_n, top_k
+
+class Conversation(BaseModel):
+    id, dialog_id, name, message  # JSON array of messages
+
+# Workflow
+class UserCanvas(BaseModel):
+    id, tenant_id, name, dsl, avatar  # DSL is workflow definition
+
+class CanvasTemplate(BaseModel):
+    id, name, dsl, avatar  # Pre-built templates
+
+# Integration
+class APIToken(BaseModel):
+    id, tenant_id, token, dialog_id  # For external API access
+
+class MCPServer(BaseModel):
+    id, tenant_id, name, host, tools  # MCP server config
+```
+
+---
+
+## 2. Module RAG (`/rag/`)
+
+### 2.1 Tổng Quan
+
+Core RAG processing engine - xử lý từ document parsing đến retrieval.
+
+### 2.2 LLM Abstractions (`/rag/llm/`)
+
+#### `chat_model.py` - Chat LLM Interface
+```python
+class Base:
+    """Abstract base for all chat models"""
+    def chat(messages, stream=True, **kwargs):
+        """Generate chat completion"""
+
+class OpenAIChat(Base):
+    """OpenAI GPT models"""
+
+class ClaudeChat(Base):
+    """Anthropic Claude models"""
+
+class QwenChat(Base):
+    """Alibaba Qwen models"""
+
+class OllamaChat(Base):
+    """Local Ollama models"""
+
+# Factory function
+def get_chat_model(model_name, api_key, base_url):
+    """Return appropriate chat model instance"""
+```
+
+**Supported Providers** (20+):
+- OpenAI (GPT-3.5, GPT-4, GPT-4V)
+- Anthropic (Claude 3)
+- Google (Gemini)
+- Alibaba (Qwen, Qwen-VL)
+- Groq
+- Mistral
+- Cohere
+- DeepSeek
+- Zhipu (GLM)
+- Moonshot
+- Ollama (local)
+- NVIDIA
+- Bedrock (AWS)
+- Azure OpenAI
+- Hugging Face
+- ...
+
+#### `embedding_model.py` - Embedding Interface
+```python
+class Base:
+    """Abstract base for embeddings"""
+    def encode(texts: List[str]) -> List[List[float]]:
+        """Generate embeddings for texts"""
+
+class OpenAIEmbed(Base):
+    """text-embedding-ada-002, text-embedding-3-*"""
+
+class BGEEmbed(Base):
+    """BAAI BGE models"""
+
+class JinaEmbed(Base):
+    """Jina AI embeddings"""
+
+# Supported embedding models:
+# - OpenAI: ada-002, embedding-3-small, embedding-3-large
+# - BGE: bge-base, bge-large, bge-m3
+# - Jina: jina-embeddings-v2
+# - Cohere: embed-english-v3
+# - HuggingFace: sentence-transformers
+# - Local: Ollama embeddings
+```
+
+#### `rerank_model.py` - Reranking Interface
+```python
+class Base:
+    """Abstract base for rerankers"""
+    def rerank(query: str, documents: List[str]) -> List[float]:
+        """Score documents against query"""
+
+class CohereRerank(Base):
+    """Cohere rerank models"""
+
+class JinaRerank(Base):
+    """Jina AI reranker"""
+
+class BGERerank(Base):
+    """BAAI BGE reranker"""
+```
+
+### 2.3 RAG Pipeline (`/rag/flow/`)
+
+#### Pipeline Architecture
+```
+Document → Parser → Tokenizer → Splitter → Embedder → Index
+```
+
+#### `parser/parser.py`
+```python
+def parse(file_path, parser_config):
+    """
+    Parse document based on file type
+    Returns: List of text segments with metadata
+    """
+    # Supported parsers:
+    # - naive: Simple text extraction
+    # - paper: Academic paper structure
+    # - book: Book chapter detection
+    # - laws: Legal document parsing
+    # - presentation: PPT parsing
+    # - qa: Q&A format extraction
+    # - table: Table extraction
+    # - picture: Image description
+    # - one: Single chunk per doc
+    # - audio: Audio transcription
+    # - email: Email thread parsing
+```
+
+#### `splitter/splitter.py`
+```python
+class Splitter:
+    """Document chunking strategies"""
+
+    def split_by_tokens(text, chunk_size=512, overlap=128):
+        """Token-based splitting"""
+
+    def split_by_sentences(text, max_sentences=10):
+        """Sentence-based splitting"""
+
+    def split_by_delimiter(text, delimiter='\n\n'):
+        """Delimiter-based splitting"""
+
+    def split_semantic(text, threshold=0.5):
+        """Semantic similarity based splitting"""
+```
+
+#### `tokenizer/tokenizer.py`
+```python
+class Tokenizer:
+    """Text tokenization"""
+
+    def tokenize(text):
+        """Convert text to tokens"""
+
+    def count_tokens(text):
+        """Count tokens in text"""
+
+    # Uses tiktoken for OpenAI models
+    # Uses model-specific tokenizers for others
+```
+
+### 2.4 RAPTOR (`/rag/raptor.py`)
+
+**RAPTOR** = Recursive Abstractive Processing for Tree-Organized Retrieval
+
+```python
+class RAPTOR:
+    """
+    Hierarchical document representation
+    - Clusters similar chunks
+    - Creates summaries of clusters
+    - Builds tree structure for retrieval
+    """
+
+    def build_tree(chunks):
+        """Build RAPTOR tree from chunks"""
+
+    def retrieve(query, tree):
+        """Retrieve from tree structure"""
+```
+
+---
+
+## 3. Module DeepDoc (`/deepdoc/`)
+
+### 3.1 Tổng Quan
+
+Deep document understanding với layout analysis và OCR.
+
+### 3.2 Document Parsers (`/deepdoc/parser/`)
+
+#### `pdf_parser.py` - PDF Processing
+```python
+class PdfParser:
+    """
+    Advanced PDF parsing with:
+    - OCR for scanned pages
+    - Layout analysis (tables, figures, headers)
+    - Multi-column detection
+    - Image extraction
+    """
+
+    def __call__(file_path):
+        """Parse PDF file"""
+        # 1. Extract text with PyMuPDF
+        # 2. Apply OCR if needed (Tesseract)
+        # 3. Analyze layout (detectron2/layoutlm)
+        # 4. Extract tables (camelot/tabula)
+        # 5. Extract images
+        # Return structured content
+```
+
+#### `docx_parser.py` - Word Documents
+```python
+class DocxParser:
+    """
+    Parse .docx files
+    - Text extraction
+    - Table extraction
+    - Image extraction
+    - Style preservation
+    """
+```
+
+#### `excel_parser.py` - Spreadsheets
+```python
+class ExcelParser:
+    """
+    Parse .xlsx/.xls files
+    - Sheet-by-sheet processing
+    - Table structure preservation
+    - Formula evaluation
+    """
+```
+
+#### `html_parser.py` - Web Pages
+```python
+class HtmlParser:
+    """
+    Parse HTML content
+    - Clean HTML
+    - Extract main content
+    - Handle tables
+    - Remove scripts/styles
+    """
+```
+
+### 3.3 Vision Module (`/deepdoc/vision/`)
+
+```python
+class LayoutAnalyzer:
+    """
+    Document layout analysis using ML
+    - Detectron2 for object detection
+    - LayoutLM for document understanding
+    """
+
+    def analyze(image):
+        """
+        Detect document regions:
+        - Title
+        - Paragraph
+        - Table
+        - Figure
+        - Header/Footer
+        - List
+        """
+```
+
+---
+
+## 4. Module Agent (`/agent/`)
+
+### 4.1 Tổng Quan
+
+Agentic workflow system với visual canvas builder.
+
+### 4.2 Canvas Engine (`/agent/canvas.py`)
+
+```python
+class Canvas:
+    """
+    Main workflow orchestrator
+    - Parse DSL definition
+    - Execute components in order
+    - Handle branching logic
+    - Manage variables
+    """
+
+    def __init__(self, dsl):
+        """Initialize from DSL"""
+        self.components = self._parse_dsl(dsl)
+        self.graph = self._build_graph()
+
+    def run(self, input_data):
+        """Execute workflow"""
+        context = {"input": input_data}
+
+        for component in self._topological_sort():
+            result = component.execute(context)
+            context.update(result)
+
+        return context["output"]
+```
+
+### 4.3 Components (`/agent/component/`)
+
+#### `begin.py` - Workflow Start
+```python
+class BeginComponent:
+    """
+    Entry point of workflow
+    - Initialize variables
+    - Receive user input
+    """
+    def execute(self, context):
+        return {"user_input": context["input"]}
+```
+
+#### `llm.py` - LLM Component
+```python
+class LLMComponent:
+    """
+    Call LLM with configured prompt
+    - Template variable substitution
+    - Streaming support
+    - Output parsing
+    """
+    def execute(self, context):
+        prompt = self.template.format(**context)
+        response = self.llm.chat(prompt)
+        return {"llm_output": response}
+```
+
+#### `retrieval.py` - Retrieval Component
+```python
+class RetrievalComponent:
+    """
+    Search knowledge bases
+    - Multi-KB search
+    - Configurable top_k
+    - Score threshold
+    """
+    def execute(self, context):
+        query = context["user_input"]
+        results = self.search(query, self.kb_ids)
+        return {"retrieved_docs": results}
+```
+
+#### `categorize.py` - Conditional Branching
+```python
+class CategorizeComponent:
+    """
+    Route to different paths based on conditions
+    - LLM-based classification
+    - Rule-based matching
+    """
+    def execute(self, context):
+        category = self._classify(context)
+        return {"next_node": self.routes[category]}
+```
+
+#### `agent_with_tools.py` - Tool-Using Agent
+```python
+class AgentWithToolsComponent:
+    """
+    ReAct pattern agent
+    - Tool selection
+    - Iterative reasoning
+    - Observation handling
+    """
+    def execute(self, context):
+        while not done:
+            action = self.llm.decide_action(context)
+            if action.type == "tool":
+                result = self.tools[action.tool].run(action.input)
+                context["observation"] = result
+            else:
+                return {"output": action.response}
+```
+
+### 4.4 Tools (`/agent/tools/`)
+
+#### External Tool Integrations
+
+| Tool | File | Chức năng |
+|------|------|-----------|
+| Tavily | `tavily.py` | Web search API |
+| ArXiv | `arxiv.py` | Academic paper search |
+| Google | `google.py` | Google search |
+| Wikipedia | `wikipedia.py` | Wikipedia lookup |
+| GitHub | `github.py` | GitHub API |
+| Email | `email.py` | Send emails |
+| Code Exec | `code_exec.py` | Execute Python code |
+| DeepL | `deepl.py` | Translation |
+| Jin10 | `jin10.py` | Financial news |
+| TuShare | `tushare.py` | Chinese stock data |
+| Yahoo Finance | `yahoofinance.py` | Stock data |
+| QWeather | `qweather.py` | Weather data |
+
+```python
+class BaseTool:
+    """Base class for all tools"""
+    name: str
+    description: str
+
+    def run(self, input: str) -> str:
+        """Execute tool and return result"""
+
+class TavilySearch(BaseTool):
+    name = "tavily_search"
+    description = "Search the web for current information"
+
+    def run(self, query):
+        response = tavily.search(query)
+        return format_results(response)
+```
+
+---
+
+## 5. Module GraphRAG (`/graphrag/`)
+
+### 5.1 Tổng Quan
+
+Knowledge graph construction và querying.
+
+### 5.2 Entity Resolution (`/graphrag/entity_resolution.py`)
+
+```python
+class EntityResolution:
+    """
+    Entity extraction và linking
+    - Extract entities from text
+    - Cluster similar entities
+    - Resolve duplicates
+    """
+
+    def extract_entities(text):
+        """Extract named entities using LLM"""
+        prompt = f"Extract entities from: {text}"
+        return llm.chat(prompt)
+
+    def resolve_entities(entities):
+        """Merge duplicate entities"""
+        clusters = self._cluster_similar(entities)
+        return self._merge_clusters(clusters)
+```
+
+### 5.3 Graph Search (`/graphrag/search.py`)
+
+```python
+class GraphSearch:
+    """
+    Query knowledge graph
+    - Entity-based search
+    - Relationship traversal
+    - Subgraph extraction
+    """
+
+    def search(query):
+        """Find relevant subgraph for query"""
+        # 1. Extract query entities
+        # 2. Find matching graph entities
+        # 3. Traverse relationships
+        # 4. Return context subgraph
+```
+
+---
+
+## 6. Module Frontend (`/web/`)
+
+### 6.1 Tổng Quan
+
+React/TypeScript SPA với UmiJS framework.
+
+### 6.2 Pages (`/web/src/pages/`)
+
+| Page | Chức năng |
+|------|-----------|
+| `/dataset` | Knowledge base management |
+| `/datasets` | Dataset list view |
+| `/next-chats` | Chat interface |
+| `/next-searches` | Search interface |
+| `/document-viewer` | Document preview |
+| `/admin` | Admin dashboard |
+| `/login` | Authentication |
+| `/register` | User registration |
+
+### 6.3 Components (`/web/src/components/`)
+
+**Core Components**:
+- `file-upload-modal/` - File upload UI
+- `pdf-drawer/` - PDF preview drawer
+- `prompt-editor/` - Prompt template editor
+- `document-preview/` - Document viewer
+- `llm-setting-items/` - LLM configuration UI
+- `ui/` - Shadcn/UI base components
+
+### 6.4 State Management
+
+```typescript
+// Using Zustand for state
+import { create } from 'zustand';
+
+interface KnowledgebaseStore {
+  knowledgebases: Knowledgebase[];
+  currentKb: Knowledgebase | null;
+  fetchKnowledgebases: () => Promise<void>;
+  createKnowledgebase: (data: CreateKbRequest) => Promise<void>;
+}
+
+export const useKnowledgebaseStore = create<KnowledgebaseStore>((set) => ({
+  knowledgebases: [],
+  currentKb: null,
+  fetchKnowledgebases: async () => {
+    const data = await api.get('/kb/list');
+    set({ knowledgebases: data });
+  },
+  // ...
+}));
+```
+
+### 6.5 API Services (`/web/src/services/`)
+
+```typescript
+// API client using Axios
+import { request } from 'umi';
+
+export async function createKnowledgebase(data: CreateKbRequest) {
+  return request('/api/v1/kb/create', {
+    method: 'POST',
+    data,
+  });
+}
+
+export async function chat(dialogId: string, question: string) {
+  return request('/api/v1/dialog/chat', {
+    method: 'POST',
+    data: { dialog_id: dialogId, question },
+    responseType: 'stream',
+  });
+}
+```
+
+---
+
+## 7. Module Common (`/common/`)
+
+### 7.1 Configuration (`/common/settings.py`)
+
+```python
+# Main configuration file
+class Settings:
+    # Database
+    MYSQL_HOST = os.getenv('MYSQL_HOST', 'localhost')
+    MYSQL_PORT = int(os.getenv('MYSQL_PORT', 5455))
+    MYSQL_USER = os.getenv('MYSQL_USER', 'root')
+    MYSQL_PASSWORD = os.getenv('MYSQL_PASSWORD', 'infini_rag_flow')
+    MYSQL_DATABASE = os.getenv('MYSQL_DATABASE', 'ragflow')
+
+    # Elasticsearch
+    ES_HOSTS = os.getenv('ES_HOSTS', 'http://localhost:9200').split(',')
+
+    # Redis
+    REDIS_HOST = os.getenv('REDIS_HOST', 'localhost')
+    REDIS_PORT = int(os.getenv('REDIS_PORT', 6379))
+
+    # MinIO
+    MINIO_HOST = os.getenv('MINIO_HOST', 'localhost:9000')
+    MINIO_ACCESS_KEY = os.getenv('MINIO_USER', 'rag_flow')
+    MINIO_SECRET_KEY = os.getenv('MINIO_PASSWORD', 'infini_rag_flow')
+
+    # Document Engine
+    DOC_ENGINE = os.getenv('DOC_ENGINE', 'elasticsearch')  # or 'infinity'
+```
+
+### 7.2 Data Source Connectors (`/common/data_source/`)
+
+**Supported Connectors**:
+
+| Connector | File | Chức năng |
+|-----------|------|-----------|
+| Confluence | `confluence_connector.py` (81KB) | Atlassian Confluence wiki |
+| Notion | `notion_connector.py` (25KB) | Notion databases |
+| Slack | `slack_connector.py` (22KB) | Slack messages |
+| Gmail | `gmail_connector.py` | Gmail emails |
+| Discord | `discord_connector.py` | Discord channels |
+| SharePoint | `sharepoint_connector.py` | Microsoft SharePoint |
+| Teams | `teams_connector.py` | Microsoft Teams |
+| Dropbox | `dropbox_connector.py` | Dropbox files |
+| Google Drive | `google_drive/` | Google Drive |
+| WebDAV | `webdav_connector.py` | WebDAV servers |
+| Moodle | `moodle_connector.py` | Moodle LMS |
+
+```python
+class BaseConnector:
+    """Abstract base for connectors"""
+
+    def authenticate(credentials):
+        """Authenticate with external service"""
+
+    def list_items():
+        """List available items"""
+
+    def sync():
+        """Sync data to RAGFlow"""
+
+class ConfluenceConnector(BaseConnector):
+    """Confluence integration"""
+
+    def __init__(self, url, username, api_token):
+        self.client = Confluence(url, username, api_token)
+
+    def sync_space(space_key):
+        """Sync all pages from a space"""
+        pages = self.client.get_all_pages(space_key)
+        for page in pages:
+            content = self._convert_to_markdown(page.body)
+            yield Document(content=content, metadata=page.metadata)
+```
+
+---
+
+## 8. Module SDK (`/sdk/python/`)
+
+### 8.1 Python SDK
+
+```python
+from ragflow import RAGFlow
+
+# Initialize client
+client = RAGFlow(
+    api_key="your-api-key",
+    base_url="http://localhost:9380"
+)
+
+# Create knowledge base
+kb = client.create_knowledgebase(
+    name="My KB",
+    embedding_model="text-embedding-3-small"
+)
+
+# Upload document
+doc = kb.upload_document("path/to/document.pdf")
+
+# Wait for parsing
+doc.wait_for_ready()
+
+# Create chat
+chat = client.create_chat(
+    name="My Chat",
+    knowledgebase_ids=[kb.id]
+)
+
+# Send message
+response = chat.send_message("What is this document about?")
+print(response.answer)
+```
+
+---
+
+## 9. Tóm Tắt Module Dependencies
+
+```
+┌─────────────────────────────────────────────────────────────────┐
+│                         Frontend (web/)                          │
+└─────────────────────────────┬───────────────────────────────────┘
+                              │ HTTP/SSE
+                              ▼
+┌─────────────────────────────────────────────────────────────────┐
+│                          API (api/)                              │
+│   ┌─────────┐  ┌─────────┐  ┌─────────┐  ┌─────────┐           │
+│   │ kb_app  │  │doc_app  │  │dialog_  │  │canvas_  │           │
+│   │         │  │         │  │app      │  │app      │           │
+│   └────┬────┘  └────┬────┘  └────┬────┘  └────┬────┘           │
+│        └────────────┴───────────┴────────────┘                  │
+│                              │                                   │
+│   ┌──────────────────────────┴──────────────────────────┐      │
+│   │                    Services Layer                    │      │
+│   │  DialogService │ DocumentService │ KBService         │      │
+│   └───────────────────────────┬─────────────────────────┘      │
+└───────────────────────────────┼─────────────────────────────────┘
+                                │
+        ┌───────────────────────┼───────────────────────┐
+        │                       │                       │
+        ▼                       ▼                       ▼
+┌───────────────┐    ┌──────────────────┐    ┌──────────────────┐
+│   RAG (rag/)  │    │  Agent (agent/)  │    │GraphRAG(graphrag)│
+│               │    │                  │    │                  │
+│ - LLM Models  │    │ - Canvas Engine  │    │ - Entity Res.    │
+│ - Pipeline    │    │ - Components     │    │ - Graph Search   │
+│ - Embeddings  │    │ - Tools          │    │                  │
+└───────┬───────┘    └────────┬─────────┘    └────────┬─────────┘
+        │                     │                       │
+        └─────────────────────┼───────────────────────┘
+                              │
+                              ▼
+┌─────────────────────────────────────────────────────────────────┐
+│                      DeepDoc (deepdoc/)                          │
+│                                                                  │
+│   PDF Parser │ DOCX Parser │ HTML Parser │ Vision/OCR           │
+└─────────────────────────────────────────────────────────────────┘
+                              │
+                              ▼
+┌─────────────────────────────────────────────────────────────────┐
+│                       Common (common/)                           │
+│                                                                  │
+│   Settings │ Utilities │ Data Source Connectors                 │
+└─────────────────────────────────────────────────────────────────┘
+                              │
+                              ▼
+┌─────────────────────────────────────────────────────────────────┐
+│                      Data Stores                                 │
+│                                                                  │
+│   MySQL │ Elasticsearch/Infinity │ Redis │ MinIO                │
+└─────────────────────────────────────────────────────────────────┘
+```
+
+## 10. Kích Thước Code Ước Tính
+
+| Module | Lines of Code | Complexity |
+|--------|--------------|------------|
+| api/ | ~15,000 | High |
+| rag/ | ~8,000 | High |
+| deepdoc/ | ~5,000 | Medium |
+| agent/ | ~6,000 | High |
+| graphrag/ | ~3,000 | Medium |
+| web/src/ | ~20,000 | High |
+| common/ | ~5,000 | Medium |
+| **Total** | **~62,000** | - |
--- a/personal_analyze/05_tech_stack.md
+++ b/personal_analyze/05_tech_stack.md
@ -0,0 +1,634 @@
+# RAGFlow - Tech Stack Analysis
+
+## 1. Tổng Quan Tech Stack
+
+RAGFlow sử dụng một tech stack hiện đại, được thiết kế để xử lý các workload AI/ML nặng với khả năng scale tốt.
+
+```
+┌─────────────────────────────────────────────────────────────────────────┐
+│                           TECH STACK OVERVIEW                            │
+├─────────────────────────────────────────────────────────────────────────┤
+│                                                                          │
+│  ┌────────────────────────────────────────────────────────────────────┐ │
+│  │                        FRONTEND                                     │ │
+│  │  React 18 │ TypeScript │ UmiJS │ Ant Design │ Tailwind CSS         │ │
+│  │  Zustand │ TanStack Query │ XYFlow │ Monaco Editor                 │ │
+│  └────────────────────────────────────────────────────────────────────┘ │
+│                                                                          │
+│  ┌────────────────────────────────────────────────────────────────────┐ │
+│  │                        BACKEND                                      │ │
+│  │  Python 3.10-3.12 │ Flask/Quart │ Peewee ORM │ Celery              │ │
+│  │  AsyncIO │ JWT │ SSE Streaming                                      │ │
+│  └────────────────────────────────────────────────────────────────────┘ │
+│                                                                          │
+│  ┌────────────────────────────────────────────────────────────────────┐ │
+│  │                        AI/ML                                        │ │
+│  │  LangChain │ OpenAI │ Sentence Transformers │ Hugging Face         │ │
+│  │  PyTorch │ Detectron2 │ Tesseract OCR                              │ │
+│  └────────────────────────────────────────────────────────────────────┘ │
+│                                                                          │
+│  ┌────────────────────────────────────────────────────────────────────┐ │
+│  │                        DATA LAYER                                   │ │
+│  │  MySQL 8 │ Elasticsearch 8 │ Redis │ MinIO │ Infinity              │ │
+│  └────────────────────────────────────────────────────────────────────┘ │
+│                                                                          │
+│  ┌────────────────────────────────────────────────────────────────────┐ │
+│  │                        INFRASTRUCTURE                               │ │
+│  │  Docker │ Docker Compose │ Kubernetes │ Nginx │ Helm               │ │
+│  └────────────────────────────────────────────────────────────────────┘ │
+│                                                                          │
+└─────────────────────────────────────────────────────────────────────────┘
+```
+
+---
+
+## 2. Frontend Technologies
+
+### 2.1 Core Framework
+
+| Technology | Version | Mục đích |
+|------------|---------|----------|
+| **React** | 18.x | UI library chính |
+| **TypeScript** | 5.x | Type-safe JavaScript |
+| **UmiJS** | 4.x | React framework (Ant Design ecosystem) |
+| **Vite** | 5.x | Build tool (nhanh hơn Webpack) |
+
+### 2.2 UI Libraries
+
+| Library | Version | Mục đích |
+|---------|---------|----------|
+| **Ant Design** | 5.x | Primary UI component library |
+| **Shadcn/UI** | Latest | Modern, customizable components |
+| **Radix UI** | Latest | Headless UI primitives |
+| **Tailwind CSS** | 3.x | Utility-first CSS framework |
+| **LESS** | 4.x | CSS preprocessor (legacy) |
+
+### 2.3 State Management & Data Fetching
+
+| Library | Mục đích |
+|---------|----------|
+| **Zustand** | Lightweight state management |
+| **TanStack React Query** | Server state & caching |
+| **Axios** | HTTP client |
+
+### 2.4 Specialized Libraries
+
+| Library | Mục đích |
+|---------|----------|
+| **XYFlow (React Flow)** | Workflow/canvas visualization |
+| **Monaco Editor** | Code editor (VS Code core) |
+| **AntV G2/G6** | Data visualization & graphs |
+| **Recharts** | Charts and analytics |
+| **Lexical** | Rich text editor (Facebook) |
+| **React Markdown** | Markdown rendering |
+| **i18next** | Internationalization |
+| **React Hook Form** | Form handling |
+| **Zod** | Schema validation |
+
+### 2.5 Package.json Dependencies (172 packages)
+
+```json
+{
+  "dependencies": {
+    "react": "^18.2.0",
+    "react-dom": "^18.2.0",
+    "umi": "^4.0.0",
+    "antd": "^5.0.0",
+    "@tanstack/react-query": "^5.0.0",
+    "zustand": "^4.0.0",
+    "axios": "^1.0.0",
+    "tailwindcss": "^3.0.0",
+    "@xyflow/react": "^12.0.0",
+    "@monaco-editor/react": "^4.0.0",
+    "lexical": "^0.12.0",
+    "react-markdown": "^9.0.0",
+    "i18next": "^23.0.0",
+    "react-hook-form": "^7.0.0",
+    "zod": "^3.0.0",
+    "@radix-ui/react-*": "latest",
+    "@ant-design/icons": "^5.0.0",
+    "@antv/g2": "^5.0.0",
+    "@antv/g6": "^5.0.0"
+  }
+}
+```
+
+---
+
+## 3. Backend Technologies
+
+### 3.1 Core Framework
+
+| Technology | Version | Mục đích |
+|------------|---------|----------|
+| **Python** | 3.10-3.12 | Programming language |
+| **Flask** | 3.x | Web framework |
+| **Quart** | 0.19.x | Async Flask (ASGI) |
+| **Hypercorn** | Latest | ASGI server |
+
+### 3.2 Database & ORM
+
+| Technology | Mục đích |
+|------------|----------|
+| **Peewee** | Lightweight ORM (primary) |
+| **SQLAlchemy** | Advanced ORM operations |
+| **PyMySQL** | MySQL driver |
+
+### 3.3 Authentication & Security
+
+| Library | Mục đích |
+|---------|----------|
+| **PyJWT** | JWT token handling |
+| **bcrypt** | Password hashing |
+| **python-jose** | JOSE implementation |
+| **Authlib** | OAuth integration |
+
+### 3.4 Async & Background Tasks
+
+| Library | Mục đích |
+|---------|----------|
+| **asyncio** | Async I/O |
+| **aiohttp** | Async HTTP client |
+| **Redis/Valkey** | Task queue & caching |
+| **APScheduler** | Job scheduling |
+
+### 3.5 API & Documentation
+
+| Library | Mục đích |
+|---------|----------|
+| **Flasgger** | Swagger/OpenAPI docs |
+| **Flask-CORS** | CORS handling |
+| **Werkzeug** | WSGI utilities |
+
+### 3.6 pyproject.toml Dependencies (150+ packages)
+
+```toml
+[project]
+name = "ragflow"
+version = "0.22.1"
+requires-python = ">=3.10,<3.13"
+
+dependencies = [
+    # Web Framework
+    "flask>=3.0.0",
+    "quart>=0.19.0",
+    "hypercorn>=0.17.0",
+    "flask-cors>=4.0.0",
+    "flasgger>=0.9.0",
+
+    # Database
+    "peewee>=3.17.0",
+    "pymysql>=1.1.0",
+
+    # Authentication
+    "pyjwt>=2.8.0",
+    "bcrypt>=4.1.0",
+
+    # Async
+    "aiohttp>=3.9.0",
+    "httpx>=0.27.0",
+
+    # Data Processing
+    "pandas>=2.0.0",
+    "numpy>=1.26.0",
+
+    # AI/ML (see section 4)
+    ...
+]
+```
+
+---
+
+## 4. AI/ML Technologies
+
+### 4.1 LLM Integration
+
+| Provider | Library | Models Supported |
+|----------|---------|-----------------|
+| **OpenAI** | `openai>=1.0` | GPT-3.5, GPT-4, GPT-4V |
+| **Anthropic** | `anthropic>=0.20` | Claude 3 family |
+| **Google** | `google-generativeai` | Gemini Pro |
+| **Cohere** | `cohere>=5.0` | Command, Embed, Rerank |
+| **Groq** | `groq>=0.4` | LLaMA, Mixtral |
+| **Mistral** | `mistralai>=0.1` | Mistral 7B, Mixtral |
+| **Ollama** | `ollama>=0.1` | Local models |
+| **HuggingFace** | `huggingface_hub` | Open source models |
+
+### 4.2 Embedding Models
+
+| Library | Models |
+|---------|--------|
+| **Sentence Transformers** | all-MiniLM, all-mpnet, etc. |
+| **OpenAI Embeddings** | text-embedding-3-small/large |
+| **BGE** | bge-base, bge-large, bge-m3 |
+| **Jina** | jina-embeddings-v2 |
+| **Cohere** | embed-english-v3 |
+
+```python
+# Embedding configuration
+EMBEDDING_MODELS = {
+    "openai": {
+        "text-embedding-3-small": {"dim": 1536, "max_tokens": 8191},
+        "text-embedding-3-large": {"dim": 3072, "max_tokens": 8191},
+    },
+    "bge": {
+        "bge-base-en-v1.5": {"dim": 768, "max_tokens": 512},
+        "bge-large-en-v1.5": {"dim": 1024, "max_tokens": 512},
+        "bge-m3": {"dim": 1024, "max_tokens": 8192},
+    },
+    "sentence-transformers": {
+        "all-MiniLM-L6-v2": {"dim": 384, "max_tokens": 256},
+        "all-mpnet-base-v2": {"dim": 768, "max_tokens": 384},
+    }
+}
+```
+
+### 4.3 Document Processing
+
+| Library | Mục đích |
+|---------|----------|
+| **PyMuPDF (fitz)** | PDF text extraction |
+| **pdf2image** | PDF to image conversion |
+| **Tesseract (pytesseract)** | OCR |
+| **python-docx** | Word document parsing |
+| **openpyxl** | Excel parsing |
+| **python-pptx** | PowerPoint parsing |
+| **BeautifulSoup4** | HTML parsing |
+| **markdown** | Markdown processing |
+| **camelot-py** | Table extraction from PDF |
+| **tabula-py** | Alternative table extraction |
+
+### 4.4 Computer Vision
+
+| Library | Mục đích |
+|---------|----------|
+| **Detectron2** | Layout analysis |
+| **LayoutLM** | Document understanding |
+| **OpenCV** | Image processing |
+| **Pillow** | Image manipulation |
+| **YOLO** | Object detection |
+
+### 4.5 NLP & Text Processing
+
+| Library | Mục đích |
+|---------|----------|
+| **tiktoken** | OpenAI tokenization |
+| **nltk** | Natural language toolkit |
+| **spaCy** | NLP pipeline |
+| **regex** | Advanced regex |
+| **chardet** | Character encoding detection |
+
+### 4.6 Vector Operations
+
+| Library | Mục đích |
+|---------|----------|
+| **NumPy** | Numerical operations |
+| **SciPy** | Scientific computing |
+| **scikit-learn** | ML utilities, clustering |
+| **faiss-cpu/gpu** | Vector similarity search |
+
+---
+
+## 5. Data Storage Technologies
+
+### 5.1 Relational Database
+
+| Technology | Mục đích | Configuration |
+|------------|----------|---------------|
+| **MySQL 8.0** | Primary database | Port 5455 |
+| **PostgreSQL** | Alternative (supported) | - |
+
+**MySQL Schema Design**:
+- InnoDB engine
+- UTF8MB4 character set
+- JSON columns for flexible data
+- Foreign keys for integrity
+
+### 5.2 Vector/Search Database
+
+| Technology | Mục đích | Configuration |
+|------------|----------|---------------|
+| **Elasticsearch 8.12** | Default vector store | Port 9200 |
+| **Infinity** | Alternative (in-house) | Port 23817 |
+| **OpenSearch** | Alternative | Port 9200 |
+| **OceanBase** | Alternative (distributed) | - |
+
+**Elasticsearch Configuration**:
+```json
+{
+  "settings": {
+    "number_of_shards": 1,
+    "number_of_replicas": 0,
+    "analysis": {
+      "analyzer": {
+        "ik_smart": { "type": "ik_smart" },
+        "ik_max_word": { "type": "ik_max_word" }
+      }
+    }
+  },
+  "mappings": {
+    "properties": {
+      "content": { "type": "text", "analyzer": "ik_smart" },
+      "embedding": {
+        "type": "dense_vector",
+        "dims": 1536,
+        "index": true,
+        "similarity": "cosine"
+      }
+    }
+  }
+}
+```
+
+### 5.3 Cache & Message Queue
+
+| Technology | Mục đích | Configuration |
+|------------|----------|---------------|
+| **Redis 7.x** | Cache, sessions, queue | Port 6379 |
+| **Valkey** | Redis alternative | Port 6379 |
+
+**Redis Usage**:
+- Session storage
+- Rate limiting
+- Task queue (custom implementation)
+- Cache layer
+
+### 5.4 Object Storage
+
+| Technology | Mục đích | Configuration |
+|------------|----------|---------------|
+| **MinIO** | S3-compatible storage | Port 9000/9001 |
+| **AWS S3** | Cloud storage option | - |
+| **Azure Blob** | Cloud storage option | - |
+
+**MinIO Structure**:
+```
+ragflow/                    # Bucket
+├── {tenant_id}/
+│   ├── {kb_id}/
+│   │   ├── {file_id}      # Original files
+│   │   └── chunks/        # Processed chunks
+│   └── temp/              # Temporary files
+└── system/                # System files
+```
+
+---
+
+## 6. Infrastructure Technologies
+
+### 6.1 Containerization
+
+| Technology | Mục đích |
+|------------|----------|
+| **Docker** | Container runtime |
+| **Docker Compose** | Multi-container orchestration |
+| **BuildKit** | Efficient image building |
+
+**Docker Images**:
+```yaml
+services:
+  ragflow-server:
+    image: infiniflow/ragflow:latest
+    # or: ragflow:nightly for development
+
+  mysql:
+    image: mysql:8.0
+
+  elasticsearch:
+    image: docker.elastic.co/elasticsearch/elasticsearch:8.12.0
+
+  redis:
+    image: redis:7-alpine
+
+  minio:
+    image: minio/minio:latest
+```
+
+### 6.2 Web Server & Proxy
+
+| Technology | Mục đích | Configuration |
+|------------|----------|---------------|
+| **Nginx** | Reverse proxy, static files | Port 80/443 |
+| **Hypercorn** | ASGI server | Port 9380 |
+
+**Nginx Configuration**:
+```nginx
+upstream ragflow {
+    server ragflow-server:9380;
+}
+
+server {
+    listen 80;
+
+    location /api/ {
+        proxy_pass http://ragflow;
+        proxy_http_version 1.1;
+        proxy_set_header Connection "";
+    }
+
+    location / {
+        root /usr/share/nginx/html;
+        try_files $uri $uri/ /index.html;
+    }
+}
+```
+
+### 6.3 Kubernetes Deployment
+
+| Technology | Mục đích |
+|------------|----------|
+| **Kubernetes** | Container orchestration |
+| **Helm** | K8s package manager |
+
+**Helm Chart Structure**:
+```
+helm/
+├── Chart.yaml
+├── values.yaml
+├── templates/
+│   ├── deployment.yaml
+│   ├── service.yaml
+│   ├── configmap.yaml
+│   └── ingress.yaml
+```
+
+---
+
+## 7. Development Tools
+
+### 7.1 Python Development
+
+| Tool | Mục đích |
+|------|----------|
+| **uv** | Package manager (fast) |
+| **pip** | Traditional package manager |
+| **pre-commit** | Git hooks |
+| **ruff** | Linter & formatter |
+| **pytest** | Testing framework |
+| **mypy** | Type checking |
+
+### 7.2 Frontend Development
+
+| Tool | Mục đích |
+|------|----------|
+| **npm/pnpm** | Package manager |
+| **ESLint** | Linting |
+| **Prettier** | Code formatting |
+| **Jest** | Testing |
+| **Storybook** | Component development |
+| **Husky** | Git hooks |
+
+### 7.3 Version Control & CI/CD
+
+| Tool | Mục đích |
+|------|----------|
+| **Git** | Version control |
+| **GitHub Actions** | CI/CD |
+| **Docker Hub** | Image registry |
+
+---
+
+## 8. Monitoring & Observability
+
+### 8.1 Logging
+
+| Library | Mục đích |
+|---------|----------|
+| **Python logging** | Standard logging |
+| **structlog** | Structured logging |
+
+### 8.2 Tracing
+
+| Integration | Mục đích |
+|-------------|----------|
+| **Langfuse** | LLM observability |
+| **OpenTelemetry** | Distributed tracing |
+
+### 8.3 Metrics
+
+| Tool | Mục đích |
+|------|----------|
+| **Prometheus** | Metrics collection |
+| **Grafana** | Visualization |
+
+---
+
+## 9. Third-party Integrations
+
+### 9.1 LLM Providers
+
+```
+┌─────────────────────────────────────────────────────────────┐
+│                    LLM Provider Support                      │
+├─────────────────────────────────────────────────────────────┤
+│                                                              │
+│  Commercial APIs:                                            │
+│  ┌───────┐ ┌───────┐ ┌───────┐ ┌───────┐ ┌───────┐        │
+│  │OpenAI │ │Claude │ │Gemini │ │Cohere │ │ Groq  │        │
+│  └───────┘ └───────┘ └───────┘ └───────┘ └───────┘        │
+│                                                              │
+│  China Providers:                                            │
+│  ┌───────┐ ┌───────┐ ┌───────┐ ┌───────┐ ┌───────┐        │
+│  │ Qwen  │ │Zhipu  │ │Baichuan│ │Spark  │ │ERNIE  │        │
+│  └───────┘ └───────┘ └───────┘ └───────┘ └───────┘        │
+│                                                              │
+│  Self-hosted:                                                │
+│  ┌───────┐ ┌───────┐ ┌───────┐                             │
+│  │Ollama │ │ vLLM  │ │LocalAI│                             │
+│  └───────┘ └───────┘ └───────┘                             │
+│                                                              │
+└─────────────────────────────────────────────────────────────┘
+```
+
+### 9.2 Data Source Connectors
+
+| Category | Services |
+|----------|----------|
+| **Enterprise Wiki** | Confluence, Notion, SharePoint |
+| **Communication** | Slack, Discord, Gmail, Teams |
+| **Cloud Storage** | Google Drive, Dropbox, S3, WebDAV |
+| **Development** | GitHub, Jira |
+| **Education** | Moodle |
+| **Finance** | TuShare, AkShare, Yahoo Finance |
+
+### 9.3 Search APIs
+
+| Service | Mục đích |
+|---------|----------|
+| **Tavily** | AI-optimized web search |
+| **Google Search** | Web search |
+| **Google Scholar** | Academic search |
+| **SearXNG** | Meta search |
+| **ArXiv** | Academic papers |
+| **Wikipedia** | Knowledge lookup |
+
+---
+
+## 10. System Requirements
+
+### 10.1 Minimum Requirements
+
+| Resource | Minimum | Recommended |
+|----------|---------|-------------|
+| **CPU** | 4 cores | 8+ cores |
+| **RAM** | 16 GB | 32+ GB |
+| **Disk** | 50 GB | 200+ GB SSD |
+| **GPU** | - | NVIDIA 8GB+ VRAM |
+
+### 10.2 Software Requirements
+
+| Software | Version |
+|----------|---------|
+| **Docker** | 20.10+ |
+| **Docker Compose** | 2.0+ |
+| **Python** | 3.10-3.12 |
+| **Node.js** | 18.20.4+ |
+
+### 10.3 Port Requirements
+
+| Port | Service |
+|------|---------|
+| 80/443 | Nginx (HTTP/HTTPS) |
+| 9380 | RAGFlow API |
+| 9381 | Admin Server |
+| 9200 | Elasticsearch |
+| 5455 | MySQL |
+| 6379 | Redis |
+| 9000/9001 | MinIO |
+
+---
+
+## 11. Tóm Tắt Tech Stack
+
+### Production Stack
+
+```
+Frontend:     React 18 + TypeScript + UmiJS + Ant Design + Tailwind
+Backend:      Python 3.11 + Flask/Quart + Peewee
+AI/ML:        OpenAI + Sentence Transformers + Detectron2
+Database:     MySQL 8 + Elasticsearch 8
+Cache:        Redis 7
+Storage:      MinIO
+Proxy:        Nginx
+Container:    Docker + Docker Compose
+Orchestration: Kubernetes + Helm
+```
+
+### Development Stack
+
+```
+Package Mgmt: uv (Python), npm (Node.js)
+Linting:      ruff (Python), ESLint (JS/TS)
+Testing:      pytest (Python), Jest (JS/TS)
+CI/CD:        GitHub Actions
+Version Ctrl: Git
+```
+
+### Key Architectural Choices
+
+1. **Async-first**: Quart ASGI cho high concurrency
+2. **Hybrid Search**: Vector + BM25 trong Elasticsearch
+3. **Multi-tenant**: Data isolation per tenant
+4. **Pluggable LLMs**: Abstract interface cho nhiều providers
+5. **Containerized**: Full Docker deployment
+6. **Event-driven**: Background processing với Redis queue
--- a/personal_analyze/06_source_code_analysis.md
+++ b/personal_analyze/06_source_code_analysis.md
--- a/personal_analyze/README.md
+++ b/personal_analyze/README.md
@ -0,0 +1,134 @@
+# RAGFlow Analysis Documentation
+
+Tài liệu phân tích chi tiết về RAGFlow - Open-source RAG Engine.
+
+## Tổng Quan RAGFlow
+
+**RAGFlow** (v0.22.1) là một **Retrieval-Augmented Generation (RAG) engine** mã nguồn mở, được xây dựng dựa trên **deep document understanding**. Đây là một ứng dụng full-stack với:
+
+- **Backend**: Python (Flask/Quart)
+- **Frontend**: React/TypeScript (UmiJS)
+- **Kiến trúc**: Microservices với Docker
+- **Data Stores**: MySQL, Elasticsearch/Infinity, Redis, MinIO
+
+## Danh Sách Tài Liệu
+
+| File | Nội dung |
+|------|----------|
+| [01_directory_structure.md](./01_directory_structure.md) | Cấu trúc cây thư mục chi tiết |
+| [02_system_architecture.md](./02_system_architecture.md) | Kiến trúc hệ thống với diagrams |
+| [03_sequence_diagrams.md](./03_sequence_diagrams.md) | Sequence diagrams cho các flows chính |
+| [04_modules_analysis.md](./04_modules_analysis.md) | Phân tích chi tiết từng module |
+| [05_tech_stack.md](./05_tech_stack.md) | Tech stack và dependencies |
+| [06_source_code_analysis.md](./06_source_code_analysis.md) | Phân tích source code chi tiết |
+
+## Tóm Tắt Chức Năng Chính
+
+### 1. Document Processing
+- Upload và parse nhiều định dạng (PDF, Word, Excel, PPT, HTML...)
+- OCR và layout analysis cho PDF
+- Intelligent chunking strategies
+
+### 2. RAG Pipeline
+- Hybrid search (Vector + BM25)
+- Multiple embedding models support
+- Reranking với cross-encoder
+
+### 3. Chat/Dialog
+- Streaming responses (SSE)
+- Multi-knowledge base retrieval
+- Conversation history
+
+### 4. Agent Workflows
+- Visual canvas builder
+- 15+ built-in components
+- 20+ external tool integrations
+
+### 5. Knowledge Graph (GraphRAG)
+- Entity extraction và resolution
+- Graph-based retrieval
+- Relationship visualization
+
+## Kiến Trúc High-Level
+
+```
+┌─────────────────────────────────────────────────────────────────┐
+│                         CLIENTS                                  │
+│        Web App │ Mobile │ Python SDK │ REST API                 │
+└────────────────────────────┬────────────────────────────────────┘
+                             │
+┌────────────────────────────┼────────────────────────────────────┐
+│                       NGINX (Gateway)                            │
+└────────────────────────────┬────────────────────────────────────┘
+                             │
+┌────────────────────────────┼────────────────────────────────────┐
+│                    APPLICATION LAYER                             │
+│  ┌──────────────┐  ┌──────────────┐  ┌──────────────┐          │
+│  │RAGFlow Server│  │ Admin Server │  │  MCP Server  │          │
+│  │  (Port 9380) │  │  (Port 9381) │  │  (Port 9382) │          │
+│  └──────────────┘  └──────────────┘  └──────────────┘          │
+└────────────────────────────┬────────────────────────────────────┘
+                             │
+┌────────────────────────────┼────────────────────────────────────┐
+│                     SERVICE LAYER                                │
+│  ┌────────┐ ┌────────┐ ┌────────┐ ┌────────┐ ┌────────┐        │
+│  │  RAG   │ │DeepDoc │ │ Agent  │ │GraphRAG│ │Services│        │
+│  │Pipeline│ │Parsers │ │ Canvas │ │ Engine │ │ Layer  │        │
+│  └────────┘ └────────┘ └────────┘ └────────┘ └────────┘        │
+└────────────────────────────┬────────────────────────────────────┘
+                             │
+┌────────────────────────────┼────────────────────────────────────┐
+│                      DATA LAYER                                  │
+│  ┌────────┐ ┌────────────┐ ┌────────┐ ┌────────┐               │
+│  │ MySQL  │ │Elasticsearch│ │ Redis  │ │ MinIO  │               │
+│  │(5455)  │ │   (9200)    │ │ (6379) │ │ (9000) │               │
+│  └────────┘ └────────────┘ └────────┘ └────────┘               │
+└─────────────────────────────────────────────────────────────────┘
+```
+
+## Tech Stack Summary
+
+| Layer | Technologies |
+|-------|-------------|
+| **Frontend** | React 18, TypeScript, UmiJS, Ant Design, Tailwind CSS |
+| **Backend** | Python 3.10-3.12, Flask/Quart, Peewee ORM |
+| **AI/ML** | OpenAI, Sentence Transformers, Detectron2, PyTorch |
+| **Database** | MySQL 8, Elasticsearch 8, Redis 7 |
+| **Storage** | MinIO (S3-compatible) |
+| **Infrastructure** | Docker, Nginx, Kubernetes/Helm |
+
+## LLM Providers Supported
+
+- OpenAI (GPT-3.5, GPT-4, GPT-4V)
+- Anthropic (Claude 3)
+- Google (Gemini)
+- Alibaba (Qwen)
+- Groq, Mistral, Cohere
+- Ollama (local models)
+- 20+ more providers
+
+## Data Connectors
+
+- Enterprise: Confluence, Notion, SharePoint, Jira
+- Communication: Slack, Discord, Gmail, Teams
+- Storage: Google Drive, Dropbox, S3, WebDAV
+
+## Quick Stats
+
+| Metric | Value |
+|--------|-------|
+| Total LOC | ~62,000+ |
+| Python Files | ~300+ |
+| TS/JS Files | ~400+ |
+| Database Models | 25+ |
+| API Endpoints | ~50+ |
+| LLM Providers | 20+ |
+| Data Connectors | 15+ |
+
+## License
+
+RAGFlow is open-source under Apache 2.0 license.
+
+---
+
+*Documentation generated: 2025-11-26*