ragflow/personal_analyze
Claude 6d4dbbfe2c
docs: Add comprehensive DeepDoc deep guide documentation
Created in-depth documentation for understanding the deepdoc module:

- README.md: Complete deep guide with:
  - Big picture explanation (what problem deepdoc solves)
  - Data flow diagrams (Input → Processing → Output)
  - Detailed code analysis with line numbers
  - Technical explanations (ONNX, CTC, NMS, etc.)
  - Design reasoning (why certain technologies were chosen)
  - Difficult terms glossary
  - Extension examples

- ocr_deep_dive.md: Deep dive into OCR subsystem
  - DBNet text detection architecture
  - CRNN text recognition
  - CTC decoding algorithm
  - Rotation handling
  - Performance optimization

- layout_table_deep_dive.md: Deep dive into layout/table recognition
  - YOLOv10 layout detection
  - Table structure recognition
  - Grid construction algorithm
  - Spanning cell handling
  - HTML/descriptive output generation
2025-11-27 03:46:14 +00:00
01-API-LAYER docs: Add detailed backend module analysis documentation 2025-11-26 11:10:54 +00:00
02-SERVICE-LAYER docs: Add detailed backend module analysis documentation 2025-11-26 11:10:54 +00:00
03-RAG-ENGINE docs: Add detailed backend module analysis documentation 2025-11-26 11:10:54 +00:00
04-AGENT-SYSTEM docs: Add detailed backend module analysis documentation 2025-11-26 11:10:54 +00:00
05-DOCUMENT-PROCESSING docs: Add detailed backend module analysis documentation 2025-11-26 11:10:54 +00:00
06-ALGORITHMS docs: Add comprehensive algorithm documentation (50+ algorithms) 2025-11-27 03:34:49 +00:00
07-DEEPDOC-DEEP-GUIDE docs: Add comprehensive DeepDoc deep guide documentation 2025-11-27 03:46:14 +00:00
00-OVERVIEW.md docs: Add detailed backend module analysis documentation 2025-11-26 11:10:54 +00:00
01_directory_structure.md docs: Add comprehensive RAGFlow analysis documentation 2025-11-26 10:20:05 +00:00
02_system_architecture.md docs: Add comprehensive RAGFlow analysis documentation 2025-11-26 10:20:05 +00:00
03_sequence_diagrams.md docs: Add comprehensive RAGFlow analysis documentation 2025-11-26 10:20:05 +00:00
04_modules_analysis.md docs: Add comprehensive RAGFlow analysis documentation 2025-11-26 10:20:05 +00:00
05_tech_stack.md docs: Add comprehensive RAGFlow analysis documentation 2025-11-26 10:20:05 +00:00
06_source_code_analysis.md docs: Add comprehensive RAGFlow analysis documentation 2025-11-26 10:20:05 +00:00
README.md docs: Add comprehensive RAGFlow analysis documentation 2025-11-26 10:20:05 +00:00

RAGFlow Analysis Documentation

Detailed analysis documentation for RAGFlow, an open-source RAG engine.

RAGFlow Overview

RAGFlow (v0.22.1) is an open-source Retrieval-Augmented Generation (RAG) engine built on deep document understanding. It is a full-stack application with:

  • Backend: Python (Flask/Quart)
  • Frontend: React/TypeScript (UmiJS)
  • Architecture: Microservices with Docker
  • Data Stores: MySQL, Elasticsearch/Infinity, Redis, MinIO

Document List

File                        Contents
01_directory_structure.md   Detailed directory tree
02_system_architecture.md   System architecture, with diagrams
03_sequence_diagrams.md     Sequence diagrams for the main flows
04_modules_analysis.md      Detailed analysis of each module
05_tech_stack.md            Tech stack and dependencies
06_source_code_analysis.md  Detailed source code analysis

Summary of Main Features

1. Document Processing

  • Upload and parse multiple formats (PDF, Word, Excel, PPT, HTML...)
  • OCR and layout analysis for PDF
  • Intelligent chunking strategies

2. RAG Pipeline

  • Hybrid search (Vector + BM25)
  • Multiple embedding models support
  • Reranking with a cross-encoder
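
To make the hybrid search idea concrete, here is a minimal sketch of one common way to fuse keyword (BM25) and vector similarity scores: normalize each score set to [0, 1], then take a weighted sum. The function names, the `vector_weight` parameter, and its default of 0.7 are assumptions made for illustration; this is not necessarily how RAGFlow combines the two signals.

```python
from typing import Dict, List, Tuple


def min_max(scores: Dict[str, float]) -> Dict[str, float]:
    """Rescale scores to [0, 1]; a constant score set maps to 1.0."""
    if not scores:
        return {}
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        return {doc: 1.0 for doc in scores}
    return {doc: (s - lo) / (hi - lo) for doc, s in scores.items()}


def hybrid_rank(
    bm25: Dict[str, float],
    vector: Dict[str, float],
    vector_weight: float = 0.7,  # hypothetical default, tune per corpus
) -> List[Tuple[str, float]]:
    """Fuse two score sets with a weighted sum and sort best-first."""
    bm25_n, vec_n = min_max(bm25), min_max(vector)
    fused = {}
    for doc in set(bm25_n) | set(vec_n):
        # A document missing from one retriever contributes 0 for that signal.
        fused[doc] = (
            (1.0 - vector_weight) * bm25_n.get(doc, 0.0)
            + vector_weight * vec_n.get(doc, 0.0)
        )
    return sorted(fused.items(), key=lambda kv: kv[1], reverse=True)


# Made-up scores for four chunks; the resulting order is b, d, a, c.
ranked = hybrid_rank(
    {"a": 12.0, "b": 4.0, "c": 8.0},
    {"a": 0.2, "b": 0.9, "d": 0.6},
)
print([doc for doc, _ in ranked])
```

In a pipeline like the one listed above, a cross-encoder reranker would typically rescore only the top few fused results, since it is much more expensive per document than either first-stage retriever.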

3. Chat/Dialog

  • Streaming responses (SSE)
  • Multi-knowledge base retrieval
  • Conversation history
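
As a rough illustration of streaming over Server-Sent Events, the sketch below formats incremental answer text as SSE frames: each frame is a `data:` line followed by a blank line. The payload shape (`answer`, `done`) and the function name `sse_events` are hypothetical and chosen for this example; RAGFlow's actual response schema may differ.

```python
import json
from typing import Iterable, Iterator


def sse_events(tokens: Iterable[str]) -> Iterator[str]:
    """Yield one SSE frame per token, carrying the answer accumulated so far."""
    answer = ""
    for token in tokens:
        answer += token
        payload = json.dumps({"answer": answer}, ensure_ascii=False)
        # An SSE frame ends with a blank line, hence the double newline.
        yield "data: " + payload + "\n\n"
    # Final frame so the client knows the stream is complete (assumed convention).
    yield "data: " + json.dumps({"done": True}) + "\n\n"


for frame in sse_events(["Hel", "lo"]):
    print(frame, end="")
```

A Flask or Quart handler could return such a generator with the `text/event-stream` content type, which lets the browser render the answer as it is produced rather than waiting for the full response.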

4. Agent Workflows

  • Visual canvas builder
  • 15+ built-in components
  • 20+ external tool integrations

5. Knowledge Graph (GraphRAG)

  • Entity extraction and resolution
  • Graph-based retrieval
  • Relationship visualization

High-Level Architecture

┌─────────────────────────────────────────────────────────────────┐
│                             CLIENTS                             │
│            Web App │ Mobile │ Python SDK │ REST API             │
└────────────────────────────┬────────────────────────────────────┘
                             │
┌────────────────────────────┴────────────────────────────────────┐
│                         NGINX (Gateway)                         │
└────────────────────────────┬────────────────────────────────────┘
                             │
┌────────────────────────────┴────────────────────────────────────┐
│                        APPLICATION LAYER                        │
│  ┌──────────────┐  ┌──────────────┐  ┌──────────────┐           │
│  │RAGFlow Server│  │ Admin Server │  │  MCP Server  │           │
│  │  (Port 9380) │  │  (Port 9381) │  │  (Port 9382) │           │
│  └──────────────┘  └──────────────┘  └──────────────┘           │
└────────────────────────────┬────────────────────────────────────┘
                             │
┌────────────────────────────┴────────────────────────────────────┐
│                          SERVICE LAYER                          │
│  ┌────────┐ ┌────────┐ ┌────────┐ ┌────────┐ ┌────────┐         │
│  │  RAG   │ │DeepDoc │ │ Agent  │ │GraphRAG│ │Services│         │
│  │Pipeline│ │Parsers │ │ Canvas │ │ Engine │ │ Layer  │         │
│  └────────┘ └────────┘ └────────┘ └────────┘ └────────┘         │
└────────────────────────────┬────────────────────────────────────┘
                             │
┌────────────────────────────┴────────────────────────────────────┐
│                           DATA LAYER                            │
│  ┌────────┐ ┌─────────────┐ ┌────────┐ ┌────────┐               │
│  │ MySQL  │ │Elasticsearch│ │ Redis  │ │ MinIO  │               │
│  │ (5455) │ │   (9200)    │ │ (6379) │ │ (9000) │               │
│  └────────┘ └─────────────┘ └────────┘ └────────┘               │
└─────────────────────────────────────────────────────────────────┘

Tech Stack Summary

Layer           Technologies
Frontend        React 18, TypeScript, UmiJS, Ant Design, Tailwind CSS
Backend         Python 3.10-3.12, Flask/Quart, Peewee ORM
AI/ML           OpenAI, Sentence Transformers, Detectron2, PyTorch
Database        MySQL 8, Elasticsearch 8, Redis 7
Storage         MinIO (S3-compatible)
Infrastructure  Docker, Nginx, Kubernetes/Helm

LLM Providers Supported

  • OpenAI (GPT-3.5, GPT-4, GPT-4V)
  • Anthropic (Claude 3)
  • Google (Gemini)
  • Alibaba (Qwen)
  • Groq, Mistral, Cohere
  • Ollama (local models)
  • 20+ more providers

Data Connectors

  • Enterprise: Confluence, Notion, SharePoint, Jira
  • Communication: Slack, Discord, Gmail, Teams
  • Storage: Google Drive, Dropbox, S3, WebDAV

Quick Stats

Metric           Value
Total LOC        62,000+
Python Files     300+
TS/JS Files      400+
Database Models  25+
API Endpoints    50+
LLM Providers    20+
Data Connectors  15+

License

RAGFlow is open source under the Apache 2.0 license.


Documentation generated: 2025-11-26