docs: Add comprehensive RAGFlow analysis documentation

- Add directory structure analysis (01_directory_structure.md)
- Add system architecture with diagrams (02_system_architecture.md)
- Add sequence diagrams for main flows (03_sequence_diagrams.md)
- Add detailed modules analysis (04_modules_analysis.md)
- Add tech stack documentation (05_tech_stack.md)
- Add source code analysis (06_source_code_analysis.md)
- Add README summary for personal_analyze folder

This documentation provides:
- Complete codebase structure overview
- System architecture diagrams (ASCII art)
- Sequence diagrams for authentication, RAG, chat, agent flows
- Detailed analysis of API, RAG, DeepDoc, Agent, GraphRAG modules
- Full tech stack with 150+ dependencies analyzed
- Source code patterns and best practices analysis

2025-11-26 10:20:05 +00:00

27 KiB

Raw Blame History

RAGFlow - Phân Tích Chi Tiết Các Module

1. Module API (`/api/`)

1.1 Tổng Quan

Module API là trung tâm xử lý tất cả HTTP requests của hệ thống. Được xây dựng trên Flask/Quart framework với kiến trúc Blueprint.

1.2 Cấu Trúc

api/
├── ragflow_server.py      # Entry point - Khởi tạo Flask app
├── settings.py            # Cấu hình server
├── constants.py           # API_VERSION = "v1"
├── validation.py          # Request validation
│
├── apps/                  # API Blueprints
├── db/                    # Database layer
└── utils/                 # Utilities

1.3 Chi Tiết Các Blueprint (API Apps)

1.3.1 `kb_app.py` - Knowledge Base Management

Chức năng: Quản lý Knowledge Base (tạo, xóa, sửa, liệt kê)

Endpoints chính:

Method	Endpoint	Mô tả
POST	`/api/v1/kb/create`	Tạo KB mới
GET	`/api/v1/kb/list`	Liệt kê KBs
PUT	`/api/v1/kb/update`	Cập nhật KB
DELETE	`/api/v1/kb/delete`	Xóa KB
GET	`/api/v1/kb/{id}`	Chi tiết KB

Logic chính:

Validation tenant permissions
Tạo Elasticsearch index cho mỗi KB
Quản lý embedding model settings
Quản lý parser configurations

1.3.2 `document_app.py` - Document Management

Chức năng: Upload, parsing, và quản lý documents

Endpoints chính:

Method	Endpoint	Mô tả
POST	`/api/v1/document/upload`	Upload file
POST	`/api/v1/document/run`	Trigger parsing
GET	`/api/v1/document/list`	Liệt kê docs
DELETE	`/api/v1/document/delete`	Xóa document
GET	`/api/v1/document/{id}/chunks`	Lấy chunks

Logic chính:

File type validation
MinIO storage integration
Background task queuing
Parsing status tracking

1.3.3 `dialog_app.py` - Chat/Dialog Management

Chức năng: Xử lý chat conversations với RAG

Endpoints chính:

Method	Endpoint	Mô tả
POST	`/api/v1/dialog/create`	Tạo dialog
POST	`/api/v1/dialog/chat`	Chat (SSE streaming)
POST	`/api/v1/dialog/completion`	Non-streaming chat
GET	`/api/v1/dialog/list`	Liệt kê dialogs

Logic chính:

RAG pipeline orchestration
Streaming response (SSE)
Conversation history management
Multi-KB retrieval

1.3.4 `canvas_app.py` - Agent Workflow

Chức năng: Visual workflow builder cho AI agents

Endpoints chính:

Method	Endpoint	Mô tả
POST	`/api/v1/canvas/create`	Tạo workflow
POST	`/api/v1/canvas/run`	Execute workflow
PUT	`/api/v1/canvas/update`	Cập nhật
GET	`/api/v1/canvas/list`	Liệt kê

Logic chính:

DSL parsing và validation
Component orchestration
Tool integration
Variable passing between nodes

1.3.5 `file_app.py` - File Management

Chức năng: Upload, download, quản lý files

Endpoints chính:

Method	Endpoint	Mô tả
POST	`/api/v1/file/upload`	Upload file
GET	`/api/v1/file/download/{id}`	Download
GET	`/api/v1/file/list`	Liệt kê files
DELETE	`/api/v1/file/delete`	Xóa file

1.3.6 `search_app.py` - Search Operations

Chức năng: Full-text và semantic search

Endpoints chính:

Method	Endpoint	Mô tả
POST	`/api/v1/search`	Hybrid search
GET	`/api/v1/search/history`	Search history

1.4 Database Services (`/api/db/services/`)

`dialog_service.py` (37KB - Service phức tạp nhất)

class DialogService:
    def chat(dialog_id, question, stream=True):
        """
        Main RAG chat function
        1. Load dialog configuration
        2. Get relevant documents (retrieval)
        3. Rerank results
        4. Build prompt with context
        5. Call LLM (streaming)
        6. Save conversation
        """

    def retrieval(dialog, question):
        """
        Hybrid retrieval from Elasticsearch
        - Vector similarity search
        - BM25 full-text search
        - Score combination
        """

    def rerank(chunks, question):
        """
        Cross-encoder reranking
        - Score each chunk against question
        - Return top-k
        """

`document_service.py` (39KB)

class DocumentService:
    def upload(file, kb_id):
        """Upload file to MinIO, create DB record"""

    def parse(doc_id):
        """Queue document for background parsing"""

    def chunk(doc_id, chunks):
        """Save parsed chunks to ES and DB"""

    def delete(doc_id):
        """Remove doc, chunks, and file"""

`knowledgebase_service.py` (21KB)

class KnowledgebaseService:
    def create(name, embedding_model, parser_id):
        """Create KB with ES index"""

    def update_parser_config(kb_id, config):
        """Update chunking/parsing settings"""

    def get_statistics(kb_id):
        """Get doc count, chunk count, etc."""

1.5 Database Models (`/api/db/db_models.py`)

25+ Models quan trọng:

# User & Tenant
class User(BaseModel):
    id, email, password, nickname, avatar, status, login_channel

class Tenant(BaseModel):
    id, name, public_key, llm_id, embd_id, parser_id, credit

class UserTenant(BaseModel):
    user_id, tenant_id, role  # owner, admin, member

# Knowledge Management
class Knowledgebase(BaseModel):
    id, tenant_id, name, description, embd_id, parser_id,
    similarity_threshold, vector_similarity_weight, ...

class Document(BaseModel):
    id, kb_id, name, location, size, type, parser_id,
    status, progress, chunk_num, token_num, process_duation

class File(BaseModel):
    id, tenant_id, name, size, location, type, source_type

# Chat & Dialog
class Dialog(BaseModel):
    id, tenant_id, name, description, kb_ids, llm_id,
    prompt_config, similarity_threshold, top_n, top_k

class Conversation(BaseModel):
    id, dialog_id, name, message  # JSON array of messages

# Workflow
class UserCanvas(BaseModel):
    id, tenant_id, name, dsl, avatar  # DSL is workflow definition

class CanvasTemplate(BaseModel):
    id, name, dsl, avatar  # Pre-built templates

# Integration
class APIToken(BaseModel):
    id, tenant_id, token, dialog_id  # For external API access

class MCPServer(BaseModel):
    id, tenant_id, name, host, tools  # MCP server config

2. Module RAG (`/rag/`)

2.1 Tổng Quan

Core RAG processing engine - xử lý từ document parsing đến retrieval.

2.2 LLM Abstractions (`/rag/llm/`)

`chat_model.py` - Chat LLM Interface

class Base:
    """Abstract base for all chat models"""
    def chat(messages, stream=True, **kwargs):
        """Generate chat completion"""

class OpenAIChat(Base):
    """OpenAI GPT models"""

class ClaudeChat(Base):
    """Anthropic Claude models"""

class QwenChat(Base):
    """Alibaba Qwen models"""

class OllamaChat(Base):
    """Local Ollama models"""

# Factory function
def get_chat_model(model_name, api_key, base_url):
    """Return appropriate chat model instance"""

Supported Providers (20+):

OpenAI (GPT-3.5, GPT-4, GPT-4V)
Anthropic (Claude 3)
Google (Gemini)
Alibaba (Qwen, Qwen-VL)
Groq
Mistral
Cohere
DeepSeek
Zhipu (GLM)
Moonshot
Ollama (local)
NVIDIA
Bedrock (AWS)
Azure OpenAI
Hugging Face
...

`embedding_model.py` - Embedding Interface

class Base:
    """Abstract base for embeddings"""
    def encode(texts: List[str]) -> List[List[float]]:
        """Generate embeddings for texts"""

class OpenAIEmbed(Base):
    """text-embedding-ada-002, text-embedding-3-*"""

class BGEEmbed(Base):
    """BAAI BGE models"""

class JinaEmbed(Base):
    """Jina AI embeddings"""

# Supported embedding models:
# - OpenAI: ada-002, embedding-3-small, embedding-3-large
# - BGE: bge-base, bge-large, bge-m3
# - Jina: jina-embeddings-v2
# - Cohere: embed-english-v3
# - HuggingFace: sentence-transformers
# - Local: Ollama embeddings

`rerank_model.py` - Reranking Interface

class Base:
    """Abstract base for rerankers"""
    def rerank(query: str, documents: List[str]) -> List[float]:
        """Score documents against query"""

class CohereRerank(Base):
    """Cohere rerank models"""

class JinaRerank(Base):
    """Jina AI reranker"""

class BGERerank(Base):
    """BAAI BGE reranker"""

2.3 RAG Pipeline (`/rag/flow/`)

Pipeline Architecture

Document → Parser → Tokenizer → Splitter → Embedder → Index

`parser/parser.py`

def parse(file_path, parser_config):
    """
    Parse document based on file type
    Returns: List of text segments with metadata
    """
    # Supported parsers:
    # - naive: Simple text extraction
    # - paper: Academic paper structure
    # - book: Book chapter detection
    # - laws: Legal document parsing
    # - presentation: PPT parsing
    # - qa: Q&A format extraction
    # - table: Table extraction
    # - picture: Image description
    # - one: Single chunk per doc
    # - audio: Audio transcription
    # - email: Email thread parsing

`splitter/splitter.py`

class Splitter:
    """Document chunking strategies"""

    def split_by_tokens(text, chunk_size=512, overlap=128):
        """Token-based splitting"""

    def split_by_sentences(text, max_sentences=10):
        """Sentence-based splitting"""

    def split_by_delimiter(text, delimiter='\n\n'):
        """Delimiter-based splitting"""

    def split_semantic(text, threshold=0.5):
        """Semantic similarity based splitting"""

`tokenizer/tokenizer.py`

class Tokenizer:
    """Text tokenization"""

    def tokenize(text):
        """Convert text to tokens"""

    def count_tokens(text):
        """Count tokens in text"""

    # Uses tiktoken for OpenAI models
    # Uses model-specific tokenizers for others

2.4 RAPTOR (`/rag/raptor.py`)

RAPTOR = Recursive Abstractive Processing for Tree-Organized Retrieval

class RAPTOR:
    """
    Hierarchical document representation
    - Clusters similar chunks
    - Creates summaries of clusters
    - Builds tree structure for retrieval
    """

    def build_tree(chunks):
        """Build RAPTOR tree from chunks"""

    def retrieve(query, tree):
        """Retrieve from tree structure"""

3. Module DeepDoc (`/deepdoc/`)

3.1 Tổng Quan

Deep document understanding với layout analysis và OCR.

3.2 Document Parsers (`/deepdoc/parser/`)

`pdf_parser.py` - PDF Processing

class PdfParser:
    """
    Advanced PDF parsing with:
    - OCR for scanned pages
    - Layout analysis (tables, figures, headers)
    - Multi-column detection
    - Image extraction
    """

    def __call__(file_path):
        """Parse PDF file"""
        # 1. Extract text with PyMuPDF
        # 2. Apply OCR if needed (Tesseract)
        # 3. Analyze layout (detectron2/layoutlm)
        # 4. Extract tables (camelot/tabula)
        # 5. Extract images
        # Return structured content

`docx_parser.py` - Word Documents

class DocxParser:
    """
    Parse .docx files
    - Text extraction
    - Table extraction
    - Image extraction
    - Style preservation
    """

`excel_parser.py` - Spreadsheets

class ExcelParser:
    """
    Parse .xlsx/.xls files
    - Sheet-by-sheet processing
    - Table structure preservation
    - Formula evaluation
    """

`html_parser.py` - Web Pages

class HtmlParser:
    """
    Parse HTML content
    - Clean HTML
    - Extract main content
    - Handle tables
    - Remove scripts/styles
    """

3.3 Vision Module (`/deepdoc/vision/`)

class LayoutAnalyzer:
    """
    Document layout analysis using ML
    - Detectron2 for object detection
    - LayoutLM for document understanding
    """

    def analyze(image):
        """
        Detect document regions:
        - Title
        - Paragraph
        - Table
        - Figure
        - Header/Footer
        - List
        """

4. Module Agent (`/agent/`)

4.1 Tổng Quan

Agentic workflow system với visual canvas builder.

4.2 Canvas Engine (`/agent/canvas.py`)

class Canvas:
    """
    Main workflow orchestrator
    - Parse DSL definition
    - Execute components in order
    - Handle branching logic
    - Manage variables
    """

    def __init__(self, dsl):
        """Initialize from DSL"""
        self.components = self._parse_dsl(dsl)
        self.graph = self._build_graph()

    def run(self, input_data):
        """Execute workflow"""
        context = {"input": input_data}

        for component in self._topological_sort():
            result = component.execute(context)
            context.update(result)

        return context["output"]

4.3 Components (`/agent/component/`)

`begin.py` - Workflow Start

class BeginComponent:
    """
    Entry point of workflow
    - Initialize variables
    - Receive user input
    """
    def execute(self, context):
        return {"user_input": context["input"]}

`llm.py` - LLM Component

class LLMComponent:
    """
    Call LLM with configured prompt
    - Template variable substitution
    - Streaming support
    - Output parsing
    """
    def execute(self, context):
        prompt = self.template.format(**context)
        response = self.llm.chat(prompt)
        return {"llm_output": response}

`retrieval.py` - Retrieval Component

class RetrievalComponent:
    """
    Search knowledge bases
    - Multi-KB search
    - Configurable top_k
    - Score threshold
    """
    def execute(self, context):
        query = context["user_input"]
        results = self.search(query, self.kb_ids)
        return {"retrieved_docs": results}

`categorize.py` - Conditional Branching

class CategorizeComponent:
    """
    Route to different paths based on conditions
    - LLM-based classification
    - Rule-based matching
    """
    def execute(self, context):
        category = self._classify(context)
        return {"next_node": self.routes[category]}

`agent_with_tools.py` - Tool-Using Agent

class AgentWithToolsComponent:
    """
    ReAct pattern agent
    - Tool selection
    - Iterative reasoning
    - Observation handling
    """
    def execute(self, context):
        while not done:
            action = self.llm.decide_action(context)
            if action.type == "tool":
                result = self.tools[action.tool].run(action.input)
                context["observation"] = result
            else:
                return {"output": action.response}

4.4 Tools (`/agent/tools/`)

External Tool Integrations

Tool	File	Chức năng
Tavily	`tavily.py`	Web search API
ArXiv	`arxiv.py`	Academic paper search
Google	`google.py`	Google search
Wikipedia	`wikipedia.py`	Wikipedia lookup
GitHub	`github.py`	GitHub API
Email	`email.py`	Send emails
Code Exec	`code_exec.py`	Execute Python code
DeepL	`deepl.py`	Translation
Jin10	`jin10.py`	Financial news
TuShare	`tushare.py`	Chinese stock data
Yahoo Finance	`yahoofinance.py`	Stock data
QWeather	`qweather.py`	Weather data

class BaseTool:
    """Base class for all tools"""
    name: str
    description: str

    def run(self, input: str) -> str:
        """Execute tool and return result"""

class TavilySearch(BaseTool):
    name = "tavily_search"
    description = "Search the web for current information"

    def run(self, query):
        response = tavily.search(query)
        return format_results(response)

5. Module GraphRAG (`/graphrag/`)

5.1 Tổng Quan

Knowledge graph construction và querying.

5.2 Entity Resolution (`/graphrag/entity_resolution.py`)

class EntityResolution:
    """
    Entity extraction và linking
    - Extract entities from text
    - Cluster similar entities
    - Resolve duplicates
    """

    def extract_entities(text):
        """Extract named entities using LLM"""
        prompt = f"Extract entities from: {text}"
        return llm.chat(prompt)

    def resolve_entities(entities):
        """Merge duplicate entities"""
        clusters = self._cluster_similar(entities)
        return self._merge_clusters(clusters)

5.3 Graph Search (`/graphrag/search.py`)

class GraphSearch:
    """
    Query knowledge graph
    - Entity-based search
    - Relationship traversal
    - Subgraph extraction
    """

    def search(query):
        """Find relevant subgraph for query"""
        # 1. Extract query entities
        # 2. Find matching graph entities
        # 3. Traverse relationships
        # 4. Return context subgraph

6. Module Frontend (`/web/`)

6.1 Tổng Quan

React/TypeScript SPA với UmiJS framework.

6.2 Pages (`/web/src/pages/`)

Page	Chức năng
`/dataset`	Knowledge base management
`/datasets`	Dataset list view
`/next-chats`	Chat interface
`/next-searches`	Search interface
`/document-viewer`	Document preview
`/admin`	Admin dashboard
`/login`	Authentication
`/register`	User registration

6.3 Components (`/web/src/components/`)

Core Components:

file-upload-modal/ - File upload UI
pdf-drawer/ - PDF preview drawer
prompt-editor/ - Prompt template editor
document-preview/ - Document viewer
llm-setting-items/ - LLM configuration UI
ui/ - Shadcn/UI base components

6.4 State Management

// Using Zustand for state
import { create } from 'zustand';

interface KnowledgebaseStore {
  knowledgebases: Knowledgebase[];
  currentKb: Knowledgebase | null;
  fetchKnowledgebases: () => Promise<void>;
  createKnowledgebase: (data: CreateKbRequest) => Promise<void>;
}

export const useKnowledgebaseStore = create<KnowledgebaseStore>((set) => ({
  knowledgebases: [],
  currentKb: null,
  fetchKnowledgebases: async () => {
    const data = await api.get('/kb/list');
    set({ knowledgebases: data });
  },
  // ...
}));

6.5 API Services (`/web/src/services/`)

// API client using Axios
import { request } from 'umi';

export async function createKnowledgebase(data: CreateKbRequest) {
  return request('/api/v1/kb/create', {
    method: 'POST',
    data,
  });
}

export async function chat(dialogId: string, question: string) {
  return request('/api/v1/dialog/chat', {
    method: 'POST',
    data: { dialog_id: dialogId, question },
    responseType: 'stream',
  });
}

7. Module Common (`/common/`)

7.1 Configuration (`/common/settings.py`)

# Main configuration file
class Settings:
    # Database
    MYSQL_HOST = os.getenv('MYSQL_HOST', 'localhost')
    MYSQL_PORT = int(os.getenv('MYSQL_PORT', 5455))
    MYSQL_USER = os.getenv('MYSQL_USER', 'root')
    MYSQL_PASSWORD = os.getenv('MYSQL_PASSWORD', 'infini_rag_flow')
    MYSQL_DATABASE = os.getenv('MYSQL_DATABASE', 'ragflow')

    # Elasticsearch
    ES_HOSTS = os.getenv('ES_HOSTS', 'http://localhost:9200').split(',')

    # Redis
    REDIS_HOST = os.getenv('REDIS_HOST', 'localhost')
    REDIS_PORT = int(os.getenv('REDIS_PORT', 6379))

    # MinIO
    MINIO_HOST = os.getenv('MINIO_HOST', 'localhost:9000')
    MINIO_ACCESS_KEY = os.getenv('MINIO_USER', 'rag_flow')
    MINIO_SECRET_KEY = os.getenv('MINIO_PASSWORD', 'infini_rag_flow')

    # Document Engine
    DOC_ENGINE = os.getenv('DOC_ENGINE', 'elasticsearch')  # or 'infinity'

7.2 Data Source Connectors (`/common/data_source/`)

Supported Connectors:

Connector	File	Chức năng
Confluence	`confluence_connector.py` (81KB)	Atlassian Confluence wiki
Notion	`notion_connector.py` (25KB)	Notion databases
Slack	`slack_connector.py` (22KB)	Slack messages
Gmail	`gmail_connector.py`	Gmail emails
Discord	`discord_connector.py`	Discord channels
SharePoint	`sharepoint_connector.py`	Microsoft SharePoint
Teams	`teams_connector.py`	Microsoft Teams
Dropbox	`dropbox_connector.py`	Dropbox files
Google Drive	`google_drive/`	Google Drive
WebDAV	`webdav_connector.py`	WebDAV servers
Moodle	`moodle_connector.py`	Moodle LMS

class BaseConnector:
    """Abstract base for connectors"""

    def authenticate(credentials):
        """Authenticate with external service"""

    def list_items():
        """List available items"""

    def sync():
        """Sync data to RAGFlow"""

class ConfluenceConnector(BaseConnector):
    """Confluence integration"""

    def __init__(self, url, username, api_token):
        self.client = Confluence(url, username, api_token)

    def sync_space(space_key):
        """Sync all pages from a space"""
        pages = self.client.get_all_pages(space_key)
        for page in pages:
            content = self._convert_to_markdown(page.body)
            yield Document(content=content, metadata=page.metadata)

8. Module SDK (`/sdk/python/`)

8.1 Python SDK

from ragflow import RAGFlow

# Initialize client
client = RAGFlow(
    api_key="your-api-key",
    base_url="http://localhost:9380"
)

# Create knowledge base
kb = client.create_knowledgebase(
    name="My KB",
    embedding_model="text-embedding-3-small"
)

# Upload document
doc = kb.upload_document("path/to/document.pdf")

# Wait for parsing
doc.wait_for_ready()

# Create chat
chat = client.create_chat(
    name="My Chat",
    knowledgebase_ids=[kb.id]
)

# Send message
response = chat.send_message("What is this document about?")
print(response.answer)

9. Tóm Tắt Module Dependencies

┌─────────────────────────────────────────────────────────────────┐
│                         Frontend (web/)                          │
└─────────────────────────────┬───────────────────────────────────┘
                              │ HTTP/SSE
                              ▼
┌─────────────────────────────────────────────────────────────────┐
│                          API (api/)                              │
│   ┌─────────┐  ┌─────────┐  ┌─────────┐  ┌─────────┐           │
│   │ kb_app  │  │doc_app  │  │dialog_  │  │canvas_  │           │
│   │         │  │         │  │app      │  │app      │           │
│   └────┬────┘  └────┬────┘  └────┬────┘  └────┬────┘           │
│        └────────────┴───────────┴────────────┘                  │
│                              │                                   │
│   ┌──────────────────────────┴──────────────────────────┐      │
│   │                    Services Layer                    │      │
│   │  DialogService │ DocumentService │ KBService         │      │
│   └───────────────────────────┬─────────────────────────┘      │
└───────────────────────────────┼─────────────────────────────────┘
                                │
        ┌───────────────────────┼───────────────────────┐
        │                       │                       │
        ▼                       ▼                       ▼
┌───────────────┐    ┌──────────────────┐    ┌──────────────────┐
│   RAG (rag/)  │    │  Agent (agent/)  │    │GraphRAG(graphrag)│
│               │    │                  │    │                  │
│ - LLM Models  │    │ - Canvas Engine  │    │ - Entity Res.    │
│ - Pipeline    │    │ - Components     │    │ - Graph Search   │
│ - Embeddings  │    │ - Tools          │    │                  │
└───────┬───────┘    └────────┬─────────┘    └────────┬─────────┘
        │                     │                       │
        └─────────────────────┼───────────────────────┘
                              │
                              ▼
┌─────────────────────────────────────────────────────────────────┐
│                      DeepDoc (deepdoc/)                          │
│                                                                  │
│   PDF Parser │ DOCX Parser │ HTML Parser │ Vision/OCR           │
└─────────────────────────────────────────────────────────────────┘
                              │
                              ▼
┌─────────────────────────────────────────────────────────────────┐
│                       Common (common/)                           │
│                                                                  │
│   Settings │ Utilities │ Data Source Connectors                 │
└─────────────────────────────────────────────────────────────────┘
                              │
                              ▼
┌─────────────────────────────────────────────────────────────────┐
│                      Data Stores                                 │
│                                                                  │
│   MySQL │ Elasticsearch/Infinity │ Redis │ MinIO                │
└─────────────────────────────────────────────────────────────────┘

10. Kích Thước Code Ước Tính

Module	Lines of Code	Complexity
api/	~15,000	High
rag/	~8,000	High
deepdoc/	~5,000	Medium
agent/	~6,000	High
graphrag/	~3,000	Medium
web/src/	~20,000	High
common/	~5,000	Medium
Total	~62,000	-

27 KiB Raw Blame History

RAGFlow - Phân Tích Chi Tiết Các Module

1. Module API (/api/)

1.1 Tổng Quan

1.2 Cấu Trúc

1.3 Chi Tiết Các Blueprint (API Apps)

1.3.1 kb_app.py - Knowledge Base Management

1.3.2 document_app.py - Document Management

1.3.3 dialog_app.py - Chat/Dialog Management

1.3.4 canvas_app.py - Agent Workflow

1.3.5 file_app.py - File Management

1.3.6 search_app.py - Search Operations

1.4 Database Services (/api/db/services/)

dialog_service.py (37KB - Service phức tạp nhất)

document_service.py (39KB)

knowledgebase_service.py (21KB)

1.5 Database Models (/api/db/db_models.py)

2. Module RAG (/rag/)

2.1 Tổng Quan

2.2 LLM Abstractions (/rag/llm/)

chat_model.py - Chat LLM Interface

embedding_model.py - Embedding Interface

rerank_model.py - Reranking Interface

2.3 RAG Pipeline (/rag/flow/)

Pipeline Architecture

parser/parser.py

splitter/splitter.py

tokenizer/tokenizer.py

2.4 RAPTOR (/rag/raptor.py)

3. Module DeepDoc (/deepdoc/)

3.1 Tổng Quan

3.2 Document Parsers (/deepdoc/parser/)

pdf_parser.py - PDF Processing

docx_parser.py - Word Documents

excel_parser.py - Spreadsheets

html_parser.py - Web Pages

3.3 Vision Module (/deepdoc/vision/)

4. Module Agent (/agent/)

4.1 Tổng Quan

4.2 Canvas Engine (/agent/canvas.py)

4.3 Components (/agent/component/)

begin.py - Workflow Start

llm.py - LLM Component

retrieval.py - Retrieval Component

categorize.py - Conditional Branching

agent_with_tools.py - Tool-Using Agent

4.4 Tools (/agent/tools/)

External Tool Integrations

5. Module GraphRAG (/graphrag/)

5.1 Tổng Quan

5.2 Entity Resolution (/graphrag/entity_resolution.py)

5.3 Graph Search (/graphrag/search.py)

6. Module Frontend (/web/)

6.1 Tổng Quan

6.2 Pages (/web/src/pages/)

6.3 Components (/web/src/components/)

6.4 State Management

6.5 API Services (/web/src/services/)

7. Module Common (/common/)

7.1 Configuration (/common/settings.py)

7.2 Data Source Connectors (/common/data_source/)

8. Module SDK (/sdk/python/)

8.1 Python SDK

9. Tóm Tắt Module Dependencies

10. Kích Thước Code Ước Tính

27 KiB

Raw Blame History

1. Module API (`/api/`)

1.3.1 `kb_app.py` - Knowledge Base Management

1.3.2 `document_app.py` - Document Management

1.3.3 `dialog_app.py` - Chat/Dialog Management

1.3.4 `canvas_app.py` - Agent Workflow

1.3.5 `file_app.py` - File Management

1.3.6 `search_app.py` - Search Operations

1.4 Database Services (`/api/db/services/`)

`dialog_service.py` (37KB - Service phức tạp nhất)

`document_service.py` (39KB)

`knowledgebase_service.py` (21KB)

1.5 Database Models (`/api/db/db_models.py`)

2. Module RAG (`/rag/`)

2.2 LLM Abstractions (`/rag/llm/`)

`chat_model.py` - Chat LLM Interface

`embedding_model.py` - Embedding Interface

`rerank_model.py` - Reranking Interface

2.3 RAG Pipeline (`/rag/flow/`)

`parser/parser.py`

`splitter/splitter.py`

`tokenizer/tokenizer.py`

2.4 RAPTOR (`/rag/raptor.py`)

3. Module DeepDoc (`/deepdoc/`)

3.2 Document Parsers (`/deepdoc/parser/`)

`pdf_parser.py` - PDF Processing

`docx_parser.py` - Word Documents

`excel_parser.py` - Spreadsheets

`html_parser.py` - Web Pages

3.3 Vision Module (`/deepdoc/vision/`)

4. Module Agent (`/agent/`)

4.2 Canvas Engine (`/agent/canvas.py`)

4.3 Components (`/agent/component/`)

`begin.py` - Workflow Start

`llm.py` - LLM Component

`retrieval.py` - Retrieval Component

`categorize.py` - Conditional Branching

`agent_with_tools.py` - Tool-Using Agent

4.4 Tools (`/agent/tools/`)

5. Module GraphRAG (`/graphrag/`)

5.2 Entity Resolution (`/graphrag/entity_resolution.py`)

5.3 Graph Search (`/graphrag/search.py`)

6. Module Frontend (`/web/`)

6.2 Pages (`/web/src/pages/`)

6.3 Components (`/web/src/components/`)

6.5 API Services (`/web/src/services/`)

7. Module Common (`/common/`)

7.1 Configuration (`/common/settings.py`)

7.2 Data Source Connectors (`/common/data_source/`)

8. Module SDK (`/sdk/python/`)