2025-12-01 05:06:53 +07:00

How the Agent Workflow Works in RAGFlow

System Overview

RAGFlow implements an agent workflow system that lets users define and execute complex workflows through a JSON-based Domain Specific Language (DSL). The system is built on a graph-based architecture in which components are connected to, and interact with, one another.

1. DSL Architecture (Domain Specific Language)

1.1. Overall Structure of the DSL

The DSL in RAGFlow is a JSON structure that describes a workflow as a directed graph. Each workflow is defined in a JSON file with the following structure:

{
  "components": {
    "component_id": {
      "obj": {
        "component_name": "ComponentType",
        "params": { /* component configuration */ }
      },
      "downstream": ["next_component_id"],
      "upstream": ["previous_component_id"]
    }
  },
  "globals": {
    "sys.query": "",
    "sys.user_id": "",
    "sys.conversation_turns": 0,
    "sys.files": []
  },
  "variables": { /* user-defined variables */ },
  "history": [ /* conversation history */ ],
  "path": ["begin"],
  "retrieval": { "chunks": [], "doc_aggs": [] },
  "memory": []
}

Reference file: ragflow/agent/canvas.py:36-73
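
To make the graph structure concrete, here is a minimal sketch that builds a two-node DSL dictionary and checks that every downstream edge is mirrored by an upstream entry. The helper check_edges, the ID Message:Hello, and the empty params are assumptions made for illustration; they are not part of RAGFlow.

```python
from typing import Any


def check_edges(dsl: dict[str, Any]) -> list[str]:
    """Return human-readable problems found in the DSL edges (hypothetical helper)."""
    problems = []
    components = dsl["components"]
    for cid, cpn in components.items():
        for nxt in cpn.get("downstream", []):
            if nxt not in components:
                problems.append(f"{cid}: unknown downstream {nxt}")
            elif cid not in components[nxt].get("upstream", []):
                problems.append(f"{nxt}: upstream is missing {cid}")
    return problems


# A tiny workflow: begin -> Message:Hello (params left empty for brevity)
dsl = {
    "components": {
        "begin": {
            "obj": {"component_name": "Begin", "params": {}},
            "downstream": ["Message:Hello"],
            "upstream": [],
        },
        "Message:Hello": {
            "obj": {"component_name": "Message", "params": {}},
            "downstream": [],
            "upstream": ["begin"],
        },
    },
    "globals": {"sys.query": "", "sys.user_id": "", "sys.conversation_turns": 0, "sys.files": []},
    "path": ["begin"],
}

print(check_edges(dsl))  # an empty list means every edge is mirrored
```

A check like this can be handy when a DSL file is edited by hand, since a missing upstream entry is easy to overlook.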

1.2. Main Parts of the DSL

a) Components - Nodes in the Graph

Each component represents one processing step in the workflow. Components are identified by a unique ID in the format ComponentType:UniqueIdentifier.

A real example from customer_service.json:

"Agent:TwelveOwlsWatch": {
  "downstream": ["VariableAggregator:FuzzyBerriesFlow"],
  "obj": {
    "component_name": "Agent",
    "params": {
      "llm_id": "deepseek-chat@DeepSeek",
      "max_rounds": 5,
      "sys_prompt": "You are a friendly and casual conversational assistant...",
      "prompts": [
        {
          "content": "The user query is {sys.query}",
          "role": "user"
        }
      ]
    }
  },
  "upstream": ["Categorize:DullFriendsThank"]
}

Reference file: ragflow/agent/templates/customer_service.json:116-166

b) Downstream/Upstream - Defining the Execution Flow

downstream: the list of components to execute next
upstream: the list of components that execute before this one

This is how the DSL describes a directed graph: nodes are connected to each other to form the processing flow.

Logic in the code (ragflow/agent/canvas.py:543):

# After a component finishes executing, the system appends its downstream components to the path
_extend_path(cpn["downstream"])

c) Globals - System Variables

Global variables managed by the system:

sys.query: the user's question
sys.user_id: the user ID
sys.conversation_turns: the number of conversation turns
sys.files: the list of attached files
env.*: user-defined environment variables

Reference file: ragflow/agent/canvas.py:278-283, 343-348

d) Variables - The Variable Reference System

The DSL supports dynamic references between components through the syntax {component_id@output_name}.

Example:

{
  "content": "The user query is {sys.query}\n\nThe relevant document are {Retrieval:ShyPumasJoke@formalized_content}"
}

Regex pattern (ragflow/agent/component/base.py:396):

variable_ref_patt = r"\{* *\{([a-zA-Z:0-9]+@[A-Za-z0-9_.]+|sys\.[A-Za-z0-9_.]+|env\.[A-Za-z0-9_.]+)\} *\}*"

Variable resolution mechanism (ragflow/agent/canvas.py:158-183):

def get_value_with_variable(self, value: str) -> Any:
    pat = re.compile(r"\{* *\{([a-zA-Z:0-9]+@[A-Za-z0-9_.]+|sys\.[A-Za-z0-9_.]+|env\.[A-Za-z0-9_.]+)\} *\}*")
    out_parts = []
    last = 0

    for m in pat.finditer(value):
        out_parts.append(value[last:m.start()])
        key = m.group(1)
        v = self.get_variable_value(key)  # Fetch the actual value from a component or from globals
        # ... process the value into the replacement string rep
        out_parts.append(rep)
        last = m.end()

    out_parts.append(value[last:])
    return "".join(out_parts)

1.3. Built-in Component Types

RAGFlow ships with a library of built-in components:

Component	File	Function
Begin	`agent/component/begin.py`	Entry point of the workflow
Agent	`agent/component/agent_with_tools.py`	LLM agent with tool-calling capability
LLM	`agent/component/llm.py`	Basic LLM call without tools
Retrieval	`agent/tools/retrieval.py`	Search in the knowledge base
Categorize	`agent/component/categorize.py`	Intent classification using an LLM
Switch	`agent/component/switch.py`	Conditional branching
Iteration	`agent/component/iteration.py`	For-each loop
Loop	`agent/component/loop.py`	While loop
Message	`agent/component/message.py`	Formats the output returned to the user
VariableAggregator	`agent/component/variable_assigner.py`	Merges results from multiple branches

Reference file: ragflow/agent/component/__init__.py:51-58

2. Execution Engine - How the Workflow Is Executed

2.1. Class Graph - Core Engine

The Graph class is the base class that holds the core logic for loading and executing a workflow.

File: ragflow/agent/canvas.py:34-273

Loading the DSL

def load(self):
    self.components = self.dsl["components"]

    # Iterate over every component in the DSL
    for k, cpn in self.components.items():
        # Create the ComponentParam object from the component name
        param = component_class(cpn["obj"]["component_name"] + "Param")()
        param.update(cpn["obj"]["params"])

        # Validate parameters
        param.check()

        # Create the actual Component object
        cpn["obj"] = component_class(cpn["obj"]["component_name"])(self, k, param)

    self.path = self.dsl["path"]

Reference file: ragflow/agent/canvas.py:84-101

How it works:

The JSON DSL is parsed into a dictionary
For each component, the matching Parameter object is created (e.g. AgentParam, LLMParam)
Parameters are validated with the check() method
The actual Component object is created (e.g. Agent, LLM) and injected into the graph
Each component receives a reference to self (the Canvas) so it can access global variables
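
The loading steps above can be sketched with a tiny name-to-class registry. Everything here (Echo, EchoParam, REGISTRY, MiniGraph) is a hypothetical stand-in chosen for illustration; the real component_class lookup in RAGFlow may work differently.

```python
class EchoParam:
    """Hypothetical parameter class, standing in for e.g. AgentParam."""

    def __init__(self):
        self.text = ""

    def update(self, conf: dict) -> None:
        for key, value in conf.items():
            setattr(self, key, value)

    def check(self) -> None:
        if not isinstance(self.text, str):
            raise ValueError("text must be a string")


class Echo:
    """Hypothetical component, standing in for e.g. Agent."""

    def __init__(self, graph, cid, param):
        self.graph, self.cid, self.param = graph, cid, param


# Assumed registry: maps a component name to its class
REGISTRY = {"Echo": Echo, "EchoParam": EchoParam}


def component_class(name: str):
    return REGISTRY[name]


class MiniGraph:
    def __init__(self, dsl: dict):
        self.dsl = dsl
        self.components = {}
        self.path = []

    def load(self) -> None:
        self.components = self.dsl["components"]
        for cid, cpn in self.components.items():
            name = cpn["obj"]["component_name"]
            param = component_class(name + "Param")()  # 1. build the Param object
            param.update(cpn["obj"]["params"])         # 2. apply the JSON config
            param.check()                              # 3. validate
            cpn["obj"] = component_class(name)(self, cid, param)  # 4. swap in the live object
        self.path = self.dsl["path"]


graph = MiniGraph({
    "components": {
        "Echo:0": {
            "obj": {"component_name": "Echo", "params": {"text": "hi"}},
            "downstream": [],
            "upstream": [],
        }
    },
    "path": [],
})
graph.load()
print(type(graph.components["Echo:0"]["obj"]).__name__)
```

The point of the pattern is that the "obj" entry starts as plain JSON and is replaced in place by a live object that holds a back-reference to the graph.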

2.2. Class Canvas - Agent Workflow Executor

The Canvas class inherits from Graph and implements the logic specific to agent workflows.

File: ragflow/agent/canvas.py:275-676

The `run()` Method - The Heart of the Execution Engine

This is an async generator function that executes the workflow and yields events in real time.

Signature:

async def run(self, **kwargs):
    # Accepted parameters: query, files, user_id, inputs

Reference file: ragflow/agent/canvas.py:358-583

Step-by-Step Execution Details

Step 1: Initialize State

st = time.perf_counter()
self.message_id = get_uuid()
created_at = int(time.time())

# Save the query to the history
self.add_user_input(kwargs.get("query"))

# Reset the output of every component
for k, cpn in self.components.items():
    self.components[k]["obj"].reset(True)

Reference file: ragflow/agent/canvas.py:359-364

Step 2: Set System Variables

for k in kwargs.keys():
    if k in ["query", "user_id", "files"] and kwargs[k]:
        if k == "files":
            self.globals[f"sys.{k}"] = FileService.get_files(kwargs[k])
        else:
            self.globals[f"sys.{k}"] = kwargs[k]

# Increment the conversation turn counter
self.globals["sys.conversation_turns"] += 1

Reference file: ragflow/agent/canvas.py:372-380

Logic: Parameters supplied by the user are mapped to global variables so that components can reference them as {sys.query}, {sys.files}, and so on.

Step 3: Path Initialization

if not self.path or self.path[-1].lower().find("userfillup") < 0:
    self.path.append("begin")
    self.retrieval.append({"chunks": [], "doc_aggs": []})

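Reference file: ragflow/agent/canvas.py:393-395

Logic: The path is a list that records the order of components that have run, are running, or will run. Every run starts from the begin component, unless the last path entry is a UserFillUp component, as the condition above shows.

Step 4: Yield Workflow Started Event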

def decorate(event, dt):
    return {
        "event": event,
        "message_id": self.message_id,
        "created_at": created_at,
        "task_id": self.task_id,
        "data": dt
    }

yield decorate("workflow_started", {"inputs": kwargs.get("inputs")})

Reference file: ragflow/agent/canvas.py:382-402

Logic: The system uses Server-Sent Events (SSE) to stream events to the frontend in real time. Every event has a standard format with event, message_id, created_at, task_id, and data.
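
As an illustration of how such an event envelope might travel over SSE, the sketch below wraps a payload in a single `data:` frame, which is the standard SSE wire format. The helper to_sse_frame and the sample IDs are assumptions; the document does not show how RAGFlow's API layer serializes events.

```python
import json
import time
import uuid


def decorate(event: str, data: dict, message_id: str, task_id: str, created_at: int) -> dict:
    """Build the event envelope described above (standalone version for illustration)."""
    return {
        "event": event,
        "message_id": message_id,
        "created_at": created_at,
        "task_id": task_id,
        "data": data,
    }


def to_sse_frame(payload: dict) -> str:
    """Serialize one event as a single SSE 'data:' frame (hypothetical helper)."""
    return "data: " + json.dumps(payload, ensure_ascii=False) + "\n\n"


event = decorate("workflow_started", {"inputs": {}}, uuid.uuid4().hex, "task-1", int(time.time()))
frame = to_sse_frame(event)
print(frame, end="")
```

A client can split the stream on blank lines and parse the JSON after the `data: ` prefix to recover each event.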

Step 5: Execute Components in Path

idx = len(self.path) - 1

while idx < len(self.path):
    to = len(self.path)

    # Yield node_started events
    for i in range(idx, to):
        yield decorate("node_started", {
            "component_id": self.path[i],
            "component_name": self.get_component_name(self.path[i]),
            "component_type": self.get_component_type(self.path[i])
        })

    # Execute batch of components
    _run_batch(idx, to)

    # ... post-processing

Reference file: ragflow/agent/canvas.py:444-548

Logic:

path is a dynamic array that can grow during execution
Each iteration executes one batch of components, from idx to to
After execution, components may add their downstream components to the path, which increases len(self.path)
The loop continues until no unexecuted components remain in the path
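
The growing-path loop can be illustrated with a small standalone sketch. The function walk and its duplicate check are assumptions made for this example; the real _extend_path helper is not shown in this document and may behave differently.

```python
def walk(downstream: dict[str, list[str]], start: str) -> list[list[str]]:
    """Return the batches executed when the path grows while it is being consumed."""
    path = [start]
    batches = []
    idx = 0
    while idx < len(path):
        to = len(path)                 # freeze the end of the current batch
        batch = path[idx:to]
        batches.append(batch)
        for cid in batch:
            for nxt in downstream[cid]:
                if nxt not in path[to:]:   # assumption: do not queue a node twice
                    path.append(nxt)
        idx = to                       # next batch starts where this one ended
    return batches


graph = {"begin": ["A", "B"], "A": ["C"], "B": ["C"], "C": []}
print(walk(graph, "begin"))
```

In this example A and B land in the same batch, and C is queued only once even though both of them point to it.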

Step 6: Parallel Execution with ThreadPoolExecutor

def _run_batch(f, t):
    if self.is_canceled():
        raise TaskCanceledException(...)

    with ThreadPoolExecutor(max_workers=5) as executor:
        thr = []
        i = f
        while i < t:
            cpn = self.get_component_obj(self.path[i])

            if cpn.component_name.lower() in ["begin", "userfillup"]:
                thr.append(executor.submit(cpn.invoke, inputs=kwargs.get("inputs", {})))
                i += 1
            else:
                # Check dependencies
                for _, ele in cpn.get_input_elements().items():
                    if isinstance(ele, dict) and ele.get("_cpn_id") and ele.get("_cpn_id") not in self.path[:i]:
                        # If a dependency has not executed yet, skip this component
                        self.path.pop(i)
                        t -= 1
                        break
                else:
                    # Execute component
                    thr.append(executor.submit(cpn.invoke, **cpn.get_input()))
                    i += 1

        # Wait for all threads to complete
        for t in thr:
            t.result()

Reference file: ragflow/agent/canvas.py:405-429

Key points:

The system executes at most 5 components in parallel to improve throughput
Before executing, dependencies are checked: if component A references the output of component B and B has not executed yet, A is removed from the path
Each component is executed in its own thread, but the batch waits for all threads to complete before continuing
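
The batch logic above can be reduced to the following sketch, which runs a slice of the path in a thread pool and drops entries whose dependencies do not appear earlier in the path. The function run_batch, the deps mapping, and the invoke callback are simplifications assumed for illustration.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable


def run_batch(
    path: list[str],
    start: int,
    end: int,
    deps: dict[str, list[str]],
    invoke: Callable[[str], str],
) -> list[str]:
    """Run path[start:end] in parallel, skipping nodes with unmet dependencies."""
    futures = []
    with ThreadPoolExecutor(max_workers=5) as executor:
        i = start
        while i < end:
            cid = path[i]
            # A dependency counts as met only if it appears earlier in the path
            if any(dep not in path[:i] for dep in deps.get(cid, [])):
                path.pop(i)   # drop the node; do not advance i
                end -= 1
                continue
            futures.append(executor.submit(invoke, cid))
            i += 1
    # Leaving the with-block waits for all workers; collect results in order
    return [f.result() for f in futures]


path = ["begin", "A", "C"]
results = run_batch(path, 1, 3, {"C": ["B"]}, lambda cid: f"ran {cid}")
print(results, path)
```

Here C is dropped because it depends on B, which never appeared earlier in the path, so only A runs.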

Step 7: Post-Processing & Branching Logic

for i in range(idx, to):
    cpn = self.get_component(self.path[i])
    cpn_obj = self.get_component_obj(self.path[i])

    # Handle streaming output for the Message component
    if cpn_obj.component_name.lower() == "message":
        if isinstance(cpn_obj.output("content"), partial):
            _m = ""
            for m in cpn_obj.output("content")():
                if m == "<think>":
                    yield decorate("message", {"content": "", "start_to_think": True})
                elif m == "</think>":
                    yield decorate("message", {"content": "", "end_to_think": True})
                else:
                    yield decorate("message", {"content": m})
                    _m += m
            cpn_obj.set_output("content", _m)
        else:
            yield decorate("message", {"content": cpn_obj.output("content")})

    # Error handling
    if cpn_obj.error():
        ex = cpn_obj.exception_handler()
        if ex and ex["goto"]:
            self.path.extend(ex["goto"])  # Jump to error handler
        elif ex and ex["default_value"]:
            yield decorate("message", {"content": ex["default_value"]})
        else:
            self.error = cpn_obj.error()

    # Branching logic
    if cpn_obj.component_name.lower() in ["categorize", "switch"]:
        # The Categorize/Switch component decides the next branch
        _extend_path(cpn_obj.output("_next"))
    elif cpn_obj.component_name.lower() in ("iteration", "loop"):
        # A loop component adds its start node to the path
        _append_path(cpn_obj.get_start())
    else:
        # A normal component adds its downstream components to the path
        _extend_path(cpn["downstream"])

    # Yield node_finished event
    yield _node_finished(cpn_obj)

Reference file: ragflow/agent/canvas.py:459-543

Detailed explanation:

Streaming Output: If a component returns a functools.partial, the system iterates over it and yields each text chunk in real time
Error Handling: A component can define an exception handler with goto (jump to an error-handling component) or default_value (a fallback response)
Branching:
- Categorize/Switch: chooses the branch based on the classification or condition result
- Iteration/Loop: creates a loop by adding the loop's start node to the path
- Normal component: adds all downstream components to the path
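
The deferred-streaming idea can be shown in isolation: an upstream component stores a functools.partial instead of a string, and the consumer calls it only when it is ready to stream. In the sketch below, fake_llm_stream and drain are hypothetical names, and the event dictionaries mirror the shape used in the excerpt above.

```python
from functools import partial
from typing import Iterator, Union


def fake_llm_stream(text: str) -> Iterator[str]:
    """Stand-in for a streaming LLM call (assumption for illustration)."""
    yield "<think>"
    yield "</think>"
    for word in text.split():
        yield word + " "


def drain(content: Union[str, partial]) -> tuple[list[dict], str]:
    """Turn a lazy or plain output into message events plus the full text."""
    events, full = [], ""
    if isinstance(content, partial):
        for chunk in content():          # the generator only starts running here
            if chunk == "<think>":
                events.append({"content": "", "start_to_think": True})
            elif chunk == "</think>":
                events.append({"content": "", "end_to_think": True})
            else:
                events.append({"content": chunk})
                full += chunk
    else:
        events.append({"content": content})
        full = content
    return events, full


lazy_output = partial(fake_llm_stream, "Hello world")
events, full = drain(lazy_output)
print(len(events), repr(full))
```

Storing a partial keeps the producing component cheap to invoke, while the consumer decides when the tokens are actually generated and forwarded.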

Step 8: Workflow Completion

if not self.error:
    yield decorate("workflow_finished", {
        "inputs": kwargs.get("inputs"),
        "outputs": self.get_component_obj(self.path[-1]).output(),
        "elapsed_time": time.perf_counter() - st,
        "created_at": st
    })

    # Save to the conversation history
    self.history.append(("assistant", self.get_component_obj(self.path[-1]).output()))

Reference file: ragflow/agent/canvas.py:566-574

3. Component Architecture - How Components Are Defined

3.1. ComponentParamBase - Base Class for Parameters

Every component has a corresponding Parameter class that validates and stores its configuration.

File: ragflow/agent/component/base.py:37-391

Example: AgentParam

class AgentParam(LLMParam, ToolParamBase):
    def __init__(self):
        super().__init__()
        self.function_name = "agent"
        self.tools = []
        self.mcp = []
        self.max_rounds = 5
        self.description = ""

Reference file: ragflow/agent/component/agent_with_tools.py:38-79

Key Methods

def update(self, conf, allow_redundant=False):
    """
    Recursively update parameters from the JSON config
    Supports nested parameters and validation
    """
    # ... implementation

def check(self):
    """
    Validate parameters
    Called after update() to make sure the config is valid
    """
    raise NotImplementedError("Parameter Object should be checked.")

def as_dict(self):
    """
    Convert the parameters object to a dict for serialization
    """
    # ... implementation

Reference file: ragflow/agent/component/base.py:124-184, 54-55, 96-122

3.2. ComponentBase - Base Class for Components

File: ragflow/agent/component/base.py:393-583

Constructor

def __init__(self, canvas, id, param: ComponentParamBase):
    from agent.canvas import Graph
    assert isinstance(canvas, Graph), "canvas must be an instance of Canvas"
    self._canvas = canvas  # Reference to workflow graph
    self._id = id          # Component ID
    self._param = param    # Parameters object
    self._param.check()    # Validate immediately on construction

Reference file: ragflow/agent/component/base.py:412-418

Logic: Each component keeps a reference to the Canvas so that it can:

Access global variables: self._canvas.globals
Read the output of another component: self._canvas.get_variable_value("other_component@output")
Add references (citations): self._canvas.add_reference(chunks, doc_infos)

The invoke() Method - Execution Entry Point

def invoke(self, **kwargs) -> dict[str, Any]:
    self.set_output("_created_time", time.perf_counter())

    try:
        self._invoke(**kwargs)  # Template method pattern
    except Exception as e:
        if self.get_exception_default_value():
            self.set_exception_default_value()
        else:
            self.set_output("_ERROR", str(e))
        logging.exception(e)

    self.set_output("_elapsed_time", time.perf_counter() - self.output("_created_time"))
    return self.output()

Reference file: ragflow/agent/component/base.py:434-446

Logic:

invoke() is the public method called by the Canvas
Internally it calls _invoke(), the abstract method that subclasses must implement
It automatically tracks _created_time and _elapsed_time
It automatically catches exceptions, then either applies the configured default value or sets the _ERROR output
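
The template-method pattern described above can be reproduced in a few lines. MiniComponent, Upper, and Broken are hypothetical classes written for this sketch, and the default-value branch of the real invoke() is left out for brevity.

```python
import logging
import time

logger = logging.getLogger(__name__)


class MiniComponent:
    """Simplified stand-in for ComponentBase (illustration only)."""

    def __init__(self):
        self._outputs = {}

    def set_output(self, key, value):
        self._outputs[key] = value

    def output(self, key=None):
        return self._outputs if key is None else self._outputs.get(key)

    def invoke(self, **kwargs):
        # Public entry point: timing and error capture live here, once
        self.set_output("_created_time", time.perf_counter())
        try:
            self._invoke(**kwargs)
        except Exception as e:
            self.set_output("_ERROR", str(e))
            logger.exception(e)
        self.set_output("_elapsed_time", time.perf_counter() - self.output("_created_time"))
        return self.output()

    def _invoke(self, **kwargs):
        raise NotImplementedError()


class Upper(MiniComponent):
    def _invoke(self, **kwargs):
        self.set_output("content", kwargs["text"].upper())


class Broken(MiniComponent):
    def _invoke(self, **kwargs):
        raise ValueError("boom")


ok = Upper().invoke(text="hi")
failed = Broken().invoke()
print(ok["content"], failed["_ERROR"])
```

Subclasses only implement _invoke(); they get timing and error reporting without repeating that code.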

Abstract Method: _invoke()

@timeout(int(os.environ.get("COMPONENT_EXEC_TIMEOUT", 10*60)))
def _invoke(self, **kwargs):
    raise NotImplementedError()

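Reference file: ragflow/agent/component/base.py:448-450

Logic:

Subclasses must override this method to implement their logic
It is protected by a timeout (10 minutes by default, configurable through the COMPONENT_EXEC_TIMEOUT environment variable)
It receives **kwargs, the input variables that have already been resolved

3.3. Concrete Example: Agent Component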

File: ragflow/agent/component/agent_with_tools.py

Constructor - Load Tools

def __init__(self, canvas, id, param: AgentParam):
    LLM.__init__(self, canvas, id, param)

    # Initialize tools dictionary
    self.tools = {}

    # Load built-in tool components
    for cpn in self._param.tools:
        cpn = self._load_tool_obj(cpn)
        self.tools[cpn.get_meta()["function"]["name"]] = cpn

    # Initialize LLM with multi-round support
    self.chat_mdl = LLMBundle(
        self._canvas.get_tenant_id(),
        TenantLLMService.llm_id2llm_type(self._param.llm_id),
        self._param.llm_id,
        max_retries=self._param.max_retries,
        retry_interval=self._param.delay_after_error,
        max_rounds=self._param.max_rounds,
        verbose_tool_use=True
    )

    # Collect tool metadata for LLM
    self.tool_meta = [v.get_meta() for _, v in self.tools.items()]

    # Load MCP (Model Context Protocol) tools
    for mcp in self._param.mcp:
        _, mcp_server = MCPServerService.get_by_id(mcp["mcp_id"])
        tool_call_session = MCPToolCallSession(mcp_server, mcp_server.variables)
        for tnm, meta in mcp["tools"].items():
            self.tool_meta.append(mcp_tool_metadata_to_openai_tool(meta))
            self.tools[tnm] = tool_call_session

    # Setup callback for tool usage tracking
    self.callback = partial(self._canvas.tool_use_callback, id)
    self.toolcall_session = LLMToolPluginCallSession(self.tools, self.callback)

Reference file: ragflow/agent/component/agent_with_tools.py:84-107

Logic:

Loads the list of tools from the config (Retrieval, Wikipedia, TavilySearch, etc.)
Initializes LLMBundle with max_rounds (the maximum number of ReAct rounds)
Loads external tools from MCP servers
Creates a callback to track tool usage (stored in Redis)

_invoke() Implementation - ReAct Loop

@timeout(int(os.environ.get("COMPONENT_EXEC_TIMEOUT", 20*60)))
def _invoke(self, **kwargs):
    if self.check_if_canceled("Agent processing"):
        return

    # Handle nested agent calls (when agent A calls agent B)
    if kwargs.get("user_prompt"):
        usr_pmt = ""
        if kwargs.get("reasoning"):
            usr_pmt += "\nREASONING:\n{}\n".format(kwargs["reasoning"])
        if kwargs.get("context"):
            usr_pmt += "\nCONTEXT:\n{}\n".format(kwargs["context"])
        usr_pmt += "\nQUERY:\n{}\n".format(str(kwargs["user_prompt"]))
        self._param.prompts = [{"role": "user", "content": usr_pmt}]

    # If there are no tools, fall back to a plain LLM call
    if not self.tools:
        return LLM._invoke(self, **kwargs)

    # Prepare prompts
    prompt, msg, user_defined_prompt = self._prepare_prompt_variables()

    # Check for structured output schema
    output_schema = self._get_output_schema()
    schema_prompt = ""
    if output_schema:
        schema = json.dumps(output_schema, ensure_ascii=False, indent=2)
        schema_prompt = structured_output_prompt(schema)

    # Check if next component is Message (for streaming)
    downstreams = self._canvas.get_component(self._id)["downstream"]
    ex = self.exception_handler()

    if any([self._canvas.get_component_obj(cid).component_name.lower()=="message" for cid in downstreams]) \
       and not (ex and ex["goto"]) and not output_schema:
        # Stream output directly to Message component
        self.set_output("content", partial(self.stream_output_with_tools, prompt, msg, user_defined_prompt))
        return

    # Non-streaming mode
    _, msg = message_fit_in([{"role": "system", "content": prompt}, *msg], int(self.chat_mdl.max_length * 0.97))
    use_tools = []
    ans = ""

    # Execute ReAct loop
    for delta_ans, tk in self._react_with_tools_streamly(prompt, msg, use_tools, user_defined_prompt, schema_prompt=schema_prompt):
        if self.check_if_canceled("Agent processing"):
            return
        ans += delta_ans

    # Handle errors
    if ans.find("**ERROR**") >= 0:
        logging.error(f"Agent._chat got error. response: {ans}")
        if self.get_exception_default_value():
            self.set_output("content", self.get_exception_default_value())
        else:
            self.set_output("_ERROR", ans)
        return

    # Parse structured output if schema exists
    if output_schema:
        for _ in range(self._param.max_retries + 1):
            try:
                obj = json_repair.loads(clean_formated_answer(ans))
                self.set_output("structured", obj)
                if use_tools:
                    self.set_output("use_tools", use_tools)
                return obj
            except Exception:
                # Retry with format correction
                ans = self._force_format_to_schema(ans, schema_prompt)
        self.set_output("_ERROR", "The answer cannot be parsed as JSON")
        return

    # Normal output
    self.set_output("content", ans)
    if use_tools:
        self.set_output("use_tools", use_tools)
    return ans

Reference file: ragflow/agent/component/agent_with_tools.py:164-240

ReAct Loop Implementation

def _react_with_tools_streamly(self, prompt, history: list[dict], use_tools, user_defined_prompt={}, schema_prompt: str = ""):
    token_count = 0
    tool_metas = self.tool_meta
    hist = deepcopy(history)

    # Optimize multi-turn conversation
    if len(hist) > 3:
        st = timer()
        user_request = full_question(messages=history, chat_mdl=self.chat_mdl)
        self.callback("Multi-turn conversation optimization", {}, user_request, elapsed_time=timer()-st)
    else:
        user_request = history[-1]["content"]

    def use_tool(name, args):
        """Call tool and track usage"""
        tool_response = self.toolcall_session.tool_call(name, args)
        use_tools.append({
            "name": name,
            "arguments": args,
            "results": tool_response
        })
        return name, tool_response

    def complete():
        """Generate final answer with optional citation"""
        need2cite = self._param.cite and self._canvas.get_reference()["chunks"] and self._id.find("-->") < 0
        if schema_prompt:
            need2cite = False

        cited = False
        if hist and hist[0]["role"] == "system":
            if schema_prompt:
                hist[0]["content"] += "\n" + schema_prompt
            if need2cite and len(hist) < 7:
                hist[0]["content"] += citation_prompt()
                cited = True

        yield "", token_count

        # Truncate history if too long
        _hist = hist
        if len(hist) > 12:
            _hist = [hist[0], hist[1], *hist[-10:]]

        # Stream answer
        entire_txt = ""
        for delta_ans in self._generate_streamly(_hist):
            if not need2cite or cited:
                yield delta_ans, 0
            entire_txt += delta_ans

        # Generate citations if needed
        if need2cite and not cited:
            st = timer()
            txt = ""
            for delta_ans in self._gen_citations(entire_txt):
                if self.check_if_canceled("Agent streaming"):
                    return
                yield delta_ans, 0
                txt += delta_ans
            self.callback("gen_citations", {}, txt, elapsed_time=timer()-st)

    # Analyze task first
    st = timer()
    task_desc = analyze_task(self.chat_mdl, prompt, user_request, tool_metas, user_defined_prompt)
    self.callback("analyze_task", {}, task_desc, elapsed_time=timer()-st)

    # ReAct loop
    for _ in range(self._param.max_rounds + 1):
        if self.check_if_canceled("Agent streaming"):
            return

        # LLM decides next step (which tools to call or complete)
        response, tk = next_step(self.chat_mdl, hist, tool_metas, task_desc, user_defined_prompt)
        token_count += tk
        hist.append({"role": "assistant", "content": response})

        try:
            # Parse function calls from LLM response
            functions = json_repair.loads(re.sub(r"```.*", "", response))
            if not isinstance(functions, list):
                raise TypeError(f"List should be returned, but `{functions}`")

            # Execute tools in parallel
            with ThreadPoolExecutor(max_workers=5) as executor:
                thr = []
                for func in functions:
                    name = func["name"]
                    args = func["arguments"]

                    if name == COMPLETE_TASK:
                        # The LLM decided that the task is complete
                        for txt, tkcnt in complete():
                            yield txt, tkcnt
                        return

                    thr.append(executor.submit(use_tool, name, args))

                # Reflect on tool results
                st = timer()
                reflection = reflect(self.chat_mdl, hist, [th.result() for th in thr], user_defined_prompt)
                hist.append({"role": "user", "content": reflection})
                self.callback("reflection", {}, str(reflection), elapsed_time=timer()-st)

        except Exception as e:
            logging.exception(msg=f"Wrong JSON argument format in LLM ReAct response: {e}")
            e = f"\nTool call error, please correct the input parameter of response format and call it again.\n *** Exception ***\n{e}"
            hist.append({"role": "user", "content": str(e)})

    # Exceed max rounds, force completion
    logging.warning(f"Exceed max rounds: {self._param.max_rounds}")
    final_instruction = f"""
    {user_request}
    IMPORTANT: You have reached the conversation limit. Based on ALL the information and research you have gathered so far, please provide a DIRECT and COMPREHENSIVE final answer...
    """
    hist.append({"role": "user", "content": final_instruction})

    for txt, tkcnt in complete():
        yield txt, tkcnt

Reference file: ragflow/agent/component/agent_with_tools.py:273-406

Detailed logic of the ReAct loop:

Analyze Task: the LLM analyzes the task and the available tools to make a plan
Loop for at most max_rounds + 1 rounds:
- The LLM decides the next step: which tools to call, or to complete the task
- The JSON response containing the list of function calls is parsed
- All tool calls are executed in parallel (at most 5 workers)
- The LLM reflects on the tool results to decide the next step
Tool Call: each tool is executed, and the call and its result are recorded in use_tools
Reflection: the LLM evaluates the tool results and decides whether more information is needed; the reflection is appended to the history
Completion: when the LLM returns COMPLETE_TASK, the final answer is generated; if the rounds run out, a final instruction forces completion
Citation: if retrieval results exist, citations are generated automatically
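
The control flow of the loop can be condensed into the sketch below, which replaces the LLM with a scripted function. The names react and scripted_llm, the value of the COMPLETE_TASK sentinel, and reading the final answer from the call arguments are all assumptions made to keep the example self-contained; the real implementation streams the final answer from the LLM and runs tools in a thread pool.

```python
import json
from typing import Callable

COMPLETE_TASK = "complete_task"  # assumed sentinel value for illustration


def react(
    next_step: Callable[[list[dict]], str],
    tools: dict[str, Callable[..., str]],
    question: str,
    max_rounds: int = 5,
) -> tuple[str, list[dict]]:
    """Minimal ReAct-style loop: decide, call tools, feed results back."""
    hist = [{"role": "user", "content": question}]
    used = []
    for _ in range(max_rounds + 1):
        response = next_step(hist)
        hist.append({"role": "assistant", "content": response})
        try:
            calls = json.loads(response)
            if not isinstance(calls, list):
                raise TypeError("a list of calls is expected")
            results = []
            for call in calls:
                if call["name"] == COMPLETE_TASK:
                    return call["arguments"].get("answer", ""), used
                result = tools[call["name"]](**call["arguments"])
                used.append({"name": call["name"], "arguments": call["arguments"], "results": result})
                results.append(result)
            # Simplified "reflection": hand the raw results back to the model
            hist.append({"role": "user", "content": "Tool results: " + json.dumps(results)})
        except Exception as e:
            # Malformed output is reported back so the model can correct itself
            hist.append({"role": "user", "content": f"Tool call error: {e}"})
    return "max rounds exceeded", used


def scripted_llm(hist: list[dict]) -> str:
    """Pretend LLM: call a tool first, then finish."""
    if len(hist) == 1:
        return json.dumps([{"name": "add", "arguments": {"a": 2, "b": 3}}])
    return json.dumps([{"name": COMPLETE_TASK, "arguments": {"answer": "2 + 3 = 5"}}])


answer, used = react(scripted_llm, {"add": lambda a, b: str(a + b)}, "What is 2 + 3?")
print(answer, used)
```

The error branch matters in practice: feeding the parse error back into the history gives the model a chance to repair its own output in the next round.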

4. Deep Research - Advanced Reasoning Engine

File: ragflow/agentic_reasoning/deep_research.py

4.1. Overview

DeepResearcher is an engine that implements multi-step reasoning with iterative search. It is used in the dialog service when "Deep Reasoning" mode is enabled.

Reference file: ragflow/api/db/services/dialog_service.py:27, 441-463

4.2. Architecture

class DeepResearcher:
    def __init__(self,
                 chat_mdl: LLMBundle,
                 prompt_config: dict,
                 kb_retrieve: partial = None,
                 kg_retrieve: partial = None):
        self.chat_mdl = chat_mdl
        self.prompt_config = prompt_config
        self._kb_retrieve = kb_retrieve  # Knowledge base retrieval function
        self._kg_retrieve = kg_retrieve  # Knowledge graph retrieval function

Reference file: ragflow/agentic_reasoning/deep_research.py:27-37

4.3. Thinking Loop

def thinking(self, chunk_info, question):
    """
    Main reasoning loop with iterative search

    Args:
        chunk_info: Dictionary that stores the retrieved chunks (for citation)
        question: The user's question

    Returns:
        Generator that yields each reasoning step
    """
    msg_history = [{"role": "user", "content": question}]
    all_reasoning_steps = []

    for step_index in range(MAX_SEARCH_LIMIT):  # Typically 3-5 steps
        # Step 1: Generate reasoning with the LLM
        query_think = ""
        for ans in self._generate_reasoning(msg_history):
            query_think = ans
            yield query_think

        # Step 2: Extract search queries from the reasoning
        queries = self._extract_search_queries(query_think, question, step_index)

        if not queries:
            # No more queries, so reasoning is complete
            break

        # Step 3: Execute searches
        for search_query in queries:
            # Retrieve from KB, Web, KG
            kbinfos = self._retrieve_information(search_query)

            # Update chunk_info for citation
            self._update_chunk_info(chunk_info, kbinfos)

            # Summarize relevant information
            summary_think = ""
            for ans in self._extract_relevant_info(
                self._truncate_previous_reasoning(all_reasoning_steps),
                search_query,
                kbinfos
            ):
                summary_think = ans
                yield summary_think

            # Append search result to reasoning
            query_think += f"\n{BEGIN_SEARCH_RESULT}\n{summary_think}\n{END_SEARCH_RESULT}"

        # Step 4: Save reasoning step
        all_reasoning_steps.append(query_think)
        msg_history.append({"role": "assistant", "content": query_think})

Reference file: ragflow/agentic_reasoning/deep_research.py (method thinking)

Logic:

Generate Reasoning: the LLM produces a chain-of-thought reasoning step
Extract Queries: the reasoning text is parsed for <|begin_search_query|>...<|end_search_query|>
Multi-source Retrieval:
- Knowledge Base (RAG)
- Web Search (Tavily API)
- Knowledge Graph
Summarize: the LLM extracts the relevant information from the search results
Iterate: the results are appended to the history and reasoning continues
Stop Condition: the loop stops when the LLM generates no further search queries or when MAX_SEARCH_LIMIT is reached
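
The query-extraction step can be illustrated with a short regex-based sketch. The function extract_search_queries below is a standalone approximation written for this example; the actual _extract_search_queries method takes additional arguments and may apply further rules.

```python
import re

BEGIN_SEARCH_QUERY = "<|begin_search_query|>"
END_SEARCH_QUERY = "<|end_search_query|>"


def extract_search_queries(reasoning: str) -> list[str]:
    """Return every non-empty query wrapped in the search-query tags."""
    pattern = re.escape(BEGIN_SEARCH_QUERY) + r"(.*?)" + re.escape(END_SEARCH_QUERY)
    found = re.findall(pattern, reasoning, flags=re.DOTALL)
    return [q.strip() for q in found if q.strip()]


step = (
    "First I need the release date. "
    "<|begin_search_query|>RAGFlow first release date<|end_search_query|> "
    "Then the license. "
    "<|begin_search_query|> RAGFlow license <|end_search_query|>"
)
print(extract_search_queries(step))
print(extract_search_queries("I already have enough information."))
```

An empty result is what ends the loop: when a reasoning step contains no tagged query, there is nothing left to search for.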

4.4. Prompt Engineering

File: ragflow/agentic_reasoning/prompts.py

REASON_PROMPT = """
You are a research assistant performing deep reasoning to answer complex questions.

Instructions:
1. Break down the question into logical steps
2. For each step that requires external information, wrap search queries in tags:
   <|begin_search_query|>your search query here<|end_search_query|>
3. Use previous search results (wrapped in <|begin_search_result|>...<|end_search_result|>) to inform next steps
4. When you have enough information, provide final answer without additional searches

Current question: {question}

Previous reasoning:
{previous_steps}

Continue reasoning:
"""

RELEVANT_EXTRACTION_PROMPT = """
Given the following search results for query "{query}":

{search_results}

Extract and summarize ONLY the information directly relevant to answering:
{context}

Focus on facts, numbers, and specific details. Ignore irrelevant content.
"""

Reference file: ragflow/agentic_reasoning/prompts.py

5. Branching & Control Flow Components

5.1. Categorize Component - LLM-based Intent Classification

File: ragflow/agent/component/categorize.py

How It Works

class CategorizeParam(ComponentParamBase):
    def __init__(self):
        super().__init__()
        self.category_description = {
            "category_name": {
                "description": "Category description",
                "examples": ["example 1", "example 2"],
                "to": ["next_component_id"]
            }
        }
        self.llm_id = ""
        self.query = "sys.query"

Example from customer_service.json:

{
  "category_description": {
    "1. contact": {
      "description": "User provides contact information",
      "examples": ["My phone is 123456", "john@email.com"],
      "to": ["Message:BreezyDonutsHeal"]
    },
    "2. casual": {
      "description": "Casual chat, not product related",
      "examples": ["How are you?", "What's your name?"],
      "to": ["Agent:TwelveOwlsWatch"]
    },
    "4. product related": {
      "description": "Questions about product usage",
      "examples": ["How to install?", "Why it doesn't work?"],
      "to": ["Retrieval:ShyPumasJoke"]
    }
  }
}

Reference file: ragflow/agent/templates/customer_service.json:177-213

_invoke() Implementation

def _invoke(self, **kwargs):
    # Get query from variable reference
    query = self._canvas.get_value_with_variable(self._param.query)

    # Build prompt with categories and examples
    prompt = "Classify the following query into one of these categories:\n\n"
    for cat_name, cat_info in self._param.category_description.items():
        prompt += f"{cat_name}: {cat_info['description']}\n"
        prompt += f"Examples: {', '.join(cat_info['examples'])}\n\n"

    prompt += f"Query: {query}\n\nCategory:"

    # Call LLM for classification
    response = self.chat_mdl.chat(prompt, [], {"temperature": 0.1})

    # Find matching category
    for cat_name, cat_info in self._param.category_description.items():
        if cat_name in response:
            self.set_output("category_name", cat_name)
            self.set_output("_next", cat_info["to"])
            return

    # Default to first category
    first_cat = list(self._param.category_description.values())[0]
    self.set_output("_next", first_cat["to"])

Logic:

The LLM classifies the user query into one of the categories.
The _next output is set to that category's downstream targets.
The canvas engine reads _next and appends it to the path.
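
Matching a category by plain substring can pick the wrong branch when one category name is contained in another. The sketch below shows one way to make the match less fragile by trying longer names first; `pick_category` is a hypothetical helper and is not taken from the RAGFlow source.

```python
def pick_category(response: str, categories: dict) -> str:
    """Pick the category whose name appears in the LLM response.

    Longer names are tried first so that a short name which is a
    substring of a longer one does not shadow it. When nothing matches,
    the first configured category is returned as the default.
    """
    text = response.lower()
    for name in sorted(categories, key=len, reverse=True):
        if name.lower() in text:
            return name
    return next(iter(categories))
```

The returned name can then be used to look up the category's `to` list, which becomes the `_next` output.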

5.2. Switch Component - Conditional Branching

File: ragflow/agent/component/switch.py

Example Configuration

{
  "component_name": "Switch",
  "params": {
    "cases": [
      {
        "condition": "{sys.conversation_turns} > 5",
        "to": ["Agent:SuggestEnd"]
      },
      {
        "condition": "{User:Profile@premium} == true",
        "to": ["Agent:PremiumSupport"]
      }
    ],
    "default": ["Agent:StandardSupport"]
  }
}

Logic

def _invoke(self, **kwargs):
    for case in self._param.cases:
        # Resolve variables in condition
        condition = self._canvas.get_value_with_variable(case["condition"])

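        # Evaluate condition
        # NOTE: eval() on a resolved string is unsafe for untrusted input;
        # it is shown here only to illustrate the control flow.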
        if eval(condition):
            self.set_output("_next", case["to"])
            return

    # Default branch
    self.set_output("_next", self._param.default)
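
Because `eval()` executes arbitrary Python, a resolved condition such as `7 > 5` is safer to evaluate with a small comparison parser. The following is a minimal sketch under the assumption that every condition has the shape `left OP right`; the names `evaluate_condition` and `_parse_literal` are hypothetical and do not come from switch.py.

```python
import operator

_OPS = {
    "==": operator.eq, "!=": operator.ne,
    ">=": operator.ge, "<=": operator.le,
    ">": operator.gt, "<": operator.lt,
}


def _parse_literal(token: str):
    """Turn a token into a bool, a float, or a plain string."""
    token = token.strip()
    if token.lower() in ("true", "false"):
        return token.lower() == "true"
    try:
        return float(token)
    except ValueError:
        return token.strip("'\"")


def evaluate_condition(resolved: str) -> bool:
    """Evaluate 'left OP right' without calling eval()."""
    # Two-character operators are checked first so ">=" is not read as ">".
    for symbol in ("==", "!=", ">=", "<=", ">", "<"):
        left, sep, right = resolved.partition(symbol)
        if sep:
            return _OPS[symbol](_parse_literal(left), _parse_literal(right))
    raise ValueError(f"unsupported condition: {resolved!r}")
```

Ordering comparisons between a string and a number still raise `TypeError`, which a caller would likely treat as a failed case and fall through to the default branch.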

5.3. Iteration & Loop Components

Files:

ragflow/agent/component/iteration.py
ragflow/agent/component/loop.py

Iteration - For-Each Loop

{
  "component_name": "Iteration",
  "params": {
    "items": "{DataProcessor@results}",
    "item_var": "current_item"
  }
}

Logic:

def _invoke(self, **kwargs):
    items = self._canvas.get_variable_value(self._param.items)

    if not isinstance(items, list):
        items = [items]

    for idx, item in enumerate(items):
        # Set item variable
        self._canvas.set_variable_value(self._param.item_var, item)

        # Add loop body to path
        self._canvas.path.append(self.get_start())  # Start of loop body

Loop - While Loop

{
  "component_name": "Loop",
  "params": {
    "condition": "{attempt_count} < 3",
    "max_iterations": 10
  }
}
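
The configuration above only names a condition and an iteration cap. As a rough sketch of the semantics this suggests (the condition is checked before each pass, and `max_iterations` stops a loop whose condition never becomes false), consider the following; `run_loop` and its callback signature are assumptions, not the loop.py API.

```python
from typing import Callable


def run_loop(
    condition: Callable[[dict], bool],
    body: Callable[[dict], None],
    state: dict,
    max_iterations: int = 10,
) -> int:
    """Run body while condition(state) holds, at most max_iterations times.

    Returns the number of iterations that were actually executed.
    """
    iterations = 0
    while iterations < max_iterations and condition(state):
        body(state)
        iterations += 1
    return iterations
```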

6. API Integration - How Users Interact with a Workflow

6.1. REST API Endpoint

File: ragflow/api/apps/canvas_app.py:124-178

Endpoint: POST `/completion`

@manager.route('/completion', methods=['POST'])
@validate_request("id")
@login_required
async def run():
    req = await request_json()
    query = req.get("query", "")
    files = req.get("files", [])
    inputs = req.get("inputs", {})
    user_id = req.get("user_id", current_user.id)

    # Permission check
    if not UserCanvasService.accessible(req["id"], current_user.id):
        return get_json_result(
            data=False,
            message='Only owner of canvas authorized for this operation.',
            code=RetCode.OPERATING_ERROR
        )

    # Load canvas DSL from database
    e, cvs = UserCanvasService.get_by_id(req["id"])
    if not e:
        return get_data_error_result(message="canvas not found.")

    if not isinstance(cvs.dsl, str):
        cvs.dsl = json.dumps(cvs.dsl, ensure_ascii=False)

    # Create Canvas instance
    try:
        canvas = Canvas(cvs.dsl, current_user.id)
    except Exception as e:
        return server_error_response(e)

    # Server-Sent Events (SSE) stream
    async def sse():
        nonlocal canvas, user_id
        try:
            # Execute the workflow and stream events
            async for ans in canvas.run(query=query, files=files, user_id=user_id, inputs=inputs):
                yield "data:" + json.dumps(ans, ensure_ascii=False) + "\n\n"

            # Save the updated DSL (with updated history, variables, etc.)
            cvs.dsl = json.loads(str(canvas))
            UserCanvasService.update_by_id(req["id"], cvs.to_dict())

        except Exception as e:
            logging.exception(e)
            canvas.cancel_task()
            yield "data:" + json.dumps({
                "code": 500,
                "message": str(e),
                "data": False
            }, ensure_ascii=False) + "\n\n"

    # Return SSE response
    resp = Response(sse(), mimetype="text/event-stream")
    resp.headers.add_header("Cache-control", "no-cache")
    resp.headers.add_header("Connection", "keep-alive")
    resp.headers.add_header("X-Accel-Buffering", "no")
    resp.headers.add_header("Content-Type", "text/event-stream; charset=utf-8")
    return resp

Reference file: ragflow/api/apps/canvas_app.py:124-178

6.2. Event Stream Format

The frontend receives a stream of events in the following format:

data:{"event":"workflow_started","message_id":"uuid","task_id":"uuid","data":{"inputs":{}}}

data:{"event":"node_started","message_id":"uuid","data":{"component_id":"begin","component_name":"Begin"}}

data:{"event":"node_finished","message_id":"uuid","data":{"component_id":"begin","outputs":{},"elapsed_time":0.001}}

data:{"event":"node_started","message_id":"uuid","data":{"component_id":"Agent:xxx","component_name":"Agent"}}

data:{"event":"message","message_id":"uuid","data":{"content":"Hello"}}
data:{"event":"message","message_id":"uuid","data":{"content":" there"}}
data:{"event":"message","message_id":"uuid","data":{"content":"!"}}

data:{"event":"message_end","message_id":"uuid","data":{"reference":{"chunks":[],"doc_aggs":[]}}}

data:{"event":"node_finished","message_id":"uuid","data":{"component_id":"Agent:xxx"}}

data:{"event":"workflow_finished","message_id":"uuid","data":{"outputs":{"content":"Hello there!"},"elapsed_time":2.5}}

The frontend can:

Track progress in real time
Display streaming responses
Show which component is currently executing
Handle errors gracefully
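
A client consumes this stream by reading each `data:` line, decoding the JSON payload, and concatenating the `content` of the `message` events. The sketch below assumes the event shapes shown above and the `code: 500` error payload from the endpoint; `collect_answer` is a hypothetical helper, and the lines can come from any source, such as an HTTP response iterated line by line.

```python
import json
from typing import Iterable


def collect_answer(lines: Iterable[str]) -> dict:
    """Assemble the streamed answer and record the order of events."""
    answer, events = [], []
    for raw in lines:
        line = raw.strip()
        if not line.startswith("data:"):
            continue  # skip the blank separator lines between events
        payload = json.loads(line[len("data:"):])
        if payload.get("code") == 500:
            # Error payload emitted by the endpoint's except branch.
            raise RuntimeError(payload.get("message", "workflow failed"))
        event = payload.get("event")
        events.append(event)
        if event == "message":
            answer.append(payload["data"]["content"])
    return {"answer": "".join(answer), "events": events}
```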

7. How to Define a Custom DSL - Hands-On Guide

7.1. Step 1: Design the Workflow Graph

Sketch the workflow as a diagram of nodes and edges:

[Begin]
   ↓
[Categorize Intent]
   ├→ "Order Status" → [Retrieval:OrderDB] → [Agent:OrderSupport] → [Message]
   ├→ "Product Info" → [Retrieval:ProductKB] → [Agent:ProductExpert] → [Message]
   └→ "General Chat" → [Agent:CasualChat] → [Message]

7.2. Step 2: Write the JSON DSL

Basic Template

{
  "components": {
    "begin": {
      "obj": {
        "component_name": "Begin",
        "params": {
          "prologue": "Welcome! How can I help you?",
          "mode": "conversational"
        }
      },
      "downstream": ["Categorize:IntentClassifier"],
      "upstream": []
    },

    "Categorize:IntentClassifier": {
      "obj": {
        "component_name": "Categorize",
        "params": {
          "llm_id": "deepseek-chat@DeepSeek",
          "query": "sys.query",
          "category_description": {
            "order_status": {
              "description": "User asking about order tracking or delivery status",
              "examples": [
                "Where is my order?",
                "When will my package arrive?"
              ],
              "to": ["Retrieval:OrderDB"]
            },
            "product_info": {
              "description": "Questions about product features, specs, or usage",
              "examples": [
                "What are the features?",
                "How to use this product?"
              ],
              "to": ["Retrieval:ProductKB"]
            },
            "general_chat": {
              "description": "Casual conversation not related to orders or products",
              "examples": [
                "Hello",
                "How are you?"
              ],
              "to": ["Agent:CasualChat"]
            }
          }
        }
      },
      "downstream": [
        "Retrieval:OrderDB",
        "Retrieval:ProductKB",
        "Agent:CasualChat"
      ],
      "upstream": ["begin"]
    },

    "Retrieval:OrderDB": {
      "obj": {
        "component_name": "Retrieval",
        "params": {
          "kb_ids": ["order_database_kb_id"],
          "query": "sys.query",
          "top_n": 5,
          "similarity_threshold": 0.3
        }
      },
      "downstream": ["Agent:OrderSupport"],
      "upstream": ["Categorize:IntentClassifier"]
    },

    "Agent:OrderSupport": {
      "obj": {
        "component_name": "Agent",
        "params": {
          "llm_id": "deepseek-chat@DeepSeek",
          "max_rounds": 3,
          "sys_prompt": "You are an order support specialist. Help users track their orders based on the database information provided.",
          "prompts": [
            {
              "role": "user",
              "content": "User question: {sys.query}\n\nOrder database results: {Retrieval:OrderDB@formalized_content}"
            }
          ],
          "tools": []
        }
      },
      "downstream": ["Message:FinalResponse"],
      "upstream": ["Retrieval:OrderDB"]
    },

    "Retrieval:ProductKB": {
      "obj": {
        "component_name": "Retrieval",
        "params": {
          "kb_ids": ["product_kb_id"],
          "query": "sys.query",
          "top_n": 8
        }
      },
      "downstream": ["Agent:ProductExpert"],
      "upstream": ["Categorize:IntentClassifier"]
    },

    "Agent:ProductExpert": {
      "obj": {
        "component_name": "Agent",
        "params": {
          "llm_id": "deepseek-chat@DeepSeek",
          "sys_prompt": "You are a product expert. Answer questions based on official product documentation.",
          "prompts": [
            {
              "role": "user",
              "content": "{sys.query}\n\nProduct docs: {Retrieval:ProductKB@formalized_content}"
            }
          ]
        }
      },
      "downstream": ["Message:FinalResponse"],
      "upstream": ["Retrieval:ProductKB"]
    },

    "Agent:CasualChat": {
      "obj": {
        "component_name": "Agent",
        "params": {
          "llm_id": "deepseek-chat@DeepSeek",
          "sys_prompt": "You are a friendly assistant for casual conversation.",
          "prompts": [
            {
              "role": "user",
              "content": "{sys.query}"
            }
          ]
        }
      },
      "downstream": ["Message:FinalResponse"],
      "upstream": ["Categorize:IntentClassifier"]
    },

    "Message:FinalResponse": {
      "obj": {
        "component_name": "Message",
        "params": {
          "content": [
            "{Agent:OrderSupport@content}",
            "{Agent:ProductExpert@content}",
            "{Agent:CasualChat@content}"
          ]
        }
      },
      "downstream": [],
      "upstream": [
        "Agent:OrderSupport",
        "Agent:ProductExpert",
        "Agent:CasualChat"
      ]
    }
  },

  "globals": {
    "sys.query": "",
    "sys.user_id": "",
    "sys.conversation_turns": 0,
    "sys.files": []
  },

  "variables": {},
  "history": [],
  "path": [],
  "retrieval": { "chunks": [], "doc_aggs": [] },
  "memory": []
}
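
A hand-written DSL is easy to get wrong: a `downstream` entry may point at a component that does not exist, or the matching `upstream` entry may be missing. The following sketch checks those links before upload. It is an illustration only; `validate_dsl` is a hypothetical helper, and RAGFlow's own validation may differ.

```python
def validate_dsl(dsl: dict) -> list[str]:
    """Return a list of wiring problems found in a DSL dictionary."""
    components = dsl.get("components", {})
    problems = []
    for cid, comp in components.items():
        for target in comp.get("downstream", []):
            if target not in components:
                problems.append(f"{cid}: unknown downstream {target}")
            elif cid not in components[target].get("upstream", []):
                problems.append(f"{target}: missing upstream {cid}")
        for source in comp.get("upstream", []):
            if source not in components:
                problems.append(f"{cid}: unknown upstream {source}")
        # Categorize components also route through their "to" lists.
        params = comp.get("obj", {}).get("params", {})
        for name, cat in params.get("category_description", {}).items():
            for target in cat.get("to", []):
                if target not in components:
                    problems.append(
                        f"{cid}: category {name!r} routes to unknown {target}"
                    )
    return problems
```

An empty list means every edge in the graph resolves to a defined component in both directions.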

7.3. Step 3: Variable References

Reference Patterns

System Variables:

"{sys.query}"              // User's question
"{sys.user_id}"            // User ID
"{sys.conversation_turns}" // Conversation count
"{sys.files}"              // Uploaded files

Component Outputs:

"{ComponentID@output_name}"

// Examples:
"{Retrieval:OrderDB@formalized_content}"
"{Agent:ProductExpert@content}"
"{Categorize:IntentClassifier@category_name}"

Nested Access:

"{Agent:Analysis@structured.summary}"
"{DataProcessor@results.0.score}"

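The three reference patterns can be resolved with a single regular expression plus a walk over dotted paths. This is a minimal sketch, assuming globals are stored under their full name (for example `sys.query`) and component outputs are nested dictionaries and lists; `resolve_refs` is a hypothetical function, not the canvas implementation.

```python
import re

# {name} for globals, {ComponentID@output.path} for component outputs.
_REF = re.compile(r"\{([A-Za-z_][\w.:]*)(?:@([\w.]+))?\}")


def resolve_refs(template: str, globals_: dict, outputs: dict) -> str:
    """Replace every reference in template with its current value."""

    def lookup(match: re.Match) -> str:
        head, path = match.group(1), match.group(2)
        if path is None:
            return str(globals_.get(head, ""))
        value = outputs.get(head, {})
        for key in path.split("."):
            if isinstance(value, list):
                in_range = key.isdigit() and int(key) < len(value)
                value = value[int(key)] if in_range else ""
            elif isinstance(value, dict):
                value = value.get(key, "")
            else:
                value = ""
        return str(value)

    return _REF.sub(lookup, template)
```

Unknown references resolve to an empty string in this sketch, which mirrors how an output from a branch that never ran would appear empty.
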
7.4. Step 4: Upload and Test

# Upload canvas via API
curl -X POST http://localhost:9380/api/canvas/set \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "title": "My Custom Workflow",
    "dsl": { ... your JSON DSL ... }
  }'

# Execute workflow
curl -X POST http://localhost:9380/api/canvas/completion \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "id": "canvas_id_from_previous_response",
    "query": "Where is my order #12345?"
  }'

8. Advanced Features

8.1. Nested Agent Composition

An agent can call another agent as a tool:

{
  "component_name": "Agent",
  "params": {
    "llm_id": "deepseek-chat@DeepSeek",
    "tools": [
      {
        "component_name": "Agent",
        "name": "product_specialist",
        "params": {
          "llm_id": "deepseek-chat@DeepSeek",
          "sys_prompt": "You are a product specialist...",
          "user_prompt": "Answer this product question: {user_input}"
        }
      },
      {
        "component_name": "Agent",
        "name": "order_specialist",
        "params": {
          "llm_id": "deepseek-chat@DeepSeek",
          "sys_prompt": "You are an order tracking specialist..."
        }
      }
    ]
  }
}

How it works:

The supervisor agent analyzes the query
It decides which sub-agent to call
The sub-agent executes and returns its result
The supervisor synthesizes the final answer

Reference file: ragflow/agent/component/agent_with_tools.py:109-119

8.2. Model Context Protocol (MCP) Integration

RAGFlow supports external tools through MCP:

{
  "component_name": "Agent",
  "params": {
    "mcp": [
      {
        "mcp_id": "github_mcp_server_id",
        "tools": {
          "search_code": {
            "name": "search_code",
            "description": "Search code in GitHub repositories",
            "parameters": {
              "query": { "type": "string" },
              "repo": { "type": "string" }
            }
          }
        }
      }
    ]
  }
}

Reference file: ragflow/agent/component/agent_with_tools.py:99-104

8.3. Structured Output Schema

Force the agent to return JSON that follows a schema:

{
  "component_name": "Agent",
  "params": {
    "outputs": {
      "structured": {
        "type": "object",
        "properties": {
          "product_name": { "type": "string" },
          "price": { "type": "number" },
          "features": {
            "type": "array",
            "items": { "type": "string" }
          },
          "recommendation": { "type": "boolean" }
        },
        "required": ["product_name", "price"]
      }
    }
  }
}

Logic (ragflow/agent/component/agent_with_tools.py:141-154, 215-235):

The schema is injected into the system prompt
The LLM generates JSON
The output is validated and parsed with json_repair
Invalid JSON triggers a retry (the maximum number of retries is configurable)
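
The generate, validate, retry cycle can be sketched as follows. This version uses the standard library's `json.loads` as a stand-in for json_repair and only checks the schema's `required` keys; `generate_structured` and the `call_llm` callback are hypothetical names, not part of agent_with_tools.py.

```python
import json
from typing import Callable


def generate_structured(
    call_llm: Callable[[str], str],
    prompt: str,
    required: list[str],
    max_retries: int = 3,
) -> dict:
    """Call the LLM until it returns a JSON object with all required keys."""
    last_error = "no attempts made"
    for _ in range(max_retries):
        raw = call_llm(prompt)
        try:
            data = json.loads(raw)
        except json.JSONDecodeError as exc:
            last_error = f"invalid JSON: {exc}"
            continue
        if not isinstance(data, dict):
            last_error = "top-level value is not an object"
            continue
        missing = [key for key in required if key not in data]
        if not missing:
            return data
        last_error = f"missing required fields: {missing}"
    raise ValueError(last_error)
```

A fuller implementation would validate types against the whole schema, for example that `price` is a number, rather than key presence alone.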

8.4. Exception Handling

{
  "component_name": "Retrieval",
  "params": {
    "kb_ids": ["kb_123"],
    "exception_method": "goto",
    "exception_goto": ["Agent:Fallback"],
    "exception_default_value": "I couldn't find information in our database."
  }
}

Modes:

goto: Jump to a specific component when an error occurs
comment: Return the default value and continue
null: Raise the error and stop the workflow

Reference file: ragflow/agent/component/base.py:565-579
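
The three modes amount to a small dispatch on `exception_method`. The sketch below illustrates that dispatch with a hypothetical `handle_component_error` function and a made-up return shape (`next` and `content` keys); the actual logic in base.py may be organized differently.

```python
def handle_component_error(params: dict, error: Exception) -> dict:
    """Decide what a failed component should do, based on its params."""
    method = params.get("exception_method")
    if method == "goto":
        # Redirect execution to the configured fallback components.
        return {"next": params.get("exception_goto", []), "content": None}
    if method == "comment":
        # Continue along the normal path with a default value as the output.
        return {"next": None, "content": params.get("exception_default_value", "")}
    # No method configured: propagate the error and stop the workflow.
    raise error
```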

9. Summary of the Complete Flow

User Request → Response Journey

1. User sends request
   POST /api/canvas/completion
   { "id": "canvas_123", "query": "Where is my order?" }

2. API loads DSL from database
   UserCanvasService.get_by_id(canvas_123)
   → DSL JSON

3. Canvas initialization
   canvas = Canvas(dsl_json, tenant_id)
   → Parse JSON
   → Create component objects
   → Validate parameters

4. Execute workflow
   async for event in canvas.run(query="Where is my order?"):

   4.1. Initialize state
       - Set sys.query = "Where is my order?"
       - Reset component outputs
       - path = ["begin"]

   4.2. Execute "begin"
       - Yield workflow_started event
       - Yield node_started event
       - invoke() → set prologue output
       - Yield node_finished event
       - Append downstream ["Categorize:IntentClassifier"] to path

   4.3. Execute "Categorize:IntentClassifier"
       - Yield node_started event
       - invoke():
         * Call LLM with query and categories
         * LLM returns "order_status"
         * Set output._next = ["Retrieval:OrderDB"]
       - Yield node_finished event
       - Append output._next to path

   4.4. Execute "Retrieval:OrderDB"
       - Yield node_started event
       - invoke():
         * Resolve query = sys.query
         * Search in knowledge base
         * Return chunks
         * Set output.formalized_content = formatted chunks
       - Yield node_finished event
       - Append downstream ["Agent:OrderSupport"] to path

   4.5. Execute "Agent:OrderSupport"
       - Yield node_started event
       - invoke():
         * Resolve prompts:
           - sys.query → "Where is my order?"
           - Retrieval:OrderDB@formalized_content → retrieved chunks
         * Start ReAct loop:
           Step 1: LLM analyze task
           Step 2: LLM decides no tools needed
           Step 3: LLM returns COMPLETE_TASK
         * Stream answer: "Your order #12345 is..."
       - Yield message events (streaming)
       - Yield node_finished event
       - Append downstream ["Message:FinalResponse"] to path

   4.6. Execute "Message:FinalResponse"
       - Yield node_started event
       - invoke():
         * Resolve content variables:
           - Agent:OrderSupport@content → "Your order #12345 is..."
           - Other agents → empty (not executed)
         * Return first non-empty content
       - Yield message event
       - Yield message_end event
       - Yield node_finished event
       - No downstream → workflow complete

   4.7. Workflow completion
       - Yield workflow_finished event
       - Save updated DSL to database (with history, path, etc.)

5. SSE stream to frontend
   Frontend displays:
   - Progress indicators
   - Streaming response
   - Citations (if any)

6. User sees response
   "Your order #12345 is currently in transit. Expected delivery: tomorrow."

Appendix: Main File References

| Component | File Path | Line Numbers | Purpose |
|---|---|---|---|
| DSL Schema | `ragflow/agent/canvas.py` | 36-73 | Defines the DSL structure |
| Graph Engine | `ragflow/agent/canvas.py` | 34-273 | Core workflow execution |
| Canvas Executor | `ragflow/agent/canvas.py` | 275-676 | Agent workflow runner |
| Component Base | `ragflow/agent/component/base.py` | 37-583 | Base class for components |
| Agent Component | `ragflow/agent/component/agent_with_tools.py` | 81-437 | LLM agent with ReAct |
| Deep Researcher | `ragflow/agentic_reasoning/deep_research.py` | 27-150 | Multi-step reasoning |
| API Endpoint | `ragflow/api/apps/canvas_app.py` | 124-178 | REST API |
| Component Discovery | `ragflow/agent/component/__init__.py` | 51-58 | Plugin system |
| Templates | `ragflow/agent/templates/` | - | Pre-built workflows |

This document describes in detail how RAGFlow implements its agent workflow system. The system lets users define complex multi-agent workflows through a JSON DSL, backed by an execution engine that supports branching, looping, tool calling, and streaming responses.
