ragflow/personal_analyze/01-API-LAYER/canvas_app_analysis.md
Claude a6ee18476d
2025-11-26 11:10:54 +00:00


Canvas App Analysis

Overview

canvas_app.py (609 lines) is the blueprint that handles the Agent Workflow API: it lets clients create and run complex workflows built on a visual canvas.

File Location

/api/apps/canvas_app.py

API Endpoints

| Endpoint | Method | Auth | Description |
| --- | --- | --- | --- |
| /templates | GET | Required | List agent templates |
| /set | POST | Required | Save/update canvas workflow |
| /get/<canvas_id> | GET | Required | Retrieve canvas DSL |
| /completion | POST | Required | Execute workflow (SSE) |
| /debug | POST | Required | Debug single component |
| /reset | POST | Required | Reset canvas state |
| /rerun | POST | Required | Reprocess failed pipeline |
| /cancel/<task_id> | PUT | Required | Cancel running task |
| /test_db_connect | POST | Required | Test database connectivity |
| /upload/<canvas_id> | POST | Optional | Upload file to canvas |

Core Flow: Workflow Execution

┌─────────────────────────────────────────────────────────────────────────┐
│                     CANVAS WORKFLOW EXECUTION                            │
└─────────────────────────────────────────────────────────────────────────┘

Client                        API                      Canvas Engine
  │                            │                            │
  │ POST /completion           │                            │
  │ {id, query, stream: true}  │                            │
  ├───────────────────────────►│                            │
  │                            │                            │
  │              ┌─────────────┴─────────────┐              │
  │              │ Load canvas DSL           │              │
  │              │ Initialize Canvas engine  │              │
  │              └─────────────┬─────────────┘              │
  │                            │                            │
  │                            │ canvas.run(query)          │
  │                            ├───────────────────────────►│
  │                            │                            │
  │                            │      workflow_started      │
  │ SSE: workflow_started      │◄───────────────────────────┤
  │◄───────────────────────────┤                            │
  │                            │                            │
  │                            │      node_started (Begin)  │
  │ SSE: node_started          │◄───────────────────────────┤
  │◄───────────────────────────┤                            │
  │                            │                            │
  │                            │      node_finished (Begin) │
  │ SSE: node_finished         │◄───────────────────────────┤
  │◄───────────────────────────┤                            │
  │                            │                            │
  │                            │      node_started (LLM)    │
  │ SSE: node_started          │◄───────────────────────────┤
  │◄───────────────────────────┤                            │
  │                            │                            │
  │                            │      message (streaming)   │
  │ SSE: message               │◄───────────────────────────┤
  │◄───────────────────────────┤                            │
  │                            │                            │
  │  ... (more tokens/nodes)   │                            │
  │                            │                            │
  │                            │      workflow_finished     │
  │ SSE: workflow_finished     │◄───────────────────────────┤
  │◄───────────────────────────┤                            │
  │                            │                            │

Code Analysis

Completion Endpoint (Workflow Execution)

@manager.route('/completion', methods=['POST'])
@validate_request("id")
@login_required
async def completion():
    """
    Execute canvas workflow with streaming output.

    Request:
        - id: Canvas ID
        - query: User input (optional; stored as sys.query)
        - stream: Boolean (default True)
        - message_id: Unique message ID

    Response:
        - SSE stream of workflow events
    """
    req = await request_json()
    canvas_id = req["id"]

    # 1. Load canvas from database
    e, user_canvas = UserCanvasService.get_by_id(canvas_id)
    if not e:
        raise LookupError(f"Canvas {canvas_id} not found")

    # 2. Initialize Canvas engine with DSL
    canvas = Canvas(json.dumps(user_canvas.dsl), current_user.id)

    # 3. Reset canvas state
    canvas.reset()

    # 4. Set user query
    query = req.get("query", "")
    canvas.set_global_variable("sys.query", query)

    # 5. Define streaming generator (must be async because it uses `async for`)
    async def stream():
        try:
            # Execute workflow and yield events
            async for event in canvas.run(**req):
                yield format_sse_event(event)

        except Exception as e:
            logging.exception(e)
            yield error_event(str(e))

        # Save canvas state
        UserCanvasService.update_by_id(canvas_id, {
            "dsl": json.loads(canvas.to_json())
        })

    resp = Response(stream(), mimetype="text/event-stream")
    resp.headers.add_header("Cache-control", "no-cache")
    resp.headers.add_header("X-Accel-Buffering", "no")
    return resp
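
The endpoint above relies on two helpers, format_sse_event and error_event, whose bodies are not shown in this analysis. The following is a minimal sketch of what such helpers could look like, assuming each event is sent as a single JSON `data:` frame; the actual helpers in canvas_app.py may use a different framing or extra fields.

```python
import json


def format_sse_event(event: dict) -> str:
    """Serialize one workflow event as a Server-Sent Events frame.

    Assumption: one JSON object per frame, terminated by a blank line.
    """
    # ensure_ascii=False keeps non-ASCII answer text readable on the wire.
    payload = json.dumps(event, ensure_ascii=False)
    return f"data: {payload}\n\n"


def error_event(message: str) -> str:
    """Build an SSE frame for the 'error' event type (hypothetical shape)."""
    return format_sse_event({"event": "error", "message": message})


frame = format_sse_event({"event": "message", "content": "Hello"})
print(repr(frame))
```

The blank line after each frame is what lets an SSE client tell where one event ends and the next begins.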

Debug Single Component

@manager.route('/debug', methods=['POST'])
@validate_request("id", "component_id", "params")
@login_required
async def debug():
    """
    Debug a single component in isolation.

    Request:
        - id: Canvas ID
        - component_id: Component to debug
        - params: Input parameters for component

    Response:
        - Component outputs
    """
    req = await request_json()

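    # Load canvas (fail early if the ID is unknown, as /completion does)
    e, user_canvas = UserCanvasService.get_by_id(req["id"])
    if not e:
        raise LookupError(f"Canvas {req['id']} not found")
    canvas = Canvas(json.dumps(user_canvas.dsl), current_user.id)
    canvas.reset()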

    # Get component
    component = canvas.get_component(req["component_id"])["obj"]
    component.reset()

    # Set debug inputs
    if isinstance(component, LLM):
        component.set_debug_inputs(req["params"])

    # Execute component
    component.invoke(**{k: o["value"] for k, o in req["params"].items()})

    # Return outputs
    outputs = component.output()
    return get_json_result(data=outputs)
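
The invoke call above unpacks `params` by taking the `value` field of each entry. The sketch below illustrates that transformation on a made-up request body; the `type` field and the parameter names are assumptions for illustration, and only the `value` key is implied by the code above.

```python
def flatten_debug_params(params: dict) -> dict:
    """Turn {"name": {"value": v, ...}} into {"name": v} keyword arguments."""
    return {name: spec["value"] for name, spec in params.items()}


# Hypothetical "params" payload for a /debug request.
request_params = {
    "query": {"type": "line", "value": "What is RAG?"},
    "top_n": {"type": "integer", "value": 3},
}

kwargs = flatten_debug_params(request_params)
print(kwargs)
```

Any entry that lacks a `value` key raises KeyError under this scheme, so a client would need to send a value for every parameter it lists.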

SSE Event Types

# Workflow started
{
    "event": "workflow_started",
    "message_id": "msg_123",
    "created_at": 1699999999,
    "data": {"inputs": {"query": "..."}}
}

# Node started
{
    "event": "node_started",
    "component_id": "LLM:Planning",
    "component_name": "Planning Agent",
    "component_type": "LLM",
    "thoughts": "Processing your request..."
}

# Streaming message (from LLM/Message components)
{
    "event": "message",
    "content": "Here is the answer...",
    "start_to_think": false  # true when <think> tag detected
}

# Node finished
{
    "event": "node_finished",
    "component_id": "LLM:Planning",
    "inputs": {...},
    "outputs": {"content": "..."},
    "elapsed_time": 2.5
}

# User input required
{
    "event": "user_inputs",
    "component_id": "UserFillUp:Confirm",
    "inputs": {
        "feedback": {"type": "text", "optional": false}
    }
}

# Workflow finished
{
    "event": "workflow_finished",
    "outputs": {...},
    "elapsed_time": 5.0
}

# Error event
{
    "event": "error",
    "message": "Error description"
}
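
On the client side, these events can be consumed by reading the stream line by line. The following is a rough sketch, assuming each event arrives as a single `data: <json>` line and that every `message` event carries only the next chunk of text; both points are assumptions, and the helper names are hypothetical.

```python
import json
from typing import Iterable, Iterator


def iter_sse_events(lines: Iterable[str]) -> Iterator[dict]:
    """Yield decoded events from 'data: ...' lines, skipping everything else."""
    for line in lines:
        line = line.rstrip("\r\n")
        if not line.startswith("data:"):
            continue  # blank separators, comments, other SSE fields
        yield json.loads(line[len("data:"):].strip())


def collect_answer(events: Iterable[dict]) -> str:
    """Concatenate message chunks until the workflow finishes or fails."""
    parts = []
    for event in events:
        kind = event.get("event")
        if kind == "message":
            parts.append(event.get("content", ""))
        elif kind == "error":
            raise RuntimeError(event.get("message", "unknown error"))
        elif kind == "workflow_finished":
            break
    return "".join(parts)


# Sample lines as they might appear on the wire.
sample_stream = [
    'data: {"event": "workflow_started", "message_id": "msg_123"}',
    "",
    'data: {"event": "message", "content": "Hello, "}',
    "",
    'data: {"event": "message", "content": "world"}',
    "",
    'data: {"event": "workflow_finished", "elapsed_time": 5.0}',
    "",
]

answer = collect_answer(iter_sse_events(sample_stream))
print(answer)
```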

Canvas Templates

@manager.route('/templates', methods=['GET'])
@login_required
async def templates():
    """
    List available canvas templates.

    Response:
        - List of template objects with name, avatar, dsl
    """
    templates = CanvasTemplateService.get_all()

    return get_json_result(data=[{
        "id": t.id,
        "name": t.name,
        "avatar": t.avatar,
        "description": t.description,
        "dsl": t.dsl
    } for t in templates])

Cancel Running Task

@manager.route('/cancel/<task_id>', methods=['PUT'])
@login_required
async def cancel(task_id):
    """
    Cancel a running workflow execution.

    The cancellation is propagated to all running components
    via the has_canceled() check in Canvas.run().
    """
    # Set cancellation flag in Redis
    REDIS_CONN.set(f"cancel:{task_id}", "1", ex=3600)

    return get_json_result(data=True)
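
The cooperative cancellation pattern described above can be sketched without Redis. This example uses an in-memory stand-in for the flag store and a simplified loop in place of Canvas.run(); the InMemoryFlagStore class, the run_steps function, and its cancel_after parameter are all hypothetical and exist only to show where the has_canceled() check sits.

```python
import time


class InMemoryFlagStore:
    """Stand-in for the Redis connection: keys with an expiry time."""

    def __init__(self):
        self._expires_at = {}

    def set(self, key, value, ex):
        # Only the expiry matters here; the stored value is ignored.
        self._expires_at[key] = time.monotonic() + ex

    def exists(self, key):
        deadline = self._expires_at.get(key)
        return deadline is not None and time.monotonic() < deadline


def has_canceled(store, task_id):
    """Return True once a cancel flag has been set for this task."""
    return store.exists(f"cancel:{task_id}")


def run_steps(store, task_id, steps, cancel_after=None):
    """Run steps in order, checking the cancel flag before each one."""
    completed = []
    for step in steps:
        if has_canceled(store, task_id):
            break
        completed.append(step)
        if step == cancel_after:
            # Simulate a PUT /cancel/<task_id> arriving mid-run.
            store.set(f"cancel:{task_id}", "1", ex=3600)
    return completed


store = InMemoryFlagStore()
done = run_steps(store, "task_1", ["Begin", "LLM", "Message"], cancel_after="LLM")
print(done)
```

Because the flag is only checked between steps, a step that is already running finishes before the workflow stops.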

Canvas DSL Structure

{
    "components": {
        "begin": {
            "obj": {
                "component_name": "Begin",
                "params": {
                    "prologue": "Hello! How can I help?",
                    "mode": "conversational"
                }
            },
            "downstream": ["LLM:Planning"],
            "upstream": []
        },
        "LLM:Planning": {
            "obj": {
                "component_name": "LLM",
                "params": {
                    "llm_id": "gpt-4@OpenAI",
                    "sys_prompt": "You are a helpful assistant...",
                    "prompts": [
                        {"role": "user", "content": "{sys.query}"}
                    ],
                    "temperature": 0.7,
                    "cite": true
                }
            },
            "downstream": ["Message:Output"],
            "upstream": ["begin"]
        },
        "Message:Output": {
            "obj": {
                "component_name": "Message",
                "params": {
                    "content": ["{LLM:Planning@content}"]
                }
            },
            "downstream": [],
            "upstream": ["LLM:Planning"]
        }
    },
    "globals": {
        "sys.query": "",
        "sys.user_id": "user_123"
    },
    "path": ["begin"],
    "history": [],
    "memory": []
}
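
Since every component records both its upstream and downstream neighbours, the graph can be checked for consistency and walked from the entry node. The sketch below does this for a trimmed copy of the DSL above; check_links and execution_order are hypothetical helpers, and the breadth-first order is only an approximation, since the real engine also has to handle branching, loops, and user input.

```python
from collections import deque

# Trimmed copy of the DSL above: only the graph links are kept.
components = {
    "begin": {"downstream": ["LLM:Planning"], "upstream": []},
    "LLM:Planning": {"downstream": ["Message:Output"], "upstream": ["begin"]},
    "Message:Output": {"downstream": [], "upstream": ["LLM:Planning"]},
}


def check_links(components: dict) -> list:
    """Return a list of upstream/downstream inconsistencies (empty if none)."""
    problems = []
    for cid, node in components.items():
        for child in node.get("downstream", []):
            if child not in components:
                problems.append(f"{cid}: unknown downstream {child}")
            elif cid not in components[child].get("upstream", []):
                problems.append(f"{child}: missing upstream {cid}")
    return problems


def execution_order(components: dict, start: str = "begin") -> list:
    """Breadth-first walk along downstream links, starting at the entry node."""
    order, seen, queue = [], {start}, deque([start])
    while queue:
        cid = queue.popleft()
        order.append(cid)
        for child in components[cid].get("downstream", []):
            if child not in seen:
                seen.add(child)
                queue.append(child)
    return order


print(check_links(components))
print(execution_order(components))
```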

Sequence Diagram: Agent Workflow

sequenceDiagram
    participant C as Client
    participant A as API (canvas_app)
    participant E as Canvas Engine
    participant B as Begin Component
    participant L as LLM Component
    participant R as Retrieval Component
    participant M as Message Component

    C->>A: POST /completion {id, query}
    A->>A: Load canvas DSL
    A->>E: Initialize Canvas(dsl)

    E->>E: Reset all components
    E->>E: Set sys.query = query

    A->>E: canvas.run()

    E-->>A: workflow_started
    A-->>C: SSE: workflow_started

    E->>B: Begin.invoke()
    E-->>A: node_started (Begin)
    A-->>C: SSE: node_started
    B-->>E: {user_input: query}
    E-->>A: node_finished (Begin)
    A-->>C: SSE: node_finished

    E->>L: LLM.invoke()
    E-->>A: node_started (LLM)
    A-->>C: SSE: node_started

    Note over L: Check if Retrieval downstream

    L->>R: Retrieval.invoke(query)
    R-->>L: {chunks: [...]}

    L->>L: Build prompt with context

    loop Token streaming
        L-->>E: yield token
        E-->>A: message event
        A-->>C: SSE: message
    end

    L-->>E: {content: "Full answer"}
    E-->>A: node_finished (LLM)
    A-->>C: SSE: node_finished

    E->>M: Message.invoke()
    M-->>E: Format output
    E-->>A: node_finished (Message)
    A-->>C: SSE: node_finished

    E-->>A: workflow_finished
    A-->>C: SSE: workflow_finished

    A->>A: Save canvas state

Component Types

| Component | Purpose | Key Parameters |
| --- | --- | --- |
| Begin | Entry point | prologue, mode |
| LLM | Language model call | llm_id, prompt, temperature |
| Agent | ReAct with tools | tools, max_rounds |
| Retrieval | KB search | kb_ids, top_n, threshold |
| Categorize | Route by condition | categories, examples |
| Message | Format output | content template |
| Webhook | HTTP call | url, method, headers |
| Iteration | Loop over array | array variable |
| UserFillUp | Request user input | input fields |

Error Handling

async def stream():
    try:
        async for event in canvas.run(**req):
            yield format_sse_event(event)

    except ComponentExecutionError as e:
        # Component-level error
        yield error_event(e.component_id, str(e))

    except TimeoutError as e:
        # Execution timeout
        yield error_event(None, "Workflow execution timed out")

    except Exception as e:
        logging.exception(e)
        yield error_event(None, str(e))

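    finally:
        # Always try to save state; log a failure instead of silently swallowing it
        try:
            UserCanvasService.update_by_id(canvas_id, {...})
        except Exception:
            logging.exception("Failed to save canvas state")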
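
The interaction between the except and finally branches in an async generator can be tried in isolation. Below is a small runnable sketch, assuming a ComponentExecutionError that carries a component_id; fake_run stands in for canvas.run(), and appending to the `saved` list stands in for the database update. All of these names are illustrative.

```python
import asyncio


class ComponentExecutionError(Exception):
    """Hypothetical error type that records which component failed."""

    def __init__(self, component_id, message):
        super().__init__(message)
        self.component_id = component_id


async def fake_run(fail_at=None):
    """Stand-in for canvas.run(): one event per component, optional failure."""
    for cid in ["begin", "LLM:Planning", "Message:Output"]:
        if cid == fail_at:
            raise ComponentExecutionError(cid, "model call failed")
        yield {"event": "node_finished", "component_id": cid}


async def stream(fail_at, saved):
    try:
        async for event in fake_run(fail_at):
            yield event
    except ComponentExecutionError as e:
        # Report the failure to the client as a final event.
        yield {"event": "error", "component_id": e.component_id, "message": str(e)}
    finally:
        # Runs on success and on failure, like the state save above.
        saved.append("state saved")


async def collect(fail_at=None):
    saved = []
    events = [event async for event in stream(fail_at, saved)]
    return events, saved


events, saved = asyncio.run(collect(fail_at="LLM:Planning"))
print(events)
print(saved)
```

In this sketch the error event is the last item the client receives, and the save step still runs once the generator is exhausted.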

Related Files

  • /agent/canvas.py - Canvas execution engine
  • /agent/component/*.py - Component implementations
  • /agent/tools/*.py - Tool integrations
  • /api/db/services/canvas_service.py - Canvas storage