docs: Add detailed backend module analysis documentation

Add comprehensive documentation covering 6 modules:
- 01-API-LAYER: Authentication, routing, SSE streaming
- 02-SERVICE-LAYER: Dialog, Task, LLM service analysis
- 03-RAG-ENGINE: Hybrid search, embedding, reranking
- 04-AGENT-SYSTEM: Canvas engine, components, tools
- 05-DOCUMENT-PROCESSING: Task executor, PDF parsing
- 06-ALGORITHMS: BM25, fusion, RAPTOR

Total 28 documentation files with code analysis, diagrams, and formulas.

2025-11-26 11:10:54 +00:00


Canvas App Analysis

Overview

canvas_app.py (609 lines) is the blueprint that handles the Agent Workflow API. It lets clients create and run complex workflows built on a visual canvas.

File Location

/api/apps/canvas_app.py

API Endpoints

Endpoint	Method	Auth	Description
`/templates`	GET	Required	List agent templates
`/set`	POST	Required	Save/update canvas workflow
`/get/<canvas_id>`	GET	Required	Retrieve canvas DSL
`/completion`	POST	Required	Execute workflow (SSE)
`/debug`	POST	Required	Debug single component
`/reset`	POST	Required	Reset canvas state
`/rerun`	POST	Required	Reprocess failed pipeline
`/cancel/<task_id>`	PUT	Required	Cancel running task
`/test_db_connect`	POST	Required	Test database connectivity
`/upload/<canvas_id>`	POST	Optional	Upload file to canvas

Core Flow: Workflow Execution

┌─────────────────────────────────────────────────────────────────────────┐
│                     CANVAS WORKFLOW EXECUTION                            │
└─────────────────────────────────────────────────────────────────────────┘

Client                        API                      Canvas Engine
  │                            │                            │
  │ POST /completion           │                            │
  │ {id, query, stream: true}  │                            │
  ├───────────────────────────►│                            │
  │                            │                            │
  │              ┌─────────────┴─────────────┐              │
  │              │ Load canvas DSL           │              │
  │              │ Initialize Canvas engine  │              │
  │              └─────────────┬─────────────┘              │
  │                            │                            │
  │                            │ canvas.run(query)          │
  │                            ├───────────────────────────►│
  │                            │                            │
  │                            │      workflow_started      │
  │ SSE: workflow_started      │◄───────────────────────────┤
  │◄───────────────────────────┤                            │
  │                            │                            │
  │                            │      node_started (Begin)  │
  │ SSE: node_started          │◄───────────────────────────┤
  │◄───────────────────────────┤                            │
  │                            │                            │
  │                            │      node_finished (Begin) │
  │ SSE: node_finished         │◄───────────────────────────┤
  │◄───────────────────────────┤                            │
  │                            │                            │
  │                            │      node_started (LLM)    │
  │ SSE: node_started          │◄───────────────────────────┤
  │◄───────────────────────────┤                            │
  │                            │                            │
  │                            │      message (streaming)   │
  │ SSE: message               │◄───────────────────────────┤
  │◄───────────────────────────┤                            │
  │                            │                            │
  │  ... (more tokens/nodes)   │                            │
  │                            │                            │
  │                            │      workflow_finished     │
  │ SSE: workflow_finished     │◄───────────────────────────┤
  │◄───────────────────────────┤                            │
  │                            │                            │
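
On the client side, the stream in the diagram above can be consumed frame by frame. The following is a minimal sketch, assuming each event arrives as a standard SSE frame of the form `data: <json>` followed by a blank line. The wire format is not shown in this document, and `parse_sse` is a hypothetical helper, not part of canvas_app.py.

```python
import json
from typing import Iterable, Iterator


def parse_sse(lines: Iterable[str]) -> Iterator[dict]:
    """Yield one decoded JSON event per SSE frame.

    Assumes frames look like 'data: {...}' terminated by a blank line;
    this framing is an assumption, not taken from canvas_app.py.
    """
    buffer = []
    for raw in lines:
        line = raw.rstrip("\r\n")
        if line.startswith("data:"):
            buffer.append(line[len("data:"):].lstrip())
        elif line == "" and buffer:
            yield json.loads("\n".join(buffer))
            buffer = []
    if buffer:  # tolerate a stream that ends without a trailing blank line
        yield json.loads("\n".join(buffer))


sample = [
    'data: {"event": "workflow_started"}\n',
    "\n",
    'data: {"event": "message", "content": "Hi"}\n',
    "\n",
    'data: {"event": "workflow_finished"}\n',
    "\n",
]
events = [e["event"] for e in parse_sse(sample)]
```

In a real client the `lines` iterable would come from a streaming HTTP response rather than a list.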

Code Analysis

Completion Endpoint (Workflow Execution)

@manager.route('/completion', methods=['POST'])
@validate_request("id")
@login_required
async def completion():
    """
    Execute canvas workflow with streaming output.

    Request:
        - id: Canvas ID
        - query: User input (optional; stored in sys.query)
        - stream: Boolean (default True)
        - message_id: Unique message ID

    Response:
        - SSE stream of workflow events
    """
    req = await request_json()
    canvas_id = req["id"]

    # 1. Load canvas from database
    e, user_canvas = UserCanvasService.get_by_id(canvas_id)
    if not e:
        raise LookupError(f"Canvas {canvas_id} not found")

    # 2. Initialize Canvas engine with DSL
    canvas = Canvas(json.dumps(user_canvas.dsl), current_user.id)

    # 3. Reset canvas state
    canvas.reset()

    # 4. Set user query
    query = req.get("query", "")
    canvas.set_global_variable("sys.query", query)

    # 5. Define streaming generator
    async def stream():  # must be an async generator to use "async for" below
        try:
            # Execute workflow and yield events
            async for event in canvas.run(**req):
                yield format_sse_event(event)

        except Exception as e:
            logging.exception(e)
            yield error_event(str(e))

        # Save canvas state
        UserCanvasService.update_by_id(canvas_id, {
            "dsl": json.loads(canvas.to_json())
        })

    resp = Response(stream(), mimetype="text/event-stream")
    resp.headers.add_header("Cache-control", "no-cache")
    resp.headers.add_header("X-Accel-Buffering", "no")
    return resp
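
The endpoint calls `format_sse_event` and `error_event`, but their bodies are not shown here. A minimal sketch of what such helpers could look like follows, assuming events are serialized as a single `data:` line of JSON followed by a blank line. Both implementations are hypothetical.

```python
import json


def format_sse_event(event: dict) -> str:
    """Serialize one event dict as a single SSE 'data:' frame (assumed format)."""
    return "data: " + json.dumps(event, ensure_ascii=False) + "\n\n"


def error_event(message: str) -> str:
    """Build an error frame shaped like the 'error' event type."""
    return format_sse_event({"event": "error", "message": message})


frame = format_sse_event({"event": "message", "content": "Xin chào"})
```

`ensure_ascii=False` keeps non-ASCII text readable on the wire, and the trailing blank line is what tells an SSE client that the frame is complete.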

Debug Single Component

@manager.route('/debug', methods=['POST'])
@validate_request("id", "component_id", "params")
@login_required
async def debug():
    """
    Debug a single component in isolation.

    Request:
        - id: Canvas ID
        - component_id: Component to debug
        - params: Input parameters for component

    Response:
        - Component outputs
    """
    req = await request_json()

    # Load canvas; fail early if the ID is unknown
    e, user_canvas = UserCanvasService.get_by_id(req["id"])
    if not e:
        raise LookupError(f"Canvas {req['id']} not found")
    canvas = Canvas(json.dumps(user_canvas.dsl), current_user.id)
    canvas.reset()

    # Get component
    component = canvas.get_component(req["component_id"])["obj"]
    component.reset()

    # Set debug inputs
    if isinstance(component, LLM):
        component.set_debug_inputs(req["params"])

    # Execute component
    component.invoke(**{k: o["value"] for k, o in req["params"].items()})

    # Return outputs
    outputs = component.output()
    return get_json_result(data=outputs)
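
The `invoke` call above unpacks `params`, where each entry wraps its actual value under a `"value"` key. The sketch below shows that flattening step on a hypothetical request body. The `"type"` field and the parameter names are illustrative assumptions.

```python
# Hypothetical /debug request body; only the "value" key is read by the
# dict comprehension in debug(). The "type" field is an illustrative assumption.
req = {
    "id": "canvas_1",
    "component_id": "LLM:Planning",
    "params": {
        "query": {"type": "text", "value": "What is RAG?"},
        "top_n": {"type": "number", "value": 3},
    },
}

# Same expression as in debug(): flatten {"name": {"value": v}} to {"name": v}.
kwargs = {k: o["value"] for k, o in req["params"].items()}
```

A parameter entry without a `"value"` key would raise `KeyError` here, so a caller is expected to send that key for every parameter.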

SSE Event Types

# Workflow started
{
    "event": "workflow_started",
    "message_id": "msg_123",
    "created_at": 1699999999,
    "data": {"inputs": {"query": "..."}}
}

# Node started
{
    "event": "node_started",
    "component_id": "LLM:Planning",
    "component_name": "Planning Agent",
    "component_type": "LLM",
    "thoughts": "Processing your request..."
}

# Streaming message (from LLM/Message components)
{
    "event": "message",
    "content": "Here is the answer...",
    "start_to_think": false  # true when <think> tag detected
}

# Node finished
{
    "event": "node_finished",
    "component_id": "LLM:Planning",
    "inputs": {...},
    "outputs": {"content": "..."},
    "elapsed_time": 2.5
}

# User input required
{
    "event": "user_inputs",
    "component_id": "UserFillUp:Confirm",
    "inputs": {
        "feedback": {"type": "text", "optional": false}
    }
}

# Workflow finished
{
    "event": "workflow_finished",
    "outputs": {...},
    "elapsed_time": 5.0
}

# Error event
{
    "event": "error",
    "message": "Error description"
}
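
A consumer typically folds these events into a final answer plus some diagnostics. The sketch below is one way to do that, assuming `message` events carry incremental chunks that concatenate into the full answer. `summarize` is a hypothetical helper.

```python
from collections import defaultdict


def summarize(events):
    """Fold a list of event dicts into answer text, per-node timings, and errors."""
    answer = []
    timings = defaultdict(float)
    errors = []
    for ev in events:
        kind = ev.get("event")
        if kind == "message":
            answer.append(ev.get("content", ""))
        elif kind == "node_finished":
            timings[ev["component_id"]] += ev.get("elapsed_time", 0.0)
        elif kind == "error":
            errors.append(ev.get("message", ""))
    return "".join(answer), dict(timings), errors


text, timings, errors = summarize([
    {"event": "workflow_started", "message_id": "msg_123"},
    {"event": "node_started", "component_id": "LLM:Planning"},
    {"event": "message", "content": "Here is "},
    {"event": "message", "content": "the answer."},
    {"event": "node_finished", "component_id": "LLM:Planning", "elapsed_time": 2.5},
    {"event": "workflow_finished", "elapsed_time": 5.0},
])
```

Event types the consumer does not recognize are ignored, which keeps the client tolerant of new event types.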

Canvas Templates

@manager.route('/templates', methods=['GET'])
@login_required
async def templates():
    """
    List available canvas templates.

    Response:
        - List of template objects with id, name, avatar, description, dsl
    """
    templates = CanvasTemplateService.get_all()

    return get_json_result(data=[{
        "id": t.id,
        "name": t.name,
        "avatar": t.avatar,
        "description": t.description,
        "dsl": t.dsl
    } for t in templates])

Cancel Running Task

@manager.route('/cancel/<task_id>', methods=['PUT'])
@login_required
async def cancel(task_id):
    """
    Cancel a running workflow execution.

    The cancellation is propagated to all running components
    via the has_canceled() check in Canvas.run().
    """
    # Set cancellation flag in Redis
    REDIS_CONN.set(f"cancel:{task_id}", "1", ex=3600)

    return get_json_result(data=True)
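
The endpoint only writes a flag; the engine is expected to poll it between steps. The sketch below illustrates that cooperative pattern with an in-memory stand-in for Redis. `FakeRedis`, the `has_canceled(task_id)` signature, and `run_steps` are all hypothetical, since the real `has_canceled()` is not shown in this document.

```python
class FakeRedis:
    """In-memory stand-in for the Redis connection (illustration only)."""

    def __init__(self):
        self._data = {}

    def set(self, key, value, ex=None):
        # The expiry (ex) is accepted but ignored in this sketch.
        self._data[key] = value
        return True

    def get(self, key):
        return self._data.get(key)


REDIS_CONN = FakeRedis()


def has_canceled(task_id: str) -> bool:
    """Hypothetical check: True once the cancel flag for task_id exists."""
    return REDIS_CONN.get(f"cancel:{task_id}") is not None


def run_steps(task_id, steps):
    """Run steps in order, stopping before the next one if canceled."""
    done = []
    for step in steps:
        if has_canceled(task_id):
            break
        done.append(step())
    return done


def cancel_midway():
    REDIS_CONN.set("cancel:t1", "1", ex=3600)  # same call as the endpoint
    return "b"


result = run_steps("t1", [lambda: "a", cancel_midway, lambda: "c"])
```

Because the flag is only checked between steps, a step that is already running finishes before the workflow stops.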

Canvas DSL Structure

{
    "components": {
        "begin": {
            "obj": {
                "component_name": "Begin",
                "params": {
                    "prologue": "Hello! How can I help?",
                    "mode": "conversational"
                }
            },
            "downstream": ["LLM:Planning"],
            "upstream": []
        },
        "LLM:Planning": {
            "obj": {
                "component_name": "LLM",
                "params": {
                    "llm_id": "gpt-4@OpenAI",
                    "sys_prompt": "You are a helpful assistant...",
                    "prompts": [
                        {"role": "user", "content": "{sys.query}"}
                    ],
                    "temperature": 0.7,
                    "cite": true
                }
            },
            "downstream": ["Message:Output"],
            "upstream": ["begin"]
        },
        "Message:Output": {
            "obj": {
                "component_name": "Message",
                "params": {
                    "content": ["{LLM:Planning@content}"]
                }
            },
            "downstream": [],
            "upstream": ["LLM:Planning"]
        }
    },
    "globals": {
        "sys.query": "",
        "sys.user_id": "user_123"
    },
    "path": ["begin"],
    "history": [],
    "memory": []
}
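
Each component lists both its `upstream` and `downstream` neighbors, so the two lists must mirror each other. The sketch below checks that invariant and walks the graph from `begin`. It is an illustration only: `check_links` and `walk` are hypothetical helpers, and the engine's real scheduling order is not shown in this document and may differ.

```python
def check_links(components: dict) -> list:
    """Return human-readable problems in upstream/downstream wiring."""
    problems = []
    for cid, node in components.items():
        for child in node.get("downstream", []):
            if child not in components:
                problems.append(f"{cid}: unknown downstream {child}")
            elif cid not in components[child].get("upstream", []):
                problems.append(f"{child}: missing upstream {cid}")
        for parent in node.get("upstream", []):
            if parent not in components:
                problems.append(f"{cid}: unknown upstream {parent}")
            elif cid not in components[parent].get("downstream", []):
                problems.append(f"{parent}: missing downstream {cid}")
    return problems


def walk(components: dict, start: str = "begin") -> list:
    """Breadth-first order of component ids reachable from start."""
    order, queue, seen = [], [start], {start}
    while queue:
        cid = queue.pop(0)
        order.append(cid)
        for child in components[cid].get("downstream", []):
            if child not in seen:
                seen.add(child)
                queue.append(child)
    return order


components = {
    "begin": {"downstream": ["LLM:Planning"], "upstream": []},
    "LLM:Planning": {"downstream": ["Message:Output"], "upstream": ["begin"]},
    "Message:Output": {"downstream": [], "upstream": ["LLM:Planning"]},
}
problems = check_links(components)
order = walk(components)
```

Running `check_links` before `walk` avoids a `KeyError` on a dangling `downstream` reference.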

Sequence Diagram: Agent Workflow

sequenceDiagram
    participant C as Client
    participant A as API (canvas_app)
    participant E as Canvas Engine
    participant B as Begin Component
    participant L as LLM Component
    participant R as Retrieval Component
    participant M as Message Component

    C->>A: POST /completion {id, query}
    A->>A: Load canvas DSL
    A->>E: Initialize Canvas(dsl)

    E->>E: Reset all components
    E->>E: Set sys.query = query

    A->>E: canvas.run()

    E-->>A: workflow_started
    A-->>C: SSE: workflow_started

    E->>B: Begin.invoke()
    E-->>A: node_started (Begin)
    A-->>C: SSE: node_started
    B-->>E: {user_input: query}
    E-->>A: node_finished (Begin)
    A-->>C: SSE: node_finished

    E->>L: LLM.invoke()
    E-->>A: node_started (LLM)
    A-->>C: SSE: node_started

    Note over L: Check if Retrieval downstream

    L->>R: Retrieval.invoke(query)
    R-->>L: {chunks: [...]}

    L->>L: Build prompt with context

    loop Token streaming
        L-->>E: yield token
        E-->>A: message event
        A-->>C: SSE: message
    end

    L-->>E: {content: "Full answer"}
    E-->>A: node_finished (LLM)
    A-->>C: SSE: node_finished

    E->>M: Message.invoke()
    M-->>E: Format output
    E-->>A: node_finished (Message)
    A-->>C: SSE: node_finished

    E-->>A: workflow_finished
    A-->>C: SSE: workflow_finished

    A->>A: Save canvas state
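
The event ordering in the diagram can be reproduced with a small async generator. The sketch below is a toy engine, not the real `Canvas.run()`: the `path` list, the `handlers` mapping, and the way outputs are passed from one node to the next are hypothetical simplifications.

```python
import asyncio
import time


async def run(path, handlers):
    """Toy engine loop: emit the event sequence shown in the diagram.

    `path` is an ordered list of component ids and `handlers` maps each id
    to a plain function; both are hypothetical simplifications.
    """
    started = time.perf_counter()
    yield {"event": "workflow_started"}
    outputs = {}
    for cid in path:
        yield {"event": "node_started", "component_id": cid}
        outputs = handlers[cid](outputs)
        await asyncio.sleep(0)  # give control back to the event loop
        yield {"event": "node_finished", "component_id": cid, "outputs": outputs}
    yield {
        "event": "workflow_finished",
        "outputs": outputs,
        "elapsed_time": time.perf_counter() - started,
    }


async def collect():
    handlers = {
        "begin": lambda _: {"user_input": "hi"},
        "LLM:Planning": lambda prev: {"content": "echo: " + prev["user_input"]},
    }
    return [ev async for ev in run(["begin", "LLM:Planning"], handlers)]


events = asyncio.run(collect())
names = [ev["event"] for ev in events]
```

Every `node_started` is paired with a `node_finished` for the same component, and the workflow-level events bracket the whole run.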

Component Types

Component	Purpose	Key Parameters
Begin	Entry point	prologue, mode
LLM	Language model call	llm_id, sys_prompt, prompts, temperature
Agent	ReAct with tools	tools, max_rounds
Retrieval	KB search	kb_ids, top_n, threshold
Categorize	Route by condition	categories, examples
Message	Format output	content template
Webhook	HTTP call	url, method, headers
Iteration	Loop over array	array variable
UserFillUp	Request user input	input fields
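
Components such as LLM and Message reference other values through `{...}` placeholders, for example `{sys.query}` and `{LLM:Planning@content}` in the DSL. The sketch below shows one way such placeholders could be resolved, assuming `{name}` looks up a global and `{component_id@key}` looks up a component output. The real resolution logic is not shown in this document, and `resolve` is a hypothetical helper.

```python
import re

PLACEHOLDER = re.compile(r"\{([^{}]+)\}")


def resolve(template: str, globals_: dict, outputs: dict) -> str:
    """Replace {sys.x} and {component@key} placeholders; leave unknown ones as-is."""

    def lookup(match):
        ref = match.group(1)
        if "@" in ref:
            component_id, key = ref.rsplit("@", 1)
            value = outputs.get(component_id, {}).get(key)
        else:
            value = globals_.get(ref)
        return match.group(0) if value is None else str(value)

    return PLACEHOLDER.sub(lookup, template)


globals_ = {"sys.query": "What is BM25?"}
outputs = {"LLM:Planning": {"content": "BM25 is a ranking function."}}
prompt = resolve("{sys.query}", globals_, outputs)
message = resolve("Answer: {LLM:Planning@content}", globals_, outputs)
unknown = resolve("{missing}", globals_, outputs)
```

Leaving unknown placeholders untouched makes a missing upstream output visible in the rendered text instead of silently dropping it.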

Error Handling

async def stream():  # must be an async generator to use "async for" below
    try:
        async for event in canvas.run(**req):
            yield format_sse_event(event)

    except ComponentExecutionError as e:
        # Component-level error
        yield error_event(e.component_id, str(e))

    except TimeoutError as e:
        # Execution timeout
        yield error_event(None, "Workflow execution timed out")

    except Exception as e:
        logging.exception(e)
        yield error_event(None, str(e))

    finally:
        # Always save state
        try:
            UserCanvasService.update_by_id(canvas_id, {...})
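        except Exception:
            # Log rather than silently swallow a failed state save
            logging.exception("Failed to save canvas state")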
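
The point of the `finally` clause is that state is saved whether the workflow succeeds or fails. The sketch below demonstrates that behavior with a workflow that raises midway. `failing_run` and the `saved` list, which stands in for the database, are hypothetical.

```python
import asyncio

saved = []  # stands in for the database in this sketch


async def failing_run():
    """Hypothetical workflow that emits one event and then fails."""
    yield {"event": "workflow_started"}
    raise RuntimeError("component crashed")


async def stream():
    try:
        async for event in failing_run():
            yield event
    except Exception as exc:
        yield {"event": "error", "message": str(exc)}
    finally:
        saved.append("state")  # runs on success and on failure


async def collect():
    return [ev async for ev in stream()]


events = asyncio.run(collect())
```

The client still receives the events emitted before the failure, followed by a single error event, and the save step runs exactly once.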

Related Files

/agent/canvas.py - Canvas execution engine
/agent/component/*.py - Component implementations
/agent/tools/*.py - Tool integrations
/api/db/services/canvas_service.py - Canvas storage
