# Conversation App Analysis

## Tổng Quan

`conversation_app.py` (419 lines) là blueprint xử lý chat/conversation API với **Server-Sent Events (SSE)** streaming.

## File Location
```
/api/apps/conversation_app.py
```

## API Endpoints

| Endpoint | Method | Auth | Mô Tả |
|----------|--------|------|-------|
| `/set` | POST | Required | Create/update conversation |
| `/get` | GET | Required | Retrieve conversation |
| `/list` | GET | Required | List conversations by dialog |
| `/completion` | POST | Required | **Stream chat responses (SSE)** |
| `/delete_msg` | POST | Required | Remove message from history |
| `/thumbup` | POST | Required | Rate assistant response |
| `/ask` | POST | Required | Query knowledge bases (SSE) |
| `/mindmap` | POST | Required | Generate mind map |
| `/tts` | POST | Required | Stream audio output |

## Core Flow: Chat Completion (SSE Streaming)

```
┌─────────────────────────────────────────────────────────────────────────┐
│                      CHAT COMPLETION FLOW (SSE)                          │
└─────────────────────────────────────────────────────────────────────────┘

Client                          API                         Services
  │                              │                              │
  │ POST /completion             │                              │
  │ {conversation_id,            │                              │
  │  messages: [...]}            │                              │
  ├─────────────────────────────►│                              │
  │                              │                              │
  │                   ┌──────────┴──────────┐                   │
  │                   │ Validate request    │                   │
  │                   │ Load conversation   │                   │
  │                   │ Load dialog config  │                   │
  │                   └──────────┬──────────┘                   │
  │                              │                              │
  │                              │  DialogService.chat()        │
  │                              ├─────────────────────────────►│
  │                              │                              │
  │                              │     ┌────────────────────────┤
  │                              │     │ 1. Load LLM models     │
  │                              │     │ 2. Query refinement    │
  │                              │     │ 3. Retrieve chunks     │
  │                              │     │ 4. Rerank results      │
  │                              │     │ 5. Build prompt        │
  │                              │     │ 6. Stream LLM response │
  │                              │     └────────────────────────┤
  │                              │                              │
  │  SSE: data: {"answer": "H"}  │◄─────────────────────────────┤
  │◄─────────────────────────────┤                              │
  │                              │                              │
  │  SSE: data: {"answer": "He"} │◄─────────────────────────────┤
  │◄─────────────────────────────┤                              │
  │                              │                              │
  │  SSE: data: {"answer":"Hello,│◄─────────────────────────────┤
  │              "reference":{}} │                              │
  │◄─────────────────────────────┤                              │
  │                              │                              │
  │  ... (more tokens)           │                              │
  │                              │                              │
  │  SSE: data: {"done": true,   │◄─────────────────────────────┤
  │              "reference":    │                              │
  │              {"chunks":[...]}}                              │
  │◄─────────────────────────────┤                              │
  │                              │                              │
```

## Code Analysis

### Completion Endpoint (SSE Streaming)

```python
@manager.route("/completion", methods=["POST"])
@login_required
@validate_request("conversation_id", "messages")
async def completion():
    """
    Stream chat completion using Server-Sent Events.

    Request:
        - conversation_id: Conversation ID
        - messages: Array of {role, content}
        - stream: Boolean (default True)

    Response:
        - SSE stream of JSON objects
        - Each event: data: {answer, reference, audio_binary}
        - Final event: data: {done: true, reference: {...}}
    """
    req = await request.json
    conv_id = req["conversation_id"]

    # 1. Load conversation and dialog
    e, conv = ConversationService.get_by_id(conv_id)
    if not e:
        raise LookupError("Conversation not found")

    e, dia = DialogService.get_by_id(conv.dialog_id)
    if not e:
        raise LookupError("Dialog not found")

    # 2. Extract user message
    messages = req["messages"]
    msg = messages[-1]["content"]  # Latest user message

    # 3. Generate unique message ID
    message_id = get_uuid()

    # 4. Define streaming generator
    def stream():
        try:
            # Call DialogService.chat() which yields tokens
            for ans in chat(dia, msg, True, **req):
                # Structure the response
                ans = structure_answer(conv, ans, message_id, conv.id)

                yield "data:" + json.dumps({
                    "code": 0,
                    "message": "",
                    "data": ans
                }, ensure_ascii=False) + "\n\n"

        except Exception as e:
            logging.exception(e)
            yield "data:" + json.dumps({
                "code": 500,
                "message": str(e),
                "data": {"answer": "**ERROR**: " + str(e)}
            }, ensure_ascii=False) + "\n\n"

        # Final event
        yield "data:" + json.dumps({
            "code": 0,
            "message": "",
            "data": True
        }, ensure_ascii=False) + "\n\n"

    # 5. Return SSE response
    resp = Response(stream(), mimetype="text/event-stream")
    resp.headers.add_header("Cache-control", "no-cache")
    resp.headers.add_header("Connection", "keep-alive")
    resp.headers.add_header("X-Accel-Buffering", "no")
    return resp
```

### SSE Response Structure

```python
def structure_answer(conv, ans, message_id, conv_id):
    """
    Structure the streaming answer for client consumption.

    Args:
        conv: Conversation object
        ans: Raw answer from DialogService.chat()
        message_id: Unique message ID
        conv_id: Conversation ID

    Returns:
        Structured answer dict
    """
    return {
        "id": message_id,
        "conversation_id": conv_id,
        "answer": ans.get("answer", ""),
        "reference": ans.get("reference", {}),
        "audio_binary": ans.get("audio_binary"),  # TTS if enabled
        "created_at": time.time()
    }
```

### SSE Event Format

```
# Token streaming event
data: {"code": 0, "message": "", "data": {"id": "msg_123", "answer": "Hello", "reference": {}}}

# Intermediate event with partial answer
data: {"code": 0, "message": "", "data": {"id": "msg_123", "answer": "Hello, I can help", "reference": {}}}

# Final event with references
data: {"code": 0, "message": "", "data": {"id": "msg_123", "answer": "Hello, I can help you with that.", "reference": {"chunks": [...], "doc_aggs": [...]}}}

# Completion signal
data: {"code": 0, "message": "", "data": true}

# Error event
data: {"code": 500, "message": "Error details", "data": {"answer": "**ERROR**: ..."}}
```

### Ask Endpoint (Search with Summary)

```python
@manager.route("/ask", methods=["POST"])
@login_required
@validate_request("question", "kb_ids")
async def ask():
    """
    Search knowledge bases and generate summary.

    Request:
        - question: Search query
        - kb_ids: List of knowledge base IDs
        - search_config: Optional search configuration

    Response:
        - SSE stream with search results and summary
    """
    req = await request.json
    question = req["question"]
    kb_ids = req["kb_ids"]

    def stream():
        for ans in DialogService.ask(
            question,
            kb_ids,
            current_user.id,
            search_config=req.get("search_config", {})
        ):
            yield "data:" + json.dumps({
                "code": 0,
                "data": ans
            }, ensure_ascii=False) + "\n\n"

    resp = Response(stream(), mimetype="text/event-stream")
    resp.headers.add_header("Cache-control", "no-cache")
    return resp
```

### Mind Map Generation

```python
@manager.route("/mindmap", methods=["POST"])
@login_required
@validate_request("question", "kb_ids")
async def mindmap():
    """
    Generate mind map from retrieved content.

    Uses trio.run() for async processing with MindMapExtractor.
    """
    req = await request.json

    def stream():
        for ans in DialogService.gen_mindmap(
            req["question"],
            req["kb_ids"],
            current_user.id,
            search_config=req.get("search_config", {})
        ):
            yield "data:" + json.dumps({
                "code": 0,
                "data": ans
            }, ensure_ascii=False) + "\n\n"

    resp = Response(stream(), mimetype="text/event-stream")
    return resp
```

### TTS (Text-to-Speech) Streaming

```python
@manager.route("/tts", methods=["POST"])
@login_required
@validate_request("text")
async def tts():
    """
    Stream audio from text using TTS model.

    Request:
        - text: Text to convert to speech
        - dialog_id: Dialog ID for TTS configuration

    Response:
        - SSE stream with base64 audio chunks
    """
    req = await request.json
    text = req["text"]

    # Load TTS model from dialog config
    e, dia = DialogService.get_by_id(req.get("dialog_id"))
    tts_mdl = LLMBundle(current_user.id, LLMType.TTS, dia.tts_id)

    def stream():
        for audio_chunk in tts_mdl.tts(text):
            yield "data:" + json.dumps({
                "code": 0,
                "data": {"audio": base64.b64encode(audio_chunk).decode()}
            }) + "\n\n"

    resp = Response(stream(), mimetype="text/event-stream")
    return resp
```

## SSE Implementation Details

### Client-Side Handling (JavaScript)

```javascript
const eventSource = new EventSource('/v1/conversation/completion', {
    method: 'POST',
    headers: {
        'Content-Type': 'application/json',
        'Authorization': `Bearer ${token}`
    },
    body: JSON.stringify({
        conversation_id: convId,
        messages: [{ role: 'user', content: question }]
    })
});

eventSource.onmessage = (event) => {
    const data = JSON.parse(event.data);

    if (data === true) {
        // Stream complete
        eventSource.close();
        return;
    }

    if (data.code !== 0) {
        console.error('Error:', data.message);
        return;
    }

    // Append streaming content
    appendToChat(data.answer);

    // Handle references when available
    if (data.reference?.chunks) {
        displayReferences(data.reference.chunks);
    }
};

eventSource.onerror = (error) => {
    console.error('SSE Error:', error);
    eventSource.close();
};
```

### HTTP Headers for SSE

```python
resp = Response(stream(), mimetype="text/event-stream")
resp.headers.add_header("Cache-control", "no-cache")      # Disable caching
resp.headers.add_header("Connection", "keep-alive")       # Keep connection open
resp.headers.add_header("X-Accel-Buffering", "no")       # Disable Nginx buffering
```

## Sequence Diagram: Chat with RAG

```mermaid
sequenceDiagram
    participant C as Client
    participant A as API (conversation_app)
    participant DS as DialogService
    participant ES as Elasticsearch
    participant LLM as LLM Provider

    C->>A: POST /completion (SSE)
    A->>A: Validate & Load conversation

    A->>DS: chat(dialog, message, stream=True)

    rect rgb(200, 220, 240)
        Note over DS,ES: Retrieval Phase
        DS->>DS: Generate query embedding
        DS->>ES: Hybrid search (vector + BM25)
        ES-->>DS: Candidate chunks
        DS->>DS: Rerank results
    end

    rect rgb(220, 240, 200)
        Note over DS,LLM: Generation Phase
        DS->>DS: Build prompt with context
        DS->>LLM: Stream completion

        loop Token streaming
            LLM-->>DS: Token chunk
            DS-->>A: yield {answer, reference}
            A-->>C: SSE: data: {...}
        end
    end

    DS->>DS: Save conversation message
    DS-->>A: yield {done: true, reference}
    A-->>C: SSE: data: {done: true}
```

## Error Handling

```python
def stream():
    try:
        for ans in chat(dia, msg, True, **req):
            yield format_sse(ans)

    except LookupError as e:
        # Resource not found
        yield error_sse(404, str(e))

    except PermissionError as e:
        # Authorization failed
        yield error_sse(403, str(e))

    except Exception as e:
        # Generic error
        logging.exception(e)
        yield error_sse(500, str(e))

    finally:
        # Always send completion signal
        yield "data: true\n\n"

def error_sse(code, message):
    return "data:" + json.dumps({
        "code": code,
        "message": message,
        "data": {"answer": f"**ERROR**: {message}"}
    }) + "\n\n"
```

## Performance Considerations

1. **Streaming vs Buffering**: SSE allows real-time token display
2. **Nginx Buffering**: Must disable with `X-Accel-Buffering: no`
3. **Connection Keep-Alive**: Long-lived connections for streaming
4. **Memory Management**: Generator-based streaming avoids memory buildup
5. **Timeout Handling**: LLM calls have configurable timeouts

## Related Files

- `/api/db/services/dialog_service.py` - RAG chat implementation
- `/api/db/services/conversation_service.py` - Conversation storage
- `/rag/llm/chat_model.py` - LLM streaming interface