# Dialog Service Analysis - Core RAG Implementation

## Tổng Quan

`dialog_service.py` (37KB) là service quan trọng nhất, implement toàn bộ **RAG pipeline** từ retrieval đến generation.

## File Location
```
/api/db/services/dialog_service.py
```

## Core Method: `chat()`

Đây là method chính xử lý RAG chat với streaming response.

### Complete Flow Diagram

```
┌─────────────────────────────────────────────────────────────────────────┐
│                        RAG CHAT PIPELINE                                 │
└─────────────────────────────────────────────────────────────────────────┘

INPUT: dialog, messages[], stream=True
                │
                ▼
┌───────────────────────────────────────────────────────────────────────┐
│ [1] MODEL INITIALIZATION                                               │
│                                                                        │
│  ┌─────────────┐  ┌─────────────┐  ┌─────────────┐  ┌─────────────┐  │
│  │ Embedding   │  │   Chat      │  │  Reranker   │  │    TTS      │  │
│  │   Model     │  │   Model     │  │   Model     │  │   Model     │  │
│  │             │  │             │  │ (optional)  │  │ (optional)  │  │
│  └─────────────┘  └─────────────┘  └─────────────┘  └─────────────┘  │
│                                                                        │
└────────────────────────────────┬──────────────────────────────────────┘
                                 │
                                 ▼
┌───────────────────────────────────────────────────────────────────────┐
│ [2] QUESTION PROCESSING                                                │
│                                                                        │
│  ┌─────────────────────────────────────────────────────────────────┐  │
│  │ Multi-turn Refinement (if enabled)                              │  │
│  │   "What about Python?" → "What is Python programming language?" │  │
│  └─────────────────────────────────────────────────────────────────┘  │
│                              │                                         │
│  ┌─────────────────────────────────────────────────────────────────┐  │
│  │ Cross-language Translation (if enabled)                         │  │
│  │   "Python是什么?" → ["What is Python?", "Python是什么?"]         │  │
│  └─────────────────────────────────────────────────────────────────┘  │
│                              │                                         │
│  ┌─────────────────────────────────────────────────────────────────┐  │
│  │ Keyword Extraction (if enabled)                                 │  │
│  │   "What is machine learning?" → ["machine learning", "ML", "AI"]│  │
│  └─────────────────────────────────────────────────────────────────┘  │
│                                                                        │
└────────────────────────────────┬──────────────────────────────────────┘
                                 │
                                 ▼
┌───────────────────────────────────────────────────────────────────────┐
│ [3] METADATA FILTERING (Optional)                                      │
│                                                                        │
│  ┌─────────────────────────────────────────────────────────────────┐  │
│  │ Auto-generate filters from question via LLM                     │  │
│  │   "Q3 2024 revenue" → {"quarter": "Q3", "year": "2024"}        │  │
│  └─────────────────────────────────────────────────────────────────┘  │
│                              │                                         │
│  ┌─────────────────────────────────────────────────────────────────┐  │
│  │ Apply filters to get document IDs                               │  │
│  │   doc_ids = filter_by_metadata(conditions)                      │  │
│  └─────────────────────────────────────────────────────────────────┘  │
│                                                                        │
└────────────────────────────────┬──────────────────────────────────────┘
                                 │
                                 ▼
┌───────────────────────────────────────────────────────────────────────┐
│ [4] RETRIEVAL PHASE                                                    │
│                                                                        │
│  Option A: Deep Research Mode (reasoning=True)                         │
│  ┌─────────────────────────────────────────────────────────────────┐  │
│  │ DeepResearcher.thinking()                                       │  │
│  │   • Multi-step reasoning                                        │  │
│  │   • Iterative retrieval                                         │  │
│  │   • Self-reflection                                             │  │
│  └─────────────────────────────────────────────────────────────────┘  │
│                                                                        │
│  Option B: Standard Retrieval                                          │
│  ┌─────────────────────────────────────────────────────────────────┐  │
│  │ retriever.retrieval(                                            │  │
│  │     question,                                                   │  │
│  │     embd_mdl,           # Embedding model                       │  │
│  │     tenant_ids,                                                 │  │
│  │     kb_ids,             # Knowledge base IDs                    │  │
│  │     page=1,                                                     │  │
│  │     page_size=top_n,    # Default: 6                           │  │
│  │     similarity_threshold=0.2,                                   │  │
│  │     vector_similarity_weight=0.3,                              │  │
│  │     rerank_mdl=rerank_mdl                                      │  │
│  │ )                                                               │  │
│  └─────────────────────────────────────────────────────────────────┘  │
│                                                                        │
│  Optional Enhancements:                                                │
│  ┌─────────────────────────────────────────────────────────────────┐  │
│  │ • TOC Enhancement: retrieval_by_toc()                          │  │
│  │ • Web Search: Tavily API integration                           │  │
│  │ • Knowledge Graph: kg_retriever.retrieval()                    │  │
│  └─────────────────────────────────────────────────────────────────┘  │
│                                                                        │
└────────────────────────────────┬──────────────────────────────────────┘
                                 │
                                 ▼
┌───────────────────────────────────────────────────────────────────────┐
│ [5] ANSWER GENERATION                                                  │
│                                                                        │
│  ┌─────────────────────────────────────────────────────────────────┐  │
│  │ Build Prompt:                                                   │  │
│  │   system_prompt = prompt_config["system"].format(**kwargs)      │  │
│  │   + citation_prompt (if quote=True)                            │  │
│  │   + retrieved_context                                          │  │
│  └─────────────────────────────────────────────────────────────────┘  │
│                              │                                         │
│  ┌─────────────────────────────────────────────────────────────────┐  │
│  │ Token Fitting:                                                  │  │
│  │   used_tokens, msg = message_fit_in(msg, max_tokens * 0.95)     │  │
│  └─────────────────────────────────────────────────────────────────┘  │
│                              │                                         │
│  ┌─────────────────────────────────────────────────────────────────┐  │
│  │ Stream Generation:                                              │  │
│  │   for token in chat_mdl.chat_streamly(prompt, msg, gen_conf):   │  │
│  │       yield {"answer": accumulated, "reference": {}}            │  │
│  └─────────────────────────────────────────────────────────────────┘  │
│                                                                        │
└────────────────────────────────┬──────────────────────────────────────┘
                                 │
                                 ▼
┌───────────────────────────────────────────────────────────────────────┐
│ [6] CITATION PROCESSING                                                │
│                                                                        │
│  ┌─────────────────────────────────────────────────────────────────┐  │
│  │ Insert Citations:                                               │  │
│  │   answer, idx = retriever.insert_citations(                     │  │
│  │       answer,                                                   │  │
│  │       chunk_contents,                                          │  │
│  │       chunk_vectors,                                           │  │
│  │       embd_mdl,                                                │  │
│  │       tkweight=0.7,    # Token similarity weight               │  │
│  │       vtweight=0.3     # Vector similarity weight              │  │
│  │   )                                                            │  │
│  └─────────────────────────────────────────────────────────────────┘  │
│                              │                                         │
│  ┌─────────────────────────────────────────────────────────────────┐  │
│  │ Repair Bad Citations:                                           │  │
│  │   • Fix malformed citation formats                             │  │
│  │   • Merge duplicate citations                                  │  │
│  └─────────────────────────────────────────────────────────────────┘  │
│                                                                        │
└────────────────────────────────┬──────────────────────────────────────┘
                                 │
                                 ▼
OUTPUT: Generator[{
    "answer": "Response text with [1] citations...",
    "reference": {
        "chunks": [...],
        "doc_aggs": [...]
    },
    "audio_binary": bytes (if TTS enabled)
}]
```

### Code Implementation

```python
@classmethod
def chat(cls, dialog, messages, stream=True, **kwargs):
    """
    Main RAG chat pipeline.

    Args:
        dialog: Dialog configuration object
        messages: List of conversation messages
        stream: Enable streaming response
        **kwargs: Additional parameters

    Yields:
        Dict with answer, reference, and optional audio
    """

    # ========================================
    # [1] MODEL INITIALIZATION
    # ========================================

    # Get embedding model from knowledge bases
    e, kbs = KnowledgebaseService.get_by_ids(dialog.kb_ids)
    embd_mdl = LLMBundle(dialog.tenant_id, LLMType.EMBEDDING, kbs[0].embd_id)

    # Get chat model
    chat_mdl = LLMBundle(dialog.tenant_id, LLMType.CHAT, dialog.llm_id)

    # Get reranker (optional)
    rerank_mdl = None
    if dialog.rerank_id:
        rerank_mdl = LLMBundle(dialog.tenant_id, LLMType.RERANK, dialog.rerank_id)

    # Get TTS model (optional)
    tts_mdl = None
    if dialog.prompt_config.get("tts"):
        tts_mdl = LLMBundle(dialog.tenant_id, LLMType.TTS, dialog.tts_id)

    # ========================================
    # [2] QUESTION PROCESSING
    # ========================================

    # Extract user question
    question = messages[-1]["content"]
    questions = [question]

    # Multi-turn refinement
    if dialog.prompt_config.get("refine_multiturn") and len(messages) > 1:
        refined = refine_question(chat_mdl, messages)
        questions = [refined]

    # Cross-language translation
    if dialog.prompt_config.get("cross_languages"):
        translated = translate_question(
            chat_mdl,
            question,
            dialog.prompt_config["cross_languages"]
        )
        questions.extend(translated)

    # Keyword extraction
    if dialog.prompt_config.get("keyword"):
        keywords = extract_keywords(chat_mdl, question)
        questions.extend(keywords)

    # ========================================
    # [3] METADATA FILTERING
    # ========================================

    doc_ids = None

    if kwargs.get("doc_ids"):
        # Manual document filtering
        doc_ids = kwargs["doc_ids"]

    elif dialog.prompt_config.get("meta_data_filter"):
        # Auto-generate filters from question
        metas = DocumentService.get_meta_by_kbs(dialog.kb_ids)

        if dialog.prompt_config["meta_data_filter"]["method"] == "auto":
            # LLM generates filter conditions
            filters = gen_meta_filter(chat_mdl, metas, question)
            doc_ids = meta_filter(metas, filters["conditions"])
        else:
            # Manual filter conditions
            doc_ids = meta_filter(
                metas,
                dialog.prompt_config["meta_data_filter"]["conditions"]
            )

    # ========================================
    # [4] RETRIEVAL PHASE
    # ========================================

    if dialog.prompt_config.get("reasoning"):
        # Deep Research Mode
        reasoner = DeepResearcher(
            chat_mdl,
            dialog.prompt_config,
            lambda q: retriever.retrieval(q, embd_mdl, ...)
        )

        for think in reasoner.thinking(kbinfos, questions):
            yield {"answer": think["thought"], "reference": {}}

        kbinfos = reasoner.get_final_context()
    else:
        # Standard Retrieval
        kbinfos = retriever.retrieval(
            question=" ".join(questions),
            embd_mdl=embd_mdl,
            tenant_ids=[kb.tenant_id for kb in kbs],
            kb_ids=dialog.kb_ids,
            page=1,
            page_size=dialog.top_n,
            similarity_threshold=dialog.similarity_threshold,
            vector_similarity_weight=dialog.vector_similarity_weight,
            doc_ids=doc_ids,
            top=dialog.top_k,
            rerank_mdl=rerank_mdl
        )

    # Optional: TOC Enhancement
    if dialog.prompt_config.get("toc_enhance"):
        kbinfos["chunks"] = retriever.retrieval_by_toc(
            question,
            kbinfos["chunks"]
        )

    # Optional: Web Search (Tavily)
    if dialog.prompt_config.get("tavily_api_key"):
        web_results = tavily_search(
            question,
            dialog.prompt_config["tavily_api_key"]
        )
        kbinfos["chunks"].extend(web_results)

    # Optional: Knowledge Graph
    if dialog.prompt_config.get("use_kg"):
        kg_result = kg_retriever.retrieval(question, dialog.kb_ids)
        if kg_result:
            kbinfos["chunks"].insert(0, kg_result)

    # ========================================
    # [5] ANSWER GENERATION
    # ========================================

    # Build prompt
    prompt_config = dialog.prompt_config
    system_prompt = prompt_config["system"].format(**kwargs)

    # Add citation prompt if quotes enabled
    if prompt_config.get("quote") and kbinfos["chunks"]:
        system_prompt += citation_prompt(question)

    # Build context from retrieved chunks
    context = kb_prompt(kbinfos)

    # Build message history
    msg = [{"role": "system", "content": system_prompt + context}]
    msg.extend(messages)

    # Token fitting (use 95% of max tokens)
    max_tokens = chat_mdl.max_length
    used_tokens, msg = message_fit_in(msg, int(max_tokens * 0.95))

    # Generation config
    gen_conf = dialog.llm_setting.copy()
    gen_conf["max_tokens"] = min(
        gen_conf.get("max_tokens", 2048),
        max_tokens - used_tokens
    )

    # Stream generation
    answer = ""
    for chunk in chat_mdl.chat_streamly(system_prompt, msg[1:], gen_conf):
        answer = chunk
        yield {
            "answer": answer,
            "reference": {},
            "audio_binary": tts_mdl.tts(answer) if tts_mdl else None
        }

    # ========================================
    # [6] CITATION PROCESSING
    # ========================================

    if prompt_config.get("quote") and kbinfos["chunks"]:
        # Insert citations
        answer, idx = retriever.insert_citations(
            answer,
            [ck["content_ltks"] for ck in kbinfos["chunks"]],
            [ck["vector"] for ck in kbinfos["chunks"]],
            embd_mdl,
            tkweight=1 - dialog.vector_similarity_weight,
            vtweight=dialog.vector_similarity_weight
        )

        # Repair malformed citations
        answer, idx = repair_bad_citation_formats(answer, kbinfos, idx)

    # Final yield with references
    yield decorate_answer(answer, kbinfos, idx)
```

## Supporting Methods

### Question Refinement

```python
def refine_question(chat_mdl, messages):
    """
    Refine question for multi-turn context.

    Example:
        User: "What is Python?"
        Assistant: "Python is a programming language..."
        User: "What about its main features?"
        → Refined: "What are the main features of Python programming language?"
    """
    prompt = """Given the conversation history, rewrite the last question
    to be self-contained and clear.

    Conversation:
    {history}

    Last question: {question}

    Rewritten question:"""

    history = format_history(messages[:-1])
    question = messages[-1]["content"]

    return chat_mdl.chat(prompt.format(history=history, question=question))
```

### Cross-Language Translation

```python
def translate_question(chat_mdl, question, languages):
    """
    Translate question to multiple languages for broader retrieval.

    Args:
        question: Original question
        languages: List of target languages

    Returns:
        List of translated questions
    """
    translations = []
    for lang in languages:
        prompt = f"Translate to {lang}: {question}"
        translated = chat_mdl.chat(prompt)
        translations.append(translated)
    return translations
```

### Metadata Filtering

```python
def gen_meta_filter(chat_mdl, metas, question):
    """
    Generate metadata filters from question using LLM.

    Args:
        metas: Available metadata fields and values
        question: User question

    Returns:
        Filter conditions dict
    """
    prompt = f"""Given these metadata fields:
    {json.dumps(metas)}

    And this question: {question}

    Generate filter conditions as JSON:
    {{"conditions": [{{"field": "...", "operator": "==", "value": "..."}}]}}
    """

    response = chat_mdl.chat(prompt)
    return json.loads(response)
```

### Citation Processing

```python
def decorate_answer(answer, kbinfos, citation_indices):
    """
    Decorate final answer with references.

    Returns:
        {
            "answer": "Answer with [1] citations...",
            "reference": {
                "chunks": [
                    {
                        "chunk_id": "...",
                        "content": "...",
                        "doc_id": "...",
                        "docnm_kwd": "Document Name",
                        "positions": [[x0, x1, top, bottom]],
                        "similarity": 0.85
                    }
                ],
                "doc_aggs": [
                    {"doc_id": "...", "doc_name": "...", "count": 3}
                ]
            }
        }
    """
    # Filter chunks that are actually cited
    cited_chunks = [
        kbinfos["chunks"][i]
        for i in citation_indices
        if i < len(kbinfos["chunks"])
    ]

    return {
        "answer": answer,
        "reference": {
            "chunks": cited_chunks,
            "doc_aggs": kbinfos.get("doc_aggs", [])
        }
    }
```

## Configuration Options

### Dialog.prompt_config

```python
{
    # Basic settings
    "system": "You are a helpful assistant...",
    "prologue": "Hello! How can I help you?",
    "empty_response": "I couldn't find relevant information.",

    # Citation settings
    "quote": True,           # Enable citations [1], [2]

    # Retrieval enhancements
    "toc_enhance": False,    # Use table of contents
    "reasoning": False,      # Deep research mode
    "use_kg": False,         # Knowledge graph

    # Question processing
    "refine_multiturn": False,  # Multi-turn refinement
    "cross_languages": [],      # ["English", "Chinese"]
    "keyword": False,           # Extract keywords

    # External search
    "tavily_api_key": "",    # Tavily web search

    # Audio
    "tts": False,            # Text-to-speech

    # Metadata filtering
    "meta_data_filter": {
        "method": "auto",    # or "manual"
        "conditions": []
    }
}
```

### Dialog.llm_setting

```python
{
    "temperature": 0.7,
    "max_tokens": 2048,
    "top_p": 1.0,
    "frequency_penalty": 0.0,
    "presence_penalty": 0.0
}
```

## Performance Metrics

The `decorate_answer()` function tracks:

```python
{
    "total_time": 5.2,           # Total execution time
    "llm_init_time": 0.1,        # Model initialization
    "retrieval_time": 1.5,       # Search time
    "generation_time": 3.5,      # LLM generation
    "tokens_per_second": 45.2,   # Generation speed
    "input_tokens": 1500,        # Prompt tokens
    "output_tokens": 250         # Response tokens
}
```

## Related Methods

| Method | Purpose |
|--------|---------|
| `chat_solo()` | Chat without RAG (no retrieval) |
| `ask()` | Search-focused with summary |
| `gen_mindmap()` | Generate mind map from content |
| `use_sql()` | SQL-based structured retrieval |

## Related Files

- `/rag/nlp/search.py` - retriever implementation
- `/rag/llm/chat_model.py` - LLM interface
- `/rag/prompts/*.md` - Prompt templates