# 03-RAG-ENGINE - Retrieval-Augmented Generation ## Tổng Quan RAG Engine là core của RAGFlow, implement các thuật toán retrieval, embedding, reranking và generation. ## Kiến Trúc RAG Engine ``` ┌─────────────────────────────────────────────────────────────────────────┐ │ RAG ENGINE ARCHITECTURE │ └─────────────────────────────────────────────────────────────────────────┘ ┌─────────────────────────────┐ │ User Query │ └──────────────┬──────────────┘ │ ▼ ┌───────────────────────────────────────────────────────────────────────┐ │ QUERY PROCESSING │ │ ┌─────────────┐ ┌─────────────┐ ┌─────────────┐ │ │ │ Tokenize │→ │ TF-IDF │→ │ Synonym │ │ │ │ Query │ │ Weight │ │ Expansion │ │ │ └─────────────┘ └─────────────┘ └─────────────┘ │ └────────────────────────────────────┬──────────────────────────────────┘ │ ┌────────────────┴────────────────┐ │ │ ▼ ▼ ┌───────────────────────────────┐ ┌───────────────────────────────────┐ │ VECTOR SEARCH │ │ BM25 SEARCH │ │ ┌─────────────────────────┐ │ │ ┌─────────────────────────────┐ │ │ │ Embedding Model │ │ │ │ Full-text Query │ │ │ │ (OpenAI/BGE/Jina) │ │ │ │ (Elasticsearch) │ │ │ └───────────┬─────────────┘ │ │ └───────────┬─────────────────┘ │ │ │ │ │ │ │ │ ┌───────────▼─────────────┐ │ │ ┌───────────▼─────────────────┐ │ │ │ Cosine Similarity │ │ │ │ BM25 Scoring │ │ │ │ Score (0-1) │ │ │ │ Score │ │ │ └───────────┬─────────────┘ │ │ └───────────┬─────────────────┘ │ └──────────────┼────────────────┘ └──────────────┼────────────────────┘ │ │ └───────────────┬───────────────────┘ │ ▼ ┌───────────────────────────────────────────────────────────────────────┐ │ SCORE FUSION │ │ │ │ Final = α × Vector_Score + (1-α) × BM25_Score │ │ where α = vector_similarity_weight (default: 0.3) │ │ │ └────────────────────────────────┬──────────────────────────────────────┘ │ ▼ ┌───────────────────────────────────────────────────────────────────────┐ │ RERANKING (Optional) │ │ ┌─────────────────────────────────────────────────────────────────┐ │ │ │ Cross-Encoder Model (Jina/Cohere/BGE) │ │ │ │ Re-score each chunk against query │ │ │ │ Return Top-K after reranking │ │ │ └─────────────────────────────────────────────────────────────────┘ │ └────────────────────────────────┬──────────────────────────────────────┘ │ ▼ ┌───────────────────────────────────────────────────────────────────────┐ │ CONTEXT BUILDING │ │ ┌─────────────────────────────────────────────────────────────────┐ │ │ │ Format chunks into context string │ │ │ │ Add metadata (doc name, page, positions) │ │ │ │ Build citation mapping │ │ │ └─────────────────────────────────────────────────────────────────┘ │ └────────────────────────────────┬──────────────────────────────────────┘ │ ▼ ┌───────────────────────────────────────────────────────────────────────┐ │ LLM GENERATION │ │ ┌─────────────────────────────────────────────────────────────────┐ │ │ │ System Prompt + Context + User Query │ │ │ │ Token Fitting (stay within context window) │ │ │ │ Streaming Generation │ │ │ │ Citation Insertion │ │ │ └─────────────────────────────────────────────────────────────────┘ │ └───────────────────────────────────────────────────────────────────────┘ ``` ## Cấu Trúc Thư Mục ``` /rag/ ├── llm/ # LLM Model Abstractions │ ├── chat_model.py # Chat LLM interface (30+ providers) │ ├── embedding_model.py # Embedding models │ ├── rerank_model.py # Reranking models │ ├── cv_model.py # Computer vision │ └── tts_model.py # Text-to-speech │ ├── nlp/ # NLP Processing │ ├── query.py # Query processing │ ├── search.py # Search & retrieval ⭐ │ └── rag_tokenizer.py # Tokenization │ ├── app/ # RAG Application │ └── naive.py # Naive RAG implementation │ ├── flow/ # Processing Pipeline │ ├── pipeline.py # Pipeline orchestration │ ├── parser/ # Document parsing │ ├── tokenizer/ # Tokenization │ ├── splitter/ # Chunking │ └── extractor/ # Information extraction │ ├── utils/ # Utilities │ ├── es_conn.py # Elasticsearch connection │ └── infinity_conn.py # Infinity connection │ ├── prompts/ # Prompt Templates │ ├── generator.py # Prompt generator │ ├── citations.md # Citation prompt │ ├── keywords.md # Keyword extraction │ └── ... # Other templates │ ├── raptor.py # RAPTOR algorithm ├── settings.py # Configuration └── benchmark.py # Performance testing ``` ## Files Trong Module Này | File | Mô Tả | |------|-------| | [hybrid_search_algorithm.md](./hybrid_search_algorithm.md) | Thuật toán Hybrid Search (Vector + BM25) | | [embedding_generation.md](./embedding_generation.md) | Text embedding và vector generation | | [rerank_algorithm.md](./rerank_algorithm.md) | Cross-encoder reranking | | [chunking_strategies.md](./chunking_strategies.md) | Document chunking strategies | | [prompt_engineering.md](./prompt_engineering.md) | Prompt construction | | [query_processing.md](./query_processing.md) | Query analysis | ## Core Algorithms ### 1. Hybrid Search ```python # Score fusion formula Final_Score = α × Vector_Score + (1-α) × BM25_Score where: α = vector_similarity_weight (default: 0.3) Vector_Score = cosine_similarity(query_embedding, chunk_embedding) BM25_Score = normalized_bm25(query_tokens, chunk_tokens) ``` ### 2. BM25 Scoring ```python # BM25 formula BM25(D, Q) = Σ IDF(qi) × (f(qi, D) × (k1 + 1)) / (f(qi, D) + k1 × (1 - b + b × |D|/avgdl)) where: f(qi, D) = term frequency of qi in document D |D| = document length avgdl = average document length k1 = 1.2 (term frequency saturation) b = 0.75 (length normalization) ``` ### 3. Cosine Similarity ```python # Cosine similarity formula cos(θ) = (A · B) / (||A|| × ||B||) where: A, B = embedding vectors A · B = dot product ||A|| = L2 norm ``` ### 4. Cross-Encoder Reranking ```python # Reranking score Rerank_Score = CrossEncoder(query, document) # Final ranking Final_Rank = α × Token_Similarity + β × Vector_Similarity + γ × Rank_Features where: α = 0.3 (token weight) β = 0.7 (vector weight) γ = variable (PageRank, tag boost) ``` ## LLM Provider Support ### Chat Models (30+) | Provider | Models | |----------|--------| | OpenAI | GPT-3.5, GPT-4, GPT-4V | | Anthropic | Claude 3 (Opus, Sonnet, Haiku) | | Google | Gemini Pro | | Alibaba | Qwen, Qwen-VL | | Groq | LLaMA 3, Mixtral | | Mistral | Mistral 7B, Mixtral 8x7B | | Cohere | Command R, Command R+ | | DeepSeek | DeepSeek Chat | | Ollama | Local models | | ... | And many more | ### Embedding Models | Provider | Models | Dimensions | |----------|--------|------------| | OpenAI | text-embedding-3-small | 1536 | | OpenAI | text-embedding-3-large | 3072 | | BGE | bge-large-en-v1.5 | 1024 | | BGE | bge-m3 | 1024 | | Jina | jina-embeddings-v2 | 768 | | Cohere | embed-english-v3 | 1024 | ### Reranking Models | Provider | Models | |----------|--------| | Jina | jina-reranker-v2 | | Cohere | rerank-english-v3 | | BGE | bge-reranker-large | | NVIDIA | rerank-qa-mistral-4b | ## Configuration Parameters ### Search Configuration ```python { "similarity_threshold": 0.2, # Minimum similarity "vector_similarity_weight": 0.3, # α in fusion formula "top_n": 6, # Final results count "top_k": 1024, # Initial candidates "rerank_model": "jina-reranker-v2" } ``` ### Chunking Configuration ```python { "chunk_token_num": 512, # Tokens per chunk "delimiter": "\n!?。;!?", # Split delimiters "layout_recognize": "DeepDOC", # Layout detection "overlapped_percent": 0 # Chunk overlap } ``` ### Generation Configuration ```python { "temperature": 0.7, "max_tokens": 2048, "top_p": 1.0, "frequency_penalty": 0.0, "presence_penalty": 0.0 } ``` ## Key Performance Metrics | Metric | Typical Value | Description | |--------|---------------|-------------| | Vector Search Latency | < 100ms | Elasticsearch query time | | BM25 Search Latency | < 50ms | Full-text search time | | Reranking Latency | 200-500ms | Cross-encoder inference | | Embedding Generation | 1-5s/batch | Per batch of 16 texts | | Total Retrieval | < 1s | End-to-end search | ## Related Files - `/api/db/services/dialog_service.py` - Uses RAG engine - `/rag/nlp/search.py` - Core search implementation - `/rag/utils/es_conn.py` - Elasticsearch queries