Claude a6ee18476d
docs: Add detailed backend module analysis documentation
Add comprehensive documentation covering 6 modules:
- 01-API-LAYER: Authentication, routing, SSE streaming
- 02-SERVICE-LAYER: Dialog, Task, LLM service analysis
- 03-RAG-ENGINE: Hybrid search, embedding, reranking
- 04-AGENT-SYSTEM: Canvas engine, components, tools
- 05-DOCUMENT-PROCESSING: Task executor, PDF parsing
- 06-ALGORITHMS: BM25, fusion, RAPTOR

Total 28 documentation files with code analysis, diagrams, and formulas.
2025-11-26 11:10:54 +00:00

03-RAG-ENGINE - Retrieval-Augmented Generation

Overview

The RAG Engine is the core of RAGFlow; it implements the retrieval, embedding, reranking, and generation algorithms.

RAG Engine Architecture

┌─────────────────────────────────────────────────────────────────────────┐
│                          RAG ENGINE ARCHITECTURE                         │
└─────────────────────────────────────────────────────────────────────────┘

                    ┌─────────────────────────────┐
                    │         User Query          │
                    └──────────────┬──────────────┘
                                   │
                                   ▼
┌───────────────────────────────────────────────────────────────────────┐
│                       QUERY PROCESSING                                 │
│  ┌─────────────┐  ┌─────────────┐  ┌─────────────┐                   │
│  │  Tokenize   │→ │  TF-IDF     │→ │  Synonym    │                   │
│  │  Query      │  │  Weight     │  │  Expansion  │                   │
│  └─────────────┘  └─────────────┘  └─────────────┘                   │
└────────────────────────────────────┬──────────────────────────────────┘
                                     │
                    ┌────────────────┴────────────────┐
                    │                                 │
                    ▼                                 ▼
┌───────────────────────────────┐  ┌───────────────────────────────────┐
│      VECTOR SEARCH            │  │        BM25 SEARCH                │
│  ┌─────────────────────────┐  │  │  ┌─────────────────────────────┐  │
│  │  Embedding Model        │  │  │  │  Full-text Query            │  │
│  │  (OpenAI/BGE/Jina)      │  │  │  │  (Elasticsearch)            │  │
│  └───────────┬─────────────┘  │  │  └───────────┬─────────────────┘  │
│              │                │  │              │                    │
│  ┌───────────▼─────────────┐  │  │  ┌───────────▼─────────────────┐  │
│  │  Cosine Similarity      │  │  │  │  BM25 Scoring               │  │
│  │  Score (0-1)            │  │  │  │  Score                      │  │
│  └───────────┬─────────────┘  │  │  └───────────┬─────────────────┘  │
└──────────────┼────────────────┘  └──────────────┼────────────────────┘
               │                                   │
               └───────────────┬───────────────────┘
                               │
                               ▼
┌───────────────────────────────────────────────────────────────────────┐
│                       SCORE FUSION                                     │
│                                                                        │
│   Final = α × Vector_Score + (1-α) × BM25_Score                       │
│   where α = vector_similarity_weight (default: 0.3)                   │
│                                                                        │
└────────────────────────────────┬──────────────────────────────────────┘
                                 │
                                 ▼
┌───────────────────────────────────────────────────────────────────────┐
│                       RERANKING (Optional)                             │
│  ┌─────────────────────────────────────────────────────────────────┐  │
│  │  Cross-Encoder Model (Jina/Cohere/BGE)                          │  │
│  │  Re-score each chunk against query                              │  │
│  │  Return Top-K after reranking                                   │  │
│  └─────────────────────────────────────────────────────────────────┘  │
└────────────────────────────────┬──────────────────────────────────────┘
                                 │
                                 ▼
┌───────────────────────────────────────────────────────────────────────┐
│                       CONTEXT BUILDING                                 │
│  ┌─────────────────────────────────────────────────────────────────┐  │
│  │  Format chunks into context string                              │  │
│  │  Add metadata (doc name, page, positions)                       │  │
│  │  Build citation mapping                                         │  │
│  └─────────────────────────────────────────────────────────────────┘  │
└────────────────────────────────┬──────────────────────────────────────┘
                                 │
                                 ▼
┌───────────────────────────────────────────────────────────────────────┐
│                       LLM GENERATION                                   │
│  ┌─────────────────────────────────────────────────────────────────┐  │
│  │  System Prompt + Context + User Query                           │  │
│  │  Token Fitting (stay within context window)                     │  │
│  │  Streaming Generation                                           │  │
│  │  Citation Insertion                                             │  │
│  └─────────────────────────────────────────────────────────────────┘  │
└───────────────────────────────────────────────────────────────────────┘

Directory Structure

/rag/
├── llm/                      # LLM Model Abstractions
│   ├── chat_model.py         # Chat LLM interface (30+ providers)
│   ├── embedding_model.py    # Embedding models
│   ├── rerank_model.py       # Reranking models
│   ├── cv_model.py           # Computer vision
│   └── tts_model.py          # Text-to-speech
│
├── nlp/                      # NLP Processing
│   ├── query.py              # Query processing
│   ├── search.py             # Search & retrieval ⭐
│   └── rag_tokenizer.py      # Tokenization
│
├── app/                      # RAG Application
│   └── naive.py              # Naive RAG implementation
│
├── flow/                     # Processing Pipeline
│   ├── pipeline.py           # Pipeline orchestration
│   ├── parser/               # Document parsing
│   ├── tokenizer/            # Tokenization
│   ├── splitter/             # Chunking
│   └── extractor/            # Information extraction
│
├── utils/                    # Utilities
│   ├── es_conn.py            # Elasticsearch connection
│   └── infinity_conn.py      # Infinity connection
│
├── prompts/                  # Prompt Templates
│   ├── generator.py          # Prompt generator
│   ├── citations.md          # Citation prompt
│   ├── keywords.md           # Keyword extraction
│   └── ...                   # Other templates
│
├── raptor.py                 # RAPTOR algorithm
├── settings.py               # Configuration
└── benchmark.py              # Performance testing

Files in This Module

| File | Description |
|------|-------------|
| hybrid_search_algorithm.md | Hybrid search algorithm (Vector + BM25) |
| embedding_generation.md | Text embedding and vector generation |
| rerank_algorithm.md | Cross-encoder reranking |
| chunking_strategies.md | Document chunking strategies |
| prompt_engineering.md | Prompt construction |
| query_processing.md | Query analysis |

Core Algorithms

1. Score Fusion

# Score fusion formula
Final_Score = α × Vector_Score + (1-α) × BM25_Score

where:
    α = vector_similarity_weight (default: 0.3)
    Vector_Score = cosine_similarity(query_embedding, chunk_embedding)
    BM25_Score = normalized_bm25(query_tokens, chunk_tokens)
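
A minimal sketch of the fusion step (assuming both input scores are already normalized to [0, 1]; `alpha` corresponds to `vector_similarity_weight`):

```python
def fuse_scores(vector_score: float, bm25_score: float, alpha: float = 0.3) -> float:
    """Linear fusion of a normalized vector score and a normalized BM25 score."""
    return alpha * vector_score + (1 - alpha) * bm25_score

# With the default alpha = 0.3, BM25 contributes 70% of the final score:
score = fuse_scores(vector_score=0.9, bm25_score=0.5)
# 0.3 * 0.9 + 0.7 * 0.5 = 0.62
```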

2. BM25 Scoring

# BM25 formula
BM25(D, Q) = Σ IDF(qi) × (f(qi, D) × (k1 + 1)) / (f(qi, D) + k1 × (1 - b + b × |D|/avgdl))

where:
    f(qi, D) = term frequency of qi in document D
    |D| = document length
    avgdl = average document length
    k1 = 1.2 (term frequency saturation)
    b = 0.75 (length normalization)
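
The formula translates directly into code. The sketch below scores one document against a toy in-memory corpus (illustrative only: in RAGFlow, BM25 scoring is delegated to Elasticsearch, which also uses a smoothed IDF variant):

```python
import math

def bm25_score(query_tokens, doc_tokens, corpus, k1=1.2, b=0.75):
    """Score one tokenized document against a query with Okapi BM25.

    corpus: list of tokenized documents, used for IDF and avgdl.
    """
    n = len(corpus)
    avgdl = sum(len(d) for d in corpus) / n      # average document length
    score = 0.0
    for q in query_tokens:
        df = sum(1 for d in corpus if q in d)            # document frequency
        idf = math.log((n - df + 0.5) / (df + 0.5) + 1)  # smoothed IDF
        f = doc_tokens.count(q)                          # term frequency f(qi, D)
        denom = f + k1 * (1 - b + b * len(doc_tokens) / avgdl)
        score += idf * f * (k1 + 1) / denom
    return score
```

Terms absent from the document contribute zero (f = 0 kills the numerator), so only matching query tokens raise the score.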

3. Cosine Similarity

# Cosine similarity formula
cos(θ) = (A · B) / (||A|| × ||B||)

where:
    A, B = embedding vectors
    A · B = dot product
    ||A|| = L2 norm
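
The same formula in plain Python (production code would use numpy over batched vectors instead of scalar loops):

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two embedding vectors of equal length."""
    dot = sum(x * y for x, y in zip(a, b))          # A · B
    norm_a = math.sqrt(sum(x * x for x in a))       # ||A||
    norm_b = math.sqrt(sum(y * y for y in b))       # ||B||
    return dot / (norm_a * norm_b)

cosine_similarity([1.0, 0.0], [1.0, 0.0])  # same direction -> 1.0
cosine_similarity([1.0, 0.0], [0.0, 1.0])  # orthogonal -> 0.0
```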

4. Cross-Encoder Reranking

# Reranking score
Rerank_Score = CrossEncoder(query, document)

# Final ranking
Final_Rank = α × Token_Similarity + β × Vector_Similarity + γ × Rank_Features

where:
    α = 0.3 (token weight)
    β = 0.7 (vector weight)
    γ = variable (PageRank, tag boost)
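
The weighted combination above can be sketched as follows (a hypothetical sketch of the formula, not RAGFlow's actual function; `rank_feature` stands in for boosts such as PageRank or tag match):

```python
def combined_rank_score(token_sim, vector_sim, rank_feature=0.0,
                        alpha=0.3, beta=0.7, gamma=1.0):
    """Weighted combination of similarity signals, as in the formula above.

    token_sim:    lexical/token similarity in [0, 1]
    vector_sim:   vector (or cross-encoder rerank) similarity in [0, 1]
    rank_feature: additive boost from ranking features (PageRank, tags)
    """
    return alpha * token_sim + beta * vector_sim + gamma * rank_feature
```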

LLM Provider Support

Chat Models (30+)

| Provider | Models |
|----------|--------|
| OpenAI | GPT-3.5, GPT-4, GPT-4V |
| Anthropic | Claude 3 (Opus, Sonnet, Haiku) |
| Google | Gemini Pro |
| Alibaba | Qwen, Qwen-VL |
| Groq | LLaMA 3, Mixtral |
| Mistral | Mistral 7B, Mixtral 8x7B |
| Cohere | Command R, Command R+ |
| DeepSeek | DeepSeek Chat |
| Ollama | Local models |
| ... | And many more |

Embedding Models

| Provider | Model | Dimensions |
|----------|-------|------------|
| OpenAI | text-embedding-3-small | 1536 |
| OpenAI | text-embedding-3-large | 3072 |
| BGE | bge-large-en-v1.5 | 1024 |
| BGE | bge-m3 | 1024 |
| Jina | jina-embeddings-v2 | 768 |
| Cohere | embed-english-v3 | 1024 |

Reranking Models

| Provider | Model |
|----------|-------|
| Jina | jina-reranker-v2 |
| Cohere | rerank-english-v3 |
| BGE | bge-reranker-large |
| NVIDIA | rerank-qa-mistral-4b |

Configuration Parameters

Search Configuration

{
    "similarity_threshold": 0.2,      # Minimum similarity
    "vector_similarity_weight": 0.3,  # α in fusion formula
    "top_n": 6,                       # Final results count
    "top_k": 1024,                    # Initial candidates
    "rerank_model": "jina-reranker-v2"
}
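
These parameters interact as follows: up to `top_k` candidates enter score fusion, chunks scoring below `similarity_threshold` are dropped, and only `top_n` survivors reach the LLM. A hypothetical sketch of that selection step:

```python
def select_chunks(scored_chunks, similarity_threshold=0.2, top_n=6):
    """Filter fused scores by threshold and keep the top-N chunks.

    scored_chunks: list of (chunk_id, fused_score) pairs.
    """
    kept = [c for c in scored_chunks if c[1] >= similarity_threshold]
    kept.sort(key=lambda c: c[1], reverse=True)
    return kept[:top_n]

select_chunks([("a", 0.9), ("b", 0.1), ("c", 0.5)], top_n=2)
# [("a", 0.9), ("c", 0.5)] — "b" falls below the threshold
```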

Chunking Configuration

{
    "chunk_token_num": 512,           # Tokens per chunk
    "delimiter": "\n!?。;!?",       # Split delimiters
    "layout_recognize": "DeepDOC",    # Layout detection
    "overlapped_percent": 0           # Chunk overlap
}
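
A simplified sketch of delimiter-based chunking under a token budget (the real splitter in /rag/flow/splitter/ is more involved; whitespace splitting here is a crude stand-in for the tokenizer):

```python
def split_into_chunks(text, delimiters="\n!?。;!?", chunk_token_num=512):
    """Greedily pack delimiter-terminated segments into chunks under a token budget."""
    segments, buf = [], ""
    for ch in text:
        buf += ch
        if ch in delimiters:          # sentence boundary reached
            segments.append(buf)
            buf = ""
    if buf:
        segments.append(buf)

    chunks, current = [], ""
    for seg in segments:
        # start a new chunk when adding this segment would exceed the budget
        if current and len((current + seg).split()) > chunk_token_num:
            chunks.append(current)
            current = seg
        else:
            current += seg
    if current:
        chunks.append(current)
    return chunks
```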

Generation Configuration

{
    "temperature": 0.7,
    "max_tokens": 2048,
    "top_p": 1.0,
    "frequency_penalty": 0.0,
    "presence_penalty": 0.0
}
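
"Token Fitting" in the architecture diagram means the prompt must leave room for `max_tokens` of completion inside the model's context window. A hypothetical sketch (the `prompt_overhead` name and budget arithmetic are illustrative assumptions):

```python
def fit_context(chunks, count_tokens, context_window=8192,
                max_tokens=2048, prompt_overhead=512):
    """Keep leading chunks until prompt + completion would exceed the window.

    count_tokens: callable returning a token count for a string.
    prompt_overhead: tokens reserved for system prompt and user query.
    """
    budget = context_window - max_tokens - prompt_overhead
    fitted, used = [], 0
    for c in chunks:
        n = count_tokens(c)
        if used + n > budget:
            break                 # drop this and all lower-ranked chunks
        fitted.append(c)
        used += n
    return fitted
```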

Key Performance Metrics

| Metric | Typical Value | Description |
|--------|---------------|-------------|
| Vector Search Latency | < 100 ms | Elasticsearch query time |
| BM25 Search Latency | < 50 ms | Full-text search time |
| Reranking Latency | 200-500 ms | Cross-encoder inference |
| Embedding Generation | 1-5 s/batch | Per batch of 16 texts |
| Total Retrieval | < 1 s | End-to-end search |

Related Source Files

  • /api/db/services/dialog_service.py - Uses RAG engine
  • /rag/nlp/search.py - Core search implementation
  • /rag/utils/es_conn.py - Elasticsearch queries