ragflow/personal_analyze/03-RAG-ENGINE/README.md
Claude a6ee18476d
docs: Add detailed backend module analysis documentation
Add comprehensive documentation covering 6 modules:
- 01-API-LAYER: Authentication, routing, SSE streaming
- 02-SERVICE-LAYER: Dialog, Task, LLM service analysis
- 03-RAG-ENGINE: Hybrid search, embedding, reranking
- 04-AGENT-SYSTEM: Canvas engine, components, tools
- 05-DOCUMENT-PROCESSING: Task executor, PDF parsing
- 06-ALGORITHMS: BM25, fusion, RAPTOR

Total 28 documentation files with code analysis, diagrams, and formulas.
2025-11-26 11:10:54 +00:00

# 03-RAG-ENGINE - Retrieval-Augmented Generation
## Overview
The RAG Engine is the core of RAGFlow, implementing the retrieval, embedding, reranking, and generation algorithms.
## RAG Engine Architecture
```
┌─────────────────────────────────────────────────────────────────────────┐
│ RAG ENGINE ARCHITECTURE │
└─────────────────────────────────────────────────────────────────────────┘
┌─────────────────────────────┐
│ User Query │
└──────────────┬──────────────┘
┌───────────────────────────────────────────────────────────────────────┐
│ QUERY PROCESSING │
│ ┌─────────────┐ ┌─────────────┐ ┌─────────────┐ │
│ │ Tokenize │→ │ TF-IDF │→ │ Synonym │ │
│ │ Query │ │ Weight │ │ Expansion │ │
│ └─────────────┘ └─────────────┘ └─────────────┘ │
└────────────────────────────────────┬──────────────────────────────────┘
┌────────────────┴────────────────┐
│ │
▼ ▼
┌───────────────────────────────┐ ┌───────────────────────────────────┐
│ VECTOR SEARCH │ │ BM25 SEARCH │
│ ┌─────────────────────────┐ │ │ ┌─────────────────────────────┐ │
│ │ Embedding Model │ │ │ │ Full-text Query │ │
│ │ (OpenAI/BGE/Jina) │ │ │ │ (Elasticsearch) │ │
│ └───────────┬─────────────┘ │ │ └───────────┬─────────────────┘ │
│ │ │ │ │ │
│ ┌───────────▼─────────────┐ │ │ ┌───────────▼─────────────────┐ │
│ │ Cosine Similarity │ │ │ │ BM25 Scoring │ │
│ │ Score (0-1) │ │ │ │ Score │ │
│ └───────────┬─────────────┘ │ │ └───────────┬─────────────────┘ │
└──────────────┼────────────────┘ └──────────────┼────────────────────┘
│ │
└───────────────┬───────────────────┘
┌───────────────────────────────────────────────────────────────────────┐
│ SCORE FUSION │
│ │
│ Final = α × Vector_Score + (1-α) × BM25_Score │
│ where α = vector_similarity_weight (default: 0.3) │
│ │
└────────────────────────────────┬──────────────────────────────────────┘
┌───────────────────────────────────────────────────────────────────────┐
│ RERANKING (Optional) │
│ ┌─────────────────────────────────────────────────────────────────┐ │
│ │ Cross-Encoder Model (Jina/Cohere/BGE) │ │
│ │ Re-score each chunk against query │ │
│ │ Return Top-K after reranking │ │
│ └─────────────────────────────────────────────────────────────────┘ │
└────────────────────────────────┬──────────────────────────────────────┘
┌───────────────────────────────────────────────────────────────────────┐
│ CONTEXT BUILDING │
│ ┌─────────────────────────────────────────────────────────────────┐ │
│ │ Format chunks into context string │ │
│ │ Add metadata (doc name, page, positions) │ │
│ │ Build citation mapping │ │
│ └─────────────────────────────────────────────────────────────────┘ │
└────────────────────────────────┬──────────────────────────────────────┘
┌───────────────────────────────────────────────────────────────────────┐
│ LLM GENERATION │
│ ┌─────────────────────────────────────────────────────────────────┐ │
│ │ System Prompt + Context + User Query │ │
│ │ Token Fitting (stay within context window) │ │
│ │ Streaming Generation │ │
│ │ Citation Insertion │ │
│ └─────────────────────────────────────────────────────────────────┘ │
└───────────────────────────────────────────────────────────────────────┘
```
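The stages above can be sketched end-to-end. This is a hypothetical outline, not RAGFlow's actual API: `embed`, `vector_search`, and `bm25_search` are injected stand-ins for the real embedding model and index queries.

```python
def hybrid_retrieve(query, embed, vector_search, bm25_search,
                    alpha=0.3, top_n=6):
    """Sketch of the pipeline: embed the query, run both searches,
    fuse the scores, and return the top-N chunk ids."""
    q_vec = embed(query)
    vec = vector_search(q_vec)     # {chunk_id: cosine score in [0, 1]}
    bm25 = bm25_search(query)      # {chunk_id: normalized BM25 score}
    # Final = alpha * Vector_Score + (1 - alpha) * BM25_Score
    fused = {cid: alpha * vec.get(cid, 0.0) + (1 - alpha) * bm25.get(cid, 0.0)
             for cid in set(vec) | set(bm25)}
    return sorted(fused, key=fused.get, reverse=True)[:top_n]
```

Reranking and context building (shown further below) then operate on the returned candidates.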
## Directory Structure
```
/rag/
├── llm/ # LLM Model Abstractions
│ ├── chat_model.py # Chat LLM interface (30+ providers)
│ ├── embedding_model.py # Embedding models
│ ├── rerank_model.py # Reranking models
│ ├── cv_model.py # Computer vision
│ └── tts_model.py # Text-to-speech
├── nlp/ # NLP Processing
│ ├── query.py # Query processing
│ ├── search.py # Search & retrieval ⭐
│ └── rag_tokenizer.py # Tokenization
├── app/ # RAG Application
│ └── naive.py # Naive RAG implementation
├── flow/ # Processing Pipeline
│ ├── pipeline.py # Pipeline orchestration
│ ├── parser/ # Document parsing
│ ├── tokenizer/ # Tokenization
│ ├── splitter/ # Chunking
│ └── extractor/ # Information extraction
├── utils/ # Utilities
│ ├── es_conn.py # Elasticsearch connection
│ └── infinity_conn.py # Infinity connection
├── prompts/ # Prompt Templates
│ ├── generator.py # Prompt generator
│ ├── citations.md # Citation prompt
│ ├── keywords.md # Keyword extraction
│ └── ... # Other templates
├── raptor.py # RAPTOR algorithm
├── settings.py # Configuration
└── benchmark.py # Performance testing
```
## Files in This Module
| File | Description |
|------|-------------|
| [hybrid_search_algorithm.md](./hybrid_search_algorithm.md) | Hybrid Search algorithm (Vector + BM25) |
| [embedding_generation.md](./embedding_generation.md) | Text embedding and vector generation |
| [rerank_algorithm.md](./rerank_algorithm.md) | Cross-encoder reranking |
| [chunking_strategies.md](./chunking_strategies.md) | Document chunking strategies |
| [prompt_engineering.md](./prompt_engineering.md) | Prompt construction |
| [query_processing.md](./query_processing.md) | Query analysis |
## Core Algorithms
### 1. Hybrid Search
```python
# Score fusion: alpha weights the vector score, (1 - alpha) the BM25 score
def fuse_scores(vector_score, bm25_score, alpha=0.3):
    """alpha = vector_similarity_weight (default: 0.3).

    vector_score: cosine_similarity(query_embedding, chunk_embedding)
    bm25_score:   normalized BM25 score of the query tokens vs. chunk tokens
    """
    return alpha * vector_score + (1 - alpha) * bm25_score
```
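The fusion formula assumes the BM25 score has already been normalized into the same [0, 1] range as the cosine score. One common scheme for that (an assumption here, not necessarily RAGFlow's exact normalization) is min-max scaling over the candidate set:

```python
def minmax_normalize(scores):
    """Map raw BM25 scores onto [0, 1] across the candidate set."""
    lo, hi = min(scores), max(scores)
    if hi == lo:                  # all candidates scored identically
        return [1.0] * len(scores)
    return [(s - lo) / (hi - lo) for s in scores]
```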
### 2. BM25 Scoring
```python
# BM25(D, Q) = Σ IDF(qi) × f(qi, D) × (k1 + 1) / (f(qi, D) + k1 × (1 - b + b × |D|/avgdl))
def bm25(query_terms, doc_terms, idf, avgdl, k1=1.2, b=0.75):
    """k1 = 1.2 (term-frequency saturation), b = 0.75 (length normalization).

    idf: precomputed inverse document frequency per term
    avgdl: average document length over the corpus
    """
    dl = len(doc_terms)                      # |D| = document length
    score = 0.0
    for q in query_terms:
        f = doc_terms.count(q)               # f(qi, D) = term frequency of qi in D
        score += idf.get(q, 0.0) * f * (k1 + 1) / (f + k1 * (1 - b + b * dl / avgdl))
    return score
```
### 3. Cosine Similarity
```python
import math

# cos(θ) = (A · B) / (‖A‖ × ‖B‖), for embedding vectors A and B
def cosine_similarity(a, b):
    dot = sum(x * y for x, y in zip(a, b))      # A · B (dot product)
    norm_a = math.sqrt(sum(x * x for x in a))   # ‖A‖ (L2 norm)
    norm_b = math.sqrt(sum(y * y for y in b))   # ‖B‖ (L2 norm)
    return dot / (norm_a * norm_b)
```
### 4. Cross-Encoder Reranking
```python
# Rerank_Score = CrossEncoder(query, document), then blended with other signals:
# Final_Rank = α × Token_Similarity + β × Vector_Similarity + γ × Rank_Features
def final_rank_score(token_sim, vector_sim, rank_features=0.0,
                     alpha=0.3, beta=0.7):
    """alpha = 0.3 (token weight), beta = 0.7 (vector weight);
    rank_features carries variable boosts (gamma) such as PageRank or tag matches.
    """
    return alpha * token_sim + beta * vector_sim + rank_features
```
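Putting the two steps together: the cross-encoder re-scores only the fused candidates, and the blended score decides the final order. This sketch uses a stand-in `cross_encoder` callback rather than a real model:

```python
def rerank(query, candidates, cross_encoder, token_sims,
           alpha=0.3, beta=0.7, top_k=6):
    """candidates: chunk texts; cross_encoder(query, doc) -> relevance score."""
    scored = []
    for doc in candidates:
        blended = alpha * token_sims.get(doc, 0.0) + beta * cross_encoder(query, doc)
        scored.append((doc, blended))
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]
```

In production the `cross_encoder` callback would be one of the reranking models listed below (Jina, Cohere, BGE, NVIDIA).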
## LLM Provider Support
### Chat Models (30+)
| Provider | Models |
|----------|--------|
| OpenAI | GPT-3.5, GPT-4, GPT-4V |
| Anthropic | Claude 3 (Opus, Sonnet, Haiku) |
| Google | Gemini Pro |
| Alibaba | Qwen, Qwen-VL |
| Groq | LLaMA 3, Mixtral |
| Mistral | Mistral 7B, Mixtral 8x7B |
| Cohere | Command R, Command R+ |
| DeepSeek | DeepSeek Chat |
| Ollama | Local models |
| ... | And many more |
### Embedding Models
| Provider | Models | Dimensions |
|----------|--------|------------|
| OpenAI | text-embedding-3-small | 1536 |
| OpenAI | text-embedding-3-large | 3072 |
| BGE | bge-large-en-v1.5 | 1024 |
| BGE | bge-m3 | 1024 |
| Jina | jina-embeddings-v2 | 768 |
| Cohere | embed-english-v3 | 1024 |
### Reranking Models
| Provider | Models |
|----------|--------|
| Jina | jina-reranker-v2 |
| Cohere | rerank-english-v3 |
| BGE | bge-reranker-large |
| NVIDIA | rerank-qa-mistral-4b |
## Configuration Parameters
### Search Configuration
```python
{
"similarity_threshold": 0.2, # Minimum similarity
"vector_similarity_weight": 0.3, # α in fusion formula
"top_n": 6, # Final results count
"top_k": 1024, # Initial candidates
"rerank_model": "jina-reranker-v2"
}
```
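How these parameters interact: `top_k` candidates come back from the index, below-threshold hits are dropped, and at most `top_n` results survive. A hypothetical helper mirroring the parameter names above:

```python
def select_results(scored, similarity_threshold=0.2, top_n=6):
    """scored: list of (chunk_id, fused_score) pairs from the top_k candidates."""
    kept = [(cid, s) for cid, s in scored if s >= similarity_threshold]
    kept.sort(key=lambda pair: pair[1], reverse=True)
    return kept[:top_n]
```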
### Chunking Configuration
```python
{
"chunk_token_num": 512, # Tokens per chunk
"delimiter": "\n!?。;!?", # Split delimiters
"layout_recognize": "DeepDOC", # Layout detection
"overlapped_percent": 0 # Chunk overlap
}
```
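A minimal sketch of how `chunk_token_num` and `overlapped_percent` interact, assuming the percentage is expressed on a 0–100 scale; the real splitter additionally respects the configured delimiters and layout information:

```python
def split_tokens(tokens, chunk_token_num=512, overlapped_percent=0):
    """Slide a window of chunk_token_num tokens; overlap shrinks the stride."""
    step = max(1, int(chunk_token_num * (1 - overlapped_percent / 100)))
    return [tokens[i:i + chunk_token_num] for i in range(0, len(tokens), step)]
```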
### Generation Configuration
```python
{
"temperature": 0.7,
"max_tokens": 2048,
"top_p": 1.0,
"frequency_penalty": 0.0,
"presence_penalty": 0.0
}
```
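The "Token Fitting" step in the architecture keeps the assembled prompt inside the model's context window. A minimal greedy sketch, where the `count_tokens` callback is an assumption standing in for the model's real tokenizer:

```python
def fit_context(ranked_chunks, token_budget, count_tokens):
    """Keep the highest-ranked chunks until the token budget is exhausted."""
    kept, used = [], 0
    for chunk in ranked_chunks:
        n = count_tokens(chunk)
        if used + n > token_budget:
            break
        kept.append(chunk)
        used += n
    return kept
```

The budget would typically be the context window minus the system prompt, user query, and `max_tokens` reserved for the completion.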
## Key Performance Metrics
| Metric | Typical Value | Description |
|--------|---------------|-------------|
| Vector Search Latency | < 100ms | Elasticsearch query time |
| BM25 Search Latency | < 50ms | Full-text search time |
| Reranking Latency | 200-500ms | Cross-encoder inference |
| Embedding Generation | 1-5s/batch | Per batch of 16 texts |
| Total Retrieval | < 1s | End-to-end search |
## Related Files
- `/api/db/services/dialog_service.py` - Uses RAG engine
- `/rag/nlp/search.py` - Core search implementation
- `/rag/utils/es_conn.py` - Elasticsearch queries