Add comprehensive documentation covering 6 modules: - 01-API-LAYER: Authentication, routing, SSE streaming - 02-SERVICE-LAYER: Dialog, Task, LLM service analysis - 03-RAG-ENGINE: Hybrid search, embedding, reranking - 04-AGENT-SYSTEM: Canvas engine, components, tools - 05-DOCUMENT-PROCESSING: Task executor, PDF parsing - 06-ALGORITHMS: BM25, fusion, RAPTOR Total 28 documentation files with code analysis, diagrams, and formulas.
272 lines
9.6 KiB
Markdown
272 lines
9.6 KiB
Markdown
# Rerank Algorithm
|
||
|
||
## Tong Quan
|
||
|
||
Reranking sử dụng cross-encoder models để re-score và sắp xếp lại search results dựa trên query-document relevance.
|
||
|
||
## File Location
|
||
```
|
||
/rag/llm/rerank_model.py
|
||
/rag/nlp/search.py (rerank_by_model method)
|
||
```
|
||
|
||
## Reranking Flow
|
||
|
||
```
|
||
┌─────────────────────────────────────────────────────────────────┐
|
||
│ INITIAL SEARCH RESULTS │
|
||
│ Top 1024 candidates from hybrid search │
|
||
└──────────────────────────┬──────────────────────────────────────┘
|
||
│
|
||
▼
|
||
┌─────────────────────────────────────────────────────────────────┐
|
||
│ CROSS-ENCODER RERANKING │
|
||
│ ┌─────────────────────────────────────────────────────────┐ │
|
||
│ │ For each (query, document) pair: │ │
|
||
│ │ score = CrossEncoder(query, document) │ │
|
||
│ └─────────────────────────────────────────────────────────┘ │
|
||
└──────────────────────────┬──────────────────────────────────────┘
|
||
│
|
||
▼
|
||
┌─────────────────────────────────────────────────────────────────┐
|
||
│ SCORE FUSION │
|
||
│ ┌─────────────────────────────────────────────────────────┐ │
|
||
│ │ final_score = α × token_sim + β × vector_sim + γ × rank │ │
|
||
│ │ where α=0.3, β=0.7, γ=variable │ │
|
||
│ └─────────────────────────────────────────────────────────┘ │
|
||
└──────────────────────────┬──────────────────────────────────────┘
|
||
│
|
||
▼
|
||
┌─────────────────────────────────────────────────────────────────┐
|
||
│ TOP-N RESULTS │
|
||
│ Return top 6 (default) highest scoring documents │
|
||
└─────────────────────────────────────────────────────────────────┘
|
||
```
|
||
|
||
## Supported Rerank Models
|
||
|
||
| Provider | Class | Notes |
|
||
|----------|-------|-------|
|
||
| Jina | `JinaRerank` | multilingual |
|
||
| Cohere | `CoHereRerank` | Native SDK |
|
||
| NVIDIA | `NvidiaRerank` | Model-specific URLs |
|
||
| Voyage AI | `VoyageRerank` | Token counting |
|
||
| Qwen | `QWenRerank` | Dashscope |
|
||
| BGE | `HuggingfaceRerank` | TEI HTTP |
|
||
| LocalAI | `LocalAIRerank` | Custom normalization |
|
||
| SILICONFLOW | `SILICONFLOWRerank` | Chunk config |
|
||
|
||
## Base Implementation
|
||
|
||
```python
|
||
class Base(ABC):
|
||
def similarity(self, query: str, texts: list) -> tuple[np.ndarray, int]:
|
||
"""
|
||
Calculate relevance scores for query-document pairs.
|
||
|
||
Args:
|
||
query: Search query
|
||
texts: List of document texts
|
||
|
||
Returns:
|
||
(scores, token_count): Array of relevance scores and tokens used
|
||
"""
|
||
raise NotImplementedError()
|
||
```
|
||
|
||
## Jina Rerank
|
||
|
||
```python
|
||
class JinaRerank(Base):
|
||
def __init__(self, key, model_name, base_url=None):
|
||
self.headers = {
|
||
"Content-Type": "application/json",
|
||
"Authorization": f"Bearer {key}"
|
||
}
|
||
self.base_url = base_url or "https://api.jina.ai/v1/rerank"
|
||
self.model_name = model_name
|
||
|
||
def similarity(self, query: str, texts: list):
|
||
texts = [truncate(t, 8196) for t in texts]
|
||
|
||
data = {
|
||
"model": self.model_name,
|
||
"query": query,
|
||
"documents": texts,
|
||
"top_n": len(texts)
|
||
}
|
||
|
||
res = requests.post(self.base_url, headers=self.headers, json=data).json()
|
||
rank = np.zeros(len(texts), dtype=float)
|
||
|
||
for d in res["results"]:
|
||
rank[d["index"]] = d["relevance_score"]
|
||
|
||
return rank, total_token_count_from_response(res)
|
||
```
|
||
|
||
## LocalAI Rerank with Normalization
|
||
|
||
```python
|
||
class LocalAIRerank(Base):
|
||
def similarity(self, query: str, texts: list):
|
||
# ... API call ...
|
||
|
||
# Normalize scores to [0, 1] range
|
||
min_rank = np.min(rank)
|
||
max_rank = np.max(rank)
|
||
|
||
if not np.isclose(min_rank, max_rank, atol=1e-3):
|
||
rank = (rank - min_rank) / (max_rank - min_rank)
|
||
else:
|
||
rank = np.zeros_like(rank)
|
||
|
||
return rank, token_count
|
||
```
|
||
|
||
## Rerank Integration in Search
|
||
|
||
```python
|
||
# In search.py - rerank_by_model()
|
||
|
||
def rerank_by_model(self, rerank_mdl, sres, question,
|
||
tkweight=0.3, vtweight=0.7, rank_feature=None):
|
||
"""
|
||
Rerank search results using cross-encoder model.
|
||
|
||
Args:
|
||
rerank_mdl: Reranking model instance
|
||
sres: Search results with content
|
||
question: Original query
|
||
tkweight: Token similarity weight (default 0.3)
|
||
vtweight: Vector similarity weight (default 0.7)
|
||
rank_feature: Optional PageRank scores
|
||
|
||
Returns:
|
||
(combined_sim, token_sim, vector_sim): Score arrays
|
||
"""
|
||
|
||
# Extract content for reranking
|
||
contents = [sres.field[id]["content_with_weight"] for id in sres.ids]
|
||
|
||
# Call rerank model
|
||
rank_scores, token_count = rerank_mdl.similarity(question, contents)
|
||
|
||
# Get original similarities
|
||
tksim = [sres.field[id].get("term_sim", 0) for id in sres.ids]
|
||
vsim = [sres.field[id].get("vector_sim", 0) for id in sres.ids]
|
||
|
||
# Weighted combination
|
||
combined = []
|
||
for i, id in enumerate(sres.ids):
|
||
score = tkweight * tksim[i] + vtweight * vsim[i]
|
||
|
||
# Add rank feature (PageRank) if available
|
||
if rank_feature and id in rank_feature:
|
||
score *= (1 + rank_feature[id])
|
||
|
||
# Incorporate rerank score
|
||
score = score * 0.5 + rank_scores[i] * 0.5
|
||
|
||
combined.append(score)
|
||
|
||
return np.array(combined), tksim, vsim
|
||
```
|
||
|
||
## Hybrid Similarity (Without Rerank Model)
|
||
|
||
```python
|
||
def hybrid_similarity(self, avec, bvecs, atks, btkss, tkweight=0.3, vtweight=0.7):
|
||
"""
|
||
Calculate hybrid similarity without rerank model.
|
||
|
||
Uses:
|
||
- Cosine similarity for vectors
|
||
- Token overlap for text matching
|
||
"""
|
||
from sklearn.metrics.pairwise import cosine_similarity
|
||
|
||
# Vector similarity
|
||
vsim = cosine_similarity([avec], bvecs)[0]
|
||
|
||
# Token similarity
|
||
tksim = self.token_similarity(atks, btkss)
|
||
|
||
# Weighted combination
|
||
combined = np.array(vsim) * vtweight + np.array(tksim) * tkweight
|
||
|
||
return combined, tksim, vsim
|
||
|
||
def token_similarity(self, query_tokens, doc_tokens_list):
|
||
"""
|
||
Calculate token overlap similarity.
|
||
|
||
Formula:
|
||
sim = |query ∩ doc| / |query|
|
||
"""
|
||
query_set = set(query_tokens)
|
||
|
||
sims = []
|
||
for doc_tokens in doc_tokens_list:
|
||
doc_set = set(doc_tokens)
|
||
overlap = len(query_set & doc_set)
|
||
sim = overlap / len(query_set) if query_set else 0
|
||
sims.append(sim)
|
||
|
||
return sims
|
||
```
|
||
|
||
## Final Ranking Formula
|
||
|
||
```python
|
||
# Complete reranking formula
|
||
Final_Rank = α × Token_Similarity + β × Vector_Similarity + γ × Rank_Features
|
||
|
||
# Where:
|
||
# α = 0.3 (token weight, configurable)
|
||
# β = 0.7 (vector weight, configurable)
|
||
# γ = variable (PageRank, tag boost)
|
||
|
||
# With rerank model:
|
||
Final_Score = 0.5 × Hybrid_Score + 0.5 × Rerank_Score
|
||
```
|
||
|
||
## Configuration
|
||
|
||
```python
|
||
RERANK_CFG = {
|
||
"factory": "Jina",
|
||
"api_key": os.getenv("JINA_API_KEY"),
|
||
"base_url": "https://api.jina.ai/v1/rerank",
|
||
"model": "jina-reranker-v2-base-multilingual"
|
||
}
|
||
|
||
# Search configuration
|
||
{
|
||
"rerank_model": "jina-reranker-v2", # Rerank model to use
|
||
"vector_similarity_weight": 0.7, # β weight
|
||
"top_n": 6, # Final results
|
||
"top_k": 1024, # Initial candidates
|
||
}
|
||
```
|
||
|
||
## Performance Considerations
|
||
|
||
### Latency
|
||
- Reranking adds 200-500ms latency
|
||
- Typically processes 50-100 candidates
|
||
|
||
### Batch Size
|
||
- Most models support batch processing
|
||
- Trade-off: larger batch = more memory, faster total time
|
||
|
||
### When to Use Reranking
|
||
- High-stakes queries requiring precision
|
||
- When initial retrieval quality is insufficient
|
||
- Cross-lingual retrieval scenarios
|
||
|
||
## Related Files
|
||
|
||
- `/rag/llm/rerank_model.py` - Rerank model implementations
|
||
- `/rag/nlp/search.py` - Reranking integration
|
||
- `/api/db/services/dialog_service.py` - Rerank model selection
|