Rerank Algorithm
Overview
Reranking uses cross-encoder models to re-score and reorder search results based on query-document relevance.
File Location
- /rag/llm/rerank_model.py
- /rag/nlp/search.py (rerank_by_model method)
Reranking Flow
┌─────────────────────────────────────────────────────────────────┐
│ INITIAL SEARCH RESULTS │
│ Top 1024 candidates from hybrid search │
└──────────────────────────┬──────────────────────────────────────┘
│
▼
┌─────────────────────────────────────────────────────────────────┐
│ CROSS-ENCODER RERANKING │
│ ┌─────────────────────────────────────────────────────────┐ │
│ │ For each (query, document) pair: │ │
│ │ score = CrossEncoder(query, document) │ │
│ └─────────────────────────────────────────────────────────┘ │
└──────────────────────────┬──────────────────────────────────────┘
│
▼
┌─────────────────────────────────────────────────────────────────┐
│ SCORE FUSION │
│ ┌─────────────────────────────────────────────────────────┐ │
│ │ final_score = α × token_sim + β × vector_sim + γ × rank │ │
│ │ where α=0.3, β=0.7, γ=variable │ │
│ └─────────────────────────────────────────────────────────┘ │
└──────────────────────────┬──────────────────────────────────────┘
│
▼
┌─────────────────────────────────────────────────────────────────┐
│ TOP-N RESULTS │
│ Return top 6 (default) highest scoring documents │
└─────────────────────────────────────────────────────────────────┘
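The flow above can be sketched end to end. This is a minimal illustration of the pipeline shape (candidates, cross-encoder scoring, 50/50 fusion with the hybrid score, top-N cut); the cross-encoder here is a toy word-overlap stand-in, not a real model, and the candidate dicts are hypothetical:

```python
def toy_cross_encoder(query: str, doc: str) -> float:
    # Toy stand-in for a cross-encoder: fraction of query words in the doc
    q = set(query.split())
    return len(q & set(doc.split())) / len(q) if q else 0.0

def rerank_pipeline(query, candidates, cross_encoder, top_n=6,
                    tkweight=0.3, vtweight=0.7):
    # candidates: dicts with "text", "token_sim", "vector_sim"
    fused = []
    for cand in candidates:
        hybrid = tkweight * cand["token_sim"] + vtweight * cand["vector_sim"]
        ce_score = cross_encoder(query, cand["text"])
        # 50/50 blend of hybrid similarity and cross-encoder score
        fused.append((hybrid * 0.5 + ce_score * 0.5, cand["text"]))
    fused.sort(key=lambda pair: pair[0], reverse=True)
    return fused[:top_n]

candidates = [
    {"text": "rerank models improve search", "token_sim": 0.4, "vector_sim": 0.8},
    {"text": "unrelated cooking recipe", "token_sim": 0.1, "vector_sim": 0.2},
]
top = rerank_pipeline("rerank search", candidates, toy_cross_encoder, top_n=1)
```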
Supported Rerank Models
| Provider | Class | Notes |
|---|---|---|
| Jina | JinaRerank | Multilingual |
| Cohere | CoHereRerank | Native SDK |
| NVIDIA | NvidiaRerank | Model-specific URLs |
| Voyage AI | VoyageRerank | Token counting |
| Qwen | QWenRerank | Dashscope |
| BGE | HuggingfaceRerank | TEI HTTP |
| LocalAI | LocalAIRerank | Custom normalization |
| SILICONFLOW | SILICONFLOWRerank | Chunk config |
Base Implementation
from abc import ABC
import numpy as np

class Base(ABC):
    def similarity(self, query: str, texts: list) -> tuple[np.ndarray, int]:
        """
        Calculate relevance scores for query-document pairs.
        Args:
            query: Search query
            texts: List of document texts
        Returns:
            (scores, token_count): Array of relevance scores and tokens used
        """
        raise NotImplementedError()
Jina Rerank
import numpy as np
import requests

# `truncate` and `total_token_count_from_response` are utility helpers
# from the project's shared modules
class JinaRerank(Base):
    def __init__(self, key, model_name, base_url=None):
        self.headers = {
            "Content-Type": "application/json",
            "Authorization": f"Bearer {key}"
        }
        self.base_url = base_url or "https://api.jina.ai/v1/rerank"
        self.model_name = model_name

    def similarity(self, query: str, texts: list):
        texts = [truncate(t, 8196) for t in texts]
        data = {
            "model": self.model_name,
            "query": query,
            "documents": texts,
            "top_n": len(texts)
        }
        res = requests.post(self.base_url, headers=self.headers, json=data).json()
        rank = np.zeros(len(texts), dtype=float)
        for d in res["results"]:
            rank[d["index"]] = d["relevance_score"]
        return rank, total_token_count_from_response(res)
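One detail worth noting: the API returns results sorted by relevance, each tagged with the `index` of the input document, so scores must be scattered back into input order before they can be aligned with the candidate list. A standalone illustration of that pattern (simulated response, not a live API call):

```python
import numpy as np

# Simulated rerank API response: results arrive sorted by relevance,
# each carrying the index of the original document it refers to.
response = {"results": [
    {"index": 2, "relevance_score": 0.91},
    {"index": 0, "relevance_score": 0.55},
    {"index": 1, "relevance_score": 0.12},
]}

rank = np.zeros(3, dtype=float)
for d in response["results"]:
    rank[d["index"]] = d["relevance_score"]  # scatter back to input order

# rank is now aligned with the original document list
```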
LocalAI Rerank with Normalization
class LocalAIRerank(Base):
    def similarity(self, query: str, texts: list):
        # ... API call ...
        # Normalize scores to [0, 1] range
        min_rank = np.min(rank)
        max_rank = np.max(rank)
        if not np.isclose(min_rank, max_rank, atol=1e-3):
            rank = (rank - min_rank) / (max_rank - min_rank)
        else:
            rank = np.zeros_like(rank)
        return rank, token_count
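The min-max step can be lifted into a standalone helper for testing. This is a sketch of the same logic, not the project's code; note the guard that returns all zeros when the scores have no spread:

```python
import numpy as np

def minmax_normalize(rank, atol=1e-3):
    """Scale scores to [0, 1]; return zeros when all scores are (nearly) equal."""
    rank = np.asarray(rank, dtype=float)
    min_rank, max_rank = rank.min(), rank.max()
    if np.isclose(min_rank, max_rank, atol=atol):
        # Degenerate case: identical scores carry no ordering information
        return np.zeros_like(rank)
    return (rank - min_rank) / (max_rank - min_rank)
```

Without the degenerate-case guard, a constant score vector would divide by zero and emit NaNs downstream.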
Rerank Integration in Search
# In search.py - rerank_by_model()
def rerank_by_model(self, rerank_mdl, sres, question,
                    tkweight=0.3, vtweight=0.7, rank_feature=None):
    """
    Rerank search results using cross-encoder model.
    Args:
        rerank_mdl: Reranking model instance
        sres: Search results with content
        question: Original query
        tkweight: Token similarity weight (default 0.3)
        vtweight: Vector similarity weight (default 0.7)
        rank_feature: Optional PageRank scores
    Returns:
        (combined_sim, token_sim, vector_sim): Score arrays
    """
    # Extract content for reranking
    contents = [sres.field[id]["content_with_weight"] for id in sres.ids]

    # Call rerank model
    rank_scores, token_count = rerank_mdl.similarity(question, contents)

    # Get original similarities
    tksim = [sres.field[id].get("term_sim", 0) for id in sres.ids]
    vsim = [sres.field[id].get("vector_sim", 0) for id in sres.ids]

    # Weighted combination
    combined = []
    for i, id in enumerate(sres.ids):
        score = tkweight * tksim[i] + vtweight * vsim[i]
        # Add rank feature (PageRank) if available
        if rank_feature and id in rank_feature:
            score *= (1 + rank_feature[id])
        # Incorporate rerank score
        score = score * 0.5 + rank_scores[i] * 0.5
        combined.append(score)

    return np.array(combined), tksim, vsim
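The combination logic can be isolated for a quick check. This is a simplified stand-in for `rerank_by_model` with made-up scores; note that the PageRank-style rank feature multiplies the hybrid score *before* the 50/50 blend with the rerank score:

```python
import numpy as np

def combine(tksim, vsim, rank_scores, ids, rank_feature=None,
            tkweight=0.3, vtweight=0.7):
    # Mirrors the combination above: hybrid score, optional multiplicative
    # rank-feature boost, then 50/50 blend with the cross-encoder score.
    combined = []
    for i, doc_id in enumerate(ids):
        score = tkweight * tksim[i] + vtweight * vsim[i]
        if rank_feature and doc_id in rank_feature:
            score *= 1 + rank_feature[doc_id]
        combined.append(score * 0.5 + rank_scores[i] * 0.5)
    return np.array(combined)

scores = combine(
    tksim=[0.5, 0.5], vsim=[0.6, 0.6], rank_scores=[0.4, 0.4],
    ids=["a", "b"], rank_feature={"b": 0.5},  # only "b" gets a boost
)
```

With identical similarities and rerank scores, only the boosted document pulls ahead, which shows the rank feature acting purely as a tiebreaker-style multiplier.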
Hybrid Similarity (Without Rerank Model)
def hybrid_similarity(self, avec, bvecs, atks, btkss, tkweight=0.3, vtweight=0.7):
    """
    Calculate hybrid similarity without rerank model.
    Uses:
        - Cosine similarity for vectors
        - Token overlap for text matching
    """
    from sklearn.metrics.pairwise import cosine_similarity

    # Vector similarity
    vsim = cosine_similarity([avec], bvecs)[0]

    # Token similarity
    tksim = self.token_similarity(atks, btkss)

    # Weighted combination
    combined = np.array(vsim) * vtweight + np.array(tksim) * tkweight
    return combined, tksim, vsim
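The vector half of the hybrid score is ordinary cosine similarity; a NumPy-only equivalent of the sklearn call (illustrative, not the project's code):

```python
import numpy as np

def cosine_sim(avec, bvecs):
    """Cosine similarity of one query vector against rows of a doc-vector matrix."""
    avec = np.asarray(avec, dtype=float)
    bvecs = np.asarray(bvecs, dtype=float)
    # Dot products divided by the product of vector norms
    return bvecs @ avec / (np.linalg.norm(bvecs, axis=1) * np.linalg.norm(avec))

sims = cosine_sim([1.0, 0.0], [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
```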
def token_similarity(self, query_tokens, doc_tokens_list):
    """
    Calculate token overlap similarity.
    Formula:
        sim = |query ∩ doc| / |query|
    """
    query_set = set(query_tokens)
    sims = []
    for doc_tokens in doc_tokens_list:
        doc_set = set(doc_tokens)
        overlap = len(query_set & doc_set)
        sim = overlap / len(query_set) if query_set else 0
        sims.append(sim)
    return sims
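The overlap formula can be exercised standalone (same logic as the method above, minus `self`):

```python
def token_similarity(query_tokens, doc_tokens_list):
    # sim = |query ∩ doc| / |query|, with 0 for an empty query
    query_set = set(query_tokens)
    sims = []
    for doc_tokens in doc_tokens_list:
        overlap = len(query_set & set(doc_tokens))
        sims.append(overlap / len(query_set) if query_set else 0)
    return sims

sims = token_similarity(
    ["rerank", "search"],
    [["rerank", "search", "engine"], ["cooking"], []],
)
# Full overlap scores 1.0; no overlap (or an empty document) scores 0.0
```

Because the denominator is the query length, the score is asymmetric: a long document containing every query token still scores 1.0.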
Final Ranking Formula
# Complete reranking formula
Final_Rank = α × Token_Similarity + β × Vector_Similarity + γ × Rank_Features
# Where:
#   α = 0.3 (token weight, configurable)
#   β = 0.7 (vector weight, configurable)
#   γ = variable (rank features such as PageRank or tag boost; in
#       rerank_by_model this is applied multiplicatively via
#       score *= (1 + rank_feature))

# With a rerank model, the hybrid score is blended 50/50 with the
# cross-encoder score:
Final_Score = 0.5 × Hybrid_Score + 0.5 × Rerank_Score
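Plugging in illustrative numbers (hypothetical scores, not measured values) shows how the two formulas compose:

```python
# Hypothetical scores for one document
alpha, beta = 0.3, 0.7           # token / vector weights
token_sim, vector_sim = 0.4, 0.8
pagerank_boost = 0.1             # gamma-style rank feature

hybrid = alpha * token_sim + beta * vector_sim  # = 0.68
boosted = hybrid * (1 + pagerank_boost)         # multiplicative boost, = 0.748

rerank_score = 0.9                              # cross-encoder output
final = boosted * 0.5 + rerank_score * 0.5      # = 0.824
```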
Configuration
RERANK_CFG = {
    "factory": "Jina",
    "api_key": os.getenv("JINA_API_KEY"),
    "base_url": "https://api.jina.ai/v1/rerank",
    "model": "jina-reranker-v2-base-multilingual"
}

# Search configuration
{
    "rerank_model": "jina-reranker-v2",  # Rerank model to use
    "vector_similarity_weight": 0.7,     # β weight
    "top_n": 6,                          # Final results
    "top_k": 1024,                       # Initial candidates
}
Performance Considerations
Latency
- Reranking typically adds 200-500 ms of latency per query
- In practice only the top 50-100 candidates are reranked, since the cross-encoder must score every (query, document) pair
Batch Size
- Most models support batch processing
- Trade-off: larger batch = more memory, faster total time
When to Use Reranking
- High-stakes queries requiring precision
- When initial retrieval quality is insufficient
- Cross-lingual retrieval scenarios
Related Files
- /rag/llm/rerank_model.py - Rerank model implementations
- /rag/nlp/search.py - Reranking integration
- /api/db/services/dialog_service.py - Rerank model selection