Query Decomposition Retrieval
Overview
The Query Decomposition Retrieval component is an advanced retrieval system that automatically decomposes complex queries into simpler sub-questions, performs concurrent retrieval, and intelligently reranks results using LLM-based scoring combined with vector similarity.
This feature addresses a critical limitation in traditional RAG systems: handling complex, multi-faceted queries that require information from multiple sources or aspects.
Problem Statement
Current approaches to complex query handling have significant limitations:
1. Workflow-based Approach
- High Complexity: Users must manually assemble multiple components (LLM, loop, retriever) and design complex data flow logic
- Redundant Overhead: Each retrieval round requires independent serialization, deserialization, and network calls
- Poor User Experience: Requires deep technical expertise to set up
2. Agent-based Approach
- Slow Performance: Multiple LLM calls for thinking, tool selection, and execution make it inherently slow
- Unpredictable Behavior: Agents can be unstable, potentially leading to excessive retrieval rounds or loops
- Limited Control: Difficult to ensure deterministic, consistent behavior
Solution: Native Query Decomposition
The Query Decomposition Retrieval component integrates powerful query decomposition directly into the retrieval pipeline, offering:
- Simplified User Experience: One-click enable with customizable prompts - no workflow engineering required
- Enhanced Performance: Tight internal integration eliminates overhead and enables global optimization
- Better Results: Global chunk deduplication and reranking across sub-queries
- Deterministic Behavior: Explicit control over decomposition and scoring logic
Key Features
1. Automatic Query Decomposition
- Uses LLM to intelligently break down complex queries into 2-3 simpler sub-questions
- Each sub-question focuses on one specific aspect
- Configurable decomposition prompt with high-quality defaults
2. Concurrent Retrieval
- Retrieves chunks for all sub-queries in parallel
- Significantly faster than sequential processing
- Configurable concurrency control
3. Global Deduplication
- Identifies and removes duplicate chunks across all sub-query results
- Tracks which sub-queries retrieved each chunk
- Preserves the best scoring information for each unique chunk
4. LLM-based Relevance Scoring
- Uses LLM to judge each chunk's relevance to the original query
- Provides explainable scores with reasoning
- Scores normalized to 0.0-1.0 range
5. Score Fusion
- Combines LLM relevance scores with vector similarity scores
- Configurable fusion weight (e.g., 0.7 * LLM_score + 0.3 * vector_score)
- Balances semantic understanding with vector matching
6. Global Ranking
- All unique chunks ranked by fused score
- Returns top-N results from global ranking
- Better coverage and relevance than per-sub-query ranking
How It Works
Step 1: Query Decomposition
Input: "Compare machine learning and deep learning, and explain their applications in computer vision"
LLM Decomposition:
- "What is machine learning and what are its characteristics?"
- "What is deep learning and what are its characteristics?"
- "How are machine learning and deep learning used in computer vision?"
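The decomposition step above can be sketched as a small helper. This is a minimal sketch, not the component's actual implementation: the `llm_chat` callable and the default prompt template are assumptions for illustration. The key detail is the graceful fallback, since the component falls back to direct retrieval when decomposition fails.

```python
import json

def decompose_query(llm_chat, original_query, max_count=3,
                    prompt_template=("Break down this query into {max_count} "
                                     "sub-questions:\n{original_query}\n"
                                     'Output as JSON array: ["q1", "q2"]')):
    """Ask the LLM for sub-questions; fall back to the original query on failure.

    `llm_chat` is a hypothetical callable: prompt string in, reply string out.
    """
    prompt = prompt_template.format(max_count=max_count,
                                    original_query=original_query)
    try:
        sub_queries = json.loads(llm_chat(prompt))
        assert isinstance(sub_queries, list) and sub_queries
    except (json.JSONDecodeError, AssertionError):
        # Unparseable reply: degrade gracefully to direct retrieval
        return [original_query]
    # Cap at max_count even if the LLM over-produces
    return [str(q) for q in sub_queries[:max_count]]
```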
Step 2: Concurrent Retrieval
For each sub-question, perform vector retrieval:
- Sub-query 1 → Retrieves chunks about ML fundamentals
- Sub-query 2 → Retrieves chunks about DL fundamentals
- Sub-query 3 → Retrieves chunks about CV applications
All retrievals happen in parallel for maximum performance.
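A sketch of the parallel fan-out, assuming a `retrieve_fn` that maps one sub-query to a list of chunks (the function name and signature are illustrative, not the component's API):

```python
from concurrent.futures import ThreadPoolExecutor

def retrieve_all(retrieve_fn, sub_queries, enable_concurrency=True):
    """Run retrieve_fn(sub_query) for every sub-query, in parallel
    when concurrency is enabled; returns {sub_query: chunks}."""
    if not enable_concurrency:
        return {q: retrieve_fn(q) for q in sub_queries}
    with ThreadPoolExecutor(max_workers=len(sub_queries)) as pool:
        # map preserves input order, so results pair up with sub_queries
        return dict(zip(sub_queries, pool.map(retrieve_fn, sub_queries)))
```

Threads suffice here because retrieval is I/O-bound (network calls to the vector store), so the GIL is not a bottleneck.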
Step 3: Deduplication
If the same chunk appears in multiple sub-query results:
- Keep only one copy
- Track all sub-queries that retrieved it
- Average the vector similarity scores
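The three deduplication rules can be sketched as follows. The chunk dict shape (`chunk_id`, `vector_score`) is an assumption for illustration, not the component's actual schema:

```python
def deduplicate(results_by_sub_query):
    """Merge per-sub-query results into unique chunks, tracking which
    sub-queries retrieved each chunk and averaging vector scores."""
    merged = {}
    for sub_query, chunks in results_by_sub_query.items():
        for chunk in chunks:
            entry = merged.setdefault(chunk["chunk_id"], {
                "chunk": chunk, "scores": [], "retrieved_by": []})
            entry["scores"].append(chunk["vector_score"])
            entry["retrieved_by"].append(sub_query)
    return [{
        **e["chunk"],
        # Rule: average the vector similarity scores across sub-queries
        "vector_score": sum(e["scores"]) / len(e["scores"]),
        # Rule: track every sub-query that retrieved this chunk
        "retrieved_by_sub_queries": e["retrieved_by"],
    } for e in merged.values()]
```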
Step 4: LLM Scoring
For each unique chunk:
- Call LLM with reranking prompt
- LLM judges: "How useful is this chunk for the original query?"
- Returns score 1-10 with reasoning
Step 5: Score Fusion
For each chunk:
final_score = fusion_weight * (LLM_score / 10) + (1 - fusion_weight) * vector_score
Example with fusion_weight=0.7:
LLM_score = 8/10 = 0.8
vector_score = 0.75
final_score = 0.7 * 0.8 + 0.3 * 0.75 = 0.56 + 0.225 = 0.785
Step 6: Global Ranking
- Sort all chunks by final_score (descending)
- Return top-N chunks
- Chunks are globally optimal, not just best per sub-query
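Steps 5 and 6 together reduce to a few lines. This sketch applies the fusion formula above and then ranks globally; the `llm_score`/`vector_score` field names are assumptions:

```python
def fuse_and_rank(chunks, fusion_weight=0.7, top_n=8):
    """Fuse LLM (1-10) and vector (0-1) scores, then return the
    top-N chunks from the global ranking."""
    for chunk in chunks:
        # final = weight * (LLM_score / 10) + (1 - weight) * vector_score
        chunk["final_score"] = (fusion_weight * chunk["llm_score"] / 10
                                + (1 - fusion_weight) * chunk["vector_score"])
    return sorted(chunks, key=lambda c: c["final_score"], reverse=True)[:top_n]
```

With the worked example above (LLM 8/10, vector 0.75, weight 0.7), this yields a final score of 0.785.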
Configuration
Basic Configuration
# In your agent workflow
retrieval = QueryDecompositionRetrieval()
# Enable query decomposition (default: True)
retrieval.enable_decomposition = True
# Maximum number of sub-queries (default: 3)
retrieval.max_decomposition_count = 3
# Number of final results (default: 8)
retrieval.top_n = 8
Advanced Configuration
# Score fusion weight (default: 0.7)
# Higher values trust LLM scores more, lower values trust vector similarity more
retrieval.score_fusion_weight = 0.7
# Enable concurrent retrieval (default: True)
retrieval.enable_concurrency = True
# Similarity threshold (default: 0.2)
retrieval.similarity_threshold = 0.2
# Vector vs keyword weight (default: 0.3)
retrieval.keywords_similarity_weight = 0.3
Custom Prompts
Decomposition Prompt
retrieval.decomposition_prompt = """You are a query decomposition expert.
Break down this query into {max_count} sub-questions:
{original_query}
Output as JSON array: ["sub-question 1", "sub-question 2"]
"""
Reranking Prompt
retrieval.reranking_prompt = """Rate this chunk's relevance (1-10):
Query: {query}
Chunk: {chunk_text}
Output JSON: {{"score": 8, "reason": "Contains key information"}}
"""
Usage Examples
Example 1: Simple Comparison Query
query = "Compare Python and JavaScript for web development"
# Decomposition produces:
# 1. "What are Python's strengths and use cases in web development?"
# 2. "What are JavaScript's strengths and use cases in web development?"
# 3. "What are the key differences between Python and JavaScript for web development?"
# Result: Comprehensive answer covering both languages and their comparison
Example 2: Multi-Aspect Research Query
query = "Explain the causes, key events, and consequences of World War II"
# Decomposition produces:
# 1. "What were the main causes that led to World War II?"
# 2. "What were the most significant events during World War II?"
# 3. "What were the major consequences and aftermath of World War II?"
# Result: Well-structured answer covering all three aspects
Example 3: Technical Deep-Dive Query
query = "How does BERT work and what are its applications in NLP?"
# Decomposition produces:
# 1. "What is BERT and how does its architecture work?"
# 2. "What are the main applications of BERT in natural language processing?"
# Result: Technical explanation plus practical applications
Configuration Examples
For High Precision (Trust LLM More)
retrieval.score_fusion_weight = 0.9 # 90% LLM, 10% vector
retrieval.similarity_threshold = 0.3 # Higher threshold
retrieval.top_n = 5 # Fewer, more precise results
For High Recall (Trust Vector More)
retrieval.score_fusion_weight = 0.3 # 30% LLM, 70% vector
retrieval.similarity_threshold = 0.1 # Lower threshold
retrieval.top_n = 15 # More results for coverage
For Balanced Performance
retrieval.score_fusion_weight = 0.7 # 70% LLM, 30% vector (default)
retrieval.similarity_threshold = 0.2 # Standard threshold (default)
retrieval.top_n = 8 # Standard result count (default)
Performance Comparison
| Approach | Setup Time | Query Time | Result Quality | Determinism |
|---|---|---|---|---|
| Manual Workflow | High (30+ min) | Medium (2-3s) | Depends on design | High |
| Agent-based | Medium (10 min) | High (5-10s) | Variable | Low |
| Query Decomposition | Low (1 min) | Low (1-2s) | High | High |
Performance Benefits
- Concurrent Execution: Sub-queries retrieved in parallel
- Single Deduplication Pass: No redundant processing
- Batch LLM Scoring: Efficient use of LLM calls
- Internal Optimization: No serialization/network overhead
Best Practices
1. When to Enable Decomposition
✅ Good for:
- Complex, multi-faceted queries
- Comparison questions ("Compare A and B")
- Multi-part questions ("Explain X, Y, and Z")
- Research queries requiring comprehensive coverage
❌ Not needed for:
- Simple factual queries ("What is X?")
- Single-concept lookups
- Very specific technical questions
2. Tuning Score Fusion Weight
- Start with default (0.7) for most use cases
- Increase to 0.8-0.9 if LLM is very good at judging relevance
- Decrease to 0.5-0.6 if vector similarity is highly reliable
- Monitor and adjust based on user feedback
3. Prompt Engineering Tips
Decomposition Prompt:
- Be explicit about number of sub-questions
- Emphasize non-redundancy
- Require JSON format for reliable parsing
- Keep it concise
Reranking Prompt:
- Use clear scoring scale (1-10 is intuitive)
- Request justification for explainability
- Emphasize direct vs indirect relevance
- Require strict JSON format
4. Monitoring and Debugging
The component adds metadata to results for debugging:
{
"chunk_id": "...",
"content": "...",
"llm_relevance_score": 0.8, # LLM score (0-1)
"vector_similarity_score": 0.75, # Vector score (0-1)
"final_fused_score": 0.785, # Fused score
"retrieved_by_sub_queries": ["sub-q-1", "sub-q-2"] # Which sub-queries found it
}
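This metadata makes tuning concrete. For example, when deciding whether to raise or lower `score_fusion_weight`, one useful check is where the two scores disagree sharply. A small sketch over the metadata fields above (the 0.3 gap is an arbitrary illustrative threshold):

```python
def find_score_disagreements(results, gap=0.3):
    """Surface chunks where LLM and vector scores diverge by at least
    `gap` -- candidates to inspect when tuning score_fusion_weight."""
    return [r for r in results
            if abs(r["llm_relevance_score"] - r["vector_similarity_score"]) >= gap]
```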
Troubleshooting
Issue: Decomposition Not Working
Symptoms: Always falling back to direct retrieval
Solutions:
- Check that `enable_decomposition` is True
- Verify the LLM is properly configured
- Review decomposition prompt format
- Check logs for LLM errors
Issue: Poor Sub-Question Quality
Symptoms: Sub-questions are too similar or off-topic
Solutions:
- Refine decomposition prompt
- Adjust `max_decomposition_count`
- Consider lowering the temperature in the LLM config
- Try different LLM models
Issue: Slow Performance
Symptoms: Queries taking too long
Solutions:
- Ensure `enable_concurrency` is True
- Reduce `max_decomposition_count`
- Lower `top_k` to reduce the initial retrieval size
- Consider a faster LLM model for scoring
Issue: Unexpected Rankings
Symptoms: Results don't match expectations
Solutions:
- Review the `score_fusion_weight` setting
- Check that `similarity_threshold` isn't too restrictive
- Examine the debugging metadata in results
- Refine reranking prompt for clarity
API Reference
Parameters
Core Settings
- `enable_decomposition` (bool, default: True)
  - Master toggle for the decomposition feature
- `max_decomposition_count` (int, default: 3)
  - Maximum number of sub-queries to generate
  - Range: 1-10
- `score_fusion_weight` (float, default: 0.7)
  - Weight of the LLM score in the final ranking
  - Formula: `final = weight * llm + (1 - weight) * vector`
  - Range: 0.0-1.0
- `enable_concurrency` (bool, default: True)
  - Whether to retrieve sub-queries in parallel
Prompts
- `decomposition_prompt` (str)
  - Template for query decomposition
  - Variables: `{original_query}`, `{max_count}`
- `reranking_prompt` (str)
  - Template for chunk relevance scoring
  - Variables: `{query}`, `{chunk_text}`
Retrieval Settings
- `top_n` (int, default: 8)
  - Number of final results to return
- `top_k` (int, default: 1024)
  - Number of initial candidates per sub-query
- `similarity_threshold` (float, default: 0.2)
  - Minimum similarity score required to include a chunk
- `keywords_similarity_weight` (float, default: 0.3)
  - Weight of keyword matching vs vector similarity
Methods
_invoke(**kwargs)
Main execution method.
Args:
- query (str): User's input query
Returns:
- Sets "formalized_content" and "json" outputs
thoughts()
Returns description of processing for debugging.
Integration Examples
In Agent Workflow
from agent.tools.query_decomposition_retrieval import QueryDecompositionRetrieval
# Create component
retrieval = QueryDecompositionRetrieval()
# Configure
retrieval.enable_decomposition = True
retrieval.score_fusion_weight = 0.7
retrieval.kb_ids = ["kb1", "kb2"]
# Use in workflow
result = retrieval.invoke(query="Complex question here")
With Custom Configuration
# High-precision research mode
research_retrieval = QueryDecompositionRetrieval()
research_retrieval.score_fusion_weight = 0.9 # Trust LLM more
research_retrieval.max_decomposition_count = 4 # More sub-queries
research_retrieval.top_n = 10 # More results
# Fast response mode
fast_retrieval = QueryDecompositionRetrieval()
fast_retrieval.max_decomposition_count = 2 # Fewer sub-queries
fast_retrieval.enable_concurrency = True # Parallel processing
fast_retrieval.top_n = 5 # Fewer results
Future Enhancements
Potential improvements for future versions:
- Adaptive Decomposition: Automatically determine optimal number of sub-queries based on query complexity
- Hierarchical Decomposition: Support multi-level query decomposition for extremely complex queries
- Cross-Language Decomposition: Generate sub-queries in multiple languages
- Caching: Cache decomposition results for similar queries
- A/B Testing: Built-in support for comparing different fusion weights
- Batch Processing: Process multiple queries in parallel
- Streaming Results: Return results as they're scored, not all at once
Support
For issues or questions:
- GitHub Issues: https://github.com/infiniflow/ragflow/issues
- Documentation: https://ragflow.io/docs
- Community: Join our Discord/Slack
Contributing
We welcome contributions! Areas where you can help:
- Improving default prompts
- Adding support for more languages
- Performance optimizations
- Additional scoring algorithms
- UI enhancements
See Contributing Guide for details.
License
Copyright 2024 The InfiniFlow Authors. All Rights Reserved.
Licensed under the Apache License, Version 2.0.