ragflow/rag
hsparks.codes 272534df64 feat: Complete implementation of hierarchical retrieval architecture
Implements the full three-tier retrieval system with RAGFlow integration.

Changes:
- Complete Tier 1: KB routing with rule-based, LLM-based, and auto modes
- Complete Tier 2: Document filtering with metadata support
- Complete Tier 3: Chunk refinement with vector search integration
- Integration with RAGFlow's Dealer and search infrastructure
- Add hierarchical_retrieval_config field to Dialog model
- Database migration for configuration storage
- 29 passing unit tests (6 skipped due to NLTK environment dependency)
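
The three tiers listed above can be pictured as a narrowing pipeline: pick knowledge bases, filter their documents by metadata, then rank the remaining chunks. The following is only a minimal in-memory sketch under stated assumptions, not the code in this commit. All names (`select_kbs`, `filter_documents`, `refine_chunks`, `retrieve`, `Document`, `Chunk`) are hypothetical, and the precomputed `score` field stands in for the vector search that the real Tier 3 delegates to RAGFlow's Dealer.

```python
from dataclasses import dataclass, field


@dataclass
class Chunk:
    text: str
    score: float  # assumption: stand-in for a vector-similarity score


@dataclass
class Document:
    kb_id: str
    metadata: dict
    chunks: list = field(default_factory=list)


def select_kbs(query, kb_keywords):
    """Tier 1 (hypothetical): keep KBs with at least one keyword in the query."""
    words = set(query.lower().split())
    return [kb for kb, keywords in kb_keywords.items() if words & set(keywords)]


def filter_documents(docs, kb_ids, required):
    """Tier 2 (hypothetical): keep documents from the selected KBs whose
    metadata matches every required key/value pair."""
    return [
        d for d in docs
        if d.kb_id in kb_ids
        and all(d.metadata.get(k) == v for k, v in required.items())
    ]


def refine_chunks(docs, top_n):
    """Tier 3 (hypothetical): return the top_n highest-scoring chunks."""
    chunks = [c for d in docs for c in d.chunks]
    return sorted(chunks, key=lambda c: c.score, reverse=True)[:top_n]


def retrieve(query, kb_keywords, docs, required, top_n=3):
    """Run the three tiers in order, each narrowing the previous result."""
    kb_ids = select_kbs(query, kb_keywords)
    return refine_chunks(filter_documents(docs, kb_ids, required), top_n)
```

Each tier only sees what the previous one let through, so a high-scoring chunk in an unselected KB or a filtered-out document never reaches the final ranking.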

Implementation Details:
- HierarchicalRetrieval: Main orchestrator with RAGFlow integration
- KBRouter: Standalone router using keyword matching
- DocumentFilter: Metadata-based filtering
- ChunkRefiner: Vector search integration via rag.nlp.search.Dealer
- Rule-based routing uses token overlap scoring
- Auto routing analyzes query characteristics
- Tier 3 integrates with existing DocStoreConnection and embedding models
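
As an illustration of what "token overlap scoring" for rule-based routing could look like, here is a small sketch. It is an assumption about the general technique, not the `KBRouter` code from this commit: the function names, the regex tokenizer, and the `top_k` and `min_score` parameters are all hypothetical.

```python
import re


def tokenize(text: str) -> set:
    """Lowercase the text and split on runs of non-alphanumeric characters."""
    return {t for t in re.split(r"[^a-z0-9]+", text.lower()) if t}


def overlap_score(query: str, kb_text: str) -> float:
    """Fraction of query tokens that also occur in the KB name/description."""
    query_tokens = tokenize(query)
    if not query_tokens:
        return 0.0
    return len(query_tokens & tokenize(kb_text)) / len(query_tokens)


def route(query: str, kbs: dict, top_k: int = 2, min_score: float = 0.0) -> list:
    """Return ids of up to top_k KBs scoring above min_score, best first.

    `kbs` maps a KB id to its descriptive text (hypothetical structure).
    Ties are broken by KB id so the ordering is deterministic.
    """
    scored = [(overlap_score(query, text), kb_id) for kb_id, text in kbs.items()]
    scored = [(s, kb_id) for s, kb_id in scored if s > min_score]
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [kb_id for _, kb_id in scored[:top_k]]
```

Normalizing by the number of query tokens keeps the score in the range 0 to 1 regardless of how long a KB description is, which makes a single `min_score` threshold usable across KBs.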

Test Results:
- 29/29 executed tests passing (6 skipped due to NLTK environment dependency)
- All tier tests working
- Integration scenarios validated
- Config and result dataclasses tested
- Edge cases handled

Addresses owner feedback: provides a complete implementation rather than a skeleton.

Related to #11610
2025-12-03 12:03:42 +01:00
app feat: improve presentation PdfParser (#11639) 2025-12-02 17:35:14 +08:00
flow Feat: support TOC transformer. (#11685) 2025-12-03 12:27:50 +08:00
llm Refa: make RAGFlow more asynchronous 2 (#11689) 2025-12-03 14:19:53 +08:00
nlp Import rag_tokenizer from Infinity (#11647) 2025-12-02 14:59:37 +08:00
prompts Refa: make RAGFlow more asynchronous 2 (#11689) 2025-12-03 14:19:53 +08:00
res Fix: prio synonym match than wordnet for english (#10762) 2025-10-27 09:32:55 +08:00
retrieval feat: Complete implementation of hierarchical retrieval architecture 2025-12-03 12:03:42 +01:00
svr feat: Auto-disable Raptor for structured data (Issue #11653) (#11676) 2025-12-03 17:02:29 +08:00
utils feat: Auto-disable Raptor for structured data (Issue #11653) (#11676) 2025-12-03 17:02:29 +08:00
__init__.py Fix: incorrect async chat streamly output (#11679) 2025-12-03 11:15:45 +08:00
benchmark.py Move api.settings to common.settings (#11036) 2025-11-06 09:36:38 +08:00
raptor.py Feat: add fault-tolerant mechanism to RAPTOR (#11206) 2025-11-13 18:48:07 +08:00
settings.py Move api.settings to common.settings (#11036) 2025-11-06 09:36:38 +08:00