ragflow

hsparks.codes d104f59e29 feat: Implement hierarchical retrieval architecture (#11610)	2025-12-09 07:32:00 +01:00

This PR implements the complete three-tier hierarchical retrieval architecture as specified in issue #11610, enabling production-grade RAG capabilities.

## Tier 1: Knowledge Base Routing
- Auto-route queries to relevant knowledge bases
- Per-KB retrieval parameters (KBRetrievalParams dataclass)
- Rule-based routing with keyword overlap scoring
- LLM-based routing with fallback to rule-based
- Configurable routing methods: auto, rule_based, llm_based, all

## Tier 2: Document Filtering
- Document-level metadata filtering within selected KBs
- Configurable metadata fields for filtering
- LLM-generated filter conditions
- Metadata similarity matching (fuzzy matching)
- Enhanced metadata generation for documents

## Tier 3: Chunk Refinement
- Parent-child chunking with summary mapping
- Custom prompts for keyword extraction
- LLM-based question generation for chunks
- Integration with existing retrieval pipeline

## Metadata Management (Batch CRUD)
- MetadataService with batch operations:
  - batch_get_metadata
  - batch_update_metadata
  - batch_delete_metadata_fields
  - batch_set_metadata_field
  - get_metadata_schema
  - search_by_metadata
  - get_metadata_statistics
  - copy_metadata
- REST API endpoints in metadata_app.py

## Integration
- HierarchicalConfig dataclass for configuration
- Integrated into Dealer class (search.py)
- Wired into agent retrieval tool
- Non-breaking: disabled by default

## Tests
- 48 unit tests covering all components
- Tests for config, routing, filtering, and metadata operations
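Tier 1 above combines rule-based routing by keyword overlap with LLM-based routing that falls back to the rule-based path. The following is a minimal Python sketch of that idea only. The `KnowledgeBase` type, the `route_rule_based` and `route` functions, the `llm_router` callable, the overlap-fraction score, and the "return every KB on a miss" behavior are all hypothetical illustrations, not RAGFlow's actual classes, signatures, or scoring.

```python
from __future__ import annotations

import re
from dataclasses import dataclass, field
from typing import Callable, Optional


@dataclass
class KnowledgeBase:
    """Hypothetical KB descriptor; not RAGFlow's KBRetrievalParams."""
    kb_id: str
    name: str
    description: str = ""
    keywords: list[str] = field(default_factory=list)


def _tokens(text: str) -> set[str]:
    # Lowercased word tokens; no stemming or stopword removal.
    return set(re.findall(r"\w+", text.lower()))


def route_rule_based(query: str, kbs: list[KnowledgeBase], top_k: int = 2) -> list[str]:
    """Rank KBs by the fraction of query tokens found in the KB's name,
    description, and keywords. Returns every KB when nothing overlaps."""
    query_tokens = _tokens(query)
    all_ids = [kb.kb_id for kb in kbs]
    if not query_tokens:
        return all_ids
    scored = []
    for kb in kbs:
        kb_tokens = _tokens(" ".join([kb.name, kb.description, *kb.keywords]))
        overlap = len(query_tokens & kb_tokens)
        if overlap:
            scored.append((overlap / len(query_tokens), kb.kb_id))
    if not scored:
        return all_ids
    # Highest score first; ties broken by kb_id for deterministic output.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [kb_id for _, kb_id in scored[:top_k]]


# Hypothetical LLM router: takes the query and candidate KBs, returns KB ids.
LLMRouter = Callable[[str, list[KnowledgeBase]], list[str]]


def route(
    query: str,
    kbs: list[KnowledgeBase],
    llm_router: Optional[LLMRouter] = None,
    top_k: int = 2,
) -> list[str]:
    """Try the LLM router first; fall back to rule-based routing if it is
    absent, raises, or names no known KB."""
    if llm_router is not None:
        known = {kb.kb_id for kb in kbs}
        try:
            chosen = [kb_id for kb_id in llm_router(query, kbs) if kb_id in known]
        except Exception:
            chosen = []  # treat any LLM failure as "no answer"
        if chosen:
            return chosen[:top_k]
    return route_rule_based(query, kbs, top_k)
```

For example, with an HR KB described as "vacation leave benefits" and an engineering KB described as "deployment kubernetes incidents", the query "kubernetes deployment failed" routes only to the engineering KB, while a query that shares no words with any KB falls through to all of them. Returning every KB on a miss is one conservative choice; a production router might instead apply a score threshold or report that no KB matched.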
app	Refa: migrate CV model chat to Async (#11828)	2025-12-09 13:08:37 +08:00
flow	Refa: migrate CV model chat to Async (#11828)	2025-12-09 13:08:37 +08:00
llm	Refa: migrate CV model chat to Async (#11828)	2025-12-09 13:08:37 +08:00
nlp	feat: Implement hierarchical retrieval architecture (#11610)	2025-12-09 07:32:00 +01:00
prompts	Fix:[ERROR][Exception]: list index out of range (#11826)	2025-12-09 09:58:34 +08:00
res	Remove huqie.txt from RAGFlow and bump infinity to 0.6.10 (#11661)	2025-12-04 14:53:57 +08:00
svr	Fix: parent-child chunking method (#11810)	2025-12-09 09:34:01 +08:00
utils	feat(gcs): Add support for Google Cloud Storage (GCS) integration (#11718)	2025-12-04 10:44:05 +08:00
__init__.py	Fix: incorrect async chat streamly output (#11679)	2025-12-03 11:15:45 +08:00
benchmark.py	Move api.settings to common.settings (#11036)	2025-11-06 09:36:38 +08:00
raptor.py	Feat: add fault-tolerant mechanism to RAPTOR (#11206)	2025-11-13 18:48:07 +08:00
settings.py	Move api.settings to common.settings (#11036)	2025-11-06 09:36:38 +08:00