ragflow

hsparks.codes d104f59e29 feat: Implement hierarchical retrieval architecture (#11610)

This PR implements the complete three-tier hierarchical retrieval architecture as specified in issue #11610, enabling production-grade RAG capabilities.

## Tier 1: Knowledge Base Routing
- Auto-route queries to relevant knowledge bases
- Per-KB retrieval parameters (KBRetrievalParams dataclass)
- Rule-based routing with keyword overlap scoring
- LLM-based routing with fallback to rule-based
- Configurable routing methods: auto, rule_based, llm_based, all

## Tier 2: Document Filtering
- Document-level metadata filtering within selected KBs
- Configurable metadata fields for filtering
- LLM-generated filter conditions
- Metadata similarity matching (fuzzy matching)
- Enhanced metadata generation for documents

## Tier 3: Chunk Refinement
- Parent-child chunking with summary mapping
- Custom prompts for keyword extraction
- LLM-based question generation for chunks
- Integration with existing retrieval pipeline

## Metadata Management (Batch CRUD)
- MetadataService with batch operations:
  - batch_get_metadata
  - batch_update_metadata
  - batch_delete_metadata_fields
  - batch_set_metadata_field
  - get_metadata_schema
  - search_by_metadata
  - get_metadata_statistics
  - copy_metadata
- REST API endpoints in metadata_app.py

## Integration
- HierarchicalConfig dataclass for configuration
- Integrated into Dealer class (search.py)
- Wired into agent retrieval tool
- Non-breaking: disabled by default

## Tests
- 48 unit tests covering all components
- Tests for config, routing, filtering, and metadata operations

2025-12-09 07:32:00 +01:00
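
To illustrate the Tier 1 idea of rule-based routing with keyword overlap scoring, and the LLM-to-rule-based fallback, here is a minimal sketch. It is not the code from this commit: `KnowledgeBase`, `overlap_score`, `route_rule_based`, `route`, the `threshold` parameter, and the `llm_router` callable are all hypothetical names chosen for illustration, and the scoring formula (fraction of query tokens found in the KB description) is an assumption.

```python
import re
from dataclasses import dataclass


@dataclass
class KnowledgeBase:
    """Hypothetical stand-in for a knowledge base record (not RAGFlow's model)."""
    kb_id: str
    description: str


def tokenize(text):
    """Lowercase the text and return its set of alphanumeric tokens."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def overlap_score(query, kb):
    """Fraction of query tokens that also occur in the KB description (assumed formula)."""
    query_tokens = tokenize(query)
    if not query_tokens:
        return 0.0
    return len(query_tokens & tokenize(kb.description)) / len(query_tokens)


def route_rule_based(query, kbs, threshold=0.2):
    """Return IDs of KBs scoring at or above the threshold, best first.

    If no KB qualifies, fall back to searching every KB.
    """
    scored = sorted(((overlap_score(query, kb), kb.kb_id) for kb in kbs), reverse=True)
    selected = [kb_id for score, kb_id in scored if score >= threshold]
    return selected or [kb.kb_id for kb in kbs]


def route(query, kbs, llm_router=None):
    """Try the (hypothetical) LLM router first; use rule-based routing if it fails or returns nothing."""
    if llm_router is not None:
        try:
            picked = llm_router(query, kbs)
            if picked:
                return picked
        except Exception:
            pass  # any LLM failure falls through to rule-based routing
    return route_rule_based(query, kbs)
```

Returning every KB when nothing clears the threshold keeps routing from silently dropping results; a real deployment might prefer a different default.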
__init__.py	Import rag_tokenizer from Infinity (#11647)	2025-12-02 14:59:37 +08:00
query.py	feat: add OceanBase doc engine (#11228)	2025-11-20 10:00:14 +08:00
rag_tokenizer.py	Import rag_tokenizer from Infinity (#11647)	2025-12-02 14:59:37 +08:00
search.py	feat: Implement hierarchical retrieval architecture (#11610)	2025-12-09 07:32:00 +01:00
surname.py	Update info (#1005)	2024-05-31 09:53:04 +08:00
synonym.py	Move 'get_project_base_directory' to common directory (#10940)	2025-11-02 21:05:28 +08:00
term_weight.py	Refactor function name (#11210)	2025-11-12 19:00:15 +08:00