ragflow/api/db/services
hsparks.codes d104f59e29 feat: Implement hierarchical retrieval architecture (#11610)
This PR implements the three-tier hierarchical retrieval architecture
specified in issue #11610: knowledge base routing, document filtering,
and chunk refinement.

## Tier 1: Knowledge Base Routing
- Auto-route queries to relevant knowledge bases
- Per-KB retrieval parameters (KBRetrievalParams dataclass)
- Rule-based routing with keyword overlap scoring
- LLM-based routing with fallback to rule-based
- Configurable routing methods: auto, rule_based, llm_based, all
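
A minimal sketch of how rule-based routing with an LLM fallback could look. Only the name `KBRetrievalParams` and the four routing method names come from this PR description. The fields, the `route` and `rule_based_route` helpers, the `llm_router` callable, and the meaning given to `auto` are assumptions made for illustration, not the PR's actual code.

```python
import re
from dataclasses import dataclass
from typing import Callable, Optional, Sequence

ROUTING_METHODS = ("auto", "rule_based", "llm_based", "all")


@dataclass
class KBRetrievalParams:
    # Field names are hypothetical; the PR only names the dataclass.
    kb_id: str
    description: str
    top_k: int = 5
    similarity_threshold: float = 0.2


def tokenize(text: str) -> set[str]:
    """Lowercase alphanumeric tokens, deduplicated."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def rule_based_route(query: str, kbs: Sequence[KBRetrievalParams],
                     min_score: float = 0.1) -> list[KBRetrievalParams]:
    """Score each KB by the fraction of query tokens found in its description."""
    query_tokens = tokenize(query)
    if not query_tokens:
        return []
    scored = []
    for kb in kbs:
        overlap = len(query_tokens & tokenize(kb.description)) / len(query_tokens)
        if overlap >= min_score:
            scored.append((overlap, kb))
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [kb for _, kb in scored]


def route(query: str, kbs: Sequence[KBRetrievalParams], method: str = "auto",
          llm_router: Optional[Callable[[str, Sequence[KBRetrievalParams]], list[str]]] = None,
          ) -> list[KBRetrievalParams]:
    """Pick KBs for a query; fall back to rule-based routing if the LLM path fails."""
    if method not in ROUTING_METHODS:
        raise ValueError(f"unknown routing method: {method!r}")
    if method == "all":
        return list(kbs)
    if method in ("auto", "llm_based") and llm_router is not None:
        try:
            chosen_ids = set(llm_router(query, kbs))
            chosen = [kb for kb in kbs if kb.kb_id in chosen_ids]
            if chosen:
                return chosen
        except Exception:
            pass  # LLM unavailable or returned garbage: use the rules below
    # If no KB clears the threshold, search everything rather than nothing.
    return rule_based_route(query, kbs) or list(kbs)
```

Falling back to all KBs when nothing matches keeps this sketch from returning an empty result set; whether the PR makes the same choice is not stated here.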

## Tier 2: Document Filtering
- Document-level metadata filtering within selected KBs
- Configurable metadata fields for filtering
- LLM-generated filter conditions
- Metadata similarity matching (fuzzy matching)
- Enhanced metadata generation for documents
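
As an illustration of document-level filtering with fuzzy metadata matching, the sketch below uses the standard library's `difflib.SequenceMatcher`. The document shape (`id` plus a `metadata` dict), the `filter_documents` and `fuzzy_match` names, and the 0.8 threshold are assumptions; the PR description does not say which similarity measure it uses.

```python
from difflib import SequenceMatcher


def fuzzy_match(value: object, wanted: object, threshold: float = 0.8) -> bool:
    """Case-insensitive similarity check between two metadata values."""
    ratio = SequenceMatcher(None, str(value).lower(), str(wanted).lower()).ratio()
    return ratio >= threshold


def filter_documents(docs: list[dict], conditions: dict[str, object],
                     threshold: float = 0.8) -> list[str]:
    """Return ids of documents whose metadata satisfies every condition.

    `conditions` maps a metadata field to the wanted value. In the PR these
    conditions may be generated by an LLM; here they are passed in directly.
    """
    kept = []
    for doc in docs:
        meta = doc.get("metadata", {})
        if all(
            field in meta and fuzzy_match(meta[field], wanted, threshold)
            for field, wanted in conditions.items()
        ):
            kept.append(doc["id"])
    return kept


if __name__ == "__main__":
    docs = [
        {"id": "d1", "metadata": {"department": "Finance", "year": "2024"}},
        {"id": "d2", "metadata": {"department": "Engineering", "year": "2024"}},
        {"id": "d3", "metadata": {}},
    ]
    # A misspelled condition still selects the finance document.
    print(filter_documents(docs, {"department": "finanse"}))
```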

## Tier 3: Chunk Refinement
- Parent-child chunking with summary mapping
- Custom prompts for keyword extraction
- LLM-based question generation for chunks
- Integration with existing retrieval pipeline
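
One common way to realize parent-child chunking is to index small child chunks for matching and return the larger parent (or its summary) as context. The sketch below assumes fixed-size character splitting and a caller-supplied `summarize` function; `ParentChunk`, `build_parent_child`, and `expand_to_parents` are hypothetical names, not identifiers from the PR.

```python
from dataclasses import dataclass, field
from typing import Callable, Optional


@dataclass
class ParentChunk:
    chunk_id: str
    text: str
    summary: str
    child_ids: list[str] = field(default_factory=list)


def build_parent_child(doc_id: str, text: str, parent_size: int = 400,
                       child_size: int = 100,
                       summarize: Optional[Callable[[str], str]] = None):
    """Split text into parents, split each parent into children, and map both ways."""
    # Placeholder summary (first 80 characters); a real system might call an LLM.
    summarize = summarize or (lambda t: t[:80])
    parents: dict[str, ParentChunk] = {}
    children: dict[str, str] = {}
    child_to_parent: dict[str, str] = {}
    for p_idx, p_start in enumerate(range(0, len(text), parent_size)):
        p_text = text[p_start:p_start + parent_size]
        p_id = f"{doc_id}-p{p_idx}"
        parent = ParentChunk(p_id, p_text, summarize(p_text))
        for c_idx, c_start in enumerate(range(0, len(p_text), child_size)):
            c_id = f"{p_id}-c{c_idx}"
            children[c_id] = p_text[c_start:c_start + child_size]
            child_to_parent[c_id] = p_id
            parent.child_ids.append(c_id)
        parents[p_id] = parent
    return parents, children, child_to_parent


def expand_to_parents(hit_child_ids: list[str],
                      child_to_parent: dict[str, str]) -> list[str]:
    """Map matched child chunks to their parents, deduplicated, in hit order."""
    parent_ids: list[str] = []
    for child_id in hit_child_ids:
        parent_id = child_to_parent[child_id]
        if parent_id not in parent_ids:
            parent_ids.append(parent_id)
    return parent_ids
```

Matching on children keeps similarity scores focused on short passages, while returning parents gives the generator more surrounding context.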

## Metadata Management (Batch CRUD)
- MetadataService with batch operations:
  - batch_get_metadata
  - batch_update_metadata
  - batch_delete_metadata_fields
  - batch_set_metadata_field
  - get_metadata_schema
  - search_by_metadata
  - get_metadata_statistics
  - copy_metadata
- REST API endpoints in metadata_app.py
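
To show what batch metadata operations of this kind might do, here is an in-memory sketch. The method names mirror four of the operations listed above, but the class name `InMemoryMetadataStore`, every signature, and every return value are assumptions; the real `MetadataService` works against RAGFlow's database and may differ substantially.

```python
from typing import Any


class InMemoryMetadataStore:
    """Toy stand-in for a metadata service; keeps metadata in a dict."""

    def __init__(self) -> None:
        self._meta: dict[str, dict[str, Any]] = {}

    def batch_get_metadata(self, doc_ids: list[str]) -> dict[str, dict[str, Any]]:
        """Return a copy of each document's metadata ({} if none is stored)."""
        return {doc_id: dict(self._meta.get(doc_id, {})) for doc_id in doc_ids}

    def batch_update_metadata(self, updates: dict[str, dict[str, Any]]) -> int:
        """Merge new fields into each document's metadata; return documents touched."""
        for doc_id, fields in updates.items():
            self._meta.setdefault(doc_id, {}).update(fields)
        return len(updates)

    def batch_delete_metadata_fields(self, doc_ids: list[str],
                                     fields: list[str]) -> int:
        """Remove the given fields from each document; return fields removed."""
        removed = 0
        for doc_id in doc_ids:
            meta = self._meta.get(doc_id, {})
            for name in fields:
                if name in meta:
                    del meta[name]
                    removed += 1
        return removed

    def get_metadata_schema(self) -> dict[str, list[str]]:
        """Map each field name to the sorted type names observed for it."""
        schema: dict[str, set[str]] = {}
        for meta in self._meta.values():
            for name, value in meta.items():
                schema.setdefault(name, set()).add(type(value).__name__)
        return {name: sorted(types) for name, types in schema.items()}


if __name__ == "__main__":
    store = InMemoryMetadataStore()
    store.batch_update_metadata({"d1": {"year": 2024}, "d2": {"year": "2023"}})
    print(store.get_metadata_schema())
```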

## Integration
- HierarchicalConfig dataclass for configuration
- Integrated into Dealer class (search.py)
- Wired into agent retrieval tool
- Non-breaking: disabled by default
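
The "disabled by default" behavior can be sketched as a config object that gates the new code path. Only the name `HierarchicalConfig` and the routing method names come from this description; the fields, the validation, and the `retrieve` wrapper with its two callables are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Callable, Optional

ROUTING_METHODS = ("auto", "rule_based", "llm_based", "all")


@dataclass
class HierarchicalConfig:
    # Field names are illustrative assumptions.
    enabled: bool = False  # off unless the caller opts in
    routing_method: str = "auto"
    metadata_fields: list[str] = field(default_factory=list)

    def __post_init__(self) -> None:
        if self.routing_method not in ROUTING_METHODS:
            raise ValueError(f"unknown routing method: {self.routing_method!r}")


def retrieve(query: str,
             flat_search: Callable[[str], list],
             hierarchical_search: Callable[[str, HierarchicalConfig], list],
             config: Optional[HierarchicalConfig] = None) -> list:
    """Use the existing flat search unless hierarchical retrieval is enabled."""
    config = config or HierarchicalConfig()
    if not config.enabled:
        return flat_search(query)
    return hierarchical_search(query, config)
```

With no config passed, callers get the existing flat search unchanged, which is what makes the feature non-breaking in this sketch.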

## Tests
- 48 unit tests covering all components
- Tests for config, routing, filtering, and metadata operations
2025-12-09 07:32:00 +01:00
__init__.py Refactor: fix typos (#10200) 2025-09-25 12:05:43 +08:00
api_service.py Add time utils (#10849) 2025-10-28 19:09:14 +08:00
canvas_service.py Feat: add or logic operations for meta data filters. (#11404) 2025-11-20 14:31:12 +08:00
common_service.py Fix: add auto_parse to kb detail. (#11153) 2025-11-11 12:22:43 +08:00
connector_service.py feat: improve metadata handling in connector service (#11421) 2025-11-26 19:55:48 +08:00
conversation_service.py Refa: cleanup synchronous functions in chat_model and implement synchronization for conversation and dialog chats (#11779) 2025-12-08 09:43:03 +08:00
dialog_service.py Refa: cleanup synchronous functions in chat_model and implement synchronization for conversation and dialog chats (#11779) 2025-12-08 09:43:03 +08:00
document_service.py Refa: make RAGFlow more asynchronous 2 (#11689) 2025-12-03 14:19:53 +08:00
evaluation_service.py Refa: cleanup synchronous functions in chat_model and implement synchronization for conversation and dialog chats (#11779) 2025-12-08 09:43:03 +08:00
file2document_service.py Move some constants to common (#11004) 2025-11-05 08:01:39 +08:00
file_service.py Refa: make RAGFlow more asynchronous (#11601) 2025-12-01 14:24:06 +08:00
knowledgebase_service.py Feat: Alter flask to Quart for async API serving. (#11275) 2025-11-18 17:05:16 +08:00
langfuse_service.py Add time utils (#10849) 2025-10-28 19:09:14 +08:00
llm_service.py Refa: cleanup synchronous functions in chat_model and implement synchronization for conversation and dialog chats (#11779) 2025-12-08 09:43:03 +08:00
mcp_server_service.py Fix typos: retrievaler -> retriever (#10372) 2025-10-10 09:17:36 +08:00
metadata_service.py feat: Implement hierarchical retrieval architecture (#11610) 2025-12-09 07:32:00 +01:00
pipeline_operation_log_service.py Feat: add data source to pipeline logs. (#11075) 2025-11-07 11:43:59 +08:00
search_service.py Move some constants to common (#11004) 2025-11-05 08:01:39 +08:00
task_service.py Move api.settings to common.settings (#11036) 2025-11-06 09:36:38 +08:00
tenant_llm_service.py Move api.settings to common.settings (#11036) 2025-11-06 09:36:38 +08:00
user_canvas_version.py Fix typos: retrievaler -> retriever (#10372) 2025-10-10 09:17:36 +08:00
user_service.py Move api.settings to common.settings (#11036) 2025-11-06 09:36:38 +08:00