hsparks.codes
d104f59e29
feat: Implement hierarchical retrieval architecture (#11610)
This PR implements the complete three-tier hierarchical retrieval architecture
as specified in issue #11610, enabling production-grade RAG capabilities.
## Tier 1: Knowledge Base Routing
- Auto-route queries to relevant knowledge bases
- Per-KB retrieval parameters (KBRetrievalParams dataclass)
- Rule-based routing with keyword overlap scoring
- LLM-based routing, falling back to rule-based routing
- Configurable routing methods: auto, rule_based, llm_based, all
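The rule-based routing with keyword overlap scoring can be illustrated with a minimal sketch. It is not the PR's code. `KnowledgeBase`, `route_rule_based`, the `KBRetrievalParams` fields, the score (fraction of query tokens that appear in a KB's name and description), and the fallback to all KBs when nothing overlaps are all assumptions made for illustration.

```python
import re
from dataclasses import dataclass, field
from typing import List, Set


@dataclass
class KBRetrievalParams:
    # Hypothetical per-KB knobs; the real dataclass's fields are not shown in this log.
    top_k: int = 8
    similarity_threshold: float = 0.2


@dataclass
class KnowledgeBase:
    kb_id: str
    name: str
    description: str
    params: KBRetrievalParams = field(default_factory=KBRetrievalParams)


def _tokens(text: str) -> Set[str]:
    # Lowercased word tokens; no stemming or stop-word removal.
    return set(re.findall(r"\w+", text.lower()))


def route_rule_based(query: str, kbs: List[KnowledgeBase], max_kbs: int = 2) -> List[KnowledgeBase]:
    """Rank KBs by the fraction of query tokens found in their name + description."""
    q = _tokens(query)
    scored = []
    for kb in kbs:
        kb_terms = _tokens(f"{kb.name} {kb.description}")
        score = len(q & kb_terms) / len(q) if q else 0.0
        scored.append((score, kb))
    scored.sort(key=lambda pair: pair[0], reverse=True)
    hits = [kb for score, kb in scored if score > 0][:max_kbs]
    # Assumption: with no overlap at all, search every KB rather than none.
    return hits or list(kbs)
```

Each selected KB then carries its own `params`, which is one way per-KB retrieval parameters could be threaded through to the search call.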
## Tier 2: Document Filtering
- Document-level metadata filtering within selected KBs
- Configurable metadata fields for filtering
- LLM-generated filter conditions
- Metadata similarity matching (fuzzy matching)
- Enhanced metadata generation for documents
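Fuzzy metadata matching could look roughly like the sketch below. It assumes Python's `difflib.SequenceMatcher` ratio as the similarity measure and a 0.8 threshold; the PR may use a different metric, and `similarity` and `fuzzy_filter` are hypothetical names.

```python
from difflib import SequenceMatcher
from typing import Dict, List


def similarity(a: str, b: str) -> float:
    # Case- and surrounding-whitespace-insensitive ratio in [0, 1].
    return SequenceMatcher(None, a.strip().lower(), b.strip().lower()).ratio()


def fuzzy_filter(doc_metas: Dict[str, Dict[str, object]], key: str, value: str,
                 threshold: float = 0.8) -> List[str]:
    """Return ids of docs whose metadata `key` is similar enough to `value`."""
    hits = []
    for doc_id, meta in doc_metas.items():
        if key in meta and similarity(str(meta[key]), value) >= threshold:
            hits.append(doc_id)
    return hits
```

For example, a filter value of `"Acme Corp"` would still match a stored `"ACME Corp."`, which an exact-match filter would miss.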
## Tier 3: Chunk Refinement
- Parent-child chunking with summary mapping
- Custom prompts for keyword extraction
- LLM-based question generation for chunks
- Integration with existing retrieval pipeline
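Parent-child chunking can be sketched as follows: small child chunks (sentences here) are matched against the query, and the larger parent chunk (a paragraph here) is returned as context. This is an illustration under assumptions, not the PR's chunker. The paragraph/sentence split and the token-overlap scorer are stand-ins, and the summary mapping is omitted.

```python
import re
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Chunk:
    chunk_id: int
    text: str
    parent_id: Optional[int] = None  # set only on child chunks


def parent_child_chunks(document: str) -> Tuple[List[Chunk], List[Chunk]]:
    parents: List[Chunk] = []
    children: List[Chunk] = []
    for para in (p.strip() for p in document.split("\n\n")):
        if not para:
            continue
        parent = Chunk(len(parents), para)
        parents.append(parent)
        # Children are sentences; each remembers which parent it came from.
        for sent in re.split(r"(?<=[.!?])\s+", para):
            if sent:
                children.append(Chunk(len(children), sent, parent.chunk_id))
    return parents, children


def retrieve_parent(query: str, parents: List[Chunk], children: List[Chunk]) -> Chunk:
    # Score children by naive token overlap, then return the best child's parent.
    q = set(query.lower().split())
    best = max(children, key=lambda c: len(q & set(c.text.lower().split())))
    return parents[best.parent_id]
```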
## Metadata Management (Batch CRUD)
- MetadataService with batch operations:
  - batch_get_metadata
  - batch_update_metadata
  - batch_delete_metadata_fields
  - batch_set_metadata_field
  - get_metadata_schema
  - search_by_metadata
  - get_metadata_statistics
  - copy_metadata
- REST API endpoints in metadata_app.py
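To make the batch semantics concrete, here is a toy in-memory stand-in. Only the method names come from the list above. The class name, the dict-based storage, every signature, and the return values (counts of documents or fields touched) are assumptions; the real MetadataService presumably persists to the document store.

```python
from typing import Any, Dict, Iterable, Set


class InMemoryMetadataService:
    """Toy stand-in: method names follow the PR, signatures are assumed."""

    def __init__(self) -> None:
        self._meta: Dict[str, Dict[str, Any]] = {}

    def batch_get_metadata(self, doc_ids: Iterable[str]) -> Dict[str, Dict[str, Any]]:
        # Return copies so callers cannot mutate stored metadata.
        return {d: dict(self._meta.get(d, {})) for d in doc_ids}

    def batch_update_metadata(self, updates: Dict[str, Dict[str, Any]]) -> int:
        # Merge fields into each document's metadata; returns docs touched.
        for doc_id, fields in updates.items():
            self._meta.setdefault(doc_id, {}).update(fields)
        return len(updates)

    def batch_set_metadata_field(self, doc_ids: Iterable[str], key: str, value: Any) -> int:
        count = 0
        for d in doc_ids:
            self._meta.setdefault(d, {})[key] = value
            count += 1
        return count

    def batch_delete_metadata_fields(self, doc_ids: Iterable[str], keys: Iterable[str]) -> int:
        # Returns how many (doc, field) pairs were actually removed.
        keys = list(keys)
        removed = 0
        for d in doc_ids:
            meta = self._meta.get(d, {})
            for k in keys:
                if k in meta:
                    del meta[k]
                    removed += 1
        return removed

    def get_metadata_schema(self) -> Dict[str, Set[str]]:
        # Field name -> set of Python type names observed across documents.
        schema: Dict[str, Set[str]] = {}
        for meta in self._meta.values():
            for k, v in meta.items():
                schema.setdefault(k, set()).add(type(v).__name__)
        return schema
```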
## Integration
- HierarchicalConfig dataclass for configuration
- Integrated into Dealer class (search.py)
- Wired into agent retrieval tool
- Non-breaking: disabled by default
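A configuration object that stays non-breaking by defaulting to disabled might look like this sketch. Only the routing method names come from this commit; the other fields, `select_kbs`, and treating `"all"` as "search every KB" are assumptions.

```python
from dataclasses import dataclass, field
from typing import Callable, List

ROUTING_METHODS = ("auto", "rule_based", "llm_based", "all")


@dataclass
class HierarchicalConfig:
    # Fields other than routing_method are illustrative assumptions.
    enabled: bool = False          # off by default, so existing retrieval is unchanged
    routing_method: str = "auto"
    metadata_fields: List[str] = field(default_factory=list)
    max_kbs: int = 3

    def __post_init__(self) -> None:
        if self.routing_method not in ROUTING_METHODS:
            raise ValueError(f"unknown routing_method: {self.routing_method!r}")


def select_kbs(kb_ids: List[str], cfg: HierarchicalConfig,
               router: Callable[[List[str]], List[str]]) -> List[str]:
    # A disabled config or "all" keeps the old behavior: search every selected KB.
    if not cfg.enabled or cfg.routing_method == "all":
        return list(kb_ids)
    return router(kb_ids)[: cfg.max_kbs]
```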
## Tests
- 48 unit tests covering all components
- Tests for config, routing, filtering, and metadata operations
2025-12-09 07:32:00 +01:00
Kevin Hu
b5ad7b7062
Feat: support TOC transformer. (#11685)
### Type of change
- [x] New Feature (non-breaking change which adds functionality)
2025-12-03 12:27:50 +08:00
Kevin Hu
820934fc77
Fix: no result if metadata returns none. (#11412)
### Type of change
- [x] Bug Fix (non-breaking change which fixes an issue)
2025-11-20 19:51:25 +08:00
Kevin Hu
06cef71ba6
Feat: add OR logic operations for metadata filters. (#11404)
### What problem does this PR solve?
#11376 #11387
### Type of change
- [x] New Feature (non-breaking change which adds functionality)
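Combining metadata filter conditions with AND or OR logic can be sketched as below. The condition shape (`key`, `op`, `value`), the operator names, and `meta_filter` itself are hypothetical; RAGFlow's actual filter schema is not shown in this log.

```python
import operator
from typing import Any, Callable, Dict, List

# Hypothetical operator names; RAGFlow's actual comparison operators may differ.
OPS: Dict[str, Callable[[Any, Any], bool]] = {
    "is": operator.eq,
    "not is": operator.ne,
    ">": operator.gt,
    "<": operator.lt,
    "contains": lambda actual, expected: str(expected) in str(actual),
}


def meta_filter(doc_metas: Dict[str, Dict[str, Any]],
                conditions: List[Dict[str, Any]], logic: str = "and") -> List[str]:
    """Return ids of documents whose metadata satisfies the conditions."""
    if logic not in ("and", "or"):
        raise ValueError(f"logic must be 'and' or 'or', got {logic!r}")
    combine = all if logic == "and" else any
    hits = []
    for doc_id, meta in doc_metas.items():
        # A condition on a missing key counts as not satisfied.
        results = (c["key"] in meta and OPS[c["op"]](meta[c["key"]], c["value"])
                   for c in conditions)
        if combine(results):
            hits.append(doc_id)
    return hits
```

With `logic="and"` a document must pass every condition; with `logic="or"` passing any one is enough.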
2025-11-20 14:31:12 +08:00
Yongteng Lei
9213568692
Feat: add a mechanism to check for cancellation in Agent (#10766)
### What problem does this PR solve?
Add a mechanism to check for cancellation in Agent.
### Type of change
- [x] New Feature (non-breaking change which adds functionality)
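Cooperative cancellation usually means checking a flag between units of work and aborting when it is set. The sketch below shows that pattern in-process with `threading.Event`; `CancellableRun`, `TaskCancelled`, and `run_components` are hypothetical names, and a multi-worker deployment would need a shared flag (for example in a cache or database) rather than a local event.

```python
import threading
from typing import Any, Callable, List


class TaskCancelled(Exception):
    pass


class CancellableRun:
    def __init__(self) -> None:
        self._cancel = threading.Event()

    def cancel(self) -> None:
        # Safe to call from another thread, e.g. an API handler.
        self._cancel.set()

    def check_cancelled(self) -> None:
        if self._cancel.is_set():
            raise TaskCancelled()


def run_components(components: List[Callable[[], Any]], run: CancellableRun) -> List[Any]:
    outputs = []
    for comp in components:
        run.check_cancelled()  # cooperative check before each component runs
        outputs.append(comp())
    return outputs
```

A running component is not interrupted; cancellation takes effect at the next check.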
2025-11-11 17:36:48 +08:00
buua436
83ff8e8009
Fix: update agent variable name rule (#11124)
### What problem does this PR solve?
Changes:
1. Update the agent variable name rule.
2. Fix reset() in Canvas not resetting the env var.
3. Correct the log input binding in the message component.
### Type of change
- [x] Bug Fix (non-breaking change which fixes an issue)
2025-11-11 11:18:30 +08:00
Jin Hai
f98b24c9bf
Move api.settings to common.settings (#11036)
### What problem does this PR solve?
As title
### Type of change
- [x] Refactoring
---------
Signed-off-by: Jin Hai <haijin.chn@gmail.com>
2025-11-06 09:36:38 +08:00
Jin Hai
1a9215bc6f
Move some vars to globals (#11017)
### What problem does this PR solve?
As title.
### Type of change
- [x] Refactoring
---------
Signed-off-by: Jin Hai <haijin.chn@gmail.com>
2025-11-05 14:14:38 +08:00
Jin Hai
bab3fce136
Move some constants to common (#11004)
### What problem does this PR solve?
As title.
### Type of change
- [x] Refactoring
---------
Signed-off-by: Jin Hai <haijin.chn@gmail.com>
2025-11-05 08:01:39 +08:00
Jin Hai
1e45137284
Move 'timeout' to common folder (#10983)
### What problem does this PR solve?
As title.
### Type of change
- [x] Refactoring
Signed-off-by: Jin Hai <haijin.chn@gmail.com>
2025-11-04 11:51:12 +08:00
buua436
ac465ba2a6
Feat: add variables to the metadata filtering function of the knowledge retrieval component (#10967)
### What problem does this PR solve?
issue:
#10861
change:
add variables to the metadata filtering function of the knowledge
retrieval component
### Type of change
- [x] New Feature (non-breaking change which adds functionality)
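Using variables in metadata filters implies substituting runtime values into condition values before filtering. A minimal sketch, assuming a hypothetical `{name}` placeholder syntax and a flat dict of variables (RAGFlow's real variable reference format may differ); unresolved placeholders are left untouched and the input conditions are not mutated.

```python
import re
from typing import Any, Dict, List

# Hypothetical placeholder syntax, e.g. "{begin@author}" or "{sys.query}".
_VAR = re.compile(r"\{([A-Za-z0-9_.@:-]+)\}")


def resolve_filter_values(conditions: List[Dict[str, Any]],
                          variables: Dict[str, Any]) -> List[Dict[str, Any]]:
    resolved = []
    for cond in conditions:
        value = cond.get("value")
        if isinstance(value, str):
            # Replace each known placeholder; keep unknown ones verbatim.
            value = _VAR.sub(lambda m: str(variables.get(m.group(1), m.group(0))), value)
        resolved.append({**cond, "value": value})
    return resolved
```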
2025-11-03 19:19:09 +08:00
buua436
866098634b
Feat: setting metadata in the retrieval (#10682)
...
### What problem does this PR solve?
issue:
[#9272](https://github.com/infiniflow/ragflow/issues/9272)
change:
setting metadata in the retrieval
### Type of change
- [x] New Feature (non-breaking change which adds functionality)
2025-10-21 09:52:26 +08:00
Kevin Hu
0d8791936e
Feat: TOC retrieval (#10456)
### What problem does this PR solve?
#10436
### Type of change
- [x] New Feature (non-breaking change which adds functionality)
2025-10-10 17:07:55 +08:00
Jin Hai
d931c33ced
Fix typos: retrievaler -> retriever (#10372)
### What problem does this PR solve?
Fix typos
### Type of change
- [x] Refactoring
---------
Signed-off-by: Jin Hai <haijin.chn@gmail.com>
2025-10-10 09:17:36 +08:00
Yongteng Lei
daea357940
Fix: invalid COMPONENT_EXEC_TIMEOUT (#10278)
### What problem does this PR solve?
Fix invalid COMPONENT_EXEC_TIMEOUT. #10273
### Type of change
- [x] Bug Fix (non-breaking change which fixes an issue)
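One defensive way to read a timeout such as COMPONENT_EXEC_TIMEOUT from the environment is sketched below. This is not the actual fix in #10278; the function name, the 600-second default, and the rule of falling back on empty, non-numeric, non-positive, or non-finite values are assumptions.

```python
import logging
import math
import os
from typing import Mapping, Optional


def read_timeout(name: str = "COMPONENT_EXEC_TIMEOUT", default: float = 600.0,
                 environ: Optional[Mapping[str, str]] = None) -> float:
    """Parse a positive, finite timeout in seconds; fall back to `default` otherwise."""
    env = os.environ if environ is None else environ
    raw = env.get(name)
    if raw is None or not raw.strip():
        return default
    try:
        value = float(raw)
    except ValueError:
        logging.warning("Invalid %s=%r; using default %s", name, raw, default)
        return default
    if not math.isfinite(value) or value <= 0:
        logging.warning("Non-positive or non-finite %s=%r; using default %s", name, raw, default)
        return default
    return value
```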
2025-09-25 14:11:09 +08:00
Jin Hai
4eb7659499
Fix bug: broken import from rag.prompts.prompts (#10217)
### What problem does this PR solve?
Fix broken imports
### Type of change
- [x] Bug Fix (non-breaking change which fixes an issue)
---------
Signed-off-by: jinhai <haijin.chn@gmail.com>
2025-09-23 10:19:25 +08:00
Wilmer
c8b79dfed4
The retrieval component needs to support returning JSON data (#10170) (#10171)
### What problem does this PR solve?
### Type of change
- [x] New Feature (non-breaking change which adds functionality)
2025-09-22 17:28:29 +08:00
湛露先生
6ff7cfe005
Fix bugs for agent/tools. (#9930)
### What problem does this PR solve?
1. Fix typos.
2. Fix a return bug in agent/tools/crawler.py.
3. Fix a component_name bug in agent/tools/deepl.py.
### Type of change
- [x] Bug Fix (non-breaking change which fixes an issue)
- [x] Refactoring
- [x] Performance Improvement
Signed-off-by: zhanluxianshen <zhanluxianshen@163.com>
2025-09-05 12:31:44 +08:00
天海蒼灆
ccb9f0b0d7
Feature (agent): Allow the Retrieval kb_ids param to accept kb_id, and allow a list of kb_name or kb_id (#9531)
### What problem does this PR solve?
Allow the Retrieval kb_ids param to accept kb_id, and allow a list of kb_name or kb_id.
- Check whether the knowledge base name is a list and, if so, support batch queries
- When no knowledge base with that name exists, try querying by ID
- If both query methods fail, throw an exception
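The name-then-ID lookup described above can be sketched with plain dicts standing in for the knowledge base service. `resolve_kb_ids`, the dict/set lookups, and the choice of `ValueError` are assumptions; the real code presumably queries the database per tenant.

```python
from typing import Dict, Iterable, List, Set, Union


def resolve_kb_ids(refs: Union[str, Iterable[str]],
                   name_to_id: Dict[str, str], known_ids: Set[str]) -> List[str]:
    # Accept a single name/ID or a list of them.
    if isinstance(refs, str):
        refs = [refs]
    resolved = []
    for ref in refs:
        if ref in name_to_id:
            resolved.append(name_to_id[ref])  # first try it as a knowledge base name
        elif ref in known_ids:
            resolved.append(ref)              # then fall back to treating it as an ID
        else:
            raise ValueError(f"Knowledge base not found by name or ID: {ref!r}")
    return resolved
```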
### Type of change
- [x] New Feature (non-breaking change which adds functionality)
2025-08-19 09:42:39 +08:00
Kevin Hu
b6e34e3aa7
Fix: PyPDF's manipulated FlateDecode streams can exhaust RAM (#9469)
### What problem does this PR solve?
#3951
#8463
### Type of change
- [x] Bug Fix (non-breaking change which fixes an issue)
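A general defense against decompression bombs in FlateDecode streams is to inflate incrementally and stop once a size cap is exceeded. The sketch below shows that technique with the standard `zlib` module; it is not necessarily how #9469 fixed the issue (which may, for instance, have changed the PyPDF version), and the 64 MiB cap and `bounded_inflate` name are assumptions.

```python
import zlib


class DecompressionLimitExceeded(ValueError):
    pass


def bounded_inflate(data: bytes, max_output: int = 64 * 1024 * 1024,
                    chunk_size: int = 64 * 1024) -> bytes:
    """Inflate zlib data, refusing to produce more than max_output bytes."""
    d = zlib.decompressobj()
    out = bytearray()
    buf = data
    while True:
        # Produce at most chunk_size bytes per call; leftover input goes to unconsumed_tail.
        part = d.decompress(buf, chunk_size)
        out += part
        if len(out) > max_output:
            raise DecompressionLimitExceeded(f"inflated size exceeds {max_output} bytes")
        buf = d.unconsumed_tail
        if d.eof or (not buf and not part):
            break
    out += d.flush()
    if len(out) > max_output:
        raise DecompressionLimitExceeded(f"inflated size exceeds {max_output} bytes")
    return bytes(out)
```

Memory use stays bounded by roughly `max_output + chunk_size`, regardless of the compression ratio of the input.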
2025-08-14 13:45:19 +08:00
Kevin Hu
5749aa30b0
Fix: model type error. (#9308)
### What problem does this PR solve?
#9240
### Type of change
- [x] Bug Fix (non-breaking change which fixes an issue)
2025-08-07 16:14:47 +08:00
Kevin Hu
3f6177b5e5
Feat: Add thought info to every component. (#9134)
### What problem does this PR solve?
#9082 #6365
### Type of change
- [x] New Feature (non-breaking change which adds functionality)
2025-07-31 15:13:45 +08:00
Kevin Hu
d9fe279dde
Feat: Redesign and refactor agent module (#9113)
### What problem does this PR solve?
#9082 #6365
<u> **WARNING: it is not compatible with older versions of the `Agent`
module, which means that `Agent` definitions from older versions no longer
work.**</u>
### Type of change
- [x] New Feature (non-breaking change which adds functionality)
2025-07-30 19:41:09 +08:00