Implements comprehensive batch metadata operations to make metadata
management easier for developers who previously could only edit metadata
one file at a time.
New API Endpoints:
1. POST /api/v1/document/batch_set_meta
- Update metadata for multiple documents at once
- Supports partial success (some docs succeed, others fail)
- Returns detailed per-document results
2. POST /api/v1/document/get_meta
- Retrieve metadata for a single document
- Returns doc ID, name, and metadata fields
3. POST /api/v1/document/batch_get_meta
- Retrieve metadata for multiple documents
- Returns metadata for all accessible documents
- Handles authorization and errors per document
4. POST /api/v1/document/list_metadata_fields
- List all unique metadata field names in a knowledge base
- Shows field types, example values, and usage count
- Helps discover existing metadata schema
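
The field-discovery idea in endpoint 4 can be sketched roughly as follows. This is only an illustration, not the code from this PR: the `meta_fields` key and the `summarize_metadata_fields` helper are assumed names, and documents are plain dicts rather than database rows.

```python
from collections import defaultdict
from typing import Any


def summarize_metadata_fields(docs: list[dict[str, Any]]) -> dict[str, dict[str, Any]]:
    """Map each field name to its observed types, one example value, and a usage count."""
    summary: dict[str, dict[str, Any]] = defaultdict(
        lambda: {"types": set(), "example": None, "count": 0}
    )
    for doc in docs:
        # "meta_fields" is a hypothetical key holding the document's metadata dict.
        for key, value in (doc.get("meta_fields") or {}).items():
            entry = summary[key]
            entry["types"].add(type(value).__name__)
            if entry["example"] is None:
                entry["example"] = value  # keep the first value seen as the example
            entry["count"] += 1
    return dict(summary)


docs = [
    {"id": "d1", "meta_fields": {"author": "Ann", "year": 2023}},
    {"id": "d2", "meta_fields": {"author": "Bob"}},
    {"id": "d3", "meta_fields": {}},
]
fields = summarize_metadata_fields(docs)
```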
Features:
- Batch operations reduce API calls and improve UX
- Proper authorization checks for each document
- Type validation (str, int, float only)
- Partial success handling (continues on errors)
- Metadata field discovery for KB-wide analysis
- Comprehensive error handling and reporting
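
A minimal sketch of how type validation and partial success could fit together. The helper names, the in-memory `store`, and the result shape are assumptions made for illustration; the actual handler and its response format may differ. Rejecting `bool` is a choice of this sketch, made because `bool` is a subclass of `int` in Python.

```python
from typing import Any, Optional

ALLOWED_TYPES = (str, int, float)


def validate_meta(meta: Any) -> Optional[str]:
    """Return an error message, or None if every value is a str, int, or float."""
    if not isinstance(meta, dict):
        return "meta must be an object"
    for key, value in meta.items():
        # Assumption of this sketch: bool is rejected even though it subclasses int.
        if isinstance(value, bool) or not isinstance(value, ALLOWED_TYPES):
            return f"unsupported type for '{key}': {type(value).__name__}"
    return None


def batch_set_meta(store: dict, accessible: set, updates: dict) -> list:
    """Apply each update independently and report a result per document."""
    results = []
    for doc_id, meta in updates.items():
        if doc_id not in accessible:
            results.append({"doc_id": doc_id, "success": False, "error": "no authorization"})
            continue  # keep going: one failure does not abort the batch
        error = validate_meta(meta)
        if error:
            results.append({"doc_id": doc_id, "success": False, "error": error})
            continue
        store.setdefault(doc_id, {}).update(meta)
        results.append({"doc_id": doc_id, "success": True})
    return results


store: dict = {}
results = batch_set_meta(
    store,
    accessible={"d1", "d2"},
    updates={
        "d1": {"author": "Ann", "year": 2023},
        "d2": {"tags": ["a"]},      # list values are not allowed
        "d3": {"author": "Bob"},    # caller has no access to d3
    },
)
```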
Test Coverage:
✅ 10/10 unit tests passing
- Request validation
- Type checking
- Response structure validation
- Authorization logic
- Partial success handling
- JSON parsing
- Field type tracking
Benefits:
- Batch update hundreds of documents in one API call
- Discover metadata schema across entire KB
- Better error handling with per-document results
- Maintains backward compatibility with existing /set_meta
Fixes #11564
### What problem does this PR solve?
Make RAGFlow more asynchronous (part 2). #11551, #11579, #11619.
### Type of change
- [x] Refactoring
- [x] Performance Improvement
### What problem does this PR solve?
Make RAGFlow more asynchronous (part 2). #11551, #11579, #11619.
### Type of change
- [x] New Feature (non-breaking change which adds functionality)
- [x] Refactoring
- [x] Performance Improvement
### What problem does this PR solve?
Try to make this more asynchronous. Verified in chat and agent
scenarios, reducing blocking behavior. #11551, #11579.
However, the impact of these changes still requires further
investigation to ensure everything works as expected.
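
One common way to reduce blocking in an async handler is to move synchronous calls onto a worker thread. The sketch below shows that general pattern with `asyncio.to_thread`; it is not taken from this PR, and `blocking_lookup` is a hypothetical stand-in for any synchronous service call.

```python
import asyncio
import time


def blocking_lookup(doc_id: str) -> str:
    """Hypothetical synchronous call, e.g. a database or storage read."""
    time.sleep(0.05)
    return f"meta-for-{doc_id}"


async def get_meta(doc_id: str) -> str:
    # Run the blocking function in a worker thread so the event loop stays responsive.
    return await asyncio.to_thread(blocking_lookup, doc_id)


async def main() -> list:
    # The three lookups overlap instead of running one after another.
    return await asyncio.gather(*(get_meta(d) for d in ["a", "b", "c"]))


results = asyncio.run(main())
```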
### Type of change
- [x] Refactoring
### What problem does this PR solve?
Feat: extract message output to file
### Type of change
- [x] New Feature (non-breaking change which adds functionality)
### What problem does this PR solve?
Fix some IDE warnings
### Type of change
- [x] Refactoring
---------
Signed-off-by: Jin Hai <haijin.chn@gmail.com>
### What problem does this PR solve?
1. Update RetCode to common.constants
2. Decouple the admin and API modules
### Type of change
- [x] Refactoring
---------
Signed-off-by: Jin Hai <haijin.chn@gmail.com>
### What problem does this PR solve?
Add get_uuid, download_img and hash_str2int into misc_utils.py
### Type of change
- [x] Refactoring
---------
Signed-off-by: Jin Hai <haijin.chn@gmail.com>
### What problem does this PR solve?
Feat: Make knowledge base renaming automatically reflected in agent
discussions, solved #10597
### Type of change
- [x] New Feature (non-breaking change which adds functionality)
### What problem does this PR solve?
Unexpected operation of document management.
### Type of change
- [x] Bug Fix (non-breaking change which fixes an issue)
### What problem does this PR solve?
Fix the wrong chunk count when re-parsing a document while keeping its
original chunks.
### Type of change
- [x] Bug Fix (non-breaking change which fixes an issue)
---------
Co-authored-by: Kevin Hu <kevinhu.sh@gmail.com>
### What problem does this PR solve?
Fix virtual files not being displayed in the KB. #9265
### Type of change
- [x] Bug Fix (non-breaking change which fixes an issue)
### What problem does this PR solve?
list_document supports range filtering.
### Type of change
- [x] New Feature (non-breaking change which adds functionality)
### What problem does this PR solve?
`doc_ids` is a list, so it should be read with `request.args.getlist("doc_ids")`.
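
To illustrate why this matters: with Werkzeug-style `request.args`, `.get("doc_ids")` returns only the first value of a repeated query parameter, while `.getlist("doc_ids")` returns all of them. The sketch below reproduces the same difference with the standard library's `parse_qs` so it runs without a web framework; the query string is made up.

```python
from urllib.parse import parse_qs

# A repeated query parameter, as a client would send it for a list of IDs.
query = "doc_ids=d1&doc_ids=d2&doc_ids=d3"
params = parse_qs(query)

# Comparable to request.args.get("doc_ids"): only the first value survives.
first_only = params["doc_ids"][0]

# Comparable to request.args.getlist("doc_ids"): every value is kept.
all_ids = params["doc_ids"]
```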
### Type of change
- [x] Bug Fix (non-breaking change which fixes an issue)
### What problem does this PR solve?
Correct cancel logic error
### Type of change
- [x] Bug Fix (non-breaking change which fixes an issue)
- [x] Refactoring
### What problem does this PR solve?
Change document status in bulk.
### Type of change
- [x] New Feature (non-breaking change which adds functionality)
### What problem does this PR solve?
Fix chunk number error after re-parsing. #8503.
### Type of change
- [x] Bug Fix (non-breaking change which fixes an issue)
### What problem does this PR solve?
- Correct boolean parsing for 'desc' parameter in document_app.py to
properly handle string values
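
A typical pitfall here is that `bool("false")` is `True` in Python, because any non-empty string is truthy. A minimal sketch of explicit parsing, assuming the parameter arrives as a string; `parse_bool`, its default, and the accepted spellings are illustrative, not the exact code in `document_app.py`.

```python
def parse_bool(value, default: bool = True) -> bool:
    """Interpret common string spellings of a boolean query parameter."""
    if value is None:
        return default
    if isinstance(value, bool):
        return value
    return str(value).strip().lower() in {"1", "true", "yes", "on"}


naive = bool("false")          # truthy, because the string is non-empty
parsed = parse_bool("false")   # explicit parsing yields the intended value
```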
### Type of change
- [x] Bug Fix (non-breaking change which fixes an issue)
### What problem does this PR solve?
- Replace hardcoded 255-byte file name length checks with
FILE_NAME_LEN_LIMIT constant
- Update error messages to show the actual limit value
- #8290
### Type of change
- [x] Refactoring
Co-authored-by: Kevin Hu <kevinhu.sh@gmail.com>
### What problem does this PR solve?
- Add validation for empty filenames in document_app.py and trim
whitespace
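
A small sketch of the trim-then-check order described above. The function name and the error message are assumptions for illustration only.

```python
def normalize_filename(raw) -> str:
    """Strip surrounding whitespace and reject names that end up empty."""
    name = (raw or "").strip()
    if not name:
        # Hypothetical message; the real endpoint returns its own error payload.
        raise ValueError("File name can't be empty.")
    return name


cleaned = normalize_filename("  report.pdf \n")
```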
### Type of change
- [x] Bug Fix (non-breaking change which fixes an issue)
### What problem does this PR solve?
- Add filename length validation (<=255 bytes) for document
upload/rename in both HTTP and SDK APIs
- Update error messages for consistency
- Fix comparison operator in SDK from '>=' to '>' for filename length
check
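
The operator fix is easiest to see at the boundary: a name of exactly 255 bytes must be accepted, so the rejection test has to use `>` rather than `>=`. The sketch below assumes the limit is measured on the UTF-8 encoding; `filename_too_long` is a hypothetical helper, and `FILE_NAME_LEN_LIMIT` mirrors the constant name used elsewhere in this log.

```python
FILE_NAME_LEN_LIMIT = 255  # bytes


def filename_too_long(name: str, limit: int = FILE_NAME_LEN_LIMIT) -> bool:
    """Return True when the UTF-8 encoded name exceeds the byte limit."""
    # Assumption: length is counted in UTF-8 bytes, not characters.
    return len(name.encode("utf-8")) > limit


exactly_at_limit = filename_too_long("a" * 255)   # allowed
one_over_limit = filename_too_long("a" * 256)     # rejected
# Multi-byte characters count by encoded size: "文" is 3 bytes in UTF-8.
multibyte_over = filename_too_long("文" * 86)
```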
### Type of change
- [x] Bug Fix (non-breaking change which fixes an issue)
### What problem does this PR solve?
Try to repair corrupted PDF files automatically on upload.
### Type of change
- [x] New Feature (non-breaking change which adds functionality)
### What problem does this PR solve?
Fix `filed_map` being incorrectly persisted. #7412
### Type of change
- [x] Bug Fix (non-breaking change which fixes an issue)
### What problem does this PR solve?
Enhance capability of `list_docs`.
Breaking change: change method from `GET` to `POST`.
### Type of change
- [x] Refactoring
- [x] Enhancement with breaking change
### What problem does this PR solve?
https://github.com/infiniflow/ragflow/issues/6905
When deleting a document, run a check before removing it from storage.
### Type of change
- [x] Bug Fix (non-breaking change which fixes an issue)
### What problem does this PR solve?
Fixes #5923
Rejects read-only variables in the payload at /datasets/<dataset_id>.
Now, if a user tries to modify read-only values, the API returns
"The input parameters are invalid.":

    invalid_keys = {"id", "embd_id", "chunk_num", "doc_num", "parser_id",
                    "create_date", "create_time", "created_by",
                    "status", "token_num", "update_date", "update_time"}
    if any(key in req for key in invalid_keys):
        return get_error_data_result(message="The input parameters are invalid.")

The read-only keys are listed in `invalid_keys`.
### Type of change
- [ ] Bug Fix (non-breaking change which fixes an issue)
---------
Co-authored-by: Raghav <2020csb1115@iitrpr.ac.in>