- Add document_service_analysis.md: comprehensive analysis of document lifecycle management including insert, remove, parse, progress tracking - Add knowledgebase_service_analysis.md: dataset management and access control analysis with permission model, parser configuration
681 lines
24 KiB
Markdown
681 lines
24 KiB
Markdown
# Knowledgebase Service Analysis - Dataset Management & Access Control
|
|
|
|
## Tổng Quan
|
|
|
|
`knowledgebase_service.py` (566 lines) quản lý **Dataset (Knowledgebase)** - đơn vị tổ chức tài liệu trong RAGFlow, bao gồm CRUD operations, access control, parser configuration, và document association tracking.
|
|
|
|
## File Location
|
|
```
|
|
/api/db/services/knowledgebase_service.py
|
|
```
|
|
|
|
## Class Definition
|
|
|
|
```python
|
|
class KnowledgebaseService(CommonService):
|
|
model = Knowledgebase # Line 49
|
|
```
|
|
|
|
Kế thừa `CommonService` với các method cơ bản: `query()`, `get_by_id()`, `save()`, `update_by_id()`, `delete_by_id()`
|
|
|
|
---
|
|
|
|
## Knowledgebase Model Structure
|
|
|
|
```python
|
|
# From db_models.py (Lines 734-753)
|
|
|
|
class Knowledgebase(DataBaseModel):
|
|
id = CharField(max_length=32, primary_key=True)
|
|
avatar = TextField(null=True) # KB avatar (base64)
|
|
tenant_id = CharField(max_length=32, index=True) # Owner tenant
|
|
name = CharField(max_length=128, index=True) # KB name
|
|
language = CharField(max_length=32) # "English"|"Chinese"
|
|
description = TextField(null=True) # KB description
|
|
embd_id = CharField(max_length=128) # Embedding model ID
|
|
permission = CharField(max_length=16) # "me"|"team"
|
|
created_by = CharField(max_length=32) # Creator user ID
|
|
|
|
# Statistics
|
|
doc_num = IntegerField(default=0) # Document count
|
|
token_num = IntegerField(default=0) # Total tokens
|
|
chunk_num = IntegerField(default=0) # Total chunks
|
|
|
|
# Search config
|
|
similarity_threshold = FloatField(default=0.2)
|
|
vector_similarity_weight = FloatField(default=0.3)
|
|
|
|
# Parser config
|
|
parser_id = CharField(default="naive") # Default parser
|
|
pipeline_id = CharField(null=True) # Pipeline workflow ID
|
|
parser_config = JSONField(default={"pages": [[1, 1000000]]})
|
|
pagerank = IntegerField(default=0)
|
|
```
|
|
|
|
---
|
|
|
|
## Permission Model
|
|
|
|
### Dual-Level Access Control
|
|
|
|
```
|
|
┌─────────────────────────────────────────────────────────────────────────────┐
|
|
│ PERMISSION MODEL │
|
|
└─────────────────────────────────────────────────────────────────────────────┘
|
|
|
|
Level 1: Knowledgebase.permission
|
|
│
|
|
├─── "me" ───► Only owner (created_by) can access
|
|
│
|
|
└─── "team" ──► All users in owner's tenant can access
|
|
|
|
Level 2: UserTenant relationship
|
|
│
|
|
└─── User must belong to KB's tenant to access
|
|
|
|
Combined Check (get_by_tenant_ids):
|
|
┌──────────────────────────────────────────────────────────────────┐
|
|
│ ((tenant_id IN joined_tenants) AND (permission == "team")) │
|
|
│ OR │
|
|
│ (tenant_id == user_id) │
|
|
└──────────────────────────────────────────────────────────────────┘
|
|
```
|
|
|
|
### Permission Methods
|
|
|
|
| Method | Lines | Purpose |
|
|
|--------|-------|---------|
|
|
| `accessible` | 471-486 | Check if user can VIEW KB |
|
|
| `accessible4deletion` | 53-83 | Check if user can DELETE KB |
|
|
| `get_by_tenant_ids` | 134-197 | Get KBs with permission filter |
|
|
| `get_kb_by_id` | 488-500 | Get KB by ID + user permission |
|
|
|
|
---
|
|
|
|
## Core Methods
|
|
|
|
### 1. Create Knowledgebase
|
|
|
|
**Lines**: 374-429
|
|
|
|
```python
|
|
@classmethod
|
|
@DB.connection_context()
|
|
def create_with_name(
|
|
cls,
|
|
*,
|
|
name: str,
|
|
tenant_id: str,
|
|
parser_id: str | None = None,
|
|
**kwargs
|
|
):
|
|
"""
|
|
Create a dataset with validation and defaults.
|
|
|
|
Validation Steps:
|
|
1. Name must be string
|
|
2. Name cannot be empty
|
|
3. Name cannot exceed DATASET_NAME_LIMIT bytes (UTF-8)
|
|
4. Deduplicate name within tenant (append _1, _2, etc.)
|
|
5. Verify tenant exists
|
|
|
|
Returns:
|
|
(True, payload_dict) on success
|
|
(False, error_result) on failure
|
|
"""
|
|
```
|
|
|
|
**Creation Flow**:
|
|
|
|
```
|
|
create_with_name(name, tenant_id, ...)
|
|
│
|
|
├──► Validate name type
|
|
│ └── Must be string
|
|
│
|
|
├──► Validate name content
|
|
│ ├── Strip whitespace
|
|
│ ├── Check not empty
|
|
│ └── Check UTF-8 byte length
|
|
│
|
|
├──► duplicate_name(query, name, tenant_id, status)
|
|
│ └── Returns unique name: "name", "name_1", "name_2"...
|
|
│
|
|
├──► TenantService.get_by_id(tenant_id)
|
|
│ └── Verify tenant exists
|
|
│
|
|
└──► Build payload dict
|
|
├── id: get_uuid()
|
|
├── name: deduplicated_name
|
|
├── tenant_id: tenant_id
|
|
├── created_by: tenant_id
|
|
├── parser_id: parser_id or "naive"
|
|
└── parser_config: get_parser_config(parser_id, config)
|
|
```
|
|
|
|
---
|
|
|
|
### 2. Get Knowledgebases by Tenant
|
|
|
|
**Lines**: 134-197
|
|
|
|
```python
|
|
@classmethod
|
|
@DB.connection_context()
|
|
def get_by_tenant_ids(cls, joined_tenant_ids, user_id,
|
|
page_number, items_per_page,
|
|
orderby, desc, keywords,
|
|
parser_id=None):
|
|
"""
|
|
Get knowledge bases accessible to user with pagination.
|
|
|
|
Permission Logic:
|
|
- Include team KBs from joined tenants
|
|
- Include private KBs owned by user
|
|
|
|
Filters:
|
|
- keywords: Case-insensitive name search
|
|
- parser_id: Filter by parser type
|
|
|
|
Joins:
|
|
- User: Get owner nickname and avatar
|
|
|
|
Returns:
|
|
(list[dict], total_count)
|
|
"""
|
|
```
|
|
|
|
**Query Structure**:
|
|
|
|
```sql
|
|
SELECT
|
|
kb.id, kb.avatar, kb.name, kb.language, kb.description,
|
|
kb.tenant_id, kb.permission, kb.doc_num, kb.token_num,
|
|
kb.chunk_num, kb.parser_id, kb.embd_id,
|
|
user.nickname, user.avatar AS tenant_avatar,
|
|
kb.update_time
|
|
FROM knowledgebase kb
|
|
JOIN user ON kb.tenant_id = user.id
|
|
WHERE
|
|
((kb.tenant_id IN (?, ?, ...) AND kb.permission = 'team')
|
|
OR kb.tenant_id = ?)
|
|
AND kb.status = '1'
|
|
AND LOWER(kb.name) LIKE '%keyword%' -- if keywords
|
|
AND kb.parser_id = ? -- if parser_id
|
|
ORDER BY kb.{orderby} DESC/ASC
|
|
LIMIT ? OFFSET ?
|
|
```
|
|
|
|
---
|
|
|
|
### 3. Get Knowledgebase Detail
|
|
|
|
**Lines**: 250-292
|
|
|
|
```python
|
|
@classmethod
|
|
@DB.connection_context()
|
|
def get_detail(cls, kb_id):
|
|
"""
|
|
Get comprehensive KB information including pipeline details.
|
|
|
|
Joins:
|
|
- UserCanvas (LEFT): Get pipeline name and avatar
|
|
|
|
Fields included:
|
|
- Basic: id, avatar, name, language, description
|
|
- Config: parser_id, parser_config, embd_id
|
|
- Stats: doc_num, token_num, chunk_num
|
|
- Pipeline: pipeline_id, pipeline_name, pipeline_avatar
|
|
- GraphRAG: graphrag_task_id, graphrag_task_finish_at
|
|
- RAPTOR: raptor_task_id, raptor_task_finish_at
|
|
- MindMap: mindmap_task_id, mindmap_task_finish_at
|
|
- Timestamps: create_time, update_time
|
|
|
|
Returns:
|
|
dict or None if not found
|
|
"""
|
|
```
|
|
|
|
---
|
|
|
|
### 4. Check Parsing Status
|
|
|
|
**Lines**: 85-117
|
|
|
|
```python
|
|
@classmethod
|
|
@DB.connection_context()
|
|
def is_parsed_done(cls, kb_id):
|
|
"""
|
|
Verify all documents in KB are ready for chat.
|
|
|
|
Validation Rules:
|
|
1. KB must exist
|
|
2. No documents in RUNNING/CANCEL/FAIL state
|
|
3. No documents UNSTART with zero chunks
|
|
|
|
Returns:
|
|
(True, None) - All parsed
|
|
(False, error_message) - Not ready
|
|
|
|
Used by:
|
|
Chat creation validation
|
|
"""
|
|
```
|
|
|
|
**Status Check Flow**:
|
|
|
|
```
|
|
is_parsed_done(kb_id)
|
|
│
|
|
├──► cls.query(id=kb_id)
|
|
│ └── Get KB info
|
|
│
|
|
├──► DocumentService.get_by_kb_id(kb_id, ...)
|
|
│ └── Get all documents (up to 1000)
|
|
│
|
|
└──► For each document:
|
|
│
|
|
├─── run == RUNNING ───► Return (False, "still being parsed")
|
|
├─── run == CANCEL ───► Return (False, "still being parsed")
|
|
├─── run == FAIL ───► Return (False, "still being parsed")
|
|
└─── run == UNSTART
|
|
└── chunk_num == 0 ──► Return (False, "has not been parsed")
|
|
|
|
└──► Return (True, None)
|
|
```
|
|
|
|
---
|
|
|
|
### 5. Parser Configuration Management
|
|
|
|
**Lines**: 294-345
|
|
|
|
```python
|
|
@classmethod
|
|
@DB.connection_context()
|
|
def update_parser_config(cls, id, config):
|
|
"""
|
|
Deep merge parser configuration.
|
|
|
|
Algorithm (dfs_update):
|
|
- For dict values: recursively merge
|
|
- For list values: union (set merge)
|
|
- For scalar values: replace
|
|
|
|
Example:
|
|
old = {"pages": [[1, 100]], "ocr": True}
|
|
new = {"pages": [[101, 200]], "language": "en"}
|
|
result = {"pages": [[1, 100], [101, 200]], "ocr": True, "language": "en"}
|
|
"""
|
|
|
|
@classmethod
|
|
@DB.connection_context()
|
|
def delete_field_map(cls, id):
|
|
"""Remove field_map key from parser_config."""
|
|
|
|
@classmethod
|
|
@DB.connection_context()
|
|
def get_field_map(cls, ids):
|
|
"""
|
|
Aggregate field mappings from multiple KBs.
|
|
|
|
Used by: TABLE parser for column mapping
|
|
"""
|
|
```
|
|
|
|
**Deep Merge Algorithm**:
|
|
|
|
```python
|
|
def dfs_update(old, new):
|
|
for k, v in new.items():
|
|
if k not in old:
|
|
old[k] = v # Add new key
|
|
elif isinstance(v, dict):
|
|
dfs_update(old[k], v) # Recursive merge
|
|
elif isinstance(v, list):
|
|
old[k] = list(set(old[k] + v)) # Union lists
|
|
else:
|
|
old[k] = v # Replace value
|
|
```
|
|
|
|
---
|
|
|
|
### 6. Document Statistics Management
|
|
|
|
**Lines**: 516-565
|
|
|
|
```python
|
|
@classmethod
|
|
@DB.connection_context()
|
|
def atomic_increase_doc_num_by_id(cls, kb_id):
|
|
"""
|
|
Atomically increment doc_num by 1.
|
|
Called when: DocumentService.insert()
|
|
|
|
SQL: UPDATE knowledgebase SET doc_num = doc_num + 1 WHERE id = ?
|
|
"""
|
|
|
|
@classmethod
|
|
@DB.connection_context()
|
|
def decrease_document_num_in_delete(cls, kb_id, doc_num_info: dict):
|
|
"""
|
|
Decrease statistics when documents are deleted.
|
|
|
|
doc_num_info = {
|
|
'doc_num': number of docs deleted,
|
|
'chunk_num': total chunks deleted,
|
|
'token_num': total tokens deleted
|
|
}
|
|
|
|
SQL:
|
|
UPDATE knowledgebase SET
|
|
doc_num = doc_num - ?,
|
|
chunk_num = chunk_num - ?,
|
|
token_num = token_num - ?,
|
|
update_time = ?
|
|
WHERE id = ?
|
|
"""
|
|
```
|
|
|
|
**Statistics Flow**:
|
|
|
|
```
|
|
┌─────────────────────────────────────────────────────────────────────────────┐
|
|
│ STATISTICS TRACKING │
|
|
└─────────────────────────────────────────────────────────────────────────────┘
|
|
|
|
[Document Insert]
|
|
│
|
|
└──► KnowledgebaseService.atomic_increase_doc_num_by_id(kb_id)
|
|
└── kb.doc_num += 1
|
|
|
|
[Chunk Processing]
|
|
│
|
|
└──► DocumentService.increment_chunk_num(doc_id, kb_id, tokens, chunks, ...)
|
|
├── doc.chunk_num += chunks
|
|
├── doc.token_num += tokens
|
|
├── kb.chunk_num += chunks
|
|
└── kb.token_num += tokens
|
|
|
|
[Document Delete]
|
|
│
|
|
└──► KnowledgebaseService.decrease_document_num_in_delete(kb_id, info)
|
|
├── kb.doc_num -= info['doc_num']
|
|
├── kb.chunk_num -= info['chunk_num']
|
|
└── kb.token_num -= info['token_num']
|
|
```
|
|
|
|
---
|
|
|
|
### 7. Access Control Methods
|
|
|
|
**Lines**: 471-514
|
|
|
|
```python
|
|
@classmethod
|
|
@DB.connection_context()
|
|
def accessible(cls, kb_id, user_id):
|
|
"""
|
|
Check if user can access (view) KB.
|
|
|
|
Logic: User must belong to KB's tenant via UserTenant table.
|
|
|
|
SQL:
|
|
SELECT kb.id
|
|
FROM knowledgebase kb
|
|
JOIN user_tenant ON user_tenant.tenant_id = kb.tenant_id
|
|
WHERE kb.id = ? AND user_tenant.user_id = ?
|
|
"""
|
|
|
|
@classmethod
|
|
@DB.connection_context()
|
|
def accessible4deletion(cls, kb_id, user_id):
|
|
"""
|
|
Check if user can delete KB.
|
|
|
|
Logic: User must be the CREATOR of the KB.
|
|
|
|
SQL:
|
|
SELECT kb.id
|
|
FROM knowledgebase kb
|
|
WHERE kb.id = ? AND kb.created_by = ?
|
|
"""
|
|
```
|
|
|
|
**Access Control Diagram**:
|
|
|
|
```
|
|
┌─────────────────────────────────────────────────────────────────────────────┐
|
|
│ ACCESS CONTROL CHECKS │
|
|
└─────────────────────────────────────────────────────────────────────────────┘
|
|
|
|
VIEW Access (accessible):
|
|
┌─────────────────────────────────────────────────────────────────┐
|
|
│ User │
|
|
│ │ │
|
|
│ ┌──────────────┴──────────────┐ │
|
|
│ ▼ ▼ │
|
|
│ UserTenant Knowledgebase │
|
|
│ (user_id) (tenant_id) │
|
|
│ │ │ │
|
|
│ └──────────┬──────────────────┘ │
|
|
│ ▼ │
|
|
│ tenant_id MATCH? │
|
|
│ │ │
|
|
│ ┌────────┴────────┐ │
|
|
│ Yes No │
|
|
│ │ │ │
|
|
│ ALLOWED DENIED │
|
|
└─────────────────────────────────────────────────────────────────┘
|
|
|
|
DELETE Access (accessible4deletion):
|
|
┌─────────────────────────────────────────────────────────────────┐
|
|
│ User │
|
|
│ (user_id) │
|
|
│ │ │
|
|
│ ▼ │
|
|
│ Knowledgebase │
|
|
│ (created_by) │
|
|
│ │ │
|
|
│ user_id == created_by? │
|
|
│ │ │
|
|
│ ┌────────────┴────────────┐ │
|
|
│ Yes No │
|
|
│ │ │ │
|
|
│ ALLOWED DENIED │
|
|
└─────────────────────────────────────────────────────────────────┘
|
|
```
|
|
|
|
---
|
|
|
|
## Service Interactions
|
|
|
|
```
|
|
┌─────────────────────────────────────────────────────────────────────────────┐
|
|
│ SERVICE INTERACTION DIAGRAM │
|
|
└─────────────────────────────────────────────────────────────────────────────┘
|
|
|
|
KnowledgebaseService
|
|
│
|
|
┌───────────────────────┼───────────────────────┐
|
|
│ │ │
|
|
▼ ▼ ▼
|
|
┌───────────────┐ ┌───────────────┐ ┌───────────────┐
|
|
│ Document │ │ Tenant │ │ User │
|
|
│ Service │ │ Service │ │ Service │
|
|
│ │ │ │ │ │
|
|
│ • get_by_kb_id│ │ • get_by_id │ │ • get profile │
|
|
│ • insert │ │ (validate │ │ info for │
|
|
│ (→ atomic │ │ tenant) │ │ joins │
|
|
│ _increase) │ │ │ │ │
|
|
└───────────────┘ └───────────────┘ └───────────────┘
|
|
|
|
API Layer Callers:
|
|
┌───────────────┐ ┌───────────────┐ ┌───────────────┐
|
|
│ kb_app.py │ │ dialog_app.py │ │ RESTful API │
|
|
│ │ │ │ │ │
|
|
│ • create │ │ • is_parsed │ │ • create_ │
|
|
│ • update │ │ _done check │ │ with_name │
|
|
│ • delete │ │ before chat │ │ │
|
|
│ • list │ │ │ │ │
|
|
└───────────────┘ └───────────────┘ └───────────────┘
|
|
```
|
|
|
|
---
|
|
|
|
## Key Method Reference Table
|
|
|
|
| Category | Method | Lines | Purpose |
|
|
|----------|--------|-------|---------|
|
|
| **Query** | `get_by_tenant_ids` | 134-197 | Paginated list with permissions |
|
|
| **Query** | `get_all_kb_by_tenant_ids` | 199-233 | Get all KBs (batch pagination) |
|
|
| **Query** | `get_kb_ids` | 235-248 | Get KB IDs for tenant |
|
|
| **Query** | `get_detail` | 250-292 | Comprehensive KB info |
|
|
| **Query** | `get_by_name` | 347-363 | Get by name + tenant |
|
|
| **Query** | `get_list` | 432-469 | Filtered paginated list |
|
|
| **Query** | `get_all_ids` | 365-371 | Get all KB IDs |
|
|
| **Create** | `create_with_name` | 374-429 | Validated creation |
|
|
| **Access** | `accessible` | 471-486 | View permission check |
|
|
| **Access** | `accessible4deletion` | 53-83 | Delete permission check |
|
|
| **Access** | `get_kb_by_id` | 488-500 | Get with user permission |
|
|
| **Access** | `get_kb_by_name` | 502-514 | Get by name with permission |
|
|
| **Config** | `update_parser_config` | 294-321 | Deep merge config |
|
|
| **Config** | `delete_field_map` | 323-331 | Remove field map |
|
|
| **Config** | `get_field_map` | 333-345 | Get field mappings |
|
|
| **Status** | `is_parsed_done` | 85-117 | Check parsing complete |
|
|
| **Stats** | `atomic_increase_doc_num_by_id` | 516-524 | Increment doc count |
|
|
| **Stats** | `decrease_document_num_in_delete` | 552-565 | Decrease on delete |
|
|
| **Stats** | `update_document_number_in_init` | 526-550 | Init doc count |
|
|
| **Docs** | `list_documents_by_ids` | 119-132 | Get doc IDs for KBs |
|
|
|
|
---
|
|
|
|
## API Endpoints Mapping
|
|
|
|
| HTTP Method | Endpoint | Service Method |
|
|
|-------------|----------|----------------|
|
|
| `POST` | `/v1/kb/create` | `create_with_name()` |
|
|
| `GET` | `/v1/kb/list` | `get_by_tenant_ids()` |
|
|
| `GET` | `/v1/kb/detail` | `get_detail()` |
|
|
| `PUT` | `/v1/kb/{kb_id}` | `update_by_id()` (inherited) |
|
|
| `DELETE` | `/v1/kb/{kb_id}` | `delete_by_id()` + cleanup |
|
|
| `PUT` | `/v1/kb/{kb_id}/config` | `update_parser_config()` |
|
|
|
|
---
|
|
|
|
## Parser Configuration Schema
|
|
|
|
```python
|
|
parser_config = {
|
|
# Page range for PDF parsing
|
|
"pages": [[1, 1000000]], # Default: all pages
|
|
|
|
# OCR settings
|
|
"ocr": True,
|
|
"ocr_model": "tesseract", # or "paddleocr"
|
|
|
|
# Layout settings
|
|
"layout_recognize": True,
|
|
|
|
# Chunking settings
|
|
"chunk_token_num": 128,
|
|
"delimiter": "\n!?。;!?",
|
|
|
|
# For TABLE parser
|
|
"field_map": {
|
|
"column_name": "mapped_field_name"
|
|
},
|
|
|
|
# For specific parsers
|
|
"raptor": {"enabled": False},
|
|
"graphrag": {"enabled": False}
|
|
}
|
|
```
|
|
|
|
---
|
|
|
|
## Error Handling
|
|
|
|
| Location | Error Type | Handling |
|
|
|----------|-----------|----------|
|
|
| `create_with_name()` | Invalid name | Return `(False, error_result)` |
|
|
| `create_with_name()` | Tenant not found | Return `(False, error_result)` |
|
|
| `update_parser_config()` | KB not found | Raise `LookupError` |
|
|
| `delete_field_map()` | KB not found | Raise `LookupError` |
|
|
| `decrease_document_num_in_delete()` | KB not found | Raise `RuntimeError` |
|
|
| `update_document_number_in_init()` | ValueError "no data to save" | Pass (ignore) |
|
|
|
|
---
|
|
|
|
## Database Patterns
|
|
|
|
### Atomic Updates
|
|
|
|
```python
|
|
# Atomic increment using SQL expression (Line 522-523)
|
|
data["doc_num"] = cls.model.doc_num + 1 # Peewee generates: doc_num + 1
|
|
cls.model.update(data).where(cls.model.id == kb_id).execute()
|
|
```
|
|
|
|
### Batch Pagination Pattern
|
|
|
|
```python
|
|
# Avoid deep pagination performance issues (Lines 224-232)
|
|
offset, limit = 0, 50
|
|
res = []
|
|
while True:
|
|
kb_batch = kbs.offset(offset).limit(limit)
|
|
_temp = list(kb_batch.dicts())
|
|
if not _temp:
|
|
break
|
|
res.extend(_temp)
|
|
offset += limit
|
|
```
|
|
|
|
### Selective Field Save
|
|
|
|
```python
|
|
# Save only dirty fields without updating timestamps (Lines 537-545)
|
|
dirty_fields = kb.dirty_fields
|
|
if cls.model._meta.combined.get("update_time") in dirty_fields:
|
|
dirty_fields.remove(cls.model._meta.combined["update_time"])
|
|
kb.save(only=dirty_fields)
|
|
```
|
|
|
|
---
|
|
|
|
## Key Constants & Imports
|
|
|
|
```python
|
|
# Permission types (from api/db/__init__.py)
|
|
class TenantPermission(Enum):
|
|
ME = "me" # Private to creator
|
|
TEAM = "team" # Shared with tenant
|
|
|
|
# Status (from common/constants.py)
|
|
class StatusEnum(Enum):
|
|
WASTED = "0" # Soft deleted
|
|
VALID = "1" # Active
|
|
|
|
# Dataset name limit (from api/constants.py)
|
|
DATASET_NAME_LIMIT = 128 # bytes (UTF-8)
|
|
|
|
# Default parser
|
|
ParserType.NAIVE = "naive"
|
|
```
|
|
|
|
---
|
|
|
|
## Performance Considerations
|
|
|
|
1. **Batch Pagination**: `get_all_kb_by_tenant_ids()` uses offset-limit pagination to avoid memory issues
|
|
|
|
2. **Selective Joins**: Queries only join necessary tables (User, UserTenant, UserCanvas)
|
|
|
|
3. **Index Usage**: All filter/sort fields are indexed (`tenant_id`, `name`, `permission`, `status`, `parser_id`)
|
|
|
|
4. **Atomic Operations**: Statistics updates use SQL expressions for atomicity without explicit transactions
|
|
|
|
5. **Lazy Loading**: Document details fetched separately from KB list queries
|