# Request Lifecycle Analysis

## Overview

Every HTTP request in RAGFlow passes through a processing pipeline with several stages: middleware, authentication, validation, and business logic.

## Request Lifecycle Flow

```
┌─────────────────────────────────────────────────────────────────────────┐
│                       COMPLETE REQUEST LIFECYCLE                        │
└─────────────────────────────────────────────────────────────────────────┘

                        [1] CLIENT REQUEST
                                 │
                                 ▼
┌───────────────────────────────────────────────────────────────────────┐
│  [2] NGINX REVERSE PROXY                                              │
│  ├── SSL termination (HTTPS → HTTP)                                   │
│  ├── Request buffering                                                │
│  ├── Rate limiting (optional)                                         │
│  └── Forward to upstream: ragflow-server:9380                        │
└────────────────────────────────┬──────────────────────────────────────┘
                                 │
                                 ▼
┌───────────────────────────────────────────────────────────────────────┐
│  [3] QUART ASGI SERVER                                                │
│  ├── Parse HTTP request                                               │
│  ├── Create request context                                           │
│  └── Route to ASGI app                                                │
└────────────────────────────────┬──────────────────────────────────────┘
                                 │
                                 ▼
┌───────────────────────────────────────────────────────────────────────┐
│  [4] CORS MIDDLEWARE                                                  │
│  ├── Check Origin header                                              │
│  ├── Add CORS headers to response                                     │
│  │     - Access-Control-Allow-Origin: *                               │
│  │     - Access-Control-Allow-Methods: GET, POST, PUT, DELETE         │
│  │     - Access-Control-Allow-Headers: *                              │
│  └── Handle OPTIONS preflight (return 200)                            │
└────────────────────────────────┬──────────────────────────────────────┘
                                 │
                                 ▼
┌───────────────────────────────────────────────────────────────────────┐
│  [5] SESSION MIDDLEWARE                                               │
│  ├── Load session from Redis (if cookie present)                      │
│  ├── Initialize g.session object                                      │
│  └── Session data available throughout request                        │
└────────────────────────────────┬──────────────────────────────────────┘
                                 │
                                 ▼
┌───────────────────────────────────────────────────────────────────────┐
│  [6] AUTHENTICATION (@app.before_request)                             │
│  ├── _load_user() called                                              │
│  ├── Parse Authorization header                                       │
│  ├── Validate JWT or API token                                        │
│  ├── Query user from database                                         │
│  └── Set g.user (or None for anonymous)                               │
└────────────────────────────────┬──────────────────────────────────────┘
                                 │
                                 ▼
┌───────────────────────────────────────────────────────────────────────┐
│  [7] BLUEPRINT ROUTER                                                 │
│  ├── Match URL pattern to blueprint                                   │
│  │     /api/v1/kb/*       → kb_app                                    │
│  │     /api/v1/document/* → document_app                              │
│  │     /v1/conversation/* → conversation_app                          │
│  ├── Extract URL parameters                                           │
│  └── Call route handler                                               │
└────────────────────────────────┬──────────────────────────────────────┘
                                 │
                                 ▼
┌───────────────────────────────────────────────────────────────────────┐
│  [8] ROUTE DECORATORS                                                 │
│  │                                                                    │
│  ├── @login_required                                                  │
│  │   └── Check if g.user is set, else return 401                      │
│  │                                                                    │
│  ├── @validate_request("param1", "param2")                            │
│  │   └── Check required params exist in request body                  │
│  │                                                                    │
│  └── Custom decorators (@rate_limit, @cache, etc.)                    │
└────────────────────────────────┬──────────────────────────────────────┘
                                 │
                                 ▼
┌───────────────────────────────────────────────────────────────────────┐
│  [9] ROUTE HANDLER (async function)                                   │
│  │                                                                    │
│  ├── Parse request body                                               │
│  │     req = await request.json                                       │
│  │     form = await request.form                                      │
│  │     files = await request.files                                    │
│  │                                                                    │
│  ├── Authorization checks                                             │
│  │     check_kb_team_permission(kb, user.id)                          │
│  │                                                                    │
│  ├── Call Service Layer                                               │
│  │     result = ServiceClass.method(params)                           │
│  │                                                                    │
│  └── Format response                                                  │
│        return get_json_result(data=result)                            │
└────────────────────────────────┬──────────────────────────────────────┘
                                 │
                                 ▼
┌───────────────────────────────────────────────────────────────────────┐
│  [10] SERVICE LAYER                                                   │
│  │                                                                    │
│  ├── Business logic execution                                         │
│  ├── Database operations (Peewee ORM)                                 │
│  ├── External service calls (LLM, storage)                            │
│  └── Return processed data                                            │
└────────────────────────────────┬──────────────────────────────────────┘
                                 │
                                 ▼
┌───────────────────────────────────────────────────────────────────────┐
│  [11] RESPONSE FORMATTING                                             │
│  │                                                                    │
│  ├── get_json_result(code, message, data)                             │
│  │     {                                                              │
│  │       "code": 0,                                                   │
│  │       "message": "success",                                        │
│  │       "data": {...}                                                │
│  │     }                                                              │
│  │                                                                    │
│  └── Custom JSON encoder for special types                            │
│        - datetime → ISO string                                        │
│        - Decimal → float                                              │
│        - Model → dict                                                 │
└────────────────────────────────┬──────────────────────────────────────┘
                                 │
                                 ▼
┌───────────────────────────────────────────────────────────────────────┐
│  [12] ERROR HANDLING (@app.errorhandler)                              │
│  │                                                                    │
│  ├── Catch unhandled exceptions                                       │
│  ├── Log error with traceback                                         │
│  ├── Map exception to HTTP status code                                │
│  │     - Unauthorized → 401                                           │
│  │     - LookupError → 404                                            │
│  │     - PermissionError → 403                                        │
│  │     - Exception → 500                                              │
│  └── Return error response                                            │
└────────────────────────────────┬──────────────────────────────────────┘
                                 │
                                 ▼
┌───────────────────────────────────────────────────────────────────────┐
│  [13] RESPONSE SENT TO CLIENT                                         │
│  ├── HTTP status code                                                 │
│  ├── Response headers                                                 │
│  └── JSON body                                                        │
└───────────────────────────────────────────────────────────────────────┘
```

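Step [8] of the flow can be illustrated with a small, framework-free sketch. This is not RAGFlow's `login_required`; it assumes a plain `SimpleNamespace` named `g` as a stand-in for the request-global object, a hypothetical handler `list_kbs`, and an assumed error code of 401 in the response envelope. It only shows the shape of the check: if the authentication hook left `g.user` unset, the handler is never called.

```python
import asyncio
from functools import wraps
from types import SimpleNamespace

# Stand-in for the framework's request-global object (assumption for this sketch).
g = SimpleNamespace(user=None)


def login_required(func):
    """Reject the call with a 401 result unless g.user is set."""
    @wraps(func)
    async def wrapper(*args, **kwargs):
        if getattr(g, "user", None) is None:
            # The "code" value here is assumed, not taken from RAGFlow's RetCode.
            return {"code": 401, "message": "Unauthorized", "data": None}, 401
        return await func(*args, **kwargs)
    return wrapper


@login_required
async def list_kbs():
    # Hypothetical handler returning the standard envelope with status 200.
    return {"code": 0, "message": "success", "data": []}, 200


def call(handler, user):
    """Simulate one request: set g.user as the auth hook would, then run the handler."""
    g.user = user
    return asyncio.run(handler())


print(call(list_kbs, None)[1])                       # anonymous request
print(call(list_kbs, SimpleNamespace(id="u1"))[1])   # authenticated request
```

Because the decorator wraps the handler, the rejection happens before any request parsing or service call, which matches the ordering of steps [8] and [9].
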
## Code Examples

### Middleware Stack

```python
# /api/apps/__init__.py
from quart import Quart
from quart_cors import cors

# CustomJSONEncoder, redis_connection, server_error_response and _load_user
# are defined elsewhere in the project.

# 1. Create Quart app
app = Quart(__name__)

# 2. Apply CORS middleware
app = cors(app, allow_origin="*")

# 3. Configure strict slashes
app.url_map.strict_slashes = False

# 4. Custom JSON encoder
app.json_encoder = CustomJSONEncoder

# 5. Session configuration
app.config["SESSION_TYPE"] = "redis"
app.config["SESSION_REDIS"] = redis_connection
app.config["MAX_CONTENT_LENGTH"] = 1024 * 1024 * 1024  # 1 GiB request body limit

# 6. Global error handler
app.errorhandler(Exception)(server_error_response)

# 7. Before request hook (authentication)
@app.before_request
def before_request():
    _load_user()
```

### Request Validation Decorator

```python
import inspect
from functools import wraps

from quart import request


def validate_request(*args, **kwargs):
    """
    Decorator to validate required request parameters.

    Usage:
        @validate_request("kb_id", "name")  # Required params
        @validate_request("status", status=["active", "inactive"])  # Enum validation
    """
    def process_args(input_arguments):
        no_arguments = []
        error_arguments = []

        # Check required args exist
        for arg in args:
            if arg not in input_arguments:
                no_arguments.append(arg)

        # Check enum values
        for k, v in kwargs.items():
            config_value = input_arguments.get(k, None)
            if config_value is None:
                no_arguments.append(k)
            elif isinstance(v, (tuple, list)):
                if config_value not in v:
                    error_arguments.append((k, set(v)))

        # Build the message only from the parts that actually apply
        parts = []
        if no_arguments:
            parts.append(f"Required arguments missing: {','.join(no_arguments)}")
        if error_arguments:
            parts.append(f"Invalid values: {error_arguments}")
        return "; ".join(parts) or None

    def wrapper(func):
        @wraps(func)
        async def decorated_function(*_args, **_kwargs):
            # Get request data: JSON body first, form fields as a fallback
            body = await request.json or (await request.form).to_dict()

            # Validate
            errs = process_args(body)
            if errs:
                return get_json_result(
                    code=RetCode.ARGUMENT_ERROR,
                    message=errs
                )

            # Call handler (sync or async)
            if inspect.iscoroutinefunction(func):
                return await func(*_args, **_kwargs)
            return func(*_args, **_kwargs)

        return decorated_function
    return wrapper
```

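To see what the validation step reports, the same checks can be exercised outside the web framework. The sketch below reimplements the checking logic as a hypothetical standalone function, `check_params`; the function name, its keyword arguments, and the sample bodies are assumptions made for illustration and are not part of RAGFlow.

```python
def check_params(body, required=(), enums=None):
    """Return an error string, or None when the body passes both checks."""
    enums = enums or {}
    # Names that must be present in the body
    missing = [name for name in required if name not in body]
    invalid = []
    # Names whose value must be one of an allowed set
    for name, allowed in enums.items():
        value = body.get(name)
        if value is None:
            if name not in missing:
                missing.append(name)
        elif value not in allowed:
            invalid.append((name, sorted(allowed)))

    parts = []
    if missing:
        parts.append(f"Required arguments missing: {','.join(missing)}")
    if invalid:
        parts.append(f"Invalid values: {invalid}")
    return "; ".join(parts) or None


print(check_params({"name": "docs"}, required=("kb_id", "name")))
print(check_params({"status": "archived"}, enums={"status": ["active", "inactive"]}))
print(check_params({"kb_id": "1", "name": "docs"}, required=("kb_id", "name")))
```

The first call reports the missing `kb_id`, the second reports the out-of-range `status`, and the third returns `None`, which is the case in which the decorator would go on to call the route handler.
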
### Response Formatting

```python
import base64
import json
from datetime import datetime
from decimal import Decimal
from typing import Any

from quart import Response, jsonify


def get_json_result(
    code: RetCode = RetCode.SUCCESS,
    message: str = "success",
    data: Any = None
) -> Response:
    """
    Standard JSON response formatter.

    Args:
        code: Return code (0 = success)
        message: Human-readable message
        data: Response payload

    Returns:
        Quart Response with JSON body
    """
    response = {
        "code": code,
        "message": message,
        "data": data
    }
    return jsonify(response)


class CustomJSONEncoder(json.JSONEncoder):
    """Custom JSON encoder for special types."""

    def default(self, obj):
        if isinstance(obj, datetime):
            return obj.isoformat()
        if isinstance(obj, Decimal):
            return float(obj)
        if hasattr(obj, 'to_dict'):
            return obj.to_dict()
        if isinstance(obj, bytes):
            return base64.b64encode(obj).decode()
        return super().default(obj)
```

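The effect of the encoder rules is easiest to see on a concrete payload. The following sketch uses only the standard library; `SketchEncoder` is a compact copy of the rules above so that the example runs on its own, and `FakeModel` is a hypothetical stand-in for an ORM model that exposes `to_dict()`.

```python
import base64
import json
from datetime import datetime
from decimal import Decimal


class SketchEncoder(json.JSONEncoder):
    """Compact copy of the encoder rules, so this example is self-contained."""

    def default(self, obj):
        if isinstance(obj, datetime):
            return obj.isoformat()
        if isinstance(obj, Decimal):
            return float(obj)
        if hasattr(obj, "to_dict"):
            return obj.to_dict()
        if isinstance(obj, bytes):
            return base64.b64encode(obj).decode()
        return super().default(obj)


class FakeModel:
    """Hypothetical stand-in for an ORM model exposing to_dict()."""

    def to_dict(self):
        return {"id": "kb1"}


payload = {
    "created": datetime(2024, 1, 2, 3, 4, 5),
    "score": Decimal("0.75"),
    "kb": FakeModel(),
    "blob": b"hi",
}
encoded = json.dumps(payload, cls=SketchEncoder, sort_keys=True)
print(encoded)
# {"blob": "aGk=", "created": "2024-01-02T03:04:05", "kb": {"id": "kb1"}, "score": 0.75}
```

Note that converting `Decimal` to `float` can lose precision for values that have no exact binary representation, which may matter if the payload carries monetary amounts.
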
### Error Handler

```python
import logging


def server_error_response(e):
    """
    Global exception handler for unhandled errors.
    """
    logging.error(
        "Unhandled exception",
        exc_info=(type(e), e, e.__traceback__)
    )

    msg = repr(e).lower()

    # Map exception types to HTTP codes
    if getattr(e, "code", None) == 401 or "unauthorized" in msg:
        return get_json_result(
            code=RetCode.UNAUTHORIZED,
            message=repr(e)
        ), 401

    if isinstance(e, LookupError) or "not found" in msg:
        return get_json_result(
            code=RetCode.DATA_ERROR,
            message=repr(e)
        ), 404

    if isinstance(e, PermissionError) or "permission" in msg:
        return get_json_result(
            code=RetCode.FORBIDDEN,
            message=repr(e)
        ), 403

    # Document store specific errors
    # (no explicit status here, so the response keeps the default 200)
    if "index_not_found_exception" in repr(e):
        return get_json_result(
            code=RetCode.EXCEPTION_ERROR,
            message="No chunk found. Please upload and parse files first."
        )

    # Generic server error
    return get_json_result(
        code=RetCode.EXCEPTION_ERROR,
        message=repr(e)
    ), 500
```

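The order of the checks and the substring matching on `repr(e)` both affect which status a given exception produces. The sketch below mirrors only the branching of the handler above in a hypothetical helper, `classify`, that returns the HTTP status instead of a response, so the mapping can be tried on sample exceptions without a running server.

```python
def classify(e):
    """Return the HTTP status the handler's branching would choose for e."""
    text = repr(e)
    msg = text.lower()
    if getattr(e, "code", None) == 401 or "unauthorized" in msg:
        return 401
    if isinstance(e, LookupError) or "not found" in msg:
        return 404
    if isinstance(e, PermissionError) or "permission" in msg:
        return 403
    if "index_not_found_exception" in text:
        return 200  # that branch sets no status, so the default applies
    return 500


samples = [
    KeyError("kb_id"),                        # LookupError subclass
    PermissionError("denied"),
    ValueError("boom"),
    RuntimeError("File not found"),           # matched by message text only
    ValueError("user lacks permission"),      # matched by message text only
    Exception("index_not_found_exception"),
]
for exc in samples:
    print(type(exc).__name__, classify(exc))
```

The fourth and fifth samples show that an exception's message alone can select the 404 or 403 branch, regardless of its type.
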
## Typical Request Example

### Request

```http
POST /api/v1/kb/create HTTP/1.1
Host: localhost:9380
Authorization: eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9...
Content-Type: application/json
```

### Processing Timeline

```
T+0ms   : Request received by Nginx
T+1ms   : Forwarded to Quart server
T+2ms   : CORS headers added
T+3ms   : Session loaded from Redis
T+5ms   : _load_user() validates JWT
T+10ms  : User queried from MySQL
T+12ms  : Blueprint router matches /api/v1/kb/create
T+13ms  : @login_required passes (user exists)
T+14ms  : @validate_request("name") validates params
T+15ms  : Handler async function called
T+20ms  : KnowledgebaseService.create() called
T+25ms  : KB inserted into MySQL
T+28ms  : ES index created
T+30ms  : Response formatted
T+31ms  : JSON response sent
```

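The timeline can be turned into per-step gaps with a few lines of arithmetic. The sketch below copies the timestamps listed above and attributes each gap to the step that completes at the later timestamp; that attribution is an assumption of this example, and the figures are the illustrative values from the timeline rather than measurements.

```python
# (milliseconds since the request arrived, step) copied from the timeline
timeline = [
    (0, "Request received by Nginx"),
    (1, "Forwarded to Quart server"),
    (2, "CORS headers added"),
    (3, "Session loaded from Redis"),
    (5, "_load_user() validates JWT"),
    (10, "User queried from MySQL"),
    (12, "Blueprint router matches /api/v1/kb/create"),
    (13, "@login_required passes (user exists)"),
    (14, '@validate_request("name") validates params'),
    (15, "Handler async function called"),
    (20, "KnowledgebaseService.create() called"),
    (25, "KB inserted into MySQL"),
    (28, "ES index created"),
    (30, "Response formatted"),
    (31, "JSON response sent"),
]

# Gap between consecutive entries, labelled with the later step
gaps = [(t1 - t0, step) for (t0, _), (t1, step) in zip(timeline, timeline[1:])]
largest = max(gap for gap, _ in gaps)
slowest = [step for gap, step in gaps if gap == largest]

total = timeline[-1][0]
handler_start = next(t for t, step in timeline if step.startswith("Handler"))

print(f"total: {total} ms, before handler: {handler_start} ms")
print(f"largest gap: {largest} ms ->", slowest)
```

With these numbers, 15 of the 31 ms elapse before the handler runs, and the three 5 ms gaps all end at database or service-layer steps.
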
### Response

```http
HTTP/1.1 200 OK
Content-Type: application/json
Access-Control-Allow-Origin: *
```

## Performance Considerations

### Connection Pooling

```python
import redis
from playhouse.pool import PooledMySQLDatabase

# Database connection pool
db = PooledMySQLDatabase(
    database,
    max_connections=32,
    stale_timeout=300,  # seconds
    **connection_params
)

# Redis connection pool
redis_pool = redis.ConnectionPool(
    host=redis_host,
    port=redis_port,
    max_connections=100
)
```

### Async I/O

```python
import asyncio

# All route handlers are async
@manager.route("/endpoint", methods=["POST"])
async def handler():
    # Async request parsing
    req = await request.json

    # Async file handling
    files = await request.files

    # Run blocking I/O in a worker thread so the event loop stays free
    result = await asyncio.to_thread(blocking_operation)

    return get_json_result(data=result)
```

### Response Streaming
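
```python
from quart import Response

# For large responses, use streaming
# (stream_handler is an illustrative name; the body belongs inside a route handler)
async def stream_handler():
    def stream():
        for chunk in generate_chunks():
            yield chunk

    resp = Response(stream(), mimetype="text/event-stream")
    # Ask Nginx not to buffer the response so chunks reach the client immediately
    resp.headers.add_header("X-Accel-Buffering", "no")
    return resp
```
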
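A `text/event-stream` response is expected to carry Server-Sent Events frames, where each message is a `data:` line terminated by a blank line. The sketch below shows one way `generate_chunks` could produce such frames; the payload shape, the `tokens` argument, and the final `data: true` marker are assumptions for illustration and are not taken from RAGFlow's code.

```python
import json


def sse_event(payload):
    """Serialize one payload as a single Server-Sent Events message."""
    # json.dumps emits no raw newlines by default, so one data: line is enough.
    return f"data: {json.dumps(payload, ensure_ascii=False)}\n\n"


def generate_chunks(tokens):
    """Hypothetical generator: emit the growing answer, then an end marker."""
    answer = ""
    for token in tokens:
        answer += token
        yield sse_event({"code": 0, "data": {"answer": answer}})
    # Assumed convention for signalling the end of the stream
    yield sse_event({"code": 0, "data": True})


frames = list(generate_chunks(["Hel", "lo"]))
for frame in frames:
    print(repr(frame))
```

Each frame is complete on its own, so a client can parse messages as they arrive instead of waiting for the whole response.
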
## Logging

```python
import logging

# Request logging
@app.before_request
def log_request():
    logging.info(f"{request.method} {request.path}")

# Response logging
@app.after_request
def log_response(response):
    logging.info(f"Response: {response.status_code}")
    return response
```

## Related Files

- `/api/apps/__init__.py` - App initialization
- `/api/ragflow_server.py` - Server entry point
- `/api/utils/api_utils.py` - API utilities
- `/api/validation.py` - Request validation