ragflow/personal_analyze/01-API-LAYER/request_lifecycle.md
Claude a6ee18476d
docs: Add detailed backend module analysis documentation
Add comprehensive documentation covering 6 modules:
- 01-API-LAYER: Authentication, routing, SSE streaming
- 02-SERVICE-LAYER: Dialog, Task, LLM service analysis
- 03-RAG-ENGINE: Hybrid search, embedding, reranking
- 04-AGENT-SYSTEM: Canvas engine, components, tools
- 05-DOCUMENT-PROCESSING: Task executor, PDF parsing
- 06-ALGORITHMS: BM25, fusion, RAPTOR

Total 28 documentation files with code analysis, diagrams, and formulas.
2025-11-26 11:10:54 +00:00


# Request Lifecycle Analysis
## Overview
Every HTTP request in RAGFlow passes through a processing pipeline with several stages: middleware, authentication, validation, and business logic.
## Request Lifecycle Flow
```
┌─────────────────────────────────────────────────────────────────────────┐
│ COMPLETE REQUEST LIFECYCLE │
└─────────────────────────────────────────────────────────────────────────┘
[1] CLIENT REQUEST
┌───────────────────────────────────────────────────────────────────────┐
│ [2] NGINX REVERSE PROXY │
│ ├── SSL termination (HTTPS → HTTP) │
│ ├── Request buffering │
│ ├── Rate limiting (optional) │
│ └── Forward to upstream: ragflow-server:9380 │
└────────────────────────────────┬──────────────────────────────────────┘
┌───────────────────────────────────────────────────────────────────────┐
│ [3] QUART ASGI SERVER │
│ ├── Parse HTTP request │
│ ├── Create request context │
│ └── Route to ASGI app │
└────────────────────────────────┬──────────────────────────────────────┘
┌───────────────────────────────────────────────────────────────────────┐
│ [4] CORS MIDDLEWARE │
│ ├── Check Origin header │
│ ├── Add CORS headers to response │
│ │ - Access-Control-Allow-Origin: * │
│ │ - Access-Control-Allow-Methods: GET, POST, PUT, DELETE │
│ │ - Access-Control-Allow-Headers: * │
│ └── Handle OPTIONS preflight (return 200) │
└────────────────────────────────┬──────────────────────────────────────┘
┌───────────────────────────────────────────────────────────────────────┐
│ [5] SESSION MIDDLEWARE │
│ ├── Load session from Redis (if cookie present) │
│ ├── Initialize g.session object │
│ └── Session data available throughout request │
└────────────────────────────────┬──────────────────────────────────────┘
┌───────────────────────────────────────────────────────────────────────┐
│ [6] AUTHENTICATION (@app.before_request) │
│ ├── _load_user() called │
│ ├── Parse Authorization header │
│ ├── Validate JWT or API token │
│ ├── Query user from database │
│ └── Set g.user (or None for anonymous) │
└────────────────────────────────┬──────────────────────────────────────┘
┌───────────────────────────────────────────────────────────────────────┐
│ [7] BLUEPRINT ROUTER │
│ ├── Match URL pattern to blueprint │
│ │ /api/v1/kb/* → kb_app │
│ │ /api/v1/document/* → document_app │
│ │ /v1/conversation/* → conversation_app │
│ ├── Extract URL parameters │
│ └── Call route handler │
└────────────────────────────────┬──────────────────────────────────────┘
┌───────────────────────────────────────────────────────────────────────┐
│ [8] ROUTE DECORATORS │
│ │ │
│ ├── @login_required │
│ │ └── Check if g.user is set, else return 401 │
│ │ │
│ ├── @validate_request("param1", "param2") │
│ │ └── Check required params exist in request body │
│ │ │
│ └── Custom decorators (@rate_limit, @cache, etc.) │
└────────────────────────────────┬──────────────────────────────────────┘
┌───────────────────────────────────────────────────────────────────────┐
│ [9] ROUTE HANDLER (async function) │
│ │ │
│ ├── Parse request body │
│ │ req = await request.json │
│ │ form = await request.form │
│ │ files = await request.files │
│ │ │
│ ├── Authorization checks │
│ │ check_kb_team_permission(kb, user.id) │
│ │ │
│ ├── Call Service Layer │
│ │ result = ServiceClass.method(params) │
│ │ │
│ └── Format response │
│ return get_json_result(data=result) │
└────────────────────────────────┬──────────────────────────────────────┘
┌───────────────────────────────────────────────────────────────────────┐
│ [10] SERVICE LAYER │
│ │ │
│ ├── Business logic execution │
│ ├── Database operations (Peewee ORM) │
│ ├── External service calls (LLM, storage) │
│ └── Return processed data │
└────────────────────────────────┬──────────────────────────────────────┘
┌───────────────────────────────────────────────────────────────────────┐
│ [11] RESPONSE FORMATTING │
│ │ │
│ ├── get_json_result(code, message, data) │
│ │ { │
│ │ "code": 0, │
│ │ "message": "success", │
│ │ "data": {...} │
│ │ } │
│ │ │
│ └── Custom JSON encoder for special types │
│ - datetime → ISO string │
│ - Decimal → float │
│ - Model → dict │
└────────────────────────────────┬──────────────────────────────────────┘
┌───────────────────────────────────────────────────────────────────────┐
│ [12] ERROR HANDLING (@app.errorhandler) │
│ │ │
│ ├── Catch unhandled exceptions │
│ ├── Log error with traceback │
│ ├── Map exception to HTTP status code │
│ │ - Unauthorized → 401 │
│ │ - LookupError → 404 │
│ │ - PermissionError → 403 │
│ │ - Exception → 500 │
│ └── Return error response │
└────────────────────────────────┬──────────────────────────────────────┘
┌───────────────────────────────────────────────────────────────────────┐
│ [13] RESPONSE SENT TO CLIENT │
│ ├── HTTP status code │
│ ├── Response headers │
│ └── JSON body │
└───────────────────────────────────────────────────────────────────────┘
```
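Stages [6] and [8] work as a pair: the before-request hook sets a user, and `@login_required` rejects the request when no user was set. The following is a minimal, framework-free sketch of that handshake, not the project's actual code. `RequestContext`, `load_user`, `TOKENS`, and `list_kbs` are hypothetical stand-ins for Quart's `g`, `_load_user()`, the real token validation, and a route handler; the numeric `401` in the body is illustrative.

```python
import asyncio
from dataclasses import dataclass
from functools import wraps
from typing import Optional

# Hypothetical token table standing in for JWT/API-token validation
# plus the database lookup of the user.
TOKENS = {"valid-token": "user-1"}


@dataclass
class RequestContext:
    """Stand-in for the per-request `g` object."""
    user: Optional[str] = None


def load_user(ctx: RequestContext, headers: dict) -> None:
    """Stage [6]: resolve the Authorization header to a user, or leave None."""
    token = headers.get("Authorization", "")
    ctx.user = TOKENS.get(token)


def login_required(func):
    """Stage [8]: short-circuit with 401 when no user was loaded."""
    @wraps(func)
    async def wrapper(ctx: RequestContext, *args, **kwargs):
        if ctx.user is None:
            return {"code": 401, "message": "Unauthorized", "data": None}, 401
        return await func(ctx, *args, **kwargs)
    return wrapper


@login_required
async def list_kbs(ctx: RequestContext):
    """Stage [9]: the handler only runs for authenticated requests."""
    return {"code": 0, "message": "success", "data": {"owner": ctx.user}}, 200


def handle(headers: dict):
    """Run stages [6], [8], and [9] for a single request."""
    ctx = RequestContext()
    load_user(ctx, headers)
    return asyncio.run(list_kbs(ctx))
```

Because the decorator returns before calling the handler, an anonymous request never reaches the service layer.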
## Code Examples
### Middleware Stack
```python
# /api/apps/__init__.py
from quart import Quart
from quart_cors import cors

# CustomJSONEncoder, redis_connection, server_error_response, and _load_user
# are defined elsewhere in the project.

# 1. Create Quart app
app = Quart(__name__)

# 2. Apply CORS middleware
app = cors(app, allow_origin="*")

# 3. Configure strict slashes
app.url_map.strict_slashes = False

# 4. Custom JSON encoder
app.json_encoder = CustomJSONEncoder

# 5. Session configuration
app.config["SESSION_TYPE"] = "redis"
app.config["SESSION_REDIS"] = redis_connection
app.config["MAX_CONTENT_LENGTH"] = 1024 * 1024 * 1024  # 1 GiB

# 6. Global error handler
app.errorhandler(Exception)(server_error_response)

# 7. Before request hook (authentication)
@app.before_request
def before_request():
    _load_user()
```
### Request Validation Decorator
```python
import inspect
from functools import wraps

from quart import request

# get_json_result and RetCode come from the project's API utilities.


def validate_request(*args, **kwargs):
    """
    Decorator to validate required request parameters.

    Usage:
        @validate_request("kb_id", "name")  # Required params
        @validate_request("status", status=["active", "inactive"])  # Enum validation
    """
    def process_args(input_arguments):
        no_arguments = []
        error_arguments = []
        # Check required args exist
        for arg in args:
            if arg not in input_arguments:
                no_arguments.append(arg)
        # Check enum values
        for k, v in kwargs.items():
            config_value = input_arguments.get(k, None)
            if config_value is None:
                # Avoid listing a parameter twice when it is also a required arg
                if k not in no_arguments:
                    no_arguments.append(k)
            elif isinstance(v, (tuple, list)):
                if config_value not in v:
                    error_arguments.append((k, set(v)))
        # Report each kind of problem only when it actually occurred
        parts = []
        if no_arguments:
            parts.append(f"Required arguments missing: {','.join(no_arguments)}")
        if error_arguments:
            parts.append(f"Invalid values: {error_arguments}")
        return "; ".join(parts) or None

    def wrapper(func):
        @wraps(func)
        async def decorated_function(*_args, **_kwargs):
            # Get request data: JSON body first, form fields as fallback
            body = await request.json or (await request.form).to_dict()
            # Validate
            errs = process_args(body)
            if errs:
                return get_json_result(
                    code=RetCode.ARGUMENT_ERROR,
                    message=errs
                )
            # Call handler (sync or async)
            if inspect.iscoroutinefunction(func):
                return await func(*_args, **_kwargs)
            return func(*_args, **_kwargs)
        return decorated_function
    return wrapper
```
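The validation rules can be exercised without a running server by isolating them in a pure function. The sketch below assumes the same two rules as the decorator (required keys, and values restricted to an allowed set); `check_args` and its parameter names are hypothetical and not part of the RAGFlow codebase.

```python
from typing import Any, Mapping, Optional, Sequence


def check_args(
    body: Mapping[str, Any],
    required: Sequence[str] = (),
    allowed: Optional[Mapping[str, Sequence[Any]]] = None,
) -> Optional[str]:
    """Return an error string, or None when the body passes validation."""
    allowed = allowed or {}
    # Rule 1: every required key must be present in the body.
    missing = [name for name in required if name not in body]
    invalid = []
    # Rule 2: constrained keys must be present and hold an allowed value.
    for name, choices in allowed.items():
        value = body.get(name)
        if value is None:
            if name not in missing:
                missing.append(name)
        elif value not in choices:
            invalid.append(f"{name} must be one of {sorted(choices)}")
    parts = []
    if missing:
        parts.append("Required arguments missing: " + ",".join(missing))
    if invalid:
        parts.append("Invalid values: " + "; ".join(invalid))
    return "; ".join(parts) or None
```

Keeping the rules separate from the request object makes them straightforward to unit test; the decorator then only has to fetch the body and translate a non-`None` result into an `ARGUMENT_ERROR` response.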
### Response Formatting
```python
import base64
import json
from datetime import datetime
from decimal import Decimal
from typing import Any

from quart import Response, jsonify

# RetCode is the project's return-code enum.


def get_json_result(
    code: RetCode = RetCode.SUCCESS,
    message: str = "success",
    data: Any = None
) -> Response:
    """
    Standard JSON response formatter.

    Args:
        code: Return code (0 = success)
        message: Human-readable message
        data: Response payload

    Returns:
        Quart Response with JSON body
    """
    response = {
        "code": code,
        "message": message,
        "data": data
    }
    return jsonify(response)


class CustomJSONEncoder(json.JSONEncoder):
    """Custom JSON encoder for special types."""

    def default(self, obj):
        if isinstance(obj, datetime):
            return obj.isoformat()
        if isinstance(obj, Decimal):
            return float(obj)
        # ORM models expose to_dict()
        if hasattr(obj, 'to_dict'):
            return obj.to_dict()
        if isinstance(obj, bytes):
            return base64.b64encode(obj).decode()
        return super().default(obj)
```
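To see what the encoder produces for each special type, the conversion rules can be run through `json.dumps` directly. This is a standalone sketch: `DemoEncoder` repeats the rules of `CustomJSONEncoder` so the block runs on its own, and `FakeModel` is a hypothetical stand-in for an ORM model with a `to_dict()` method.

```python
import base64
import json
from datetime import datetime
from decimal import Decimal


class DemoEncoder(json.JSONEncoder):
    """Same conversion rules as CustomJSONEncoder, repeated so the sketch runs alone."""

    def default(self, obj):
        if isinstance(obj, datetime):
            return obj.isoformat()
        if isinstance(obj, Decimal):
            return float(obj)
        if hasattr(obj, "to_dict"):
            return obj.to_dict()
        if isinstance(obj, bytes):
            return base64.b64encode(obj).decode()
        return super().default(obj)


class FakeModel:
    """Hypothetical stand-in for an ORM model exposing to_dict()."""

    def to_dict(self):
        return {"id": "kb-1"}


payload = {
    "code": 0,
    "message": "success",
    "data": {
        "created": datetime(2025, 1, 2, 3, 4, 5),
        "score": Decimal("0.5"),
        "kb": FakeModel(),
        "blob": b"hi",
    },
}

# Serialize with the custom encoder, then parse back to inspect the result.
encoded = json.dumps(payload, cls=DemoEncoder)
decoded = json.loads(encoded)
```

Note that converting `Decimal` to `float` can lose precision for values that have no exact binary representation; serializing as a string is a common alternative when exact digits matter.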
### Error Handler
```python
import logging

# get_json_result and RetCode come from the project's API utilities.


def server_error_response(e):
    """
    Global exception handler for unhandled errors.
    """
    logging.error(
        "Unhandled exception",
        exc_info=(type(e), e, e.__traceback__)
    )
    msg = repr(e).lower()
    # Map exception types to HTTP codes
    if getattr(e, "code", None) == 401 or "unauthorized" in msg:
        return get_json_result(
            code=RetCode.UNAUTHORIZED,
            message=repr(e)
        ), 401
    if isinstance(e, LookupError) or "not found" in msg:
        return get_json_result(
            code=RetCode.DATA_ERROR,
            message=repr(e)
        ), 404
    if isinstance(e, PermissionError) or "permission" in msg:
        return get_json_result(
            code=RetCode.FORBIDDEN,
            message=repr(e)
        ), 403
    # Document store specific errors
    # (no explicit status here, so the HTTP status is 200)
    if "index_not_found_exception" in repr(e):
        return get_json_result(
            code=RetCode.EXCEPTION_ERROR,
            message="No chunk found. Please upload and parse files first."
        )
    # Generic server error
    return get_json_result(
        code=RetCode.EXCEPTION_ERROR,
        message=repr(e)
    ), 500
```
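Because the handler matches on substrings of `repr(e)` as well as on exception types, the order of the branches decides the outcome when more than one matches. The sketch below mirrors only the branch order and returns the HTTP status; `http_status_for` is a hypothetical helper for illustration, not a function in the codebase.

```python
def http_status_for(exc: Exception) -> int:
    """Mirror the branch order of server_error_response, returning only the status."""
    msg = repr(exc).lower()
    if getattr(exc, "code", None) == 401 or "unauthorized" in msg:
        return 401
    if isinstance(exc, LookupError) or "not found" in msg:
        return 404
    if isinstance(exc, PermissionError) or "permission" in msg:
        return 403
    if "index_not_found_exception" in repr(exc):
        # The handler sets no explicit status for this case.
        return 200
    return 500


# A PermissionError whose message contains "not found" hits the 404 branch
# first, because the substring check runs before the PermissionError check.
shadowed = http_status_for(PermissionError("file not found"))
```

If exact status codes matter to clients, matching on exception types alone (or on dedicated exception classes) avoids this kind of shadowing.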
## Typical Request Example
### Request
```http
POST /api/v1/kb/create HTTP/1.1
Host: localhost:9380
Authorization: eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9...
Content-Type: application/json
```
### Processing Timeline
```
T+0ms : Request received by Nginx
T+1ms : Forwarded to Quart server
T+2ms : CORS headers added
T+3ms : Session loaded from Redis
T+5ms : _load_user() validates JWT
T+10ms : User queried from MySQL
T+12ms : Blueprint router matches /api/v1/kb/create
T+13ms : @login_required passes (user exists)
T+14ms : @validate_request("name") validates params
T+15ms : Handler async function called
T+20ms : KnowledgebaseService.create() called
T+25ms : KB inserted into MySQL
T+28ms : ES index created
T+30ms : Response formatted
T+31ms : JSON response sent
```
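The timeline can be turned into per-step durations to see where the time goes. The sketch below assumes each timestamp marks the start of its step, so a step's duration is the gap to the next timestamp; that reading is an interpretation of the illustrative figures above, not a measurement.

```python
# (milliseconds since arrival, step) pairs copied from the timeline above.
TIMELINE = [
    (0, "Request received by Nginx"),
    (1, "Forwarded to Quart server"),
    (2, "CORS headers added"),
    (3, "Session loaded from Redis"),
    (5, "_load_user() validates JWT"),
    (10, "User queried from MySQL"),
    (12, "Blueprint router matches /api/v1/kb/create"),
    (13, "@login_required passes (user exists)"),
    (14, '@validate_request("name") validates params'),
    (15, "Handler async function called"),
    (20, "KnowledgebaseService.create() called"),
    (25, "KB inserted into MySQL"),
    (28, "ES index created"),
    (30, "Response formatted"),
    (31, "JSON response sent"),
]


def step_durations(timeline):
    """Pair each step with the time elapsed until the next step begins."""
    return [
        (label, nxt - start)
        for (start, label), (nxt, _) in zip(timeline, timeline[1:])
    ]


durations = step_durations(TIMELINE)
total_ms = sum(ms for _, ms in durations)
# Steps tied for the longest duration.
longest = [label for label, ms in durations if ms == max(d for _, d in durations)]
```

Under that reading, token validation, handler entry, and the service call each account for 5 ms, while the middleware steps take 1 to 2 ms each.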
### Response
```http
HTTP/1.1 200 OK
Content-Type: application/json
Access-Control-Allow-Origin: *
```
## Performance Considerations
### Connection Pooling
```python
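import redis
from playhouse.pool import PooledMySQLDatabase

# database, connection_params, redis_host, and redis_port come from
# the service configuration.

# Database connection pool
db = PooledMySQLDatabase(
    database,
    max_connections=32,
    stale_timeout=300,  # seconds before an idle connection is recycled
    **connection_params
)

# Redis connection pool
redis_pool = redis.ConnectionPool(
    host=redis_host,
    port=redis_port,
    max_connections=100
)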
```
### Async I/O
```python
import asyncio

from quart import request

# manager is the blueprint object; blocking_operation is a placeholder
# for any synchronous call.

# All route handlers are async
@manager.route("/endpoint", methods=["POST"])
async def handler():
    # Async request parsing
    req = await request.json
    # Async file handling
    files = await request.files
    # Run blocking I/O in thread pool
    result = await asyncio.to_thread(blocking_operation)
    return get_json_result(data=result)
```
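The effect of `asyncio.to_thread` can be observed outside a web handler. In this standalone sketch, `blocking_operation` is a hypothetical placeholder that sleeps to imitate a synchronous database or file call; two such calls are awaited together so they overlap instead of running back to back. It requires Python 3.9 or later.

```python
import asyncio
import time


def blocking_operation(delay: float) -> str:
    """Hypothetical blocking call, e.g. a synchronous ORM query."""
    time.sleep(delay)
    return f"done after {delay}s"


async def main():
    start = time.perf_counter()
    # Each call runs in a worker thread, so the event loop stays free
    # and the two sleeps overlap.
    results = await asyncio.gather(
        asyncio.to_thread(blocking_operation, 0.2),
        asyncio.to_thread(blocking_operation, 0.2),
    )
    elapsed = time.perf_counter() - start
    return results, elapsed


results, elapsed = asyncio.run(main())
```

Calling `blocking_operation` directly inside an `async def` handler would instead stall every other request served by the same event loop for the duration of the call.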
### Response Streaming
```python
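# For large responses, use streaming.
# This fragment runs inside a route handler; generate_chunks is a placeholder.
def stream():
    for chunk in generate_chunks():
        yield chunk

resp = Response(stream(), mimetype="text/event-stream")
# Ask Nginx not to buffer the stream so chunks reach the client immediately
resp.headers.add_header("X-Accel-Buffering", "no")
return resp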
```
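With the `text/event-stream` MIME type, each chunk is expected to follow Server-Sent Events framing: a `data:` line terminated by a blank line. A minimal sketch of such a generator is shown here; `sse_events` and the `answer` payload shape are assumptions for illustration, not the project's actual streaming code.

```python
import json
from typing import Any, Iterable, Iterator


def sse_events(chunks: Iterable[Any]) -> Iterator[str]:
    """Frame each chunk as one Server-Sent Event: a data line plus a blank line."""
    for chunk in chunks:
        yield "data: " + json.dumps(chunk, ensure_ascii=False) + "\n\n"


# Two partial answers, as an LLM token stream might produce them.
frames = list(sse_events([{"answer": "Hel"}, {"answer": "Hello"}]))
```

A generator like this can be passed to `Response(...)` in place of `stream()`; the blank line after each `data:` line is what tells the browser's `EventSource` that an event is complete.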
## Logging
```python
import logging

from quart import request

# Request logging
@app.before_request
def log_request():
    logging.info(f"{request.method} {request.path}")

# Response logging
@app.after_request
def log_response(response):
    logging.info(f"Response: {response.status_code}")
    return response
```
## Related Files
- `/api/apps/__init__.py` - App initialization
- `/api/ragflow_server.py` - Server entry point
- `/api/utils/api_utils.py` - API utilities
- `/api/validation.py` - Request validation