ragflow/personal_analyze/01-API-LAYER/request_lifecycle.md
Claude a6ee18476d
docs: Add detailed backend module analysis documentation
Add comprehensive documentation covering 6 modules:
- 01-API-LAYER: Authentication, routing, SSE streaming
- 02-SERVICE-LAYER: Dialog, Task, LLM service analysis
- 03-RAG-ENGINE: Hybrid search, embedding, reranking
- 04-AGENT-SYSTEM: Canvas engine, components, tools
- 05-DOCUMENT-PROCESSING: Task executor, PDF parsing
- 06-ALGORITHMS: BM25, fusion, RAPTOR

Total 28 documentation files with code analysis, diagrams, and formulas.
2025-11-26 11:10:54 +00:00

Request Lifecycle Analysis

Overview

Every HTTP request in RAGFlow passes through a processing pipeline with several stages: middleware, authentication, validation, and business logic.

Request Lifecycle Flow

┌─────────────────────────────────────────────────────────────────────────┐
│                      COMPLETE REQUEST LIFECYCLE                          │
└─────────────────────────────────────────────────────────────────────────┘

[1] CLIENT REQUEST
    │
    ▼
┌───────────────────────────────────────────────────────────────────────┐
│ [2] NGINX REVERSE PROXY                                                │
│     ├── SSL termination (HTTPS → HTTP)                                │
│     ├── Request buffering                                             │
│     ├── Rate limiting (optional)                                      │
│     └── Forward to upstream: ragflow-server:9380                      │
└────────────────────────────────┬──────────────────────────────────────┘
                                 │
                                 ▼
┌───────────────────────────────────────────────────────────────────────┐
│ [3] QUART ASGI SERVER                                                  │
│     ├── Parse HTTP request                                            │
│     ├── Create request context                                        │
│     └── Route to ASGI app                                             │
└────────────────────────────────┬──────────────────────────────────────┘
                                 │
                                 ▼
┌───────────────────────────────────────────────────────────────────────┐
│ [4] CORS MIDDLEWARE                                                    │
│     ├── Check Origin header                                           │
│     ├── Add CORS headers to response                                  │
│     │   - Access-Control-Allow-Origin: *                              │
│     │   - Access-Control-Allow-Methods: GET, POST, PUT, DELETE        │
│     │   - Access-Control-Allow-Headers: *                             │
│     └── Handle OPTIONS preflight (return 200)                         │
└────────────────────────────────┬──────────────────────────────────────┘
                                 │
                                 ▼
┌───────────────────────────────────────────────────────────────────────┐
│ [5] SESSION MIDDLEWARE                                                 │
│     ├── Load session from Redis (if cookie present)                   │
│     ├── Initialize g.session object                                   │
│     └── Session data available throughout request                     │
└────────────────────────────────┬──────────────────────────────────────┘
                                 │
                                 ▼
┌───────────────────────────────────────────────────────────────────────┐
│ [6] AUTHENTICATION (@app.before_request)                               │
│     ├── _load_user() called                                           │
│     ├── Parse Authorization header                                    │
│     ├── Validate JWT or API token                                     │
│     ├── Query user from database                                      │
│     └── Set g.user (or None for anonymous)                            │
└────────────────────────────────┬──────────────────────────────────────┘
                                 │
                                 ▼
┌───────────────────────────────────────────────────────────────────────┐
│ [7] BLUEPRINT ROUTER                                                   │
│     ├── Match URL pattern to blueprint                                │
│     │   /api/v1/kb/* → kb_app                                        │
│     │   /api/v1/document/* → document_app                            │
│     │   /v1/conversation/* → conversation_app                        │
│     ├── Extract URL parameters                                        │
│     └── Call route handler                                            │
└────────────────────────────────┬──────────────────────────────────────┘
                                 │
                                 ▼
┌───────────────────────────────────────────────────────────────────────┐
│ [8] ROUTE DECORATORS                                                   │
│     │                                                                 │
│     ├── @login_required                                               │
│     │   └── Check if g.user is set, else return 401                   │
│     │                                                                 │
│     ├── @validate_request("param1", "param2")                         │
│     │   └── Check required params exist in request body               │
│     │                                                                 │
│     └── Custom decorators (@rate_limit, @cache, etc.)                 │
└────────────────────────────────┬──────────────────────────────────────┘
                                 │
                                 ▼
┌───────────────────────────────────────────────────────────────────────┐
│ [9] ROUTE HANDLER (async function)                                     │
│     │                                                                 │
│     ├── Parse request body                                            │
│     │   req = await request.json                                      │
│     │   form = await request.form                                     │
│     │   files = await request.files                                   │
│     │                                                                 │
│     ├── Authorization checks                                          │
│     │   check_kb_team_permission(kb, user.id)                         │
│     │                                                                 │
│     ├── Call Service Layer                                            │
│     │   result = ServiceClass.method(params)                          │
│     │                                                                 │
│     └── Format response                                               │
│         return get_json_result(data=result)                           │
└────────────────────────────────┬──────────────────────────────────────┘
                                 │
                                 ▼
┌───────────────────────────────────────────────────────────────────────┐
│ [10] SERVICE LAYER                                                     │
│     │                                                                 │
│     ├── Business logic execution                                      │
│     ├── Database operations (Peewee ORM)                              │
│     ├── External service calls (LLM, storage)                         │
│     └── Return processed data                                         │
└────────────────────────────────┬──────────────────────────────────────┘
                                 │
                                 ▼
┌───────────────────────────────────────────────────────────────────────┐
│ [11] RESPONSE FORMATTING                                               │
│     │                                                                 │
│     ├── get_json_result(code, message, data)                          │
│     │   {                                                             │
│     │     "code": 0,                                                  │
│     │     "message": "success",                                       │
│     │     "data": {...}                                               │
│     │   }                                                             │
│     │                                                                 │
│     └── Custom JSON encoder for special types                         │
│         - datetime → ISO string                                       │
│         - Decimal → float                                             │
│         - Model → dict                                                │
└────────────────────────────────┬──────────────────────────────────────┘
                                 │
                                 ▼
┌───────────────────────────────────────────────────────────────────────┐
│ [12] ERROR HANDLING (@app.errorhandler)                                │
│     │                                                                 │
│     ├── Catch unhandled exceptions                                    │
│     ├── Log error with traceback                                      │
│     ├── Map exception to HTTP status code                             │
│     │   - Unauthorized → 401                                          │
│     │   - LookupError → 404                                           │
│     │   - PermissionError → 403                                       │
│     │   - Exception → 500                                             │
│     └── Return error response                                         │
└────────────────────────────────┬──────────────────────────────────────┘
                                 │
                                 ▼
┌───────────────────────────────────────────────────────────────────────┐
│ [13] RESPONSE SENT TO CLIENT                                           │
│     ├── HTTP status code                                              │
│     ├── Response headers                                              │
│     └── JSON body                                                     │
└───────────────────────────────────────────────────────────────────────┘
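
The ordering of stages [6] through [11] can be sketched with plain functions. This is a simplified illustration, not RAGFlow's code: the `TOKENS` table, the dict-shaped request, and every function name here are hypothetical stand-ins for the JWT check, the Quart request object, and the real handlers.

```python
from typing import Callable

# Hypothetical token table standing in for JWT validation plus the user query.
TOKENS = {"token-abc": "user-1"}


def authenticate(request: dict) -> dict:
    """Stage [6]: attach a user id (or None for anonymous) to the request."""
    token = request.get("headers", {}).get("Authorization")
    request["user"] = TOKENS.get(token)
    return request


def login_required(handler: Callable[[dict], dict]) -> Callable[[dict], dict]:
    """Stage [8]: short-circuit with 401 when no user was set."""
    def wrapped(request: dict) -> dict:
        if request.get("user") is None:
            body = {"code": 401, "message": "unauthorized", "data": None}
            return {"status": 401, "body": body}
        return handler(request)
    return wrapped


@login_required
def create_kb(request: dict) -> dict:
    """Stages [9]-[11]: handler, service call, response envelope."""
    name = request["json"]["name"]
    body = {"code": 0, "message": "success", "data": {"name": name}}
    return {"status": 200, "body": body}


def handle(request: dict) -> dict:
    """Run the stages in the order shown in the diagram."""
    return create_kb(authenticate(request))
```

Authentication runs for every request and only records who the caller is; the decorator decides later whether an anonymous caller is acceptable for a given route.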

Code Examples

Middleware Stack

# /api/apps/__init__.py
from quart import Quart
from quart_cors import cors

# 1. Create Quart app
app = Quart(__name__)

# 2. Apply CORS middleware
app = cors(app, allow_origin="*")

# 3. Configure strict slashes
app.url_map.strict_slashes = False

# 4. Custom JSON encoder
app.json_encoder = CustomJSONEncoder

# 5. Session configuration
app.config["SESSION_TYPE"] = "redis"
app.config["SESSION_REDIS"] = redis_connection
app.config["MAX_CONTENT_LENGTH"] = 1024 * 1024 * 1024  # 1GB

# 6. Global error handler
app.errorhandler(Exception)(server_error_response)

# 7. Before request hook (authentication)
@app.before_request
def before_request():
    _load_user()

Request Validation Decorator

import inspect
from functools import wraps

from quart import request


def validate_request(*args, **kwargs):
    """
    Decorator to validate required request parameters.

    Usage:
        @validate_request("kb_id", "name")  # Required params
        @validate_request("status", status=["active", "inactive"])  # Enum validation
    """
    def process_args(input_arguments):
        no_arguments = []
        error_arguments = []

        # Check required args exist
        for arg in args:
            if arg not in input_arguments:
                no_arguments.append(arg)

        # Check enum values
        for k, v in kwargs.items():
            config_value = input_arguments.get(k, None)
            if config_value is None:
                no_arguments.append(k)
            elif isinstance(v, (tuple, list)):
                if config_value not in v:
                    error_arguments.append((k, set(v)))

        if no_arguments or error_arguments:
            # Report each kind of problem only when it actually occurred
            parts = []
            if no_arguments:
                parts.append(f"Required arguments missing: {','.join(no_arguments)}")
            if error_arguments:
                parts.append(f"Invalid values: {error_arguments}")
            return "; ".join(parts)
        return None

    def wrapper(func):
        @wraps(func)
        async def decorated_function(*_args, **_kwargs):
            # Get request data
            body = await request.json or (await request.form).to_dict()

            # Validate
            errs = process_args(body)
            if errs:
                return get_json_result(
                    code=RetCode.ARGUMENT_ERROR,
                    message=errs
                )

            # Call handler
            if inspect.iscoroutinefunction(func):
                return await func(*_args, **_kwargs)
            return func(*_args, **_kwargs)

        return decorated_function
    return wrapper
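
The checking step can be tried in isolation, without a running server. The sketch below restates the same two checks (required keys, then allowed values) as a standalone function; the name `check_arguments` and its signature are assumptions made for this example and do not appear in the codebase.

```python
from typing import Any, Mapping, Optional, Sequence


def check_arguments(
    body: Mapping[str, Any],
    required: Sequence[str] = (),
    allowed: Optional[Mapping[str, Sequence[Any]]] = None,
) -> Optional[str]:
    """Return an error string, or None when the body passes both checks."""
    allowed = allowed or {}
    missing = [name for name in required if name not in body]
    invalid = []
    for name, choices in allowed.items():
        value = body.get(name)
        if value is None:
            missing.append(name)
        elif value not in choices:
            invalid.append((name, sorted(choices)))

    parts = []
    if missing:
        parts.append(f"Required arguments missing: {','.join(missing)}")
    if invalid:
        parts.append(f"Invalid values: {invalid}")
    return "; ".join(parts) or None


print(check_arguments({"name": "kb"}, required=["kb_id", "name"]))
# Required arguments missing: kb_id
print(check_arguments({"status": "archived"}, allowed={"status": ["active", "inactive"]}))
# Invalid values: [('status', ['active', 'inactive'])]
print(check_arguments({"kb_id": "1", "name": "kb"}, required=["kb_id", "name"]))
# None
```

Because the decorator returns the error through `get_json_result` without an explicit status, a validation failure reaches the client as HTTP 200 with a non-zero `code` in the body.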

Response Formatting

def get_json_result(
    code: RetCode = RetCode.SUCCESS,
    message: str = "success",
    data: Any = None
) -> Response:
    """
    Standard JSON response formatter.

    Args:
        code: Return code (0 = success)
        message: Human-readable message
        data: Response payload

    Returns:
        Quart Response with JSON body
    """
    response = {
        "code": code,
        "message": message,
        "data": data
    }
    return jsonify(response)


class CustomJSONEncoder(json.JSONEncoder):
    """Custom JSON encoder for special types."""

    def default(self, obj):
        if isinstance(obj, datetime):
            return obj.isoformat()
        if isinstance(obj, Decimal):
            return float(obj)
        if hasattr(obj, 'to_dict'):
            return obj.to_dict()
        if isinstance(obj, bytes):
            return base64.b64encode(obj).decode()
        return super().default(obj)
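
To see what the encoder produces, the same type handling can be exercised with the standard `json` module. This is a standalone demo: `DemoJSONEncoder` copies the logic shown above, and `FakeModel` is a hypothetical stand-in for an ORM model that exposes `to_dict()`.

```python
import base64
import json
from datetime import datetime
from decimal import Decimal


class DemoJSONEncoder(json.JSONEncoder):
    """Mirrors the type handling shown above for a standalone demo."""

    def default(self, obj):
        if isinstance(obj, datetime):
            return obj.isoformat()
        if isinstance(obj, Decimal):
            return float(obj)
        if hasattr(obj, "to_dict"):
            return obj.to_dict()
        if isinstance(obj, bytes):
            return base64.b64encode(obj).decode()
        return super().default(obj)


class FakeModel:
    """Hypothetical stand-in for an ORM model exposing to_dict()."""

    def to_dict(self):
        return {"id": "kb_1"}


payload = {
    "created_at": datetime(2024, 1, 15, 10, 30),
    "score": Decimal("0.75"),
    "kb": FakeModel(),
    "blob": b"hi",
}
encoded = json.dumps(payload, cls=DemoJSONEncoder, sort_keys=True)
print(encoded)
# {"blob": "aGk=", "created_at": "2024-01-15T10:30:00", "kb": {"id": "kb_1"}, "score": 0.75}
```

Converting `Decimal` to `float` can lose precision, so values that must round-trip exactly are safer sent as strings.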

Error Handler

def server_error_response(e):
    """
    Global exception handler for unhandled errors.
    """
    logging.error(
        "Unhandled exception",
        exc_info=(type(e), e, e.__traceback__)
    )

    msg = repr(e).lower()

    # Map exception types to HTTP codes
    if getattr(e, "code", None) == 401 or "unauthorized" in msg:
        return get_json_result(
            code=RetCode.UNAUTHORIZED,
            message=repr(e)
        ), 401

    if isinstance(e, LookupError) or "not found" in msg:
        return get_json_result(
            code=RetCode.DATA_ERROR,
            message=repr(e)
        ), 404

    if isinstance(e, PermissionError) or "permission" in msg:
        return get_json_result(
            code=RetCode.FORBIDDEN,
            message=repr(e)
        ), 403

    # Document store specific errors
    if "index_not_found_exception" in repr(e):
        return get_json_result(
            code=RetCode.EXCEPTION_ERROR,
            message="No chunk found. Please upload and parse files first."
        )

    # Generic server error
    return get_json_result(
        code=RetCode.EXCEPTION_ERROR,
        message=repr(e)
    ), 500
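
Because several branches match on substrings of `repr(e)`, the order of the checks matters. The sketch below extracts only the status decision into a pure function so the behaviour can be checked directly; `status_for` is a hypothetical helper written for this example, and it follows the handler as shown above rather than any other version of the code.

```python
def status_for(exc: Exception) -> int:
    """Pick an HTTP status using the same checks, in the same order, as above."""
    msg = repr(exc).lower()
    if getattr(exc, "code", None) == 401 or "unauthorized" in msg:
        return 401
    if isinstance(exc, LookupError) or "not found" in msg:
        return 404
    if isinstance(exc, PermissionError) or "permission" in msg:
        return 403
    if "index_not_found_exception" in repr(exc):
        return 200  # the handler above sets no explicit status on this branch
    return 500


print(status_for(KeyError("kb_id")))               # 404, KeyError is a LookupError
print(status_for(ValueError("user not found")))    # 404, matched by substring
print(status_for(ValueError("missing permission")))  # 403, matched by substring
print(status_for(ValueError("boom")))              # 500
```

Substring matching means any exception whose message happens to contain "not found" or "permission" is reclassified, which is worth keeping in mind when writing error messages in the service layer.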

Typical Request Example

Request

POST /api/v1/kb/create HTTP/1.1
Host: localhost:9380
Authorization: eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9...
Content-Type: application/json

{
    "name": "My Knowledge Base",
    "parser_id": "pdf"
}

Processing Timeline (approximate)

T+0ms   : Request received by Nginx
T+1ms   : Forwarded to Quart server
T+2ms   : CORS headers added
T+3ms   : Session loaded from Redis
T+5ms   : _load_user() validates JWT
T+10ms  : User queried from MySQL
T+12ms  : Blueprint router matches /api/v1/kb/create
T+13ms  : @login_required passes (user exists)
T+14ms  : @validate_request("name") validates params
T+15ms  : Handler async function called
T+20ms  : KnowledgebaseService.create() called
T+25ms  : KB inserted into MySQL
T+28ms  : ES index created
T+30ms  : Response formatted
T+31ms  : JSON response sent

Response

HTTP/1.1 200 OK
Content-Type: application/json
Access-Control-Allow-Origin: *

{
    "code": 0,
    "message": "success",
    "data": {
        "id": "kb_123abc",
        "name": "My Knowledge Base",
        "parser_id": "pdf",
        "created_at": "2024-01-15T10:30:00Z"
    }
}
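
On the client side, the envelope's `code` field has to be checked separately from the HTTP status, since some error paths shown earlier return a non-zero `code` with status 200. A minimal sketch of that check follows; `ApiError` and `unwrap` are hypothetical names introduced for this example.

```python
import json


class ApiError(Exception):
    """Raised when the envelope reports a non-zero code (hypothetical type)."""

    def __init__(self, code, message: str):
        super().__init__(f"[{code}] {message}")
        self.code = code


def unwrap(raw_body: str):
    """Return the `data` field of a response envelope, or raise ApiError."""
    envelope = json.loads(raw_body)
    if envelope.get("code") != 0:
        raise ApiError(envelope.get("code"), envelope.get("message", ""))
    return envelope.get("data")


body = '{"code": 0, "message": "success", "data": {"id": "kb_123abc"}}'
print(unwrap(body))
# {'id': 'kb_123abc'}
```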

Performance Considerations

Connection Pooling

# Database connection pool
db = PooledMySQLDatabase(
    database,
    max_connections=32,
    stale_timeout=300,
    **connection_params
)

# Redis connection pool
redis_pool = redis.ConnectionPool(
    host=redis_host,
    port=redis_port,
    max_connections=100
)
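
The effect of a `max_connections` cap can be illustrated with a toy pool. This is a single-threaded sketch of the general idea (reuse idle connections, open new ones only under the cap); it is not how Peewee or redis-py implement pooling, and `TinyPool` is a hypothetical class.

```python
import queue


class TinyPool:
    """Bounded pool of reusable objects (illustrative, not thread-safe)."""

    def __init__(self, factory, max_connections: int):
        self._factory = factory
        self._idle = queue.LifoQueue()
        self._max = max_connections
        self._created = 0

    def acquire(self):
        try:
            return self._idle.get_nowait()  # reuse an idle connection
        except queue.Empty:
            if self._created >= self._max:
                raise RuntimeError("pool exhausted")
            self._created += 1
            return self._factory()  # open a new one while under the cap

    def release(self, conn) -> None:
        self._idle.put(conn)  # hand the connection back for reuse


pool = TinyPool(factory=object, max_connections=2)
first = pool.acquire()
second = pool.acquire()
pool.release(first)
assert pool.acquire() is first  # the idle connection is reused, not recreated
```

With a cap in place, a handler that forgets to release a connection eventually starves every other request, which is why pooled connections are normally acquired inside a context manager.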

Async I/O

# All route handlers are async
@manager.route("/endpoint", methods=["POST"])
async def handler():
    # Async request parsing
    req = await request.json

    # Async file handling
    files = await request.files

    # Run blocking I/O in thread pool
    result = await asyncio.to_thread(blocking_operation)

    return get_json_result(data=result)
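
The `asyncio.to_thread` call is what keeps a blocking function from stalling the event loop. The standalone sketch below runs two blocking calls concurrently and records whether each one ran on the main thread; `blocking_operation` here is a stand-in that just sleeps, and `asyncio.to_thread` requires Python 3.9 or later.

```python
import asyncio
import threading
import time


def blocking_operation(label: str):
    """Stand-in for a blocking call such as a synchronous ORM query."""
    time.sleep(0.05)
    on_main = threading.current_thread() is threading.main_thread()
    return label, on_main


async def main():
    # Both calls run in worker threads while the event loop stays free.
    return await asyncio.gather(
        asyncio.to_thread(blocking_operation, "a"),
        asyncio.to_thread(blocking_operation, "b"),
    )


results = asyncio.run(main())
print(results)
# [('a', False), ('b', False)]
```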

Response Streaming

# For large responses, use streaming
def stream():
    for chunk in generate_chunks():
        yield chunk

resp = Response(stream(), mimetype="text/event-stream")
resp.headers.add_header("X-Accel-Buffering", "no")
return resp
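
With the `text/event-stream` MIME type, each chunk is expected to be framed as a Server-Sent Event: a `data:` line followed by a blank line. A minimal sketch of that framing is shown below; the `sse_events` helper and the payload shape are assumptions for illustration, not the project's actual generator.

```python
import json
from typing import Iterable, Iterator


def sse_events(chunks: Iterable[dict]) -> Iterator[str]:
    """Frame each chunk as one Server-Sent Event."""
    for chunk in chunks:
        # "data: <payload>" followed by a blank line terminates one event.
        yield "data: " + json.dumps(chunk, ensure_ascii=False) + "\n\n"


for event in sse_events([{"answer": "Hel"}, {"answer": "Hello"}]):
    print(repr(event))
# 'data: {"answer": "Hel"}\n\n'
# 'data: {"answer": "Hello"}\n\n'
```

Serialising each payload as single-line JSON keeps raw newlines out of the `data:` field, which would otherwise split one event into several lines.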

Logging

# Request logging
@app.before_request
def log_request():
    logging.info(f"{request.method} {request.path}")

# Response logging
@app.after_request
def log_response(response):
    logging.info(f"Response: {response.status_code}")
    return response
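
When the two hooks above are not enough to locate a slow stage, per-handler timing can be added with a decorator. This is a sketch under stated assumptions: `log_duration` and `sample_handler` are hypothetical names, and the example uses only the standard library rather than any Quart-specific API.

```python
import asyncio
import functools
import logging
import time


def log_duration(func):
    """Log how long an async handler takes, even when it raises."""
    @functools.wraps(func)
    async def wrapper(*args, **kwargs):
        start = time.perf_counter()
        try:
            return await func(*args, **kwargs)
        finally:
            elapsed_ms = (time.perf_counter() - start) * 1000
            logging.info("%s took %.1f ms", func.__name__, elapsed_ms)
    return wrapper


@log_duration
async def sample_handler(value: int) -> int:
    await asyncio.sleep(0.01)  # stand-in for real work
    return value * 2


doubled = asyncio.run(sample_handler(21))
```

Using `functools.wraps` preserves the handler's name, which matters for the log message and for frameworks that register routes by function name.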

Related Files

- /api/apps/__init__.py - App initialization
- /api/ragflow_server.py - Server entry point
- /api/utils/api_utils.py - API utilities
- /api/validation.py - Request validation