merging main

2025-11-13 19:37:44 +01:00 · 2025-11-13 19:37:44 +01:00 · 4945faf021
commit 4945faf021
parent cdbb0b0826 c164c8f631
40 changed files with 7715 additions and 1398 deletions
--- a/.clinerules/01-basic.md
+++ b/.clinerules/01-basic.md
@ -47,6 +47,31 @@ else np.frombuffer(base64.b64decode(dp.embedding), dtype=np.float32)
 **Location**: Neo4j storage finalization
 **Impact**: Prevents application shutdown failures

+### 6. Async Generator Lock Management (CRITICAL)
+**Pattern**: Never hold locks across async generator yields - create snapshots instead
+**Issue**: Holding locks while yielding causes deadlock when consumers need the same lock
+**Location**: `lightrag/tools/migrate_llm_cache.py` - `stream_default_caches_json`
+**Solution**: Create snapshot of data while holding lock, release lock, then iterate over snapshot
+```python
+# WRONG - Deadlock prone:
+async with storage._storage_lock:
+    for key, value in storage._data.items():
+        batch[key] = value
+        if len(batch) >= batch_size:
+            yield batch  # Lock still held!
+
+# CORRECT - Snapshot approach:
+async with storage._storage_lock:
+    matching_items = [(k, v) for k, v in storage._data.items() if condition]
+# Lock released here
+for key, value in matching_items:
+    batch[key] = value
+    if len(batch) >= batch_size:
+        yield batch  # No lock held
+```
+**Impact**: Prevents deadlocks in Json→Json migrations and similar scenarios where source/target share locks
+**Applicable To**: Any async generator that needs to access shared resources while yielding
+
 ## Architecture Patterns

 ### 1. Dependency Injection
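Reviewer note on the `.clinerules/01-basic.md` hunk above: the snapshot pattern from rule 6 can be exercised end to end with a small sketch. `DemoStorage`, `stream_batches`, and `consume` are hypothetical names made up for illustration, not code from `migrate_llm_cache.py`; the sketch only assumes a dict guarded by an `asyncio.Lock`.

```python
import asyncio
from typing import AsyncIterator, Dict, List


class DemoStorage:
    """Hypothetical stand-in for a storage object guarded by an asyncio.Lock."""

    def __init__(self, data: Dict[str, int]) -> None:
        self._storage_lock = asyncio.Lock()
        self._data = dict(data)


async def stream_batches(
    storage: DemoStorage, batch_size: int
) -> AsyncIterator[Dict[str, int]]:
    # Take the snapshot while holding the lock, then release it before any yield.
    async with storage._storage_lock:
        snapshot = list(storage._data.items())

    batch: Dict[str, int] = {}
    for key, value in snapshot:
        batch[key] = value
        if len(batch) >= batch_size:
            yield batch
            batch = {}  # start a fresh batch instead of mutating the yielded one
    if batch:
        yield batch  # flush the final partial batch


async def consume(storage: DemoStorage, batch_size: int = 2) -> List[Dict[str, int]]:
    seen: List[Dict[str, int]] = []
    async for batch in stream_batches(storage, batch_size):
        # The consumer can acquire the same lock because the generator released it.
        async with storage._storage_lock:
            seen.append(batch)
    return seen


async def main() -> List[Dict[str, int]]:
    storage = DemoStorage({"a": 1, "b": 2, "c": 3})
    return await consume(storage)


def run_demo() -> List[Dict[str, int]]:
    return asyncio.run(main())
```

If the generator kept the lock across `yield`, the `async with` inside `consume` would wait forever, because `asyncio.Lock` is not reentrant.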
--- a/README-zh.md
+++ b/README-zh.md
@ -90,6 +90,11 @@

 ## 安装

+> **💡 使用 uv 进行包管理**: 本项目使用 [uv](https://docs.astral.sh/uv/) 进行快速可靠的 Python 包管理。
+> 首先安装 uv: `curl -LsSf https://astral.sh/uv/install.sh | sh` (Unix/macOS) 或 `powershell -c "irm https://astral.sh/uv/install.ps1 | iex"` (Windows)
+>
+> **注意**: 如果您更喜欢使用 pip 也可以，但我们推荐使用 uv 以获得更好的性能和更可靠的依赖管理。
+
 ### 安装LightRAG服务器

 LightRAG服务器旨在提供Web UI和API支持。Web UI便于文档索引、知识图谱探索和简单的RAG查询界面。LightRAG服务器还提供兼容Ollama的接口，旨在将LightRAG模拟为Ollama聊天模型。这使得AI聊天机器人（如Open WebUI）可以轻松访问LightRAG。
@ -97,8 +102,13 @@ LightRAG服务器旨在提供Web UI和API支持。Web UI便于文档索引、知
 * 从PyPI安装

 ```bash
-pip install "lightrag-hku[api]"
-cp env.example .env
+# 使用 uv (推荐)
+uv pip install "lightrag-hku[api]"
+# 或使用 pip
+# pip install "lightrag-hku[api]"
+
+cp env.example .env  # 使用你的LLM和Embedding模型访问参数更新.env文件
+
 lightrag-server
 ```

@ -107,9 +117,17 @@ lightrag-server
 ```bash
 git clone https://github.com/HKUDS/LightRAG.git
 cd LightRAG
-# 如有必要，创建Python虚拟环境
-# 以可开发（编辑）模式安装LightRAG服务器
-pip install -e ".[api]"
+
+# 使用 uv (推荐)
+# 注意: uv sync 会自动在 .venv/ 目录创建虚拟环境
+uv sync --extra api
+source .venv/bin/activate  # 激活虚拟环境 (Linux/macOS)
+# Windows 系统: .venv\Scripts\activate
+
+# 或使用 pip 和虚拟环境
+# python -m venv .venv
+# source .venv/bin/activate  # Windows: .venv\Scripts\activate
+# pip install -e ".[api]"

 cp env.example .env  # 使用你的LLM和Embedding模型访问参数更新.env文件

@ -140,13 +158,19 @@ docker compose up

 ```bash
 cd LightRAG
-pip install -e .
+# 注意: uv sync 会自动在 .venv/ 目录创建虚拟环境
+uv sync
+source .venv/bin/activate  # 激活虚拟环境 (Linux/macOS)
+# Windows 系统: .venv\Scripts\activate
+
+# 或: pip install -e .
 ```

 * 从PyPI安装

 ```bash
-pip install lightrag-hku
+uv pip install lightrag-hku
+# 或: pip install lightrag-hku
 ```

 ## 快速开始
@ -868,7 +892,7 @@ rag = LightRAG(
 对于生产级场景，您很可能想要利用企业级解决方案。PostgreSQL可以为您提供一站式存储解决方案，作为KV存储、向量数据库（pgvector）和图数据库（apache AGE）。支持 PostgreSQL 版本为16.6或以上。

 * 如果您是初学者并想避免麻烦，推荐使用docker，请从这个镜像开始（请务必阅读概述）：https://hub.docker.com/r/shangor/postgres-for-rag
-* Apache AGE的性能不如Neo4j。最求高性能的图数据库请使用Noe4j。
+* Apache AGE的性能不如Neo4j。追求高性能的图数据库请使用Neo4j。

 </details>

--- a/README.md
+++ b/README.md
@ -88,6 +88,11 @@

 ## Installation

+> **💡 Using uv for Package Management**: This project uses [uv](https://docs.astral.sh/uv/) for fast and reliable Python package management.
+> Install uv first: `curl -LsSf https://astral.sh/uv/install.sh | sh` (Unix/macOS) or `powershell -c "irm https://astral.sh/uv/install.ps1 | iex"` (Windows)
+>
+> **Note**: You can also use pip if you prefer, but uv is recommended for better performance and more reliable dependency management.
+>
 > **📦 Offline Deployment**: For offline or air-gapped environments, see the [Offline Deployment Guide](./docs/OfflineDeployment.md) for instructions on pre-installing all dependencies and cache files.

 ### Install LightRAG Server
@ -97,8 +102,13 @@ The LightRAG Server is designed to provide Web UI and API support. The Web UI fa
 * Install from PyPI

 ```bash
-pip install "lightrag-hku[api]"
-cp env.example .env
+# Using uv (recommended)
+uv pip install "lightrag-hku[api]"
+# Or using pip
+# pip install "lightrag-hku[api]"
+
+cp env.example .env  # Update the .env with your LLM and embedding configurations
+
 lightrag-server
 ```

@ -107,9 +117,17 @@ lightrag-server
 ```bash
 git clone https://github.com/HKUDS/LightRAG.git
 cd LightRAG
-# Create a Python virtual enviroment if neccesary
-# Install in editable mode with API support
-pip install -e ".[api]"
+
+# Using uv (recommended)
+# Note: uv sync automatically creates a virtual environment in .venv/
+uv sync --extra api
+source .venv/bin/activate  # Activate the virtual environment (Linux/macOS)
+# Or on Windows: .venv\Scripts\activate
+
+# Or using pip with virtual environment
+# python -m venv .venv
+# source .venv/bin/activate  # Windows: .venv\Scripts\activate
+# pip install -e ".[api]"

 cp env.example .env  # Update the .env with your LLM and embedding configurations

@ -136,17 +154,23 @@ docker compose up

 ### Install  LightRAG Core

-* Install from source (Recommend)
+* Install from source (Recommended)

 ```bash
 cd LightRAG
-pip install -e .
+# Note: uv sync automatically creates a virtual environment in .venv/
+uv sync
+source .venv/bin/activate  # Activate the virtual environment (Linux/macOS)
+# Or on Windows: .venv\Scripts\activate
+
+# Or: pip install -e .
 ```

 * Install from PyPI

 ```bash
-pip install lightrag-hku
+uv pip install lightrag-hku
+# Or: pip install lightrag-hku
 ```

 ## Quick Start
--- a/docs/OfflineDeployment.md
+++ b/docs/OfflineDeployment.md
@ -23,10 +23,11 @@ LightRAG uses dynamic package installation (`pipmaster`) for optional features b

 LightRAG dynamically installs packages for:

-- **Document Processing**: `docling`, `pypdf2`, `python-docx`, `python-pptx`, `openpyxl`
 - **Storage Backends**: `redis`, `neo4j`, `pymilvus`, `pymongo`, `asyncpg`, `qdrant-client`
 - **LLM Providers**: `openai`, `anthropic`, `ollama`, `zhipuai`, `aioboto3`, `voyageai`, `llama-index`, `lmdeploy`, `transformers`, `torch`
-- **Tiktoken Models**: BPE encoding models downloaded from OpenAI CDN
+- **Tiktoken Models**: BPE encoding models downloaded from OpenAI CDN
+
+**Note**: Document processing dependencies (`pypdf`, `python-docx`, `python-pptx`, `openpyxl`) are now pre-installed with the `api` extras group and no longer require dynamic installation.

 ## Quick Start

@ -75,32 +76,31 @@ LightRAG provides flexible dependency groups for different use cases:

 | Group | Description | Use Case |
 |-------|-------------|----------|
-| `offline-docs` | Document processing | PDF, DOCX, PPTX, XLSX files |
+| `api` | API server + document processing | FastAPI server with PDF, DOCX, PPTX, XLSX support |
 | `offline-storage` | Storage backends | Redis, Neo4j, MongoDB, PostgreSQL, etc. |
 | `offline-llm` | LLM providers | OpenAI, Anthropic, Ollama, etc. |
-| `offline` | All of the above | Complete offline deployment |
+| `offline` | Complete offline package | API + Storage + LLM (all features) |
+
+**Note**: Document processing (PDF, DOCX, PPTX, XLSX) is included in the `api` extras group. The previous `offline-docs` group has been merged into `api` for better integration.

 > Software packages requiring `transformers`, `torch`, or `cuda` will not be included in the offline dependency group.

 ### Installation Examples

 ```bash
-# Install only document processing dependencies
-pip install lightrag-hku[offline-docs]
+# Install API with document processing
+pip install lightrag-hku[api]

-# Install document processing and storage backends
-pip install lightrag-hku[offline-docs,offline-storage]
+# Install API and storage backends
+pip install lightrag-hku[api,offline-storage]

-# Install all offline dependencies
+# Install all offline dependencies (recommended for offline deployment)
 pip install lightrag-hku[offline]
 ```

 ### Using Individual Requirements Files

 ```bash
-# Document processing only
-pip install -r requirements-offline-docs.txt
-
 # Storage backends only
 pip install -r requirements-offline-storage.txt

@ -244,8 +244,8 @@ ls -la ~/.tiktoken_cache/
 **Solution**:
 ```bash
 # Pre-install the specific package you need
-# For document processing:
-pip install lightrag-hku[offline-docs]
+# For API with document processing:
+pip install lightrag-hku[api]

 # For storage backends:
 pip install lightrag-hku[offline-storage]
@ -297,9 +297,9 @@ mkdir -p ~/my_tiktoken_cache

 5. **Minimal Installation**: Only install what you need:
   ```bash
-   # If you only process PDFs with OpenAI
-   pip install lightrag-hku[offline-docs]
-   # Then manually add: pip install openai
+   # If you only need API with document processing
+   pip install lightrag-hku[api]
+   # Then manually add specific LLM: pip install openai
   ```

 ## Additional Resources
--- a/env.example
+++ b/env.example
@ -29,7 +29,7 @@ WEBUI_DESCRIPTION="Simple and Fast Graph Based RAG System"
 # OLLAMA_EMULATING_MODEL_NAME=lightrag
 OLLAMA_EMULATING_MODEL_TAG=latest

-### Max nodes return from graph retrieval in webui
+### Max nodes for graph retrieval (also update the WebUI local setting, which is capped at this value)
 # MAX_GRAPH_NODES=1000

 ### Logging level
@ -172,6 +172,8 @@ MAX_PARALLEL_INSERT=2
 ### LLM Configuration
 ### LLM_BINDING type: openai, ollama, lollms, azure_openai, aws_bedrock, gemini
 ### LLM_BINDING_HOST: host only for Ollama, endpoint for other LLM service
+### If LightRAG is deployed in Docker:
+###    use host.docker.internal instead of localhost in LLM_BINDING_HOST
 ###########################################################################
 ### LLM request timeout setting for all LLMs (0 means no timeout for Ollama)
 # LLM_TIMEOUT=180
@ -181,7 +183,7 @@ LLM_MODEL=gpt-4o
 LLM_BINDING_HOST=https://api.openai.com/v1
 LLM_BINDING_API_KEY=your_api_key

-### Optional for Azure
+### Env vars for Azure OpenAI
 # AZURE_OPENAI_API_VERSION=2024-08-01-preview
 # AZURE_OPENAI_DEPLOYMENT=gpt-4o

@ -196,22 +198,16 @@ LLM_BINDING_API_KEY=your_api_key
 # LLM_MODEL=gemini-flash-latest
 # LLM_BINDING_API_KEY=your_gemini_api_key
 # LLM_BINDING_HOST=https://generativelanguage.googleapis.com
-GEMINI_LLM_THINKING_CONFIG='{"thinking_budget": 0, "include_thoughts": false}'
+
+### Use the following command to see all supported options for Gemini
+### lightrag-server --llm-binding gemini --help
+### Gemini Specific Parameters
 # GEMINI_LLM_MAX_OUTPUT_TOKENS=9000
 # GEMINI_LLM_TEMPERATURE=0.7
-
-### OpenAI Compatible API Specific Parameters
-### Increased temperature values may mitigate infinite inference loops in certain LLM, such as Qwen3-30B.
-# OPENAI_LLM_TEMPERATURE=0.9
-### Set the max_tokens to mitigate endless output of some LLM (less than LLM_TIMEOUT * llm_output_tokens/second, i.e. 9000 = 180s * 50 tokens/s)
-### Typically, max_tokens does not include prompt content, though some models, such as Gemini Models, are exceptions
-### For vLLM/SGLang deployed models, or most of OpenAI compatible API provider
-# OPENAI_LLM_MAX_TOKENS=9000
-### For OpenAI o1-mini or newer modles
-OPENAI_LLM_MAX_COMPLETION_TOKENS=9000
-
-#### OpenAI's new API utilizes max_completion_tokens instead of max_tokens
-# OPENAI_LLM_MAX_COMPLETION_TOKENS=9000
+### Enable Thinking
+# GEMINI_LLM_THINKING_CONFIG='{"thinking_budget": -1, "include_thoughts": true}'
+### Disable Thinking
+# GEMINI_LLM_THINKING_CONFIG='{"thinking_budget": 0, "include_thoughts": false}'

 ### use the following command to see all support options for OpenAI, azure_openai or OpenRouter
 ### lightrag-server --llm-binding openai --help
@ -222,8 +218,17 @@ OPENAI_LLM_MAX_COMPLETION_TOKENS=9000
 ### Qwen3 Specific Parameters deploy by vLLM
 # OPENAI_LLM_EXTRA_BODY='{"chat_template_kwargs": {"enable_thinking": false}}'

+### OpenAI Compatible API Specific Parameters
+### Increased temperature values may mitigate infinite inference loops in certain LLM, such as Qwen3-30B.
+# OPENAI_LLM_TEMPERATURE=0.9
+### Set the max_tokens to mitigate endless output of some LLM (less than LLM_TIMEOUT * llm_output_tokens/second, i.e. 9000 = 180s * 50 tokens/s)
+### Typically, max_tokens does not include prompt content
+### For vLLM/SGLang deployed models, or most of OpenAI compatible API provider
+# OPENAI_LLM_MAX_TOKENS=9000
+### For OpenAI o1-mini or newer models, which use max_completion_tokens instead of max_tokens
+OPENAI_LLM_MAX_COMPLETION_TOKENS=9000
+
 ### use the following command to see all support options for Ollama LLM
-### If LightRAG deployed in Docker uses host.docker.internal instead of localhost in LLM_BINDING_HOST
 ### lightrag-server --llm-binding ollama --help
 ### Ollama Server Specific Parameters
 ### OLLAMA_LLM_NUM_CTX must be provided, and should be larger than MAX_TOTAL_TOKENS + 2000
@ -240,8 +245,18 @@ OLLAMA_LLM_NUM_CTX=32768
 ### Embedding Configuration (Should not be changed after the first file processed)
 ### EMBEDDING_BINDING: ollama, openai, azure_openai, jina, lollms, aws_bedrock
 ### EMBEDDING_BINDING_HOST: host only for Ollama, endpoint for other Embedding service
+### If LightRAG is deployed in Docker:
+###    use host.docker.internal instead of localhost in EMBEDDING_BINDING_HOST
 #######################################################################################
 # EMBEDDING_TIMEOUT=30
+
+### Control whether to send embedding_dim parameter to embedding API
+### IMPORTANT: Jina and Gemini ALWAYS send the dimension parameter (API requirement) - this setting is ignored for these bindings
+### For OpenAI: Set to 'true' to enable dynamic dimension adjustment
+### For OpenAI: Set to 'false' (default) to disable sending dimension parameter
+### Note: Automatically ignored for backends that don't support dimension parameter (e.g., Ollama)
+# EMBEDDING_SEND_DIM=false
+
 EMBEDDING_BINDING=ollama
 EMBEDDING_MODEL=bge-m3:latest
 EMBEDDING_DIM=1024
--- a/lightrag/init.py
+++ b/lightrag/init.py
@ -1,5 +1,5 @@
 from .lightrag import LightRAG as LightRAG, QueryParam as QueryParam

-__version__ = "1.4.9.8"
+__version__ = "1.4.9.9"
 __author__ = "Zirui Guo"
 __url__ = "https://github.com/HKUDS/LightRAG"
--- a/lightrag/api/README-zh.md
+++ b/lightrag/api/README-zh.md
@ -15,26 +15,34 @@ LightRAG 服务器旨在提供 Web 界面和 API 支持。Web 界面便于文档
 * 从 PyPI 安装

 ```bash
-pip install "lightrag-hku[api]"
+# 使用 uv (推荐)
+uv pip install "lightrag-hku[api]"
+
+# 或使用 pip
+# pip install "lightrag-hku[api]"
 ```

 * 从源代码安装

 ```bash
-# Clone the repository
+# 克隆仓库
 git clone https://github.com/HKUDS/lightrag.git

-# Change to the repository directory
+# 进入仓库目录
 cd lightrag

-# Create a Python virtual environment
-uv venv --seed --python 3.12
-source .venv/bin/activate
+# 使用 uv (推荐)
+# 注意: uv sync 会自动在 .venv/ 目录创建虚拟环境
+uv sync --extra api
+source .venv/bin/activate  # 激活虚拟环境 (Linux/macOS)
+# Windows 系统: .venv\Scripts\activate

-# Install in editable mode with API support
-pip install -e ".[api]"
+# 或使用 pip 与虚拟环境
+# python -m venv .venv
+# source .venv/bin/activate  # Windows: .venv\Scripts\activate
+# pip install -e ".[api]"

-# Build front-end artifacts
+# 构建前端代码
 cd lightrag_webui
 bun install --frozen-lockfile
 bun run build
@ -400,6 +408,10 @@ LIGHTRAG_DOC_STATUS_STORAGE=PGDocStatusStorage

 在向 LightRAG 添加文档后，您不能更改存储实现选择。目前尚不支持从一个存储实现迁移到另一个存储实现。更多配置信息请阅读示例 `env.example` 文件。

+### 在不同存储类型之间迁移LLM缓存
+
+当LightRAG更换存储实现方式时，可以将LLM缓存从旧的存储迁移到新的存储。之后在新的存储上重新上传文件时，将利用原有存储的LLM缓存大幅度加快文件处理的速度。LLM缓存迁移工具的使用方法请参考[README_MIGRATE_LLM_CACHE.md](../tools/README_MIGRATE_LLM_CACHE.md)。
+
 ### LightRag API 服务器命令行选项

 | 参数 | 默认值 | 描述 |
--- a/lightrag/api/README.md
+++ b/lightrag/api/README.md
@ -15,7 +15,11 @@ The LightRAG Server is designed to provide a Web UI and API support. The Web UI
 * Install from PyPI

 ```bash
-pip install "lightrag-hku[api]"
+# Using uv (recommended)
+uv pip install "lightrag-hku[api]"
+
+# Or using pip
+# pip install "lightrag-hku[api]"
 ```

 * Installation from Source
@ -27,12 +31,16 @@ git clone https://github.com/HKUDS/lightrag.git
 # Change to the repository directory
 cd lightrag

-# Create a Python virtual environment
-uv venv --seed --python 3.12
-source .venv/bin/activate
+# Using uv (recommended)
+# Note: uv sync automatically creates a virtual environment in .venv/
+uv sync --extra api
+source .venv/bin/activate  # Activate the virtual environment (Linux/macOS)
+# Or on Windows: .venv\Scripts\activate

-# Install in editable mode with API support
-pip install -e ".[api]"
+# Or using pip with virtual environment
+# python -m venv .venv
+# source .venv/bin/activate  # Windows: .venv\Scripts\activate
+# pip install -e ".[api]"

 # Build front-end artifacts
 cd lightrag_webui
@ -410,6 +418,10 @@ LIGHTRAG_DOC_STATUS_STORAGE=PGDocStatusStorage

 You cannot change storage implementation selection after adding documents to LightRAG. Data migration from one storage implementation to another is not supported yet. For further information, please read the sample env file or config.ini file.

+### LLM Cache Migration Between Storage Types
+
+When switching the storage implementation in LightRAG, the LLM cache can be migrated from the existing storage to the new one. When files are then re-uploaded to the new storage, the migrated LLM cache significantly accelerates file processing. For detailed instructions on using the LLM cache migration tool, please refer to [README_MIGRATE_LLM_CACHE.md](../tools/README_MIGRATE_LLM_CACHE.md).
+
 ### LightRAG API Server Command Line Options

 | Parameter             | Default       | Description                                                                                                                     |
--- a/lightrag/api/init.py
+++ b/lightrag/api/init.py
@ -1 +1 @@
-__api_version__ = "0251"
+__api_version__ = "0254"
--- a/lightrag/api/config.py
+++ b/lightrag/api/config.py
@ -8,6 +8,7 @@ import logging
 from dotenv import load_dotenv
 from lightrag.utils import get_env_value
 from lightrag.llm.binding_options import (
+    GeminiEmbeddingOptions,
    GeminiLLMOptions,
    OllamaEmbeddingOptions,
    OllamaLLMOptions,
@ -238,7 +239,15 @@ def parse_args() -> argparse.Namespace:
        "--embedding-binding",
        type=str,
        default=get_env_value("EMBEDDING_BINDING", "ollama"),
-        choices=["lollms", "ollama", "openai", "azure_openai", "aws_bedrock", "jina"],
+        choices=[
+            "lollms",
+            "ollama",
+            "openai",
+            "azure_openai",
+            "aws_bedrock",
+            "jina",
+            "gemini",
+        ],
        help="Embedding binding type (default: from env or ollama)",
    )
    parser.add_argument(
@ -249,6 +258,14 @@ def parse_args() -> argparse.Namespace:
        help=f"Rerank binding type (default: from env or {DEFAULT_RERANK_BINDING})",
    )

+    # Document loading engine configuration
+    parser.add_argument(
+        "--docling",
+        action="store_true",
+        default=False,
+        help="Enable DOCLING document loading engine (default: from env or DEFAULT)",
+    )
+
    # Conditionally add binding options defined in binding_options module
    # This will add command line arguments for all binding options (e.g., --ollama-embedding-num_ctx)
    # and corresponding environment variables (e.g., OLLAMA_EMBEDDING_NUM_CTX)
@ -265,12 +282,19 @@ def parse_args() -> argparse.Namespace:
    if "--embedding-binding" in sys.argv:
        try:
            idx = sys.argv.index("--embedding-binding")
-            if idx + 1 < len(sys.argv) and sys.argv[idx + 1] == "ollama":
-                OllamaEmbeddingOptions.add_args(parser)
+            if idx + 1 < len(sys.argv):
+                if sys.argv[idx + 1] == "ollama":
+                    OllamaEmbeddingOptions.add_args(parser)
+                elif sys.argv[idx + 1] == "gemini":
+                    GeminiEmbeddingOptions.add_args(parser)
        except IndexError:
            pass
-    elif os.environ.get("EMBEDDING_BINDING") == "ollama":
-        OllamaEmbeddingOptions.add_args(parser)
+    else:
+        env_embedding_binding = os.environ.get("EMBEDDING_BINDING")
+        if env_embedding_binding == "ollama":
+            OllamaEmbeddingOptions.add_args(parser)
+        elif env_embedding_binding == "gemini":
+            GeminiEmbeddingOptions.add_args(parser)

    # Add OpenAI LLM options when llm-binding is openai or azure_openai
    if "--llm-binding" in sys.argv:
@ -343,6 +367,7 @@ def parse_args() -> argparse.Namespace:
    args.llm_model = get_env_value("LLM_MODEL", "mistral-nemo:latest")
    args.embedding_model = get_env_value("EMBEDDING_MODEL", "bge-m3:latest")
    args.embedding_dim = get_env_value("EMBEDDING_DIM", 1024, int)
+    args.embedding_send_dim = get_env_value("EMBEDDING_SEND_DIM", False, bool)

    # Inject chunk configuration
    args.chunk_size = get_env_value("CHUNK_SIZE", 1200, int)
@ -354,8 +379,13 @@ def parse_args() -> argparse.Namespace:
    )
    args.enable_llm_cache = get_env_value("ENABLE_LLM_CACHE", True, bool)

-    # Select Document loading tool (DOCLING, DEFAULT)
-    args.document_loading_engine = get_env_value("DOCUMENT_LOADING_ENGINE", "DEFAULT")
+    # Set document_loading_engine from --docling flag
+    if args.docling:
+        args.document_loading_engine = "DOCLING"
+    else:
+        args.document_loading_engine = get_env_value(
+            "DOCUMENT_LOADING_ENGINE", "DEFAULT"
+        )

    # PDF decryption password
    args.pdf_decrypt_password = get_env_value("PDF_DECRYPT_PASSWORD", None)
@ -432,4 +462,83 @@ def update_uvicorn_mode_config():
        )


-global_args = parse_args()
+# Global configuration with lazy initialization
+_global_args = None
+_initialized = False
+
+
+def initialize_config(args=None, force=False):
+    """Initialize global configuration
+
+    This function allows explicit initialization of the configuration,
+    which is useful for programmatic usage, testing, or embedding LightRAG
+    in other applications.
+
+    Args:
+        args: Pre-parsed argparse.Namespace or None to parse from sys.argv
+        force: Force re-initialization even if already initialized
+
+    Returns:
+        argparse.Namespace: The configured arguments
+
+    Example:
+        # Use parsed command line arguments (default)
+        initialize_config()
+
+        # Use custom configuration programmatically
+        custom_args = argparse.Namespace(
+            host='localhost',
+            port=8080,
+            working_dir='./custom_rag',
+            # ... other config
+        )
+        initialize_config(custom_args)
+    """
+    global _global_args, _initialized
+
+    if _initialized and not force:
+        return _global_args
+
+    _global_args = args if args is not None else parse_args()
+    _initialized = True
+    return _global_args
+
+
+def get_config():
+    """Get global configuration, auto-initializing if needed
+
+    Returns:
+        argparse.Namespace: The configured arguments
+    """
+    if not _initialized:
+        initialize_config()
+    return _global_args
+
+
+class _GlobalArgsProxy:
+    """Proxy object that auto-initializes configuration on first access
+
+    This maintains backward compatibility with existing code while
+    allowing programmatic control over initialization timing.
+    """
+
+    def __getattr__(self, name):
+        if not _initialized:
+            initialize_config()
+        return getattr(_global_args, name)
+
+    def __setattr__(self, name, value):
+        if not _initialized:
+            initialize_config()
+        setattr(_global_args, name, value)
+
+    def __repr__(self):
+        if not _initialized:
+            return "<GlobalArgsProxy: Not initialized>"
+        return repr(_global_args)
+
+
+# Create proxy instance for backward compatibility
+# Existing code like `from config import global_args` continues to work
+# The proxy will auto-initialize on first attribute access
+global_args = _GlobalArgsProxy()
--- a/lightrag/api/lightrag_server.py
+++ b/lightrag/api/lightrag_server.py
@ -15,7 +15,6 @@ import logging.config
 import sys
 import uvicorn
 import pipmaster as pm
-import inspect
 from fastapi.staticfiles import StaticFiles
 from fastapi.responses import RedirectResponse
 from pathlib import Path
@ -90,6 +89,7 @@ class LLMConfigCache:
        # Initialize configurations based on binding conditions
        self.openai_llm_options = None
        self.gemini_llm_options = None
+        self.gemini_embedding_options = None
        self.ollama_llm_options = None
        self.ollama_embedding_options = None

@ -136,6 +136,23 @@ class LLMConfigCache:
                )
                self.ollama_embedding_options = {}

+        # Only initialize and log Gemini Embedding options when using Gemini Embedding binding
+        if args.embedding_binding == "gemini":
+            try:
+                from lightrag.llm.binding_options import GeminiEmbeddingOptions
+
+                self.gemini_embedding_options = GeminiEmbeddingOptions.options_dict(
+                    args
+                )
+                logger.info(
+                    f"Gemini Embedding Options: {self.gemini_embedding_options}"
+                )
+            except ImportError:
+                logger.warning(
+                    "GeminiEmbeddingOptions not available, using default configuration"
+                )
+                self.gemini_embedding_options = {}
+

 def check_frontend_build():
    """Check if frontend is built and optionally check if source is up-to-date
@ -297,6 +314,7 @@ def create_app(args):
        "azure_openai",
        "aws_bedrock",
        "jina",
+        "gemini",
    ]:
        raise Exception("embedding binding not supported")

@ -599,14 +617,14 @@ def create_app(args):
        return {}

    def create_optimized_embedding_function(
-        config_cache: LLMConfigCache, binding, model, host, api_key, dimensions, args
+        config_cache: LLMConfigCache, binding, model, host, api_key, args
    ):
        """
        Create optimized embedding function with pre-processed configuration for applicable bindings.
        Uses lazy imports for all bindings and avoids repeated configuration parsing.
        """

-        async def optimized_embedding_function(texts):
+        async def optimized_embedding_function(texts, embedding_dim=None):
            try:
                if binding == "lollms":
                    from lightrag.llm.lollms import lollms_embed
@ -645,13 +663,40 @@ def create_app(args):
                    from lightrag.llm.jina import jina_embed

                    return await jina_embed(
-                        texts, dimensions=dimensions, base_url=host, api_key=api_key
+                        texts,
+                        embedding_dim=embedding_dim,
+                        base_url=host,
+                        api_key=api_key,
+                    )
+                elif binding == "gemini":
+                    from lightrag.llm.gemini import gemini_embed
+
+                    # Use pre-processed configuration if available, otherwise fallback to dynamic parsing
+                    if config_cache.gemini_embedding_options is not None:
+                        gemini_options = config_cache.gemini_embedding_options
+                    else:
+                        # Fallback for cases where config cache wasn't initialized properly
+                        from lightrag.llm.binding_options import GeminiEmbeddingOptions
+
+                        gemini_options = GeminiEmbeddingOptions.options_dict(args)
+
+                    return await gemini_embed(
+                        texts,
+                        model=model,
+                        base_url=host,
+                        api_key=api_key,
+                        embedding_dim=embedding_dim,
+                        task_type=gemini_options.get("task_type", "RETRIEVAL_DOCUMENT"),
                    )
                else:  # openai and compatible
                    from lightrag.llm.openai import openai_embed

                    return await openai_embed(
-                        texts, model=model, base_url=host, api_key=api_key
+                        texts,
+                        model=model,
+                        base_url=host,
+                        api_key=api_key,
+                        embedding_dim=embedding_dim,
                    )
            except ImportError as e:
                raise Exception(f"Failed to import {binding} embedding: {e}")
@ -691,17 +736,52 @@ def create_app(args):
        )

    # Create embedding function with optimized configuration
+    import inspect
+
+    # Create the optimized embedding function
+    optimized_embedding_func = create_optimized_embedding_function(
+        config_cache=config_cache,
+        binding=args.embedding_binding,
+        model=args.embedding_model,
+        host=args.embedding_binding_host,
+        api_key=args.embedding_binding_api_key,
+        args=args,  # Pass args object for fallback option generation
+    )
+
+    # Get embedding_send_dim from centralized configuration
+    embedding_send_dim = args.embedding_send_dim
+
+    # Check if the function signature has embedding_dim parameter
+    # Note: Since optimized_embedding_func is an async function, inspect its signature
+    sig = inspect.signature(optimized_embedding_func)
+    has_embedding_dim_param = "embedding_dim" in sig.parameters
+
+    # Determine send_dimensions value based on binding type
+    # Jina and Gemini REQUIRE dimension parameter (forced to True)
+    # OpenAI and others: controlled by EMBEDDING_SEND_DIM environment variable
+    if args.embedding_binding in ["jina", "gemini"]:
+        # Jina and Gemini APIs require dimension parameter - always send it
+        send_dimensions = has_embedding_dim_param
+        dimension_control = f"forced by {args.embedding_binding.title()} API"
+    else:
+        # For OpenAI and other bindings, respect EMBEDDING_SEND_DIM setting
+        send_dimensions = embedding_send_dim and has_embedding_dim_param
+        if send_dimensions or not embedding_send_dim:
+            dimension_control = "by env var"
+        else:
+            dimension_control = "by not hasparam"
+
+    logger.info(
+        f"Send embedding dimension: {send_dimensions} {dimension_control} "
+        f"(dimensions={args.embedding_dim}, has_param={has_embedding_dim_param}, "
+        f"binding={args.embedding_binding})"
+    )
+
+    # Create EmbeddingFunc with send_dimensions attribute
    embedding_func = EmbeddingFunc(
        embedding_dim=args.embedding_dim,
-        func=create_optimized_embedding_function(
-            config_cache=config_cache,
-            binding=args.embedding_binding,
-            model=args.embedding_model,
-            host=args.embedding_binding_host,
-            api_key=args.embedding_binding_api_key,
-            dimensions=args.embedding_dim,
-            args=args,  # Pass args object for fallback option generation
-        ),
+        func=optimized_embedding_func,
+        send_dimensions=send_dimensions,
    )

    # Configure rerank function based on args.rerank_binding parameter
@ -1134,6 +1214,12 @@ def check_and_install_dependencies():


 def main():
+    # Explicitly initialize configuration for clarity
+    # (The proxy will auto-initialize anyway, but this makes intent clear)
+    from .config import initialize_config
+
+    initialize_config()
+
    # Check if running under Gunicorn
    if "GUNICORN_CMD_ARGS" in os.environ:
        # If started with Gunicorn, return directly as Gunicorn will call get_application
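
Note on the `send_dimensions` hunk above: the branch logic reduces to a small truth table. The following is a minimal sketch of that decision as a pure function. The name `resolve_send_dimensions` and its parameters are hypothetical and not part of the commit; the function only mirrors the conditions visible in the diff.

```python
def resolve_send_dimensions(binding: str, env_send_dim: bool, has_dim_param: bool) -> bool:
    """Decide whether the dimension is passed to the embedding call.

    Hypothetical helper that mirrors the branches shown in the diff.
    """
    if binding in ("jina", "gemini"):
        # Always sent for these bindings, provided the function accepts the parameter.
        return has_dim_param
    # Other bindings: the env setting must be on AND the function must accept the parameter.
    return env_send_dim and has_dim_param


if __name__ == "__main__":
    # Print the full truth table for one forced binding and one env-controlled binding.
    for binding in ("jina", "openai"):
        for env in (True, False):
            for has in (True, False):
                print(binding, env, has, "->", resolve_send_dimensions(binding, env, has))
```

Keeping the decision in one pure function would make the three cases (forced, enabled by environment, unavailable) easy to unit test.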
--- a/lightrag/api/routers/document_routes.py
+++ b/lightrag/api/routers/document_routes.py
@ -3,14 +3,15 @@ This module contains all document-related routes for the LightRAG API.
 """

 import asyncio
+from functools import lru_cache
 from lightrag.utils import logger, get_pinyin_sort_key
 import aiofiles
 import shutil
 import traceback
-import pipmaster as pm
 from datetime import datetime, timezone
 from pathlib import Path
 from typing import Dict, List, Optional, Any, Literal
+from io import BytesIO
 from fastapi import (
    APIRouter,
    BackgroundTasks,
@ -28,6 +29,24 @@ from lightrag.api.utils_api import get_combined_auth_dependency
 from ..config import global_args


+@lru_cache(maxsize=1)
+def _is_docling_available() -> bool:
+    """Check if docling is available (cached check).
+
+    This function uses lru_cache to avoid repeated import attempts.
+    The result is cached after the first call.
+
+    Returns:
+        bool: True if docling is available, False otherwise
+    """
+    try:
+        import docling  # noqa: F401  # type: ignore[import-not-found]
+
+        return True
+    except ImportError:
+        return False
+
+
 # Function to format datetime to ISO format string with timezone information
 def format_datetime(dt: Any) -> Optional[str]:
    """Format datetime to ISO format string with timezone information
@ -879,7 +898,6 @@ def get_unique_filename_in_enqueued(target_dir: Path, original_name: str) -> str
    Returns:
        str: Unique filename (may have numeric suffix added)
    """
-    from pathlib import Path
    import time

    original_path = Path(original_name)
@ -902,6 +920,122 @@ def get_unique_filename_in_enqueued(target_dir: Path, original_name: str) -> str
    return f"{base_name}_{timestamp}{extension}"


+# Document processing helper functions (synchronous)
+# These functions run in a thread pool via asyncio.to_thread() to avoid blocking the event loop
+
+
+def _convert_with_docling(file_path: Path) -> str:
+    """Convert document using docling (synchronous).
+
+    Args:
+        file_path: Path to the document file
+
+    Returns:
+        str: Extracted markdown content
+    """
+    from docling.document_converter import DocumentConverter  # type: ignore
+
+    converter = DocumentConverter()
+    result = converter.convert(file_path)
+    return result.document.export_to_markdown()
+
+
+def _extract_pdf_pypdf(file_bytes: bytes, password: Optional[str] = None) -> str:
+    """Extract PDF content using pypdf (synchronous).
+
+    Args:
+        file_bytes: PDF file content as bytes
+        password: Optional password for encrypted PDFs
+
+    Returns:
+        str: Extracted text content
+
+    Raises:
+        Exception: If PDF is encrypted and password is incorrect or missing
+    """
+    from pypdf import PdfReader  # type: ignore
+
+    pdf_file = BytesIO(file_bytes)
+    reader = PdfReader(pdf_file)
+
+    # Check if PDF is encrypted
+    if reader.is_encrypted:
+        if not password:
+            raise Exception("PDF is encrypted but no password provided")
+
+        decrypt_result = reader.decrypt(password)
+        if decrypt_result == 0:
+            raise Exception("Incorrect PDF password")
+
+    # Extract text from all pages
+    content = ""
+    for page in reader.pages:
+        content += page.extract_text() + "\n"
+
+    return content
+
+
+def _extract_docx(file_bytes: bytes) -> str:
+    """Extract DOCX content (synchronous).
+
+    Args:
+        file_bytes: DOCX file content as bytes
+
+    Returns:
+        str: Extracted text content
+    """
+    from docx import Document  # type: ignore
+
+    docx_file = BytesIO(file_bytes)
+    doc = Document(docx_file)
+    return "\n".join([paragraph.text for paragraph in doc.paragraphs])
+
+
+def _extract_pptx(file_bytes: bytes) -> str:
+    """Extract PPTX content (synchronous).
+
+    Args:
+        file_bytes: PPTX file content as bytes
+
+    Returns:
+        str: Extracted text content
+    """
+    from pptx import Presentation  # type: ignore
+
+    pptx_file = BytesIO(file_bytes)
+    prs = Presentation(pptx_file)
+    content = ""
+    for slide in prs.slides:
+        for shape in slide.shapes:
+            if hasattr(shape, "text"):
+                content += shape.text + "\n"
+    return content
+
+
+def _extract_xlsx(file_bytes: bytes) -> str:
+    """Extract XLSX content (synchronous).
+
+    Args:
+        file_bytes: XLSX file content as bytes
+
+    Returns:
+        str: Extracted text content
+    """
+    from openpyxl import load_workbook  # type: ignore
+
+    xlsx_file = BytesIO(file_bytes)
+    wb = load_workbook(xlsx_file)
+    content = ""
+    for sheet in wb:
+        content += f"Sheet: {sheet.title}\n"
+        for row in sheet.iter_rows(values_only=True):
+            content += (
+                "\t".join(str(cell) if cell is not None else "" for cell in row) + "\n"
+            )
+        content += "\n"
+    return content
+
+
 async def pipeline_enqueue_file(
    rag: LightRAG, file_path: Path, track_id: str = None
 ) -> tuple[bool, str]:
@ -1072,87 +1206,28 @@ async def pipeline_enqueue_file(

                case ".pdf":
                    try:
-                        if global_args.document_loading_engine == "DOCLING":
-                            if not pm.is_installed("docling"):  # type: ignore
-                                pm.install("docling")
-                            from docling.document_converter import DocumentConverter  # type: ignore
-
-                            converter = DocumentConverter()
-                            result = converter.convert(file_path)
-                            content = result.document.export_to_markdown()
+                        # Try DOCLING first if configured and available
+                        if (
+                            global_args.document_loading_engine == "DOCLING"
+                            and _is_docling_available()
+                        ):
+                            content = await asyncio.to_thread(
+                                _convert_with_docling, file_path
+                            )
                        else:
-                            if not pm.is_installed("pypdf2"):  # type: ignore
-                                pm.install("pypdf2")
-                            if not pm.is_installed("pycryptodome"):  # type: ignore
-                                pm.install("pycryptodome")
-                            from PyPDF2 import PdfReader  # type: ignore
-                            from io import BytesIO
-
-                            pdf_file = BytesIO(file)
-                            reader = PdfReader(pdf_file)
-
-                            # Check if PDF is encrypted
-                            if reader.is_encrypted:
-                                pdf_password = global_args.pdf_decrypt_password
-                                if not pdf_password:
-                                    # PDF is encrypted but no password provided
-                                    error_files = [
-                                        {
-                                            "file_path": str(file_path.name),
-                                            "error_description": "[File Extraction]PDF is encrypted but no password provided",
-                                            "original_error": "Please set PDF_DECRYPT_PASSWORD environment variable to decrypt this PDF file",
-                                            "file_size": file_size,
-                                        }
-                                    ]
-                                    await rag.apipeline_enqueue_error_documents(
-                                        error_files, track_id
-                                    )
-                                    logger.error(
-                                        f"[File Extraction]PDF is encrypted but no password provided: {file_path.name}"
-                                    )
-                                    return False, track_id
-
-                                # Try to decrypt with password
-                                try:
-                                    decrypt_result = reader.decrypt(pdf_password)
-                                    if decrypt_result == 0:
-                                        # Password is incorrect
-                                        error_files = [
-                                            {
-                                                "file_path": str(file_path.name),
-                                                "error_description": "[File Extraction]Failed to decrypt PDF - incorrect password",
-                                                "original_error": "The provided PDF_DECRYPT_PASSWORD is incorrect for this file",
-                                                "file_size": file_size,
-                                            }
-                                        ]
-                                        await rag.apipeline_enqueue_error_documents(
-                                            error_files, track_id
-                                        )
-                                        logger.error(
-                                            f"[File Extraction]Incorrect PDF password: {file_path.name}"
-                                        )
-                                        return False, track_id
-                                except Exception as decrypt_error:
-                                    # Decryption process error
-                                    error_files = [
-                                        {
-                                            "file_path": str(file_path.name),
-                                            "error_description": "[File Extraction]PDF decryption failed",
-                                            "original_error": f"Error during PDF decryption: {str(decrypt_error)}",
-                                            "file_size": file_size,
-                                        }
-                                    ]
-                                    await rag.apipeline_enqueue_error_documents(
-                                        error_files, track_id
-                                    )
-                                    logger.error(
-                                        f"[File Extraction]PDF decryption error for {file_path.name}: {str(decrypt_error)}"
-                                    )
-                                    return False, track_id
-
-                            # Extract text from PDF (encrypted PDFs are now decrypted, unencrypted PDFs proceed directly)
-                            for page in reader.pages:
-                                content += page.extract_text() + "\n"
+                            if (
+                                global_args.document_loading_engine == "DOCLING"
+                                and not _is_docling_available()
+                            ):
+                                logger.warning(
+                                    f"DOCLING engine configured but not available for {file_path.name}. Falling back to pypdf."
+                                )
+                            # Use pypdf (non-blocking via to_thread)
+                            content = await asyncio.to_thread(
+                                _extract_pdf_pypdf,
+                                file,
+                                global_args.pdf_decrypt_password,
+                            )
                    except Exception as e:
                        error_files = [
                            {
@ -1172,28 +1247,24 @@ async def pipeline_enqueue_file(

                case ".docx":
                    try:
-                        if global_args.document_loading_engine == "DOCLING":
-                            if not pm.is_installed("docling"):  # type: ignore
-                                pm.install("docling")
-                            from docling.document_converter import DocumentConverter  # type: ignore
-
-                            converter = DocumentConverter()
-                            result = converter.convert(file_path)
-                            content = result.document.export_to_markdown()
-                        else:
-                            if not pm.is_installed("python-docx"):  # type: ignore
-                                try:
-                                    pm.install("python-docx")
-                                except Exception:
-                                    pm.install("docx")
-                            from docx import Document  # type: ignore
-                            from io import BytesIO
-
-                            docx_file = BytesIO(file)
-                            doc = Document(docx_file)
-                            content = "\n".join(
-                                [paragraph.text for paragraph in doc.paragraphs]
+                        # Try DOCLING first if configured and available
+                        if (
+                            global_args.document_loading_engine == "DOCLING"
+                            and _is_docling_available()
+                        ):
+                            content = await asyncio.to_thread(
+                                _convert_with_docling, file_path
                            )
+                        else:
+                            if (
+                                global_args.document_loading_engine == "DOCLING"
+                                and not _is_docling_available()
+                            ):
+                                logger.warning(
+                                    f"DOCLING engine configured but not available for {file_path.name}. Falling back to python-docx."
+                                )
+                            # Use python-docx (non-blocking via to_thread)
+                            content = await asyncio.to_thread(_extract_docx, file)
                    except Exception as e:
                        error_files = [
                            {
@ -1213,26 +1284,24 @@ async def pipeline_enqueue_file(

                case ".pptx":
                    try:
-                        if global_args.document_loading_engine == "DOCLING":
-                            if not pm.is_installed("docling"):  # type: ignore
-                                pm.install("docling")
-                            from docling.document_converter import DocumentConverter  # type: ignore
-
-                            converter = DocumentConverter()
-                            result = converter.convert(file_path)
-                            content = result.document.export_to_markdown()
+                        # Try DOCLING first if configured and available
+                        if (
+                            global_args.document_loading_engine == "DOCLING"
+                            and _is_docling_available()
+                        ):
+                            content = await asyncio.to_thread(
+                                _convert_with_docling, file_path
+                            )
                        else:
-                            if not pm.is_installed("python-pptx"):  # type: ignore
-                                pm.install("pptx")
-                            from pptx import Presentation  # type: ignore
-                            from io import BytesIO
-
-                            pptx_file = BytesIO(file)
-                            prs = Presentation(pptx_file)
-                            for slide in prs.slides:
-                                for shape in slide.shapes:
-                                    if hasattr(shape, "text"):
-                                        content += shape.text + "\n"
+                            if (
+                                global_args.document_loading_engine == "DOCLING"
+                                and not _is_docling_available()
+                            ):
+                                logger.warning(
+                                    f"DOCLING engine configured but not available for {file_path.name}. Falling back to python-pptx."
+                                )
+                            # Use python-pptx (non-blocking via to_thread)
+                            content = await asyncio.to_thread(_extract_pptx, file)
                    except Exception as e:
                        error_files = [
                            {
@ -1252,33 +1321,24 @@ async def pipeline_enqueue_file(

                case ".xlsx":
                    try:
-                        if global_args.document_loading_engine == "DOCLING":
-                            if not pm.is_installed("docling"):  # type: ignore
-                                pm.install("docling")
-                            from docling.document_converter import DocumentConverter  # type: ignore
-
-                            converter = DocumentConverter()
-                            result = converter.convert(file_path)
-                            content = result.document.export_to_markdown()
+                        # Try DOCLING first if configured and available
+                        if (
+                            global_args.document_loading_engine == "DOCLING"
+                            and _is_docling_available()
+                        ):
+                            content = await asyncio.to_thread(
+                                _convert_with_docling, file_path
+                            )
                        else:
-                            if not pm.is_installed("openpyxl"):  # type: ignore
-                                pm.install("openpyxl")
-                            from openpyxl import load_workbook  # type: ignore
-                            from io import BytesIO
-
-                            xlsx_file = BytesIO(file)
-                            wb = load_workbook(xlsx_file)
-                            for sheet in wb:
-                                content += f"Sheet: {sheet.title}\n"
-                                for row in sheet.iter_rows(values_only=True):
-                                    content += (
-                                        "\t".join(
-                                            str(cell) if cell is not None else ""
-                                            for cell in row
-                                        )
-                                        + "\n"
-                                    )
-                                content += "\n"
+                            if (
+                                global_args.document_loading_engine == "DOCLING"
+                                and not _is_docling_available()
+                            ):
+                                logger.warning(
+                                    f"DOCLING engine configured but not available for {file_path.name}. Falling back to openpyxl."
+                                )
+                            # Use openpyxl (non-blocking via to_thread)
+                            content = await asyncio.to_thread(_extract_xlsx, file)
                    except Exception as e:
                        error_files = [
                            {
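
Note on the document_routes.py changes above: they combine two patterns, a cached availability check for an optional dependency and offloading synchronous parsing with `asyncio.to_thread()`. The following is a minimal, self-contained sketch of both. The names `is_module_available`, `extract_text_blocking` and `extract` are hypothetical, and the byte decoder is only a stand-in for a real parser such as pypdf or python-docx.

```python
import asyncio
import importlib
from functools import lru_cache


@lru_cache(maxsize=None)
def is_module_available(name: str) -> bool:
    """Try the import once per module name; later calls hit the cache."""
    try:
        importlib.import_module(name)
        return True
    except ImportError:
        return False


def extract_text_blocking(data: bytes) -> str:
    """Stand-in for a synchronous, potentially slow document parser."""
    return data.decode("utf-8", errors="replace")


async def extract(data: bytes) -> str:
    # Run the blocking parser in a worker thread so the event loop stays responsive.
    return await asyncio.to_thread(extract_text_blocking, data)


if __name__ == "__main__":
    print(is_module_available("json"))
    print(asyncio.run(extract(b"hello")))
```

`asyncio.to_thread()` requires Python 3.9 or later. It helps most when the parser spends its time in I/O or in C extensions that release the GIL.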
--- a/lightrag/api/run_with_gunicorn.py
+++ b/lightrag/api/run_with_gunicorn.py
@ -5,6 +5,7 @@ Start LightRAG server with Gunicorn

 import os
 import sys
+import platform
 import pipmaster as pm
 from lightrag.api.utils_api import display_splash_screen, check_env_file
 from lightrag.api.config import global_args
@ -34,6 +35,11 @@ def check_and_install_dependencies():


 def main():
+    # Explicitly initialize configuration for Gunicorn mode
+    from lightrag.api.config import initialize_config
+
+    initialize_config()
+
    # Set Gunicorn mode flag for lifespan cleanup detection
    os.environ["LIGHTRAG_GUNICORN_MODE"] = "1"

@ -41,6 +47,35 @@ def main():
    if not check_env_file():
        sys.exit(1)

+    # Check DOCLING compatibility with Gunicorn multi-worker mode on macOS
+    if (
+        platform.system() == "Darwin"
+        and global_args.document_loading_engine == "DOCLING"
+        and global_args.workers > 1
+    ):
+        print("\n" + "=" * 80)
+        print("❌ ERROR: Incompatible configuration detected!")
+        print("=" * 80)
+        print(
+            "\nDOCLING engine with Gunicorn multi-worker mode is not supported on macOS"
+        )
+        print("\nReason:")
+        print("  PyTorch (required by DOCLING) has known compatibility issues with")
+        print("  fork-based multiprocessing on macOS, which can cause crashes or")
+        print("  unexpected behavior when using Gunicorn with multiple workers.")
+        print("\nCurrent configuration:")
+        print("  - Operating System: macOS (Darwin)")
+        print(f"  - Document Engine: {global_args.document_loading_engine}")
+        print(f"  - Workers: {global_args.workers}")
+        print("\nPossible solutions:")
+        print("  1. Use single worker mode:")
+        print("     --workers 1")
+        print("\n  2. Change document loading engine in .env:")
+        print("     DOCUMENT_LOADING_ENGINE=DEFAULT")
+        print("\n  3. Deploy on Linux where multi-worker mode is fully supported")
+        print("=" * 80 + "\n")
+        sys.exit(1)
+
    # Check and install dependencies
    check_and_install_dependencies()
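
Note on the macOS guard above: the condition can be isolated from the printing and `sys.exit(1)` so it can be tested on any platform. This is a minimal sketch under that assumption. The function name `docling_multiworker_conflict` and the injectable `system` argument are hypothetical and not part of the commit.

```python
import platform
from typing import Optional


def docling_multiworker_conflict(
    engine: str, workers: int, system: Optional[str] = None
) -> bool:
    """Return True for the combination the diff rejects: macOS + DOCLING + more than one worker."""
    # Allow the platform name to be injected for tests; default to the real platform.
    current = platform.system() if system is None else system
    return current == "Darwin" and engine == "DOCLING" and workers > 1


if __name__ == "__main__":
    print(docling_multiworker_conflict("DOCLING", 4, system="Darwin"))
    print(docling_multiworker_conflict("DOCLING", 4, system="Linux"))
```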

--- a/lightrag/evaluation/README_EVALUASTION_RAGAS.md
+++ b/lightrag/evaluation/README_EVALUASTION_RAGAS.md
--- a/lightrag/kg/__init__.py
+++ b/lightrag/kg/__init__.py
@ -46,13 +46,19 @@ STORAGE_IMPLEMENTATIONS = {
 STORAGE_ENV_REQUIREMENTS: dict[str, list[str]] = {
    # KV Storage Implementations
    "JsonKVStorage": [],
-    "MongoKVStorage": [],
+    "MongoKVStorage": [
+        "MONGO_URI",
+        "MONGO_DATABASE",
+    ],
    "RedisKVStorage": ["REDIS_URI"],
    "PGKVStorage": ["POSTGRES_USER", "POSTGRES_PASSWORD", "POSTGRES_DATABASE"],
    # Graph Storage Implementations
    "NetworkXStorage": [],
    "Neo4JStorage": ["NEO4J_URI", "NEO4J_USERNAME", "NEO4J_PASSWORD"],
-    "MongoGraphStorage": [],
+    "MongoGraphStorage": [
+        "MONGO_URI",
+        "MONGO_DATABASE",
+    ],
    "MemgraphStorage": ["MEMGRAPH_URI"],
    "TigerGraphStorage": [
        "TIGERGRAPH_URI",
@ -71,17 +77,26 @@ STORAGE_ENV_REQUIREMENTS: dict[str, list[str]] = {
    ],
    # Vector Storage Implementations
    "NanoVectorDBStorage": [],
-    "MilvusVectorDBStorage": [],
-    "ChromaVectorDBStorage": [],
+    "MilvusVectorDBStorage": [
+        "MILVUS_URI",
+        "MILVUS_DB_NAME",
+    ],
+    # "ChromaVectorDBStorage": [],
    "PGVectorStorage": ["POSTGRES_USER", "POSTGRES_PASSWORD", "POSTGRES_DATABASE"],
    "FaissVectorDBStorage": [],
    "QdrantVectorDBStorage": ["QDRANT_URL"],  # QDRANT_API_KEY has default value None
-    "MongoVectorDBStorage": [],
+    "MongoVectorDBStorage": [
+        "MONGO_URI",
+        "MONGO_DATABASE",
+    ],
    # Document Status Storage Implementations
    "JsonDocStatusStorage": [],
    "RedisDocStatusStorage": ["REDIS_URI"],
    "PGDocStatusStorage": ["POSTGRES_USER", "POSTGRES_PASSWORD", "POSTGRES_DATABASE"],
-    "MongoDocStatusStorage": [],
+    "MongoDocStatusStorage": [
+        "MONGO_URI",
+        "MONGO_DATABASE",
+    ],
 }

 # Storage implementation module mapping
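
Note on `STORAGE_ENV_REQUIREMENTS` above: a table like this is typically consulted before a storage backend is instantiated. The project's actual validation code is not shown in this diff, so the following is only a sketch of how such a table could be checked. `missing_env_vars` and the trimmed `REQUIREMENTS` dict are hypothetical.

```python
import os
from typing import Mapping, Optional

# Trimmed, illustrative copy of two entries from the table in the diff.
REQUIREMENTS: dict[str, list[str]] = {
    "JsonKVStorage": [],
    "MongoKVStorage": ["MONGO_URI", "MONGO_DATABASE"],
}


def missing_env_vars(
    storage_name: str,
    requirements: Mapping[str, list[str]],
    environ: Optional[Mapping[str, str]] = None,
) -> list[str]:
    """List required variables that are unset or empty for the given storage."""
    env = os.environ if environ is None else environ
    return [var for var in requirements.get(storage_name, []) if not env.get(var)]


if __name__ == "__main__":
    fake_env = {"MONGO_URI": "mongodb://localhost:27017"}
    print(missing_env_vars("MongoKVStorage", REQUIREMENTS, fake_env))
```

With the entries added in this commit, a Mongo-backed deployment that sets only `MONGO_URI` would be reported as missing `MONGO_DATABASE` under this kind of check.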
--- a/lightrag/kg/json_doc_status_impl.py
+++ b/lightrag/kg/json_doc_status_impl.py
@ -161,7 +161,20 @@ class JsonDocStatusStorage(DocStatusStorage):
                logger.debug(
                    f"[{self.workspace}] Process {os.getpid()} doc status writing {len(data_dict)} records to {self.namespace}"
                )
-                write_json(data_dict, self._file_name)
+
+                # Write JSON and check if sanitization was applied
+                needs_reload = write_json(data_dict, self._file_name)
+
+                # If data was sanitized, reload cleaned data to update shared memory
+                if needs_reload:
+                    logger.info(
+                        f"[{self.workspace}] Reloading sanitized data into shared memory for {self.namespace}"
+                    )
+                    cleaned_data = load_json(self._file_name)
+                    if cleaned_data is not None:
+                        self._data.clear()
+                        self._data.update(cleaned_data)
+
                await clear_all_update_flags(self.final_namespace)

    async def upsert(self, data: dict[str, dict[str, Any]]) -> None:
--- a/lightrag/kg/json_kv_impl.py
+++ b/lightrag/kg/json_kv_impl.py
@ -81,7 +81,20 @@ class JsonKVStorage(BaseKVStorage):
                logger.debug(
                    f"[{self.workspace}] Process {os.getpid()} KV writing {data_count} records to {self.namespace}"
                )
-                write_json(data_dict, self._file_name)
+
+                # Write JSON and check if sanitization was applied
+                needs_reload = write_json(data_dict, self._file_name)
+
+                # If data was sanitized, reload cleaned data to update shared memory
+                if needs_reload:
+                    logger.info(
+                        f"[{self.workspace}] Reloading sanitized data into shared memory for {self.namespace}"
+                    )
+                    cleaned_data = load_json(self._file_name)
+                    if cleaned_data is not None:
+                        self._data.clear()
+                        self._data.update(cleaned_data)
+
                await clear_all_update_flags(self.final_namespace)

    async def get_by_id(self, id: str) -> dict[str, Any] | None:
@ -224,7 +237,7 @@ class JsonKVStorage(BaseKVStorage):
            data: Original data dictionary that may contain legacy structure

        Returns:
-            Migrated data dictionary with flattened cache keys
+            Migrated data dictionary with flattened cache keys (sanitized if needed)
        """
        from lightrag.utils import generate_cache_key

@ -261,8 +274,17 @@ class JsonKVStorage(BaseKVStorage):
            logger.info(
                f"[{self.workspace}] Migrated {migration_count} legacy cache entries to flattened structure"
            )
-            # Persist migrated data immediately
-            write_json(migrated_data, self._file_name)
+            # Persist migrated data immediately and check if sanitization was applied
+            needs_reload = write_json(migrated_data, self._file_name)
+
+            # If data was sanitized during write, reload cleaned data
+            if needs_reload:
+                logger.info(
+                    f"[{self.workspace}] Reloading sanitized migration data for {self.namespace}"
+                )
+                cleaned_data = load_json(self._file_name)
+                if cleaned_data is not None:
+                    return cleaned_data  # Return cleaned data to update shared memory

        return migrated_data
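
Note on the write-then-reload changes in the two JSON storage files above: the pattern is "write, learn whether the writer had to clean the data, and if so refresh the in-memory copy from disk". The diff does not show what `write_json` sanitizes, so the sketch below assumes, purely for illustration, that sanitization means dropping lone surrogate code points. `_sanitize`, `write_json_checked`, `persist` and `demo` are all hypothetical names.

```python
import json
import os
import tempfile


def _sanitize(obj):
    """Recursively drop lone surrogate code points from strings (illustrative rule only)."""
    if isinstance(obj, str):
        return "".join(ch for ch in obj if not 0xD800 <= ord(ch) <= 0xDFFF)
    if isinstance(obj, dict):
        return {_sanitize(k): _sanitize(v) for k, v in obj.items()}
    if isinstance(obj, list):
        return [_sanitize(v) for v in obj]
    return obj


def write_json_checked(data, path: str) -> bool:
    """Write sanitized data; return True if sanitization changed anything."""
    cleaned = _sanitize(data)
    with open(path, "w", encoding="utf-8") as f:
        json.dump(cleaned, f, ensure_ascii=False)
    return cleaned != data


def persist(shared: dict, path: str) -> None:
    """Persist shared data and, if it was sanitized, reload it in place."""
    if write_json_checked(shared, path):
        with open(path, encoding="utf-8") as f:
            cleaned = json.load(f)
        # Mutate in place so other holders of the same dict see the cleaned data.
        shared.clear()
        shared.update(cleaned)


def demo() -> dict:
    shared = {"note": "ok\ud800"}  # value contains a lone surrogate
    with tempfile.TemporaryDirectory() as tmp:
        persist(shared, os.path.join(tmp, "store.json"))
    return shared
```

Updating the dict in place with `clear()` and `update()`, as the diff does, matters when the same object is shared between components; rebinding the name would leave other holders with the unsanitized data.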

--- a/lightrag/kg/tigergraph_impl.py
+++ b/lightrag/kg/tigergraph_impl.py
@ -776,7 +776,8 @@ class TigerGraphStorage(BaseGraphStorage):
                    )
                if matching_vertices:
                    node_data = matching_vertices[0]["attributes"].copy()
-                    # Convert labels to list if needed, and filter out workspace label
+                    # Convert labels to list if needed, and filter out workspace label, entity_type, and "UNKNOWN"
+                    # Labels should only be used for workspace filtering, not for storing entity_type
                    if "labels" in node_data:
                        labels = node_data["labels"]
                        if isinstance(labels, (set, tuple)):
@ -785,9 +786,15 @@ class TigerGraphStorage(BaseGraphStorage):
                            labels_list = (
                                labels.copy() if isinstance(labels, list) else []
                            )
-                        # Remove workspace label from labels list (similar to Memgraph)
+                        # Remove workspace label, entity_type, and "UNKNOWN" from labels list
+                        # Only workspace should be in labels for filtering - everything else should be filtered out
+                        entity_type = node_data.get("entity_type")
                        labels_list = [
-                            label for label in labels_list if label != workspace_label
+                            label
+                            for label in labels_list
+                            if label != workspace_label
+                            and label != entity_type
+                            and label != "UNKNOWN"
                        ]
                        node_data["labels"] = labels_list
                    # Keep entity_id in the dict
@ -874,16 +881,20 @@ class TigerGraphStorage(BaseGraphStorage):
                                    # Check if workspace label is in labels
                                    if workspace_label in labels:
                                        node_data = attrs.copy()
-                                        # Convert labels to list and filter out workspace label
+                                        # Convert labels to list and filter out workspace label, entity_type, and "UNKNOWN"
+                                        # Labels should only be used for workspace filtering, not for storing entity_type
                                        if isinstance(labels, (set, tuple)):
                                            labels_list = list(labels)
                                        else:
                                            labels_list = labels.copy()

+                                        entity_type = node_data.get("entity_type")
                                        labels_list = [
                                            label
                                            for label in labels_list
                                            if label != workspace_label
+                                            and label != entity_type
+                                            and label != "UNKNOWN"
                                        ]
                                        node_data["labels"] = labels_list
                                        # Ensure entity_id is in the dict
@ -1217,7 +1228,19 @@ class TigerGraphStorage(BaseGraphStorage):
                # entity_type should NOT be in labels - it's stored in entity_type property
                # Always set labels to contain only workspace, regardless of what's in node_data
                # This ensures entity_type never seeps into labels, even if it was there before
-                node_data_copy["labels"] = [workspace_label]
+                # Explicitly remove labels key first, then set it fresh to avoid any merge behavior
+                if "labels" in node_data_copy:
+                    del node_data_copy["labels"]
+
+                # Set labels to only contain workspace_label, explicitly filtering out "UNKNOWN"
+                # Even though workspace_label should never be "UNKNOWN", we filter to be safe
+                # and to prevent any accidental inclusion of "UNKNOWN" from old data
+                labels_to_set = (
+                    [workspace_label]
+                    if workspace_label and workspace_label != "UNKNOWN"
+                    else []
+                )
+                node_data_copy["labels"] = labels_to_set

                # Upsert vertex
                self._conn.upsertVertex(
@ -1240,74 +1263,42 @@ class TigerGraphStorage(BaseGraphStorage):
        self, source_node_id: str, target_node_id: str, edge_data: dict[str, str]
    ) -> None:
        """Upsert an edge and its properties between two nodes."""
-        workspace_label = self._get_workspace_label()

+        # Ensure both endpoint nodes exist before creating the edge. Missing nodes are
+        # created via upsert_node so they pass through the same label-cleaning logic.
+        source_exists = await self.has_node(source_node_id)
+        if not source_exists:
+            # Create source node with minimal data - upsert_node will ensure clean labels
+            await self.upsert_node(
+                source_node_id,
+                {
+                    "entity_id": source_node_id,
+                    "entity_type": "UNKNOWN",
+                    "description": "",
+                    "source_id": "",
+                    "file_path": "",
+                    "created_at": 0,
+                },
+            )
+
+        target_exists = await self.has_node(target_node_id)
+        if not target_exists:
+            # Create target node with minimal data - upsert_node will ensure clean labels
+            await self.upsert_node(
+                target_node_id,
+                {
+                    "entity_id": target_node_id,
+                    "entity_type": "UNKNOWN",
+                    "description": "",
+                    "source_id": "",
+                    "file_path": "",
+                    "created_at": 0,
+                },
+            )
+
+        # Upsert edge (undirected, so direction doesn't matter)
        def _upsert_edge():
            try:
-                # Ensure both nodes exist first
-                # Check if source node exists using getVerticesById
-                source_exists = False
-                try:
-                    source_result = self._conn.getVerticesById(
-                        VertexType.ENTITY.value, source_node_id
-                    )
-                    if (
-                        isinstance(source_result, dict)
-                        and source_node_id in source_result
-                    ):
-                        attrs = source_result[source_node_id].get("attributes", {})
-                        labels = attrs.get("labels", set())
-                        if isinstance(labels, set) and workspace_label in labels:
-                            source_exists = True
-                except Exception:
-                    pass  # Node doesn't exist
-
-                if not source_exists:
-                    # Create source node with minimal data and labels
-                    self._conn.upsertVertex(
-                        VertexType.ENTITY.value,
-                        source_node_id,
-                        {
-                            "entity_id": source_node_id,
-                            "labels": list(
-                                {workspace_label}
-                            ),  # Only workspace in labels
-                            "entity_type": "UNKNOWN",
-                        },
-                    )
-
-                # Check if target node exists using getVerticesById
-                target_exists = False
-                try:
-                    target_result = self._conn.getVerticesById(
-                        VertexType.ENTITY.value, target_node_id
-                    )
-                    if (
-                        isinstance(target_result, dict)
-                        and target_node_id in target_result
-                    ):
-                        attrs = target_result[target_node_id].get("attributes", {})
-                        labels = attrs.get("labels", set())
-                        if isinstance(labels, set) and workspace_label in labels:
-                            target_exists = True
-                except Exception:
-                    pass  # Node doesn't exist
-
-                if not target_exists:
-                    # Create target node with minimal data and labels
-                    self._conn.upsertVertex(
-                        VertexType.ENTITY.value,
-                        target_node_id,
-                        {
-                            "entity_id": target_node_id,
-                            "labels": list(
-                                {workspace_label}
-                            ),  # Only workspace in labels
-                            "entity_type": "UNKNOWN",
-                        },
-                    )
-
-                # Upsert edge (undirected, so direction doesn't matter)
                self._conn.upsertEdge(
                    VertexType.ENTITY.value,
                    source_node_id,
@ -1663,12 +1654,16 @@ class TigerGraphStorage(BaseGraphStorage):
                for vertex in vertices:
                    attrs = vertex.get("attributes", {})
                    attrs["id"] = attrs.get("entity_id")
-                    # Convert labels SET to list and filter out workspace label
+                    # Convert labels SET to list and filter out workspace label, entity_type, and "UNKNOWN"
+                    # Labels should only be used for workspace filtering, not for storing entity_type
                    if "labels" in attrs and isinstance(attrs["labels"], set):
+                        entity_type = attrs.get("entity_type")
                        attrs["labels"] = [
                            label
                            for label in attrs["labels"]
                            if label != workspace_label
+                            and label != entity_type
+                            and label != "UNKNOWN"
                        ]
                    nodes.append(attrs)
                return nodes
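
The two read paths in this file apply the same label filter: drop the workspace label, the node's own `entity_type`, and the `"UNKNOWN"` placeholder. A minimal sketch of that rule as a standalone helper, assuming a hypothetical function name `clean_labels` that does not appear in the diff:

```python
from typing import Iterable, Optional


def clean_labels(
    labels: Iterable[str],
    workspace_label: str,
    entity_type: Optional[str] = None,
) -> list[str]:
    """Return labels without the workspace label, the entity type, or "UNKNOWN"."""
    excluded = {workspace_label, "UNKNOWN"}
    if entity_type is not None:
        excluded.add(entity_type)
    return [label for label in labels if label not in excluded]


# Hypothetical vertex attributes; TigerGraph returns the labels attribute as a set.
attrs = {"entity_type": "PERSON", "labels": {"my_workspace", "PERSON", "UNKNOWN", "vip"}}
print(clean_labels(attrs["labels"], "my_workspace", attrs["entity_type"]))
```

Keeping the rule in one place would avoid the two comprehensions drifting apart, though whether that refactor fits this module is a judgment call for the maintainers.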
--- a/lightrag/lightrag.py
+++ b/lightrag/lightrag.py
@ -3,6 +3,7 @@ from __future__ import annotations
 import traceback
 import asyncio
 import configparser
+import inspect
 import os
 import time
 import warnings
@ -12,6 +13,7 @@ from functools import partial
 from typing import (
    Any,
    AsyncIterator,
+    Awaitable,
    Callable,
    Iterator,
    cast,
@ -20,6 +22,7 @@ from typing import (
    Optional,
    List,
    Dict,
+    Union,
 )
 from lightrag.prompt import PROMPTS
 from lightrag.exceptions import PipelineCancelledException
@ -243,11 +246,13 @@ class LightRAG:
            int,
            int,
        ],
-        List[Dict[str, Any]],
+        Union[List[Dict[str, Any]], Awaitable[List[Dict[str, Any]]]],
    ] = field(default_factory=lambda: chunking_by_token_size)
    """
    Custom chunking function for splitting text into chunks before processing.

+    The function can be either synchronous or asynchronous.
+
    The function should take the following parameters:

        - `tokenizer`: A Tokenizer instance to use for tokenization.
@ -257,7 +262,8 @@ class LightRAG:
        - `chunk_token_size`: The maximum number of tokens per chunk.
        - `chunk_overlap_token_size`: The number of overlapping tokens between consecutive chunks.

-    The function should return a list of dictionaries, where each dictionary contains the following keys:
+    The function should return a list of dictionaries (or an awaitable that resolves to a list),
+    where each dictionary contains the following keys:
        - `tokens`: The number of tokens in the chunk.
        - `content`: The text content of the chunk.

@ -1756,7 +1762,28 @@ class LightRAG:
                                )
                            content = content_data["content"]

-                            # Generate chunks from document
+                            # Call chunking function, supporting both sync and async implementations
+                            chunking_result = self.chunking_func(
+                                self.tokenizer,
+                                content,
+                                split_by_character,
+                                split_by_character_only,
+                                self.chunk_overlap_token_size,
+                                self.chunk_token_size,
+                            )
+
+                            # If result is awaitable, await to get actual result
+                            if inspect.isawaitable(chunking_result):
+                                chunking_result = await chunking_result
+
+                            # Validate return type
+                            if not isinstance(chunking_result, (list, tuple)):
+                                raise TypeError(
+                                    "chunking_func must return a list or tuple of dicts, "
+                                    f"got {type(chunking_result).__name__}"
+                                )
+
+                            # Build chunks dictionary
                            chunks: dict[str, Any] = {
                                compute_mdhash_id(dp["content"], prefix="chunk-"): {
                                    **dp,
@ -1764,14 +1791,7 @@ class LightRAG:
                                    "file_path": file_path,  # Add file path to each chunk
                                    "llm_cache_list": [],  # Initialize empty LLM cache list for each chunk
                                }
-                                for dp in self.chunking_func(
-                                    self.tokenizer,
-                                    content,
-                                    split_by_character,
-                                    split_by_character_only,
-                                    self.chunk_overlap_token_size,
-                                    self.chunk_token_size,
-                                )
+                                for dp in chunking_result
                            }

                            if not chunks:
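
The change above lets `chunking_func` be either a plain function or a coroutine function by checking the call result with `inspect.isawaitable`. A minimal sketch of that dispatch in isolation, assuming a simplified one-argument signature and hypothetical names (`run_chunking`, `sync_chunker`, `async_chunker`) that are not part of LightRAG:

```python
import asyncio
import inspect
from typing import Any, Awaitable, Callable, Union

Chunk = dict[str, Any]
ChunkingFunc = Callable[[str], Union[list[Chunk], Awaitable[list[Chunk]]]]


async def run_chunking(chunking_func: ChunkingFunc, content: str) -> list[Chunk]:
    """Call a sync or async chunking function and validate its result."""
    result = chunking_func(content)
    # Coroutines, tasks, and futures are awaitable; plain lists are not.
    if inspect.isawaitable(result):
        result = await result
    if not isinstance(result, (list, tuple)):
        raise TypeError(
            f"chunking_func must return a list or tuple of dicts, got {type(result).__name__}"
        )
    return list(result)


def sync_chunker(content: str) -> list[Chunk]:
    # Hypothetical chunker: one chunk per line, token count approximated by word count.
    return [{"tokens": len(line.split()), "content": line} for line in content.splitlines()]


async def async_chunker(content: str) -> list[Chunk]:
    await asyncio.sleep(0)  # Stand-in for real async work, e.g. a remote call.
    return sync_chunker(content)


text = "first line\nsecond line here"
print(asyncio.run(run_chunking(sync_chunker, text)))
print(asyncio.run(run_chunking(async_chunker, text)))
```

Checking the returned value rather than the function itself (for example with `inspect.iscoroutinefunction`) also covers callables such as `functools.partial` objects or sync wrappers that return a coroutine.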
--- a/lightrag/llm/binding_options.py
+++ b/lightrag/llm/binding_options.py
@ -472,6 +472,9 @@ class OllamaLLMOptions(_OllamaOptionsMixin, BindingOptions):
    _binding_name: ClassVar[str] = "ollama_llm"


+# =============================================================================
+# Binding Options for Gemini
+# =============================================================================
@dataclass
 class GeminiLLMOptions(BindingOptions):
    """Options for Google Gemini models."""
@ -505,6 +508,19 @@ class GeminiLLMOptions(BindingOptions):
    }


+@dataclass
+class GeminiEmbeddingOptions(BindingOptions):
+    """Options for Google Gemini embedding models."""
+
+    _binding_name: ClassVar[str] = "gemini_embedding"
+
+    task_type: str = "RETRIEVAL_DOCUMENT"
+
+    _help: ClassVar[dict[str, str]] = {
+        "task_type": "Task type for embedding optimization (RETRIEVAL_DOCUMENT, RETRIEVAL_QUERY, SEMANTIC_SIMILARITY, CLASSIFICATION, CLUSTERING, CODE_RETRIEVAL_QUERY, QUESTION_ANSWERING, FACT_VERIFICATION)",
+    }
+
+
 # =============================================================================
 # Binding Options for OpenAI
 # =============================================================================
--- a/lightrag/llm/gemini.py
+++ b/lightrag/llm/gemini.py
@ -16,22 +16,44 @@ from collections.abc import AsyncIterator
 from functools import lru_cache
 from typing import Any

-from lightrag.utils import logger, remove_think_tags, safe_unicode_decode
+import numpy as np
+from tenacity import (
+    retry,
+    stop_after_attempt,
+    wait_exponential,
+    retry_if_exception_type,
+)
+
+from lightrag.utils import (
+    logger,
+    remove_think_tags,
+    safe_unicode_decode,
+    wrap_embedding_func_with_attrs,
+)

 import pipmaster as pm

-# Install the Google Gemini client on demand
+# Install the Google Gemini client and its dependencies on demand
 if not pm.is_installed("google-genai"):
    pm.install("google-genai")
+if not pm.is_installed("google-api-core"):
+    pm.install("google-api-core")

 from google import genai  # type: ignore
 from google.genai import types  # type: ignore
+from google.api_core import exceptions as google_api_exceptions  # type: ignore

 DEFAULT_GEMINI_ENDPOINT = "https://generativelanguage.googleapis.com"

 LOG = logging.getLogger(__name__)


+class InvalidResponseError(Exception):
+    """Raised when Gemini returns an empty response, so that the retry mechanism is triggered."""
+
+    pass
+
+
@lru_cache(maxsize=8)
 def _get_gemini_client(
    api_key: str, base_url: str | None, timeout: int | None = None
@ -163,6 +185,21 @@ def _extract_response_text(
    return ("\n".join(regular_parts), "\n".join(thought_parts))


+@retry(
+    stop=stop_after_attempt(3),
+    wait=wait_exponential(multiplier=1, min=4, max=60),
+    retry=(
+        retry_if_exception_type(google_api_exceptions.InternalServerError)
+        | retry_if_exception_type(google_api_exceptions.ServiceUnavailable)
+        | retry_if_exception_type(google_api_exceptions.ResourceExhausted)
+        | retry_if_exception_type(google_api_exceptions.GatewayTimeout)
+        | retry_if_exception_type(google_api_exceptions.BadGateway)
+        | retry_if_exception_type(google_api_exceptions.DeadlineExceeded)
+        | retry_if_exception_type(google_api_exceptions.Aborted)
+        | retry_if_exception_type(google_api_exceptions.Unknown)
+        | retry_if_exception_type(InvalidResponseError)
+    ),
+)
 async def gemini_complete_if_cache(
    model: str,
    prompt: str,
@ -369,7 +406,7 @@ async def gemini_complete_if_cache(
        final_text = regular_text or ""

    if not final_text:
-        raise RuntimeError("Gemini response did not contain any text content.")
+        raise InvalidResponseError("Gemini response did not contain any text content.")

    if "\\u" in final_text:
        final_text = safe_unicode_decode(final_text.encode("utf-8"))
@ -416,7 +453,143 @@ async def gemini_model_complete(
    )


+@wrap_embedding_func_with_attrs(embedding_dim=1536)
+@retry(
+    stop=stop_after_attempt(3),
+    wait=wait_exponential(multiplier=1, min=4, max=60),
+    retry=(
+        retry_if_exception_type(google_api_exceptions.InternalServerError)
+        | retry_if_exception_type(google_api_exceptions.ServiceUnavailable)
+        | retry_if_exception_type(google_api_exceptions.ResourceExhausted)
+        | retry_if_exception_type(google_api_exceptions.GatewayTimeout)
+        | retry_if_exception_type(google_api_exceptions.BadGateway)
+        | retry_if_exception_type(google_api_exceptions.DeadlineExceeded)
+        | retry_if_exception_type(google_api_exceptions.Aborted)
+        | retry_if_exception_type(google_api_exceptions.Unknown)
+    ),
+)
+async def gemini_embed(
+    texts: list[str],
+    model: str = "gemini-embedding-001",
+    base_url: str | None = None,
+    api_key: str | None = None,
+    embedding_dim: int | None = None,
+    task_type: str = "RETRIEVAL_DOCUMENT",
+    timeout: int | None = None,
+    token_tracker: Any | None = None,
+) -> np.ndarray:
+    """Generate embeddings for a list of texts using Gemini's API.
+
+    This function uses Google's Gemini embedding model to generate text embeddings.
+    It supports dynamic dimension control and automatic normalization for dimensions
+    less than 3072.
+
+    Args:
+        texts: List of texts to embed.
+        model: The Gemini embedding model to use. Default is "gemini-embedding-001".
+        base_url: Optional custom API endpoint.
+        api_key: Optional Gemini API key. If None, uses environment variables.
+        embedding_dim: Optional embedding dimension for dynamic dimension reduction.
+            **IMPORTANT**: This parameter is automatically injected by the EmbeddingFunc wrapper.
+            Do NOT manually pass this parameter when calling the function directly.
+            The dimension is controlled by the @wrap_embedding_func_with_attrs decorator
+            or the EMBEDDING_DIM environment variable.
+            Supported range: 128-3072. Recommended values: 768, 1536, 3072.
+        task_type: Task type for embedding optimization. Default is "RETRIEVAL_DOCUMENT".
+            Supported types: SEMANTIC_SIMILARITY, CLASSIFICATION, CLUSTERING,
+            RETRIEVAL_DOCUMENT, RETRIEVAL_QUERY, CODE_RETRIEVAL_QUERY,
+            QUESTION_ANSWERING, FACT_VERIFICATION.
+        timeout: Request timeout in seconds (will be converted to milliseconds for Gemini API).
+        token_tracker: Optional token usage tracker for monitoring API usage.
+
+    Returns:
+        A numpy array of embeddings, one per input text. For dimensions < 3072,
+        the embeddings are L2-normalized to ensure optimal semantic similarity performance.
+
+    Raises:
+        ValueError: If API key is not provided or configured.
+        RuntimeError: If the response from Gemini is invalid or empty.
+
+    Note:
+        - For dimension 3072: Embeddings are already normalized by the API
+        - For dimensions < 3072: Embeddings are L2-normalized after retrieval
+        - Normalization ensures accurate semantic similarity via cosine distance
+    """
+    loop = asyncio.get_running_loop()
+
+    key = _ensure_api_key(api_key)
+    # Convert timeout from seconds to milliseconds for Gemini API
+    timeout_ms = timeout * 1000 if timeout else None
+    client = _get_gemini_client(key, base_url, timeout_ms)
+
+    # Prepare embedding configuration
+    config_kwargs: dict[str, Any] = {}
+
+    # Add task_type to config
+    if task_type:
+        config_kwargs["task_type"] = task_type
+
+    # Add output_dimensionality if embedding_dim is provided
+    if embedding_dim is not None:
+        config_kwargs["output_dimensionality"] = embedding_dim
+
+    # Create config object if we have parameters
+    config_obj = types.EmbedContentConfig(**config_kwargs) if config_kwargs else None
+
+    def _call_embed() -> Any:
+        """Call Gemini embedding API in executor thread."""
+        request_kwargs: dict[str, Any] = {
+            "model": model,
+            "contents": texts,
+        }
+        if config_obj is not None:
+            request_kwargs["config"] = config_obj
+
+        return client.models.embed_content(**request_kwargs)
+
+    # Execute API call in thread pool
+    response = await loop.run_in_executor(None, _call_embed)
+
+    # Extract embeddings from response
+    if not hasattr(response, "embeddings") or not response.embeddings:
+        raise RuntimeError("Gemini response did not contain embeddings.")
+
+    # Convert embeddings to numpy array
+    embeddings = np.array(
+        [np.array(e.values, dtype=np.float32) for e in response.embeddings]
+    )
+
+    # Apply L2 normalization for dimensions < 3072
+    # The 3072 dimension embedding is already normalized by Gemini API
+    if embedding_dim and embedding_dim < 3072:
+        # Normalize each embedding vector to unit length
+        norms = np.linalg.norm(embeddings, axis=1, keepdims=True)
+        # Avoid division by zero
+        norms = np.where(norms == 0, 1, norms)
+        embeddings = embeddings / norms
+        logger.debug(
+            f"Applied L2 normalization to {len(embeddings)} embeddings of dimension {embedding_dim}"
+        )
+
+    # Track token usage if tracker is provided
+    # Note: Gemini embedding API may not provide usage metadata
+    if token_tracker and hasattr(response, "usage_metadata"):
+        usage = response.usage_metadata
+        token_counts = {
+            "prompt_tokens": getattr(usage, "prompt_token_count", 0),
+            "total_tokens": getattr(usage, "total_token_count", 0),
+        }
+        token_tracker.add_usage(token_counts)
+
+    logger.debug(
+        f"Generated {len(embeddings)} Gemini embeddings with dimension {embeddings.shape[1]}"
+    )
+
+    return embeddings
+
+
 __all__ = [
    "gemini_complete_if_cache",
    "gemini_model_complete",
+    "gemini_embed",
 ]
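
`gemini_embed` L2-normalizes embeddings whose requested dimension is below 3072 and guards against zero-length vectors. The diff does this with NumPy (`np.linalg.norm` and `np.where`); the sketch below swaps in plain Python lists so it runs without third-party packages. The function name `l2_normalize` is hypothetical.

```python
import math


def l2_normalize(vectors: list[list[float]]) -> list[list[float]]:
    """Scale each vector to unit L2 norm, leaving all-zero vectors unchanged."""
    normalized = []
    for vec in vectors:
        norm = math.sqrt(sum(x * x for x in vec))
        # Mirror the zero-norm guard in the diff: divide by 1 instead of 0.
        divisor = norm if norm != 0 else 1.0
        normalized.append([x / divisor for x in vec])
    return normalized


print(l2_normalize([[3.0, 4.0], [0.0, 0.0]]))
```

After normalization, the dot product of two vectors equals their cosine similarity, which is why the docstring ties this step to similarity quality.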
--- a/lightrag/llm/jina.py
+++ b/lightrag/llm/jina.py
@ -69,7 +69,7 @@ async def fetch_data(url, headers, data):
 )
 async def jina_embed(
    texts: list[str],
-    dimensions: int = 2048,
+    embedding_dim: int = 2048,
    late_chunking: bool = False,
    base_url: str = None,
    api_key: str = None,
@ -78,7 +78,12 @@ async def jina_embed(

    Args:
        texts: List of texts to embed.
-        dimensions: The embedding dimensions (default: 2048 for jina-embeddings-v4).
+        embedding_dim: The embedding dimensions (default: 2048 for jina-embeddings-v4).
+            **IMPORTANT**: This parameter is automatically injected by the EmbeddingFunc wrapper.
+            Do NOT manually pass this parameter when calling the function directly.
+            The dimension is controlled by the @wrap_embedding_func_with_attrs decorator.
+            Manually passing a different value will trigger a warning and be ignored.
+            When provided (by EmbeddingFunc), it will be passed to the Jina API for dimension reduction.
        late_chunking: Whether to use late chunking.
        base_url: Optional base URL for the Jina API.
        api_key: Optional Jina API key. If None, uses the JINA_API_KEY environment variable.
@ -104,7 +109,7 @@ async def jina_embed(
    data = {
        "model": "jina-embeddings-v4",
        "task": "text-matching",
-        "dimensions": dimensions,
+        "dimensions": embedding_dim,
        "embedding_type": "base64",
        "input": texts,
    }
@ -114,7 +119,7 @@ async def jina_embed(
        data["late_chunking"] = late_chunking

    logger.debug(
-        f"Jina embedding request: {len(texts)} texts, dimensions: {dimensions}"
+        f"Jina embedding request: {len(texts)} texts, dimensions: {embedding_dim}"
    )

    try:
--- a/lightrag/llm/ollama.py
+++ b/lightrag/llm/ollama.py
@ -1,4 +1,6 @@
 from collections.abc import AsyncIterator
+import os
+import re

 import pipmaster as pm

@ -22,10 +24,30 @@ from lightrag.exceptions import (
 from lightrag.api import __api_version__

 import numpy as np
-from typing import Union
+from typing import Optional, Union
 from lightrag.utils import logger


+_OLLAMA_CLOUD_HOST = "https://ollama.com"
+_CLOUD_MODEL_SUFFIX_PATTERN = re.compile(r"(?:-cloud|:cloud)$")
+
+
+def _coerce_host_for_cloud_model(host: Optional[str], model: object) -> Optional[str]:
+    if host:
+        return host
+    try:
+        model_name_str = str(model) if model is not None else ""
+    except (TypeError, ValueError, AttributeError) as e:
+        logger.warning(f"Failed to convert model to string: {e}, using empty string")
+        model_name_str = ""
+    if _CLOUD_MODEL_SUFFIX_PATTERN.search(model_name_str):
+        logger.debug(
+            f"Detected cloud model '{model_name_str}', using Ollama Cloud host"
+        )
+        return _OLLAMA_CLOUD_HOST
+    return host
+
+
@retry(
    stop=stop_after_attempt(3),
    wait=wait_exponential(multiplier=1, min=4, max=10),
@ -53,6 +75,9 @@ async def _ollama_model_if_cache(
        timeout = None
    kwargs.pop("hashing_kv", None)
    api_key = kwargs.pop("api_key", None)
+    # Fall back to the OLLAMA_API_KEY environment variable when no key is passed explicitly
+    if not api_key:
+        api_key = os.getenv("OLLAMA_API_KEY")
    headers = {
        "Content-Type": "application/json",
        "User-Agent": f"LightRAG/{__api_version__}",
@ -60,6 +85,8 @@ async def _ollama_model_if_cache(
    if api_key:
        headers["Authorization"] = f"Bearer {api_key}"

+    host = _coerce_host_for_cloud_model(host, model)
+
    ollama_client = ollama.AsyncClient(host=host, timeout=timeout, headers=headers)

    try:
@ -144,6 +171,8 @@ async def ollama_model_complete(

 async def ollama_embed(texts: list[str], embed_model, **kwargs) -> np.ndarray:
    api_key = kwargs.pop("api_key", None)
+    if not api_key:
+        api_key = os.getenv("OLLAMA_API_KEY")
    headers = {
        "Content-Type": "application/json",
        "User-Agent": f"LightRAG/{__api_version__}",
@ -154,6 +183,8 @@ async def ollama_embed(texts: list[str], embed_model, **kwargs) -> np.ndarray:
    host = kwargs.pop("host", None)
    timeout = kwargs.pop("timeout", None)

+    host = _coerce_host_for_cloud_model(host, embed_model)
+
    ollama_client = ollama.AsyncClient(host=host, timeout=timeout, headers=headers)
    try:
        options = kwargs.pop("options", {})
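
The host selection added in this file follows a simple precedence: an explicit host always wins, and only when none is given does a `-cloud` or `:cloud` model suffix route the request to Ollama Cloud. A minimal sketch of that rule without the logging, assuming a hypothetical function name `resolve_host` and made-up model names used purely for illustration:

```python
import re
from typing import Optional

OLLAMA_CLOUD_HOST = "https://ollama.com"
CLOUD_MODEL_SUFFIX = re.compile(r"(?:-cloud|:cloud)$")


def resolve_host(host: Optional[str], model: object) -> Optional[str]:
    """Return the explicit host if set, else the cloud host for cloud-suffixed models."""
    if host:
        return host
    model_name = str(model) if model is not None else ""
    if CLOUD_MODEL_SUFFIX.search(model_name):
        return OLLAMA_CLOUD_HOST
    # No explicit host and no cloud suffix: let the client use its default.
    return host


# Illustrative inputs only; the model names are not taken from the diff.
for host, model in [
    (None, "example-model:120b-cloud"),
    (None, "example-model:cloud"),
    (None, "example-model"),
    ("http://localhost:11434", "example-model:120b-cloud"),
]:
    print(f"{model!r} with host={host!r} -> {resolve_host(host, model)!r}")
```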
--- a/lightrag/llm/openai.py
+++ b/lightrag/llm/openai.py
@ -138,9 +138,9 @@ async def openai_complete_if_cache(
    base_url: str | None = None,
    api_key: str | None = None,
    token_tracker: Any | None = None,
-    keyword_extraction: bool = False,  # Will be removed from kwargs before passing to OpenAI
    stream: bool | None = None,
    timeout: int | None = None,
+    keyword_extraction: bool = False,
    **kwargs: Any,
 ) -> str:
    """Complete a prompt using OpenAI's API with caching support and Chain of Thought (COT) integration.
@ -170,14 +170,15 @@ async def openai_complete_if_cache(
        api_key: Optional OpenAI API key. If None, uses the OPENAI_API_KEY environment variable.
        token_tracker: Optional token usage tracker for monitoring API usage.
        enable_cot: Whether to enable Chain of Thought (COT) processing. Default is False.
+        stream: Whether to stream the response. Default is None, in which case the parameter is not sent to the API.
+        timeout: Request timeout in seconds. Default is None.
+        keyword_extraction: Whether to enable keyword extraction mode. When True, triggers
+            special response formatting for keyword extraction. Default is False.
        **kwargs: Additional keyword arguments to pass to the OpenAI API.
            Special kwargs:
            - openai_client_configs: Dict of configuration options for the AsyncOpenAI client.
                These will be passed to the client constructor but will be overridden by
                explicit parameters (api_key, base_url).
-            - keyword_extraction: Will be removed from kwargs before passing to OpenAI.
-            - stream: Whether to stream the response. Default is False.
-            - timeout: Request timeout in seconds. Default is None.

    Returns:
        The completed text (with integrated COT content if available) or an async iterator
@ -198,7 +199,6 @@ async def openai_complete_if_cache(

    # Remove special kwargs that shouldn't be passed to OpenAI
    kwargs.pop("hashing_kv", None)
-    kwargs.pop("keyword_extraction", None)

    # Extract client configuration options
    client_configs = kwargs.pop("openai_client_configs", {})
@ -228,6 +228,12 @@ async def openai_complete_if_cache(

    messages = kwargs.pop("messages", messages)

+    # Add explicit parameters back to kwargs so they're passed to OpenAI API
+    if stream is not None:
+        kwargs["stream"] = stream
+    if timeout is not None:
+        kwargs["timeout"] = timeout
+
    try:
        # Don't use async with context manager, use client directly
        if "response_format" in kwargs:
@ -516,7 +522,6 @@ async def openai_complete(
 ) -> Union[str, AsyncIterator[str]]:
    if history_messages is None:
        history_messages = []
-    keyword_extraction = kwargs.pop("keyword_extraction", None)
    if keyword_extraction:
        kwargs["response_format"] = "json"
    model_name = kwargs["hashing_kv"].global_config["llm_model_name"]
@ -525,6 +530,7 @@ async def openai_complete(
        prompt,
        system_prompt=system_prompt,
        history_messages=history_messages,
+        keyword_extraction=keyword_extraction,
        **kwargs,
    )

@ -539,7 +545,6 @@ async def gpt_4o_complete(
 ) -> str:
    if history_messages is None:
        history_messages = []
-    keyword_extraction = kwargs.pop("keyword_extraction", None)
    if keyword_extraction:
        kwargs["response_format"] = GPTKeywordExtractionFormat
    return await openai_complete_if_cache(
@ -548,6 +553,7 @@ async def gpt_4o_complete(
        system_prompt=system_prompt,
        history_messages=history_messages,
        enable_cot=enable_cot,
+        keyword_extraction=keyword_extraction,
        **kwargs,
    )

@ -562,7 +568,6 @@ async def gpt_4o_mini_complete(
 ) -> str:
    if history_messages is None:
        history_messages = []
-    keyword_extraction = kwargs.pop("keyword_extraction", None)
    if keyword_extraction:
        kwargs["response_format"] = GPTKeywordExtractionFormat
    return await openai_complete_if_cache(
@ -571,6 +576,7 @@ async def gpt_4o_mini_complete(
        system_prompt=system_prompt,
        history_messages=history_messages,
        enable_cot=enable_cot,
+        keyword_extraction=keyword_extraction,
        **kwargs,
    )

@ -585,13 +591,13 @@ async def nvidia_openai_complete(
 ) -> str:
    if history_messages is None:
        history_messages = []
-    kwargs.pop("keyword_extraction", None)
    result = await openai_complete_if_cache(
        "nvidia/llama-3.1-nemotron-70b-instruct",  # context length 128k
        prompt,
        system_prompt=system_prompt,
        history_messages=history_messages,
        enable_cot=enable_cot,
+        keyword_extraction=keyword_extraction,
        base_url="https://integrate.api.nvidia.com/v1",
        **kwargs,
    )
@ -613,6 +619,7 @@ async def openai_embed(
    model: str = "text-embedding-3-small",
    base_url: str | None = None,
    api_key: str | None = None,
+    embedding_dim: int | None = None,
    client_configs: dict[str, Any] | None = None,
    token_tracker: Any | None = None,
 ) -> np.ndarray:
@ -623,6 +630,12 @@ async def openai_embed(
        model: The OpenAI embedding model to use.
        base_url: Optional base URL for the OpenAI API.
        api_key: Optional OpenAI API key. If None, uses the OPENAI_API_KEY environment variable.
+        embedding_dim: Optional embedding dimension for dynamic dimension reduction.
+            **IMPORTANT**: This parameter is automatically injected by the EmbeddingFunc wrapper.
+            Do NOT manually pass this parameter when calling the function directly.
+            The dimension is controlled by the @wrap_embedding_func_with_attrs decorator.
+            Manually passing a different value will trigger a warning and be ignored.
+            When provided (by EmbeddingFunc), it will be passed to the OpenAI API for dimension reduction.
        client_configs: Additional configuration options for the AsyncOpenAI client.
            These will override any default configurations but will be overridden by
            explicit parameters (api_key, base_url).
@ -642,9 +655,19 @@ async def openai_embed(
    )

    async with openai_async_client:
-        response = await openai_async_client.embeddings.create(
-            model=model, input=texts, encoding_format="base64"
-        )
+        # Prepare API call parameters
+        api_params = {
+            "model": model,
+            "input": texts,
+            "encoding_format": "base64",
+        }
+
+        # Add dimensions parameter only if embedding_dim is provided
+        if embedding_dim is not None:
+            api_params["dimensions"] = embedding_dim
+
+        # Make API call
+        response = await openai_async_client.embeddings.create(**api_params)

        if token_tracker and hasattr(response, "usage"):
            token_counts = {
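
`openai_embed` now builds its request arguments as a dict and adds `dimensions` only when `embedding_dim` is set. A minimal sketch of that pattern on its own, assuming a hypothetical helper name `build_embedding_params` and no network call:

```python
from typing import Any, Optional


def build_embedding_params(
    model: str,
    texts: list[str],
    embedding_dim: Optional[int] = None,
) -> dict[str, Any]:
    """Assemble keyword arguments for an embeddings request."""
    params: dict[str, Any] = {
        "model": model,
        "input": texts,
        "encoding_format": "base64",
    }
    # Only send "dimensions" when a value was given, so the request stays
    # identical to the previous behavior when no dimension is configured.
    if embedding_dim is not None:
        params["dimensions"] = embedding_dim
    return params


print(build_embedding_params("text-embedding-3-small", ["hello"]))
print(build_embedding_params("text-embedding-3-small", ["hello"], embedding_dim=256))
```

The resulting dict can be unpacked into the client call with `**params`, as the diff does with `embeddings.create(**api_params)`.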
--- a/lightrag/operate.py
+++ b/lightrag/operate.py
@ -3425,10 +3425,10 @@ async def _perform_kg_search(
    )
    query_embedding = None
    if query and (kg_chunk_pick_method == "VECTOR" or chunks_vdb):
-        embedding_func_config = text_chunks_db.embedding_func
-        if embedding_func_config and embedding_func_config.func:
+        actual_embedding_func = text_chunks_db.embedding_func
+        if actual_embedding_func:
            try:
-                query_embedding = await embedding_func_config.func([query])
+                query_embedding = await actual_embedding_func([query])
                query_embedding = query_embedding[
                    0
                ]  # Extract first embedding from batch result
@ -4336,25 +4336,21 @@ async def _find_related_text_unit_from_entities(
        num_of_chunks = int(max_related_chunks * len(entities_with_chunks) / 2)

        # Get embedding function from global config
-        embedding_func_config = text_chunks_db.embedding_func
-        if not embedding_func_config:
+        actual_embedding_func = text_chunks_db.embedding_func
+        if not actual_embedding_func:
            logger.warning("No embedding function found, falling back to WEIGHT method")
            kg_chunk_pick_method = "WEIGHT"
        else:
            try:
-                actual_embedding_func = embedding_func_config.func
-
-                selected_chunk_ids = None
-                if actual_embedding_func:
-                    selected_chunk_ids = await pick_by_vector_similarity(
-                        query=query,
-                        text_chunks_storage=text_chunks_db,
-                        chunks_vdb=chunks_vdb,
-                        num_of_chunks=num_of_chunks,
-                        entity_info=entities_with_chunks,
-                        embedding_func=actual_embedding_func,
-                        query_embedding=query_embedding,
-                    )
+                selected_chunk_ids = await pick_by_vector_similarity(
+                    query=query,
+                    text_chunks_storage=text_chunks_db,
+                    chunks_vdb=chunks_vdb,
+                    num_of_chunks=num_of_chunks,
+                    entity_info=entities_with_chunks,
+                    embedding_func=actual_embedding_func,
+                    query_embedding=query_embedding,
+                )

                if selected_chunk_ids == []:
                    kg_chunk_pick_method = "WEIGHT"
@ -4629,24 +4625,21 @@ async def _find_related_text_unit_from_relations(
        num_of_chunks = int(max_related_chunks * len(relations_with_chunks) / 2)

        # Get embedding function from global config
-        embedding_func_config = text_chunks_db.embedding_func
-        if not embedding_func_config:
+        actual_embedding_func = text_chunks_db.embedding_func
+        if not actual_embedding_func:
            logger.warning("No embedding function found, falling back to WEIGHT method")
            kg_chunk_pick_method = "WEIGHT"
        else:
            try:
-                actual_embedding_func = embedding_func_config.func
-
-                if actual_embedding_func:
-                    selected_chunk_ids = await pick_by_vector_similarity(
-                        query=query,
-                        text_chunks_storage=text_chunks_db,
-                        chunks_vdb=chunks_vdb,
-                        num_of_chunks=num_of_chunks,
-                        entity_info=relations_with_chunks,
-                        embedding_func=actual_embedding_func,
-                        query_embedding=query_embedding,
-                    )
+                selected_chunk_ids = await pick_by_vector_similarity(
+                    query=query,
+                    text_chunks_storage=text_chunks_db,
+                    chunks_vdb=chunks_vdb,
+                    num_of_chunks=num_of_chunks,
+                    entity_info=relations_with_chunks,
+                    embedding_func=actual_embedding_func,
+                    query_embedding=query_embedding,
+                )

                if selected_chunk_ids == []:
                    kg_chunk_pick_method = "WEIGHT"
--- a/lightrag/tools/README_CLEAN_LLM_QUERY_CACHE.md
+++ b/lightrag/tools/README_CLEAN_LLM_QUERY_CACHE.md
@ -0,0 +1,661 @@
+# LLM Query Cache Cleanup Tool - User Guide
+
+## Overview
+
+This tool deletes LightRAG's LLM query caches from a KV storage backend. It targets the caches written during RAG query operations (modes: `mix`, `hybrid`, `local`, `global`), covering both query-result and keyword-extraction caches.
+
+## Supported Storage Types
+
+1. **JsonKVStorage** - File-based JSON storage
+2. **RedisKVStorage** - Redis database storage
+3. **PGKVStorage** - PostgreSQL database storage
+4. **MongoKVStorage** - MongoDB database storage
+
+## Cache Types
+
+The tool cleans up the following query cache types:
+
+### Query Cache Modes (4 types)
+- `mix:*` - Mixed mode query caches
+- `hybrid:*` - Hybrid mode query caches
+- `local:*` - Local mode query caches
+- `global:*` - Global mode query caches
+
+### Cache Content Types (2 types)
+- `*:query:*` - Query result caches
+- `*:keywords:*` - Keywords extraction caches
+
+### Cache Key Format
+```
+<mode>:<cache_type>:<hash>
+```
+
+Examples:
+- `mix:query:5ce04d25e957c290216cee5bfe6344fa`
+- `mix:keywords:fee77b98244a0b047ce95e21060de60e`
+- `global:query:abc123def456...`
+- `local:keywords:789xyz...`
+
+**Important Note**: This tool does NOT clean extraction caches (`default:extract:*` and `default:summary:*`). Use the migration tool or manual deletion for those caches.
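+
+As a rough illustration of the key format above, the following sketch splits a key into its three parts and decides whether it falls within the scope of this tool. The names `CacheKey` and `parse_query_cache_key` are hypothetical and are not taken from the tool's source.
+
+```python
+from typing import NamedTuple, Optional
+
+# Modes and cache types listed in this guide.
+QUERY_MODES = {"mix", "hybrid", "local", "global"}
+CACHE_TYPES = {"query", "keywords"}
+
+
+class CacheKey(NamedTuple):
+    mode: str
+    cache_type: str
+    digest: str
+
+
+def parse_query_cache_key(key: str) -> Optional[CacheKey]:
+    """Return the parsed key if it looks like a query cache key, else None."""
+    parts = key.split(":", 2)  # <mode>:<cache_type>:<hash>
+    if len(parts) != 3:
+        return None
+    mode, cache_type, digest = parts
+    if mode in QUERY_MODES and cache_type in CACHE_TYPES:
+        return CacheKey(mode, cache_type, digest)
+    return None
+```
+
+Under these assumptions, a `default:extract:*` key yields `None`, which mirrors the note that extraction caches are left untouched.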
+
+## Prerequisites
+
+- The tool reads storage configuration from environment variables or `config.ini`
+- Ensure the target storage is properly configured and accessible
+- Backup important data before running cleanup operations
+
+## Usage
+
+### Basic Usage
+
+Run from the LightRAG project root directory:
+
+```bash
+python -m lightrag.tools.clean_llm_query_cache
+# or
+python lightrag/tools/clean_llm_query_cache.py
+```
+
+### Interactive Workflow
+
+The tool guides you through the following steps:
+
+#### 1. Select Storage Type
+```
+============================================================
+LLM Query Cache Cleanup Tool - LightRAG
+============================================================
+
+=== Storage Setup ===
+
+Supported KV Storage Types:
+[1] JsonKVStorage
+[2] RedisKVStorage
+[3] PGKVStorage
+[4] MongoKVStorage
+
+Select storage type (1-4) (Press Enter to exit): 1
+```
+
+**Note**: You can press Enter or type `0` at any prompt to exit gracefully.
+
+#### 2. Storage Validation
+The tool will:
+- Check required environment variables
+- Auto-detect workspace configuration
+- Initialize and connect to storage
+- Verify connection status
+
+```
+Checking configuration...
+✓ All required environment variables are set
+
+Initializing storage...
+- Storage Type: JsonKVStorage
+- Workspace: space1
+- Connection Status: ✓ Success
+```
+
+#### 3. View Cache Statistics
+
+The tool displays a detailed breakdown of query caches by mode and type:
+
+```
+Counting query cache records...
+
+📊 Query Cache Statistics (Before Cleanup):
+┌────────────┬────────────┬────────────┬────────────┐
+│ Mode       │ Query      │ Keywords   │ Total      │
+├────────────┼────────────┼────────────┼────────────┤
+│ mix        │      1,234 │        567 │      1,801 │
+│ hybrid     │        890 │        423 │      1,313 │
+│ local      │      2,345 │      1,123 │      3,468 │
+│ global     │        678 │        345 │      1,023 │
+├────────────┼────────────┼────────────┼────────────┤
+│ Total      │      5,147 │      2,458 │      7,605 │
+└────────────┴────────────┴────────────┴────────────┘
+```
+
+#### 4. Select Cleanup Scope
+
+Choose what type of caches to delete:
+
+```
+=== Cleanup Options ===
+[1] Delete all query caches (both query and keywords)
+[2] Delete query caches only (keep keywords)
+[3] Delete keywords caches only (keep query)
+[0] Cancel
+
+Select cleanup option (0-3): 1
+```
+
+**Cleanup Types:**
+- **Option 1 (all)**: Deletes both query and keywords caches across all modes
+- **Option 2 (query)**: Deletes only query caches, preserves keywords caches
+- **Option 3 (keywords)**: Deletes only keywords caches, preserves query caches
+
+#### 5. Confirm Deletion
+
+Review the cleanup plan and confirm:
+
+```
+============================================================
+Cleanup Confirmation
+============================================================
+Storage: JsonKVStorage (workspace: space1)
+Cleanup Type: all
+Records to Delete: 7,605 / 7,605
+
+⚠️  WARNING: This will delete ALL query caches across all modes!
+
+Continue with deletion? (y/n): y
+```
+
+#### 6. Execute Cleanup
+
+The tool performs batch deletion with real-time progress:
+
+**JsonKVStorage Example:**
+```
+=== Starting Cleanup ===
+💡 Processing 1,000 records at a time from JsonKVStorage
+
+Batch 1/8: ████░░░░░░░░░░░░░░░░ 1,000/7,605 (13.1%) ✓
+Batch 2/8: ████████░░░░░░░░░░░░ 2,000/7,605 (26.3%) ✓
+...
+Batch 8/8: ████████████████████ 7,605/7,605 (100.0%) ✓
+
+Persisting changes to storage...
+✓ Changes persisted successfully
+```
+
+**RedisKVStorage Example:**
+```
+=== Starting Cleanup ===
+💡 Processing Redis keys in batches of 1,000
+
+Batch 1: Deleted 1,000 keys (Total: 1,000) ✓
+Batch 2: Deleted 1,000 keys (Total: 2,000) ✓
+...
+```
+
+**PostgreSQL Example:**
+```
+=== Starting Cleanup ===
+💡 Executing PostgreSQL DELETE query
+
+✓ Deleted 7,605 records in 0.45s
+```
+
+**MongoDB Example:**
+```
+=== Starting Cleanup ===
+💡 Executing MongoDB deleteMany operations
+
+Pattern 1/8: Deleted 1,234 records ✓
+Pattern 2/8: Deleted 567 records ✓
+...
+Total deleted: 7,605 records
+```
+
+#### 7. Review Cleanup Report
+
+The tool provides a comprehensive final report:
+
+**Successful Cleanup:**
+```
+============================================================
+Cleanup Complete - Final Report
+============================================================
+
+📊 Statistics:
+  Total records to delete:  7,605
+  Total batches:            8
+  Successful batches:       8
+  Failed batches:           0
+  Successfully deleted:     7,605
+  Failed to delete:         0
+  Success rate:             100.00%
+
+📈 Before/After Comparison:
+  Total caches before:      7,605
+  Total caches after:       0
+  Net reduction:            7,605
+
+============================================================
+✓ SUCCESS: All records cleaned up successfully!
+============================================================
+
+📊 Query Cache Statistics (After Cleanup):
+┌────────────┬────────────┬────────────┬────────────┐
+│ Mode       │ Query      │ Keywords   │ Total      │
+├────────────┼────────────┼────────────┼────────────┤
+│ mix        │          0 │          0 │          0 │
+│ hybrid     │          0 │          0 │          0 │
+│ local      │          0 │          0 │          0 │
+│ global     │          0 │          0 │          0 │
+├────────────┼────────────┼────────────┼────────────┤
+│ Total      │          0 │          0 │          0 │
+└────────────┴────────────┴────────────┴────────────┘
+```
+
+**Cleanup with Errors:**
+```
+============================================================
+Cleanup Complete - Final Report
+============================================================
+
+📊 Statistics:
+  Total records to delete:  7,605
+  Total batches:            8
+  Successful batches:       7
+  Failed batches:           1
+  Successfully deleted:     6,605
+  Failed to delete:         1,000
+  Success rate:             86.85%
+
+📈 Before/After Comparison:
+  Total caches before:      7,605
+  Total caches after:       1,000
+  Net reduction:            6,605
+
+⚠️  Errors encountered: 1
+
+Error Details:
+------------------------------------------------------------
+
+Error Summary:
+  - ConnectionError: 1 occurrence(s)
+
+First 5 errors:
+
+  1. Batch 3
+     Type: ConnectionError
+     Message: Connection timeout after 30s
+     Records lost: 1,000
+
+============================================================
+⚠️  WARNING: Cleanup completed with errors!
+   Please review the error details above.
+============================================================
+```
+
+## Technical Details
+
+### Workspace Handling
+
+The tool retrieves workspace in the following priority order:
+
+1. **Storage-specific workspace environment variables**
+   - PGKVStorage: `POSTGRES_WORKSPACE`
+   - MongoKVStorage: `MONGODB_WORKSPACE`
+   - RedisKVStorage: `REDIS_WORKSPACE`
+
+2. **Generic workspace environment variable**
+   - `WORKSPACE`
+
+3. **Default value**
+   - Empty string (uses storage's default workspace)
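+
+The priority order above can be sketched as a small lookup. This is a minimal illustration, not the tool's actual code: `resolve_workspace` is a hypothetical helper, and treating an empty storage-specific variable as "not set" is an assumption.
+
+```python
+import os
+from typing import Mapping, Optional
+
+# Storage-specific workspace variables named in this guide.
+WORKSPACE_ENV_VARS = {
+    "PGKVStorage": "POSTGRES_WORKSPACE",
+    "MongoKVStorage": "MONGODB_WORKSPACE",
+    "RedisKVStorage": "REDIS_WORKSPACE",
+}
+
+
+def resolve_workspace(storage_name: str, env: Optional[Mapping[str, str]] = None) -> str:
+    """Storage-specific variable first, then WORKSPACE, then empty string."""
+    env = os.environ if env is None else env
+    specific_var = WORKSPACE_ENV_VARS.get(storage_name)
+    # Assumption: an empty storage-specific value falls through to WORKSPACE.
+    if specific_var and env.get(specific_var):
+        return env[specific_var]
+    return env.get("WORKSPACE", "")
+```
+
+Passing `env` explicitly keeps the helper easy to test without touching the real process environment.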
+
+### Batch Deletion
+
+- Default batch size: 1000 records/batch
+- Prevents memory overflow and connection timeouts
+- Each batch is processed independently
+- Failed batches are logged but don't stop cleanup
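+
+A simplified, synchronous sketch of this batching behaviour is shown below. The real tool is asynchronous and storage-specific; `iter_batches`, `delete_in_batches`, and the `delete_batch` callback are hypothetical names used only for illustration.
+
+```python
+from typing import Callable, Dict, Iterator, List, Sequence
+
+
+def iter_batches(keys: Sequence[str], batch_size: int = 1000) -> Iterator[List[str]]:
+    """Yield consecutive slices of at most batch_size keys."""
+    for start in range(0, len(keys), batch_size):
+        yield list(keys[start:start + batch_size])
+
+
+def delete_in_batches(
+    keys: Sequence[str],
+    delete_batch: Callable[[List[str]], None],
+    batch_size: int = 1000,
+) -> Dict[str, object]:
+    """Delete keys batch by batch; a failed batch is recorded, not fatal."""
+    stats: Dict[str, object] = {"deleted": 0, "failed": 0, "errors": []}
+    for number, batch in enumerate(iter_batches(keys, batch_size), start=1):
+        try:
+            delete_batch(batch)
+            stats["deleted"] += len(batch)
+        except Exception as exc:  # keep going so later batches still run
+            stats["failed"] += len(batch)
+            stats["errors"].append((number, type(exc).__name__, str(exc)))
+    return stats
+```
+
+Catching the exception per batch is what lets the remaining batches proceed after a failure, at the cost of leaving the failed batch's records in place.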
+
+### Storage-Specific Deletion Strategies
+
+#### JsonKVStorage
+- Collects all matching keys first (snapshot approach)
+- Deletes in batches with lock protection
+- Fast in-memory operations
+
+#### RedisKVStorage
+- Uses SCAN with pattern matching
+- Pipeline DELETE for batch operations
+- Cursor-based iteration for large datasets
+
+#### PostgreSQL
+- Single DELETE query with OR conditions
+- Efficient server-side bulk deletion
+- Uses LIKE patterns for mode/type matching
+
+#### MongoDB
+- Multiple deleteMany operations (one per pattern)
+- Regex-based document matching
+- Returns exact deletion counts
+
+### Pattern Matching Implementation
+
+**JsonKVStorage:**
+```python
+# Direct key prefix matching; str.startswith accepts a tuple of prefixes
+is_match = key.startswith(("mix:query:", "mix:keywords:"))
+```
+
+**RedisKVStorage:**
+```python
+# SCAN with namespace-prefixed patterns
+pattern = f"{namespace}:mix:query:*"
+cursor, keys = await redis.scan(cursor, match=pattern)
+```
+
+**PostgreSQL:**
+```sql
+-- SQL LIKE conditions
+WHERE id LIKE 'mix:query:%' OR id LIKE 'mix:keywords:%'
+```
+
+**MongoDB:**
+```python
+# Regex queries on _id field
+{"_id": {"$regex": "^mix:query:"}}
+```
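+
+The per-storage patterns above all derive from the same set of key prefixes. The sketch below shows one way to generate them for each cleanup option; the helper names and the `CLEANUP_TYPES` mapping are assumptions made for illustration and may differ from the tool's implementation.
+
+```python
+import re
+from itertools import product
+from typing import Dict, List
+
+MODES = ("mix", "hybrid", "local", "global")
+# Cleanup option -> cache types it covers (labels follow the menu in this guide).
+CLEANUP_TYPES = {
+    "all": ("query", "keywords"),
+    "query": ("query",),
+    "keywords": ("keywords",),
+}
+
+
+def key_prefixes(cleanup_type: str) -> List[str]:
+    """Build '<mode>:<cache_type>:' prefixes for the chosen cleanup option."""
+    return [
+        f"{mode}:{cache_type}:"
+        for mode, cache_type in product(MODES, CLEANUP_TYPES[cleanup_type])
+    ]
+
+
+def to_like_pattern(prefix: str) -> str:
+    """SQL LIKE pattern matching any key that starts with the prefix."""
+    return prefix + "%"
+
+
+def to_redis_pattern(namespace: str, prefix: str) -> str:
+    """Redis SCAN MATCH pattern with the namespace prepended."""
+    return f"{namespace}:{prefix}*"
+
+
+def to_mongo_filter(prefix: str) -> Dict[str, Dict[str, str]]:
+    """MongoDB filter anchoring the prefix at the start of _id."""
+    return {"_id": {"$regex": "^" + re.escape(prefix)}}
+```
+
+With the "all" option this produces eight prefixes (four modes times two cache types), which lines up with the `Pattern 1/8` counter in the MongoDB example output.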
+
+## Error Handling & Resilience
+
+The tool implements comprehensive error tracking:
+
+### Batch-Level Error Tracking
+- Each batch is independently error-checked
+- Failed batches are logged with full details
+- Successful batches commit even if later batches fail
+- Real-time progress shows ✓ (success) or ✗ (failed)
+
+### Error Reporting
+After cleanup completes, a detailed report includes:
+- **Statistics**: Total records, success/failure counts, success rate
+- **Before/After Comparison**: Net reduction in cache count
+- **Error Summary**: Grouped by error type with occurrence counts
+- **Error Details**: Batch number, error type, message, and records lost
+- **Recommendations**: Clear indication of success or need for review
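+
+To make the report fields concrete, here is a minimal sketch of how the statistics and the grouped error summary could be derived. `build_report` and the error dictionary layout are hypothetical; only the field meanings come from the report shown earlier.
+
+```python
+from collections import Counter
+from typing import Dict, List
+
+
+def build_report(total: int, deleted: int, errors: List[Dict[str, object]]) -> Dict[str, object]:
+    """Summarise a cleanup run: counts, success rate, errors grouped by type."""
+    # Reporting 0.00% for an empty run is an arbitrary choice in this sketch.
+    success_rate = (deleted / total * 100) if total else 0.0
+    error_summary = Counter(str(err["type"]) for err in errors)
+    return {
+        "failed": total - deleted,
+        "success_rate": f"{success_rate:.2f}%",
+        "error_summary": dict(error_summary),
+        "first_errors": errors[:5],  # the report lists at most five errors
+    }
+```
+
+Feeding in the numbers from the "Cleanup with Errors" example (7,605 records, 6,605 deleted, one `ConnectionError`) reproduces the 86.85% success rate shown there.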
+
+### Verification
+- Post-cleanup count verification
+- Before/after statistics comparison
+- Identifies partial cleanup scenarios
+
+## Important Notes
+
+1. **Irreversible Operation**
+   - Deleted caches cannot be recovered
+   - Always backup important data before cleanup
+   - Test on non-production data first
+
+2. **Performance Impact**
+   - Query performance may degrade temporarily after cleanup
+   - Caches will rebuild on subsequent queries
+   - Consider cleanup during off-peak hours
+
+3. **Selective Cleanup**
+   - Choose cleanup scope carefully
+   - Keywords caches may be valuable for future queries
+   - Query caches rebuild faster than keywords caches
+
+4. **Workspace Isolation**
+   - Cleanup only affects the selected workspace
+   - Other workspaces remain untouched
+   - Verify workspace before confirming cleanup
+
+5. **Interrupt and Resume**
+   - Cleanup can be interrupted at any time (Ctrl+C)
+   - Already deleted records cannot be recovered
+   - No automatic resume - must run tool again
+
+## Storage Configuration
+
+The tool supports multiple configuration methods with the following priority:
+
+1. **Environment variables** (highest priority)
+2. **config.ini file** (medium priority)
+3. **Default values** (lowest priority)
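+
+The three-level priority can be expressed as a short lookup chain. The sketch below uses the standard-library `configparser`; whether the tool itself reads `config.ini` this way is an assumption, and `get_setting` is a hypothetical helper.
+
+```python
+import configparser
+import os
+from typing import Mapping, Optional
+
+
+def get_setting(
+    env_var: str,
+    section: str,
+    option: str,
+    default: str,
+    env: Optional[Mapping[str, str]] = None,
+    config: Optional[configparser.ConfigParser] = None,
+) -> str:
+    """Resolve a setting: environment variable, then config.ini, then default."""
+    env = os.environ if env is None else env
+    if env_var in env:
+        return env[env_var]
+    if config is not None and config.has_option(section, option):
+        return config.get(section, option)
+    return default
+```
+
+For example, `get_setting("REDIS_URI", "redis", "uri", "redis://localhost:6379")` would return the environment value when `REDIS_URI` is set and otherwise fall back to the default, since no `config` object is passed.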
+
+### Environment Variable Configuration
+
+Configure storage settings in your `.env` file:
+
+#### Workspace Configuration (Optional)
+
+```bash
+# Generic workspace (shared by all storages)
+WORKSPACE=space1
+
+# Or configure independent workspace for specific storage
+POSTGRES_WORKSPACE=pg_space
+MONGODB_WORKSPACE=mongo_space
+REDIS_WORKSPACE=redis_space
+```
+
+**Workspace Priority**: Storage-specific > Generic WORKSPACE > Empty string
+
+#### JsonKVStorage
+
+```bash
+WORKING_DIR=./rag_storage
+```
+
+#### RedisKVStorage
+
+```bash
+REDIS_URI=redis://localhost:6379
+```
+
+#### PGKVStorage
+
+```bash
+POSTGRES_HOST=localhost
+POSTGRES_PORT=5432
+POSTGRES_USER=your_username
+POSTGRES_PASSWORD=your_password
+POSTGRES_DATABASE=your_database
+```
+
+#### MongoKVStorage
+
+```bash
+MONGO_URI=mongodb://root:root@localhost:27017/
+MONGO_DATABASE=LightRAG
+```
+
+### config.ini Configuration
+
+Alternatively, create a `config.ini` file in the project root:
+
+```ini
+[redis]
+uri = redis://localhost:6379
+
+[postgres]
+host = localhost
+port = 5432
+user = postgres
+password = yourpassword
+database = lightrag
+
+[mongodb]
+uri = mongodb://root:root@localhost:27017/
+database = LightRAG
+```
+
+**Note**: Environment variables take precedence over config.ini settings.
+
+## Troubleshooting
+
+### Missing Environment Variables
+```
+⚠️  Warning: Missing environment variables: POSTGRES_USER, POSTGRES_PASSWORD
+```
+**Solution**: Add missing variables to your `.env` file or configure in `config.ini`
+
+### Connection Failed
+```
+✗ Initialization failed: Connection refused
+```
+**Solutions**:
+- Check if database service is running
+- Verify connection parameters (host, port, credentials)
+- Check firewall settings
+- Ensure network connectivity for remote databases
+
+### No Caches Found
+```
+⚠️  No query caches found in storage
+```
+**Possible Reasons**:
+- No queries have been run yet
+- Caches were already cleaned
+- Wrong workspace selected
+- Different storage type was used for queries
+
+### Partial Cleanup
+```
+⚠️  WARNING: Cleanup completed with errors!
+```
+**Solutions**:
+- Check error details in the report
+- Verify storage connection stability
+- Re-run tool to clean remaining caches
+- Check storage capacity and permissions
+
+## Use Cases
+
+### Use Case 1: Clean All Query Caches
+
+**Scenario**: Free up storage space by removing all query caches
+
+```bash
+# Run tool
+python -m lightrag.tools.clean_llm_query_cache
+
+# Select: Storage type -> Option 1 (all) -> Confirm (y)
+```
+
+**Result**: All query and keywords caches deleted, maximum storage freed
+
+### Use Case 2: Refresh Query Caches Only
+
+**Scenario**: Force query cache rebuild while keeping keywords
+
+```bash
+# Run tool
+python -m lightrag.tools.clean_llm_query_cache
+
+# Select: Storage type -> Option 2 (query only) -> Confirm (y)
+```
+
+**Result**: Query caches deleted, keywords preserved for faster rebuild
+
+### Use Case 3: Clean Stale Keywords
+
+**Scenario**: Remove outdated keywords while keeping recent query results
+
+```bash
+# Run tool
+python -m lightrag.tools.clean_llm_query_cache
+
+# Select: Storage type -> Option 3 (keywords only) -> Confirm (y)
+```
+
+**Result**: Keywords deleted, query caches preserved
+
+### Use Case 4: Workspace-Specific Cleanup
+
+**Scenario**: Clean caches for a specific workspace
+
+```bash
+# Configure workspace
+export WORKSPACE=development
+
+# Run tool
+python -m lightrag.tools.clean_llm_query_cache
+
+# Select: Storage type -> Cleanup option -> Confirm (y)
+```
+
+**Result**: Only development workspace caches cleaned
+
+## Best Practices
+
+1. **Backup Before Cleanup**
+   - Always backup your storage before major cleanup
+   - Test cleanup on non-production data first
+   - Document cleanup decisions
+
+2. **Monitor Performance**
+   - Watch storage metrics during cleanup
+   - Monitor query performance after cleanup
+   - Allow time for cache rebuild
+
+3. **Scheduled Cleanup**
+   - Clean caches periodically (weekly/monthly)
+   - Automate cleanup for development environments
+   - Keep production cleanup manual for safety
+
+4. **Selective Deletion**
+   - Consider cleanup scope based on needs
+   - Keywords caches are harder to rebuild
+   - Query caches rebuild automatically
+
+5. **Storage Capacity**
+   - Monitor storage usage trends
+   - Clean caches before reaching capacity limits
+   - Archive old data if needed
+
+## Comparison with Migration Tool
+
+| Feature | Cleanup Tool | Migration Tool |
+|---------|-------------|----------------|
+| **Purpose** | Delete query caches | Migrate extraction caches |
+| **Modes** | mix, hybrid, local, global | default |
+| **Cache Types** | query, keywords | extract, summary |
+| **Operation** | Deletion | Copy between storages |
+| **Reversible** | No | Yes (source unchanged) |
+| **Use Case** | Free storage, refresh caches | Change storage backend |
+
+## Limitations
+
+1. **Single Storage Operation**
+   - Can only clean one storage type at a time
+   - To clean multiple storages, run tool multiple times
+
+2. **No Dry Run Mode**
+   - Deletion is immediate after confirmation
+   - No preview-only mode available
+   - Test on non-production first
+
+3. **No Selective Mode Cleanup**
+   - Cannot clean only specific modes (e.g., only `mix`)
+   - Cleanup applies to all modes for selected cache type
+   - All-or-nothing per cache type
+
+4. **No Scheduled Cleanup**
+   - Manual execution required
+   - No built-in scheduling
+   - Use cron/scheduler if automation needed
+
+5. **Verification Limitations**
+   - Post-cleanup verification may fail in error scenarios
+   - Manual verification recommended for critical operations
+
+## Future Enhancements
+
+Potential improvements for future versions:
+
+- Selective mode cleanup (e.g., clean only `mix` mode)
+- Age-based cleanup (delete caches older than X days)
+- Size-based cleanup (delete largest caches first)
+- Dry run mode for safe preview
+- Automated scheduling support
+- Cache statistics export
+- Incremental cleanup with pause/resume
+
+## Support
+
+For issues, questions, or feature requests:
+- Check the error details in the cleanup report
+- Review storage configuration
+- Verify workspace settings
+- Test with a small dataset first
+- Report bugs through project issue tracker
--- a/lightrag/tools/README_MIGRATE_LLM_CACHE.md
+++ b/lightrag/tools/README_MIGRATE_LLM_CACHE.md
@ -0,0 +1,471 @@
+# LLM Cache Migration Tool - User Guide
+
+## Overview
+
+This tool migrates LightRAG's LLM response cache between different KV storage implementations. It specifically migrates caches generated during file extraction (mode `default`), including entity extraction and summary caches.
+
+## Supported Storage Types
+
+1. **JsonKVStorage** - File-based JSON storage
+2. **RedisKVStorage** - Redis database storage
+3. **PGKVStorage** - PostgreSQL database storage
+4. **MongoKVStorage** - MongoDB database storage
+
+## Cache Types
+
+The tool migrates the following cache types:
+- `default:extract:*` - Entity and relationship extraction caches
+- `default:summary:*` - Entity and relationship summary caches
+
+**Note**: Query caches (modes such as `mix`, `hybrid`, `local`, and `global`) are NOT migrated.
+
+## Prerequisites
+
+The tool reads the same storage configuration as the LightRAG Server and prompts you to choose a source and a target storage. Before migrating, make sure both storages are correctly configured and accessible.
+
+## Usage
+
+### Basic Usage
+
+Run from the LightRAG project root directory:
+
+```bash
+python -m lightrag.tools.migrate_llm_cache
+# or
+python lightrag/tools/migrate_llm_cache.py
+```
+
+### Interactive Workflow
+
+The tool guides you through the following steps:
+
+#### 1. Select Source Storage Type
+```
+Supported KV Storage Types:
+[1] JsonKVStorage
+[2] RedisKVStorage
+[3] PGKVStorage
+[4] MongoKVStorage
+
+Select Source storage type (1-4) (Press Enter to exit): 1
+```
+
+**Note**: You can press Enter or type `0` at any storage selection prompt to exit gracefully.
+
+#### 2. Source Storage Validation
+The tool will:
+- Check required environment variables
+- Auto-detect workspace configuration
+- Initialize and connect to storage
+- Count cache records available for migration
+
+```
+Checking environment variables...
+✓ All required environment variables are set
+
+Initializing Source storage...
+- Storage Type: JsonKVStorage
+- Workspace: space1
+- Connection Status: ✓ Success
+
+Counting cache records...
+- Total: 8,734 records
+```
+
+**Progress Display by Storage Type:**
+- **JsonKVStorage**: Fast in-memory counting, displays final count without incremental progress
+  ```
+  Counting cache records...
+  - Total: 8,734 records
+  ```
+- **RedisKVStorage**: Real-time scanning progress with incremental counts
+  ```
+  Scanning Redis keys... found 8,734 records
+  ```
+- **PostgreSQL**: Quick COUNT(*) query, shows timing only if operation takes >1 second
+  ```
+  Counting PostgreSQL records... (took 2.3s)
+  ```
+- **MongoDB**: Fast count_documents(), shows timing only if operation takes >1 second
+  ```
+  Counting MongoDB documents... (took 1.8s)
+  ```
+
+#### 3. Select Target Storage Type
+
+The tool automatically excludes the source storage type from the target selection and renumbers the remaining options sequentially:
+
+```
+Available Storage Types for Target (source: JsonKVStorage excluded):
+[1] RedisKVStorage
+[2] PGKVStorage
+[3] MongoKVStorage
+
+Select Target storage type (1-3) (Press Enter or 0 to exit): 3
+```
+
+**Important Notes:**
+- You **cannot** select the same storage type for both source and target
+- Options are automatically renumbered (e.g., [1], [2], [3] instead of [2], [3], [4])
+- You can press Enter or type `0` to exit at this point as well
+
+The tool then validates the target storage following the same process as the source (checking environment variables, initializing connection, counting records).
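+
+The exclusion and renumbering described in this step can be sketched in a few lines. `STORAGE_TYPES` follows the order of the source menu; `target_menu` is a hypothetical helper, not a function from the tool.
+
+```python
+from typing import Dict
+
+# Same order as the source selection menu.
+STORAGE_TYPES = ["JsonKVStorage", "RedisKVStorage", "PGKVStorage", "MongoKVStorage"]
+
+
+def target_menu(source: str) -> Dict[int, str]:
+    """Drop the source storage and renumber the remaining options from 1."""
+    remaining = [name for name in STORAGE_TYPES if name != source]
+    return dict(enumerate(remaining, start=1))
+```
+
+With `JsonKVStorage` as the source, the menu maps 1, 2, and 3 to `RedisKVStorage`, `PGKVStorage`, and `MongoKVStorage`, matching the listing above.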
+
+#### 4. Confirm Migration
+
+```
+==================================================
+Migration Confirmation
+Source: JsonKVStorage (workspace: space1) - 8,734 records
+Target: MongoKVStorage (workspace: space1) - 0 records
+Batch Size: 1,000 records/batch
+Memory Mode: Streaming (memory-optimized)
+
+⚠️  Warning: Target storage already has 0 records
+Migration will overwrite records with the same keys
+
+Continue? (y/n): y
+```
+
+#### 5. Execute Migration
+
+The tool uses **streaming migration** by default for memory efficiency. Observe migration progress:
+
+```
+=== Starting Streaming Migration ===
+💡 Memory-optimized mode: Processing 1,000 records at a time
+
+Batch 1/9: ████████░░░░░░░░░░░░ 1000/8734 (11.4%) - default:extract ✓
+Batch 2/9: ████████████░░░░░░░░ 2000/8734 (22.9%) - default:extract ✓
+...
+Batch 9/9: ████████████████████ 8734/8734 (100.0%) - default:summary ✓
+
+Persisting data to disk...
+✓ Data persisted successfully
+```
+
+**Key Features:**
+- **Streaming mode**: Processes data in batches without loading entire dataset into memory
+- **Real-time progress**: Shows progress bar with precise percentage and cache type
+- **Success indicators**: ✓ for successful batches, ✗ for failed batches
+- **Constant memory usage**: Handles millions of records efficiently
+
+#### 6. Review Migration Report
+
+The tool provides a comprehensive final report showing statistics and any errors encountered:
+
+**Successful Migration:**
+```
+Migration Complete - Final Report
+
+📊 Statistics:
+  Total source records:    8,734
+  Total batches:           9
+  Successful batches:      9
+  Failed batches:          0
+  Successfully migrated:   8,734
+  Failed to migrate:       0
+  Success rate:            100.00%
+
+✓ SUCCESS: All records migrated successfully!
+```
+
+**Migration with Errors:**
+```
+Migration Complete - Final Report
+
+📊 Statistics:
+  Total source records:    8,734
+  Total batches:           9
+  Successful batches:      8
+  Failed batches:          1
+  Successfully migrated:   7,734
+  Failed to migrate:       1,000
+  Success rate:            88.55%
+
+⚠️  Errors encountered: 1
+
+Error Details:
+------------------------------------------------------------
+
+Error Summary:
+  - ConnectionError: 1 occurrence(s)
+
+First 5 errors:
+
+  1. Batch 2
+     Type: ConnectionError
+     Message: Connection timeout after 30s
+     Records lost: 1,000
+
+⚠️  WARNING: Migration completed with errors!
+   Please review the error details above.
+```
+
+## Technical Details
+
+### Workspace Handling
+
+The tool retrieves workspace in the following priority order:
+
+1. **Storage-specific workspace environment variables**
+   - PGKVStorage: `POSTGRES_WORKSPACE`
+   - MongoKVStorage: `MONGODB_WORKSPACE`
+   - RedisKVStorage: `REDIS_WORKSPACE`
+
+2. **Generic workspace environment variable**
+   - `WORKSPACE`
+
+3. **Default value**
+   - Empty string (uses storage's default workspace)
+
+### Batch Migration
+
+- Default batch size: 1000 records/batch
+- Avoids memory overflow from loading too much data at once
+- Each batch is committed independently, so batches already written stay in the target if the migration is interrupted
+
+### Memory-Efficient Pagination
+
+For large datasets, the tool implements storage-specific pagination strategies:
+
+- **JsonKVStorage**: Direct in-memory access (data already loaded in shared storage)
+- **RedisKVStorage**: Cursor-based SCAN with pipeline batching (1000 keys/batch)
+- **PGKVStorage**: SQL LIMIT/OFFSET pagination (1000 records/batch)
+- **MongoKVStorage**: Cursor streaming with batch_size (1000 documents/batch)
+
+This ensures the tool can handle millions of cache records without memory issues.
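+
+For the in-memory case, batching can be sketched as an async generator that takes a snapshot of matching items under the lock and yields batches only after the lock is released. This is an illustrative sketch under assumptions: `stream_batches` and its parameters are hypothetical, and the Redis, PostgreSQL, and MongoDB paths page through the server instead.
+
+```python
+import asyncio
+from typing import Any, AsyncIterator, Dict, Tuple
+
+
+async def stream_batches(
+    data: Dict[str, Any],
+    lock: asyncio.Lock,
+    prefixes: Tuple[str, ...],
+    batch_size: int = 1000,
+) -> AsyncIterator[Dict[str, Any]]:
+    """Yield dicts of at most batch_size items whose keys match a prefix."""
+    # Snapshot while holding the lock, so no lock is held across a yield.
+    async with lock:
+        matching = [(k, v) for k, v in data.items() if k.startswith(prefixes)]
+
+    batch: Dict[str, Any] = {}
+    for key, value in matching:
+        batch[key] = value
+        if len(batch) >= batch_size:
+            yield batch
+            batch = {}
+    if batch:  # final partial batch
+        yield batch
+```
+
+The snapshot holds references to the matching items rather than copies of their values, so only one batch dictionary is built at a time on top of data that is already in memory.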
+
+### Prefix Filtering Implementation
+
+The tool uses optimized filtering methods for different storage types:
+
+- **JsonKVStorage**: Direct dictionary iteration with lock protection
+- **RedisKVStorage**: SCAN command with namespace-prefixed patterns + pipeline for bulk GET
+- **PGKVStorage**: SQL LIKE queries with proper field mapping (id, return_value, etc.)
+- **MongoKVStorage**: MongoDB regex queries on `_id` field with cursor streaming
+
+## Error Handling & Resilience
+
+The tool implements comprehensive error tracking to ensure transparent and resilient migrations:
+
+### Batch-Level Error Tracking
+- Each batch is independently error-checked
+- Failed batches are logged but don't stop the migration
+- Successful batches are committed even if later batches fail
+- Real-time progress shows ✓ (success) or ✗ (failed) for each batch
+
+### Error Reporting
+After migration completes, a detailed report includes:
+- **Statistics**: Total records, success/failure counts, success rate
+- **Error Summary**: Grouped by error type with occurrence counts
+- **Error Details**: Batch number, error type, message, and records lost
+- **Recommendations**: Clear indication of success or need for review
+
+### No Double Data Loading
+- Unlike traditional verification approaches, the tool does NOT reload all target data
+- Errors are detected during migration, not after
+- This eliminates memory overhead and handles pre-existing target data correctly
+
+## Important Notes
+
+1. **Data Overwrite Warning**
+   - Migration will overwrite records with the same keys in the target storage
+   - Tool displays a warning if target storage already has data
+   - Data migration can be performed repeatedly
+   - Pre-existing data in target storage is handled correctly
+2. **Interrupt and Resume**
+   - Migration can be interrupted at any time (Ctrl+C)
+   - Already migrated data will remain in target storage
+   - Re-running will overwrite existing records
+   - Failed batches can be manually retried
+3. **Performance Considerations**
+   - Large data migration may take considerable time
+   - Recommend migrating during off-peak hours
+   - Ensure stable network connection (for remote databases)
+   - Memory usage stays constant regardless of dataset size
+
+## Storage Configuration
+
+The tool supports multiple configuration methods with the following priority:
+
+1. **Environment variables** (highest priority)
+2. **config.ini file** (medium priority)
+3. **Default values** (lowest priority)
+
+### Option A: Environment Variable Configuration
+
+Configure storage settings in your `.env` file:
+
+#### Workspace Configuration (Optional)
+
+```bash
+# Generic workspace (shared by all storages)
+WORKSPACE=space1
+
+# Or configure independent workspace for specific storage
+POSTGRES_WORKSPACE=pg_space
+MONGODB_WORKSPACE=mongo_space
+REDIS_WORKSPACE=redis_space
+```
+
+**Workspace Priority**: Storage-specific > Generic WORKSPACE > Empty string
+
+#### JsonKVStorage
+
+```bash
+WORKING_DIR=./rag_storage
+```
+
+#### RedisKVStorage
+
+```bash
+REDIS_URI=redis://localhost:6379
+```
+
+#### PGKVStorage
+
+```bash
+POSTGRES_HOST=localhost
+POSTGRES_PORT=5432
+POSTGRES_USER=your_username
+POSTGRES_PASSWORD=your_password
+POSTGRES_DATABASE=your_database
+```
+
+#### MongoKVStorage
+
+```bash
+MONGO_URI=mongodb://root:root@localhost:27017/
+MONGO_DATABASE=LightRAG
+```
+
+#### Option B: config.ini Configuration
+
+Alternatively, create a `config.ini` file in the project root:
+
+```ini
+[redis]
+uri = redis://localhost:6379
+
+[postgres]
+host = localhost
+port = 5432
+user = postgres
+password = yourpassword
+database = lightrag
+
+[mongodb]
+uri = mongodb://root:root@localhost:27017/
+database = LightRAG
+```
+
+**Note**: Environment variables take precedence over config.ini settings. JsonKVStorage uses `WORKING_DIR` environment variable or defaults to `./rag_storage`.
+
+## Troubleshooting
+
+### Missing Environment Variables
+```
+✗ Missing required environment variables: POSTGRES_USER, POSTGRES_PASSWORD
+```
+**Solution**: Add missing variables to your `.env` file
+
+### Connection Failed
+```
+✗ Initialization failed: Connection refused
+```
+**Solutions**:
+- Check if database service is running
+- Verify connection parameters (host, port, credentials)
+- Check firewall settings
+
+**Solutions**:
+- Check migration process for error logs
+- Re-run migration tool
+- Check target storage capacity and permissions
+
+## Example Scenarios
+
+### Scenario 1: JSON to MongoDB Migration
+
+Use case: Migrating from single-machine development to production
+
+```bash
+# 1. Configure environment variables
+WORKSPACE=production
+MONGO_URI=mongodb://user:pass@prod-server:27017/
+MONGO_DATABASE=LightRAG
+
+# 2. Run tool
+python -m lightrag.tools.migrate_llm_cache
+
+# 3. Select: 1 (JsonKVStorage) -> 3 (MongoKVStorage - renumbered from 4)
+```
+
+**Note**: After selecting JsonKVStorage as source, MongoKVStorage will be shown as option [3] in the target selection since options are renumbered after excluding the source.
+
+### Scenario 2: Redis to PostgreSQL
+
+Use case: Migrating from cache storage to relational database
+
+```bash
+# 1. Ensure both databases are accessible
+REDIS_URI=redis://old-redis:6379
+POSTGRES_HOST=new-postgres-server
+# ... Other PostgreSQL configs
+
+# 2. Run tool
+python -m lightrag.tools.migrate_llm_cache
+
+# 3. Select: 2 (RedisKVStorage) -> 2 (PGKVStorage - renumbered from 3)
+```
+
+**Note**: After selecting RedisKVStorage as source, PGKVStorage will be shown as option [2] in the target selection.
+
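The renumbering in Scenario 2 can be sketched as below. The storage order in `STORAGE_TYPES` is an assumption inferred from the option numbers quoted in the scenarios, and `target_menu` is a hypothetical helper, not the tool's own function.

```python
from typing import Dict, List

# Assumed menu order, inferred from the scenario option numbers
STORAGE_TYPES: List[str] = [
    "JsonKVStorage",
    "RedisKVStorage",
    "PGKVStorage",
    "MongoKVStorage",
]


def target_menu(source: str) -> Dict[int, str]:
    """Return the target options, renumbered from 1 after excluding the source."""
    remaining = [name for name in STORAGE_TYPES if name != source]
    return {index: name for index, name in enumerate(remaining, start=1)}
```

Calling `target_menu("RedisKVStorage")` places `PGKVStorage` at option 2, matching the selection shown in Scenario 2.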
+### Scenario 3: Different Workspaces Migration
+
+Use case: Migrating data between different workspace environments
+
+```bash
+# Configure separate workspaces for source and target
+POSTGRES_WORKSPACE=dev_workspace  # For development environment
+MONGODB_WORKSPACE=prod_workspace  # For production environment
+
+# Run tool
+python -m lightrag.tools.migrate_llm_cache
+
+# Select: 3 (PGKVStorage with dev_workspace) -> 3 (MongoKVStorage with prod_workspace)
+```
+
+**Note**: This allows you to migrate between different logical data partitions while changing storage backends.
+
+## Tool Limitations
+
+1. **Same Storage Type Not Allowed**
+   - You cannot migrate between the same storage type (e.g., PostgreSQL to PostgreSQL)
+   - This is enforced by the tool automatically excluding the source storage type from target selection
+   - For same-storage migrations (e.g., database switches), use database-native tools instead
+2. **Only Default Mode Caches**
+   - Only migrates `default:extract:*` and `default:summary:*`
+   - Query caches are not included
+3. **Network Dependency**
+   - Tool requires stable network connection for remote databases
+   - Large datasets may fail if connection is interrupted
+
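The "Only Default Mode Caches" limitation amounts to a key-prefix filter. A minimal sketch, assuming cache keys are plain strings; `is_migrated_key` is a hypothetical name and the sample keys are made up for illustration.

```python
# Prefixes taken from the limitation above
MIGRATED_PREFIXES = ("default:extract:", "default:summary:")


def is_migrated_key(key: str) -> bool:
    """True for keys the migration would pick up, False for everything else."""
    # str.startswith accepts a tuple and matches any of its prefixes
    return key.startswith(MIGRATED_PREFIXES)
```

Under this rule a key such as `default:extract:abc123` is selected, while any key outside the two prefixes, including query cache entries, is skipped.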
+## Best Practices
+
+1. **Backup Before Migration**
+   - Always backup your data before migration
+   - Test migration on non-production data first
+
+2. **Verify Results**
+   - Check the verification output after migration
+   - Manually verify a few cache entries if needed
+
+3. **Monitor Performance**
+   - Watch database resource usage during migration
+   - Consider migrating in smaller batches if needed
+
+4. **Clean Old Data**
+   - After successful migration, consider cleaning old cache data
+   - Keep backups for a reasonable period before deletion
--- a/lightrag/tools/clean_llm_query_cache.py
+++ b/lightrag/tools/clean_llm_query_cache.py
--- a/lightrag/tools/migrate_llm_cache.py
+++ b/lightrag/tools/migrate_llm_cache.py
--- a/lightrag/utils.py
+++ b/lightrag/utils.py
@ -56,6 +56,9 @@ if not logger.handlers:
 # Set httpx logging level to WARNING
 logging.getLogger("httpx").setLevel(logging.WARNING)

+# Precompile regex pattern for JSON sanitization (module-level, compiled once)
+_SURROGATE_PATTERN = re.compile(r"[\uD800-\uDFFF\uFFFE\uFFFF]")
+
 # Global import for pypinyin with startup-time logging
 try:
    import pypinyin
@ -353,8 +356,29 @@ class EmbeddingFunc:
    embedding_dim: int
    func: callable
    max_token_size: int | None = None  # deprecated keep it for compatible only
+    send_dimensions: bool = (
+        False  # Control whether to send embedding_dim to the function
+    )

    async def __call__(self, *args, **kwargs) -> np.ndarray:
+        # Only inject embedding_dim when send_dimensions is True
+        if self.send_dimensions:
+            # Check if user provided embedding_dim parameter
+            if "embedding_dim" in kwargs:
+                user_provided_dim = kwargs["embedding_dim"]
+                # If user's value differs from class attribute, output warning
+                if (
+                    user_provided_dim is not None
+                    and user_provided_dim != self.embedding_dim
+                ):
+                    logger.warning(
+                        f"Ignoring user-provided embedding_dim={user_provided_dim}, "
+                        f"using declared embedding_dim={self.embedding_dim} from decorator"
+                    )
+
+            # Inject embedding_dim from decorator
+            kwargs["embedding_dim"] = self.embedding_dim
+
        return await self.func(*args, **kwargs)
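The effect of `send_dimensions` can be tried in isolation with a stand-in class. `EmbeddingFuncSketch` and `fake_embed` are hypothetical names for this sketch; the class mirrors only the injection logic in the hunk above and omits the warning log.

```python
import asyncio
from dataclasses import dataclass
from typing import Any, Callable


@dataclass
class EmbeddingFuncSketch:
    """Stand-in that mirrors only the embedding_dim injection logic."""

    embedding_dim: int
    func: Callable[..., Any]
    send_dimensions: bool = False

    async def __call__(self, *args, **kwargs):
        if self.send_dimensions:
            # The declared dimension overrides any caller-supplied value
            kwargs["embedding_dim"] = self.embedding_dim
        return await self.func(*args, **kwargs)


async def fake_embed(texts, embedding_dim=None):
    """Hypothetical embedding function that reports what it received."""
    return {"count": len(texts), "embedding_dim": embedding_dim}


with_dim = EmbeddingFuncSketch(1024, fake_embed, send_dimensions=True)
without_dim = EmbeddingFuncSketch(1024, fake_embed)

overridden = asyncio.run(with_dim(["a"], embedding_dim=256))
untouched = asyncio.run(without_dim(["a"]))
```

With `send_dimensions=True` the wrapped function receives the declared dimension even when the caller passes a different one; with the default `False` nothing is injected.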


@ -906,9 +930,123 @@ def load_json(file_name):
        return json.load(f)


+def _sanitize_string_for_json(text: str) -> str:
+    """Remove characters that cannot be encoded in UTF-8 for JSON serialization.
+
+    Strips lone surrogate code points (U+D800-U+DFFF), which UTF-8 cannot encode, and the
+    noncharacters U+FFFE and U+FFFF. A regex search gives a fast path for clean strings (the
+    common case), which are returned unchanged; only dirty strings are rebuilt.
+
+    Args:
+        text: String to sanitize
+
+    Returns:
+        Original string if clean (zero-copy), sanitized string if dirty
+    """
+    if not text:
+        return text
+
+    # Fast path: Check if sanitization is needed using C-level regex search
+    if not _SURROGATE_PATTERN.search(text):
+        return text  # Zero-copy for clean strings - most common case
+
+    # Slow path: Remove problematic characters using C-level regex substitution
+    return _SURROGATE_PATTERN.sub("", text)
+
+
+class SanitizingJSONEncoder(json.JSONEncoder):
+    """
+    Custom JSON encoder that sanitizes data during serialization.
+
+    This encoder cleans strings during the encoding process. Containers (dicts, lists,
+    tuples) are rebuilt while sanitizing, but clean strings are reused rather than copied,
+    and the caller's data structure is never mutated.
+    """
+
+    def encode(self, o):
+        """Override encode so top-level strings are sanitized and ensure_ascii is respected"""
+        if isinstance(o, str):
+            # The parent picks the ASCII or non-ASCII string encoder based on ensure_ascii
+            return super().encode(_sanitize_string_for_json(o))
+        return super().encode(o)
+
+    def iterencode(self, o, _one_shot=False):
+        """
+        Override iterencode to sanitize strings during serialization.
+        This is the core method that handles complex nested structures.
+        """
+        # Preprocess: sanitize all strings in the object
+        sanitized = self._sanitize_for_encoding(o)
+
+        # Call parent's iterencode with sanitized data
+        for chunk in super().iterencode(sanitized, _one_shot):
+            yield chunk
+
+    def _sanitize_for_encoding(self, obj):
+        """
+        Recursively sanitize strings in an object.
+        Dicts, lists and tuples are rebuilt; clean strings and scalars are reused as-is.
+
+        Args:
+            obj: Object to sanitize
+
+        Returns:
+            Sanitized object with cleaned strings
+        """
+        if isinstance(obj, str):
+            return _sanitize_string_for_json(obj)
+
+        elif isinstance(obj, dict):
+            # Create new dict with sanitized keys and values
+            new_dict = {}
+            for k, v in obj.items():
+                clean_k = _sanitize_string_for_json(k) if isinstance(k, str) else k
+                clean_v = self._sanitize_for_encoding(v)
+                new_dict[clean_k] = clean_v
+            return new_dict
+
+        elif isinstance(obj, (list, tuple)):
+            # Sanitize list/tuple elements
+            cleaned = [self._sanitize_for_encoding(item) for item in obj]
+            return type(obj)(cleaned) if isinstance(obj, tuple) else cleaned
+
+        else:
+            # Numbers, booleans, None, etc. remain unchanged
+            return obj
+
+
 def write_json(json_obj, file_name):
+    """
+    Write JSON data to file with optimized sanitization strategy.
+
+    This function uses a two-stage approach:
+    1. Fast path: Try direct serialization (works for clean data ~99% of time)
+    2. Slow path: Use custom encoder that sanitizes during serialization
+
+    The custom encoder sanitizes while serializing and leaves json_obj itself
+    unmodified. When sanitization occurs, the caller should reload the cleaned
+    data from the file to update shared memory.
+
+    Args:
+        json_obj: Object to serialize (may be a shallow copy from shared memory)
+        file_name: Output file path
+
+    Returns:
+        bool: True if sanitization was applied (caller should reload data),
+              False if direct write succeeded (no reload needed)
+    """
+    try:
+        # Strategy 1: Fast path - try direct serialization
+        with open(file_name, "w", encoding="utf-8") as f:
+            json.dump(json_obj, f, indent=2, ensure_ascii=False)
+        return False  # No sanitization needed, no reload required
+
+    except (UnicodeEncodeError, UnicodeDecodeError) as e:
+        logger.debug(f"Direct JSON write failed, using sanitizing encoder: {e}")
+
+    # Strategy 2: Use custom encoder (sanitizes during serialization, input left unmodified)
    with open(file_name, "w", encoding="utf-8") as f:
-        json.dump(json_obj, f, indent=2, ensure_ascii=False)
+        json.dump(json_obj, f, indent=2, ensure_ascii=False, cls=SanitizingJSONEncoder)
+
+    logger.info(f"JSON sanitization applied during write: {file_name}")
+    return True  # Sanitization applied, reload recommended


 class TokenizerInterface(Protocol):
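The two-stage strategy in `write_json` relies on one fact: `json.dumps(..., ensure_ascii=False)` passes a lone surrogate through unchanged, and the failure only appears when the text is encoded as UTF-8. The standalone sketch below reproduces that behavior with the same character class as `_SURROGATE_PATTERN`; `UNENCODABLE` and `can_encode_utf8` are hypothetical names introduced here.

```python
import json
import re

# Same character class as _SURROGATE_PATTERN in the hunk above
UNENCODABLE = re.compile(r"[\uD800-\uDFFF\uFFFE\uFFFF]")


def can_encode_utf8(text: str) -> bool:
    """Report whether text survives strict UTF-8 encoding."""
    try:
        text.encode("utf-8")
        return True
    except UnicodeEncodeError:
        return False


dirty = "Hello\ud800World"  # Contains a lone high surrogate
serialized = json.dumps(dirty, ensure_ascii=False)  # Succeeds; surrogate is kept
cleaned = UNENCODABLE.sub("", dirty)  # Surrogate stripped
```

Note that U+FFFE and U+FFFF encode without error, so data containing only those characters stays on the fast path and is written as-is, which is what `test_unicode_non_characters_removed` asserts.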
--- a/lightrag_webui/src/components/retrieval/QuerySettings.tsx
+++ b/lightrag_webui/src/components/retrieval/QuerySettings.tsx
@ -40,7 +40,6 @@ export default function QuerySettings() {
  // Default values for reset functionality
  const defaultValues = useMemo(() => ({
    mode: 'mix' as QueryMode,
-    response_type: 'Multiple Paragraphs',
    top_k: 40,
    chunk_top_k: 20,
    max_entity_tokens: 6000,
@ -153,46 +152,6 @@ export default function QuerySettings() {
              </div>
            </>

-            {/* Response Format */}
-            <>
-              <TooltipProvider>
-                <Tooltip>
-                  <TooltipTrigger asChild>
-                    <label htmlFor="response_format_select" className="ml-1 cursor-help">
-                      {t('retrievePanel.querySettings.responseFormat')}
-                    </label>
-                  </TooltipTrigger>
-                  <TooltipContent side="left">
-                    <p>{t('retrievePanel.querySettings.responseFormatTooltip')}</p>
-                  </TooltipContent>
-                </Tooltip>
-              </TooltipProvider>
-              <div className="flex items-center gap-1">
-                <Select
-                  value={querySettings.response_type}
-                  onValueChange={(v) => handleChange('response_type', v)}
-                >
-                  <SelectTrigger
-                    id="response_format_select"
-                    className="hover:bg-primary/5 h-9 cursor-pointer focus:ring-0 focus:ring-offset-0 focus:outline-0 active:right-0 flex-1 text-left [&>span]:break-all [&>span]:line-clamp-1"
-                  >
-                    <SelectValue />
-                  </SelectTrigger>
-                  <SelectContent>
-                    <SelectGroup>
-                      <SelectItem value="Multiple Paragraphs">{t('retrievePanel.querySettings.responseFormatOptions.multipleParagraphs')}</SelectItem>
-                      <SelectItem value="Single Paragraph">{t('retrievePanel.querySettings.responseFormatOptions.singleParagraph')}</SelectItem>
-                      <SelectItem value="Bullet Points">{t('retrievePanel.querySettings.responseFormatOptions.bulletPoints')}</SelectItem>
-                    </SelectGroup>
-                  </SelectContent>
-                </Select>
-                <ResetButton
-                  onClick={() => handleReset('response_type')}
-                  title="Reset to default (Multiple Paragraphs)"
-                />
-              </div>
-            </>
-
            {/* Top K */}
            <>
              <TooltipProvider>
--- a/lightrag_webui/src/features/RetrievalTesting.tsx
+++ b/lightrag_webui/src/features/RetrievalTesting.tsx
@ -357,6 +357,7 @@ export default function RetrievalTesting() {
      const queryParams = {
        ...state.querySettings,
        query: actualQuery,
+        response_type: 'Multiple Paragraphs',
        conversation_history: effectiveHistoryTurns > 0
          ? prevMessages
            .filter((m) => m.isError !== true)
--- a/lightrag_webui/src/stores/settings.ts
+++ b/lightrag_webui/src/stores/settings.ts
@ -123,7 +123,6 @@ const useSettingsStoreBase = create<SettingsState>()(

      querySettings: {
        mode: 'global',
-        response_type: 'Multiple Paragraphs',
        top_k: 40,
        chunk_top_k: 20,
        max_entity_tokens: 6000,
@ -141,12 +140,6 @@ const useSettingsStoreBase = create<SettingsState>()(

      setLanguage: (language: Language) => {
        set({ language })
-        // Update i18n after state is updated
-        import('i18next').then(({ default: i18n }) => {
-          if (i18n.language !== language) {
-            i18n.changeLanguage(language)
-          }
-        })
      },

      setGraphLayoutMaxIterations: (iterations: number) =>
@ -245,7 +238,7 @@ const useSettingsStoreBase = create<SettingsState>()(
    {
      name: 'settings-storage',
      storage: createJSONStorage(() => localStorage),
-      version: 18,
+      version: 19,
      migrate: (state: any, version: number) => {
        if (version < 2) {
          state.showEdgeLabel = false
@ -342,6 +335,12 @@ const useSettingsStoreBase = create<SettingsState>()(
          // Add userPromptHistory field for older versions
          state.userPromptHistory = []
        }
+        if (version < 19) {
+          // Remove deprecated response_type parameter
+          if (state.querySettings) {
+            delete state.querySettings.response_type
+          }
+        }
        return state
      }
    }
--- a/pyproject.toml
+++ b/pyproject.toml
@ -24,11 +24,12 @@ dependencies = [
    "aiohttp",
    "configparser",
    "future",
+    "google-api-core>=2.0.0,<3.0.0",
    "google-genai>=1.0.0,<2.0.0",
    "json_repair",
    "nano-vectordb",
    "networkx",
-    "numpy",
+    "numpy>=1.24.0,<2.0.0",
    "pandas>=2.0.0,<2.4.0",
    "pipmaster",
    "pydantic",
@ -49,7 +50,7 @@ api = [
    "json_repair",
    "nano-vectordb",
    "networkx",
-    "numpy",
+    "numpy>=1.24.0,<2.0.0",
    "openai>=1.0.0,<3.0.0",
    "pandas>=2.0.0,<2.4.0",
    "pipmaster",
@ -60,6 +61,7 @@ api = [
    "tenacity",
    "tiktoken",
    "xlsxwriter>=3.1.0",
+    "google-api-core>=2.0.0,<3.0.0",
    "google-genai>=1.0.0,<2.0.0",
    # API-specific dependencies
    "aiofiles",
@ -77,18 +79,23 @@ api = [
    "python-multipart",
    "pytz",
    "uvicorn",
+    "gunicorn",
+    # Document processing dependencies (required for API document upload functionality)
+    "openpyxl>=3.0.0,<4.0.0",      # XLSX processing
+    "pycryptodome>=3.0.0,<4.0.0",  # PDF encryption support
+    "pypdf>=6.1.0",                 # PDF processing
+    "python-docx>=0.8.11,<2.0.0",  # DOCX processing
+    "python-pptx>=0.6.21,<2.0.0",  # PPTX processing
+]
+
+# Advanced document processing engine (optional)
+docling = [
+    # On macOS, PyTorch and frameworks that use Objective-C are not fork-safe
+    # and are incompatible with gunicorn multi-worker mode
+    "docling>=2.0.0,<3.0.0; sys_platform != 'darwin'",
 ]

 # Offline deployment dependencies (layered design for flexibility)
-offline-docs = [
-    # Document processing dependencies
-    "openpyxl>=3.0.0,<4.0.0",
-    "pycryptodome>=3.0.0,<4.0.0",
-    "pypdf2>=3.0.0",
-    "python-docx>=0.8.11,<2.0.0",
-    "python-pptx>=0.6.21,<2.0.0",
-]
-
 offline-storage = [
    # Storage backend dependencies
    "redis>=5.0.0,<8.0.0",
@ -96,8 +103,8 @@ offline-storage = [
    "pymilvus>=2.6.2,<3.0.0",
    "pymongo>=4.0.0,<5.0.0",
    "asyncpg>=0.29.0,<1.0.0",
-    "qdrant-client>=1.7.0,<2.0.0",
    "pyTigerGraph>=1.9.0,<2.0.0",
+    "qdrant-client>=1.11.0,<2.0.0",
 ]

 offline-llm = [
@ -109,12 +116,13 @@ offline-llm = [
    "aioboto3>=12.0.0,<16.0.0",
    "voyageai>=0.2.0,<1.0.0",
    "llama-index>=0.9.0,<1.0.0",
+    "google-api-core>=2.0.0,<3.0.0",
    "google-genai>=1.0.0,<2.0.0",
 ]

 offline = [
-    # Complete offline package (includes all offline dependencies)
-    "lightrag-hku[offline-docs,offline-storage,offline-llm]",
+    # Complete offline package (includes api for document processing, plus storage and LLM)
+    "lightrag-hku[api,offline-storage,offline-llm]",
 ]

 evaluation = [
--- a/requirements-offline-docs.txt
+++ b/requirements-offline-docs.txt
@ -1,15 +0,0 @@
-# LightRAG Offline Dependencies - Document Processing
-# Install with: pip install -r requirements-offline-docs.txt
-# For offline installation:
-#   pip download -r requirements-offline-docs.txt -d ./packages
-#   pip install --no-index --find-links=./packages -r requirements-offline-docs.txt
-#
-# Recommended: Use pip install lightrag-hku[offline-docs] for the same effect
-# Or use constraints: pip install --constraint constraints-offline.txt -r requirements-offline-docs.txt
-
-# Document processing dependencies (with version constraints matching pyproject.toml)
-openpyxl>=3.0.0,<4.0.0
-pycryptodome>=3.0.0,<4.0.0
-pypdf2>=3.0.0
-python-docx>=0.8.11,<2.0.0
-python-pptx>=0.6.21,<2.0.0
--- a/requirements-offline-llm.txt
+++ b/requirements-offline-llm.txt
@ -10,6 +10,7 @@
 # LLM provider dependencies (with version constraints matching pyproject.toml)
 aioboto3>=12.0.0,<16.0.0
 anthropic>=0.18.0,<1.0.0
+google-api-core>=2.0.0,<3.0.0
 google-genai>=1.0.0,<2.0.0
 llama-index>=0.9.0,<1.0.0
 ollama>=0.1.0,<1.0.0
--- a/requirements-offline-storage.txt
+++ b/requirements-offline-storage.txt
@ -13,5 +13,5 @@ neo4j>=5.0.0,<7.0.0
 pymilvus>=2.6.2,<3.0.0
 pymongo>=4.0.0,<5.0.0
 pyTigerGraph>=1.9.0,<2.0.0
-qdrant-client>=1.7.0,<2.0.0
+qdrant-client>=1.11.0,<2.0.0
 redis>=5.0.0,<8.0.0
--- a/requirements-offline.txt
+++ b/requirements-offline.txt
@ -24,10 +24,10 @@ openpyxl>=3.0.0,<4.0.0
 pycryptodome>=3.0.0,<4.0.0
 pymilvus>=2.6.2,<3.0.0
 pymongo>=4.0.0,<5.0.0
-pypdf2>=3.0.0
+pypdf>=6.1.0
 python-docx>=0.8.11,<2.0.0
 python-pptx>=0.6.21,<2.0.0
-qdrant-client>=1.7.0,<2.0.0
+qdrant-client>=1.11.0,<2.0.0
 redis>=5.0.0,<8.0.0
 voyageai>=0.2.0,<1.0.0
 zhipuai>=2.0.0,<3.0.0
--- a/tests/test_write_json_optimization.py
+++ b/tests/test_write_json_optimization.py
@ -0,0 +1,387 @@
+"""
+Test suite for write_json optimization
+
+This test verifies:
+1. Fast path works for clean data (no sanitization)
+2. Slow path applies sanitization for dirty data
+3. Sanitization is done during encoding (memory-efficient)
+4. Reloading updates shared memory with cleaned data
+"""
+
+import os
+import json
+import tempfile
+from lightrag.utils import write_json, load_json, SanitizingJSONEncoder
+
+
+class TestWriteJsonOptimization:
+    """Test write_json optimization with two-stage approach"""
+
+    def test_fast_path_clean_data(self):
+        """Test that clean data takes the fast path without sanitization"""
+        clean_data = {
+            "name": "John Doe",
+            "age": 30,
+            "items": ["apple", "banana", "cherry"],
+            "nested": {"key": "value", "number": 42},
+        }
+
+        with tempfile.NamedTemporaryFile(mode="w", delete=False, suffix=".json") as f:
+            temp_file = f.name
+
+        try:
+            # Write clean data - should return False (no sanitization)
+            needs_reload = write_json(clean_data, temp_file)
+            assert not needs_reload, "Clean data should not require sanitization"
+
+            # Verify data was written correctly
+            loaded_data = load_json(temp_file)
+            assert loaded_data == clean_data, "Loaded data should match original"
+        finally:
+            os.unlink(temp_file)
+
+    def test_slow_path_dirty_data(self):
+        """Test that dirty data triggers sanitization"""
+        # Create data with surrogate characters (U+D800 to U+DFFF)
+        dirty_string = "Hello\ud800World"  # Contains surrogate character
+        dirty_data = {"text": dirty_string, "number": 123}
+
+        with tempfile.NamedTemporaryFile(mode="w", delete=False, suffix=".json") as f:
+            temp_file = f.name
+
+        try:
+            # Write dirty data - should return True (sanitization applied)
+            needs_reload = write_json(dirty_data, temp_file)
+            assert needs_reload, "Dirty data should trigger sanitization"
+
+            # Verify data was written and sanitized
+            loaded_data = load_json(temp_file)
+            assert loaded_data is not None, "Data should be written"
+            assert loaded_data["number"] == 123, "Clean fields should remain unchanged"
+            # Surrogate character should be removed
+            assert (
+                "\ud800" not in loaded_data["text"]
+            ), "Surrogate character should be removed"
+        finally:
+            os.unlink(temp_file)
+
+    def test_sanitizing_encoder_removes_surrogates(self):
+        """Test that SanitizingJSONEncoder removes surrogate characters"""
+        data_with_surrogates = {
+            "text": "Hello\ud800\udc00World",  # Contains surrogate pair
+            "clean": "Clean text",
+            "nested": {"dirty_key\ud801": "value", "clean_key": "clean\ud802value"},
+        }
+
+        # Encode using custom encoder
+        encoded = json.dumps(
+            data_with_surrogates, cls=SanitizingJSONEncoder, ensure_ascii=False
+        )
+
+        # Verify no surrogate characters in output
+        assert "\ud800" not in encoded, "Surrogate U+D800 should be removed"
+        assert "\udc00" not in encoded, "Surrogate U+DC00 should be removed"
+        assert "\ud801" not in encoded, "Surrogate U+D801 should be removed"
+        assert "\ud802" not in encoded, "Surrogate U+D802 should be removed"
+
+        # Verify clean parts remain
+        assert "Clean text" in encoded, "Clean text should remain"
+        assert "clean_key" in encoded, "Clean keys should remain"
+
+    def test_nested_structure_sanitization(self):
+        """Test sanitization of deeply nested structures"""
+        nested_data = {
+            "level1": {
+                "level2": {
+                    "level3": {"dirty": "text\ud800here", "clean": "normal text"},
+                    "list": ["item1", "item\ud801dirty", "item3"],
+                }
+            }
+        }
+
+        with tempfile.NamedTemporaryFile(mode="w", delete=False, suffix=".json") as f:
+            temp_file = f.name
+
+        try:
+            needs_reload = write_json(nested_data, temp_file)
+            assert needs_reload, "Nested dirty data should trigger sanitization"
+
+            # Verify nested structure is preserved
+            loaded_data = load_json(temp_file)
+            assert "level1" in loaded_data
+            assert "level2" in loaded_data["level1"]
+            assert "level3" in loaded_data["level1"]["level2"]
+
+            # Verify surrogates are removed
+            dirty_text = loaded_data["level1"]["level2"]["level3"]["dirty"]
+            assert "\ud800" not in dirty_text, "Nested surrogate should be removed"
+
+            # Verify list items are sanitized
+            list_items = loaded_data["level1"]["level2"]["list"]
+            assert (
+                "\ud801" not in list_items[1]
+            ), "List item surrogates should be removed"
+        finally:
+            os.unlink(temp_file)
+
+    def test_unicode_non_characters_removed(self):
+        """Test that Unicode non-characters (U+FFFE, U+FFFF) don't cause encoding errors
+
+        Note: U+FFFE and U+FFFF are valid UTF-8 characters (though discouraged),
+        so they don't trigger sanitization. They only get removed when explicitly
+        using the SanitizingJSONEncoder.
+        """
+        data_with_nonchars = {"text1": "Hello\ufffeWorld", "text2": "Test\uffffString"}
+
+        with tempfile.NamedTemporaryFile(mode="w", delete=False, suffix=".json") as f:
+            temp_file = f.name
+
+        try:
+            # These characters are valid UTF-8, so they take the fast path
+            needs_reload = write_json(data_with_nonchars, temp_file)
+            assert not needs_reload, "U+FFFE/U+FFFF are valid UTF-8 characters"
+
+            loaded_data = load_json(temp_file)
+            # They're written as-is in the fast path
+            assert loaded_data == data_with_nonchars
+        finally:
+            os.unlink(temp_file)
+
+    def test_mixed_clean_dirty_data(self):
+        """Test data with both clean and dirty fields"""
+        mixed_data = {
+            "clean_field": "This is perfectly fine",
+            "dirty_field": "This has\ud800issues",
+            "number": 42,
+            "boolean": True,
+            "null_value": None,
+            "clean_list": [1, 2, 3],
+            "dirty_list": ["clean", "dirty\ud801item"],
+        }
+
+        with tempfile.NamedTemporaryFile(mode="w", delete=False, suffix=".json") as f:
+            temp_file = f.name
+
+        try:
+            needs_reload = write_json(mixed_data, temp_file)
+            assert (
+                needs_reload
+            ), "Mixed data with dirty fields should trigger sanitization"
+
+            loaded_data = load_json(temp_file)
+
+            # Clean fields should remain unchanged
+            assert loaded_data["clean_field"] == "This is perfectly fine"
+            assert loaded_data["number"] == 42
+            assert loaded_data["boolean"]
+            assert loaded_data["null_value"] is None
+            assert loaded_data["clean_list"] == [1, 2, 3]
+
+            # Dirty fields should be sanitized
+            assert "\ud800" not in loaded_data["dirty_field"]
+            assert "\ud801" not in loaded_data["dirty_list"][1]
+        finally:
+            os.unlink(temp_file)
+
+    def test_empty_and_none_strings(self):
+        """Test handling of empty and None values"""
+        data = {
+            "empty": "",
+            "none": None,
+            "zero": 0,
+            "false": False,
+            "empty_list": [],
+            "empty_dict": {},
+        }
+
+        with tempfile.NamedTemporaryFile(mode="w", delete=False, suffix=".json") as f:
+            temp_file = f.name
+
+        try:
+            needs_reload = write_json(data, temp_file)
+            assert (
+                not needs_reload
+            ), "Clean empty values should not trigger sanitization"
+
+            loaded_data = load_json(temp_file)
+            assert loaded_data == data, "Empty/None values should be preserved"
+        finally:
+            os.unlink(temp_file)
+
+    def test_specific_surrogate_udc9a(self):
+        """Test specific surrogate character \\udc9a mentioned in the issue"""
+        # Test the exact surrogate character from the error message:
+        # UnicodeEncodeError: 'utf-8' codec can't encode character '\\udc9a'
+        data_with_udc9a = {
+            "text": "Some text with surrogate\udc9acharacter",
+            "position": 201,  # As mentioned in the error
+            "clean_field": "Normal text",
+        }
+
+        with tempfile.NamedTemporaryFile(mode="w", delete=False, suffix=".json") as f:
+            temp_file = f.name
+
+        try:
+            # Write data - should trigger sanitization
+            needs_reload = write_json(data_with_udc9a, temp_file)
+            assert needs_reload, "Data with \\udc9a should trigger sanitization"
+
+            # Verify surrogate was removed
+            loaded_data = load_json(temp_file)
+            assert loaded_data is not None
+            assert "\udc9a" not in loaded_data["text"], "\\udc9a should be removed"
+            assert (
+                loaded_data["clean_field"] == "Normal text"
+            ), "Clean fields should remain"
+        finally:
+            os.unlink(temp_file)
+
+    def test_migration_with_surrogate_sanitization(self):
+        """Test that migration process handles surrogate characters correctly
+
+        This test simulates the scenario where legacy cache contains surrogate
+        characters and ensures they are cleaned during migration.
+        """
+        # Simulate legacy cache data with surrogate characters
+        legacy_data_with_surrogates = {
+            "cache_entry_1": {
+                "return": "Result with\ud800surrogate",
+                "cache_type": "extract",
+                "original_prompt": "Some\udc9aprompt",
+            },
+            "cache_entry_2": {
+                "return": "Clean result",
+                "cache_type": "query",
+                "original_prompt": "Clean prompt",
+            },
+        }
+
+        with tempfile.NamedTemporaryFile(mode="w", delete=False, suffix=".json") as f:
+            temp_file = f.name
+
+        try:
+            # First write the dirty data directly (simulating legacy cache file)
+            # Use custom encoder to force write even with surrogates
+            with open(temp_file, "w", encoding="utf-8") as f:
+                json.dump(
+                    legacy_data_with_surrogates,
+                    f,
+                    cls=SanitizingJSONEncoder,
+                    ensure_ascii=False,
+                )
+
+            # Load and verify surrogates were cleaned during initial write
+            loaded_data = load_json(temp_file)
+            assert loaded_data is not None
+
+            # The data should be sanitized
+            assert (
+                "\ud800" not in loaded_data["cache_entry_1"]["return"]
+            ), "Surrogate in return should be removed"
+            assert (
+                "\udc9a" not in loaded_data["cache_entry_1"]["original_prompt"]
+            ), "Surrogate in prompt should be removed"
+
+            # Clean data should remain unchanged
+            assert (
+                loaded_data["cache_entry_2"]["return"] == "Clean result"
+            ), "Clean data should remain"
+
+        finally:
+            os.unlink(temp_file)
+
+    def test_empty_values_after_sanitization(self):
+        """Test that data with empty values after sanitization is properly handled
+
+        Critical edge case: When sanitization results in data with empty string values,
+        we must use 'if cleaned_data is not None' instead of 'if cleaned_data' to ensure
+        proper reload, since truthy check on dict depends on content, not just existence.
+        """
+        # Create data where ALL values are only surrogate characters
+        all_dirty_data = {
+            "key1": "\ud800\udc00\ud801",
+            "key2": "\ud802\ud803",
+        }
+
+        with tempfile.NamedTemporaryFile(mode="w", delete=False, suffix=".json") as f:
+            temp_file = f.name
+
+        try:
+            # Write dirty data - should trigger sanitization
+            needs_reload = write_json(all_dirty_data, temp_file)
+            assert needs_reload, "All-dirty data should trigger sanitization"
+
+            # Load the sanitized data
+            cleaned_data = load_json(temp_file)
+
+            # Critical assertions for the edge case
+            assert cleaned_data is not None, "Cleaned data should not be None"
+            # Sanitization removes surrogates but preserves keys with empty values
+            assert cleaned_data == {
+                "key1": "",
+                "key2": "",
+            }, "Surrogates should be removed, keys preserved"
+            # This dict is truthy because it has keys (even with empty values)
+            assert cleaned_data, "Dict with keys is truthy"
+
+            # Test the actual edge case: empty dict
+            empty_data = {}
+            needs_reload2 = write_json(empty_data, temp_file)
+            assert not needs_reload2, "Empty dict is clean"
+
+            reloaded_empty = load_json(temp_file)
+            assert reloaded_empty is not None, "Empty dict should not be None"
+            assert reloaded_empty == {}, "Empty dict should remain empty"
+            assert (
+                not reloaded_empty
+            ), "Empty dict evaluates to False (the critical check)"
+
+        finally:
+            os.unlink(temp_file)
+
+
+if __name__ == "__main__":
+    # Run every test in order, without requiring pytest
+    test = TestWriteJsonOptimization()
+    test_names = [
+        "test_fast_path_clean_data",
+        "test_slow_path_dirty_data",
+        "test_sanitizing_encoder_removes_surrogates",
+        "test_nested_structure_sanitization",
+        "test_unicode_non_characters_removed",
+        "test_mixed_clean_dirty_data",
+        "test_empty_and_none_strings",
+        "test_specific_surrogate_udc9a",
+        "test_migration_with_surrogate_sanitization",
+        "test_empty_values_after_sanitization",
+    ]
+
+    for name in test_names:
+        print(f"Running {name}...")
+        getattr(test, name)()
+        print("✓ Passed")
+
+    print("\n✅ All tests passed!")
--- a/uv.lock
+++ b/uv.lock