Merge branch 'fix-rerank-example'

commit bac946fd98

7 changed files with 142 additions and 140 deletions
README-zh.md (30 changed lines)
@@ -181,7 +181,7 @@ python examples/lightrag_openai_demo.py

 ## 使用LightRAG Core进行编程

-> 如果您希望将LightRAG集成到您的项目中,建议您使用LightRAG Server提供的REST API。LightRAG Core通常用于嵌入式应用,或供希望进行研究与评估的学者使用。
+> ⚠️ **如果您希望将LightRAG集成到您的项目中,建议您使用LightRAG Server提供的REST API**。LightRAG Core通常用于嵌入式应用,或供希望进行研究与评估的学者使用。

 ### 一个简单程序
@@ -594,31 +594,15 @@ if __name__ == "__main__":

 </details>

-### 对话历史
+### Rerank函数注入

-LightRAG现在通过对话历史功能支持多轮对话。以下是使用方法:
+为了提高检索质量,可以根据更有效的相关性评分模型对文档进行重排序。`rerank.py`文件提供了三个Reranker提供商的驱动函数:

-```python
-# 创建对话历史
-conversation_history = [
-    {"role": "user", "content": "主角对圣诞节的态度是什么?"},
-    {"role": "assistant", "content": "在故事开始时,埃比尼泽·斯克鲁奇对圣诞节持非常消极的态度..."},
-    {"role": "user", "content": "他的态度是如何改变的?"}
-]
+* **Cohere / vLLM**: `cohere_rerank`
+* **Jina AI**: `jina_rerank`
+* **Aliyun阿里云**: `ali_rerank`

-# 创建带有对话历史的查询参数
-query_param = QueryParam(
-    mode="mix",  # 或其他模式:"local"、"global"、"hybrid"
-    conversation_history=conversation_history,  # 添加对话历史
-    history_turns=3  # 考虑最近的对话轮数
-)
-
-# 进行考虑对话历史的查询
-response = rag.query(
-    "是什么导致了他性格的这种变化?",
-    param=query_param
-)
-```
+您可以将这些函数之一注入到LightRAG对象的`rerank_model_func`属性中。这将使LightRAG的查询功能能够使用注入的函数对检索到的文本块进行重新排序。有关详细用法,请参阅`examples/rerank_example.py`文件。

 ### 用户提示词 vs. 查询内容
README.md (38 changed lines)
@@ -179,11 +179,12 @@ For a streaming response implementation example, please see `examples/lightrag_o

 ## Programing with LightRAG Core

-> If you would like to integrate LightRAG into your project, we recommend utilizing the REST API provided by the LightRAG Server. LightRAG Core is typically intended for embedded applications or for researchers who wish to conduct studies and evaluations.
+> ⚠️ **If you would like to integrate LightRAG into your project, we recommend utilizing the REST API provided by the LightRAG Server**. LightRAG Core is typically intended for embedded applications or for researchers who wish to conduct studies and evaluations.

 ### ⚠️ Important: Initialization Requirements

 **LightRAG requires explicit initialization before use.** You must call both `await rag.initialize_storages()` and `await initialize_pipeline_status()` after creating a LightRAG instance, otherwise you will encounter errors like:

 - `AttributeError: __aenter__` - if storages are not initialized
 - `KeyError: 'history_messages'` - if pipeline status is not initialized
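As a quick reference for the initialization requirement quoted above, here is a minimal sketch of the required call order; the `rag` instance and its model settings are assumed to come from your own setup (see `examples/rerank_example.py` in this commit for a complete construction):

```python
from lightrag.kg.shared_storage import initialize_pipeline_status


async def start_rag(rag):
    """Run the mandatory post-construction initialization for a LightRAG instance."""
    # Without initialize_storages(): AttributeError: __aenter__
    await rag.initialize_storages()
    # Without initialize_pipeline_status(): KeyError: 'history_messages'
    await initialize_pipeline_status()
    return rag
```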
@@ -596,36 +597,15 @@ if __name__ == "__main__":

 </details>

-### Conversation History Support
+### Rerank Function Injection

+To enhance retrieval quality, documents can be re-ranked based on a more effective relevance scoring model. The `rerank.py` file provides three Reranker provider driver functions:

-LightRAG now supports multi-turn dialogue through the conversation history feature. Here's how to use it:
+* **Cohere / vLLM**: `cohere_rerank`
+* **Jina AI**: `jina_rerank`
+* **Aliyun**: `ali_rerank`

-<details>
-<summary> <b> Usage Example </b></summary>
-
-```python
-# Create conversation history
-conversation_history = [
-    {"role": "user", "content": "What is the main character's attitude towards Christmas?"},
-    {"role": "assistant", "content": "At the beginning of the story, Ebenezer Scrooge has a very negative attitude towards Christmas..."},
-    {"role": "user", "content": "How does his attitude change?"}
-]
-
-# Create query parameters with conversation history
-query_param = QueryParam(
-    mode="mix",  # or any other mode: "local", "global", "hybrid"
-    conversation_history=conversation_history,  # Add the conversation history
-)
-
-# Make a query that takes into account the conversation history
-response = rag.query(
-    "What causes this change in his character?",
-    param=query_param
-)
-```
-
-</details>
+You can inject one of these functions into the `rerank_model_func` attribute of the LightRAG object. This will enable LightRAG's query function to re-order retrieved text blocks using the injected function. For detailed usage, please refer to the `examples/rerank_example.py` file.

 ### User Prompt vs. Query
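For orientation, here is a minimal sketch of the injection pattern described in the added section. It assumes the environment variable names used elsewhere in this commit (`RERANK_MODEL`, `RERANK_BINDING_HOST`, `RERANK_BINDING_API_KEY`); `jina_rerank` or `ali_rerank` could be bound the same way:

```python
import os
from functools import partial

from lightrag.rerank import cohere_rerank

# Bind the provider settings once so the resulting callable can be handed
# to LightRAG as its rerank_model_func.
rerank_model_func = partial(
    cohere_rerank,
    model=os.getenv("RERANK_MODEL"),
    base_url=os.getenv("RERANK_BINDING_HOST"),
    api_key=os.getenv("RERANK_BINDING_API_KEY"),
)

# Then pass it when constructing the LightRAG object, e.g.:
# rag = LightRAG(..., rerank_model_func=rerank_model_func)
```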
@@ -646,8 +626,6 @@ response_default = rag.query(
 print(response_default)
 ```
-
-
 ### Insert

 <details>
env.example

@@ -96,14 +96,14 @@ RERANK_BINDING=null

 ### rerank score chunk filter(set to 0.0 to keep all chunks, 0.6 or above if LLM is not strong enought)
 # MIN_RERANK_SCORE=0.0

-### For local deployment
+### For local deployment with vLLM
 # RERANK_MODEL=BAAI/bge-reranker-v2-m3
-# RERANK_BINDING_HOST=http://localhost:8000
+# RERANK_BINDING_HOST=http://localhost:8000/v1/rerank
 # RERANK_BINDING_API_KEY=your_rerank_api_key_here

 ### Default value for Cohere AI
 # RERANK_MODEL=rerank-v3.5
-# RERANK_BINDING_HOST=https://ai.znipower.com:5017/rerank
+# RERANK_BINDING_HOST=https://api.cohere.com/v2/rerank
 # RERANK_BINDING_API_KEY=your_rerank_api_key_here

 ### Default value for Jina AI
examples/rerank_example.py

@@ -5,15 +5,21 @@ This example demonstrates how to use rerank functionality with LightRAG
 to improve retrieval quality across different query modes.

 Configuration Required:
-1. Set your LLM API key and base URL in llm_model_func()
-2. Set your embedding API key and base URL in embedding_func()
-3. Set your rerank API key and base URL in the rerank configuration
-4. Or use environment variables (.env file):
-   - RERANK_MODEL=your_rerank_model
-   - RERANK_BINDING_HOST=your_rerank_endpoint
-   - RERANK_BINDING_API_KEY=your_rerank_api_key
+1. Set your OpenAI LLM API key and base URL with env vars
+   LLM_MODEL
+   LLM_BINDING_HOST
+   LLM_BINDING_API_KEY
+2. Set your OpenAI embedding API key and base URL with env vars:
+   EMBEDDING_MODEL
+   EMBEDDING_DIM
+   EMBEDDING_BINDING_HOST
+   EMBEDDING_BINDING_API_KEY
+3. Set your vLLM deployed AI rerank model setting with env vars:
+   RERANK_MODEL
+   RERANK_BINDING_HOST
+   RERANK_BINDING_API_KEY

-Note: Rerank is now controlled per query via the 'enable_rerank' parameter (default: True)
+Note: Rerank is controlled per query via the 'enable_rerank' parameter (default: True)
 """

 import asyncio
@@ -21,11 +27,13 @@ import os
 import numpy as np

 from lightrag import LightRAG, QueryParam
-from lightrag.rerank import custom_rerank, RerankModel
 from lightrag.llm.openai import openai_complete_if_cache, openai_embed
 from lightrag.utils import EmbeddingFunc, setup_logger
 from lightrag.kg.shared_storage import initialize_pipeline_status

+from functools import partial
+from lightrag.rerank import cohere_rerank
+
 # Set up your working directory
 WORKING_DIR = "./test_rerank"
 setup_logger("test_rerank")
@@ -38,12 +46,12 @@ async def llm_model_func(
     prompt, system_prompt=None, history_messages=[], **kwargs
 ) -> str:
     return await openai_complete_if_cache(
-        "gpt-4o-mini",
+        os.getenv("LLM_MODEL"),
         prompt,
         system_prompt=system_prompt,
         history_messages=history_messages,
-        api_key="your_llm_api_key_here",
-        base_url="https://api.your-llm-provider.com/v1",
+        api_key=os.getenv("LLM_BINDING_API_KEY"),
+        base_url=os.getenv("LLM_BINDING_HOST"),
         **kwargs,
     )
@@ -51,23 +59,18 @@ async def llm_model_func(
 async def embedding_func(texts: list[str]) -> np.ndarray:
     return await openai_embed(
         texts,
-        model="text-embedding-3-large",
-        api_key="your_embedding_api_key_here",
-        base_url="https://api.your-embedding-provider.com/v1",
+        model=os.getenv("EMBEDDING_MODEL"),
+        api_key=os.getenv("EMBEDDING_BINDING_API_KEY"),
+        base_url=os.getenv("EMBEDDING_BINDING_HOST"),
     )


-async def my_rerank_func(query: str, documents: list, top_n: int = None, **kwargs):
-    """Custom rerank function with all settings included"""
-    return await custom_rerank(
-        query=query,
-        documents=documents,
-        model="BAAI/bge-reranker-v2-m3",
-        base_url="https://api.your-rerank-provider.com/v1/rerank",
-        api_key="your_rerank_api_key_here",
-        top_n=top_n or 10,
-        **kwargs,
-    )
+rerank_model_func = partial(
+    cohere_rerank,
+    model=os.getenv("RERANK_MODEL"),
+    api_key=os.getenv("RERANK_BINDING_API_KEY"),
+    base_url=os.getenv("RERANK_BINDING_HOST"),
+)


 async def create_rag_with_rerank():
@@ -88,42 +91,7 @@ async def create_rag_with_rerank():
             func=embedding_func,
         ),
         # Rerank Configuration - provide the rerank function
-        rerank_model_func=my_rerank_func,
-    )
-
-    await rag.initialize_storages()
-    await initialize_pipeline_status()
-
-    return rag
-
-
-async def create_rag_with_rerank_model():
-    """Alternative: Create LightRAG instance using RerankModel wrapper"""
-
-    # Get embedding dimension
-    test_embedding = await embedding_func(["test"])
-    embedding_dim = test_embedding.shape[1]
-    print(f"Detected embedding dimension: {embedding_dim}")
-
-    # Method 2: Using RerankModel wrapper
-    rerank_model = RerankModel(
-        rerank_func=custom_rerank,
-        kwargs={
-            "model": "BAAI/bge-reranker-v2-m3",
-            "base_url": "https://api.your-rerank-provider.com/v1/rerank",
-            "api_key": "your_rerank_api_key_here",
-        },
-    )
-
-    rag = LightRAG(
-        working_dir=WORKING_DIR,
-        llm_model_func=llm_model_func,
-        embedding_func=EmbeddingFunc(
-            embedding_dim=embedding_dim,
-            max_token_size=8192,
-            func=embedding_func,
-        ),
-        rerank_model_func=rerank_model.rerank,
+        rerank_model_func=rerank_model_func,
     )

     await rag.initialize_storages()
@@ -136,7 +104,7 @@ async def test_rerank_with_different_settings():
     """
     Test rerank functionality with different enable_rerank settings
     """
-    print("🚀 Setting up LightRAG with Rerank functionality...")
+    print("\n\n🚀 Setting up LightRAG with Rerank functionality...")

     rag = await create_rag_with_rerank()
@@ -199,11 +167,11 @@ async def test_direct_rerank():
     print("=" * 40)

     documents = [
-        {"content": "Reranking significantly improves retrieval quality"},
-        {"content": "LightRAG supports advanced reranking capabilities"},
-        {"content": "Vector search finds semantically similar documents"},
-        {"content": "Natural language processing with modern transformers"},
-        {"content": "The quick brown fox jumps over the lazy dog"},
+        "Vector search finds semantically similar documents",
+        "LightRAG supports advanced reranking capabilities",
+        "Reranking significantly improves retrieval quality",
+        "Natural language processing with modern transformers",
+        "The quick brown fox jumps over the lazy dog",
     ]

     query = "rerank improve quality"
@@ -211,20 +179,20 @@ async def test_direct_rerank():
     print(f"Documents: {len(documents)}")

     try:
-        reranked_docs = await custom_rerank(
+        reranked_results = await rerank_model_func(
             query=query,
             documents=documents,
-            model="BAAI/bge-reranker-v2-m3",
-            base_url="https://api.your-rerank-provider.com/v1/rerank",
-            api_key="your_rerank_api_key_here",
-            top_n=3,
+            top_n=4,
         )

         print("\n✅ Rerank Results:")
-        for i, doc in enumerate(reranked_docs):
-            score = doc.get("rerank_score", "N/A")
-            content = doc.get("content", "")[:60]
-            print(f"  {i+1}. Score: {score:.4f} | {content}...")
+        i = 0
+        for result in reranked_results:
+            index = result["index"]
+            score = result["relevance_score"]
+            content = documents[index]
+            print(f"  {index}. Score: {score:.4f} | {content}...")
+            i += 1

     except Exception as e:
         print(f"❌ Rerank failed: {e}")
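For readers following the loop above: the rerank driver returns a list of index/score entries rather than the documents themselves, which is why the content is looked up via `documents[index]`. A purely illustrative sketch of what `reranked_results` might look like for the five documents defined earlier (the scores are invented; only the `index`/`relevance_score` keys come from the code):

```python
# Hypothetical output shape only - the relevance scores below are made up.
reranked_results = [
    {"index": 2, "relevance_score": 0.97},  # "Reranking significantly improves retrieval quality"
    {"index": 1, "relevance_score": 0.85},  # "LightRAG supports advanced reranking capabilities"
    {"index": 0, "relevance_score": 0.42},  # "Vector search finds semantically similar documents"
    {"index": 3, "relevance_score": 0.11},  # "Natural language processing with modern transformers"
]
```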
@@ -236,12 +204,12 @@ async def main():
     print("=" * 60)

     try:
-        # Test rerank with different enable_rerank settings
-        await test_rerank_with_different_settings()
-
         # Test direct rerank
         await test_direct_rerank()

+        # Test rerank with different enable_rerank settings
+        await test_rerank_with_different_settings()
+
         print("\n✅ Example completed successfully!")
         print("\n💡 Key Points:")
         print("   ✓ Rerank is now controlled per query via 'enable_rerank' parameter")
@@ -421,6 +421,44 @@ LIGHTRAG_DOC_STATUS_STORAGE=PGDocStatusStorage
 | --embedding-binding | ollama | 嵌入绑定类型(lollms、ollama、openai、azure_openai、aws_bedrock) |
 | auto-scan-at-startup | - | 扫描输入目录中的新文件并开始索引 |

+### Reranking 配置
+
+Reranking 查询召回的块可以显著提高检索质量,它通过基于优化的相关性评分模型对文档重新排序。LightRAG 目前支持以下 rerank 提供商:
+
+- **Cohere / vLLM**:提供与 Cohere AI 的 `v2/rerank` 端点的完整 API 集成。由于 vLLM 提供了与 Cohere 兼容的 reranker API,因此也支持所有通过 vLLM 部署的 reranker 模型。
+- **Jina AI**:提供与所有 Jina rerank 模型的完全实现兼容性。
+- **阿里云**:具有旨在支持阿里云 rerank API 格式的自定义实现。
+
+Rerank 提供商通过 `.env` 文件进行配置。以下是使用 vLLM 本地部署的 rerank 模型的示例配置:
+
+```
+RERANK_BINDING=cohere
+RERANK_MODEL=BAAI/bge-reranker-v2-m3
+RERANK_BINDING_HOST=http://localhost:8000/v1/rerank
+RERANK_BINDING_API_KEY=your_rerank_api_key_here
+```
+
+以下是使用阿里云提供的 Reranker 服务的示例配置:
+
+```
+RERANK_BINDING=aliyun
+RERANK_MODEL=gte-rerank-v2
+RERANK_BINDING_HOST=https://dashscope.aliyuncs.com/api/v1/services/rerank/text-rerank/text-rerank
+RERANK_BINDING_API_KEY=your_rerank_api_key_here
+```
+
+有关完整的 reranker 配置示例,请参阅 `env.example` 文件。
+
+### 启用 Reranking
+
+可以按查询启用或禁用 Reranking。
+
+`/query` 和 `/query/stream` API 端点包含一个 `enable_rerank` 参数,默认设置为 `true`,用于控制当前查询是否激活 reranking。要将 `enable_rerank` 参数的默认值更改为 `false`,请设置以下环境变量:
+
+```
+RERANK_BY_DEFAULT=False
+```
+
 ### .env 文件示例

 ```bash
@@ -422,9 +422,43 @@ You cannot change storage implementation selection after adding documents to Lig
 | --embedding-binding | ollama | Embedding binding type (lollms, ollama, openai, azure_openai, aws_bedrock) |
 | --auto-scan-at-startup| - | Scan input directory for new files and start indexing |

-### Additional Ollama Binding Options
-
-When using `--llm-binding ollama` or `--embedding-binding ollama`, additional Ollama-specific configuration options are available. To see all available Ollama binding options, add `--help` to the command line when starting the server. These additional options allow for fine-tuning of Ollama model parameters and connection settings.
-
+### Reranking Configuration
+
+Reranking query-recalled chunks can significantly enhance retrieval quality by re-ordering documents based on an optimized relevance scoring model. LightRAG currently supports the following rerank providers:
+
+- **Cohere / vLLM**: Offers full API integration with Cohere AI's `v2/rerank` endpoint. As vLLM provides a Cohere-compatible reranker API, all reranker models deployed via vLLM are also supported.
+- **Jina AI**: Provides complete implementation compatibility with all Jina rerank models.
+- **Aliyun**: Features a custom implementation designed to support Aliyun's rerank API format.
+
+The rerank provider is configured via the `.env` file. Below is an example configuration for a rerank model deployed locally using vLLM:
+
+```
+RERANK_BINDING=cohere
+RERANK_MODEL=BAAI/bge-reranker-v2-m3
+RERANK_BINDING_HOST=http://localhost:8000/v1/rerank
+RERANK_BINDING_API_KEY=your_rerank_api_key_here
+```
+
+Here is an example configuration for utilizing the Reranker service provided by Aliyun:
+
+```
+RERANK_BINDING=aliyun
+RERANK_MODEL=gte-rerank-v2
+RERANK_BINDING_HOST=https://dashscope.aliyuncs.com/api/v1/services/rerank/text-rerank/text-rerank
+RERANK_BINDING_API_KEY=your_rerank_api_key_here
+```
+
+For comprehensive reranker configuration examples, please refer to the `env.example` file.
+
+### Enable Reranking
+
+Reranking can be enabled or disabled on a per-query basis.
+
+The `/query` and `/query/stream` API endpoints include an `enable_rerank` parameter, which is set to `true` by default, controlling whether reranking is active for the current query. To change the default value of the `enable_rerank` parameter to `false`, set the following environment variable:
+
+```
+RERANK_BY_DEFAULT=False
+```
+
 ### .env Examples
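To make the per-query switch concrete, here is a hypothetical request against a locally running LightRAG Server. The `/query` path and the `enable_rerank` field are taken from the section above; the server address, port and the exact response shape are assumptions:

```python
import json
import urllib.request

# Disable reranking for this single request; other requests keep the
# server-side default controlled by RERANK_BY_DEFAULT.
payload = {"query": "What are the top themes in the document?", "enable_rerank": False}
req = urllib.request.Request(
    "http://localhost:9621/query",  # assumed default server address and port
    data=json.dumps(payload).encode("utf-8"),
    headers={"Content-Type": "application/json"},
    method="POST",
)
with urllib.request.urlopen(req) as resp:
    print(resp.read().decode("utf-8"))
```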
@@ -469,8 +469,8 @@ class OllamaAPI:
         "/chat", dependencies=[Depends(combined_auth)], include_in_schema=True
     )
     async def chat(raw_request: Request):
-        """Process chat completion requests acting as an Ollama model
-        Routes user queries through LightRAG by selecting query mode based on prefix indicators.
+        """Process chat completion requests by acting as an Ollama model.
+        Routes user queries through LightRAG by selecting query mode based on query prefix.
         Detects and forwards OpenWebUI session-related requests (for meta data generation task) directly to LLM.
         Supports both application/json and application/octet-stream Content-Types.
         """