Update README.md
parent 2bf0d397ed
commit 1c53c5c764
2 changed files with 41 additions and 9 deletions
34 README-zh.md
@@ -135,6 +135,22 @@ pip install lightrag-hku

## Quick Start

### LLM and Technology Stack Requirements for LightRAG

LightRAG's demands on LLM capability are far higher than those of traditional RAG, because it relies on the LLM to extract entities and relationships from documents. Configuring appropriate Embedding and Reranker models is also crucial for good query performance.

- **LLM Selection**:
  - It is recommended to use an LLM with at least 32B parameters.
  - The context length should be at least 32KB, with 64KB recommended.
- **Embedding Model**:
  - A high-performance Embedding model is essential for RAG.
  - We recommend mainstream multilingual Embedding models such as `BAAI/bge-m3` and `text-embedding-3-large`.
  - **Important Note**: The Embedding model must be chosen before document indexing, and the same model must be used during the query phase.
- **Reranker Model Configuration**:
  - Configuring a Reranker model significantly improves LightRAG's retrieval quality.
  - When a Reranker is enabled, it is recommended to set "mix mode" as the default query mode.
  - We recommend mainstream Reranker models such as `BAAI/bge-reranker-v2-m3` or models from providers such as Jina; a configuration sketch follows this list.
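To make these recommendations concrete, here is a minimal configuration sketch. It assumes an OpenAI-compatible endpoint at `http://localhost:8000/v1` serving a 32B-class chat model plus `BAAI/bge-m3` embeddings; the model names, endpoint, API key, and embedding dimension are placeholders for your own deployment, not LightRAG defaults.

```python
import asyncio

from lightrag import LightRAG, QueryParam
from lightrag.kg.shared_storage import initialize_pipeline_status
from lightrag.llm.openai import openai_complete_if_cache, openai_embed
from lightrag.utils import EmbeddingFunc

BASE_URL = "http://localhost:8000/v1"  # assumed OpenAI-compatible endpoint
API_KEY = "your-api-key"               # placeholder


async def llm_model_func(prompt, system_prompt=None, history_messages=[], **kwargs):
    # A 32B-class instruct model, per the recommendation above; the name is a placeholder.
    return await openai_complete_if_cache(
        "qwen2.5-32b-instruct",
        prompt,
        system_prompt=system_prompt,
        history_messages=history_messages,
        base_url=BASE_URL,
        api_key=API_KEY,
        **kwargs,
    )


async def main():
    rag = LightRAG(
        working_dir="./rag_storage",
        llm_model_func=llm_model_func,
        embedding_func=EmbeddingFunc(
            embedding_dim=1024,   # bge-m3 outputs 1024-dimensional vectors
            max_token_size=8192,  # bge-m3 accepts inputs up to 8192 tokens
            func=lambda texts: openai_embed(
                texts, model="BAAI/bge-m3", base_url=BASE_URL, api_key=API_KEY
            ),
        ),
        # Reranker wiring is omitted here; the parameter names vary across versions.
    )
    await rag.initialize_storages()
    await initialize_pipeline_status()  # required by recent LightRAG versions

    await rag.ainsert("Your document text goes here.")
    # With a reranker configured, "mix" is the recommended default query mode.
    print(await rag.aquery("What is this document about?", param=QueryParam(mode="mix")))


if __name__ == "__main__":
    asyncio.run(main())
```

Note that the embedding model fixed here must stay the same between indexing and querying; switching it invalidates the stored vectors.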
### Using the LightRAG Server

**For more information about the LightRAG Server, please refer to [LightRAG Server](./lightrag/api/README.md).**
@@ -831,7 +847,7 @@ rag = LightRAG(
CREATE INDEX CONCURRENTLY entity_idx_node_id ON dickens."Entity" (ag_catalog.agtype_access_operator(properties, '"node_id"'::agtype));
CREATE INDEX CONCURRENTLY entity_node_id_gin_idx ON dickens."Entity" USING GIN (properties);
ALTER TABLE dickens."DIRECTED" CLUSTER ON directed_sid_idx;

-- Drop these if necessary
DROP INDEX entity_p_idx;
DROP INDEX vertex_p_idx;
@@ -1189,17 +1205,17 @@ LightRAG is now integrated with [RAG-Anything](https://github.com/HKUDS/RAG-Anything)
from lightrag.llm.openai import openai_complete_if_cache, openai_embed
from lightrag.utils import EmbeddingFunc
import os


async def load_existing_lightrag():
    # First, create or load an existing LightRAG instance
    lightrag_working_dir = "./existing_lightrag_storage"

    # Check whether a previous LightRAG instance exists
    if os.path.exists(lightrag_working_dir) and os.listdir(lightrag_working_dir):
        print("✅ Found existing LightRAG instance, loading...")
    else:
        print("❌ No existing LightRAG instance found, will create new one")

    # Create/load the LightRAG instance with your configuration
    lightrag_instance = LightRAG(
        working_dir=lightrag_working_dir,
@@ -1222,10 +1238,10 @@ LightRAG is now integrated with [RAG-Anything](https://github.com/HKUDS/RAG-Anything)
            ),
        )
    )

    # Initialize storages (this loads existing data if any is present)
    await lightrag_instance.initialize_storages()

    # Now initialize RAGAnything with the existing LightRAG instance
    rag = RAGAnything(
        lightrag=lightrag_instance,  # Pass the existing LightRAG instance
@@ -1254,20 +1270,20 @@ LightRAG is now integrated with [RAG-Anything](https://github.com/HKUDS/RAG-Anything)
        )
        # Note: working_dir, llm_model_func, embedding_func, etc. are all inherited from lightrag_instance
    )

    # Query the existing knowledge base
    result = await rag.query_with_multimodal(
        "What data has been processed in this LightRAG instance?",
        mode="hybrid"
    )
    print("Query result:", result)

    # Add new multimodal documents to the existing LightRAG instance
    await rag.process_document_complete(
        file_path="path/to/new/multimodal_document.pdf",
        output_dir="./output"
    )


if __name__ == "__main__":
    asyncio.run(load_existing_lightrag())
```
16 README.md
@@ -134,6 +134,22 @@ pip install lightrag-hku

## Quick Start

### LLM and Technology Stack Requirements for LightRAG

LightRAG's demands on the capabilities of Large Language Models (LLMs) are significantly higher than those of traditional RAG, as it requires the LLM to perform entity-relationship extraction tasks from documents. Configuring appropriate Embedding and Reranker models is also crucial for improving query performance.

- **LLM Selection**:
  - It is recommended to use an LLM with at least 32 billion parameters.
  - The context length should be at least 32KB, with 64KB being recommended.
- **Embedding Model**:
  - A high-performance Embedding model is essential for RAG.
  - We recommend using mainstream multilingual Embedding models, such as `BAAI/bge-m3` and `text-embedding-3-large`.
  - **Important Note**: The Embedding model must be determined before document indexing, and the same model must be used during the document query phase.
- **Reranker Model Configuration**:
  - Configuring a Reranker model can significantly enhance LightRAG's retrieval performance.
  - When a Reranker model is enabled, it is recommended to set "mix mode" as the default query mode.
  - We recommend using mainstream Reranker models, such as `BAAI/bge-reranker-v2-m3` or models provided by services like Jina; a standalone reranking sketch follows this list.
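To illustrate what the reranking step contributes (independently of LightRAG's own rerank hook, whose wiring varies by version), here is a minimal standalone sketch that scores query-passage pairs with `BAAI/bge-reranker-v2-m3` through the `sentence-transformers` `CrossEncoder`; the query and passages are made-up examples.

```python
# Standalone illustration of cross-encoder reranking; this is not
# LightRAG's internal rerank API.
from sentence_transformers import CrossEncoder

reranker = CrossEncoder("BAAI/bge-reranker-v2-m3")

query = "How does LightRAG extract entities from documents?"
passages = [  # stand-ins for retrieved chunks
    "LightRAG asks the LLM to extract entities and relationships from each chunk.",
    "The LightRAG Server exposes an OpenAI-compatible chat completion endpoint.",
    "Embedding models map text to dense vectors for similarity search.",
]

# Score each (query, passage) pair jointly, then reorder passages, best first.
scores = reranker.predict([(query, p) for p in passages])
for score, passage in sorted(zip(scores, passages), key=lambda x: x[0], reverse=True):
    print(f"{score:7.3f}  {passage}")
```

Because a cross-encoder reads the query and the passage together, it captures interactions that plain embedding similarity misses, which is why enabling a reranker noticeably improves retrieval quality.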
### Quick Start for LightRAG Server

* For more information about LightRAG Server, please refer to [LightRAG Server](./lightrag/api/README.md).