# LightRAG LLM Integration

> Complete guide to configuring LLM providers and embedding models

**Version**: 1.4.9.1 | **Last Updated**: December 2025

---
## Table of Contents
1. [Overview](#overview)
2. [Supported Providers](#supported-providers)
3. [OpenAI Integration](#openai-integration)
4. [Ollama Integration](#ollama-integration)
5. [Azure OpenAI](#azure-openai)
6. [AWS Bedrock](#aws-bedrock)
7. [Anthropic Claude](#anthropic-claude)
8. [Other Providers](#other-providers)
9. [Embedding Models](#embedding-models)
10. [Reranking](#reranking)
11. [Configuration Reference](#configuration-reference)
---
## Overview
LightRAG requires two core AI components:
```
┌─────────────────────────────────────────────────────────────────────┐
│                     LLM Integration Architecture                    │
├─────────────────────────────────────────────────────────────────────┤
│                                                                     │
│  ┌───────────────────────────────────────────────────────────────┐ │
│  │                         LightRAG Core                         │ │
│  └───────────────────────────────┬───────────────────────────────┘ │
│                                  │                                  │
│                  ┌───────────────┴───────────────┐                  │
│                  │                               │                  │
│                  ▼                               ▼                  │
│      ┌───────────────────────┐       ┌───────────────────────┐      │
│      │     LLM Function      │       │  Embedding Function   │      │
│      │   (Text Generation)   │       │   (Vector Creation)   │      │
│      └───────────┬───────────┘       └───────────┬───────────┘      │
│                  │                               │                  │
│                  ▼                               ▼                  │
│  ┌───────────────────────────────────────────────────────────────┐ │
│  │                       Provider Bindings                       │ │
│  │                                                               │ │
│  │  OpenAI │ Ollama │ Azure │ Bedrock │ Anthropic │ HuggingFace  │ │
│  │  Jina │ lollms │ NVIDIA │ SiliconCloud │ ZhipuAI │ ...        │ │
│  └───────────────────────────────────────────────────────────────┘ │
│                                                                     │
└─────────────────────────────────────────────────────────────────────┘
```
### LLM Function Requirements
The LLM function is used for:
- Entity/Relation extraction from text chunks
- Description summarization during merges
- Query keyword extraction
- Response generation from context
### Embedding Function Requirements
The embedding function is used for:
- Converting text chunks to vectors
- Converting entities/relations to vectors
- Query embedding for similarity search
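Both components are plain async callables, so any provider can be plugged in. Below is a minimal sketch of the two call shapes, assuming the `prompt` / `system_prompt` / `history_messages` convention used by the built-in bindings; the bodies are placeholders, not a real provider:

```python
import numpy as np

# LLM function: receives a prompt (plus optional system prompt and
# conversation history) and returns the generated text.
async def my_llm_func(
    prompt: str,
    system_prompt: str | None = None,
    history_messages: list | None = None,
    **kwargs,
) -> str:
    ...  # call your provider and return its text response

# Embedding function: receives a batch of strings and returns an
# array of shape (len(texts), embedding_dim).
async def my_embedding_func(texts: list[str]) -> np.ndarray:
    ...  # call your provider and return the stacked vectors
```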
---
## Supported Providers
| Provider | LLM | Embedding | Rerank | Module |
|----------|-----|-----------|--------|--------|
| **OpenAI** | ✅ | ✅ | ❌ | `lightrag.llm.openai` |
| **Ollama** | ✅ | ✅ | ❌ | `lightrag.llm.ollama` |
| **Azure OpenAI** | ✅ | ✅ | ❌ | `lightrag.llm.azure_openai` |
| **AWS Bedrock** | ✅ | ✅ | ❌ | `lightrag.llm.bedrock` |
| **Anthropic** | ✅ | ❌ | ❌ | `lightrag.llm.anthropic` |
| **Jina AI** | ❌ | ✅ | ✅ | `lightrag.llm.jina` |
| **HuggingFace** | ✅ | ✅ | ❌ | `lightrag.llm.hf` |
| **NVIDIA** | ✅ | ✅ | ❌ | `lightrag.llm.nvidia_openai` |
| **SiliconCloud** | ✅ | ✅ | ❌ | `lightrag.llm.siliconcloud` |
| **ZhipuAI** | ✅ | ✅ | ❌ | `lightrag.llm.zhipu` |
| **lollms** | ✅ | ✅ | ❌ | `lightrag.llm.lollms` |
| **LMDeploy** | ✅ | ❌ | ❌ | `lightrag.llm.lmdeploy` |
---
## OpenAI Integration
### Basic Setup
```python
import os

from lightrag import LightRAG
from lightrag.llm.openai import gpt_4o_mini_complete, openai_embed

# Set API key
os.environ["OPENAI_API_KEY"] = "sk-..."

rag = LightRAG(
    working_dir="./rag_storage",
    llm_model_func=gpt_4o_mini_complete,
    llm_model_name="gpt-4o-mini",
    embedding_func=openai_embed,
)
await rag.initialize_storages()
```
### Available Models
```python
from lightrag.llm.openai import (
    gpt_4o_complete,       # GPT-4o (flagship)
    gpt_4o_mini_complete,  # GPT-4o-mini (cost-effective)
    openai_complete,       # Generic OpenAI completion
    openai_embed,          # text-embedding-3-small
)
```
### Environment Variables
```bash
# Required
OPENAI_API_KEY=sk-...

# Optional
OPENAI_API_BASE=https://api.openai.com/v1
OPENAI_EMBEDDING_MODEL=text-embedding-3-small
OPENAI_EMBEDDING_DIM=1536
```
### Advanced Configuration
```python
from lightrag.llm.openai import create_openai_async_client, openai_complete_if_cache

# Custom client
client = create_openai_async_client(
    api_key="sk-...",
    base_url="https://your-proxy.com/v1",
    client_configs={"timeout": 60.0},
)

# Custom model configuration
rag = LightRAG(
    llm_model_func=lambda prompt, **kwargs: openai_complete_if_cache(
        model="gpt-4-turbo",
        prompt=prompt,
        system_prompt="You are a knowledge extraction expert.",
        **kwargs,
    ),
    llm_model_name="gpt-4-turbo",
    llm_model_kwargs={
        "temperature": 0.7,
        "max_tokens": 4096,
    },
    embedding_func=openai_embed,
)
```
---
## Ollama Integration
### Prerequisites
Install and run Ollama:
```bash
# Install Ollama
curl -fsSL https://ollama.ai/install.sh | sh

# Pull models
ollama pull llama3.2:3b
ollama pull nomic-embed-text
```
### Basic Setup
```python
from lightrag import LightRAG
from lightrag.llm.ollama import ollama_model_complete, ollama_embed
from lightrag.utils import EmbeddingFunc

rag = LightRAG(
    working_dir="./rag_storage",
    llm_model_func=ollama_model_complete,
    llm_model_name="llama3.2:3b",
    embedding_func=EmbeddingFunc(
        embedding_dim=768,  # nomic-embed-text produces 768-dim vectors
        max_token_size=8192,
        func=lambda texts: ollama_embed(texts, embed_model="nomic-embed-text"),
    ),
)
```
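The bare `ollama_embed` call is wrapped in `EmbeddingFunc` so LightRAG can read the embedding dimension and token limit off the function; see [Embedding Models](#embedding-models) for details.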
### Environment Variables
```bash
# Ollama server
LLM_BINDING_HOST=http://localhost:11434
OLLAMA_HOST=http://localhost:11434

# Model configuration
LLM_MODEL=llama3.2:3b
EMBEDDING_MODEL=nomic-embed-text
EMBEDDING_DIM=768
```
### Server Mode Configuration
For LightRAG API server with Ollama:
```bash
# Start API server with Ollama
python -m lightrag.api.lightrag_server \
--llm-binding ollama \
--llm-model llama3.2:3b \
--embedding-binding ollama \
--embedding-model nomic-embed-text \
--llm-binding-host http://localhost:11434 \
--embedding-binding-host http://localhost:11434
```
### Recommended Ollama Models
| Purpose | Model | Size | Notes |
|---------|-------|------|-------|
| **LLM (Fast)** | `llama3.2:3b` | 2GB | Good balance |
| **LLM (Quality)** | `llama3.1:8b` | 4.7GB | Better extraction |
| **LLM (Best)** | `llama3.1:70b` | 40GB | Production quality |
| **Embedding** | `nomic-embed-text` | 274MB | General purpose |
| **Embedding** | `bge-m3` | 2.2GB | Multilingual |
| **Embedding** | `mxbai-embed-large` | 669MB | High quality |
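
When switching models, update the model name and the embedding dimension together. A minimal sketch for `bge-m3` (1024 dimensions, per the table above):

```python
from lightrag import LightRAG
from lightrag.llm.ollama import ollama_model_complete, ollama_embed
from lightrag.utils import EmbeddingFunc

rag = LightRAG(
    working_dir="./rag_storage",
    llm_model_func=ollama_model_complete,
    llm_model_name="llama3.1:8b",
    embedding_func=EmbeddingFunc(
        embedding_dim=1024,  # bge-m3 produces 1024-dim vectors
        max_token_size=8192,
        func=lambda texts: ollama_embed(texts, embed_model="bge-m3"),
    ),
)
```

Changing the embedding model invalidates previously stored vectors, so re-index existing documents after a switch.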
---
## Azure OpenAI
### Setup
```python
from lightrag import LightRAG
from lightrag.llm.azure_openai import azure_openai_complete, azure_embed

rag = LightRAG(
    working_dir="./rag_storage",
    llm_model_func=azure_openai_complete,
    llm_model_name="gpt-4o-deployment",
    embedding_func=azure_embed,
)
```
### Environment Variables
```bash
# Required
AZURE_OPENAI_API_KEY=your-api-key
AZURE_OPENAI_ENDPOINT=https://your-resource.openai.azure.com
AZURE_OPENAI_API_VERSION=2024-02-01

# Deployment names
AZURE_OPENAI_DEPLOYMENT=gpt-4o-deployment
AZURE_OPENAI_EMBEDDING_DEPLOYMENT=text-embedding-deployment

# Optional
AZURE_OPENAI_EMBEDDING_DIM=1536
```
### Server Mode
```bash
python -m lightrag.api.lightrag_server \
--llm-binding azure_openai \
--llm-model gpt-4o \
--embedding-binding azure_openai \
--embedding-model text-embedding-3-small
```
---
## AWS Bedrock
### Setup
```python
from lightrag import LightRAG
from lightrag.llm.bedrock import bedrock_complete, bedrock_embed

rag = LightRAG(
    working_dir="./rag_storage",
    llm_model_func=bedrock_complete,
    llm_model_name="anthropic.claude-3-sonnet-20240229-v1:0",
    embedding_func=bedrock_embed,
)
```
### Environment Variables
```bash
# AWS credentials
AWS_ACCESS_KEY_ID=your-access-key
AWS_SECRET_ACCESS_KEY=your-secret-key
AWS_DEFAULT_REGION=us-east-1

# Bedrock configuration
BEDROCK_LLM_MODEL=anthropic.claude-3-sonnet-20240229-v1:0
BEDROCK_EMBEDDING_MODEL=amazon.titan-embed-text-v2:0
```
### Supported Bedrock Models
| Provider | Model ID | Type |
|----------|----------|------|
| Anthropic | `anthropic.claude-3-sonnet-20240229-v1:0` | LLM |
| Anthropic | `anthropic.claude-3-haiku-20240307-v1:0` | LLM |
| Amazon | `amazon.titan-text-express-v1` | LLM |
| Amazon | `amazon.titan-embed-text-v2:0` | Embedding |
| Cohere | `cohere.embed-multilingual-v3` | Embedding |
---
## Anthropic Claude
### Setup
```python
from lightrag import LightRAG
from lightrag.llm.anthropic import anthropic_complete
from lightrag.llm.openai import openai_embed  # Use OpenAI for embedding

rag = LightRAG(
    working_dir="./rag_storage",
    llm_model_func=anthropic_complete,
    llm_model_name="claude-3-5-sonnet-20241022",
    embedding_func=openai_embed,  # Anthropic doesn't offer an embedding API
)
```
### Environment Variables
```bash
ANTHROPIC_API_KEY=sk-ant-...

# Model selection
ANTHROPIC_MODEL=claude-3-5-sonnet-20241022
```
### Available Models
| Model | Context | Best For |
|-------|---------|----------|
| `claude-3-5-sonnet-20241022` | 200K | Best quality |
| `claude-3-haiku-20240307` | 200K | Fast & cheap |
| `claude-3-opus-20240229` | 200K | Complex tasks |
---
## Other Providers
### Jina AI (Embedding + Rerank)
```python
from lightrag.llm.jina import jina_embed, jina_rerank
from lightrag.llm.openai import gpt_4o_mini_complete

rag = LightRAG(
    llm_model_func=gpt_4o_mini_complete,
    embedding_func=jina_embed,
    rerank_model_func=jina_rerank,
)
```
```bash
JINA_API_KEY=jina_...
JINA_EMBEDDING_MODEL=jina-embeddings-v3
JINA_RERANK_MODEL=jina-reranker-v2-base-multilingual
```
### HuggingFace
```python
from lightrag.llm.hf import hf_model_complete, hf_embed

rag = LightRAG(
    llm_model_func=hf_model_complete,
    llm_model_name="meta-llama/Llama-3.2-3B-Instruct",
    embedding_func=hf_embed,
)
```
```bash
HF_TOKEN=hf_...
HF_MODEL=meta-llama/Llama-3.2-3B-Instruct
HF_EMBEDDING_MODEL=BAAI/bge-large-en-v1.5
```
### NVIDIA NIM
```python
from lightrag.llm.nvidia_openai import nvidia_complete, nvidia_embed

rag = LightRAG(
    llm_model_func=nvidia_complete,
    llm_model_name="meta/llama-3.1-70b-instruct",
    embedding_func=nvidia_embed,
)
```
```bash
NVIDIA_API_KEY=nvapi-...
NVIDIA_BASE_URL=https://integrate.api.nvidia.com/v1
```
### SiliconCloud
```python
from lightrag.llm.siliconcloud import siliconcloud_complete, siliconcloud_embed

rag = LightRAG(
    llm_model_func=siliconcloud_complete,
    llm_model_name="Qwen/Qwen2.5-72B-Instruct",
    embedding_func=siliconcloud_embed,
)
```
```bash
SILICONCLOUD_API_KEY=sk-...
```
### ZhipuAI (GLM)
```python
from lightrag.llm.zhipu import zhipu_complete, zhipu_embed

rag = LightRAG(
    llm_model_func=zhipu_complete,
    llm_model_name="glm-4-flash",
    embedding_func=zhipu_embed,
)
```
```bash
ZHIPUAI_API_KEY=...
```
---
## Embedding Models
### Embedding Function Signature
```python
from lightrag.utils import EmbeddingFunc
import numpy as np

async def custom_embedding(texts: list[str]) -> np.ndarray:
    """
    Args:
        texts: List of strings to embed

    Returns:
        np.ndarray: Shape (len(texts), embedding_dim)
    """
    # Your embedding logic produces `embeddings`
    return embeddings
```
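
As a concrete example, here is a minimal local implementation built on `sentence-transformers` (an assumed extra dependency for this sketch; any model that returns fixed-width vectors works the same way):

```python
import asyncio

import numpy as np
from sentence_transformers import SentenceTransformer

# BAAI/bge-large-en-v1.5 produces 1024-dim vectors (see the table below)
model = SentenceTransformer("BAAI/bge-large-en-v1.5")

async def st_embedding(texts: list[str]) -> np.ndarray:
    # Run the synchronous encoder off the event loop so ingestion stays async
    return await asyncio.to_thread(model.encode, texts, convert_to_numpy=True)
```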
### Wrapping Custom Embeddings
```python
import numpy as np

from lightrag.utils import wrap_embedding_func_with_attrs

@wrap_embedding_func_with_attrs(
    embedding_dim=1024,
    max_token_size=8192,
)
async def my_embed(texts: list[str]) -> np.ndarray:
    # Implementation
    pass
```
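
Alternatively, an existing function can be wrapped at call time with the `EmbeddingFunc` helper, which attaches the same attributes without a decorator (it is used the same way in the Ollama example above):

```python
from lightrag import LightRAG
from lightrag.llm.openai import gpt_4o_mini_complete
from lightrag.utils import EmbeddingFunc

rag = LightRAG(
    working_dir="./rag_storage",
    llm_model_func=gpt_4o_mini_complete,  # any LLM binding works here
    embedding_func=EmbeddingFunc(
        embedding_dim=1024,
        max_token_size=8192,
        func=st_embedding,  # the sentence-transformers sketch above
    ),
)
```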
### Embedding Dimension Reference
| Provider | Model | Dimension |
|----------|-------|-----------|
| OpenAI | text-embedding-3-small | 1536 |
| OpenAI | text-embedding-3-large | 3072 |
| OpenAI | text-embedding-ada-002 | 1536 |
| Ollama | nomic-embed-text | 768 |
| Ollama | bge-m3 | 1024 |
| Ollama | mxbai-embed-large | 1024 |
| Jina | jina-embeddings-v3 | 1024 |
| Cohere | embed-multilingual-v3 | 1024 |
| HuggingFace | BAAI/bge-large-en-v1.5 | 1024 |
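
Whichever provider you choose, the configured dimension must match what the model actually returns. A quick sanity check before indexing (a sketch; `st_embedding` is the example function from above):

```python
import asyncio

import numpy as np

async def check_dim(embed_func, expected_dim: int) -> None:
    # Embed one probe string and compare the vector width
    vecs = np.asarray(await embed_func(["dimension probe"]))
    assert vecs.shape[1] == expected_dim, (
        f"expected {expected_dim} dims, got {vecs.shape[1]}"
    )

asyncio.run(check_dim(st_embedding, 1024))  # BAAI/bge-large-en-v1.5
```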
### Batched Embedding
```python
from lightrag.llm.openai import openai_embed

rag = LightRAG(
    embedding_func=openai_embed,
    embedding_batch_num=20,        # Process 20 texts per batch
    embedding_func_max_async=8,    # Max concurrent batches
    default_embedding_timeout=30,  # Timeout per batch (seconds)
)
```
---
## Reranking
### Enabling Reranking
```python
from lightrag import LightRAG
from lightrag.llm.jina import jina_rerank

rag = LightRAG(
    # ... other config
    rerank_model_func=jina_rerank,
    min_rerank_score=0.3,  # Filter chunks below this score
)
```
### Reranking Providers
| Provider | Function | Model |
|----------|----------|-------|
| Jina AI | `jina_rerank` | jina-reranker-v2-base-multilingual |
| Cohere | Custom | rerank-english-v3.0 |
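
The Cohere row requires a custom function. The sketch below assumes the rerank function receives the query plus candidate documents and returns entries carrying an index into the documents and a relevance score, mirroring what `jina_rerank` produces; verify the exact contract against your LightRAG version. The Cohere calls follow its published `rerank` API:

```python
import os

import cohere  # assumed extra dependency for this sketch

co = cohere.AsyncClient(api_key=os.environ["COHERE_API_KEY"])

async def cohere_rerank(
    query: str,
    documents: list[str],
    top_n: int | None = None,
    **kwargs,
) -> list[dict]:
    response = await co.rerank(
        model="rerank-english-v3.0",
        query=query,
        documents=documents,
        top_n=top_n or len(documents),
    )
    # Assumed return shape: index into `documents` plus a relevance score,
    # matching the jina_rerank output format.
    return [
        {"index": r.index, "relevance_score": r.relevance_score}
        for r in response.results
    ]
```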
### Query-Time Reranking
```python
from lightrag.base import QueryParam

result = await rag.aquery(
    "What is the capital of France?",
    param=QueryParam(
        enable_rerank=True,  # Enable for this query
        chunk_top_k=50,      # Retrieve more, rerank to top_k
    ),
)
```
---
## Configuration Reference
### LightRAG LLM Parameters
```python
from lightrag import LightRAG
from lightrag.llm.openai import gpt_4o_mini_complete

rag = LightRAG(
    # LLM Configuration
    llm_model_func=gpt_4o_mini_complete,  # LLM function
    llm_model_name="gpt-4o-mini",         # Model name for logging
    llm_model_kwargs={                    # Passed to LLM function
        "temperature": 1.0,
        "max_tokens": 4096,
    },
    llm_model_max_async=4,    # Concurrent LLM calls
    default_llm_timeout=180,  # Timeout (seconds)

    # Caching
    enable_llm_cache=True,                     # Cache LLM responses
    enable_llm_cache_for_entity_extract=True,  # Cache extraction
)
```
### Environment Variables Summary
```bash
# === OpenAI ===
OPENAI_API_KEY=sk-...
OPENAI_API_BASE=https://api.openai.com/v1

# === Ollama ===
LLM_BINDING_HOST=http://localhost:11434
EMBEDDING_BINDING_HOST=http://localhost:11434
LLM_MODEL=llama3.2:3b
EMBEDDING_MODEL=nomic-embed-text

# === Azure OpenAI ===
AZURE_OPENAI_API_KEY=...
AZURE_OPENAI_ENDPOINT=https://...openai.azure.com
AZURE_OPENAI_API_VERSION=2024-02-01
AZURE_OPENAI_DEPLOYMENT=gpt-4o

# === AWS Bedrock ===
AWS_ACCESS_KEY_ID=...
AWS_SECRET_ACCESS_KEY=...
AWS_DEFAULT_REGION=us-east-1
BEDROCK_LLM_MODEL=anthropic.claude-3-sonnet-20240229-v1:0

# === Anthropic ===
ANTHROPIC_API_KEY=sk-ant-...

# === Jina ===
JINA_API_KEY=jina_...

# === Processing ===
MAX_ASYNC=4
EMBEDDING_BATCH_NUM=10
LLM_TIMEOUT=180
EMBEDDING_TIMEOUT=30
```
### API Server Bindings
```bash
# LLM binding options
--llm-binding openai|ollama|azure_openai|aws_bedrock|lollms

# Embedding binding options
--embedding-binding openai|ollama|azure_openai|aws_bedrock|jina|lollms

# Rerank binding options
--rerank-binding null|jina
```
---
## Best Practices
### Production Recommendations
1. **Use OpenAI** for best extraction quality
2. **Use Ollama** for cost-effective local deployment
3. **Enable LLM caching** to reduce costs
4. **Set appropriate timeouts** for reliability
5. **Monitor token usage** for cost control (a minimal sketch follows below)
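
A minimal token-usage monitor, sketched under two assumptions: `tiktoken` is an extra dependency, and the counts are local estimates rather than provider-billed usage. It wraps the LLM function so every call is tallied:

```python
import tiktoken
from lightrag import LightRAG
from lightrag.llm.openai import gpt_4o_mini_complete, openai_embed

enc = tiktoken.get_encoding("cl100k_base")
usage = {"prompt_tokens": 0, "completion_tokens": 0}

async def tracked_llm(prompt: str, **kwargs) -> str:
    result = await gpt_4o_mini_complete(prompt, **kwargs)
    # Rough local estimate; use the provider's usage reports for billing.
    usage["prompt_tokens"] += len(enc.encode(prompt))
    usage["completion_tokens"] += len(enc.encode(result))
    return result

rag = LightRAG(
    working_dir="./rag_storage",
    llm_model_func=tracked_llm,
    llm_model_name="gpt-4o-mini",
    embedding_func=openai_embed,
)
```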
### Cost Optimization
```python
from lightrag.llm.openai import gpt_4o_mini_complete

rag = LightRAG(
    # Use cost-effective model
    llm_model_func=gpt_4o_mini_complete,

    # Enable aggressive caching
    enable_llm_cache=True,
    enable_llm_cache_for_entity_extract=True,

    # Limit parallel processing
    llm_model_max_async=2,

    # Use smaller chunks
    chunk_token_size=800,
)
```
### Quality Optimization
```python
from lightrag.llm.jina import jina_rerank
from lightrag.llm.openai import gpt_4o_complete

rag = LightRAG(
    # Use best model
    llm_model_func=gpt_4o_complete,
    llm_model_name="gpt-4o",

    # More extraction attempts
    entity_extract_max_gleaning=2,

    # Larger chunks for context
    chunk_token_size=1500,
    chunk_overlap_token_size=200,

    # Enable reranking
    rerank_model_func=jina_rerank,
)
```
---
**Version**: 1.4.9.1 | **License**: MIT