LightRAG LLM Integration
Complete guide to configuring LLM providers and embedding models
Version: 1.4.9.1 | Last Updated: December 2025
Table of Contents
- Overview
- Supported Providers
- OpenAI Integration
- Ollama Integration
- Azure OpenAI
- AWS Bedrock
- Anthropic Claude
- Other Providers
- Embedding Models
- Reranking
- Configuration Reference
Overview
LightRAG requires two core AI components:
┌─────────────────────────────────────────────────────────────────────────┐
│ LLM Integration Architecture │
├─────────────────────────────────────────────────────────────────────────┤
│ │
│ ┌─────────────────────────────────────────────────────────────────┐ │
│ │ LightRAG Core │ │
│ └───────────────────────────┬─────────────────────────────────────┘ │
│ │ │
│ ┌───────────────┴───────────────┐ │
│ │ │ │
│ ▼ ▼ │
│ ┌───────────────────────┐ ┌───────────────────────┐ │
│ │ LLM Function │ │ Embedding Function │ │
│ │ (Text Generation) │ │ (Vector Creation) │ │
│ └───────────┬───────────┘ └───────────┬───────────┘ │
│ │ │ │
│ ▼ ▼ │
│ ┌─────────────────────────────────────────────────────────────────┐ │
│ │ Provider Bindings │ │
│ │ │ │
│ │ OpenAI │ Ollama │ Azure │ Bedrock │ Anthropic │ HuggingFace │ │
│ │ Jina │ lollms │ NVIDIA │ SiliconCloud │ ZhipuAI │ ... │ │
│ └─────────────────────────────────────────────────────────────────┘ │
│ │
└─────────────────────────────────────────────────────────────────────────┘
LLM Function Requirements
The LLM function is used for:
- Entity/Relation extraction from text chunks
- Description summarization during merges
- Query keyword extraction
- Response generation from context
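Any custom LLM function should match the calling convention LightRAG uses for these tasks. A minimal sketch, assuming the signature used in the project's custom-model examples (the function name is illustrative; openai_complete_if_cache is covered under OpenAI Integration below):

from lightrag.llm.openai import openai_complete_if_cache

async def my_llm_func(
    prompt: str,
    system_prompt: str | None = None,
    history_messages: list | None = None,
    **kwargs,
) -> str:
    # Delegate to any backend; LightRAG only needs the generated text back
    return await openai_complete_if_cache(
        "gpt-4o-mini",
        prompt,
        system_prompt=system_prompt,
        history_messages=history_messages or [],
        **kwargs,
    )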
Embedding Function Requirements
The embedding function is used for:
- Converting text chunks to vectors
- Converting entities/relations to vectors
- Query embedding for similarity search
Supported Providers
| Provider | LLM | Embedding | Rerank | Module |
|---|---|---|---|---|
| OpenAI | ✅ | ✅ | ❌ | lightrag.llm.openai |
| Ollama | ✅ | ✅ | ❌ | lightrag.llm.ollama |
| Azure OpenAI | ✅ | ✅ | ❌ | lightrag.llm.azure_openai |
| AWS Bedrock | ✅ | ✅ | ❌ | lightrag.llm.bedrock |
| Anthropic | ✅ | ❌ | ❌ | lightrag.llm.anthropic |
| Jina AI | ❌ | ✅ | ✅ | lightrag.llm.jina |
| HuggingFace | ✅ | ✅ | ❌ | lightrag.llm.hf |
| NVIDIA | ✅ | ✅ | ❌ | lightrag.llm.nvidia_openai |
| SiliconCloud | ✅ | ✅ | ❌ | lightrag.llm.siliconcloud |
| ZhipuAI | ✅ | ✅ | ❌ | lightrag.llm.zhipu |
| lollms | ✅ | ✅ | ❌ | lightrag.llm.lollms |
| LMDeploy | ✅ | ❌ | ❌ | lightrag.llm.lmdeploy |
OpenAI Integration
Basic Setup
import os
from lightrag import LightRAG
from lightrag.llm.openai import gpt_4o_mini_complete, openai_embed
# Set API key
os.environ["OPENAI_API_KEY"] = "sk-..."
rag = LightRAG(
working_dir="./rag_storage",
llm_model_func=gpt_4o_mini_complete,
llm_model_name="gpt-4o-mini",
embedding_func=openai_embed,
)
await rag.initialize_storages()
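Once storages are initialized, the same instance handles both indexing and querying. A minimal end-to-end sketch (the sample text and question are placeholders; recent versions also expect the pipeline status to be initialized before the first insert):

from lightrag.base import QueryParam
from lightrag.kg.shared_storage import initialize_pipeline_status

await initialize_pipeline_status()  # required before indexing on recent versions

# Index a document, then query it
await rag.ainsert("LightRAG combines knowledge graphs with vector retrieval.")
result = await rag.aquery(
    "How does LightRAG retrieve context?",
    param=QueryParam(mode="hybrid"),
)
print(result)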
Available Models
from lightrag.llm.openai import (
gpt_4o_complete, # GPT-4o (flagship)
gpt_4o_mini_complete, # GPT-4o-mini (cost-effective)
openai_complete, # Generic OpenAI completion
openai_embed, # text-embedding-3-small
)
Environment Variables
# Required
OPENAI_API_KEY=sk-...
# Optional
OPENAI_API_BASE=https://api.openai.com/v1
OPENAI_EMBEDDING_MODEL=text-embedding-3-small
OPENAI_EMBEDDING_DIM=1536
Advanced Configuration
from lightrag.llm.openai import create_openai_async_client, openai_complete_if_cache
# Custom client
client = create_openai_async_client(
api_key="sk-...",
base_url="https://your-proxy.com/v1",
client_configs={"timeout": 60.0}
)
# Custom model configuration
rag = LightRAG(
llm_model_func=lambda prompt, system_prompt=None, **kwargs: openai_complete_if_cache(
    model="gpt-4-turbo",
    prompt=prompt,
    # Use a default system prompt unless LightRAG supplies its own;
    # accepting system_prompt as a named parameter avoids passing it
    # twice (explicitly and via **kwargs), which would raise a TypeError
    system_prompt=system_prompt or "You are a knowledge extraction expert.",
    **kwargs
),
llm_model_name="gpt-4-turbo",
llm_model_kwargs={
"temperature": 0.7,
"max_tokens": 4096,
},
embedding_func=openai_embed,
)
Ollama Integration
Prerequisites
Install and run Ollama:
# Install Ollama
curl -fsSL https://ollama.ai/install.sh | sh
# Pull models
ollama pull llama3.2:3b
ollama pull nomic-embed-text
Basic Setup
from lightrag import LightRAG
from lightrag.llm.ollama import ollama_model_complete, ollama_embed
rag = LightRAG(
working_dir="./rag_storage",
llm_model_func=ollama_model_complete,
llm_model_name="llama3.2:3b",
embedding_func=lambda texts: ollama_embed(
texts,
embed_model="nomic-embed-text"
),
)
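If your LightRAG version resolves the embedding dimension from the function itself, wrap the lambda in EmbeddingFunc so embedding_dim is declared explicitly. A sketch, with dimensions assuming nomic-embed-text:

from lightrag.utils import EmbeddingFunc

rag = LightRAG(
    working_dir="./rag_storage",
    llm_model_func=ollama_model_complete,
    llm_model_name="llama3.2:3b",
    # Declare the dimension explicitly; nomic-embed-text produces 768-d vectors
    embedding_func=EmbeddingFunc(
        embedding_dim=768,
        max_token_size=8192,
        func=lambda texts: ollama_embed(texts, embed_model="nomic-embed-text"),
    ),
)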
Environment Variables
# Ollama server
LLM_BINDING_HOST=http://localhost:11434
OLLAMA_HOST=http://localhost:11434
# Model configuration
LLM_MODEL=llama3.2:3b
EMBEDDING_MODEL=nomic-embed-text
EMBEDDING_DIM=768
Server Mode Configuration
For LightRAG API server with Ollama:
# Start API server with Ollama
python -m lightrag.api.lightrag_server \
--llm-binding ollama \
--llm-model llama3.2:3b \
--embedding-binding ollama \
--embedding-model nomic-embed-text \
--llm-binding-host http://localhost:11434 \
--embedding-binding-host http://localhost:11434
Recommended Ollama Models
| Purpose | Model | Size | Notes |
|---|---|---|---|
| LLM (Fast) | llama3.2:3b | 2GB | Good balance |
| LLM (Quality) | llama3.1:8b | 4.7GB | Better extraction |
| LLM (Best) | llama3.1:70b | 40GB | Production quality |
| Embedding | nomic-embed-text | 274MB | General purpose |
| Embedding | bge-m3 | 2.2GB | Multilingual |
| Embedding | mxbai-embed-large | 669MB | High quality |
Azure OpenAI
Setup
from lightrag import LightRAG
from lightrag.llm.azure_openai import azure_openai_complete, azure_embed
rag = LightRAG(
working_dir="./rag_storage",
llm_model_func=azure_openai_complete,
llm_model_name="gpt-4o-deployment",
embedding_func=azure_embed,
)
Environment Variables
# Required
AZURE_OPENAI_API_KEY=your-api-key
AZURE_OPENAI_ENDPOINT=https://your-resource.openai.azure.com
AZURE_OPENAI_API_VERSION=2024-02-01
# Deployment names
AZURE_OPENAI_DEPLOYMENT=gpt-4o-deployment
AZURE_OPENAI_EMBEDDING_DEPLOYMENT=text-embedding-deployment
# Optional
AZURE_OPENAI_EMBEDDING_DIM=1536
Server Mode
python -m lightrag.api.lightrag_server \
--llm-binding azure_openai \
--llm-model gpt-4o \
--embedding-binding azure_openai \
--embedding-model text-embedding-3-small
AWS Bedrock
Setup
from lightrag import LightRAG
from lightrag.llm.bedrock import bedrock_complete, bedrock_embed
rag = LightRAG(
working_dir="./rag_storage",
llm_model_func=bedrock_complete,
llm_model_name="anthropic.claude-3-sonnet-20240229-v1:0",
embedding_func=bedrock_embed,
)
Environment Variables
# AWS credentials
AWS_ACCESS_KEY_ID=your-access-key
AWS_SECRET_ACCESS_KEY=your-secret-key
AWS_DEFAULT_REGION=us-east-1
# Bedrock configuration
BEDROCK_LLM_MODEL=anthropic.claude-3-sonnet-20240229-v1:0
BEDROCK_EMBEDDING_MODEL=amazon.titan-embed-text-v2:0
Supported Bedrock Models
| Provider | Model ID | Type |
|---|---|---|
| Anthropic | anthropic.claude-3-sonnet-20240229-v1:0 | LLM |
| Anthropic | anthropic.claude-3-haiku-20240307-v1:0 | LLM |
| Amazon | amazon.titan-text-express-v1 | LLM |
| Amazon | amazon.titan-embed-text-v2:0 | Embedding |
| Cohere | cohere.embed-multilingual-v3 | Embedding |
Anthropic Claude
Setup
from lightrag import LightRAG
from lightrag.llm.anthropic import anthropic_complete
from lightrag.llm.openai import openai_embed # Use OpenAI for embedding
rag = LightRAG(
working_dir="./rag_storage",
llm_model_func=anthropic_complete,
llm_model_name="claude-3-5-sonnet-20241022",
embedding_func=openai_embed, # Anthropic doesn't have embeddings
)
Environment Variables
ANTHROPIC_API_KEY=sk-ant-...
# Model selection
ANTHROPIC_MODEL=claude-3-5-sonnet-20241022
Available Models
| Model | Context | Best For |
|---|---|---|
| claude-3-5-sonnet-20241022 | 200K | Best quality |
| claude-3-haiku-20240307 | 200K | Fast & cheap |
| claude-3-opus-20240229 | 200K | Complex tasks |
Other Providers
Jina AI (Embedding + Rerank)
from lightrag.llm.jina import jina_embed, jina_rerank
from lightrag.llm.openai import gpt_4o_mini_complete
rag = LightRAG(
llm_model_func=gpt_4o_mini_complete,
embedding_func=jina_embed,
rerank_model_func=jina_rerank,
)
JINA_API_KEY=jina_...
JINA_EMBEDDING_MODEL=jina-embeddings-v3
JINA_RERANK_MODEL=jina-reranker-v2-base-multilingual
HuggingFace
from lightrag.llm.hf import hf_model_complete, hf_embed
rag = LightRAG(
llm_model_func=hf_model_complete,
llm_model_name="meta-llama/Llama-3.2-3B-Instruct",
embedding_func=hf_embed,
)
HF_TOKEN=hf_...
HF_MODEL=meta-llama/Llama-3.2-3B-Instruct
HF_EMBEDDING_MODEL=BAAI/bge-large-en-v1.5
NVIDIA NIM
from lightrag.llm.nvidia_openai import nvidia_complete, nvidia_embed
rag = LightRAG(
llm_model_func=nvidia_complete,
llm_model_name="meta/llama-3.1-70b-instruct",
embedding_func=nvidia_embed,
)
NVIDIA_API_KEY=nvapi-...
NVIDIA_BASE_URL=https://integrate.api.nvidia.com/v1
SiliconCloud
from lightrag.llm.siliconcloud import siliconcloud_complete, siliconcloud_embed
rag = LightRAG(
llm_model_func=siliconcloud_complete,
llm_model_name="Qwen/Qwen2.5-72B-Instruct",
embedding_func=siliconcloud_embed,
)
SILICONCLOUD_API_KEY=sk-...
ZhipuAI (GLM)
from lightrag.llm.zhipu import zhipu_complete, zhipu_embed
rag = LightRAG(
llm_model_func=zhipu_complete,
llm_model_name="glm-4-flash",
embedding_func=zhipu_embed,
)
ZHIPUAI_API_KEY=...
Embedding Models
Embedding Function Signature
from lightrag.utils import EmbeddingFunc
import numpy as np
async def custom_embedding(texts: list[str]) -> np.ndarray:
"""
Args:
texts: List of strings to embed
Returns:
np.ndarray: Shape (len(texts), embedding_dim)
"""
# Your embedding logic
return embeddings
Wrapping Custom Embeddings
from lightrag.utils import wrap_embedding_func_with_attrs
@wrap_embedding_func_with_attrs(
embedding_dim=1024,
max_token_size=8192,
)
async def my_embed(texts: list[str]) -> np.ndarray:
# Implementation
pass
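As a concrete example, here is a sketch that fills in the stub with a local sentence-transformers model (assuming the sentence-transformers package is installed; the model and dimension match the reference table below):

import numpy as np
from sentence_transformers import SentenceTransformer
from lightrag.utils import wrap_embedding_func_with_attrs

# Loaded once at module level so repeated calls reuse the model
_model = SentenceTransformer("BAAI/bge-large-en-v1.5")

@wrap_embedding_func_with_attrs(
    embedding_dim=1024,   # BAAI/bge-large-en-v1.5 outputs 1024-d vectors
    max_token_size=512,   # bge models truncate input at 512 tokens
)
async def my_embed(texts: list[str]) -> np.ndarray:
    # encode() is synchronous; acceptable for a sketch, offload in production
    return _model.encode(texts, convert_to_numpy=True)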
Embedding Dimension Reference
| Provider | Model | Dimension |
|---|---|---|
| OpenAI | text-embedding-3-small | 1536 |
| OpenAI | text-embedding-3-large | 3072 |
| OpenAI | text-embedding-ada-002 | 1536 |
| Ollama | nomic-embed-text | 768 |
| Ollama | bge-m3 | 1024 |
| Ollama | mxbai-embed-large | 1024 |
| Jina | jina-embeddings-v3 | 1024 |
| Cohere | embed-multilingual-v3 | 1024 |
| HuggingFace | BAAI/bge-large-en-v1.5 | 1024 |
Batched Embedding
rag = LightRAG(
embedding_func=openai_embed,
embedding_batch_num=20, # Process 20 texts per batch
embedding_func_max_async=8, # Max concurrent batches
default_embedding_timeout=30, # Timeout per batch (seconds)
)
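With these settings, up to embedding_func_max_async × embedding_batch_num = 8 × 20 = 160 texts can be in flight at once; reduce either value if your provider rate-limits aggressively.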
Reranking
Enabling Reranking
from lightrag import LightRAG
from lightrag.llm.jina import jina_rerank
rag = LightRAG(
# ... other config
rerank_model_func=jina_rerank,
min_rerank_score=0.3, # Filter chunks below this score
)
Reranking Providers
| Provider | Function | Model |
|---|---|---|
| Jina AI | jina_rerank | jina-reranker-v2-base-multilingual |
| Cohere | Custom | rerank-english-v3.0 |
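The Cohere row means you write the function yourself. A sketch using the official Cohere SDK; the return shape here is an assumption modeled on jina_rerank, so check your LightRAG version for the exact contract a rerank function must satisfy:

import os
import cohere

# cohere.AsyncClient and .rerank() are part of the official Cohere SDK
_co = cohere.AsyncClient(api_key=os.environ["COHERE_API_KEY"])

async def cohere_rerank(query: str, documents: list[str], top_n: int | None = None, **kwargs):
    response = await _co.rerank(
        model="rerank-english-v3.0",
        query=query,
        documents=documents,
        top_n=top_n,
    )
    # Return (document, score) pairs in reranked order; adapt this to the
    # structure your LightRAG version expects from a rerank function
    return [(documents[r.index], r.relevance_score) for r in response.results]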
Query-Time Reranking
from lightrag.base import QueryParam
result = await rag.aquery(
"What is the capital of France?",
param=QueryParam(
enable_rerank=True, # Enable for this query
chunk_top_k=50, # Retrieve more, rerank to top_k
)
)
Configuration Reference
LightRAG LLM Parameters
rag = LightRAG(
# LLM Configuration
llm_model_func=gpt_4o_mini_complete, # LLM function
llm_model_name="gpt-4o-mini", # Model name for logging
llm_model_kwargs={ # Passed to LLM function
"temperature": 1.0,
"max_tokens": 4096,
},
llm_model_max_async=4, # Concurrent LLM calls
default_llm_timeout=180, # Timeout (seconds)
# Caching
enable_llm_cache=True, # Cache LLM responses
enable_llm_cache_for_entity_extract=True, # Cache extraction
)
Environment Variables Summary
# === OpenAI ===
OPENAI_API_KEY=sk-...
OPENAI_API_BASE=https://api.openai.com/v1
# === Ollama ===
LLM_BINDING_HOST=http://localhost:11434
EMBEDDING_BINDING_HOST=http://localhost:11434
LLM_MODEL=llama3.2:3b
EMBEDDING_MODEL=nomic-embed-text
# === Azure OpenAI ===
AZURE_OPENAI_API_KEY=...
AZURE_OPENAI_ENDPOINT=https://...openai.azure.com
AZURE_OPENAI_API_VERSION=2024-02-01
AZURE_OPENAI_DEPLOYMENT=gpt-4o
# === AWS Bedrock ===
AWS_ACCESS_KEY_ID=...
AWS_SECRET_ACCESS_KEY=...
AWS_DEFAULT_REGION=us-east-1
BEDROCK_LLM_MODEL=anthropic.claude-3-sonnet-20240229-v1:0
# === Anthropic ===
ANTHROPIC_API_KEY=sk-ant-...
# === Jina ===
JINA_API_KEY=jina_...
# === Processing ===
MAX_ASYNC=4
EMBEDDING_BATCH_NUM=10
LLM_TIMEOUT=180
EMBEDDING_TIMEOUT=30
API Server Bindings
# LLM binding options
--llm-binding openai|ollama|azure_openai|aws_bedrock|lollms
# Embedding binding options
--embedding-binding openai|ollama|azure_openai|aws_bedrock|jina|lollms
# Rerank binding options
--rerank-binding null|jina
Best Practices
Production Recommendations
- Use OpenAI for best extraction quality
- Use Ollama for cost-effective local deployment
- Enable LLM caching to reduce costs
- Set appropriate timeouts for reliability
- Monitor token usage for cost control
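For the last point, lightrag.utils ships a TokenTracker helper. A sketch, assuming the OpenAI binding and that token_tracker is wired in through llm_model_kwargs as in the bundled tracking examples (verify against your version):

from lightrag.utils import TokenTracker

# Assumption: the binding picks token_tracker up from llm_model_kwargs
token_tracker = TokenTracker()

rag = LightRAG(
    working_dir="./rag_storage",
    llm_model_func=gpt_4o_mini_complete,
    llm_model_name="gpt-4o-mini",
    embedding_func=openai_embed,
    llm_model_kwargs={"token_tracker": token_tracker},
)

with token_tracker:
    result = await rag.aquery("What entities were extracted?")
print("Token usage:", token_tracker.get_usage())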
Cost Optimization
rag = LightRAG(
# Use cost-effective model
llm_model_func=gpt_4o_mini_complete,
# Enable aggressive caching
enable_llm_cache=True,
enable_llm_cache_for_entity_extract=True,
# Limit parallel processing
llm_model_max_async=2,
# Use smaller chunks
chunk_token_size=800,
)
Quality Optimization
rag = LightRAG(
# Use best model
llm_model_func=gpt_4o_complete,
llm_model_name="gpt-4o",
# More extraction attempts
entity_extract_max_gleaning=2,
# Larger chunks for context
chunk_token_size=1500,
chunk_overlap_token_size=200,
# Enable reranking
rerank_model_func=jina_rerank,
)
Version: 1.4.9.1 | License: MIT