LightRAG LLM Integration
Complete guide to configuring LLM providers and embedding models
Version: 1.4.9.1 | Last Updated: December 2025
Table of Contents
- Overview
- Supported Providers
- OpenAI Integration
- Ollama Integration
- Azure OpenAI
- AWS Bedrock
- Anthropic Claude
- Other Providers
- Embedding Models
- Reranking
- Configuration Reference
Overview
LightRAG requires two core AI components:
┌─────────────────────────────────────────────────────────────────────────┐
│ LLM Integration Architecture │
├─────────────────────────────────────────────────────────────────────────┤
│ │
│ ┌─────────────────────────────────────────────────────────────────┐ │
│ │ LightRAG Core │ │
│ └───────────────────────────┬─────────────────────────────────────┘ │
│ │ │
│ ┌───────────────┴───────────────┐ │
│ │ │ │
│ ▼ ▼ │
│ ┌───────────────────────┐ ┌───────────────────────┐ │
│ │ LLM Function │ │ Embedding Function │ │
│ │ (Text Generation) │ │ (Vector Creation) │ │
│ └───────────┬───────────┘ └───────────┬───────────┘ │
│ │ │ │
│ ▼ ▼ │
│ ┌─────────────────────────────────────────────────────────────────┐ │
│ │ Provider Bindings │ │
│ │ │ │
│ │ OpenAI │ Ollama │ Azure │ Bedrock │ Anthropic │ HuggingFace │ │
│ │ Jina │ lollms │ NVIDIA │ SiliconCloud │ ZhipuAI │ ... │ │
│ └─────────────────────────────────────────────────────────────────┘ │
│ │
└─────────────────────────────────────────────────────────────────────────┘
LLM Function Requirements
The LLM function is used for:
- Entity/Relation extraction from text chunks
- Description summarization during merges
- Query keyword extraction
- Response generation from context
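Any custom LLM function should match the calling convention LightRAG uses for these tasks. A minimal sketch, assuming the signature used in the project's custom-model examples (the function name is illustrative; openai_complete_if_cache is covered under OpenAI Integration below):

from lightrag.llm.openai import openai_complete_if_cache

async def my_llm_func(
    prompt: str,
    system_prompt: str | None = None,
    history_messages: list | None = None,
    **kwargs,
) -> str:
    # Delegate to any backend; LightRAG only needs the generated text back
    return await openai_complete_if_cache(
        "gpt-4o-mini",
        prompt,
        system_prompt=system_prompt,
        history_messages=history_messages or [],
        **kwargs,
    )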
Embedding Function Requirements
The embedding function is used for:
- Converting text chunks to vectors
- Converting entities/relations to vectors
- Query embedding for similarity search
Supported Providers
| Provider | LLM | Embedding | Rerank | Module |
|---|---|---|---|---|
| OpenAI | ✅ | ✅ | ❌ | lightrag.llm.openai |
| Ollama | ✅ | ✅ | ❌ | lightrag.llm.ollama |
| Azure OpenAI | ✅ | ✅ | ❌ | lightrag.llm.azure_openai |
| AWS Bedrock | ✅ | ✅ | ❌ | lightrag.llm.bedrock |
| Anthropic | ✅ | ❌ | ❌ | lightrag.llm.anthropic |
| Jina AI | ❌ | ✅ | ✅ | lightrag.llm.jina |
| HuggingFace | ✅ | ✅ | ❌ | lightrag.llm.hf |
| NVIDIA | ✅ | ✅ | ❌ | lightrag.llm.nvidia_openai |
| SiliconCloud | ✅ | ✅ | ❌ | lightrag.llm.siliconcloud |
| ZhipuAI | ✅ | ✅ | ❌ | lightrag.llm.zhipu |
| lollms | ✅ | ✅ | ❌ | lightrag.llm.lollms |
| LMDeploy | ✅ | ❌ | ❌ | lightrag.llm.lmdeploy |
OpenAI Integration
Basic Setup
import os
from lightrag import LightRAG
from lightrag.llm.openai import gpt_4o_mini_complete, openai_embed
# Set API key
os.environ["OPENAI_API_KEY"] = "sk-..."
rag = LightRAG(
working_dir="./rag_storage",
llm_model_func=gpt_4o_mini_complete,
llm_model_name="gpt-4o-mini",
embedding_func=openai_embed,
)
await rag.initialize_storages()
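Once storages are initialized, the same instance handles both indexing and querying. A minimal end-to-end sketch (the sample text and question are placeholders; recent versions also expect the pipeline status to be initialized before the first insert):

from lightrag.base import QueryParam
from lightrag.kg.shared_storage import initialize_pipeline_status

await initialize_pipeline_status()  # required before indexing on recent versions

# Index a document, then query it
await rag.ainsert("LightRAG combines knowledge graphs with vector retrieval.")
result = await rag.aquery(
    "How does LightRAG retrieve context?",
    param=QueryParam(mode="hybrid"),
)
print(result)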
Available Models
from lightrag.llm.openai import (
gpt_4o_complete, # GPT-4o (flagship)
gpt_4o_mini_complete, # GPT-4o-mini (cost-effective)
openai_complete, # Generic OpenAI completion
openai_embed, # text-embedding-3-small
)
Environment Variables
# Required
OPENAI_API_KEY=sk-...
# Optional
OPENAI_API_BASE=https://api.openai.com/v1
OPENAI_EMBEDDING_MODEL=text-embedding-3-small
OPENAI_EMBEDDING_DIM=1536
Advanced Configuration
from lightrag.llm.openai import create_openai_async_client, openai_complete_if_cache
# Custom client
client = create_openai_async_client(
api_key="sk-...",
base_url="https://your-proxy.com/v1",
client_configs={"timeout": 60.0}
)
# Custom model configuration
rag = LightRAG(
llm_model_func=lambda prompt, system_prompt=None, **kwargs: openai_complete_if_cache(
    model="gpt-4-turbo",
    prompt=prompt,
    # Use a default system prompt unless LightRAG supplies its own;
    # accepting system_prompt as a named parameter avoids passing it
    # twice (explicitly and via **kwargs), which would raise a TypeError
    system_prompt=system_prompt or "You are a knowledge extraction expert.",
    **kwargs
),
llm_model_name="gpt-4-turbo",
llm_model_kwargs={
"temperature": 0.7,
"max_tokens": 4096,
},
embedding_func=openai_embed,
)
Ollama Integration
Prerequisites
Install and run Ollama:
# Install Ollama
curl -fsSL https://ollama.ai/install.sh | sh
# Pull models
ollama pull llama3.2:3b
ollama pull nomic-embed-text
Basic Setup
from lightrag import LightRAG
from lightrag.llm.ollama import ollama_model_complete, ollama_embed
rag = LightRAG(
working_dir="./rag_storage",
llm_model_func=ollama_model_complete,
llm_model_name="llama3.2:3b",
embedding_func=lambda texts: ollama_embed(
texts,
embed_model="nomic-embed-text"
),
)
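If your LightRAG version resolves the embedding dimension from the function itself, wrap the lambda in EmbeddingFunc so embedding_dim is declared explicitly. A sketch, with dimensions assuming nomic-embed-text:

from lightrag.utils import EmbeddingFunc

rag = LightRAG(
    working_dir="./rag_storage",
    llm_model_func=ollama_model_complete,
    llm_model_name="llama3.2:3b",
    # Declare the dimension explicitly; nomic-embed-text produces 768-d vectors
    embedding_func=EmbeddingFunc(
        embedding_dim=768,
        max_token_size=8192,
        func=lambda texts: ollama_embed(texts, embed_model="nomic-embed-text"),
    ),
)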
Environment Variables
# Ollama server
LLM_BINDING_HOST=http://localhost:11434
OLLAMA_HOST=http://localhost:11434
# Model configuration
LLM_MODEL=llama3.2:3b
EMBEDDING_MODEL=nomic-embed-text
EMBEDDING_DIM=768
Server Mode Configuration
For LightRAG API server with Ollama:
# Start API server with Ollama
python -m lightrag.api.lightrag_server \
--llm-binding ollama \
--llm-model llama3.2:3b \
--embedding-binding ollama \
--embedding-model nomic-embed-text \
--llm-binding-host http://localhost:11434 \
--embedding-binding-host http://localhost:11434
Recommended Ollama Models
| Purpose | Model | Size | Notes |
|---|---|---|---|
| LLM (Fast) | llama3.2:3b | 2GB | Good balance |
| LLM (Quality) | llama3.1:8b | 4.7GB | Better extraction |
| LLM (Best) | llama3.1:70b | 40GB | Production quality |
| Embedding | nomic-embed-text | 274MB | General purpose |
| Embedding | bge-m3 | 2.2GB | Multilingual |
| Embedding | mxbai-embed-large | 669MB | High quality |
Azure OpenAI
Setup
from lightrag import LightRAG
from lightrag.llm.azure_openai import azure_openai_complete, azure_embed
rag = LightRAG(
working_dir="./rag_storage",
llm_model_func=azure_openai_complete,
llm_model_name="gpt-4o-deployment",
embedding_func=azure_embed,
)
Environment Variables
# Required
AZURE_OPENAI_API_KEY=your-api-key
AZURE_OPENAI_ENDPOINT=https://your-resource.openai.azure.com
AZURE_OPENAI_API_VERSION=2024-02-01
# Deployment names
AZURE_OPENAI_DEPLOYMENT=gpt-4o-deployment
AZURE_OPENAI_EMBEDDING_DEPLOYMENT=text-embedding-deployment
# Optional
AZURE_OPENAI_EMBEDDING_DIM=1536
Server Mode
python -m lightrag.api.lightrag_server \
--llm-binding azure_openai \
--llm-model gpt-4o \
--embedding-binding azure_openai \
--embedding-model text-embedding-3-small
AWS Bedrock
Setup
from lightrag import LightRAG
from lightrag.llm.bedrock import bedrock_complete, bedrock_embed
rag = LightRAG(
working_dir="./rag_storage",
llm_model_func=bedrock_complete,
llm_model_name="anthropic.claude-3-sonnet-20240229-v1:0",
embedding_func=bedrock_embed,
)
Environment Variables
# AWS credentials
AWS_ACCESS_KEY_ID=your-access-key
AWS_SECRET_ACCESS_KEY=your-secret-key
AWS_DEFAULT_REGION=us-east-1
# Bedrock configuration
BEDROCK_LLM_MODEL=anthropic.claude-3-sonnet-20240229-v1:0
BEDROCK_EMBEDDING_MODEL=amazon.titan-embed-text-v2:0
Supported Bedrock Models
| Provider | Model ID | Type |
|---|---|---|
| Anthropic | anthropic.claude-3-sonnet-20240229-v1:0 | LLM |
| Anthropic | anthropic.claude-3-haiku-20240307-v1:0 | LLM |
| Amazon | amazon.titan-text-express-v1 | LLM |
| Amazon | amazon.titan-embed-text-v2:0 | Embedding |
| Cohere | cohere.embed-multilingual-v3 | Embedding |
Anthropic Claude
Setup
from lightrag import LightRAG
from lightrag.llm.anthropic import anthropic_complete
from lightrag.llm.openai import openai_embed # Use OpenAI for embedding
rag = LightRAG(
working_dir="./rag_storage",
llm_model_func=anthropic_complete,
llm_model_name="claude-3-5-sonnet-20241022",
embedding_func=openai_embed, # Anthropic doesn't have embeddings
)
Environment Variables
ANTHROPIC_API_KEY=sk-ant-...
# Model selection
ANTHROPIC_MODEL=claude-3-5-sonnet-20241022
Available Models
| Model | Context | Best For |
|---|---|---|
| claude-3-5-sonnet-20241022 | 200K | Best quality |
| claude-3-haiku-20240307 | 200K | Fast & cheap |
| claude-3-opus-20240229 | 200K | Complex tasks |
Other Providers
Jina AI (Embedding + Rerank)
from lightrag.llm.jina import jina_embed, jina_rerank
from lightrag.llm.openai import gpt_4o_mini_complete
rag = LightRAG(
llm_model_func=gpt_4o_mini_complete,
embedding_func=jina_embed,
rerank_model_func=jina_rerank,
)
JINA_API_KEY=jina_...
JINA_EMBEDDING_MODEL=jina-embeddings-v3
JINA_RERANK_MODEL=jina-reranker-v2-base-multilingual
HuggingFace
from lightrag.llm.hf import hf_model_complete, hf_embed
rag = LightRAG(
llm_model_func=hf_model_complete,
llm_model_name="meta-llama/Llama-3.2-3B-Instruct",
embedding_func=hf_embed,
)
HF_TOKEN=hf_...
HF_MODEL=meta-llama/Llama-3.2-3B-Instruct
HF_EMBEDDING_MODEL=BAAI/bge-large-en-v1.5
NVIDIA NIM
from lightrag.llm.nvidia_openai import nvidia_complete, nvidia_embed
rag = LightRAG(
llm_model_func=nvidia_complete,
llm_model_name="meta/llama-3.1-70b-instruct",
embedding_func=nvidia_embed,
)
NVIDIA_API_KEY=nvapi-...
NVIDIA_BASE_URL=https://integrate.api.nvidia.com/v1
SiliconCloud
from lightrag.llm.siliconcloud import siliconcloud_complete, siliconcloud_embed
rag = LightRAG(
llm_model_func=siliconcloud_complete,
llm_model_name="Qwen/Qwen2.5-72B-Instruct",
embedding_func=siliconcloud_embed,
)
SILICONCLOUD_API_KEY=sk-...
ZhipuAI (GLM)
from lightrag.llm.zhipu import zhipu_complete, zhipu_embed
rag = LightRAG(
llm_model_func=zhipu_complete,
llm_model_name="glm-4-flash",
embedding_func=zhipu_embed,
)
ZHIPUAI_API_KEY=...
Embedding Models
Embedding Function Signature
from lightrag.utils import EmbeddingFunc
import numpy as np
async def custom_embedding(texts: list[str]) -> np.ndarray:
"""
Args:
texts: List of strings to embed
Returns:
np.ndarray: Shape (len(texts), embedding_dim)
"""
# Your embedding logic
return embeddings
Wrapping Custom Embeddings
from lightrag.utils import wrap_embedding_func_with_attrs
@wrap_embedding_func_with_attrs(
embedding_dim=1024,
max_token_size=8192,
)
async def my_embed(texts: list[str]) -> np.ndarray:
# Implementation
pass
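As a concrete example, here is a sketch that fills in the stub with a local sentence-transformers model (assuming the sentence-transformers package is installed; the model and dimension match the reference table below):

import numpy as np
from sentence_transformers import SentenceTransformer
from lightrag.utils import wrap_embedding_func_with_attrs

# Loaded once at module level so repeated calls reuse the model
_model = SentenceTransformer("BAAI/bge-large-en-v1.5")

@wrap_embedding_func_with_attrs(
    embedding_dim=1024,   # BAAI/bge-large-en-v1.5 outputs 1024-d vectors
    max_token_size=512,   # bge models truncate input at 512 tokens
)
async def my_embed(texts: list[str]) -> np.ndarray:
    # encode() is synchronous; acceptable for a sketch, offload in production
    return _model.encode(texts, convert_to_numpy=True)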
Embedding Dimension Reference
| Provider | Model | Dimension |
|---|---|---|
| OpenAI | text-embedding-3-small | 1536 |
| OpenAI | text-embedding-3-large | 3072 |
| OpenAI | text-embedding-ada-002 | 1536 |
| Ollama | nomic-embed-text | 768 |
| Ollama | bge-m3 | 1024 |
| Ollama | mxbai-embed-large | 1024 |
| Jina | jina-embeddings-v3 | 1024 |
| Cohere | embed-multilingual-v3 | 1024 |
| HuggingFace | BAAI/bge-large-en-v1.5 | 1024 |
Batched Embedding
rag = LightRAG(
embedding_func=openai_embed,
embedding_batch_num=20, # Process 20 texts per batch
embedding_func_max_async=8, # Max concurrent batches
default_embedding_timeout=30, # Timeout per batch (seconds)
)
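With these settings, up to embedding_func_max_async × embedding_batch_num = 8 × 20 = 160 texts can be in flight at once; reduce either value if your provider rate-limits aggressively.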
Reranking
Enabling Reranking
from lightrag import LightRAG
from lightrag.llm.jina import jina_rerank
rag = LightRAG(
# ... other config
rerank_model_func=jina_rerank,
min_rerank_score=0.3, # Filter chunks below this score
)
Reranking Providers
| Provider | Function | Model |
|---|---|---|
| Jina AI | jina_rerank | jina-reranker-v2-base-multilingual |
| Cohere | Custom | rerank-english-v3.0 |
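The Cohere row means you write the function yourself. A sketch using the official Cohere SDK; the return shape here is an assumption modeled on jina_rerank, so check your LightRAG version for the exact contract a rerank function must satisfy:

import os
import cohere

# cohere.AsyncClient and .rerank() are part of the official Cohere SDK
_co = cohere.AsyncClient(api_key=os.environ["COHERE_API_KEY"])

async def cohere_rerank(query: str, documents: list[str], top_n: int | None = None, **kwargs):
    response = await _co.rerank(
        model="rerank-english-v3.0",
        query=query,
        documents=documents,
        top_n=top_n,
    )
    # Return (document, score) pairs in reranked order; adapt this to the
    # structure your LightRAG version expects from a rerank function
    return [(documents[r.index], r.relevance_score) for r in response.results]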
Query-Time Reranking
from lightrag.base import QueryParam
result = await rag.aquery(
"What is the capital of France?",
param=QueryParam(
enable_rerank=True, # Enable for this query
chunk_top_k=50, # Retrieve more, rerank to top_k
)
)
Configuration Reference
LightRAG LLM Parameters
rag = LightRAG(
# LLM Configuration
llm_model_func=gpt_4o_mini_complete, # LLM function
llm_model_name="gpt-4o-mini", # Model name for logging
llm_model_kwargs={ # Passed to LLM function
"temperature": 1.0,
"max_tokens": 4096,
},
llm_model_max_async=4, # Concurrent LLM calls
default_llm_timeout=180, # Timeout (seconds)
# Caching
enable_llm_cache=True, # Cache LLM responses
enable_llm_cache_for_entity_extract=True, # Cache extraction
)
Environment Variables Summary
# === OpenAI ===
OPENAI_API_KEY=sk-...
OPENAI_API_BASE=https://api.openai.com/v1
# === Ollama ===
LLM_BINDING_HOST=http://localhost:11434
EMBEDDING_BINDING_HOST=http://localhost:11434
LLM_MODEL=llama3.2:3b
EMBEDDING_MODEL=nomic-embed-text
# === Azure OpenAI ===
AZURE_OPENAI_API_KEY=...
AZURE_OPENAI_ENDPOINT=https://...openai.azure.com
AZURE_OPENAI_API_VERSION=2024-02-01
AZURE_OPENAI_DEPLOYMENT=gpt-4o
# === AWS Bedrock ===
AWS_ACCESS_KEY_ID=...
AWS_SECRET_ACCESS_KEY=...
AWS_DEFAULT_REGION=us-east-1
BEDROCK_LLM_MODEL=anthropic.claude-3-sonnet-20240229-v1:0
# === Anthropic ===
ANTHROPIC_API_KEY=sk-ant-...
# === Jina ===
JINA_API_KEY=jina_...
# === Processing ===
MAX_ASYNC=4
EMBEDDING_BATCH_NUM=10
LLM_TIMEOUT=180
EMBEDDING_TIMEOUT=30
API Server Bindings
# LLM binding options
--llm-binding openai|ollama|azure_openai|aws_bedrock|lollms
# Embedding binding options
--embedding-binding openai|ollama|azure_openai|aws_bedrock|jina|lollms
# Rerank binding options
--rerank-binding null|jina
Best Practices
Production Recommendations
- Use OpenAI for best extraction quality
- Use Ollama for cost-effective local deployment
- Enable LLM caching to reduce costs
- Set appropriate timeouts for reliability
- Monitor token usage for cost control
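For the last point, lightrag.utils ships a TokenTracker helper. A sketch, assuming the OpenAI binding and that token_tracker is wired in through llm_model_kwargs as in the bundled tracking examples (verify against your version):

from lightrag.utils import TokenTracker

# Assumption: the binding picks token_tracker up from llm_model_kwargs
token_tracker = TokenTracker()

rag = LightRAG(
    working_dir="./rag_storage",
    llm_model_func=gpt_4o_mini_complete,
    llm_model_name="gpt-4o-mini",
    embedding_func=openai_embed,
    llm_model_kwargs={"token_tracker": token_tracker},
)

with token_tracker:
    result = await rag.aquery("What entities were extracted?")
print("Token usage:", token_tracker.get_usage())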
Cost Optimization
rag = LightRAG(
# Use cost-effective model
llm_model_func=gpt_4o_mini_complete,
# Enable aggressive caching
enable_llm_cache=True,
enable_llm_cache_for_entity_extract=True,
# Limit parallel processing
llm_model_max_async=2,
# Use smaller chunks
chunk_token_size=800,
)
Quality Optimization
rag = LightRAG(
# Use best model
llm_model_func=gpt_4o_complete,
llm_model_name="gpt-4o",
# More extraction attempts
entity_extract_max_gleaning=2,
# Larger chunks for context
chunk_token_size=1500,
chunk_overlap_token_size=200,
# Enable reranking
rerank_model_func=jina_rerank,
)
Version: 1.4.9.1 | License: MIT