2025-12-04 18:09:15 +08:00


LightRAG Architecture Overview

Technical deep-dive into LightRAG's Graph-Enhanced RAG system architecture

Version: 1.4.9.1 | Last Updated: December 2025

System Overview
Core Architecture
Data Flow
Component Breakdown
Storage Architecture
Query Pipeline
Entity-Relationship Diagrams

System Overview

LightRAG is a Graph-Enhanced Retrieval-Augmented Generation framework that combines knowledge graph construction with vector similarity search to provide contextually rich, accurate responses.

Key Differentiators

| Feature | Traditional RAG | LightRAG |
|---|---|---|
| Retrieval | Vector similarity only | Graph + Vector hybrid |
| Context | Flat document chunks | Entity-Relation aware |
| Query Modes | Single mode | 5 retrieval modes (local/global/hybrid/naive/mix), plus bypass |
| Knowledge | Implicit in embeddings | Explicit knowledge graph |
| Multi-tenant | ❌ | ✅ Full isolation |

High-Level Architecture

┌─────────────────────────────────────────────────────────────────────────────┐
│                              LightRAG System                                │
├─────────────────────────────────────────────────────────────────────────────┤
│                                                                             │
│  ┌─────────────┐    ┌─────────────┐    ┌─────────────┐    ┌─────────────┐  │
│  │   FastAPI   │───▶│   LightRAG  │───▶│   Storage   │───▶│  Backends   │  │
│  │   Server    │    │    Core     │    │   Layer     │    │             │  │
│  └─────────────┘    └─────────────┘    └─────────────┘    └─────────────┘  │
│        │                  │                  │                  │          │
│        │                  ▼                  ▼                  ▼          │
│        │           ┌───────────┐      ┌───────────┐      ┌───────────┐    │
│        │           │ Chunking  │      │ KV Store  │      │ PostgreSQL│    │
│        │           │ + Extract │      │ VectorDB  │      │ MongoDB   │    │
│        │           │ + Merge   │      │ GraphDB   │      │ Neo4j     │    │
│        │           └───────────┘      │ DocStatus │      │ Redis     │    │
│        │                              └───────────┘      └───────────┘    │
│        │                                                                   │
│        ▼                                                                   │
│  ┌─────────────────────────────────────────────────────────────────────┐  │
│  │                     LLM / Embedding Providers                        │  │
│  │  OpenAI │ Azure │ Ollama │ Bedrock │ Anthropic │ HuggingFace │ ... │  │
│  └─────────────────────────────────────────────────────────────────────┘  │
│                                                                             │
└─────────────────────────────────────────────────────────────────────────────┘

Core Architecture

Module Dependency Graph

graph TB
    subgraph "API Layer"
        API[FastAPI Server]
        Routes[Routers]
        Auth[Auth Handler]
    end
    
    subgraph "Core Engine"
        LR[LightRAG Class]
        OP[Operate Module]
        PM[Prompt Templates]
    end
    
    subgraph "Storage Abstractions"
        BKV[BaseKVStorage]
        BVS[BaseVectorStorage]
        BGS[BaseGraphStorage]
        DSS[DocStatusStorage]
    end
    
    subgraph "Implementations"
        PG[PostgreSQL]
        MG[MongoDB]
        RD[Redis]
        N4J[Neo4j]
        NX[NetworkX]
        QD[Qdrant]
        ML[Milvus]
        FS[FAISS]
    end
    
    subgraph "LLM Layer"
        OAI[OpenAI]
        OLL[Ollama]
        AZ[Azure OpenAI]
        BD[Bedrock]
    end
    
    API --> Routes
    Routes --> Auth
    Routes --> LR
    LR --> OP
    LR --> PM
    LR --> BKV
    LR --> BVS
    LR --> BGS
    LR --> DSS
    
    BKV --> PG
    BKV --> MG
    BKV --> RD
    BVS --> PG
    BVS --> QD
    BVS --> ML
    BVS --> FS
    BGS --> PG
    BGS --> N4J
    BGS --> NX
    BGS --> MG
    
    OP --> OAI
    OP --> OLL
    OP --> AZ
    OP --> BD

Core Class: `LightRAG`

The LightRAG class (lightrag/lightrag.py) is the central orchestrator:

┌────────────────────────────────────────────────────────────────────┐
│                         LightRAG Class                             │
├────────────────────────────────────────────────────────────────────┤
│                                                                    │
│  Configuration                                                     │
│  ├── working_dir: str          # Data storage directory           │
│  ├── workspace: str            # Tenant isolation namespace        │
│  ├── kv_storage: str           # JsonKV/Redis/PG/Mongo            │
│  ├── vector_storage: str       # NanoVDB/Milvus/PG/Qdrant/FAISS   │
│  ├── graph_storage: str        # NetworkX/Neo4j/PG/Memgraph       │
│  └── doc_status_storage: str   # Json/Redis/PG/Mongo              │
│                                                                    │
│  Processing Settings                                               │
│  ├── chunk_token_size: int     # Default: 1200                    │
│  ├── chunk_overlap: int        # Default: 100                     │
│  ├── entity_extract_max_gleaning: int  # Default: 1               │
│  └── max_parallel_insert: int  # Default: 2                       │
│                                                                    │
│  Query Settings                                                    │
│  ├── top_k: int                # Default: 40                      │
│  ├── chunk_top_k: int          # Default: 20                      │
│  ├── max_entity_tokens: int    # Default: 6000                    │
│  ├── max_relation_tokens: int  # Default: 8000                    │
│  └── max_total_tokens: int     # Default: 30000                   │
│                                                                    │
│  Storage Instances                                                 │
│  ├── full_docs: BaseKVStorage                                     │
│  ├── text_chunks: BaseKVStorage                                   │
│  ├── entities_vdb: BaseVectorStorage                              │
│  ├── relationships_vdb: BaseVectorStorage                         │
│  ├── chunks_vdb: BaseVectorStorage                                │
│  ├── chunk_entity_relation_graph: BaseGraphStorage                │
│  └── doc_status: DocStatusStorage                                 │
│                                                                    │
│  Methods                                                           │
│  ├── insert(docs) → track_id                                      │
│  ├── ainsert(docs) → track_id                                     │
│  ├── query(text, params) → response                               │
│  ├── aquery(text, params) → response                              │
│  ├── delete_by_doc(doc_id) → result                               │
│  ├── get_knowledge_graph(label) → KnowledgeGraph                  │
│  └── get_processing_status(track_id) → status                     │
│                                                                    │
└────────────────────────────────────────────────────────────────────┘
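The defaults in the box above can be collected in one place for quick sanity checks. The following is a minimal sketch, not the real `LightRAG` constructor: `RagSettingsSketch`, its `validate` method, and the `working_dir` value are hypothetical, and the two validation rules are assumptions about what a sensible configuration looks like.

```python
from dataclasses import dataclass


@dataclass
class RagSettingsSketch:
    """Hypothetical container mirroring the defaults listed in the class box."""

    working_dir: str = "./rag_storage"  # assumed path, not taken from the source
    workspace: str = ""                 # tenant isolation namespace
    # Processing settings
    chunk_token_size: int = 1200
    chunk_overlap: int = 100
    entity_extract_max_gleaning: int = 1
    max_parallel_insert: int = 2
    # Query settings
    top_k: int = 40
    chunk_top_k: int = 20
    max_entity_tokens: int = 6000
    max_relation_tokens: int = 8000
    max_total_tokens: int = 30000

    def validate(self) -> None:
        # Assumed rule: the overlap must leave room for new tokens in each chunk.
        if not 0 <= self.chunk_overlap < self.chunk_token_size:
            raise ValueError("chunk_overlap must be smaller than chunk_token_size")
        # Assumed rule: entity and relation budgets must fit in the total budget.
        if self.max_entity_tokens + self.max_relation_tokens > self.max_total_tokens:
            raise ValueError("entity + relation budgets exceed max_total_tokens")

    def chunk_token_budget(self) -> int:
        """Tokens left for text chunks after entity and relation context."""
        return self.max_total_tokens - self.max_entity_tokens - self.max_relation_tokens
```

With the documented defaults, `RagSettingsSketch().chunk_token_budget()` leaves 16000 tokens for chunk context under this assumed accounting.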

Data Flow

Document Ingestion Pipeline

┌──────────────────────────────────────────────────────────────────────────┐
│                        Document Ingestion Flow                           │
├──────────────────────────────────────────────────────────────────────────┤
│                                                                          │
│  1. INPUT                                                                │
│     ▼                                                                    │
│  ┌──────────────────┐                                                    │
│  │  Raw Documents   │  ◀── String/List[String] + optional file_paths    │
│  └────────┬─────────┘                                                    │
│           │                                                              │
│  2. CHUNKING                                                             │
│     ▼                                                                    │
│  ┌──────────────────┐                                                    │
│  │ chunking_by_     │  ── chunk_token_size: 1200                        │
│  │ token_size()     │  ── overlap: 100 tokens                           │
│  └────────┬─────────┘  ── optional character split                      │
│           │                                                              │
│  3. ENTITY EXTRACTION                                                    │
│     ▼                                                                    │
│  ┌──────────────────┐                                                    │
│  │ extract_entities │  ── LLM extracts entities + relations             │
│  │ ()               │  ── Parallel processing per chunk                 │
│  └────────┬─────────┘  ── Gleaning for missed entities                  │
│           │                                                              │
│  4. MERGE & UPSERT                                                       │
│     ▼                                                                    │
│  ┌──────────────────┐                                                    │
│  │ merge_nodes_and_ │  ── Deduplicate entities                          │
│  │ edges()          │  ── Summarize descriptions via LLM                │
│  └────────┬─────────┘  ── Upsert to graph + vector DBs                  │
│           │                                                              │
│  5. STORAGE                                                              │
│     ▼                                                                    │
│  ┌──────────────────────────────────────────────────────────────────┐   │
│  │                    Storage Layer (Parallel Write)                 │   │
│  ├──────────────┬──────────────┬──────────────┬──────────────────────┤   │
│  │  full_docs   │ text_chunks  │ entities_vdb │ chunk_entity_graph   │   │
│  │  (KV Store)  │ (KV Store)   │ (Vector DB)  │ (Graph DB)           │   │
│  │              │              │ relations_vdb│                      │   │
│  │              │              │ chunks_vdb   │                      │   │
│  └──────────────┴──────────────┴──────────────┴──────────────────────┘   │
│                                                                          │
└──────────────────────────────────────────────────────────────────────────┘
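The chunking step (stage 2 above) can be illustrated with a sliding window. This is a minimal sketch under stated assumptions: whitespace-separated words stand in for real tokenizer tokens, and `chunk_by_token_size` is a hypothetical helper, not the library's `chunking_by_token_size()` implementation.

```python
def chunk_by_token_size(text: str, chunk_token_size: int = 1200, overlap: int = 100) -> list[dict]:
    """Split text into overlapping windows of at most chunk_token_size tokens."""
    if not 0 <= overlap < chunk_token_size:
        raise ValueError("overlap must be in [0, chunk_token_size)")

    tokens = text.split()  # assumption: whitespace words approximate tokens
    step = chunk_token_size - overlap
    chunks = []
    for start in range(0, len(tokens), step):
        window = tokens[start:start + chunk_token_size]
        chunks.append(
            {
                "chunk_order_index": len(chunks),
                "tokens": len(window),
                "content": " ".join(window),
            }
        )
        # Stop once the window reaches the end, so the tail is not emitted twice.
        if start + chunk_token_size >= len(tokens):
            break
    return chunks
```

Each window starts `chunk_token_size - overlap` tokens after the previous one, so consecutive chunks share exactly `overlap` tokens.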

Entity Extraction Detail

┌─────────────────────────────────────────────────────────────────────┐
│                   Entity Extraction Process                         │
├─────────────────────────────────────────────────────────────────────┤
│                                                                     │
│  Input Chunk ────────────────────────────────────────────────────▶ │
│  "Apple Inc. announced new iPhone at Cupertino headquarters..."    │
│                                                                     │
│            ▼                                                        │
│  ┌─────────────────────────────────────────────────────────────┐   │
│  │               LLM Extraction Prompt                          │   │
│  │  - Entity types: Person, Organization, Location, Event...   │   │
│  │  - Output format: entity<SEP>name<SEP>type<SEP>description  │   │
│  │                   relation<SEP>src<SEP>tgt<SEP>kw<SEP>desc   │   │
│  └─────────────────────────────────────────────────────────────┘   │
│            ▼                                                        │
│  ┌─────────────────────────────────────────────────────────────┐   │
│  │                    Extracted Output                          │   │
│  │                                                              │   │
│  │  ENTITIES:                                                   │   │
│  │  ├── Apple Inc. (organization) - Tech company...            │   │
│  │  ├── iPhone (product) - Smartphone device...                │   │
│  │  └── Cupertino (location) - City in California...           │   │
│  │                                                              │   │
│  │  RELATIONS:                                                  │   │
│  │  ├── Apple Inc. → iPhone (produces)                         │   │
│  │  └── Apple Inc. → Cupertino (headquartered_in)              │   │
│  └─────────────────────────────────────────────────────────────┘   │
│                                                                     │
└─────────────────────────────────────────────────────────────────────┘
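The delimited output format drawn above lends itself to a small parser. The sketch below assumes one record per line and the literal `<SEP>` field delimiter as shown in the diagram; the actual delimiters and record separators used by the prompt templates may differ. `Entity`, `Relation`, and `parse_extraction` are hypothetical names.

```python
from dataclasses import dataclass

FIELD_SEP = "<SEP>"  # field delimiter as drawn in the diagram (assumed literal)


@dataclass(frozen=True)
class Entity:
    name: str
    entity_type: str
    description: str


@dataclass(frozen=True)
class Relation:
    src: str
    tgt: str
    keywords: str
    description: str


def parse_extraction(raw: str) -> tuple[list[Entity], list[Relation]]:
    """Parse LLM output lines into entities and relations, skipping malformed lines."""
    entities: list[Entity] = []
    relations: list[Relation] = []
    for line in raw.splitlines():
        parts = [part.strip() for part in line.split(FIELD_SEP)]
        if parts[0] == "entity" and len(parts) == 4:
            entities.append(Entity(*parts[1:]))
        elif parts[0] == "relation" and len(parts) == 5:
            relations.append(Relation(*parts[1:]))
        # Any other line (blank, wrong field count) is ignored.
    return entities, relations
```

Skipping malformed lines rather than raising is a design choice for this sketch: LLM output is noisy, and a single bad record should not discard the rest of the chunk.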

Component Breakdown

Namespace Registry

All storage namespaces are defined in lightrag/namespace.py:

class NameSpace:
    # Key-Value Stores
    KV_STORE_FULL_DOCS = "full_docs"           # Complete document storage
    KV_STORE_TEXT_CHUNKS = "text_chunks"       # Chunked text storage
    KV_STORE_LLM_RESPONSE_CACHE = "llm_response_cache"  # LLM caching
    KV_STORE_FULL_ENTITIES = "full_entities"   # Entity metadata
    KV_STORE_FULL_RELATIONS = "full_relations" # Relation metadata
    
    # Vector Stores
    VECTOR_STORE_ENTITIES = "entities"         # Entity embeddings
    VECTOR_STORE_RELATIONSHIPS = "relationships"  # Relation embeddings
    VECTOR_STORE_CHUNKS = "chunks"             # Chunk embeddings
    
    # Graph Store
    GRAPH_STORE_CHUNK_ENTITY_RELATION = "chunk_entity_relation"
    
    # Document Status
    DOC_STATUS = "doc_status"

Storage Implementation Matrix

| Backend | KV Store | Vector Store | Graph Store | Doc Status |
|---|---|---|---|---|
| JSON/File | ✅ JsonKVStorage | ❌ | ❌ | ✅ JsonDocStatus |
| PostgreSQL | ✅ PGKVStorage | ✅ PGVectorStorage | ✅ PGGraphStorage | ✅ PGDocStatus |
| MongoDB | ✅ MongoKVStorage | ✅ MongoVectorDB | ✅ MongoGraphStorage | ✅ MongoDocStatus |
| Redis | ✅ RedisKVStorage | ❌ | ❌ | ✅ RedisDocStatus |
| Neo4j | ❌ | ❌ | ✅ Neo4JStorage | ❌ |
| Memgraph | ❌ | ❌ | ✅ MemgraphStorage | ❌ |
| NetworkX | ❌ | ❌ | ✅ NetworkXStorage | ❌ |
| Milvus | ❌ | ✅ MilvusVectorDB | ❌ | ❌ |
| Qdrant | ❌ | ✅ QdrantVectorDB | ❌ | ❌ |
| FAISS | ❌ | ✅ FaissVectorDB | ❌ | ❌ |
| NanoVectorDB | ❌ | ✅ NanoVectorDB | ❌ | ❌ |
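One way to use the matrix is to check a storage selection before start-up. The sketch below encodes the matrix as a lookup table, using the class names exactly as they appear in it; `SUPPORTED` and `check_storage_selection` are hypothetical and not part of the library.

```python
# Class names per storage role, copied from the matrix above.
SUPPORTED = {
    "kv_storage": {"JsonKVStorage", "PGKVStorage", "MongoKVStorage", "RedisKVStorage"},
    "vector_storage": {
        "PGVectorStorage", "MongoVectorDB", "MilvusVectorDB",
        "QdrantVectorDB", "FaissVectorDB", "NanoVectorDB",
    },
    "graph_storage": {
        "PGGraphStorage", "MongoGraphStorage", "Neo4JStorage",
        "MemgraphStorage", "NetworkXStorage",
    },
    "doc_status_storage": {"JsonDocStatus", "PGDocStatus", "MongoDocStatus", "RedisDocStatus"},
}


def check_storage_selection(selection: dict[str, str]) -> list[str]:
    """Return a list of problems; an empty list means the selection matches the matrix."""
    problems = []
    for role, allowed in SUPPORTED.items():
        chosen = selection.get(role)
        if chosen is None:
            problems.append(f"{role}: no backend selected")
        elif chosen not in allowed:
            problems.append(f"{role}: {chosen} is not listed for this role")
    return problems
```

For example, selecting `Neo4JStorage` as `kv_storage` would be reported, because the matrix lists Neo4j only as a graph store.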

Storage Architecture

Abstract Base Classes

┌────────────────────────────────────────────────────────────────────────┐
│                      Storage Abstract Classes                          │
├────────────────────────────────────────────────────────────────────────┤
│                                                                        │
│  StorageNameSpace (Base)                                               │
│  ├── namespace: str                                                    │
│  ├── workspace: str                                                    │
│  ├── global_config: dict                                               │
│  ├── tenant_id: Optional[str]    # Multi-tenant support               │
│  ├── kb_id: Optional[str]        # Knowledge base isolation            │
│  ├── initialize() → None                                               │
│  ├── finalize() → None                                                 │
│  ├── index_done_callback() → None                                      │
│  └── drop() → dict                                                     │
│                                                                        │
│  BaseKVStorage (StorageNameSpace)                                      │
│  ├── get_by_id(id) → dict                                              │
│  ├── get_by_ids(ids) → list[dict]                                      │
│  ├── filter_keys(keys) → set[str]                                      │
│  ├── upsert(data) → None                                               │
│  └── delete(ids) → None                                                │
│                                                                        │
│  BaseVectorStorage (StorageNameSpace)                                  │
│  ├── query(query, top_k, embedding) → list[dict]                       │
│  ├── upsert(data) → None                                               │
│  ├── delete(ids) → None                                                │
│  ├── delete_entity(name) → None                                        │
│  ├── get_by_id(id) → dict                                              │
│  └── get_vectors_by_ids(ids) → dict[str, list[float]]                  │
│                                                                        │
│  BaseGraphStorage (StorageNameSpace)                                   │
│  ├── has_node(node_id) → bool                                          │
│  ├── has_edge(src, tgt) → bool                                         │
│  ├── get_node(node_id) → dict                                          │
│  ├── get_edge(src, tgt) → dict                                         │
│  ├── get_node_edges(node_id) → list[tuple]                             │
│  ├── upsert_node(node_id, data) → None                                 │
│  ├── upsert_edge(src, tgt, data) → None                                │
│  ├── delete_node(node_id) → None                                       │
│  ├── get_knowledge_graph(label, depth) → KnowledgeGraph                │
│  └── get_all_labels() → list[str]                                      │
│                                                                        │
│  DocStatusStorage (StorageNameSpace)                                   │
│  ├── get_status_by_doc_id(doc_id) → DocProcessingStatus                │
│  ├── upsert_status(doc_id, status) → None                              │
│  ├── get_docs_by_status(status) → dict                                 │
│  └── delete_status(doc_id) → None                                      │
│                                                                        │
└────────────────────────────────────────────────────────────────────────┘
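To make the key-value contract above concrete, here is a minimal in-memory sketch. It is not a library class: `KVStorageSketch` and `InMemoryKVStorage` are hypothetical, the methods are assumed to be coroutines, and `filter_keys` is assumed to return the keys that are not yet stored.

```python
import asyncio
from abc import ABC, abstractmethod
from typing import Any, Optional


class KVStorageSketch(ABC):
    """Hypothetical stand-in for the BaseKVStorage contract listed above."""

    def __init__(self, namespace: str, workspace: str = "") -> None:
        self.namespace = namespace
        self.workspace = workspace

    @abstractmethod
    async def get_by_id(self, id: str) -> Optional[dict[str, Any]]: ...

    @abstractmethod
    async def get_by_ids(self, ids: list[str]) -> list[Optional[dict[str, Any]]]: ...

    @abstractmethod
    async def filter_keys(self, keys: set[str]) -> set[str]: ...

    @abstractmethod
    async def upsert(self, data: dict[str, dict[str, Any]]) -> None: ...

    @abstractmethod
    async def delete(self, ids: list[str]) -> None: ...


class InMemoryKVStorage(KVStorageSketch):
    """Dictionary-backed implementation, useful only for tests and illustration."""

    def __init__(self, namespace: str, workspace: str = "") -> None:
        super().__init__(namespace, workspace)
        self._data: dict[str, dict[str, Any]] = {}

    async def get_by_id(self, id: str) -> Optional[dict[str, Any]]:
        return self._data.get(id)

    async def get_by_ids(self, ids: list[str]) -> list[Optional[dict[str, Any]]]:
        return [self._data.get(i) for i in ids]

    async def filter_keys(self, keys: set[str]) -> set[str]:
        # Assumption: return the keys that do NOT exist yet.
        return {k for k in keys if k not in self._data}

    async def upsert(self, data: dict[str, dict[str, Any]]) -> None:
        self._data.update(data)

    async def delete(self, ids: list[str]) -> None:
        for i in ids:
            self._data.pop(i, None)
```

A caller would typically run `filter_keys` before `upsert` to skip documents that were already ingested, for example via `asyncio.run(...)` around a small coroutine.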

PostgreSQL Storage Implementation

┌─────────────────────────────────────────────────────────────────────────┐
│                    PostgreSQL Storage Schema                            │
├─────────────────────────────────────────────────────────────────────────┤
│                                                                         │
│  Table: LIGHTRAG_DOC_FULL                                               │
│  ┌──────────────────────────────────────────────────────────────────┐  │
│  │ workspace VARCHAR(1024)  PK  │  Tenant isolation namespace       │  │
│  │ id        VARCHAR(255)   PK  │  Document hash ID                 │  │
│  │ doc_name  VARCHAR(1024)      │  Original filename                │  │
│  │ content   TEXT               │  Full document content            │  │
│  │ meta      JSONB              │  Document metadata                │  │
│  │ createtime TIMESTAMP(0)      │  Creation timestamp               │  │
│  │ updatetime TIMESTAMP(0)      │  Last update timestamp            │  │
│  └──────────────────────────────────────────────────────────────────┘  │
│                                                                         │
│  Table: LIGHTRAG_DOC_CHUNKS                                             │
│  ┌──────────────────────────────────────────────────────────────────┐  │
│  │ workspace       VARCHAR(1024) PK  │  Tenant namespace            │  │
│  │ id              VARCHAR(255)  PK  │  Chunk hash ID               │  │
│  │ full_doc_id     VARCHAR(255)      │  Parent document ID          │  │
│  │ chunk_order_index INT             │  Chunk sequence number       │  │
│  │ tokens          INT               │  Token count                 │  │
│  │ content         TEXT              │  Chunk text content          │  │
│  │ file_path       TEXT              │  Source file path            │  │
│  └──────────────────────────────────────────────────────────────────┘  │
│                                                                         │
│  Table: LIGHTRAG_VDB_ENTITY                                             │
│  ┌──────────────────────────────────────────────────────────────────┐  │
│  │ workspace      VARCHAR(1024) PK  │  Tenant namespace             │  │
│  │ id             VARCHAR(255)  PK  │  Entity hash ID               │  │
│  │ entity_name    VARCHAR(1024)     │  Entity display name          │  │
│  │ content        TEXT              │  Name + Description           │  │
│  │ content_vector VECTOR(dim)       │  Embedding vector             │  │
│  │ source_id      TEXT              │  Source chunk IDs             │  │
│  │ file_path      TEXT              │  Source file paths            │  │
│  └──────────────────────────────────────────────────────────────────┘  │
│                                                                         │
│  Table: LIGHTRAG_VDB_RELATION                                           │
│  ┌──────────────────────────────────────────────────────────────────┐  │
│  │ workspace      VARCHAR(1024) PK  │  Tenant namespace             │  │
│  │ id             VARCHAR(255)  PK  │  Relation hash ID             │  │
│  │ src_id         VARCHAR(1024)     │  Source entity name           │  │
│  │ tgt_id         VARCHAR(1024)     │  Target entity name           │  │
│  │ content        TEXT              │  Relation description         │  │
│  │ content_vector VECTOR(dim)       │  Embedding vector             │  │
│  │ source_id      TEXT              │  Source chunk IDs             │  │
│  │ file_path      TEXT              │  Source file paths            │  │
│  └──────────────────────────────────────────────────────────────────┘  │
│                                                                         │
│  Table: LIGHTRAG_GRAPH_NODES                                            │
│  ┌──────────────────────────────────────────────────────────────────┐  │
│  │ workspace    VARCHAR(1024) PK  │  Tenant namespace               │  │
│  │ id           VARCHAR(255)  PK  │  Node/Entity name               │  │
│  │ entity_type  VARCHAR(255)      │  Entity type (Person, etc.)     │  │
│  │ description  TEXT              │  Entity description             │  │
│  │ source_id    TEXT              │  Source chunk IDs               │  │
│  │ file_path    TEXT              │  Source file paths              │  │
│  └──────────────────────────────────────────────────────────────────┘  │
│                                                                         │
│  Table: LIGHTRAG_GRAPH_EDGES                                            │
│  ┌──────────────────────────────────────────────────────────────────┐  │
│  │ workspace    VARCHAR(1024) PK  │  Tenant namespace               │  │
│  │ source_id    VARCHAR(255)  PK  │  Source node ID                 │  │
│  │ target_id    VARCHAR(255)  PK  │  Target node ID                 │  │
│  │ weight       FLOAT             │  Edge weight/strength           │  │
│  │ description  TEXT              │  Relation description           │  │
│  │ keywords     TEXT              │  Relation keywords              │  │
│  │ source_chunk_id TEXT           │  Source chunk IDs               │  │
│  └──────────────────────────────────────────────────────────────────┘  │
│                                                                         │
└─────────────────────────────────────────────────────────────────────────┘
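Every table above uses `workspace` as the leading part of a composite primary key, which is what keeps tenants apart. The sketch below demonstrates the idea with Python's built-in `sqlite3` as a stand-in for PostgreSQL; the table is a trimmed, hypothetical version of `LIGHTRAG_DOC_FULL`, and the helper names are illustrative only.

```python
import sqlite3


def open_store() -> sqlite3.Connection:
    """Create an in-memory table keyed by (workspace, id)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        """
        CREATE TABLE lightrag_doc_full (
            workspace TEXT NOT NULL,
            id        TEXT NOT NULL,
            doc_name  TEXT,
            content   TEXT,
            PRIMARY KEY (workspace, id)
        )
        """
    )
    return conn


def upsert_doc(conn: sqlite3.Connection, workspace: str, doc_id: str, content: str) -> None:
    # SQLite shortcut; a PostgreSQL backend would use INSERT ... ON CONFLICT instead.
    conn.execute(
        "INSERT OR REPLACE INTO lightrag_doc_full (workspace, id, content) VALUES (?, ?, ?)",
        (workspace, doc_id, content),
    )


def get_doc(conn: sqlite3.Connection, workspace: str, doc_id: str):
    """Return the content for one tenant's document, or None if absent."""
    row = conn.execute(
        "SELECT content FROM lightrag_doc_full WHERE workspace = ? AND id = ?",
        (workspace, doc_id),
    ).fetchone()
    return row[0] if row else None
```

Because the key includes `workspace`, two tenants can store the same document ID without colliding, and every read must name the workspace it belongs to.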

Query Pipeline

Query Modes

┌─────────────────────────────────────────────────────────────────────────┐
│                           Query Modes                                   │
├─────────────────────────────────────────────────────────────────────────┤
│                                                                         │
│  MODE: local                                                            │
│  ┌───────────────────────────────────────────────────────────────────┐ │
│  │ Focus: Specific entity context                                    │ │
│  │ 1. Extract keywords from query (LLM)                              │ │
│  │ 2. Search entity vector DB (top_k entities)                       │ │
│  │ 3. Get entity neighbors from graph (1-hop)                        │ │
│  │ 4. Retrieve related text chunks                                   │ │
│  │ 5. Generate response with entity-focused context                  │ │
│  └───────────────────────────────────────────────────────────────────┘ │
│                                                                         │
│  MODE: global                                                           │
│  ┌───────────────────────────────────────────────────────────────────┐ │
│  │ Focus: High-level relationships                                   │ │
│  │ 1. Extract keywords from query (LLM)                              │ │
│  │ 2. Search relationship vector DB (top_k relations)                │ │
│  │ 3. Get connected entities for each relation                       │ │
│  │ 4. Retrieve related text chunks                                   │ │
│  │ 5. Generate response with relationship-focused context            │ │
│  └───────────────────────────────────────────────────────────────────┘ │
│                                                                         │
│  MODE: hybrid                                                           │
│  ┌───────────────────────────────────────────────────────────────────┐ │
│  │ Focus: Combined entity + relation context                         │ │
│  │ 1. Execute LOCAL mode retrieval                                   │ │
│  │ 2. Execute GLOBAL mode retrieval                                  │ │
│  │ 3. Merge and deduplicate context                                  │ │
│  │ 4. Generate response with comprehensive context                   │ │
│  └───────────────────────────────────────────────────────────────────┘ │
│                                                                         │
│  MODE: naive                                                            │
│  ┌───────────────────────────────────────────────────────────────────┐ │
│  │ Focus: Direct vector similarity (traditional RAG)                 │ │
│  │ 1. Embed query                                                    │ │
│  │ 2. Search chunks vector DB (top_k chunks)                         │ │
│  │ 3. Generate response with chunk context                           │ │
│  └───────────────────────────────────────────────────────────────────┘ │
│                                                                         │
│  MODE: mix (DEFAULT)                                                    │
│  ┌───────────────────────────────────────────────────────────────────┐ │
│  │ Focus: Knowledge graph + vector retrieval                         │ │
│  │ 1. Execute HYBRID mode (entity + relations)                       │ │
│  │ 2. Execute NAIVE mode (direct chunk similarity)                   │ │
│  │ 3. Intelligently merge contexts                                   │ │
│  │ 4. Apply reranking if enabled                                     │ │
│  │ 5. Generate response with all context                             │ │
│  └───────────────────────────────────────────────────────────────────┘ │
│                                                                         │
│  MODE: bypass                                                           │
│  ┌───────────────────────────────────────────────────────────────────┐ │
│  │ Focus: Direct LLM (no retrieval)                                  │ │
│  │ 1. Send query directly to LLM                                     │ │
│  │ 2. No context retrieval                                           │ │
│  └───────────────────────────────────────────────────────────────────┘ │
│                                                                         │
└─────────────────────────────────────────────────────────────────────────┘
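The modes above differ mainly in which vector stores they consult before generation. A rough sketch of that dispatch follows; `RETRIEVAL_PLAN` and `gather_context` are hypothetical, the store names reuse the vector namespaces (`entities`, `relationships`, `chunks`), and the graph expansion and reranking steps are deliberately left out.

```python
from typing import Callable

# Which vector stores each mode searches, following the mode descriptions above.
RETRIEVAL_PLAN: dict[str, tuple[str, ...]] = {
    "local": ("entities",),
    "global": ("relationships",),
    "hybrid": ("entities", "relationships"),
    "naive": ("chunks",),
    "mix": ("entities", "relationships", "chunks"),
    "bypass": (),  # no retrieval, the query goes straight to the LLM
}


def gather_context(mode: str, search: Callable[[str], list[str]]) -> list[str]:
    """Collect context items for a mode, dropping duplicates but keeping order.

    `search` is a caller-supplied function mapping a store name to its hits.
    """
    if mode not in RETRIEVAL_PLAN:
        raise ValueError(f"unknown query mode: {mode}")

    seen: set[str] = set()
    merged: list[str] = []
    for store in RETRIEVAL_PLAN[mode]:
        for item in search(store):
            if item not in seen:
                seen.add(item)
                merged.append(item)
    return merged
```

Expressing the modes as data makes the relationship visible: `hybrid` is the union of `local` and `global`, and `mix` adds the `naive` chunk search on top.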

Query Flow Diagram

sequenceDiagram
    participant C as Client
    participant API as FastAPI
    participant LR as LightRAG
    participant KW as Keyword Extractor
    participant VDB as Vector DBs
    participant GDB as Graph DB
    participant LLM as LLM

    C->>API: POST /query {query, mode}
    API->>LR: aquery(query, params)
    
    alt mode != bypass
        LR->>KW: Extract keywords
        KW->>LLM: Keyword extraction prompt
        LLM-->>KW: {high_level, low_level}
        
        par Parallel Retrieval
            LR->>VDB: Search entities/relations/chunks
            LR->>GDB: Get graph neighbors
        end
        
        VDB-->>LR: Entity/Relation matches
        GDB-->>LR: Graph context
        
        LR->>LR: Merge & deduplicate context
        opt Rerank enabled
            LR->>LR: Rerank chunks
        end
    end
    
    LR->>LLM: Generate response
    LLM-->>LR: Response text
    LR-->>API: QueryResult
    API-->>C: {response, references}
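The `par` block in the sequence diagram, where vector and graph lookups run side by side, maps naturally onto `asyncio.gather`. The sketch below uses stubbed lookups; `search_vectors`, `get_graph_neighbors`, and `retrieve_context` are hypothetical stand-ins, not library functions.

```python
import asyncio


async def search_vectors(keywords: list[str]) -> list[str]:
    """Stub for a vector DB search; a real backend would do network I/O here."""
    await asyncio.sleep(0)
    return [f"vector:{k}" for k in keywords]


async def get_graph_neighbors(keywords: list[str]) -> list[str]:
    """Stub for a graph DB neighbor lookup."""
    await asyncio.sleep(0)
    return [f"graph:{k}" for k in keywords]


async def retrieve_context(keywords: list[str], mode: str) -> list[str]:
    """Run both lookups concurrently, then merge and deduplicate the results."""
    if mode == "bypass":
        return []  # bypass skips retrieval entirely

    vector_hits, graph_hits = await asyncio.gather(
        search_vectors(keywords),
        get_graph_neighbors(keywords),
    )
    # dict.fromkeys keeps first-seen order while removing duplicates.
    return list(dict.fromkeys(vector_hits + graph_hits))
```

Running the two lookups concurrently means the retrieval latency is roughly that of the slower backend rather than the sum of both.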

Entity-Relationship Diagrams

Document Processing ERD

┌─────────────────────────────────────────────────────────────────────────┐
│                    Document Processing Data Model                       │
├─────────────────────────────────────────────────────────────────────────┤
│                                                                         │
│  ┌──────────────┐         ┌──────────────┐         ┌──────────────┐    │
│  │   Document   │ 1───n   │    Chunk     │  n───m  │    Entity    │    │
│  ├──────────────┤         ├──────────────┤         ├──────────────┤    │
│  │ id: str (PK) │─────────│ id: str (PK) │─────────│ id: str (PK) │    │
│  │ content: text│         │ doc_id: FK   │         │ name: str    │    │
│  │ file_path    │         │ order_index  │         │ type: str    │    │
│  │ meta: json   │         │ content: text│         │ description  │    │
│  │ created_at   │         │ tokens: int  │         │ source_ids   │    │
│  └──────────────┘         │ file_path    │         │ vector: []   │    │
│                           └──────────────┘         └──────────────┘    │
│                                  │                        │            │
│                                  │ n                      │ n          │
│                                  │                        │            │
│                                  ▼                        ▼            │
│                           ┌──────────────┐         ┌──────────────┐    │
│                           │ ChunkVector  │         │   Relation   │    │
│                           ├──────────────┤         ├──────────────┤    │
│                           │ chunk_id: FK │         │ id: str (PK) │    │
│                           │ vector: []   │         │ source: FK   │    │
│                           │ content: str │         │ target: FK   │    │
│                           └──────────────┘         │ description  │    │
│                                                    │ keywords: str│    │
│                                                    │ weight: float│    │
│                                                    │ vector: []   │    │
│                                                    └──────────────┘    │
│                                                                         │
│  Legend:                                                                │
│  ─────── = Relationship        PK = Primary Key                         │
│  1───n   = One-to-Many         FK = Foreign Key                         │
│  n───m   = Many-to-Many                                                 │
│                                                                         │
└─────────────────────────────────────────────────────────────────────────┘
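The entity-relationship diagram above can be read as a set of record types. The following is a minimal sketch using Python dataclasses, not the storage layer's actual classes: the class names mirror the diagram boxes, while the field types, the assumption that `source_ids` holds chunk ids, and the helpers `chunks_of` and `entities_in` are illustrative.

```python
from dataclasses import dataclass, field
from typing import Any


@dataclass
class Document:
    id: str                                   # primary key
    content: str
    file_path: str
    meta: dict[str, Any] = field(default_factory=dict)
    created_at: str = ""                      # timestamp format is an assumption


@dataclass
class Chunk:
    id: str                                   # primary key
    doc_id: str                               # foreign key -> Document.id
    order_index: int
    content: str
    tokens: int
    file_path: str


@dataclass
class Entity:
    id: str                                   # primary key
    name: str
    type: str
    description: str
    # Assumed to hold chunk ids, which realises the n-m Chunk/Entity link.
    source_ids: list[str] = field(default_factory=list)
    vector: list[float] = field(default_factory=list)


@dataclass
class Relation:
    id: str                                   # primary key
    source: str                               # foreign key -> Entity.id
    target: str                               # foreign key -> Entity.id
    description: str
    keywords: str
    weight: float
    vector: list[float] = field(default_factory=list)


def chunks_of(doc: Document, chunks: list[Chunk]) -> list[Chunk]:
    """Resolve the 1-n Document -> Chunk link, ordered by order_index."""
    owned = (c for c in chunks if c.doc_id == doc.id)
    return sorted(owned, key=lambda c: c.order_index)


def entities_in(chunk: Chunk, entities: list[Entity]) -> list[Entity]:
    """Resolve the n-m Chunk <-> Entity link through Entity.source_ids."""
    return [e for e in entities if chunk.id in e.source_ids]
```

Keeping the foreign keys as plain string ids, rather than object references, matches how the diagram describes the links and keeps each record independently serialisable.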

Knowledge Graph Schema

┌─────────────────────────────────────────────────────────────────────────┐
│                     Knowledge Graph Structure                           │
├─────────────────────────────────────────────────────────────────────────┤
│                                                                         │
│                          ┌─────────────┐                                │
│                          │   ENTITY    │                                │
│                          │  (Node)     │                                │
│                          ├─────────────┤                                │
│                          │ entity_id   │ ◀─────────────┐               │
│                          │ entity_type │               │               │
│                          │ description │               │               │
│                          │ source_id   │               │               │
│                          │ file_path   │               │               │
│                          │ created_at  │               │               │
│                          └──────┬──────┘               │               │
│                                 │                      │               │
│                      ┌──────────┴──────────┐           │               │
│                      │                     │           │               │
│                      ▼                     ▼           │               │
│              ┌─────────────┐       ┌─────────────┐     │               │
│              │  RELATION   │       │  RELATION   │     │               │
│              │   (Edge)    │       │   (Edge)    │     │               │
│              ├─────────────┤       ├─────────────┤     │               │
│              │ source ─────┼───────┼─────────────┼─────┘               │
│              │ target ─────┼───────┘             │                     │
│              │ weight      │                     │                     │
│              │ description │                     │                     │
│              │ keywords    │                     │                     │
│              │ source_id   │                     │                     │
│              └─────────────┘                     │                     │
│                                                  │                     │
│              ┌───────────────────────────────────┘                     │
│              │                                                         │
│              ▼                                                         │
│      ┌─────────────┐                                                   │
│      │   ENTITY    │                                                   │
│      │  (Node)     │  ◀── Another entity connected by relation         │
│      └─────────────┘                                                   │
│                                                                         │
│  Graph Properties:                                                      │
│  • Undirected edges (relations work both ways)                          │
│  • Multi-edges allowed (multiple relations between same entities)       │
│  • Self-loops not typical but possible                                  │
│  • Weighted edges (based on mention frequency)                          │
│                                                                         │
└─────────────────────────────────────────────────────────────────────────┘
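The graph properties listed above (undirected, multi-edges, possible self-loops, weighted edges) can be illustrated with a small in-memory structure. This is a toy sketch using only the standard library; `KnowledgeGraph`, `Edge`, and their methods are hypothetical names and do not correspond to any LightRAG storage backend.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Edge:
    source: str
    target: str
    description: str
    keywords: str
    weight: float = 1.0


class KnowledgeGraph:
    """Toy undirected multigraph (illustrative only)."""

    def __init__(self) -> None:
        self.nodes: dict[str, dict] = {}
        # Keying by an unordered pair makes (a, b) and (b, a) the same slot;
        # the list value allows several relations between the same entities.
        self.edges: dict[frozenset, list[Edge]] = defaultdict(list)

    def add_entity(self, entity_id: str, entity_type: str, description: str) -> None:
        self.nodes[entity_id] = {
            "entity_type": entity_type,
            "description": description,
        }

    def add_relation(self, edge: Edge) -> None:
        for endpoint in (edge.source, edge.target):
            if endpoint not in self.nodes:
                raise KeyError(f"unknown entity: {endpoint}")
        self.edges[frozenset((edge.source, edge.target))].append(edge)

    def relations_between(self, a: str, b: str) -> list[Edge]:
        return list(self.edges.get(frozenset((a, b)), []))

    def neighbors(self, entity_id: str) -> set[str]:
        result: set[str] = set()
        for pair in self.edges:
            if entity_id in pair:
                # A self-loop is stored as a one-element pair.
                result |= (pair - {entity_id}) or {entity_id}
        return result


graph = KnowledgeGraph()
graph.add_entity("Alice", "person", "An engineer")
graph.add_entity("Acme", "organization", "A company")
graph.add_relation(Edge("Alice", "Acme", "works at", "employment", weight=2.0))
graph.add_relation(Edge("Acme", "Alice", "funded by", "investment"))
```

Both relations land in the same slot regardless of direction, so `graph.relations_between("Acme", "Alice")` returns two edges.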

Multi-Tenant Data Isolation

┌─────────────────────────────────────────────────────────────────────────┐
│                    Multi-Tenant Data Model                              │
├─────────────────────────────────────────────────────────────────────────┤
│                                                                         │
│  ┌────────────────────────────────────────────────────────────────────┐ │
│  │                      Composite Key Pattern                         │ │
│  │                                                                    │ │
│  │  All tables use: (tenant_id, kb_id, resource_id) as primary key   │ │
│  │                                                                    │ │
│  │  workspace = "{tenant_id}:{kb_id}:{workspace_name}"               │ │
│  └────────────────────────────────────────────────────────────────────┘ │
│                                                                         │
│  ┌──────────────────┐      ┌──────────────────┐                        │
│  │     Tenant       │ 1──n │  Knowledge Base  │                        │
│  ├──────────────────┤      ├──────────────────┤                        │
│  │ id: UUID (PK)    │──────│ id: UUID (PK)    │                        │
│  │ name: str        │      │ tenant_id: FK    │                        │
│  │ config: json     │      │ name: str        │                        │
│  │ quota: json      │      │ config: json     │                        │
│  │ created_at       │      │ created_at       │                        │
│  └──────────────────┘      └────────┬─────────┘                        │
│                                     │                                   │
│                                     │ 1                                 │
│                                     ▼                                   │
│  ┌──────────────────────────────────────────────────────────────────┐  │
│  │                    Scoped Resources                               │  │
│  │  ┌─────────────┐  ┌─────────────┐  ┌─────────────┐              │  │
│  │  │  Documents  │  │  Entities   │  │  Relations  │              │  │
│  │  ├─────────────┤  ├─────────────┤  ├─────────────┤              │  │
│  │  │ tenant_id   │  │ tenant_id   │  │ tenant_id   │              │  │
│  │  │ kb_id       │  │ kb_id       │  │ kb_id       │              │  │
│  │  │ doc_id      │  │ entity_id   │  │ relation_id │              │  │
│  │  │ ...         │  │ ...         │  │ ...         │              │  │
│  │  └─────────────┘  └─────────────┘  └─────────────┘              │  │
│  └──────────────────────────────────────────────────────────────────┘  │
│                                                                         │
│  Query Filter Example (PostgreSQL):                                     │
│  SELECT * FROM entities                                                 │
│  WHERE workspace = 'tenant_a:kb_prod:default'                          │
│    AND entity_name ILIKE '%search_term%'                               │
│                                                                         │
└─────────────────────────────────────────────────────────────────────────┘
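The workspace key and the scoped query above can be sketched in a few lines. The example below is an illustration under stated assumptions, not the project's implementation: `make_workspace` and `search_entities` are hypothetical helpers, and the standard-library `sqlite3` module stands in for PostgreSQL so the snippet runs anywhere (SQLite's `LIKE` is case-insensitive for ASCII, which approximates `ILIKE`).

```python
import sqlite3


def make_workspace(tenant_id: str, kb_id: str, workspace_name: str = "default") -> str:
    """Build the "{tenant_id}:{kb_id}:{workspace_name}" key shown above."""
    parts = (tenant_id, kb_id, workspace_name)
    for part in parts:
        # Rejecting ":" is an assumption; it keeps the key unambiguous to split.
        if not part or ":" in part:
            raise ValueError(f"invalid workspace component: {part!r}")
    return ":".join(parts)


def search_entities(conn: sqlite3.Connection, workspace: str, term: str) -> list[str]:
    """Return entity names in a single workspace that contain `term`."""
    rows = conn.execute(
        # Bound parameters keep tenant-supplied input out of the SQL text.
        "SELECT entity_name FROM entities "
        "WHERE workspace = ? AND entity_name LIKE ? "
        "ORDER BY entity_name",
        (workspace, f"%{term}%"),
    )
    return [name for (name,) in rows]


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE entities (workspace TEXT, entity_name TEXT)")
conn.executemany(
    "INSERT INTO entities VALUES (?, ?)",
    [
        (make_workspace("tenant_a", "kb_prod"), "Graph Search"),
        (make_workspace("tenant_b", "kb_prod"), "Graph Index"),
    ],
)
print(search_entities(conn, "tenant_a:kb_prod:default", "graph"))
```

Because every query carries the workspace predicate, the row belonging to `tenant_b` is never returned to `tenant_a`, even though both names match the search term.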

Configuration Reference

Default Constants

# Query defaults (lightrag/constants.py)
DEFAULT_TOP_K = 40                    # Entities/relations to retrieve
DEFAULT_CHUNK_TOP_K = 20              # Text chunks to retrieve
DEFAULT_MAX_ENTITY_TOKENS = 6000      # Max tokens for entity context
DEFAULT_MAX_RELATION_TOKENS = 8000    # Max tokens for relation context
DEFAULT_MAX_TOTAL_TOKENS = 30000      # Total context budget

# Chunking defaults
DEFAULT_CHUNK_SIZE = 1200             # Tokens per chunk
DEFAULT_CHUNK_OVERLAP = 100           # Overlap between chunks

# Processing defaults
DEFAULT_MAX_GLEANING = 1              # Extra extraction passes
DEFAULT_MAX_ASYNC = 4                 # Concurrent LLM calls
DEFAULT_MAX_PARALLEL_INSERT = 2       # Concurrent document inserts

# Timeouts
DEFAULT_LLM_TIMEOUT = 180             # LLM call timeout (seconds)
DEFAULT_EMBEDDING_TIMEOUT = 30        # Embedding call timeout (seconds)
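
To make the chunking defaults concrete, here is a minimal sliding-window sketch. It assumes the text has already been tokenized (plain integers stand in for token ids) and that consecutive chunks start `chunk_size - overlap` tokens apart; `chunk_tokens` is a hypothetical helper, not the library's chunking function.

```python
DEFAULT_CHUNK_SIZE = 1200     # tokens per chunk (value from the constants above)
DEFAULT_CHUNK_OVERLAP = 100   # tokens shared by consecutive chunks


def chunk_tokens(tokens, chunk_size=DEFAULT_CHUNK_SIZE, overlap=DEFAULT_CHUNK_OVERLAP):
    """Split a token list into windows of `chunk_size` with `overlap` shared tokens."""
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must satisfy 0 <= overlap < chunk_size")
    step = chunk_size - overlap  # 1100 with the defaults
    return [tokens[start:start + chunk_size] for start in range(0, len(tokens), step)]


tokens = list(range(3000))            # stand-in for real token ids
chunks = chunk_tokens(tokens)
print([len(c) for c in chunks])       # windows start at tokens 0, 1100, and 2200
```

With the defaults, a 3000-token document yields three chunks of 1200, 1200, and 800 tokens, and the last 100 tokens of each chunk reappear at the start of the next.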

Environment Variables

# Storage backends
LIGHTRAG_KV_STORAGE=JsonKVStorage           # JsonKVStorage|PGKVStorage|MongoKVStorage|RedisKVStorage
LIGHTRAG_VECTOR_STORAGE=NanoVectorDBStorage # NanoVectorDBStorage|PGVectorStorage|MilvusVectorDBStorage|QdrantVectorDBStorage
LIGHTRAG_GRAPH_STORAGE=NetworkXStorage      # NetworkXStorage|PGGraphStorage|Neo4JStorage|MemgraphStorage

# PostgreSQL (when using PG* backends)
POSTGRES_HOST=localhost
POSTGRES_PORT=5432
POSTGRES_USER=postgres
POSTGRES_PASSWORD=secret
POSTGRES_DATABASE=lightrag

# Neo4j (when using Neo4JStorage)
NEO4J_URI=bolt://localhost:7687
NEO4J_USERNAME=neo4j
NEO4J_PASSWORD=password

# Redis (when using Redis* backends)
REDIS_URI=redis://localhost:6379

# MongoDB (when using Mongo* backends)
MONGO_URI=mongodb://localhost:27017
MONGO_DATABASE=lightrag

# Processing
CHUNK_SIZE=1200
CHUNK_OVERLAP_SIZE=100
MAX_ASYNC=4
MAX_PARALLEL_INSERT=2

# Query
TOP_K=40
CHUNK_TOP_K=20
COSINE_THRESHOLD=0.2
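
One way to consume the processing and query variables above is to parse them once into a typed settings object, falling back to the documented defaults. This is a sketch under assumptions: `Settings` and `load_settings` are hypothetical names, and the server's own configuration loader may behave differently.

```python
import os
from dataclasses import dataclass
from typing import Mapping


@dataclass(frozen=True)
class Settings:
    chunk_size: int
    chunk_overlap_size: int
    max_async: int
    max_parallel_insert: int
    top_k: int
    chunk_top_k: int
    cosine_threshold: float


def load_settings(env: Mapping[str, str] = os.environ) -> Settings:
    """Read the variables listed above, using the documented defaults."""

    def get(name, cast, default):
        raw = env.get(name)
        # Treat an unset or empty variable as "use the default".
        return cast(raw) if raw not in (None, "") else default

    return Settings(
        chunk_size=get("CHUNK_SIZE", int, 1200),
        chunk_overlap_size=get("CHUNK_OVERLAP_SIZE", int, 100),
        max_async=get("MAX_ASYNC", int, 4),
        max_parallel_insert=get("MAX_PARALLEL_INSERT", int, 2),
        top_k=get("TOP_K", int, 40),
        chunk_top_k=get("CHUNK_TOP_K", int, 20),
        cosine_threshold=get("COSINE_THRESHOLD", float, 0.2),
    )


# Passing a plain dict instead of os.environ makes the loader easy to test.
settings = load_settings({"TOP_K": "10", "COSINE_THRESHOLD": "0.35"})
```

Here `settings.top_k` is 10 and `settings.cosine_threshold` is 0.35, while every other field keeps its default. A malformed value such as `TOP_K=abc` raises `ValueError` at startup rather than failing later during a query.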

Next Steps

0003-api-reference.md: Complete REST API documentation
0004-storage-backends.md: Detailed storage implementation guide
0005-llm-integration.md: LLM provider configuration
0006-deployment-guide.md: Docker, K8s, and production setup
0007-quick-start.md: Getting started tutorial

Version: 1.4.9.1 | License: MIT | Repository: HKUDS/LightRAG
