Architecture Documentation: LightRAG System Design
Table of Contents
- System Architecture Overview
- Component Interaction Architecture
- Document Indexing Data Flow
- Query Processing Data Flow
- Storage Layer Architecture
- Concurrency and State Management
System Architecture Overview
LightRAG follows a layered architecture pattern with clear separation of concerns. The system is structured into five primary layers, each with specific responsibilities and well-defined interfaces.
graph TB
subgraph "Presentation Layer"
A1["WebUI (TypeScript)<br/>React + Vite"]
A2["API Clients<br/>REST/OpenAPI"]
end
subgraph "API Gateway Layer"
B1["FastAPI Server<br/>lightrag_server.py"]
B2["Authentication<br/>JWT + API Keys"]
B3["Request Validation<br/>Pydantic Models"]
B4["Route Handlers<br/>Query/Document/Graph"]
end
subgraph "Business Logic Layer"
C1["LightRAG Core<br/>lightrag.py"]
C2["Operations Module<br/>operate.py"]
C3["Utilities<br/>utils.py + utils_graph.py"]
C4["Prompt Templates<br/>prompt.py"]
end
subgraph "Integration Layer"
D1["LLM Providers<br/>OpenAI, Ollama, etc."]
D2["Embedding Providers<br/>text-embedding-*"]
D3["Storage Adapters<br/>KV/Vector/Graph/Status"]
end
subgraph "Infrastructure Layer"
E1["PostgreSQL<br/>Relational + Vector"]
E2["Neo4j/Memgraph<br/>Graph Database"]
E3["Redis/MongoDB<br/>Cache + NoSQL"]
E4["File System<br/>JSON + FAISS"]
end
A1 --> B1
A2 --> B1
B1 --> B2
B2 --> B3
B3 --> B4
B4 --> C1
C1 --> C2
C1 --> C3
C2 --> C4
C1 --> D1
C1 --> D2
C1 --> D3
D3 --> E1
D3 --> E2
D3 --> E3
D3 --> E4
style A1 fill:#E6F3FF
style B1 fill:#FFE6E6
style C1 fill:#E6FFE6
style D1 fill:#FFF5E6
style E1 fill:#FFE6F5
Layer Responsibilities
Presentation Layer: Handles user interactions through a React-based WebUI and provides REST API client capabilities. Responsible for rendering data, handling user input, and managing client-side state. Written in TypeScript with React, this layer communicates with the API Gateway exclusively through HTTP/HTTPS.
API Gateway Layer: Manages all external communication with the system. Implements authentication and authorization using JWT tokens and API keys, validates incoming requests using Pydantic models, handles rate limiting, and routes requests to appropriate handlers. Built with FastAPI, it provides automatic OpenAPI documentation and request/response validation.
Business Logic Layer: Contains the core intelligence of LightRAG. The LightRAG class orchestrates all operations, managing document processing pipelines, query execution, and storage coordination. The Operations module handles entity extraction, graph merging, and retrieval algorithms. Utilities provide helper functions for text processing, tokenization, hashing, and caching. Prompt templates define structured prompts for LLM interactions.
Integration Layer: Abstracts external dependencies through consistent interfaces. LLM provider adapters normalize different API formats (OpenAI, Anthropic, Ollama, etc.) into a common interface. Embedding provider adapters handle various embedding services. Storage adapters implement the abstract storage interfaces (BaseKVStorage, BaseVectorStorage, BaseGraphStorage, DocStatusStorage) for different backends.
Infrastructure Layer: Provides the foundational data persistence and retrieval capabilities. Supports multiple database systems including PostgreSQL (with pgvector for vector storage), Neo4j and Memgraph (for graph storage), Redis and MongoDB (for caching and document storage), and file-based storage (JSON files, FAISS indexes) for development and small deployments.
Component Interaction Architecture
This diagram illustrates how major components interact during typical operations, showing both document indexing and query execution flows.
graph LR
subgraph "External Systems"
LLM["LLM Service<br/>(OpenAI/Ollama)"]
Embed["Embedding Service"]
end
subgraph "Core Components"
RAG["LightRAG<br/>Core Engine"]
OPS["Operations<br/>Module"]
PIPE["Pipeline<br/>Manager"]
end
subgraph "Storage System"
KV["KV Storage<br/>Chunks/Cache"]
VEC["Vector Storage<br/>Embeddings"]
GRAPH["Graph Storage<br/>Entities/Relations"]
STATUS["Status Storage<br/>Pipeline State"]
end
subgraph "Processing Components"
CHUNK["Chunking<br/>Engine"]
EXTRACT["Entity<br/>Extraction"]
MERGE["Graph<br/>Merging"]
QUERY["Query<br/>Engine"]
end
RAG --> PIPE
PIPE --> CHUNK
CHUNK --> EXTRACT
EXTRACT --> OPS
OPS --> LLM
OPS --> Embed
OPS --> MERGE
MERGE --> VEC
MERGE --> GRAPH
CHUNK --> KV
PIPE --> STATUS
RAG --> QUERY
QUERY --> OPS
QUERY --> VEC
QUERY --> GRAPH
QUERY --> KV
QUERY --> LLM
style RAG fill:#E6FFE6
style LLM fill:#FFF5E6
style KV fill:#FFE6E6
style VEC fill:#FFE6E6
style GRAPH fill:#FFE6E6
style STATUS fill:#FFE6E6
Component Interaction Patterns
Document Ingestion Pattern: Client submits documents to the API Gateway, which authenticates and validates the request before passing it to the LightRAG core. The core initializes a pipeline instance with a unique track ID, stores the document in KV storage, and updates its status in Status storage. The Pipeline Manager coordinates the chunking, extraction, merging, and indexing stages, maintaining progress information throughout.
Entity Extraction Pattern: The Operations module receives text chunks from the Pipeline Manager and constructs prompts using templates from prompt.py. These prompts are sent to the configured LLM service, which returns structured entity and relationship data. The Operations module parses the response, normalizes entity names, and prepares data for graph merging.
Graph Merging Pattern: When new entities and relationships are extracted, the Merge component compares them against existing graph data. For matching entities (based on name similarity), it consolidates descriptions, merges metadata, and updates source references. For relationships, it deduplicates based on source-target pairs and aggregates weights. The merged data is then stored in Graph storage and vector representations are computed and stored in Vector storage.
Query Execution Pattern: The Query Engine receives a user query and determines the appropriate retrieval strategy based on the query mode. It extracts high-level and low-level keywords using the LLM, retrieves relevant entities and relationships from Graph storage, fetches related chunks from Vector storage, builds a context respecting token budgets, and finally generates a response using the LLM with the assembled context.
Document Indexing Data Flow
This sequence diagram details the complete flow of document processing from ingestion to indexing.
sequenceDiagram
participant Client
participant API as API Server
participant Core as LightRAG Core
participant Pipeline
participant Chunking
participant Extraction
participant Merging
participant KV as KV Storage
participant Vec as Vector Storage
participant Graph as Graph Storage
participant Status as Status Storage
participant LLM as LLM Service
participant Embed as Embedding Service
Client->>API: POST /documents/upload
API->>API: Authenticate & Validate
API->>Core: ainsert(document, file_path)
Note over Core: Initialization Phase
Core->>Core: Generate track_id
Core->>Status: Create doc status (PENDING)
Core->>KV: Store document content
Core->>Pipeline: apipeline_process_enqueue_documents()
Note over Pipeline: Chunking Phase
Pipeline->>Status: Update status (CHUNKING)
Pipeline->>Chunking: chunking_by_token_size()
Chunking->>Chunking: Tokenize & split by overlap
Chunking-->>Pipeline: chunks[]
Pipeline->>KV: Store chunks with metadata
Note over Pipeline: Extraction Phase
Pipeline->>Status: Update status (EXTRACTING)
loop For each chunk
Pipeline->>Extraction: extract_entities(chunk)
Extraction->>LLM: Generate entities/relations
LLM-->>Extraction: Structured output
Extraction->>Extraction: Parse & normalize
Extraction-->>Pipeline: entities[], relations[]
end
Note over Pipeline: Merging Phase
Pipeline->>Status: Update status (MERGING)
Pipeline->>Merging: merge_nodes_and_edges()
par Parallel Entity Processing
loop For each entity
Merging->>Graph: Check if entity exists
alt Entity exists
Merging->>LLM: Summarize descriptions
LLM-->>Merging: Merged description
Merging->>Graph: Update entity
else New entity
Merging->>Graph: Insert entity
end
Merging->>Embed: Generate embedding
Embed-->>Merging: embedding vector
Merging->>Vec: Store entity embedding
end
and Parallel Relationship Processing
loop For each relationship
Merging->>Graph: Check if relation exists
alt Relation exists
Merging->>Graph: Update weight & metadata
else New relation
Merging->>Graph: Insert relation
end
Merging->>Embed: Generate embedding
Embed-->>Merging: embedding vector
Merging->>Vec: Store relation embedding
end
end
Note over Pipeline: Indexing Phase
Pipeline->>Status: Update status (INDEXING)
Pipeline->>Vec: Index chunk embeddings
Pipeline->>Graph: Build graph indices
Pipeline->>KV: Commit cache
Note over Pipeline: Completion
Pipeline->>Status: Update status (COMPLETED)
Pipeline-->>Core: Success
Core-->>API: track_id, status
API-->>Client: 200 OK {track_id}
Note over Client: Client can poll /status/{track_id}
Indexing Phase Details
Document Reception and Validation: The API server receives the document, authenticates the request, validates the file format and size, and generates a unique track ID for monitoring. The document content is immediately stored in KV storage with metadata including file path, upload timestamp, and original filename.
Chunking Strategy: Documents are split into overlapping chunks using a token-based approach. The system tokenizes the entire document using tiktoken, creates chunks of configurable size (default 1200 tokens), adds overlap between consecutive chunks (default 100 tokens) to preserve context, and stores each chunk with position metadata and references to the source document.
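A minimal TypeScript sketch of this chunking strategy follows, assuming a tokenizer exposing encode/decode methods (the Python implementation uses tiktoken; js-tiktoken exposes a comparable API). The function name and Chunk shape are illustrative, not LightRAG's actual API:

```typescript
// Sketch of token-based chunking with overlap, using the defaults above.
interface Tokenizer {
  encode(text: string): number[];
  decode(tokens: number[]): string;
}

interface Chunk {
  content: string;
  tokenCount: number;
  chunkOrderIndex: number; // position within the source document
}

function chunkByTokenSize(
  tokenizer: Tokenizer,
  content: string,
  maxTokens = 1200, // default chunk size from the text above
  overlap = 100,    // tokens shared between consecutive chunks
): Chunk[] {
  const tokens = tokenizer.encode(content);
  const chunks: Chunk[] = [];
  const step = maxTokens - overlap;
  for (let start = 0, index = 0; start < tokens.length; start += step, index++) {
    const slice = tokens.slice(start, start + maxTokens);
    chunks.push({
      content: tokenizer.decode(slice),
      tokenCount: slice.length,
      chunkOrderIndex: index,
    });
    if (start + maxTokens >= tokens.length) break; // last chunk reached
  }
  return chunks;
}
```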
Entity and Relationship Extraction: For each chunk, the system constructs a specialized prompt instructing the LLM to identify typed entities and the relationships between them. The LLM returns structured output in a delimited format (entity|name|type|description for entities; relation|source|target|keywords|description for relationships). The system parses this output, normalizes entity names using case-insensitive matching, and validates the structure before proceeding.
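A sketch of parsing these delimited records, with the field order taken from the description above (the real implementation uses configurable delimiters, so the literals here are assumptions):

```typescript
// Parse pipe-delimited extraction records into typed structures.
interface ExtractedEntity { name: string; type: string; description: string }
interface ExtractedRelation {
  source: string; target: string; keywords: string; description: string;
}

function parseExtractionOutput(raw: string): {
  entities: ExtractedEntity[]; relations: ExtractedRelation[];
} {
  const entities: ExtractedEntity[] = [];
  const relations: ExtractedRelation[] = [];
  for (const line of raw.split("\n")) {
    const fields = line.trim().split("|").map((f) => f.trim());
    if (fields[0] === "entity" && fields.length >= 4) {
      const [, name, type, description] = fields;
      // Normalize case so "Apple" and "APPLE" merge to one entity.
      entities.push({ name: name.toUpperCase(), type, description });
    } else if (fields[0] === "relation" && fields.length >= 5) {
      const [, source, target, keywords, description] = fields;
      relations.push({
        source: source.toUpperCase(),
        target: target.toUpperCase(),
        keywords,
        description,
      });
    }
    // Malformed lines are skipped rather than failing the whole chunk.
  }
  return { entities, relations };
}
```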
Graph Construction and Merging: New entities are compared against existing entities in the graph using fuzzy matching. When duplicates are found, descriptions are merged using either LLM-based summarization (for complex cases) or simple concatenation (for simple cases). Relationships are deduplicated based on source-target pairs, with weights aggregated when duplicates are found. All graph modifications are protected by keyed locks to ensure consistency in concurrent operations.
Vector Embedding Generation: Entity descriptions, relationship descriptions, and chunk content are sent to the embedding service in batches for efficient processing. Embeddings are generated using the configured model (e.g., text-embedding-3-small for OpenAI), and vectors are stored in Vector storage with metadata linking back to their source entities, relationships, or chunks. The system uses semaphores to limit concurrent embedding requests and prevent rate limit errors.
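The batching-plus-semaphore pattern can be sketched in TypeScript with p-limit (the library this document recommends later for semaphore-style limiting). The batch size and the embedBatch signature are assumptions standing in for the configured provider:

```typescript
import pLimit from "p-limit";

// Batch embedding generation under a concurrency cap.
type EmbedBatchFn = (texts: string[]) => Promise<number[][]>;

async function embedAll(
  texts: string[],
  embedBatch: EmbedBatchFn,
  batchSize = 32,    // assumed batch size
  maxConcurrent = 8, // mirrors embedding_func_max_async's default
): Promise<number[][]> {
  const limit = pLimit(maxConcurrent);
  const batches: string[][] = [];
  for (let i = 0; i < texts.length; i += batchSize) {
    batches.push(texts.slice(i, i + batchSize));
  }
  // Each batch acquires a slot from the limiter before calling the service,
  // which keeps concurrent requests below the provider's rate limits.
  const results = await Promise.all(
    batches.map((batch) => limit(() => embedBatch(batch))),
  );
  return results.flat();
}
```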
Status Tracking Throughout: Every stage updates the document status in Status storage, recording the current phase, progress percentage, error messages if any, and timing information. This enables clients to poll for progress and provides diagnostic information for debugging failed indexing operations.
Query Processing Data Flow
This sequence diagram illustrates the retrieval and response generation process for different query modes.
sequenceDiagram
participant Client
participant API as API Server
participant Core as LightRAG Core
participant Query as Query Engine
participant KW as Keyword Extractor
participant Graph as Graph Storage
participant Vec as Vector Storage
participant KV as KV Storage
participant Context as Context Builder
participant LLM as LLM Service
participant Rerank as Rerank Service
Client->>API: POST /query (query, mode, params)
API->>API: Authenticate & Validate
API->>Core: aquery(query, QueryParam)
Core->>Query: Execute query
Note over Query: Keyword Extraction Phase
Query->>KW: Extract keywords
KW->>LLM: Generate high/low level keywords
LLM-->>KW: {hl_keywords[], ll_keywords[]}
KW-->>Query: keywords
alt Mode: local (Entity-centric)
Note over Query: Local Mode - Focus on Entities
Query->>Vec: Query entity vectors (ll_keywords)
Vec-->>Query: top_k entity_ids[]
Query->>Graph: Get entities by IDs
Graph-->>Query: entities[]
Query->>Graph: Get connected relations
Graph-->>Query: relations[]
else Mode: global (Relationship-centric)
Note over Query: Global Mode - Focus on Relations
Query->>Vec: Query relation vectors (hl_keywords)
Vec-->>Query: top_k relation_ids[]
Query->>Graph: Get relations by IDs
Graph-->>Query: relations[]
Query->>Graph: Get connected entities
Graph-->>Query: entities[]
else Mode: hybrid
Note over Query: Hybrid Mode - Combined
par Parallel Retrieval
Query->>Vec: Query entity vectors
Vec-->>Query: entity_ids[]
and
Query->>Vec: Query relation vectors
Vec-->>Query: relation_ids[]
end
Query->>Graph: Get entities and relations
Graph-->>Query: entities[], relations[]
Query->>Query: Merge with round-robin
else Mode: mix
Note over Query: Mix Mode - KG + Chunks
par Parallel Retrieval
Query->>Vec: Query entity vectors
Vec-->>Query: entity_ids[]
Query->>Graph: Get entities
Graph-->>Query: entities[]
and
Query->>Vec: Query chunk vectors
Vec-->>Query: chunk_ids[]
end
else Mode: naive
Note over Query: Naive Mode - Pure Vector
Query->>Vec: Query chunk vectors only
Vec-->>Query: top_k chunk_ids[]
end
Note over Query: Chunk Retrieval Phase
alt Mode != bypass
Query->>Query: Get related chunks from entities/relations
Query->>KV: Get chunks by IDs
KV-->>Query: chunks[]
opt Rerank enabled
Query->>Rerank: Rerank chunks
Rerank-->>Query: reranked_chunks[]
end
end
Note over Query: Context Building Phase
Query->>Context: Build context with token budget
Context->>Context: Allocate tokens (entities/relations/chunks)
Context->>Context: Truncate to fit budget
Context->>Context: Format entities/relations/chunks
Context-->>Query: context_string, references[]
Note over Query: Response Generation Phase
Query->>Query: Build prompt with context
opt Include conversation history
Query->>Query: Add history messages
end
alt Stream enabled
Query->>LLM: Stream generate (prompt, context)
loop Streaming chunks
LLM-->>Query: chunk
Query-->>API: chunk
API-->>Client: SSE chunk
end
else Stream disabled
Query->>LLM: Generate (prompt, context)
LLM-->>Query: response
Query-->>Core: response, references
Core-->>API: {response, references, metadata}
API-->>Client: 200 OK {response}
end
Query Processing Phase Details
Keyword Extraction Phase: The system sends the user query to the LLM with a specialized prompt that asks for high-level keywords (abstract concepts and themes) and low-level keywords (specific entities and terms). The LLM returns structured JSON with both keyword types, which guide the subsequent retrieval strategy. This two-level keyword approach enables the system to retrieve both broad contextual information and specific detailed facts.
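The keyword payload implied by the sequence diagram ({hl_keywords[], ll_keywords[]}) can be typed and defensively parsed as sketched below; the fallback behavior for malformed replies is an assumption:

```typescript
// Two-level keyword payload from the keyword-extraction LLM call.
interface ExtractedKeywords {
  hl_keywords: string[]; // abstract concepts and themes
  ll_keywords: string[]; // specific entities and terms
}

function parseKeywordResponse(raw: string): ExtractedKeywords {
  try {
    const parsed = JSON.parse(raw);
    return {
      hl_keywords: Array.isArray(parsed.hl_keywords) ? parsed.hl_keywords : [],
      ll_keywords: Array.isArray(parsed.ll_keywords) ? parsed.ll_keywords : [],
    };
  } catch {
    // A malformed reply degrades to an empty keyword set rather than failing.
    return { hl_keywords: [], ll_keywords: [] };
  }
}
```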
Mode-Specific Retrieval Strategies:
Local Mode focuses on entity-centric retrieval by querying the vector storage using low-level keywords to find the most relevant entities, retrieving full entity details including descriptions and metadata, and then fetching all relationships connected to those entities. This mode is optimal for questions about specific entities or localized information.
Global Mode emphasizes relationship-centric retrieval by querying vector storage using high-level keywords to find relevant relationships, retrieving relationship details including keywords and descriptions, and then fetching the entities connected by those relationships. This mode excels at questions about connections, patterns, and higher-level concepts.
Hybrid Mode combines both approaches by running local and global retrieval in parallel and then merging results using a round-robin strategy to balance entity and relationship information. This provides comprehensive coverage for complex queries that require both types of information.
Mix Mode integrates knowledge graph retrieval with direct chunk retrieval by querying entity vectors to get graph-based context, simultaneously querying chunk vectors for relevant document sections, and combining both types of results. This mode provides the most complete context by including both structured knowledge and raw document content.
Naive Mode performs pure vector similarity search without using the knowledge graph, simply retrieving the most similar chunks based on embedding distance. This mode is fastest and works well for simple similarity-based retrieval without needing entity or relationship context.
Bypass Mode skips retrieval entirely and sends the query directly to the LLM, useful for general questions that don't require specific document context or when testing the LLM's base knowledge.
Context Building with Token Budgets: The system manages a token budget that caps the total context size and allocates it across components. It assigns tokens to entity descriptions (default 6000 tokens), relationship descriptions (default 8000 tokens), and chunk content (the remaining budget, capped by chunk_top_k). Each component is truncated to fit within its budget using the tokenizer, prioritizing higher-ranked items when truncation is necessary, so the total context never exceeds the max_total_tokens limit (default 30000 tokens).
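A minimal sketch of rank-ordered truncation against a per-component budget, using the defaults quoted above; countTokens stands in for the tokenizer and the allocation arithmetic is illustrative:

```typescript
// Greedy, rank-ordered truncation to a per-component token budget.
function truncateToBudget<T>(
  items: T[],                          // already sorted best-first
  render: (item: T) => string,
  countTokens: (text: string) => number,
  budget: number,
): T[] {
  const kept: T[] = [];
  let used = 0;
  for (const item of items) {
    const cost = countTokens(render(item));
    if (used + cost > budget) break;   // drop lower-ranked items first
    kept.push(item);
    used += cost;
  }
  return kept;
}

// Example allocation following the defaults described in this section:
const ENTITY_BUDGET = 6000;
const RELATION_BUDGET = 8000;
const MAX_TOTAL_TOKENS = 30000;
const CHUNK_BUDGET = MAX_TOTAL_TOKENS - ENTITY_BUDGET - RELATION_BUDGET;
```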
Reranking for Improved Relevance: When enabled, the reranking phase takes retrieved chunks and reranks them using a specialized reranking model (like Cohere rerank or Jina rerank). This cross-encoder approach provides more accurate relevance scoring than pure vector similarity, especially for semantic matching. Chunks below the minimum rerank score threshold are filtered out, and only the top-k chunks after reranking are included in the final context.
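The post-rerank filtering step can be sketched as a threshold plus top-k cut; the provider call itself (Cohere, Jina, etc.) is omitted, and the default values here are assumptions:

```typescript
// Apply a minimum rerank score threshold and keep the top-k survivors.
interface RerankedChunk { id: string; content: string; score: number }

function filterReranked(
  chunks: RerankedChunk[],
  minScore = 0.0, // assumed default threshold
  topK = 10,      // assumed chunk_top_k
): RerankedChunk[] {
  return chunks
    .filter((c) => c.score >= minScore)       // drop low-relevance chunks
    .sort((a, b) => b.score - a.score)        // best cross-encoder score first
    .slice(0, topK);
}
```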
Response Generation with Streaming: For streaming responses, the system establishes a connection to the LLM with stream=True, receives response tokens incrementally, and immediately forwards them to the client via Server-Sent Events (SSE). This provides real-time feedback to users and reduces perceived latency. For non-streaming responses, the system waits for the complete LLM response before returning it to the client along with metadata about entities, relationships, and chunks used in the context.
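In a Node.js port, forwarding LLM tokens as SSE frames might look like the framework-agnostic sketch below; llmStream is a placeholder for the provider's streaming iterator, and the [DONE] sentinel is an assumption borrowed from common SSE conventions:

```typescript
// Forward incremental LLM output to the client as Server-Sent Events.
async function* toSseFrames(
  llmStream: AsyncIterable<string>,
): AsyncGenerator<string> {
  for await (const chunk of llmStream) {
    // Each SSE event is a "data:" line terminated by a blank line.
    yield `data: ${JSON.stringify({ chunk })}\n\n`;
  }
  yield "data: [DONE]\n\n"; // sentinel so the client knows the stream ended
}
```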
Storage Layer Architecture
The storage layer implements a plugin architecture with abstract base classes defining the contract for each storage type.
graph TB
subgraph "Abstract Interfaces"
BASE["StorageNameSpace<br/>(Base Class)"]
KV_BASE["BaseKVStorage"]
VEC_BASE["BaseVectorStorage"]
GRAPH_BASE["BaseGraphStorage"]
STATUS_BASE["DocStatusStorage"]
end
subgraph "KV Storage Implementations"
JSON_KV["JsonKVStorage<br/>(File-based)"]
PG_KV["PGKVStorage<br/>(PostgreSQL)"]
MONGO_KV["MongoKVStorage<br/>(MongoDB)"]
REDIS_KV["RedisKVStorage<br/>(Redis)"]
end
subgraph "Vector Storage Implementations"
NANO["NanoVectorDBStorage<br/>(In-memory)"]
FAISS["FaissVectorDBStorage<br/>(FAISS)"]
PG_VEC["PGVectorStorage<br/>(pgvector)"]
MILVUS["MilvusVectorStorage<br/>(Milvus)"]
QDRANT["QdrantVectorStorage<br/>(Qdrant)"]
end
subgraph "Graph Storage Implementations"
NX["NetworkXStorage<br/>(NetworkX)"]
NEO4J["Neo4jStorage<br/>(Neo4j)"]
MEMGRAPH["MemgraphStorage<br/>(Memgraph)"]
PG_GRAPH["PGGraphStorage<br/>(PostgreSQL)"]
end
subgraph "Doc Status Implementations"
JSON_STATUS["JsonDocStatusStorage<br/>(File-based)"]
PG_STATUS["PGDocStatusStorage<br/>(PostgreSQL)"]
MONGO_STATUS["MongoDocStatusStorage<br/>(MongoDB)"]
end
BASE --> KV_BASE
BASE --> VEC_BASE
BASE --> GRAPH_BASE
BASE --> STATUS_BASE
KV_BASE --> JSON_KV
KV_BASE --> PG_KV
KV_BASE --> MONGO_KV
KV_BASE --> REDIS_KV
VEC_BASE --> NANO
VEC_BASE --> FAISS
VEC_BASE --> PG_VEC
VEC_BASE --> MILVUS
VEC_BASE --> QDRANT
GRAPH_BASE --> NX
GRAPH_BASE --> NEO4J
GRAPH_BASE --> MEMGRAPH
GRAPH_BASE --> PG_GRAPH
STATUS_BASE --> JSON_STATUS
STATUS_BASE --> PG_STATUS
STATUS_BASE --> MONGO_STATUS
style BASE fill:#E6FFE6
style JSON_KV fill:#E6F3FF
style NANO fill:#FFE6E6
style NX fill:#FFF5E6
style JSON_STATUS fill:#FFE6F5
Storage Interface Contracts
BaseKVStorage Interface: Key-value storage manages cached data, text chunks, and full documents. Core methods include get_by_id(id) for retrieving a single value, get_by_ids(ids) for batch retrieval, filter_keys(keys) to check which keys don't exist, upsert(data) for inserting or updating entries, delete(ids) for removing entries, and index_done_callback() for persisting changes to disk. This interface supports both in-memory implementations with persistence and direct database implementations.
BaseVectorStorage Interface: Vector storage handles embeddings for entities, relationships, and chunks. Core methods include query(query, top_k, query_embedding) for similarity search, upsert(data) for storing vectors with metadata, delete(ids) for removing vectors, delete_entity(entity_name) for removing entity-related vectors, delete_entity_relation(entity_name) for removing relationship vectors, get_by_id(id) and get_by_ids(ids) for retrieving full vector data, and get_vectors_by_ids(ids) for efficient vector-only retrieval. All implementations must support cosine similarity search and metadata filtering.
BaseGraphStorage Interface: Graph storage maintains the entity-relationship graph structure. Core methods include has_node(node_id) and has_edge(source, target) for existence checks, node_degree(node_id) and edge_degree(src, tgt) for connectivity metrics, get_node(node_id) and upsert_node(node_id, data) for node operations, upsert_edge(source, target, data) for relationship operations, get_knowledge_graph(node_label, max_depth, max_nodes) for graph traversal and export, and delete operations for nodes and edges. The interface treats all relationships as undirected unless explicitly specified.
DocStatusStorage Interface: Document status storage tracks processing pipeline state for each document. Core methods include upsert(status) for updating document status, get_by_id(doc_id) for retrieving status, get_by_ids(doc_ids) for batch retrieval, filter_ids(doc_ids) for checking existence, delete(doc_ids) for cleanup, get_by_status(status) for finding documents in a specific state, and count_by_status() for pipeline metrics. This enables comprehensive monitoring and recovery of document processing operations.
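A condensed TypeScript rendering of these four contracts is sketched below. Method names follow the Python names given above (camelCased); the exact signatures, the VectorHit shape, and the FAILED status are assumptions:

```typescript
// Condensed rendering of the storage contracts described above.
interface StorageNameSpace {
  indexDoneCallback(): Promise<void>; // persist in-memory changes
}

interface BaseKVStorage<V> extends StorageNameSpace {
  getById(id: string): Promise<V | null>;
  getByIds(ids: string[]): Promise<(V | null)[]>;
  filterKeys(keys: string[]): Promise<string[]>; // keys that do NOT exist yet
  upsert(data: Record<string, V>): Promise<void>;
  delete(ids: string[]): Promise<void>;
}

interface VectorHit { id: string; distance: number; metadata: Record<string, unknown> }

interface BaseVectorStorage extends StorageNameSpace {
  query(query: string, topK: number, queryEmbedding?: number[]): Promise<VectorHit[]>;
  upsert(data: Record<string, { content: string; [k: string]: unknown }>): Promise<void>;
  delete(ids: string[]): Promise<void>;
  deleteEntity(entityName: string): Promise<void>;
  deleteEntityRelation(entityName: string): Promise<void>;
}

interface BaseGraphStorage extends StorageNameSpace {
  hasNode(nodeId: string): Promise<boolean>;
  hasEdge(source: string, target: string): Promise<boolean>;
  nodeDegree(nodeId: string): Promise<number>;
  upsertNode(nodeId: string, data: Record<string, unknown>): Promise<void>;
  upsertEdge(source: string, target: string, data: Record<string, unknown>): Promise<void>;
}

// Statuses taken from the indexing sequence diagram; FAILED is assumed.
type DocStatus =
  | "PENDING" | "CHUNKING" | "EXTRACTING"
  | "MERGING" | "INDEXING" | "COMPLETED" | "FAILED";

interface DocStatusStorage extends StorageNameSpace {
  upsert(status: Record<string, { status: DocStatus; [k: string]: unknown }>): Promise<void>;
  getByStatus(status: DocStatus): Promise<Record<string, unknown>[]>;
  countByStatus(): Promise<Record<DocStatus, number>>;
}
```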
Storage Implementation Patterns
File-Based Storage (JSON): Simple implementations store data in JSON files with in-memory caching for performance. All modifications are held in memory until index_done_callback() triggers a write to disk. These implementations are suitable for development, small deployments, and single-process scenarios. They provide atomic writes using temporary files and rename operations, handle concurrent access through file locking, and support workspace isolation through directory structure.
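The write-temp-then-rename pattern described above is straightforward to reproduce with Node's fs/promises, as in this sketch (the temp-file naming scheme is an assumption):

```typescript
import { rename, writeFile } from "node:fs/promises";

// Atomic JSON persistence: write to a temp file, then rename over the target.
// On POSIX filesystems rename() is atomic, so readers never observe a
// half-written file.
async function atomicWriteJson(path: string, data: unknown): Promise<void> {
  const tmpPath = `${path}.tmp-${process.pid}-${Date.now()}`;
  await writeFile(tmpPath, JSON.stringify(data, null, 2), "utf8");
  await rename(tmpPath, path); // atomically replace the previous snapshot
}
```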
PostgreSQL Storage: Comprehensive implementations that leverage PostgreSQL's capabilities including JSON columns for flexible metadata, pgvector extension for vector similarity search, advisory locks for distributed coordination, connection pooling for performance, and transaction support for consistency. PostgreSQL implementations can handle all four storage types in a single database, simplifying deployment and backup. They support multi-tenant deployments through schema-based isolation and provide excellent performance for mixed workloads.
Specialized Vector Databases: Dedicated vector storage implementations like FAISS, Milvus, and Qdrant provide optimized vector similarity search with features like approximate nearest neighbor (ANN) search, GPU acceleration for large-scale similarity search, advanced indexing strategies (IVF, HNSW), and high-performance batch operations. These are recommended for deployments with large document sets (>1M chunks) or high query throughput requirements.
Graph Databases (Neo4j/Memgraph): Specialized graph implementations optimize graph traversal and pattern matching with native graph storage and indexing, Cypher query language for complex graph queries, visualization capabilities for knowledge graph exploration, and optimized algorithms for shortest path, centrality, and community detection. These are ideal for use cases requiring complex graph analytics and when graph visualization is a primary feature.
Concurrency and State Management
LightRAG implements sophisticated concurrency control to handle parallel document processing and query execution.
graph TB
subgraph "Concurrency Control"
SEM1["Semaphore<br/>LLM Calls<br/>(max_async)"]
SEM2["Semaphore<br/>Embeddings<br/>(embedding_func_max_async)"]
SEM3["Semaphore<br/>Graph Merging<br/>(graph_max_async)"]
LOCK1["Keyed Locks<br/>Entity Processing"]
LOCK2["Keyed Locks<br/>Relation Processing"]
LOCK3["Pipeline Status Lock"]
end
subgraph "State Management"
GLOBAL["Global Config<br/>workspace, paths, settings"]
PIPELINE["Pipeline Status<br/>current job, progress"]
NAMESPACE["Namespace Data<br/>storage instances, locks"]
end
subgraph "Task Coordination"
QUEUE["Task Queue<br/>Document Processing"]
PRIORITY["Priority Limiter<br/>Async Function Calls"]
TRACK["Track ID System<br/>Monitoring & Logging"]
end
SEM1 --> PRIORITY
SEM2 --> PRIORITY
SEM3 --> PRIORITY
LOCK1 --> NAMESPACE
LOCK2 --> NAMESPACE
LOCK3 --> PIPELINE
QUEUE --> TRACK
PRIORITY --> TRACK
GLOBAL --> NAMESPACE
PIPELINE --> TRACK
style SEM1 fill:#FFE6E6
style GLOBAL fill:#E6FFE6
style QUEUE fill:#E6F3FF
Concurrency Patterns
Semaphore-Based Rate Limiting: The system uses asyncio semaphores to limit concurrent operations and prevent overwhelming external services or exhausting resources. Different semaphores control different types of operations: LLM calls are limited by max_async (default 4), embedding function calls by embedding_func_max_async (default 8), and graph merging operations by graph_max_async (calculated as llm_model_max_async * 2). These semaphores ensure respectful API usage and prevent rate limit errors.
Keyed Locks for Data Consistency: When processing entities and relationships concurrently, the system uses keyed locks to ensure that multiple processes don't modify the same entity or relationship simultaneously. Each entity or relationship gets a unique lock based on its identifier, preventing race conditions during graph merging while still allowing parallel processing of different entities. This pattern enables high concurrency without sacrificing data consistency.
Pipeline Status Tracking: A shared pipeline status object tracks the current state of document processing including the active job name, number of documents being processed, current batch number, latest status message, and history of messages for debugging. This status is protected by an async lock and can be queried by clients to monitor progress. The status persists across the entire pipeline and provides visibility into long-running operations.
Workspace Isolation: The workspace concept provides multi-tenant isolation by prefixing all storage namespaces with a workspace identifier. Different workspaces maintain completely separate data including separate storage instances, independent configuration, isolated locks and semaphores, and separate pipeline status. This enables running multiple LightRAG instances in the same infrastructure without interference.
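In code, the prefixing scheme can be as simple as the sketch below; the separator character is an assumption:

```typescript
// Prefix every storage namespace with a workspace identifier so tenants
// never collide; an empty workspace falls back to the bare namespace.
function namespaced(workspace: string, namespace: string): string {
  return workspace ? `${workspace}:${namespace}` : namespace;
}

// e.g. namespaced("tenant_a", "entities") -> "tenant_a:entities"
```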
TypeScript Migration Considerations for Concurrency
Semaphore Implementation: Node.js doesn't have built-in semaphores, but the pattern can be implemented using the p-limit library, which provides similar functionality with a cleaner API. Example: const limiter = pLimit(4); await limiter(() => callLLM()).
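Expanding that one-liner into per-operation limiters, mirroring the semaphore defaults quoted earlier in this section (callLLM is a placeholder for the provider call):

```typescript
import pLimit from "p-limit";

// One limiter per operation class, matching the Python semaphores.
const llmLimit = pLimit(4);       // max_async
const embeddingLimit = pLimit(8); // embedding_func_max_async
const mergeLimit = pLimit(4 * 2); // graph_max_async = llm_model_max_async * 2

async function callLlmLimited(
  prompt: string,
  callLLM: (p: string) => Promise<string>,
): Promise<string> {
  // Queues when 4 calls are already in flight, like an asyncio semaphore.
  return llmLimit(() => callLLM(prompt));
}
```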
Keyed Locks: For single-process deployments, a Map<string, Promise> can implement keyed locks. For multi-process deployments, consider using Redis with the Redlock algorithm or a dedicated lock service. The key insight is ensuring that operations on the same entity/relationship are serialized while different entities can be processed in parallel.
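A minimal single-process sketch of this Map-based approach, chaining promises per key so same-key operations serialize while different keys run in parallel:

```typescript
// Keyed locks built on a Map of promise chains.
class KeyedLock {
  private tails = new Map<string, Promise<unknown>>();

  async withLock<T>(key: string, fn: () => Promise<T>): Promise<T> {
    const prev = this.tails.get(key) ?? Promise.resolve();
    // Chain this operation after whatever currently holds the key;
    // a failure upstream must not block later acquirers.
    const next = prev.catch(() => undefined).then(fn);
    this.tails.set(key, next);
    try {
      return await next;
    } finally {
      // Clean up only if we are still the tail, so the map doesn't grow forever.
      if (this.tails.get(key) === next) this.tails.delete(key);
    }
  }
}

// Usage: serialize merges for one entity while others proceed concurrently.
// await locks.withLock(entityName, () => mergeEntity(entityName, updates));
```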
Shared State Management: Python's global dictionaries need to be replaced with class-based state management in TypeScript. For multi-process deployments, shared state should be externalized to Redis or a similar store. For single-process deployments, singleton classes can manage state with proper TypeScript visibility controls.
Pipeline Status Updates: Real-time status updates can be implemented using EventEmitter in Node.js for in-process communication, or Redis Pub/Sub for multi-process scenarios. WebSocket connections can provide real-time updates to clients without polling.
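An in-process sketch of the EventEmitter approach; the event shape reuses the pipeline phases from the indexing diagram, and the subscribe/unsubscribe API is illustrative:

```typescript
import { EventEmitter } from "node:events";

// In-process pipeline status broadcasting. A WebSocket or SSE handler can
// subscribe and forward events to clients; Redis Pub/Sub would replace this
// bus in multi-process deployments.
interface PipelineStatusEvent {
  trackId: string;
  phase: "PENDING" | "CHUNKING" | "EXTRACTING" | "MERGING" | "INDEXING" | "COMPLETED";
  message: string;
}

class PipelineStatusBus extends EventEmitter {
  publish(event: PipelineStatusEvent): void {
    this.emit("status", event);
  }
  subscribe(listener: (event: PipelineStatusEvent) => void): () => void {
    this.on("status", listener);
    return () => this.off("status", listener); // unsubscribe handle
  }
}
```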
Summary
This architecture documentation provides a comprehensive view of LightRAG's design, from high-level layer organization to detailed component interactions and concurrency patterns. The system's modular design, with clear interfaces and abstractions, makes it well-suited for migration to TypeScript. The key architectural principles—layered separation of concerns, plugin-based storage abstraction, async-first concurrency, and comprehensive state tracking—translate well to TypeScript and Node.js idioms.
The subsequent documentation sections build on this architectural foundation, providing detailed specifications for data models, storage implementations, LLM integrations, and API contracts. Together, these documents form a complete blueprint for implementing a production-ready TypeScript version of LightRAG.