Merge branch 'main' into duplicate_dev

This commit is contained in:
FloretKu 2025-10-27 15:58:36 +08:00 committed by GitHub
commit 42f4eeb39b
No known key found for this signature in database
GPG key ID: B5690EEEBB952194
225 changed files with 14455 additions and 7108 deletions

207
.clinerules/01-basic.md Normal file

@ -0,0 +1,207 @@
# LightRAG Project Intelligence (.clinerules)
## Project Overview
LightRAG is a mature, production-ready Retrieval-Augmented Generation (RAG) system with comprehensive knowledge graph capabilities. The system has evolved from experimental to production-ready status with extensive functionality across all major components.
## Current System State (August 15, 2025)
- **Status**: Production Ready - Stable and Mature
- **Configuration**: Gemini 2.5 Flash + BAAI/bge-m3 embeddings via custom endpoints
- **Storage**: Default in-memory with file persistence (JsonKVStorage, NetworkXStorage, NanoVectorDBStorage)
- **Language**: Chinese for summaries
- **Workspace**: `space1` for data isolation
- **Authentication**: JWT-based with admin/user accounts
## Critical Implementation Patterns
### 1. Embedding Format Compatibility (CRITICAL)
**Pattern**: Always handle both base64 and raw array embedding formats
**Location**: `lightrag/llm/openai.py` - `openai_embed` function
**Issue**: Custom OpenAI-compatible endpoints return embeddings as raw arrays, not base64 strings
**Solution**:
```python
np.array(dp.embedding, dtype=np.float32) if isinstance(dp.embedding, list)
else np.frombuffer(base64.b64decode(dp.embedding), dtype=np.float32)
```
**Impact**: Document processing fails completely without this dual format support
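A minimal sketch of the dual-format handling described above; the response shape and helper name are illustrative rather than the exact `openai_embed` signature:
```python
import base64

import numpy as np


def decode_embeddings(response_data):
    """Convert each embedding to float32, accepting raw lists or base64 strings."""
    vectors = []
    for dp in response_data:
        if isinstance(dp.embedding, list):
            # Custom OpenAI-compatible endpoints often return raw float arrays
            vectors.append(np.array(dp.embedding, dtype=np.float32))
        else:
            # Base64-encoded float32 buffers (the format originally assumed)
            vectors.append(np.frombuffer(base64.b64decode(dp.embedding), dtype=np.float32))
    return np.stack(vectors)
```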
### 2. Async Pattern Consistency (CRITICAL)
**Pattern**: Always await coroutines before calling methods on the result
**Common Error**: `coroutine.method()` instead of `(await coroutine).method()`
**Locations**: MongoDB implementations, Neo4j operations
**Example**: `await self._data.list_indexes()` then `await cursor.to_list()`
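A hedged before/after illustration of this rule, using a hypothetical async collection object (real driver APIs differ in the details):
```python
async def fetch_index_names(collection):
    # Wrong: calling a method on the coroutine object itself.
    #   names = collection.list_indexes().to_list(None)  # AttributeError / never awaited

    # Right: await the coroutine, then call methods on the awaited result.
    cursor = await collection.list_indexes()
    indexes = await cursor.to_list(length=None)
    return [idx.get("name") for idx in indexes]
```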
### 3. Storage Layer Data Compatibility (CRITICAL)
**Pattern**: Always filter deprecated/incompatible fields during deserialization
**Common Fields to Remove**: `content`, `_id` (MongoDB), database-specific fields
**Implementation**: `data.pop('field_name', None)` before creating dataclass objects
**Locations**: All storage implementations (JSON, Redis, MongoDB, PostgreSQL)
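A small sketch of this filtering step, with a hypothetical dataclass standing in for the real storage models:
```python
from dataclasses import dataclass


@dataclass
class DocStatusRecord:
    doc_id: str
    status: str
    updated_at: str


def from_storage(raw: dict) -> DocStatusRecord:
    data = dict(raw)
    # Drop fields that older versions or specific backends persist
    # but the current dataclass no longer declares.
    for deprecated in ("content", "_id"):
        data.pop(deprecated, None)
    return DocStatusRecord(**data)


record = from_storage(
    {"doc_id": "d1", "status": "processed", "updated_at": "2025-08-15", "content": "legacy"}
)
```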
### 4. Lock Key Generation (CRITICAL)
**Pattern**: Always sort relationship pairs for consistent lock keys
**Implementation**: `sorted_key_parts = sorted([src, tgt])` then `f"{sorted_key_parts[0]}-{sorted_key_parts[1]}"`
**Impact**: Prevents deadlocks in concurrent relationship processing
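A compact sketch of the key construction (the exact key format in the codebase may differ):
```python
def relation_lock_key(src: str, tgt: str) -> str:
    # Sort the endpoints so (A, B) and (B, A) map to the same lock key,
    # preventing two workers from acquiring the pair's locks in opposite orders.
    sorted_key_parts = sorted([src, tgt])
    return f"{sorted_key_parts[0]}-{sorted_key_parts[1]}"


assert relation_lock_key("Bob", "Alice") == relation_lock_key("Alice", "Bob")
```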
### 5. Event Loop Management (CRITICAL)
**Pattern**: Handle event loop mismatches during shutdown gracefully
**Implementation**: Timeout + specific RuntimeError handling for "attached to a different loop"
**Location**: Neo4j storage finalization
**Impact**: Prevents application shutdown failures
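A hedged sketch of the shutdown guard, assuming a driver object with an async `close()`; the actual Neo4j finalization code is more involved:
```python
import asyncio


async def close_driver_safely(driver, timeout: float = 5.0) -> None:
    try:
        await asyncio.wait_for(driver.close(), timeout=timeout)
    except asyncio.TimeoutError:
        pass  # do not let a stuck connection block application shutdown
    except RuntimeError as exc:
        # Swallow only the known event-loop mismatch raised when the driver
        # was created on a different loop than the one shutting down.
        if "attached to a different loop" not in str(exc):
            raise
```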
## Architecture Patterns
### 1. Dependency Injection
**Pattern**: Pass configuration through object constructors, not direct imports
**Example**: OllamaAPI receives configuration through LightRAG object
**Benefit**: Better testability and modularity
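A toy illustration of the injection style (class and attribute names are simplified):
```python
class OllamaAPI:
    def __init__(self, rag):
        # Configuration arrives through the injected LightRAG instance rather
        # than module-level imports, so the class is easy to construct in tests.
        self.rag = rag
        self.model_name = rag.llm_model_name


class StubRAG:
    llm_model_name = "test-model"


api = OllamaAPI(StubRAG())
```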
### 2. Memory Bank Documentation
**Pattern**: Maintain comprehensive memory bank for development continuity
**Structure**: Core files (projectbrief.md, activeContext.md, progress.md, etc.)
**Purpose**: Essential for context preservation across development sessions
### 3. Configuration Management
**Pattern**: Centralize defaults in constants.py, use environment variables for runtime config
**Implementation**: Default values in constants, override via .env file
**Benefit**: Consistent configuration across components
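A minimal sketch of the constants-plus-environment pattern; the variable names are examples, not the project's actual constants:
```python
import os

# constants.py: centralised defaults
DEFAULT_TOP_K = 40
DEFAULT_SUMMARY_LANGUAGE = "English"

# runtime: values from the environment (e.g. loaded from .env) override the defaults
TOP_K = int(os.getenv("TOP_K", DEFAULT_TOP_K))
SUMMARY_LANGUAGE = os.getenv("SUMMARY_LANGUAGE", DEFAULT_SUMMARY_LANGUAGE)
```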
## Development Workflow Patterns
### 1. Frontend Development (CRITICAL)
**Package Manager**: **ALWAYS USE BUN** - Never use npm or yarn unless Bun is unavailable
**Commands**:
- `bun install` - Install dependencies
- `bun run dev` - Start development server
- `bun run build` - Build for production
- `bun run lint` - Run linting
- `bun test` - Run tests
- `bun run preview` - Preview production build
**Pattern**: All frontend operations must use Bun commands
**Fallback**: Only use npm/yarn if Bun installation fails
**Testing**: Use `bun test` for all frontend testing
### 2. Bug Fix Approach
1. **Identify root cause** - Don't just fix symptoms
2. **Implement robust solution** - Handle edge cases and format variations
3. **Maintain backward compatibility** - Preserve existing functionality
4. **Add comprehensive error handling** - Graceful degradation
5. **Document the fix** - Update memory bank with technical details
### 3. Feature Implementation
1. **Follow existing patterns** - Maintain architectural consistency
2. **Use dependency injection** - Avoid direct imports between modules
3. **Implement comprehensive error handling** - Handle all failure modes
4. **Add proper logging** - Debug and warning messages
5. **Update documentation** - Memory bank and code comments
6. **Comment Language** - Use English for comments and documentation
### 4. Performance Optimization
1. **Profile before optimizing** - Identify actual bottlenecks
2. **Maintain algorithmic correctness** - Don't sacrifice functionality for speed
3. **Use appropriate data structures** - Match structure to access patterns
4. **Implement caching strategically** - Cache expensive operations
5. **Monitor memory usage** - Prevent memory leaks
## Technology Stack Intelligence
### 1. LLM Integration
- **Primary**: Gemini 2.5 Flash via custom endpoint
- **Embedding**: BAAI/bge-m3 via custom endpoint
- **Reranking**: BAAI/bge-reranker-v2-m3
- **Pattern**: Always handle multiple provider formats
### 2. Storage Backends
- **Default**: In-memory with file persistence
- **Production Options**: PostgreSQL, MongoDB, Redis, Neo4j
- **Pattern**: Abstract storage interface with multiple implementations
### 3. API Architecture
- **Framework**: FastAPI with Gunicorn for production
- **Authentication**: JWT-based with role support
- **Compatibility**: Ollama-compatible endpoints for easy integration
### 4. Frontend
- **Framework**: React with TypeScript
- **Package Manager**: **BUN (REQUIRED)** - Always use Bun for all frontend operations
- **Build Tool**: Vite with Bun runtime
- **Visualization**: Sigma.js for graph rendering
- **State Management**: React hooks with context
- **Internationalization**: i18next for multi-language support
## Common Pitfalls and Solutions
### 1. Embedding Format Issues
**Pitfall**: Assuming all endpoints return base64-encoded embeddings
**Solution**: Always check format and handle both base64 and raw arrays
### 2. Async/Await Patterns
**Pitfall**: Calling methods on coroutines instead of awaited results
**Solution**: Always await coroutines before accessing their methods
### 3. Data Model Evolution
**Pitfall**: Breaking changes when removing fields from dataclasses
**Solution**: Filter deprecated fields during deserialization, don't break storage
### 4. Concurrency Issues
**Pitfall**: Inconsistent lock key generation causing deadlocks
**Solution**: Always sort keys for deterministic lock ordering
### 5. Event Loop Management
**Pitfall**: Event loop mismatches during shutdown
**Solution**: Implement timeout and specific error handling for loop issues
## Performance Considerations
### 1. Query Context Building
- **Algorithm**: Linear gradient weighted polling for fair resource allocation
- **Optimization**: Round-robin merging to eliminate mode bias
- **Pattern**: Smart chunk selection based on cross-entity occurrence
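The round-robin merging idea can be shown with a small interleaving helper; this is a simplification of the actual context-building logic:
```python
from itertools import zip_longest


def round_robin_merge(*ranked_lists):
    """Interleave ranked lists so no single retrieval mode dominates the context."""
    merged, seen = [], set()
    for group in zip_longest(*ranked_lists):
        for item in group:
            if item is not None and item not in seen:
                seen.add(item)
                merged.append(item)
    return merged


# e.g. chunk ids ranked by different retrieval modes
print(round_robin_merge(["c1", "c2"], ["c3", "c1"], ["c4"]))  # ['c1', 'c3', 'c4', 'c2']
```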
### 2. Graph Operations
- **Optimization**: Batch operations where possible
- **Pattern**: Use appropriate indexing for large datasets
- **Consideration**: Memory usage with large graphs
### 3. LLM Request Management
- **Pattern**: Priority-based queue for request ordering
- **Optimization**: Connection pooling and retry mechanisms
- **Consideration**: Rate limiting and cost management
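As a rough sketch, priority ordering can be modelled with `asyncio.PriorityQueue`; the priorities and worker shape here are illustrative only:
```python
import asyncio


async def llm_worker(queue: asyncio.PriorityQueue) -> None:
    while True:
        priority, prompt = await queue.get()
        # send `prompt` to the LLM here; lower numbers are served first
        queue.task_done()


async def enqueue_requests(queue: asyncio.PriorityQueue) -> None:
    await queue.put((0, "user-facing query"))          # highest priority
    await queue.put((5, "background entity summary"))  # deferred work
```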
## Security Patterns
### 1. Authentication
- **Implementation**: JWT tokens with role-based access
- **Pattern**: Stateless authentication with configurable expiration
- **Security**: Proper token validation and refresh mechanisms
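A compact sketch of stateless issuance and validation with PyJWT; the claim names and secret handling are assumptions, not the project's exact auth module:
```python
import datetime

import jwt  # PyJWT

SECRET = "change-me"  # in practice, load from environment/config


def issue_token(username: str, role: str, ttl_minutes: int = 60) -> str:
    payload = {
        "sub": username,
        "role": role,
        "exp": datetime.datetime.now(datetime.timezone.utc)
        + datetime.timedelta(minutes=ttl_minutes),
    }
    return jwt.encode(payload, SECRET, algorithm="HS256")


def validate_token(token: str) -> dict:
    # Raises jwt.ExpiredSignatureError / jwt.InvalidTokenError on bad tokens.
    return jwt.decode(token, SECRET, algorithms=["HS256"])
```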
### 2. API Security
- **Pattern**: Input validation and sanitization
- **Implementation**: FastAPI dependency injection for auth
- **Consideration**: Rate limiting and abuse prevention
## Maintenance Guidelines
### 1. Memory Bank Updates
- **Trigger**: After significant changes or bug fixes
- **Pattern**: Update activeContext.md and progress.md
- **Purpose**: Maintain development continuity
### 2. Configuration Management
- **Pattern**: Environment-based configuration with sensible defaults
- **Implementation**: .env files with example templates
- **Consideration**: Security for production deployments
### 3. Error Handling
- **Pattern**: Comprehensive logging with appropriate levels
- **Implementation**: Graceful degradation where possible
- **Consideration**: User-friendly error messages
## Project Evolution Notes
The project has evolved from experimental to production-ready status. Key milestones:
- **Early 2025**: Basic RAG implementation
- **Mid 2025**: Multiple storage backends and LLM providers
- **July 2025**: Major query optimization and algorithm improvements
- **August 2025**: Production-ready stable state
The system now supports enterprise-level deployments with comprehensive functionality across all components.


@ -28,6 +28,12 @@ Makefile
# Exclude other projects
/tests
/scripts
/data
/dickens
/reproduce
/output_complete
/rag_storage
/inputs
# Python version manager file
.python-version

84
.github/workflows/docker-build-lite.yml vendored Normal file

@ -0,0 +1,84 @@
name: Build Lite Docker Image
on:
workflow_dispatch:
inputs:
_notes_:
description: '⚠️ Create lite Docker images only after non-trivial version releases.'
required: false
type: boolean
default: false
permissions:
contents: read
packages: write
jobs:
build-and-push-lite:
runs-on: ubuntu-latest
steps:
- name: Checkout code
uses: actions/checkout@v4
with:
fetch-depth: 0
- name: Get latest tag
id: get_tag
run: |
LATEST_TAG=$(git describe --tags --abbrev=0 2>/dev/null || echo "")
if [ -z "$LATEST_TAG" ]; then
LATEST_TAG="sha-$(git rev-parse --short HEAD)"
echo "No tags found, using commit SHA: $LATEST_TAG"
else
echo "Latest tag found: $LATEST_TAG"
fi
echo "tag=$LATEST_TAG" >> $GITHUB_OUTPUT
- name: Prepare lite tag
id: lite_tag
run: |
LITE_TAG="${{ steps.get_tag.outputs.tag }}-lite"
echo "Lite image tag: $LITE_TAG"
echo "lite_tag=$LITE_TAG" >> $GITHUB_OUTPUT
- name: Update version in __init__.py
run: |
sed -i "s/__version__ = \".*\"/__version__ = \"${{ steps.get_tag.outputs.tag }}\"/" lightrag/__init__.py
cat lightrag/__init__.py | grep __version__
- name: Set up Docker Buildx
uses: docker/setup-buildx-action@v3
- name: Login to GitHub Container Registry
uses: docker/login-action@v3
with:
registry: ghcr.io
username: ${{ github.actor }}
password: ${{ secrets.GITHUB_TOKEN }}
- name: Extract metadata for Docker
id: meta
uses: docker/metadata-action@v5
with:
images: ghcr.io/${{ github.repository }}
tags: |
type=raw,value=${{ steps.lite_tag.outputs.lite_tag }}
type=raw,value=lite
- name: Build and push lite Docker image
uses: docker/build-push-action@v5
with:
context: .
file: ./Dockerfile.lite
platforms: linux/amd64,linux/arm64
push: true
tags: ${{ steps.meta.outputs.tags }}
labels: ${{ steps.meta.outputs.labels }}
cache-from: type=gha
cache-to: type=gha,mode=min
- name: Output image details
run: |
echo "Lite Docker image built and pushed successfully!"
echo "Image tag: ghcr.io/${{ github.repository }}:${{ steps.lite_tag.outputs.lite_tag }}"
echo "Base Git tag used: ${{ steps.get_tag.outputs.tag }}"


@ -2,6 +2,12 @@ name: Build Test Docker Image manually
on:
workflow_dispatch:
inputs:
_notes_:
description: '⚠️ Please create a new git tag before building the docker image.'
required: false
type: boolean
default: false
permissions:
contents: read
@ -58,6 +64,7 @@ jobs:
uses: docker/build-push-action@v5
with:
context: .
file: ./Dockerfile
platforms: linux/amd64,linux/arm64
push: true
tags: ${{ steps.meta.outputs.tags }}


@ -35,6 +35,18 @@ jobs:
echo "Found tag: $TAG"
echo "tag=$TAG" >> $GITHUB_OUTPUT
- name: Check if pre-release
id: check_prerelease
run: |
TAG="${{ steps.get_tag.outputs.tag }}"
if [[ "$TAG" == *"rc"* ]] || [[ "$TAG" == *"dev"* ]]; then
echo "is_prerelease=true" >> $GITHUB_OUTPUT
echo "This is a pre-release version: $TAG"
else
echo "is_prerelease=false" >> $GITHUB_OUTPUT
echo "This is a stable release: $TAG"
fi
- name: Update version in __init__.py
run: |
sed -i "s/__version__ = \".*\"/__version__ = \"${{ steps.get_tag.outputs.tag }}\"/" lightrag/__init__.py
@ -48,12 +60,13 @@ jobs:
images: ghcr.io/${{ github.repository }}
tags: |
type=raw,value=${{ steps.get_tag.outputs.tag }}
type=raw,value=latest
type=raw,value=latest,enable=${{ steps.check_prerelease.outputs.is_prerelease == 'false' }}
- name: Build and push Docker image
uses: docker/build-push-action@v5
with:
context: .
file: ./Dockerfile
platforms: linux/amd64,linux/arm64
push: true
tags: ${{ steps.meta.outputs.tags }}


@ -17,6 +17,29 @@ jobs:
with:
fetch-depth: 0 # Fetch all history for tags
# Build frontend WebUI
- name: Setup Bun
uses: oven-sh/setup-bun@v1
with:
bun-version: latest
- name: Build Frontend WebUI
run: |
cd lightrag_webui
bun install --frozen-lockfile
bun run build
cd ..
- name: Verify Frontend Build
run: |
if [ ! -f "lightrag/api/webui/index.html" ]; then
echo "❌ Error: Frontend build failed - index.html not found"
exit 1
fi
echo "✅ Frontend build verified"
echo "Frontend files:"
ls -lh lightrag/api/webui/ | head -10
- uses: actions/setup-python@v5
with:
python-version: "3.x"

11
.gitignore vendored

@ -9,10 +9,10 @@ __pycache__/
# Virtual Environment
.venv/
env/
venv/
*.env*
.env_example
# Environment Variable Files
.env
# Build / Distribution
dist/
@ -66,10 +66,11 @@ download_models_hf.py
lightrag-dev/
gui/
# Frontend build output (built during PyPI release)
lightrag/api/webui/
# unit-test files
test_*
# Cline files
memory-bank
memory-bank/
.clinerules

39
AGENTS.md Normal file

@ -0,0 +1,39 @@
# Repository Guidelines
LightRAG is an advanced Retrieval-Augmented Generation (RAG) framework designed to enhance information retrieval and generation through graph-based knowledge representation.
## Project Structure & Module Organization
- `lightrag/`: Core Python package with orchestrators (`lightrag/lightrag.py`), storage adapters in `kg/`, LLM bindings in `llm/`, and helpers such as `operate.py` and `utils_*.py`.
- `lightrag/api/`: FastAPI service (`lightrag_server.py`) with routers under `routers/` and Gunicorn launcher `run_with_gunicorn.py`.
- `lightrag_webui/`: React 19 + TypeScript client driven by Bun + Vite; UI components live in `src/`.
- Tests live in `tests/` and root-level `test_*.py`. Working datasets stay in `inputs/`, `rag_storage/`, `temp/`; deployment collateral lives in `docs/`, `k8s-deploy/`, and `docker-compose.yml`.
## Build, Test, and Development Commands
- `python -m venv .venv && source .venv/bin/activate`: set up the Python runtime.
- `pip install -e .` / `pip install -e .[api]`: install the package and API extras in editable mode.
- `lightrag-server` or `uvicorn lightrag.api.lightrag_server:app --reload`: start the API locally; ensure `.env` is present.
- `python -m pytest tests` or `python test_graph_storage.py`: run the full suite or a targeted script.
- `ruff check .`: lint Python sources before committing.
- `bun install`, `bun run dev`, `bun run build`, `bun test`: manage the web UI workflow (Bun is mandatory).
## Coding Style & Naming Conventions
- Backend code follows PEP 8 with four-space indentation; annotate functions and reach for dataclasses when modelling state.
- Use `lightrag.utils.logger` instead of `print`; respect logger configuration flags.
- Extend storage or pipeline abstractions via `lightrag.base` and keep reusable helpers in the existing `utils_*.py`.
- Python modules remain lowercase with underscores; React components use `PascalCase.tsx` and hooks-first patterns.
- Front-end code should remain in TypeScript with two-space indentation, rely on functional React components with hooks, and follow Tailwind utility style.
## Testing Guidelines
- Add pytest cases beside the affected module or the relevant `test_*.py`; functions should start with `test_`.
- Export required `LIGHTRAG_*` environment variables before running integration or storage tests.
- For UI updates, pair code with Vitest specs and run `bun test`.
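For example, a minimal pytest case following these conventions might look like this (the helper is a stand-in, not an actual LightRAG import):
```python
# tests/test_lock_keys.py
def normalize_pair(src: str, tgt: str) -> tuple[str, str]:
    """Stand-in for the real helper; sorts an entity pair deterministically."""
    return tuple(sorted((src, tgt)))


def test_normalize_pair_is_order_independent():
    assert normalize_pair("B", "A") == normalize_pair("A", "B")
```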
## Commit & Pull Request Guidelines
- Use concise, imperative commit subjects (e.g., `Fix lock key normalization`) and add body context only when necessary.
- PRs should include a summary, operational impact, linked issues, and screenshots or API samples for user-facing work.
- Verify `ruff check .`, `python -m pytest`, and affected Bun commands succeed before requesting review; note the runs in the PR text.
## Security & Configuration Tips
- Copy `.env.example` and `config.ini.example`; never commit secrets or real connection strings.
- Configure storage backends through `LIGHTRAG_*` variables and validate them with `docker-compose` services when needed.
- Treat `lightrag.log*` as local artefacts; purge sensitive information before sharing logs or outputs.


@ -1,63 +1,101 @@
# Build stage
FROM python:3.12-slim AS builder
# Frontend build stage
FROM oven/bun:1 AS frontend-builder
WORKDIR /app
# Upgrade pip, setuptools and wheel to the latest version
RUN pip install --upgrade pip setuptools wheel
# Copy frontend source code
COPY lightrag_webui/ ./lightrag_webui/
# Install Rust and required build dependencies
RUN apt-get update && apt-get install -y \
curl \
build-essential \
pkg-config \
# Build frontend assets for inclusion in the API package
RUN cd lightrag_webui \
&& bun install --frozen-lockfile \
&& bun run build
# Python build stage - using uv for faster package installation
FROM ghcr.io/astral-sh/uv:python3.12-bookworm-slim AS builder
ENV DEBIAN_FRONTEND=noninteractive
ENV UV_SYSTEM_PYTHON=1
ENV UV_COMPILE_BYTECODE=1
WORKDIR /app
# Install system deps (Rust is required by some wheels)
RUN apt-get update \
&& apt-get install -y --no-install-recommends \
curl \
build-essential \
pkg-config \
&& rm -rf /var/lib/apt/lists/* \
&& curl --proto '=https' --tlsv1.2 -sSf https://sh.rustup.rs | sh -s -- -y \
&& . $HOME/.cargo/env
&& curl --proto '=https' --tlsv1.2 -sSf https://sh.rustup.rs | sh -s -- -y
# Copy pyproject.toml and source code for dependency installation
ENV PATH="/root/.cargo/bin:/root/.local/bin:${PATH}"
# Ensure shared data directory exists for uv caches
RUN mkdir -p /root/.local/share/uv
# Copy project metadata and sources
COPY pyproject.toml .
COPY setup.py .
COPY uv.lock .
# Install base, API, and offline extras without the project to improve caching
RUN uv sync --frozen --no-dev --extra api --extra offline --no-install-project --no-editable
# Copy project sources after dependency layer
COPY lightrag/ ./lightrag/
# Install dependencies
ENV PATH="/root/.cargo/bin:${PATH}"
RUN pip install --user --no-cache-dir --use-pep517 .
RUN pip install --user --no-cache-dir --use-pep517 .[api]
# Include pre-built frontend assets from the previous stage
COPY --from=frontend-builder /app/lightrag/api/webui ./lightrag/api/webui
# Install dependencies for default storage
RUN pip install --user --no-cache-dir nano-vectordb networkx
# Install dependencies for default LLM
RUN pip install --user --no-cache-dir openai ollama tiktoken
# Install dependencies for default document loader
RUN pip install --user --no-cache-dir pypdf2 python-docx python-pptx openpyxl
# Sync project in non-editable mode and ensure pip is available for runtime installs
RUN uv sync --frozen --no-dev --extra api --extra offline --no-editable \
&& /app/.venv/bin/python -m ensurepip --upgrade
# Prepare offline cache directory and pre-populate tiktoken data
# Use uv run to execute commands from the virtual environment
RUN mkdir -p /app/data/tiktoken \
&& uv run lightrag-download-cache --cache-dir /app/data/tiktoken || status=$?; \
if [ -n "${status:-}" ] && [ "$status" -ne 0 ] && [ "$status" -ne 2 ]; then exit "$status"; fi
# Final stage
FROM python:3.12-slim
WORKDIR /app
# Upgrade pip and setuptools
RUN pip install --upgrade pip setuptools wheel
# Install uv for package management
COPY --from=ghcr.io/astral-sh/uv:latest /uv /usr/local/bin/uv
# Copy only necessary files from builder
ENV UV_SYSTEM_PYTHON=1
# Copy installed packages and application code
COPY --from=builder /root/.local /root/.local
COPY ./lightrag ./lightrag
COPY --from=builder /app/.venv /app/.venv
COPY --from=builder /app/lightrag ./lightrag
COPY pyproject.toml .
COPY setup.py .
COPY uv.lock .
RUN pip install --use-pep517 ".[api]"
# Make sure scripts in .local are usable
ENV PATH=/root/.local/bin:$PATH
# Ensure the installed scripts are on PATH
ENV PATH=/app/.venv/bin:/root/.local/bin:$PATH
# Create necessary directories
RUN mkdir -p /app/data/rag_storage /app/data/inputs
# Install dependencies with uv sync (uses locked versions from uv.lock)
# And ensure pip is available for runtime installs
RUN uv sync --frozen --no-dev --extra api --extra offline --no-editable \
&& /app/.venv/bin/python -m ensurepip --upgrade
# Docker data directories
# Create persistent data directories AFTER package installation
RUN mkdir -p /app/data/rag_storage /app/data/inputs /app/data/tiktoken
# Copy offline cache into the newly created directory
COPY --from=builder /app/data/tiktoken /app/data/tiktoken
# Point to the prepared cache
ENV TIKTOKEN_CACHE_DIR=/app/data/tiktoken
ENV WORKING_DIR=/app/data/rag_storage
ENV INPUT_DIR=/app/data/inputs
# Expose the default port
# Expose API port
EXPOSE 9621
# Set entrypoint
ENTRYPOINT ["python", "-m", "lightrag.api.lightrag_server"]

102
Dockerfile.lite Normal file

@ -0,0 +1,102 @@
# Frontend build stage
FROM oven/bun:1 AS frontend-builder
WORKDIR /app
# Copy frontend source code
COPY lightrag_webui/ ./lightrag_webui/
# Build frontend assets for inclusion in the API package
RUN cd lightrag_webui \
&& bun install --frozen-lockfile \
&& bun run build
# Python build stage - using uv for package installation
FROM ghcr.io/astral-sh/uv:python3.12-bookworm-slim AS builder
ENV DEBIAN_FRONTEND=noninteractive
ENV UV_SYSTEM_PYTHON=1
ENV UV_COMPILE_BYTECODE=1
WORKDIR /app
# Install system dependencies required by some wheels
RUN apt-get update \
&& apt-get install -y --no-install-recommends \
curl \
build-essential \
pkg-config \
&& rm -rf /var/lib/apt/lists/* \
&& curl --proto '=https' --tlsv1.2 -sSf https://sh.rustup.rs | sh -s -- -y
ENV PATH="/root/.cargo/bin:/root/.local/bin:${PATH}"
# Ensure shared data directory exists for uv caches
RUN mkdir -p /root/.local/share/uv
# Copy project metadata and sources
COPY pyproject.toml .
COPY setup.py .
COPY uv.lock .
# Install project dependencies (base + API extras) without the project to improve caching
RUN uv sync --frozen --no-dev --extra api --no-install-project --no-editable
# Copy project sources after dependency layer
COPY lightrag/ ./lightrag/
# Include pre-built frontend assets from the previous stage
COPY --from=frontend-builder /app/lightrag/api/webui ./lightrag/api/webui
# Sync project in non-editable mode and ensure pip is available for runtime installs
RUN uv sync --frozen --no-dev --extra api --no-editable \
&& /app/.venv/bin/python -m ensurepip --upgrade
# Prepare tiktoken cache directory and pre-populate tokenizer data
# Ignore exit code 2 which indicates assets already cached
RUN mkdir -p /app/data/tiktoken \
&& uv run lightrag-download-cache --cache-dir /app/data/tiktoken || status=$?; \
if [ -n "${status:-}" ] && [ "$status" -ne 0 ] && [ "$status" -ne 2 ]; then exit "$status"; fi
# Final stage
FROM python:3.12-slim
WORKDIR /app
# Install uv for package management
COPY --from=ghcr.io/astral-sh/uv:latest /uv /usr/local/bin/uv
ENV UV_SYSTEM_PYTHON=1
# Copy installed packages and application code
COPY --from=builder /root/.local /root/.local
COPY --from=builder /app/.venv /app/.venv
COPY --from=builder /app/lightrag ./lightrag
COPY pyproject.toml .
COPY setup.py .
COPY uv.lock .
# Ensure the installed scripts are on PATH
ENV PATH=/app/.venv/bin:/root/.local/bin:$PATH
# Sync dependencies inside the final image using uv
# And ensure pip is available for runtime installs
RUN uv sync --frozen --no-dev --extra api --no-editable \
&& /app/.venv/bin/python -m ensurepip --upgrade
# Create persistent data directories
RUN mkdir -p /app/data/rag_storage /app/data/inputs /app/data/tiktoken
# Copy cached tokenizer assets prepared in the builder stage
COPY --from=builder /app/data/tiktoken /app/data/tiktoken
# Docker data directories
ENV TIKTOKEN_CACHE_DIR=/app/data/tiktoken
ENV WORKING_DIR=/app/data/rag_storage
ENV INPUT_DIR=/app/data/inputs
# Expose API port
EXPOSE 9621
# Set entrypoint
ENTRYPOINT ["python", "-m", "lightrag.api.lightrag_server"]


@ -352,7 +352,8 @@ class QueryParam:
user_prompt: str | None = None
"""User-provided prompt for the query.
If proivded, this will be use instead of the default vaulue from prompt template.
Additional instructions for the LLM. If provided, they will be injected into the prompt template.
Their purpose is to let the user customize how the LLM generates the response.
"""
enable_rerank: bool = True
@ -895,6 +896,10 @@ maxclients 500
To maintain compatibility with legacy data, when no workspace is configured the workspace for PostgreSQL non-graph storage is `default`, the workspace for PostgreSQL AGE graph storage is empty, and the default workspace for Neo4j graph storage is `base`. For all external storages, the system provides dedicated workspace environment variables to override the common `WORKSPACE` environment variable. These storage-specific workspace environment variables are: `REDIS_WORKSPACE`, `MILVUS_WORKSPACE`, `QDRANT_WORKSPACE`, `MONGODB_WORKSPACE`, `POSTGRES_WORKSPACE`, `NEO4J_WORKSPACE`.
### AGENTS.md: Guidance File for Coding Agents
AGENTS.md is a simple, open format for guiding coding agents (https://agents.md/). It gives the LightRAG project a dedicated, predictable place for the context and instructions that help AI coding agents do their work. Different AI coding agents should not each maintain a separate guidance file. If an agent cannot automatically recognize AGENTS.md, a symbolic link can be used as a workaround; after creating the symlink, configure your local `.gitignore_global` so the link is not committed to the Git repository.
## Editing Entities and Relations
LightRAG now supports comprehensive knowledge graph management, allowing you to create, edit, and delete entities and relations within the knowledge graph.


@ -84,6 +84,8 @@
## Installation
> **📦 Offline Deployment**: For offline or air-gapped environments, see the [Offline Deployment Guide](./docs/OfflineDeployment.md) for instructions on pre-installing all dependencies and cache files.
### Install LightRAG Server
The LightRAG Server is designed to provide Web UI and API support. The Web UI facilitates document indexing, knowledge graph exploration, and a simple RAG query interface. The LightRAG Server also provides Ollama-compatible interfaces, aiming to emulate LightRAG as an Ollama chat model. This allows AI chat bots, such as Open WebUI, to access LightRAG easily.
@ -353,7 +355,8 @@ class QueryParam:
user_prompt: str | None = None
"""User-provided prompt for the query.
If proivded, this will be use instead of the default vaulue from prompt template.
Additional instructions for the LLM. If provided, they will be injected into the prompt template.
Their purpose is to let the user customize how the LLM generates the response.
"""
enable_rerank: bool = True
@ -936,6 +939,10 @@ The `workspace` parameter ensures data isolation between different LightRAG inst
To maintain compatibility with legacy data, when no workspace is configured the default workspace is `default` for PostgreSQL non-graph storage, null for PostgreSQL AGE graph storage, and `base` for Neo4j graph storage. For all external storages, the system provides dedicated workspace environment variables to override the common `WORKSPACE` environment variable configuration. These storage-specific workspace environment variables are: `REDIS_WORKSPACE`, `MILVUS_WORKSPACE`, `QDRANT_WORKSPACE`, `MONGODB_WORKSPACE`, `POSTGRES_WORKSPACE`, `NEO4J_WORKSPACE`.
### AGENTS.md -- Guiding Coding Agents
AGENTS.md is a simple, open format for guiding coding agents (https://agents.md/). It is a dedicated, predictable place to provide the context and instructions to help AI coding agents work on LightRAG project. Different AI coders should not maintain separate guidance files individually. If any AI coder cannot automatically recognize AGENTS.md, symbolic links can be used as a solution. After establishing symbolic links, you can prevent them from being committed to the Git repository by configuring your local `.gitignore_global`.
## Edit Entities and Relations
LightRAG now supports comprehensive knowledge graph management capabilities, allowing you to create, edit, and delete entities and relationships within your knowledge graph.

77
docker-build-push.sh Executable file

@ -0,0 +1,77 @@
#!/bin/bash
set -e
# Configuration
IMAGE_NAME="ghcr.io/hkuds/lightrag"
DOCKERFILE="Dockerfile"
TAG="latest"
# Get version from git tags
VERSION=$(git describe --tags --abbrev=0 2>/dev/null || echo "dev")
echo "=================================="
echo " Multi-Architecture Docker Build"
echo "=================================="
echo "Image: ${IMAGE_NAME}:${TAG}"
echo "Version: ${VERSION}"
echo "Platforms: linux/amd64, linux/arm64"
echo "=================================="
echo ""
# Check Docker login status (skip if CR_PAT is set for CI/CD)
if [ -z "$CR_PAT" ]; then
if ! docker info 2>/dev/null | grep -q "Username"; then
echo "⚠️ Warning: Not logged in to Docker registry"
echo "Please login first: docker login ghcr.io"
echo "Or set CR_PAT environment variable for automated login"
echo ""
read -p "Continue anyway? (y/n) " -n 1 -r
echo
if [[ ! $REPLY =~ ^[Yy]$ ]]; then
exit 1
fi
fi
else
echo "Using CR_PAT environment variable for authentication"
fi
# Check if buildx builder exists, create if not
if ! docker buildx ls | grep -q "desktop-linux"; then
echo "Creating buildx builder..."
docker buildx create --name desktop-linux --use
docker buildx inspect --bootstrap
else
echo "Using existing buildx builder: desktop-linux"
docker buildx use desktop-linux
fi
echo ""
echo "Building and pushing multi-architecture image..."
echo ""
# Build and push
docker buildx build \
--platform linux/amd64,linux/arm64 \
--file ${DOCKERFILE} \
--tag ${IMAGE_NAME}:${TAG} \
--tag ${IMAGE_NAME}:${VERSION} \
--push \
.
echo ""
echo "✓ Build and push complete!"
echo ""
echo "Images pushed:"
echo " - ${IMAGE_NAME}:${TAG}"
echo " - ${IMAGE_NAME}:${VERSION}"
echo ""
echo "Verifying multi-architecture manifest..."
echo ""
# Verify
docker buildx imagetools inspect ${IMAGE_NAME}:${TAG}
echo ""
echo "✓ Verification complete!"
echo ""
echo "Pull with: docker pull ${IMAGE_NAME}:${TAG}"


@ -12,13 +12,10 @@ services:
volumes:
- ./data/rag_storage:/app/data/rag_storage
- ./data/inputs:/app/data/inputs
- ./data/tiktoken:/app/data/tiktoken
- ./config.ini:/app/config.ini
- ./.env:/app/.env
env_file:
- .env
environment:
- TIKTOKEN_CACHE_DIR=/app/data/tiktoken
restart: unless-stopped
extra_hosts:
- "host.docker.internal:host-gateway"


@ -1,17 +1,11 @@
# LightRAG
# LightRAG Docker Deployment
A lightweight Knowledge Graph Retrieval-Augmented Generation system with multiple LLM backend support.
## 🚀 Installation
## 🚀 Preparation
### Prerequisites
- Python 3.10+
- Git
- Docker (optional for Docker deployment)
### Clone the repository:
### Native Installation
1. Clone the repository:
```bash
# Linux/MacOS
git clone https://github.com/HKUDS/LightRAG.git
@ -23,7 +17,8 @@ git clone https://github.com/HKUDS/LightRAG.git
cd LightRAG
```
2. Configure your environment:
### Configure your environment:
```bash
# Linux/MacOS
cp .env.example .env
@ -35,141 +30,92 @@ Copy-Item .env.example .env
# Edit .env with your preferred configuration
```
3. Create and activate virtual environment:
```bash
# Linux/MacOS
python -m venv venv
source venv/bin/activate
```
```powershell
# Windows PowerShell
python -m venv venv
.\venv\Scripts\Activate
```
LightRAG can be configured using environment variables in the `.env` file:
4. Install dependencies:
```bash
# Both platforms
pip install -r requirements.txt
```
**Server Configuration**
- `HOST`: Server host (default: 0.0.0.0)
- `PORT`: Server port (default: 9621)
**LLM Configuration**
- `LLM_BINDING`: LLM backend to use (lollms/ollama/openai)
- `LLM_BINDING_HOST`: LLM server host URL
- `LLM_MODEL`: Model name to use
**Embedding Configuration**
- `EMBEDDING_BINDING`: Embedding backend (lollms/ollama/openai)
- `EMBEDDING_BINDING_HOST`: Embedding server host URL
- `EMBEDDING_MODEL`: Embedding model name
**RAG Configuration**
- `MAX_ASYNC`: Maximum async operations
- `MAX_TOKENS`: Maximum token size
- `EMBEDDING_DIM`: Embedding dimensions
## 🐳 Docker Deployment
Docker instructions work the same on all platforms with Docker Desktop installed.
1. Build and start the container:
### Start LightRAG server:
```bash
docker-compose up -d
```
### Configuration Options
LightRAG Server uses the following paths for data storage:
LightRAG can be configured using environment variables in the `.env` file:
#### Server Configuration
- `HOST`: Server host (default: 0.0.0.0)
- `PORT`: Server port (default: 9621)
#### LLM Configuration
- `LLM_BINDING`: LLM backend to use (lollms/ollama/openai)
- `LLM_BINDING_HOST`: LLM server host URL
- `LLM_MODEL`: Model name to use
#### Embedding Configuration
- `EMBEDDING_BINDING`: Embedding backend (lollms/ollama/openai)
- `EMBEDDING_BINDING_HOST`: Embedding server host URL
- `EMBEDDING_MODEL`: Embedding model name
#### RAG Configuration
- `MAX_ASYNC`: Maximum async operations
- `MAX_TOKENS`: Maximum token size
- `EMBEDDING_DIM`: Embedding dimensions
#### Security
- `LIGHTRAG_API_KEY`: API key for authentication
### Data Storage Paths
The system uses the following paths for data storage:
```
data/
├── rag_storage/ # RAG data persistence
└── inputs/ # Input documents
```
### Example Deployments
1. Using with Ollama:
```env
LLM_BINDING=ollama
LLM_BINDING_HOST=http://host.docker.internal:11434
LLM_MODEL=mistral
EMBEDDING_BINDING=ollama
EMBEDDING_BINDING_HOST=http://host.docker.internal:11434
EMBEDDING_MODEL=bge-m3
```
You can't just use `localhost` from inside Docker; that's why you need `host.docker.internal`, which is defined in the docker compose file and allows the container to reach services running on the host.
2. Using with OpenAI:
```env
LLM_BINDING=openai
LLM_MODEL=gpt-3.5-turbo
EMBEDDING_BINDING=openai
EMBEDDING_MODEL=text-embedding-ada-002
OPENAI_API_KEY=your-api-key
```
### API Usage
Once deployed, you can interact with the API at `http://localhost:9621`
Example query using PowerShell:
```powershell
$headers = @{
"X-API-Key" = "your-api-key"
"Content-Type" = "application/json"
}
$body = @{
query = "your question here"
} | ConvertTo-Json
Invoke-RestMethod -Uri "http://localhost:9621/query" -Method Post -Headers $headers -Body $body
```
Example query using curl:
```bash
curl -X POST "http://localhost:9621/query" \
-H "X-API-Key: your-api-key" \
-H "Content-Type: application/json" \
-d '{"query": "your question here"}'
```
## 🔒 Security
Remember to:
1. Set a strong API key in production
2. Use SSL in production environments
3. Configure proper network security
## 📦 Updates
### Updates
To update the Docker container:
```bash
docker-compose pull
docker-compose up -d --build
docker-compose down
docker-compose up
```
To update native installation:
### Offline deployment
Software packages requiring `transformers`, `torch`, or `cuda` are not preinstalled in the Docker images. Consequently, document extraction tools such as Docling, as well as local LLM models like Hugging Face and LMDeploy, cannot be used in an offline environment. These compute-intensive services should not be integrated into LightRAG; Docling will be decoupled and deployed as a standalone service.
## 📦 Build Docker Images
### For local development and testing
```bash
# Linux/MacOS
git pull
source venv/bin/activate
pip install -r requirements.txt
# Build and run with docker-compose
docker compose up --build
```
```powershell
# Windows PowerShell
git pull
.\venv\Scripts\Activate
pip install -r requirements.txt
### For production release
**multi-architecture build and push**:
```bash
# Use the provided build script
./docker-build-push.sh
```
**The build script will**:
- Check Docker registry login status
- Create/use buildx builder automatically
- Build for both AMD64 and ARM64 architectures
- Push to GitHub Container Registry (ghcr.io)
- Verify the multi-architecture manifest
**Prerequisites**:
Before building multi-architecture images, ensure you have:
- Docker 20.10+ with Buildx support
- Sufficient disk space (20GB+ recommended for offline image)
- Registry access credentials (if pushing images)

207
docs/FrontendBuildGuide.md Normal file

@ -0,0 +1,207 @@
# Frontend Build Guide
## Overview
The LightRAG project includes a React-based WebUI frontend. This guide explains how frontend building works in different scenarios.
## Key Principle
- **Git Repository**: Frontend build results are **NOT** included (kept clean)
- **PyPI Package**: Frontend build results **ARE** included (ready to use)
- **Build Tool**: Uses **Bun** (not npm/yarn)
## Installation Scenarios
### 1. End Users (From PyPI) ✨
**Command:**
```bash
pip install lightrag-hku[api]
```
**What happens:**
- Frontend is already built and included in the package
- No additional steps needed
- Web interface works immediately
---
### 2. Development Mode (Recommended for Contributors) 🔧
**Command:**
```bash
# Clone the repository
git clone https://github.com/HKUDS/LightRAG.git
cd LightRAG
# Install in editable mode (no frontend build required yet)
pip install -e ".[api]"
# Build frontend when needed (can be done anytime)
cd lightrag_webui
bun install --frozen-lockfile
bun run build
cd ..
```
**Advantages:**
- Install first, build later (flexible workflow)
- Changes take effect immediately (symlink mode)
- Frontend can be rebuilt anytime without reinstalling
**How it works:**
- Creates symlinks to source directory
- Frontend build output goes to `lightrag/api/webui/`
- Changes are immediately visible in installed package
---
### 3. Normal Installation (Testing Package Build) 📦
**Command:**
```bash
# Clone the repository
git clone https://github.com/HKUDS/LightRAG.git
cd LightRAG
# ⚠️ MUST build frontend FIRST
cd lightrag_webui
bun install --frozen-lockfile
bun run build
cd ..
# Now install
pip install ".[api]"
```
**What happens:**
- Frontend files are **copied** to site-packages
- Post-build modifications won't affect installed package
- Requires rebuild + reinstall to update
**When to use:**
- Testing complete installation process
- Verifying package configuration
- Simulating PyPI user experience
---
### 4. Creating Distribution Package 🚀
**Command:**
```bash
# Build frontend first
cd lightrag_webui
bun install --frozen-lockfile --production
bun run build
cd ..
# Create distribution packages
python -m build
# Output: dist/lightrag_hku-*.whl and dist/lightrag_hku-*.tar.gz
```
**What happens:**
- `setup.py` checks if frontend is built
- If missing, installation fails with helpful error message
- Generated package includes all frontend files
---
## GitHub Actions (Automated Release)
When creating a release on GitHub:
1. **Automatically builds frontend** using Bun
2. **Verifies** build completed successfully
3. **Creates Python package** with frontend included
4. **Publishes to PyPI** using existing trusted publisher setup
**No manual intervention required!**
---
## Quick Reference
| Scenario | Command | Frontend Required | Can Build After |
|----------|---------|-------------------|-----------------|
| From PyPI | `pip install lightrag-hku[api]` | Included | No (already installed) |
| Development | `pip install -e ".[api]"` | No | ✅ Yes (anytime) |
| Normal Install | `pip install ".[api]"` | ✅ Yes (before) | No (must reinstall) |
| Create Package | `python -m build` | ✅ Yes (before) | N/A |
---
## Bun Installation
If you don't have Bun installed:
```bash
# macOS/Linux
curl -fsSL https://bun.sh/install | bash
# Windows
powershell -c "irm bun.sh/install.ps1 | iex"
```
Official documentation: https://bun.sh
---
## File Structure
```
LightRAG/
├── lightrag_webui/ # Frontend source code
│ ├── src/ # React components
│ ├── package.json # Dependencies
│ └── vite.config.ts # Build configuration
│ └── outDir: ../lightrag/api/webui # Build output
├── lightrag/
│ └── api/
│ └── webui/ # Frontend build output (gitignored)
│ ├── index.html # Built files (after running bun run build)
│ └── assets/ # Built assets
├── setup.py # Build checks
├── pyproject.toml # Package configuration
└── .gitignore # Excludes lightrag/api/webui/* (except .gitkeep)
```
---
## Troubleshooting
### Q: I installed in development mode but the web interface doesn't work
**A:** Build the frontend:
```bash
cd lightrag_webui && bun run build
```
### Q: I built the frontend but it's not in my installed package
**A:** You probably used `pip install .` after building. Either:
- Use `pip install -e ".[api]"` for development
- Or reinstall: `pip uninstall lightrag-hku && pip install ".[api]"`
### Q: Where are the built frontend files?
**A:** In `lightrag/api/webui/` after running `bun run build`
### Q: Can I use npm or yarn instead of Bun?
**A:** The project is configured for Bun. While npm/yarn might work, Bun is recommended per project standards.
---
## Summary
**PyPI users**: No action needed, frontend included
**Developers**: Use `pip install -e ".[api]"`, build frontend when needed
**CI/CD**: Automatic build in GitHub Actions
**Git**: Frontend build output never committed
For questions or issues, please open a GitHub issue.

317
docs/OfflineDeployment.md Normal file

@ -0,0 +1,317 @@
# LightRAG Offline Deployment Guide
This guide provides comprehensive instructions for deploying LightRAG in offline environments where internet access is limited or unavailable.
If you deploy LightRAG using Docker, there is no need to refer to this document, as the LightRAG Docker image is pre-configured for offline operation.
> Software packages requiring `transformers`, `torch`, or `cuda` will not be included in the offline dependency group. Consequently, document extraction tools such as Docling, as well as local LLM models like Hugging Face and LMDeploy, are outside the scope of offline installation support. These high-compute-resource-demanding services should not be integrated into LightRAG. Docling will be decoupled and deployed as a standalone service.
## Table of Contents
- [Overview](#overview)
- [Quick Start](#quick-start)
- [Layered Dependencies](#layered-dependencies)
- [Tiktoken Cache Management](#tiktoken-cache-management)
- [Complete Offline Deployment Workflow](#complete-offline-deployment-workflow)
- [Troubleshooting](#troubleshooting)
## Overview
LightRAG uses dynamic package installation (`pipmaster`) for optional features based on file types and configurations. In offline environments, these dynamic installations will fail. This guide shows you how to pre-install all necessary dependencies and cache files.
### What Gets Dynamically Installed?
LightRAG dynamically installs packages for:
- **Document Processing**: `docling`, `pypdf2`, `python-docx`, `python-pptx`, `openpyxl`
- **Storage Backends**: `redis`, `neo4j`, `pymilvus`, `pymongo`, `asyncpg`, `qdrant-client`
- **LLM Providers**: `openai`, `anthropic`, `ollama`, `zhipuai`, `aioboto3`, `voyageai`, `llama-index`, `lmdeploy`, `transformers`, `torch`
- **Tiktoken Models**: BPE encoding models downloaded from the OpenAI CDN
## Quick Start
### Option 1: Using pip with Offline Extras
```bash
# Online environment: Install all offline dependencies
pip install lightrag-hku[offline]
# Download tiktoken cache
lightrag-download-cache
# Create offline package
pip download lightrag-hku[offline] -d ./offline-packages
tar -czf lightrag-offline.tar.gz ./offline-packages ~/.tiktoken_cache
# Transfer to offline server
scp lightrag-offline.tar.gz user@offline-server:/path/to/
# Offline environment: Install
tar -xzf lightrag-offline.tar.gz
pip install --no-index --find-links=./offline-packages lightrag-hku[offline]
export TIKTOKEN_CACHE_DIR=~/.tiktoken_cache
```
### Option 2: Using Requirements Files
```bash
# Online environment: Download packages
pip download -r requirements-offline.txt -d ./packages
# Transfer to offline server
tar -czf packages.tar.gz ./packages
scp packages.tar.gz user@offline-server:/path/to/
# Offline environment: Install
tar -xzf packages.tar.gz
pip install --no-index --find-links=./packages -r requirements-offline.txt
```
## Layered Dependencies
LightRAG provides flexible dependency groups for different use cases:
### Available Dependency Groups
| Group | Description | Use Case |
|-------|-------------|----------|
| `offline-docs` | Document processing | PDF, DOCX, PPTX, XLSX files |
| `offline-storage` | Storage backends | Redis, Neo4j, MongoDB, PostgreSQL, etc. |
| `offline-llm` | LLM providers | OpenAI, Anthropic, Ollama, etc. |
| `offline` | All of the above | Complete offline deployment |
> Software packages requiring `transformers`, `torch`, or `cuda` will not be included in the offline dependency group.
### Installation Examples
```bash
# Install only document processing dependencies
pip install lightrag-hku[offline-docs]
# Install document processing and storage backends
pip install lightrag-hku[offline-docs,offline-storage]
# Install all offline dependencies
pip install lightrag-hku[offline]
```
### Using Individual Requirements Files
```bash
# Document processing only
pip install -r requirements-offline-docs.txt
# Storage backends only
pip install -r requirements-offline-storage.txt
# LLM providers only
pip install -r requirements-offline-llm.txt
# All offline dependencies
pip install -r requirements-offline.txt
```
## Tiktoken Cache Management
Tiktoken downloads BPE encoding models on first use. In offline environments, you must pre-download these models.
### Using the CLI Command
After installing LightRAG, use the built-in command:
```bash
# Download to default location (~/.tiktoken_cache)
lightrag-download-cache
# Download to specific directory
lightrag-download-cache --cache-dir ./tiktoken_cache
# Download specific models only
lightrag-download-cache --models gpt-4o-mini gpt-4
```
### Default Models Downloaded
- `gpt-4o-mini` (LightRAG default)
- `gpt-4o`
- `gpt-4`
- `gpt-3.5-turbo`
- `text-embedding-ada-002`
- `text-embedding-3-small`
- `text-embedding-3-large`
### Setting Cache Location in Offline Environment
```bash
# Option 1: Environment variable (temporary)
export TIKTOKEN_CACHE_DIR=/path/to/tiktoken_cache
# Option 2: Add to ~/.bashrc or ~/.zshrc (persistent)
echo 'export TIKTOKEN_CACHE_DIR=~/.tiktoken_cache' >> ~/.bashrc
source ~/.bashrc
# Option 3: Copy to default location
cp -r /path/to/tiktoken_cache ~/.tiktoken_cache/
```
## Complete Offline Deployment Workflow
### Step 1: Prepare in Online Environment
```bash
# 1. Install LightRAG with offline dependencies
pip install lightrag-hku[offline]
# 2. Download tiktoken cache
lightrag-download-cache --cache-dir ./offline_cache/tiktoken
# 3. Download all Python packages
pip download lightrag-hku[offline] -d ./offline_cache/packages
# 4. Create archive for transfer
tar -czf lightrag-offline-complete.tar.gz ./offline_cache
# 5. Verify contents
tar -tzf lightrag-offline-complete.tar.gz | head -20
```
### Step 2: Transfer to Offline Environment
```bash
# Using scp
scp lightrag-offline-complete.tar.gz user@offline-server:/tmp/
# Or using USB/physical media
# Copy lightrag-offline-complete.tar.gz to USB drive
```
### Step 3: Install in Offline Environment
```bash
# 1. Extract archive
cd /tmp
tar -xzf lightrag-offline-complete.tar.gz
# 2. Install Python packages
pip install --no-index \
--find-links=/tmp/offline_cache/packages \
lightrag-hku[offline]
# 3. Set up tiktoken cache
mkdir -p ~/.tiktoken_cache
cp -r /tmp/offline_cache/tiktoken/* ~/.tiktoken_cache/
export TIKTOKEN_CACHE_DIR=~/.tiktoken_cache
# 4. Add to shell profile for persistence
echo 'export TIKTOKEN_CACHE_DIR=~/.tiktoken_cache' >> ~/.bashrc
```
### Step 4: Verify Installation
```bash
# Test Python import
python -c "from lightrag import LightRAG; print('✓ LightRAG imported')"
# Test tiktoken
python -c "from lightrag.utils import TiktokenTokenizer; t = TiktokenTokenizer(); print('✓ Tiktoken working')"
# Test optional dependencies (if installed)
python -c "import docling; print('✓ Docling available')"
python -c "import redis; print('✓ Redis available')"
```
## Troubleshooting
### Issue: Tiktoken fails with network error
**Problem**: `Unable to load tokenizer for model gpt-4o-mini`
**Solution**:
```bash
# Ensure TIKTOKEN_CACHE_DIR is set
echo $TIKTOKEN_CACHE_DIR
# Verify cache files exist
ls -la ~/.tiktoken_cache/
# If empty, you need to download cache in online environment first
```
### Issue: Dynamic package installation fails
**Problem**: `Error installing package xxx`
**Solution**:
```bash
# Pre-install the specific package you need
# For document processing:
pip install lightrag-hku[offline-docs]
# For storage backends:
pip install lightrag-hku[offline-storage]
# For LLM providers:
pip install lightrag-hku[offline-llm]
```
### Issue: Missing dependencies at runtime
**Problem**: `ModuleNotFoundError: No module named 'xxx'`
**Solution**:
```bash
# Check what you have installed
pip list | grep -i xxx
# Install missing component
pip install lightrag-hku[offline] # Install all offline deps
```
### Issue: Permission denied on tiktoken cache
**Problem**: `PermissionError: [Errno 13] Permission denied`
**Solution**:
```bash
# Ensure cache directory has correct permissions
chmod 755 ~/.tiktoken_cache
chmod 644 ~/.tiktoken_cache/*
# Or use a user-writable directory
export TIKTOKEN_CACHE_DIR=~/my_tiktoken_cache
mkdir -p ~/my_tiktoken_cache
```
## Best Practices
1. **Test in Online Environment First**: Always test your complete setup in an online environment before going offline.
2. **Keep Cache Updated**: Periodically update your offline cache when new models are released.
3. **Document Your Setup**: Keep notes on which optional dependencies you actually need.
4. **Version Pinning**: Consider pinning specific versions in production:
```bash
pip freeze > requirements-production.txt
```
5. **Minimal Installation**: Only install what you need:
```bash
# If you only process PDFs with OpenAI
pip install lightrag-hku[offline-docs]
# Then manually add: pip install openai
```
## Additional Resources
- [LightRAG GitHub Repository](https://github.com/HKUDS/LightRAG)
- [Docker Deployment Guide](./DockerDeployment.md)
- [API Documentation](../lightrag/api/README.md)
## Support
If you encounter issues not covered in this guide:
1. Check the [GitHub Issues](https://github.com/HKUDS/LightRAG/issues)
2. Review the [project documentation](../README.md)
3. Create a new issue with your offline deployment details

170
docs/UV_LOCK_GUIDE.md Normal file

@ -0,0 +1,170 @@
# uv.lock Update Guide
## What is uv.lock?
`uv.lock` is uv's lock file. It captures the exact version of every dependency, including transitive ones, much like:
- Node.js `package-lock.json`
- Rust `Cargo.lock`
- Python Poetry `poetry.lock`
Keeping `uv.lock` in version control guarantees that everyone installs the same dependency set.
## When does uv.lock change?
### Situations where it does *not* change automatically
- Running `uv sync --frozen`
- Building Docker images that call `uv sync --frozen`
- Editing source code without touching dependency metadata
### Situations where it will change
1. **`uv lock` or `uv lock --upgrade`**
```bash
uv lock # Resolve according to current constraints
uv lock --upgrade # Re-resolve and upgrade to the newest compatible releases
```
Use these commands after modifying `pyproject.toml`, when you want fresh dependency versions, or if the lock file was deleted or corrupted.
2. **`uv add`**
```bash
uv add requests # Adds the dependency and updates both files
uv add --dev pytest # Adds a dev dependency
```
`uv add` edits `pyproject.toml` and refreshes `uv.lock` in one step.
3. **`uv remove`**
```bash
uv remove requests
```
This removes the dependency from `pyproject.toml` and rewrites `uv.lock`.
4. **`uv sync` without `--frozen`**
```bash
uv sync
```
Normally this only installs what is already locked. However, if `pyproject.toml` and `uv.lock` disagree or the lock file is missing, uv will regenerate and update `uv.lock`. In CI and production builds you should prefer `uv sync --frozen` to prevent unintended updates.
## Example workflows
### Scenario 1: Add a new dependency
```bash
# Recommended: let uv handle both files
uv add fastapi
git add pyproject.toml uv.lock
git commit -m "Add fastapi dependency"
# Manual alternative
# 1. Edit pyproject.toml
# 2. Regenerate the lock file
uv lock
git add pyproject.toml uv.lock
git commit -m "Add fastapi dependency"
```
### Scenario 2: Relax or tighten a version constraint
```bash
# 1. Edit the requirement in pyproject.toml,
# e.g. openai>=1.0.0,<2.0.0 -> openai>=1.5.0,<2.0.0
# 2. Re-resolve the lock file
uv lock
# 3. Commit both files
git add pyproject.toml uv.lock
git commit -m "Update openai to >=1.5.0"
```
### Scenario 3: Upgrade everything to the newest compatible versions
```bash
uv lock --upgrade
git diff uv.lock
git add uv.lock
git commit -m "Upgrade dependencies to latest compatible versions"
```
### Scenario 4: Teammate syncing the project
```bash
git pull # Fetch latest code and lock file
uv sync --frozen # Install exactly what uv.lock specifies
```
## Using uv.lock in Docker
```dockerfile
RUN uv sync --frozen --no-dev --extra api
```
`--frozen` guarantees reproducible builds because uv will refuse to deviate from the locked versions.
`--extra api` installs the API server extras.
## Generating a lock file that includes offline dependencies
If you need `uv.lock` to capture the optional offline stacks, regenerate it with the relevant extras enabled:
```bash
uv lock --extra api --extra offline
```
This command resolves the base project requirements plus both the `api` and `offline` optional dependency sets, ensuring downstream `uv sync --frozen --extra api --extra offline` installs work without further resolution.
## Frequently asked questions
- **`uv.lock` is almost 1MB. Does that matter?**
No. The file is only read during dependency resolution.
- **Should we commit `uv.lock`?**
Yes. Commit it so collaborators and CI jobs share the same dependency graph.
- **Deleted the lock file by accident?**
Run `uv lock` to regenerate it from `pyproject.toml`.
- **Can `uv.lock` and `requirements.txt` coexist?**
They can, but maintaining both is redundant. Prefer relying on `uv.lock` alone whenever possible.
- **How do I inspect locked versions?**
```bash
uv tree
grep -A5 'name = "openai"' uv.lock
```
## Best practices
### Recommended
1. Commit `uv.lock` alongside `pyproject.toml`.
2. Use `uv sync --frozen` in CI, Docker, and other reproducible environments.
3. Use plain `uv sync` during local development if you want uv to reconcile the lock for you.
4. Run `uv lock --upgrade` periodically to pick up the latest compatible releases.
5. Regenerate the lock file immediately after changing dependency constraints.
### Avoid
1. Running `uv sync` without `--frozen` in CI or production pipelines.
2. Editing `uv.lock` by hand—uv will overwrite manual edits.
3. Ignoring lock file diffs in code reviews—unexpected dependency changes can break builds.
## Summary
| Command | Updates `uv.lock` | Typical use |
|-----------------------|-------------------|-------------------------------------------|
| `uv lock` | ✅ Yes | After editing constraints |
| `uv lock --upgrade` | ✅ Yes | Upgrade to the newest compatible versions |
| `uv add <pkg>` | ✅ Yes | Add a dependency |
| `uv remove <pkg>` | ✅ Yes | Remove a dependency |
| `uv sync` | ⚠️ Maybe | Local development; can regenerate the lock |
| `uv sync --frozen` | ❌ No | CI/CD, Docker, reproducible builds |
Remember: `uv.lock` only changes when you run a command that tells it to. Keep it in sync with your project and commit it whenever it changes.


@ -23,13 +23,13 @@ WEBUI_DESCRIPTION="Simple and Fast Graph Based RAG System"
# WORKING_DIR=<absolute_path_for_working_dir>
### Tiktoken cache directory (Store cached files in this folder for offline deployment)
# TIKTOKEN_CACHE_DIR=./temp/tiktoken
# TIKTOKEN_CACHE_DIR=/app/data/tiktoken
### Ollama Emulating Model and Tag
# OLLAMA_EMULATING_MODEL_NAME=lightrag
OLLAMA_EMULATING_MODEL_TAG=latest
### Max nodes return from grap retrieval in webui
### Max nodes return from graph retrieval in webui
# MAX_GRAPH_NODES=1000
### Logging level
@ -56,29 +56,24 @@ OLLAMA_EMULATING_MODEL_TAG=latest
######################################################################################
### Query Configuration
###
### How to control the context lenght sent to LLM:
### How to control the context length sent to LLM:
### MAX_ENTITY_TOKENS + MAX_RELATION_TOKENS < MAX_TOTAL_TOKENS
### Chunk_Tokens = MAX_TOTAL_TOKENS - Actual_Entity_Tokens - Actual_Reation_Tokens
### Chunk_Tokens = MAX_TOTAL_TOKENS - Actual_Entity_Tokens - Actual_Relation_Tokens
######################################################################################
# LLM responde cache for query (Not valid for streaming response)
# LLM response cache for query (Not valid for streaming response)
ENABLE_LLM_CACHE=true
# COSINE_THRESHOLD=0.2
### Number of entities or relations retrieved from KG
# TOP_K=40
### Maxmium number or chunks for naive vector search
### Maximum number or chunks for naive vector search
# CHUNK_TOP_K=20
### control the actual enties send to LLM
### control the actual entities send to LLM
# MAX_ENTITY_TOKENS=6000
### control the actual relations send to LLM
# MAX_RELATION_TOKENS=8000
### control the maximum tokens send to LLM (include entities, raltions and chunks)
### control the maximum tokens send to LLM (include entities, relations and chunks)
# MAX_TOTAL_TOKENS=30000
### maximum number of related chunks per source entity or relation
### The chunk picker uses this value to determine the total number of chunks selected from KG(knowledge graph)
### Higher values increase re-ranking time
# RELATED_CHUNK_NUMBER=5
### chunk selection strategies
### VECTOR: Pick KG chunks by vector similarity, delivered chunks to the LLM aligning more closely with naive retrieval
### WEIGHT: Pick KG chunks by entity and chunk weight, delivered more solely KG related chunks to the LLM
@ -93,7 +88,7 @@ ENABLE_LLM_CACHE=true
RERANK_BINDING=null
### Enable rerank by default in query params when RERANK_BINDING is not null
# RERANK_BY_DEFAULT=True
### rerank score chunk filter(set to 0.0 to keep all chunks, 0.6 or above if LLM is not strong enought)
### rerank score chunk filter(set to 0.0 to keep all chunks, 0.6 or above if LLM is not strong enough)
# MIN_RERANK_SCORE=0.0
### For local deployment with vLLM
@ -131,7 +126,7 @@ SUMMARY_LANGUAGE=English
# CHUNK_SIZE=1200
# CHUNK_OVERLAP_SIZE=100
### Number of summary semgments or tokens to trigger LLM summary on entity/relation merge (at least 3 is recommented)
### Number of summary segments or tokens to trigger LLM summary on entity/relation merge (at least 3 is recommended)
# FORCE_LLM_SUMMARY_ON_MERGE=8
### Max description token size to trigger LLM summary
# SUMMARY_MAX_TOKENS = 1200
@ -140,6 +135,22 @@ SUMMARY_LANGUAGE=English
### Maximum context size sent to LLM for description summary
# SUMMARY_CONTEXT_SIZE=12000
### control the maximum chunk_ids stored in vector and graph db
# MAX_SOURCE_IDS_PER_ENTITY=300
# MAX_SOURCE_IDS_PER_RELATION=300
### control chunk_ids limitation method: FIFO, KEEP
### FIFO: First in first out
### KEEP: Keep oldest (fewer merge actions and faster)
# SOURCE_IDS_LIMIT_METHOD=FIFO
# Maximum number of file paths stored in entity/relation file_path field (For display only, does not affect query performance)
# MAX_FILE_PATHS=100
### maximum number of related chunks per source entity or relation
### The chunk picker uses this value to determine the total number of chunks selected from KG(knowledge graph)
### Higher values increase re-ranking time
# RELATED_CHUNK_NUMBER=5
###############################
### Concurrency Configuration
###############################
@ -179,7 +190,7 @@ LLM_BINDING_API_KEY=your_api_key
# OPENAI_LLM_TEMPERATURE=0.9
### Set the max_tokens to mitigate endless output of some LLM (less than LLM_TIMEOUT * llm_output_tokens/second, i.e. 9000 = 180s * 50 tokens/s)
### Typically, max_tokens does not include prompt content, though some models, such as Gemini Models, are exceptions
### For vLLM/SGLang doployed models, or most of OpenAI compatible API provider
### For vLLM/SGLang deployed models, or most of OpenAI compatible API provider
# OPENAI_LLM_MAX_TOKENS=9000
### For OpenAI o1-mini or newer modles
OPENAI_LLM_MAX_COMPLETION_TOKENS=9000
@ -193,10 +204,11 @@ OPENAI_LLM_MAX_COMPLETION_TOKENS=9000
# OPENAI_LLM_REASONING_EFFORT=minimal
### OpenRouter Specific Parameters
# OPENAI_LLM_EXTRA_BODY='{"reasoning": {"enabled": false}}'
### Qwen3 Specific Parameters depoly by vLLM
### Qwen3 Specific Parameters deploy by vLLM
# OPENAI_LLM_EXTRA_BODY='{"chat_template_kwargs": {"enable_thinking": false}}'
### use the following command to see all support options for Ollama LLM
### If LightRAG is deployed in Docker, use host.docker.internal instead of localhost in LLM_BINDING_HOST
### lightrag-server --llm-binding ollama --help
### Ollama Server Specific Parameters
### OLLAMA_LLM_NUM_CTX must be provided, and should at least larger than MAX_TOTAL_TOKENS + 2000
@ -218,7 +230,7 @@ EMBEDDING_BINDING=ollama
EMBEDDING_MODEL=bge-m3:latest
EMBEDDING_DIM=1024
EMBEDDING_BINDING_API_KEY=your_api_key
# If the embedding service is deployed within the same Docker stack, use host.docker.internal instead of localhost
# If LightRAG is deployed in Docker, use host.docker.internal instead of localhost
EMBEDDING_BINDING_HOST=http://localhost:11434
### OpenAI compatible (VoyageAI embedding openai compatible)
@ -247,8 +259,8 @@ OLLAMA_EMBEDDING_NUM_CTX=8192
### lightrag-server --embedding-binding ollama --help
####################################################################
### WORKSPACE setting workspace name for all storage types
### in the purpose of isolating data from LightRAG instances.
### WORKSPACE sets workspace name for all storage types
### for the purpose of isolating data from LightRAG instances.
### Valid workspace name constraints: a-z, A-Z, 0-9, and _
####################################################################
# WORKSPACE=space1
@ -303,6 +315,16 @@ POSTGRES_HNSW_M=16
POSTGRES_HNSW_EF=200
POSTGRES_IVFFLAT_LISTS=100
### PostgreSQL Connection Retry Configuration (Network Robustness)
### Number of retry attempts (1-10, default: 3)
### Initial retry backoff in seconds (0.1-5.0, default: 0.5)
### Maximum retry backoff in seconds (backoff-60.0, default: 5.0)
### Connection pool close timeout in seconds (1.0-30.0, default: 5.0)
# POSTGRES_CONNECTION_RETRIES=3
# POSTGRES_CONNECTION_RETRY_BACKOFF=0.5
# POSTGRES_CONNECTION_RETRY_BACKOFF_MAX=5.0
# POSTGRES_POOL_CLOSE_TIMEOUT=5.0
### PostgreSQL SSL Configuration (Optional)
# POSTGRES_SSL_MODE=require
# POSTGRES_SSL_CERT=/path/to/client-cert.pem
@ -310,6 +332,14 @@ POSTGRES_IVFFLAT_LISTS=100
# POSTGRES_SSL_ROOT_CERT=/path/to/ca-cert.pem
# POSTGRES_SSL_CRL=/path/to/crl.pem
### PostgreSQL Server Settings (for Supabase Supavisor)
# Use this to pass extra options to the PostgreSQL connection string.
# For Supabase, you might need to set it like this:
# POSTGRES_SERVER_SETTINGS="options=reference%3D[project-ref]"
# Default is 100; set to 0 to disable
# POSTGRES_STATEMENT_CACHE_SIZE=100
### Neo4j Configuration
NEO4J_URI=neo4j+s://xxxxxxxx.databases.neo4j.io
NEO4J_USERNAME=neo4j

View file

@ -2,7 +2,7 @@ apiVersion: v2
name: lightrag
description: A Helm chart for LightRAG, an efficient and lightweight RAG system
type: application
version: 0.1.0
version: 0.1.1
appVersion: "1.0.0"
maintainers:
- name: LightRAG Team

View file

@ -43,6 +43,22 @@ spec:
- name: env-file
mountPath: /app/.env
subPath: .env
{{- $envFrom := default (dict) .Values.envFrom }}
{{- $envFromEntries := list }}
{{- range (default (list) (index $envFrom "secrets")) }}
{{- $envFromEntries = append $envFromEntries (dict "secretRef" (dict "name" .name)) }}
{{- end }}
{{- range (default (list) (index $envFrom "configmaps")) }}
{{- $envFromEntries = append $envFromEntries (dict "configMapRef" (dict "name" .name)) }}
{{- end }}
{{- if gt (len $envFromEntries) 0 }}
envFrom:
{{- toYaml $envFromEntries | nindent 12 }}
{{- end }}
{{- with .Values.image.imagePullSecrets }}
imagePullSecrets:
{{- toYaml . | nindent 8 }}
{{- end }}
volumes:
- name: env-file
secret:
@ -60,3 +76,6 @@ spec:
- name: inputs
emptyDir: {}
{{- end }}
strategy:
{{- toYaml .Values.updateStrategy | nindent 4 }}

View file

@ -3,6 +3,23 @@ replicaCount: 1
image:
repository: ghcr.io/hkuds/lightrag
tag: latest
# Optionally specify imagePullSecrets if your image is in a private registry
# example:
# imagePullSecrets:
# - name: my-registry-secret
imagePullSecrets: []
# Specify a deployment strategy
# example:
# updateStrategy:
# type: RollingUpdate
# rollingUpdate:
# maxUnavailable: 25%
# maxSurge: 25%
# Default for now should be Recreate as any RollingUpdate will cause issues with
# multiple instances trying to access the same persistent storage if not using RWX volumes.
updateStrategy:
type: Recreate
service:
type: ClusterIP
@ -23,6 +40,13 @@ persistence:
inputs:
size: 5Gi
# Allow specifying additional environment variables from ConfigMaps or Secrets created outside of this chart
envFrom:
configmaps: []
# - name: my-shiny-configmap-1
secrets: []
# - name: my-shiny-secret-1
env:
HOST: 0.0.0.0
PORT: 9621
@ -38,8 +62,8 @@ env:
EMBEDDING_BINDING_API_KEY:
LIGHTRAG_KV_STORAGE: PGKVStorage
LIGHTRAG_VECTOR_STORAGE: PGVectorStorage
# LIGHTRAG_KV_STORAGE: RedisKVStorage
# LIGHTRAG_VECTOR_STORAGE: QdrantVectorDBStorage
# LIGHTRAG_KV_STORAGE: RedisKVStorage
# LIGHTRAG_VECTOR_STORAGE: QdrantVectorDBStorage
LIGHTRAG_GRAPH_STORAGE: Neo4JStorage
LIGHTRAG_DOC_STATUS_STORAGE: PGDocStatusStorage
# Replace with your POSTGRES credentials

View file

@ -1,5 +1,5 @@
from .lightrag import LightRAG as LightRAG, QueryParam as QueryParam
__version__ = "1.4.9"
__version__ = "1.4.9.5"
__author__ = "Zirui Guo"
__url__ = "https://github.com/HKUDS/LightRAG"

View file

@ -21,15 +21,24 @@ pip install "lightrag-hku[api]"
* Install from source
```bash
# 克隆仓库
# Clone the repository
git clone https://github.com/HKUDS/lightrag.git
# 切换到仓库目录
# Change to the repository directory
cd lightrag
# 如有必要,创建 Python 虚拟环境
# 以可编辑模式安装并支持 API
# Create a Python virtual environment
uv venv --seed --python 3.12
source .venv/bin/activate
# Install in editable mode with API support
pip install -e ".[api]"
# Build front-end artifacts
cd lightrag_webui
bun install --frozen-lockfile
bun run build
cd ..
```
### Preparation Before Starting the LightRAG Server
@ -109,28 +118,10 @@ lightrag-gunicorn --workers 4
### Launching LightRAG Server with Docker
* Configure the .env file:
Create a personalized .env file by copying the sample file [`env.example`](env.example), and set the LLM and Embedding parameters according to your actual needs.
* Create a file named docker-compose.yml:
```yaml
services:
lightrag:
container_name: lightrag
image: ghcr.io/hkuds/lightrag:latest
ports:
- "${PORT:-9621}:9621"
volumes:
- ./data/rag_storage:/app/data/rag_storage
- ./data/inputs:/app/data/inputs
- ./config.ini:/app/config.ini
- ./.env:/app/.env
env_file:
- .env
restart: unless-stopped
extra_hosts:
- "host.docker.internal:host-gateway"
```
Using Docker Compose is the most convenient way to deploy and run the LightRAG Server.
- Create a project directory.
- Copy the `docker-compose.yml` file from the LightRAG repository into your project directory.
- Prepare the `.env` file: copy the sample file [`env.example`](https://ai.znipower.com:5013/c/env.example) to create a customized `.env` file, and configure the LLM and embedding parameters according to your specific requirements.
* Start the LightRAG Server with the following command:
@ -138,7 +129,11 @@ services:
docker compose up
# To run the program in the background after startup, add the -d parameter at the end of the command
```
> The official docker compose file can be obtained here: [docker-compose.yml](https://raw.githubusercontent.com/HKUDS/LightRAG/refs/heads/main/docker-compose.yml). For historical versions of LightRAG Docker images, visit: [LightRAG Docker Images](https://github.com/HKUDS/LightRAG/pkgs/container/lightrag)
> The official docker compose file can be obtained here: [docker-compose.yml](https://raw.githubusercontent.com/HKUDS/LightRAG/refs/heads/main/docker-compose.yml). For historical versions of LightRAG Docker images, visit: [LightRAG Docker Images](https://github.com/HKUDS/LightRAG/pkgs/container/lightrag). For more information about Docker deployment, refer to [DockerDeployment.md](./../../docs/DockerDeployment.md).
### Offline Deployment
Official LightRAG Docker images are fully compatible with offline or air-gapped environments. To set up your own offline deployment environment, refer to the [Offline Deployment Guide](./../../docs/OfflineDeployment.md).
### Starting Multiple LightRAG Instances
@ -278,7 +273,17 @@ LIGHTRAG_API_KEY=your-secure-api-key-here
WHITELIST_PATHS=/health,/api/*
```
> Health check and Ollama emulation endpoints are exempt from API key checks by default.
> Health check and Ollama emulation endpoints are exempt from API key checks by default. For security reasons, remove `/api/*` from WHITELIST_PATHS if the Ollama service is not needed.
The API key is passed in the `X-API-Key` request header. Below is an example of accessing the LightRAG Server via the API:
```
curl -X 'POST' \
'http://localhost:9621/documents/scan' \
-H 'accept: application/json' \
-H 'X-API-Key: your-secure-api-key-here-123' \
-d ''
```
* Account credentials (the Web UI requires login before access)

View file

@ -27,9 +27,18 @@ git clone https://github.com/HKUDS/lightrag.git
# Change to the repository directory
cd lightrag
# create a Python virtual environment if necessary
# Create a Python virtual environment
uv venv --seed --python 3.12
source .venv/bin/activate
# Install in editable mode with API support
pip install -e ".[api]"
# Build front-end artifacts
cd lightrag_webui
bun install --frozen-lockfile
bun run build
cd ..
```
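After the editable install, a quick sanity check is to invoke the console script; this is a minimal sketch and assumes the virtual environment created above is still active:
```bash
# Should print the server's command-line options if the [api] extras installed correctly
lightrag-server --help
```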
### Before Starting LightRAG Server
@ -110,29 +119,13 @@ During startup, configurations in the `.env` file can be overridden by command-l
### Launching LightRAG Server with Docker
* Prepare the .env file:
Create a personalized .env file by copying the sample file [`env.example`](env.example). Configure the LLM and embedding parameters according to your requirements.
Using Docker Compose is the most convenient way to deploy and run the LightRAG Server.
* Create a file named `docker-compose.yml`:
* Create a project directory.
```yaml
services:
lightrag:
container_name: lightrag
image: ghcr.io/hkuds/lightrag:latest
ports:
- "${PORT:-9621}:9621"
volumes:
- ./data/rag_storage:/app/data/rag_storage
- ./data/inputs:/app/data/inputs
- ./config.ini:/app/config.ini
- ./.env:/app/.env
env_file:
- .env
restart: unless-stopped
extra_hosts:
- "host.docker.internal:host-gateway"
```
* Copy the `docker-compose.yml` file from the LightRAG repository into your project directory.
* Prepare the `.env` file: Duplicate the sample file [`env.example`](https://ai.znipower.com:5013/c/env.example) to create a customized `.env` file, and configure the LLM and embedding parameters according to your specific requirements.
* Start the LightRAG Server with the following command:
@ -141,7 +134,11 @@ docker compose up
# If you want the program to run in the background after startup, add the -d parameter at the end of the command.
```
> You can get the official docker compose file from here: [docker-compose.yml](https://raw.githubusercontent.com/HKUDS/LightRAG/refs/heads/main/docker-compose.yml). For historical versions of LightRAG docker images, visit this link: [LightRAG Docker Images](https://github.com/HKUDS/LightRAG/pkgs/container/lightrag)
You can get the official docker compose file from here: [docker-compose.yml](https://raw.githubusercontent.com/HKUDS/LightRAG/refs/heads/main/docker-compose.yml). For historical versions of LightRAG docker images, visit this link: [LightRAG Docker Images](https://github.com/HKUDS/LightRAG/pkgs/container/lightrag). For more details about docker deployment, please refer to [DockerDeployment.md](./../../docs/DockerDeployment.md).
### Offline Deployment
Official LightRAG Docker images are fully compatible with offline or air-gapped environments. If you want to build your own offline environment, please refer to the [Offline Deployment Guide](./../../docs/OfflineDeployment.md).
### Starting Multiple LightRAG Instances
@ -280,7 +277,17 @@ LIGHTRAG_API_KEY=your-secure-api-key-here
WHITELIST_PATHS=/health,/api/*
```
> Health check and Ollama emulation endpoints are excluded from API Key check by default.
> Health check and Ollama emulation endpoints are excluded from API Key check by default. For security reasons, remove `/api/*` from `WHITELIST_PATHS` if the Ollama service is not required.
The API key is passed using the request header `X-API-Key`. Below is an example of accessing the LightRAG Server via API:
```
curl -X 'POST' \
'http://localhost:9621/documents/scan' \
-H 'accept: application/json' \
-H 'X-API-Key: your-secure-api-key-here-123' \
-d ''
```
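By contrast, paths listed in `WHITELIST_PATHS` can be called without the key. A minimal sketch, assuming the default port:
```bash
# /health is whitelisted by default, so no X-API-Key header is required
curl http://localhost:9621/health
```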
* Account credentials (the Web UI requires login before access can be granted):

View file

@ -1 +1 @@
__api_version__ = "0230"
__api_version__ = "0245"

View file

@ -145,7 +145,129 @@ class LLMConfigCache:
self.ollama_embedding_options = {}
def check_frontend_build():
"""Check if frontend is built and optionally check if source is up-to-date"""
webui_dir = Path(__file__).parent / "webui"
index_html = webui_dir / "index.html"
# 1. Check if build files exist (required)
if not index_html.exists():
ASCIIColors.red("\n" + "=" * 80)
ASCIIColors.red("ERROR: Frontend Not Built")
ASCIIColors.red("=" * 80)
ASCIIColors.yellow("The WebUI frontend has not been built yet.")
ASCIIColors.yellow(
"Please build the frontend code first using the following commands:\n"
)
ASCIIColors.cyan(" cd lightrag_webui")
ASCIIColors.cyan(" bun install --frozen-lockfile")
ASCIIColors.cyan(" bun run build")
ASCIIColors.cyan(" cd ..")
ASCIIColors.yellow("\nThen restart the service.\n")
ASCIIColors.cyan(
"Note: Make sure you have Bun installed. Visit https://bun.sh for installation."
)
ASCIIColors.red("=" * 80 + "\n")
sys.exit(1) # Exit immediately
# 2. Check if this is a development environment (source directory exists)
try:
source_dir = Path(__file__).parent.parent.parent / "lightrag_webui"
src_dir = source_dir / "src"
# Determine if this is a development environment: source directory exists and contains src directory
if not source_dir.exists() or not src_dir.exists():
# Production environment, skip source code check
logger.debug(
"Production environment detected, skipping source freshness check"
)
return
# Development environment, perform source code timestamp check
logger.debug("Development environment detected, checking source freshness")
# Source code file extensions (files to check)
source_extensions = {
".ts",
".tsx",
".js",
".jsx",
".mjs",
".cjs", # TypeScript/JavaScript
".css",
".scss",
".sass",
".less", # Style files
".json",
".jsonc", # Configuration/data files
".html",
".htm", # Template files
".md",
".mdx", # Markdown
}
# Key configuration files (in lightrag_webui root directory)
key_files = [
source_dir / "package.json",
source_dir / "bun.lock",
source_dir / "vite.config.ts",
source_dir / "tsconfig.json",
source_dir / "tailwind.config.js",
source_dir / "index.html",
]
# Get the latest modification time of source code
latest_source_time = 0
# Check source code files in src directory
for file_path in src_dir.rglob("*"):
if file_path.is_file():
# Only check source code files, ignore temporary files and logs
if file_path.suffix.lower() in source_extensions:
mtime = file_path.stat().st_mtime
latest_source_time = max(latest_source_time, mtime)
# Check key configuration files
for key_file in key_files:
if key_file.exists():
mtime = key_file.stat().st_mtime
latest_source_time = max(latest_source_time, mtime)
# Get build time
build_time = index_html.stat().st_mtime
# Compare timestamps (5 second tolerance to avoid file system time precision issues)
if latest_source_time > build_time + 5:
ASCIIColors.yellow("\n" + "=" * 80)
ASCIIColors.yellow("WARNING: Frontend Source Code Has Been Updated")
ASCIIColors.yellow("=" * 80)
ASCIIColors.yellow(
"The frontend source code is newer than the current build."
)
ASCIIColors.yellow(
"This might happen after 'git pull' or manual code changes.\n"
)
ASCIIColors.cyan(
"Recommended: Rebuild the frontend to use the latest changes:"
)
ASCIIColors.cyan(" cd lightrag_webui")
ASCIIColors.cyan(" bun install --frozen-lockfile")
ASCIIColors.cyan(" bun run build")
ASCIIColors.cyan(" cd ..")
ASCIIColors.yellow("\nThe server will continue with the current build.")
ASCIIColors.yellow("=" * 80 + "\n")
else:
logger.info("Frontend build is up-to-date")
except Exception as e:
# If check fails, log warning but don't affect startup
logger.warning(f"Failed to check frontend source freshness: {e}")
def create_app(args):
# Check frontend build first
check_frontend_build()
# Setup logging
logger.setLevel(args.log_level)
set_verbose_debug(args.verbose)
@ -223,14 +345,17 @@ def create_app(args):
finalize_share_data()
# Initialize FastAPI
base_description = (
"Providing API for LightRAG core, Web UI and Ollama Model Emulation"
)
swagger_description = (
base_description
+ (" (API-Key Enabled)" if api_key else "")
+ "\n\n[View ReDoc documentation](/redoc)"
)
app_kwargs = {
"title": "LightRAG Server API",
"description": (
"Providing API for LightRAG core, Web UI and Ollama Model Emulation"
+ "(With authentication)"
if api_key
else ""
),
"description": swagger_description,
"version": __api_version__,
"openapi_url": "/openapi.json", # Explicitly set OpenAPI schema URL
"docs_url": "/docs", # Explicitly set docs URL
@ -786,7 +911,9 @@ def create_app(args):
async def get_response(self, path: str, scope):
response = await super().get_response(path, scope)
if path.endswith(".html"):
is_html = path.endswith(".html") or response.media_type == "text/html"
if is_html:
response.headers["Cache-Control"] = (
"no-cache, no-store, must-revalidate"
)

View file

@ -134,6 +134,55 @@ class ScanResponse(BaseModel):
}
class ReprocessResponse(BaseModel):
"""Response model for reprocessing failed documents operation
Attributes:
status: Status of the reprocessing operation
message: Message describing the operation result
track_id: Tracking ID for monitoring reprocessing progress
"""
status: Literal["reprocessing_started"] = Field(
description="Status of the reprocessing operation"
)
message: str = Field(description="Human-readable message describing the operation")
track_id: str = Field(
description="Tracking ID for monitoring reprocessing progress"
)
class Config:
json_schema_extra = {
"example": {
"status": "reprocessing_started",
"message": "Reprocessing of failed documents has been initiated in background",
"track_id": "retry_20250729_170612_def456",
}
}
class CancelPipelineResponse(BaseModel):
"""Response model for pipeline cancellation operation
Attributes:
status: Status of the cancellation request
message: Message describing the operation result
"""
status: Literal["cancellation_requested", "not_busy"] = Field(
description="Status of the cancellation request"
)
message: str = Field(description="Human-readable message describing the operation")
class Config:
json_schema_extra = {
"example": {
"status": "cancellation_requested",
"message": "Pipeline cancellation has been requested. Documents will be marked as FAILED.",
}
}
class InsertTextRequest(BaseModel):
"""Request model for inserting a single text document
@ -309,6 +358,10 @@ class DeleteDocRequest(BaseModel):
default=False,
description="Whether to delete the corresponding file in the upload directory.",
)
delete_llm_cache: bool = Field(
default=False,
description="Whether to delete cached LLM extraction results for the documents.",
)
@field_validator("doc_ids", mode="after")
@classmethod
@ -379,7 +432,7 @@ class DocStatusResponse(BaseModel):
"id": "doc_123456",
"content_summary": "Research paper on machine learning",
"content_length": 15240,
"status": "PROCESSED",
"status": "processed",
"created_at": "2025-03-31T12:34:56",
"updated_at": "2025-03-31T12:35:30",
"track_id": "upload_20250729_170612_abc123",
@ -412,7 +465,7 @@ class DocsStatusesResponse(BaseModel):
"id": "doc_123",
"content_summary": "Pending document",
"content_length": 5000,
"status": "PENDING",
"status": "pending",
"created_at": "2025-03-31T10:00:00",
"updated_at": "2025-03-31T10:00:00",
"track_id": "upload_20250331_100000_abc123",
@ -422,12 +475,27 @@ class DocsStatusesResponse(BaseModel):
"file_path": "pending_doc.pdf",
}
],
"PREPROCESSED": [
{
"id": "doc_789",
"content_summary": "Document pending final indexing",
"content_length": 7200,
"status": "preprocessed",
"created_at": "2025-03-31T09:30:00",
"updated_at": "2025-03-31T09:35:00",
"track_id": "upload_20250331_093000_xyz789",
"chunks_count": 10,
"error": None,
"metadata": None,
"file_path": "preprocessed_doc.pdf",
}
],
"PROCESSED": [
{
"id": "doc_456",
"content_summary": "Processed document",
"content_length": 8000,
"status": "PROCESSED",
"status": "processed",
"created_at": "2025-03-31T09:00:00",
"updated_at": "2025-03-31T09:05:00",
"track_id": "insert_20250331_090000_def456",
@ -599,6 +667,7 @@ class PaginatedDocsResponse(BaseModel):
"status_counts": {
"PENDING": 10,
"PROCESSING": 5,
"PREPROCESSED": 5,
"PROCESSED": 130,
"FAILED": 5,
},
@ -621,6 +690,7 @@ class StatusCountsResponse(BaseModel):
"status_counts": {
"PENDING": 10,
"PROCESSING": 5,
"PREPROCESSED": 5,
"PROCESSED": 130,
"FAILED": 5,
}
@ -1443,6 +1513,7 @@ async def background_delete_documents(
doc_manager: DocumentManager,
doc_ids: List[str],
delete_file: bool = False,
delete_llm_cache: bool = False,
):
"""Background task to delete multiple documents"""
from lightrag.kg.shared_storage import (
@ -1477,11 +1548,27 @@ async def background_delete_documents(
)
# Use slice assignment to clear the list in place
pipeline_status["history_messages"][:] = ["Starting document deletion process"]
if delete_llm_cache:
pipeline_status["history_messages"].append(
"LLM cache cleanup requested for this deletion job"
)
try:
# Loop through each document ID and delete them one by one
for i, doc_id in enumerate(doc_ids, 1):
# Check for cancellation at the start of each document deletion
async with pipeline_status_lock:
if pipeline_status.get("cancellation_requested", False):
cancel_msg = f"Deletion cancelled by user at document {i}/{total_docs}. {len(successful_deletions)} deleted, {total_docs - i + 1} remaining."
logger.info(cancel_msg)
pipeline_status["latest_message"] = cancel_msg
pipeline_status["history_messages"].append(cancel_msg)
# Add remaining documents to failed list with cancellation reason
failed_deletions.extend(
doc_ids[i - 1 :]
) # i-1 because enumerate starts at 1
break # Exit the loop, remaining documents unchanged
start_msg = f"Deleting document {i}/{total_docs}: {doc_id}"
logger.info(start_msg)
pipeline_status["cur_batch"] = i
@ -1490,7 +1577,9 @@ async def background_delete_documents(
file_path = "#"
try:
result = await rag.adelete_by_doc_id(doc_id)
result = await rag.adelete_by_doc_id(
doc_id, delete_llm_cache=delete_llm_cache
)
file_path = (
getattr(result, "file_path", "-") if "result" in locals() else "-"
)
@ -1642,6 +1731,10 @@ async def background_delete_documents(
# Final summary and check for pending requests
async with pipeline_status_lock:
pipeline_status["busy"] = False
pipeline_status["pending_requests"] = False # Reset pending requests flag
pipeline_status["cancellation_requested"] = (
False # Always reset cancellation flag
)
completion_msg = f"Deletion completed: {len(successful_deletions)} successful, {len(failed_deletions)} failed"
pipeline_status["latest_message"] = completion_msg
pipeline_status["history_messages"].append(completion_msg)
@ -1959,6 +2052,8 @@ def create_document_routes(
rag.full_docs,
rag.full_entities,
rag.full_relations,
rag.entity_chunks,
rag.relation_chunks,
rag.entities_vdb,
rag.relationships_vdb,
rag.chunks_vdb,
@ -2173,20 +2268,24 @@ def create_document_routes(
logger.error(traceback.format_exc())
raise HTTPException(status_code=500, detail=str(e))
# TODO: Deprecated, use /documents/paginated instead
@router.get(
"", response_model=DocsStatusesResponse, dependencies=[Depends(combined_auth)]
)
async def documents() -> DocsStatusesResponse:
"""
Get the status of all documents in the system.
Get the status of all documents in the system. This endpoint is deprecated; use /documents/paginated instead.
To prevent excessive resource consumption, a maximum of 1,000 records is returned.
This endpoint retrieves the current status of all documents, grouped by their
processing status (PENDING, PROCESSING, PROCESSED, FAILED).
processing status (PENDING, PROCESSING, PREPROCESSED, PROCESSED, FAILED). The results are
limited to 1000 total documents with fair distribution across all statuses.
Returns:
DocsStatusesResponse: A response object containing a dictionary where keys are
DocStatus values and values are lists of DocStatusResponse
objects representing documents in each status category.
Maximum 1000 documents total will be returned.
Raises:
HTTPException: If an error occurs while retrieving document statuses (500).
@ -2195,6 +2294,7 @@ def create_document_routes(
statuses = (
DocStatus.PENDING,
DocStatus.PROCESSING,
DocStatus.PREPROCESSED,
DocStatus.PROCESSED,
DocStatus.FAILED,
)
@ -2203,12 +2303,45 @@ def create_document_routes(
results: List[Dict[str, DocProcessingStatus]] = await asyncio.gather(*tasks)
response = DocsStatusesResponse()
total_documents = 0
max_documents = 1000
# Convert results to lists for easier processing
status_documents = []
for idx, result in enumerate(results):
status = statuses[idx]
docs_list = []
for doc_id, doc_status in result.items():
docs_list.append((doc_id, doc_status))
status_documents.append((status, docs_list))
# Fair distribution: round-robin across statuses
status_indices = [0] * len(
status_documents
) # Track current index for each status
current_status_idx = 0
while total_documents < max_documents:
# Check if we have any documents left to process
has_remaining = False
for status_idx, (status, docs_list) in enumerate(status_documents):
if status_indices[status_idx] < len(docs_list):
has_remaining = True
break
if not has_remaining:
break
# Try to get a document from the current status
status, docs_list = status_documents[current_status_idx]
current_index = status_indices[current_status_idx]
if current_index < len(docs_list):
doc_id, doc_status = docs_list[current_index]
if status not in response.statuses:
response.statuses[status] = []
response.statuses[status].append(
DocStatusResponse(
id=doc_id,
@ -2224,6 +2357,13 @@ def create_document_routes(
file_path=doc_status.file_path,
)
)
status_indices[current_status_idx] += 1
total_documents += 1
# Move to next status (round-robin)
current_status_idx = (current_status_idx + 1) % len(status_documents)
return response
except Exception as e:
logger.error(f"Error GET /documents: {str(e)}")
@ -2253,21 +2393,20 @@ def create_document_routes(
Delete documents and all their associated data by their IDs using background processing.
Deletes specific documents and all their associated data, including their status,
text chunks, vector embeddings, and any related graph data.
text chunks, vector embeddings, and any related graph data. When requested,
cached LLM extraction responses are removed after graph deletion/rebuild completes.
The deletion process runs in the background to avoid blocking the client connection.
It is disabled when llm cache for entity extraction is disabled.
This operation is irreversible and will interact with the pipeline status.
Args:
delete_request (DeleteDocRequest): The request containing the document IDs and delete_file options.
delete_request (DeleteDocRequest): The request containing the document IDs and deletion options.
background_tasks: FastAPI BackgroundTasks for async processing
Returns:
DeleteDocByIdResponse: The result of the deletion operation.
- status="deletion_started": The document deletion has been initiated in the background.
- status="busy": The pipeline is busy with another operation.
- status="not_allowed": Operation not allowed when LLM cache for entity extraction is disabled.
Raises:
HTTPException:
@ -2275,15 +2414,6 @@ def create_document_routes(
"""
doc_ids = delete_request.doc_ids
# The rag object is initialized from the server startup args,
# so we can access its properties here.
if not rag.enable_llm_cache_for_entity_extract:
return DeleteDocByIdResponse(
status="not_allowed",
message="Operation not allowed when LLM cache for entity extraction is disabled.",
doc_id=", ".join(delete_request.doc_ids),
)
try:
from lightrag.kg.shared_storage import get_namespace_data
@ -2304,6 +2434,7 @@ def create_document_routes(
doc_manager,
doc_ids,
delete_request.delete_file,
delete_request.delete_llm_cache,
)
return DeleteDocByIdResponse(
@ -2613,4 +2744,111 @@ def create_document_routes(
logger.error(traceback.format_exc())
raise HTTPException(status_code=500, detail=str(e))
@router.post(
"/reprocess_failed",
response_model=ReprocessResponse,
dependencies=[Depends(combined_auth)],
)
async def reprocess_failed_documents(background_tasks: BackgroundTasks):
"""
Reprocess failed and pending documents.
This endpoint triggers the document processing pipeline which automatically
picks up and reprocesses documents in the following statuses:
- FAILED: Documents that failed during previous processing attempts
- PENDING: Documents waiting to be processed
- PROCESSING: Documents with abnormally terminated processing (e.g., server crashes)
This is useful for recovering from server crashes, network errors, LLM service
outages, or other temporary failures that caused document processing to fail.
The processing happens in the background and can be monitored using the
returned track_id or by checking the pipeline status.
Returns:
ReprocessResponse: Response with status, message, and track_id
Raises:
HTTPException: If an error occurs while initiating reprocessing (500).
"""
try:
# Generate track_id with "retry" prefix for retry operation
track_id = generate_track_id("retry")
# Start the reprocessing in the background
background_tasks.add_task(rag.apipeline_process_enqueue_documents)
logger.info(
f"Reprocessing of failed documents initiated with track_id: {track_id}"
)
return ReprocessResponse(
status="reprocessing_started",
message="Reprocessing of failed documents has been initiated in background",
track_id=track_id,
)
except Exception as e:
logger.error(f"Error initiating reprocessing of failed documents: {str(e)}")
logger.error(traceback.format_exc())
raise HTTPException(status_code=500, detail=str(e))
@router.post(
"/cancel_pipeline",
response_model=CancelPipelineResponse,
dependencies=[Depends(combined_auth)],
)
async def cancel_pipeline():
"""
Request cancellation of the currently running pipeline.
This endpoint sets a cancellation flag in the pipeline status. The pipeline will:
1. Check this flag at key processing points
2. Stop processing new documents
3. Cancel all running document processing tasks
4. Mark all PROCESSING documents as FAILED with reason "User cancelled"
The cancellation is graceful and ensures data consistency. Documents that have
completed processing will remain in PROCESSED status.
Returns:
CancelPipelineResponse: Response with status and message
- status="cancellation_requested": Cancellation flag has been set
- status="not_busy": Pipeline is not currently running
Raises:
HTTPException: If an error occurs while setting cancellation flag (500).
"""
try:
from lightrag.kg.shared_storage import (
get_namespace_data,
get_pipeline_status_lock,
)
pipeline_status = await get_namespace_data("pipeline_status")
pipeline_status_lock = get_pipeline_status_lock()
async with pipeline_status_lock:
if not pipeline_status.get("busy", False):
return CancelPipelineResponse(
status="not_busy",
message="Pipeline is not currently running. No cancellation needed.",
)
# Set cancellation flag
pipeline_status["cancellation_requested"] = True
cancel_msg = "Pipeline cancellation requested by user"
logger.info(cancel_msg)
pipeline_status["latest_message"] = cancel_msg
pipeline_status["history_messages"].append(cancel_msg)
return CancelPipelineResponse(
status="cancellation_requested",
message="Pipeline cancellation has been requested. Documents will be marked as FAILED.",
)
except Exception as e:
logger.error(f"Error requesting pipeline cancellation: {str(e)}")
logger.error(traceback.format_exc())
raise HTTPException(status_code=500, detail=str(e))
return router
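For reference, the two new maintenance endpoints added above can be exercised directly over HTTP. This is a hedged sketch: it assumes the router keeps its usual `/documents` prefix, the default port, and an API key configured as described in the README:
```bash
# Re-queue FAILED / PENDING / stalled PROCESSING documents for another pass
curl -X POST 'http://localhost:9621/documents/reprocess_failed' \
  -H 'X-API-Key: your-secure-api-key-here'

# Ask a busy pipeline to stop; in-flight documents are marked FAILED
curl -X POST 'http://localhost:9621/documents/cancel_pipeline' \
  -H 'X-API-Key: your-secure-api-key-here'
```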

View file

@ -5,7 +5,7 @@ This module contains all graph-related routes for the LightRAG API.
from typing import Optional, Dict, Any
import traceback
from fastapi import APIRouter, Depends, Query, HTTPException
from pydantic import BaseModel
from pydantic import BaseModel, Field
from lightrag.utils import logger
from ..utils_api import get_combined_auth_dependency
@ -25,6 +25,66 @@ class RelationUpdateRequest(BaseModel):
updated_data: Dict[str, Any]
class EntityMergeRequest(BaseModel):
entities_to_change: list[str] = Field(
...,
description="List of entity names to be merged and deleted. These are typically duplicate or misspelled entities.",
min_length=1,
examples=[["Elon Msk", "Ellon Musk"]],
)
entity_to_change_into: str = Field(
...,
description="Target entity name that will receive all relationships from the source entities. This entity will be preserved.",
min_length=1,
examples=["Elon Musk"],
)
class EntityCreateRequest(BaseModel):
entity_name: str = Field(
...,
description="Unique name for the new entity",
min_length=1,
examples=["Tesla"],
)
entity_data: Dict[str, Any] = Field(
...,
description="Dictionary containing entity properties. Common fields include 'description' and 'entity_type'.",
examples=[
{
"description": "Electric vehicle manufacturer",
"entity_type": "ORGANIZATION",
}
],
)
class RelationCreateRequest(BaseModel):
source_entity: str = Field(
...,
description="Name of the source entity. This entity must already exist in the knowledge graph.",
min_length=1,
examples=["Elon Musk"],
)
target_entity: str = Field(
...,
description="Name of the target entity. This entity must already exist in the knowledge graph.",
min_length=1,
examples=["Tesla"],
)
relation_data: Dict[str, Any] = Field(
...,
description="Dictionary containing relationship properties. Common fields include 'description', 'keywords', and 'weight'.",
examples=[
{
"description": "Elon Musk is the CEO of Tesla",
"keywords": "CEO, founder",
"weight": 1.0,
}
],
)
def create_graph_routes(rag, api_key: Optional[str] = None):
combined_auth = get_combined_auth_dependency(api_key)
@ -225,4 +285,247 @@ def create_graph_routes(rag, api_key: Optional[str] = None):
status_code=500, detail=f"Error updating relation: {str(e)}"
)
@router.post("/graph/entity/create", dependencies=[Depends(combined_auth)])
async def create_entity(request: EntityCreateRequest):
"""
Create a new entity in the knowledge graph
This endpoint creates a new entity node in the knowledge graph with the specified
properties. The system automatically generates vector embeddings for the entity
to enable semantic search and retrieval.
Request Body:
entity_name (str): Unique name identifier for the entity
entity_data (dict): Entity properties including:
- description (str): Textual description of the entity
- entity_type (str): Category/type of the entity (e.g., PERSON, ORGANIZATION, LOCATION)
- source_id (str): Related chunk_id from which the description originates
- Additional custom properties as needed
Response Schema:
{
"status": "success",
"message": "Entity 'Tesla' created successfully",
"data": {
"entity_name": "Tesla",
"description": "Electric vehicle manufacturer",
"entity_type": "ORGANIZATION",
"source_id": "chunk-123<SEP>chunk-456"
... (other entity properties)
}
}
HTTP Status Codes:
200: Entity created successfully
400: Invalid request (e.g., missing required fields, duplicate entity)
500: Internal server error
Example Request:
POST /graph/entity/create
{
"entity_name": "Tesla",
"entity_data": {
"description": "Electric vehicle manufacturer",
"entity_type": "ORGANIZATION"
}
}
"""
try:
# Use the proper acreate_entity method which handles:
# - Graph lock for concurrency
# - Vector embedding creation in entities_vdb
# - Metadata population and defaults
# - Index consistency via _edit_entity_done
result = await rag.acreate_entity(
entity_name=request.entity_name,
entity_data=request.entity_data,
)
return {
"status": "success",
"message": f"Entity '{request.entity_name}' created successfully",
"data": result,
}
except ValueError as ve:
logger.error(
f"Validation error creating entity '{request.entity_name}': {str(ve)}"
)
raise HTTPException(status_code=400, detail=str(ve))
except Exception as e:
logger.error(f"Error creating entity '{request.entity_name}': {str(e)}")
logger.error(traceback.format_exc())
raise HTTPException(
status_code=500, detail=f"Error creating entity: {str(e)}"
)
@router.post("/graph/relation/create", dependencies=[Depends(combined_auth)])
async def create_relation(request: RelationCreateRequest):
"""
Create a new relationship between two entities in the knowledge graph
This endpoint establishes an undirected relationship between two existing entities.
The provided source/target order is accepted for convenience, but the backend
stored edge is undirected and may be returned with the entities swapped.
Both entities must already exist in the knowledge graph. The system automatically
generates vector embeddings for the relationship to enable semantic search and graph traversal.
Prerequisites:
- Both source_entity and target_entity must exist in the knowledge graph
- Use /graph/entity/create to create entities first if they don't exist
Request Body:
source_entity (str): Name of the source entity (relationship origin)
target_entity (str): Name of the target entity (relationship destination)
relation_data (dict): Relationship properties including:
- description (str): Textual description of the relationship
- keywords (str): Comma-separated keywords describing the relationship type
- source_id (str): Related chunk_id from which the description originates
- weight (float): Relationship strength/importance (default: 1.0)
- Additional custom properties as needed
Response Schema:
{
"status": "success",
"message": "Relation created successfully between 'Elon Musk' and 'Tesla'",
"data": {
"src_id": "Elon Musk",
"tgt_id": "Tesla",
"description": "Elon Musk is the CEO of Tesla",
"keywords": "CEO, founder",
"source_id": "chunk-123<SEP>chunk-456"
"weight": 1.0,
... (other relationship properties)
}
}
HTTP Status Codes:
200: Relationship created successfully
400: Invalid request (e.g., missing entities, invalid data, duplicate relationship)
500: Internal server error
Example Request:
POST /graph/relation/create
{
"source_entity": "Elon Musk",
"target_entity": "Tesla",
"relation_data": {
"description": "Elon Musk is the CEO of Tesla",
"keywords": "CEO, founder",
"weight": 1.0
}
}
"""
try:
# Use the proper acreate_relation method which handles:
# - Graph lock for concurrency
# - Entity existence validation
# - Duplicate relation checks
# - Vector embedding creation in relationships_vdb
# - Index consistency via _edit_relation_done
result = await rag.acreate_relation(
source_entity=request.source_entity,
target_entity=request.target_entity,
relation_data=request.relation_data,
)
return {
"status": "success",
"message": f"Relation created successfully between '{request.source_entity}' and '{request.target_entity}'",
"data": result,
}
except ValueError as ve:
logger.error(
f"Validation error creating relation between '{request.source_entity}' and '{request.target_entity}': {str(ve)}"
)
raise HTTPException(status_code=400, detail=str(ve))
except Exception as e:
logger.error(
f"Error creating relation between '{request.source_entity}' and '{request.target_entity}': {str(e)}"
)
logger.error(traceback.format_exc())
raise HTTPException(
status_code=500, detail=f"Error creating relation: {str(e)}"
)
@router.post("/graph/entities/merge", dependencies=[Depends(combined_auth)])
async def merge_entities(request: EntityMergeRequest):
"""
Merge multiple entities into a single entity, preserving all relationships
This endpoint consolidates duplicate or misspelled entities while preserving the entire
graph structure. It's particularly useful for cleaning up knowledge graphs after document
processing or correcting entity name variations.
What the Merge Operation Does:
1. Deletes the specified source entities from the knowledge graph
2. Transfers all relationships from source entities to the target entity
3. Intelligently merges duplicate relationships (if multiple sources have the same relationship)
4. Updates vector embeddings for accurate retrieval and search
5. Preserves the complete graph structure and connectivity
6. Maintains relationship properties and metadata
Use Cases:
- Fixing spelling errors in entity names (e.g., "Elon Msk" -> "Elon Musk")
- Consolidating duplicate entities discovered after document processing
- Merging name variations (e.g., "NY", "New York", "New York City")
- Cleaning up the knowledge graph for better query performance
- Standardizing entity names across the knowledge base
Request Body:
entities_to_change (list[str]): List of entity names to be merged and deleted
entity_to_change_into (str): Target entity that will receive all relationships
Response Schema:
{
"status": "success",
"message": "Successfully merged 2 entities into 'Elon Musk'",
"data": {
"merged_entity": "Elon Musk",
"deleted_entities": ["Elon Msk", "Ellon Musk"],
"relationships_transferred": 15,
... (merge operation details)
}
}
HTTP Status Codes:
200: Entities merged successfully
400: Invalid request (e.g., empty entity list, target entity doesn't exist)
500: Internal server error
Example Request:
POST /graph/entities/merge
{
"entities_to_change": ["Elon Msk", "Ellon Musk"],
"entity_to_change_into": "Elon Musk"
}
Note:
- The target entity (entity_to_change_into) must exist in the knowledge graph
- Source entities will be permanently deleted after the merge
- This operation cannot be undone, so verify entity names before merging
"""
try:
result = await rag.amerge_entities(
source_entities=request.entities_to_change,
target_entity=request.entity_to_change_into,
)
return {
"status": "success",
"message": f"Successfully merged {len(request.entities_to_change)} entities into '{request.entity_to_change_into}'",
"data": result,
}
except ValueError as ve:
logger.error(
f"Validation error merging entities {request.entities_to_change} into '{request.entity_to_change_into}': {str(ve)}"
)
raise HTTPException(status_code=400, detail=str(ve))
except Exception as e:
logger.error(
f"Error merging entities {request.entities_to_change} into '{request.entity_to_change_into}': {str(e)}"
)
logger.error(traceback.format_exc())
raise HTTPException(
status_code=500, detail=f"Error merging entities: {str(e)}"
)
return router
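Similarly, the new graph endpoints can be driven over plain HTTP. The payloads below are taken from the docstring examples; the port, the absence of a route prefix, and the omission of the `X-API-Key` header are assumptions to adjust for your deployment:
```bash
# 1. Create the entity first (relations require both endpoints to exist already)
curl -X POST 'http://localhost:9621/graph/entity/create' \
  -H 'Content-Type: application/json' \
  -d '{"entity_name": "Tesla", "entity_data": {"description": "Electric vehicle manufacturer", "entity_type": "ORGANIZATION"}}'

# 2. Then link two existing entities with a relationship
curl -X POST 'http://localhost:9621/graph/relation/create' \
  -H 'Content-Type: application/json' \
  -d '{"source_entity": "Elon Musk", "target_entity": "Tesla", "relation_data": {"description": "Elon Musk is the CEO of Tesla", "keywords": "CEO, founder", "weight": 1.0}}'

# 3. Merge duplicate or misspelled entities into a canonical one
curl -X POST 'http://localhost:9621/graph/entities/merge' \
  -H 'Content-Type: application/json' \
  -d '{"entities_to_change": ["Elon Msk", "Ellon Musk"], "entity_to_change_into": "Elon Musk"}'
```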

View file

@ -483,6 +483,12 @@ class OllamaAPI:
if not messages:
raise HTTPException(status_code=400, detail="No messages provided")
# Validate that the last message is from a user
if messages[-1].role != "user":
raise HTTPException(
status_code=400, detail="Last message must be from user role"
)
# Get the last message as query and previous messages as history
query = messages[-1].content
# Convert OllamaMessage objects to dictionaries
@ -499,7 +505,7 @@ class OllamaAPI:
prompt_tokens = estimate_tokens(cleaned_query)
param_dict = {
"mode": mode,
"mode": mode.value,
"stream": request.stream,
"only_need_context": only_need_context,
"conversation_history": conversation_history,

View file

@ -73,6 +73,16 @@ class QueryRequest(BaseModel):
ge=1,
)
hl_keywords: list[str] = Field(
default_factory=list,
description="List of high-level keywords to prioritize in retrieval. Leave empty to use the LLM to generate the keywords.",
)
ll_keywords: list[str] = Field(
default_factory=list,
description="List of low-level keywords to refine retrieval focus. Leave empty to use the LLM to generate the keywords.",
)
conversation_history: Optional[List[Dict[str, Any]]] = Field(
default=None,
description="Stores past conversation history to maintain context. Format: [{'role': 'user/assistant', 'content': 'message'}].",
@ -88,6 +98,16 @@ class QueryRequest(BaseModel):
description="Enable reranking for retrieved text chunks. If True but no rerank model is configured, a warning will be issued. Default is True.",
)
include_references: Optional[bool] = Field(
default=True,
description="If True, includes reference list in responses. Affects /query and /query/stream endpoints. /query/data always includes references.",
)
stream: Optional[bool] = Field(
default=True,
description="If True, enables streaming output for real-time responses. Only affects /query/stream endpoint.",
)
@field_validator("query", mode="after")
@classmethod
def query_strip_after(cls, query: str) -> str:
@ -101,10 +121,10 @@ class QueryRequest(BaseModel):
if conversation_history is None:
return None
for msg in conversation_history:
if "role" not in msg or msg["role"] not in {"user", "assistant"}:
raise ValueError(
"Each message must have a 'role' key with value 'user' or 'assistant'."
)
if "role" not in msg:
raise ValueError("Each message must have a 'role' key.")
if not isinstance(msg["role"], str) or not msg["role"].strip():
raise ValueError("Each message 'role' must be a non-empty string.")
return conversation_history
def to_query_params(self, is_stream: bool) -> "QueryParam":
@ -122,6 +142,10 @@ class QueryResponse(BaseModel):
response: str = Field(
description="The generated response",
)
references: Optional[List[Dict[str, str]]] = Field(
default=None,
description="Reference list (Disabled when include_references=False, /query/data always includes references.)",
)
class QueryDataResponse(BaseModel):
@ -135,78 +159,473 @@ class QueryDataResponse(BaseModel):
)
class StreamChunkResponse(BaseModel):
"""Response model for streaming chunks in NDJSON format"""
references: Optional[List[Dict[str, str]]] = Field(
default=None,
description="Reference list (only in first chunk when include_references=True)",
)
response: Optional[str] = Field(
default=None, description="Response content chunk or complete response"
)
error: Optional[str] = Field(
default=None, description="Error message if processing fails"
)
def create_query_routes(rag, api_key: Optional[str] = None, top_k: int = 60):
combined_auth = get_combined_auth_dependency(api_key)
@router.post(
"/query", response_model=QueryResponse, dependencies=[Depends(combined_auth)]
"/query",
response_model=QueryResponse,
dependencies=[Depends(combined_auth)],
responses={
200: {
"description": "Successful RAG query response",
"content": {
"application/json": {
"schema": {
"type": "object",
"properties": {
"response": {
"type": "string",
"description": "The generated response from the RAG system",
},
"references": {
"type": "array",
"items": {
"type": "object",
"properties": {
"reference_id": {"type": "string"},
"file_path": {"type": "string"},
},
},
"description": "Reference list (only included when include_references=True)",
},
},
"required": ["response"],
},
"examples": {
"with_references": {
"summary": "Response with references",
"description": "Example response when include_references=True",
"value": {
"response": "Artificial Intelligence (AI) is a branch of computer science that aims to create intelligent machines capable of performing tasks that typically require human intelligence, such as learning, reasoning, and problem-solving.",
"references": [
{
"reference_id": "1",
"file_path": "/documents/ai_overview.pdf",
},
{
"reference_id": "2",
"file_path": "/documents/machine_learning.txt",
},
],
},
},
"without_references": {
"summary": "Response without references",
"description": "Example response when include_references=False",
"value": {
"response": "Artificial Intelligence (AI) is a branch of computer science that aims to create intelligent machines capable of performing tasks that typically require human intelligence, such as learning, reasoning, and problem-solving."
},
},
"different_modes": {
"summary": "Different query modes",
"description": "Examples of responses from different query modes",
"value": {
"local_mode": "Focuses on specific entities and their relationships",
"global_mode": "Provides broader context from relationship patterns",
"hybrid_mode": "Combines local and global approaches",
"naive_mode": "Simple vector similarity search",
"mix_mode": "Integrates knowledge graph and vector retrieval",
},
},
},
}
},
},
400: {
"description": "Bad Request - Invalid input parameters",
"content": {
"application/json": {
"schema": {
"type": "object",
"properties": {"detail": {"type": "string"}},
},
"example": {
"detail": "Query text must be at least 3 characters long"
},
}
},
},
500: {
"description": "Internal Server Error - Query processing failed",
"content": {
"application/json": {
"schema": {
"type": "object",
"properties": {"detail": {"type": "string"}},
},
"example": {
"detail": "Failed to process query: LLM service unavailable"
},
}
},
},
},
)
async def query_text(request: QueryRequest):
"""
Handle a POST request at the /query endpoint to process user queries using RAG capabilities.
Comprehensive RAG query endpoint with non-streaming response. Parameter "stream" is ignored.
This endpoint performs Retrieval-Augmented Generation (RAG) queries using various modes
to provide intelligent responses based on your knowledge base.
**Query Modes:**
- **local**: Focuses on specific entities and their direct relationships
- **global**: Analyzes broader patterns and relationships across the knowledge graph
- **hybrid**: Combines local and global approaches for comprehensive results
- **naive**: Simple vector similarity search without knowledge graph
- **mix**: Integrates knowledge graph retrieval with vector search (recommended)
- **bypass**: Direct LLM query without knowledge retrieval
The conversation_history parameter is sent to the LLM only; it does not affect retrieval results.
**Usage Examples:**
Basic query:
```json
{
"query": "What is machine learning?",
"mode": "mix"
}
```
Bypass initial LLM call by providing high-level and low-level keywords:
```json
{
"query": "What is Retrieval-Augmented-Generation?",
"hl_keywords": ["machine learning", "information retrieval", "natural language processing"],
"ll_keywords": ["retrieval augmented generation", "RAG", "knowledge base"],
"mode": "mix"
}
```
Advanced query with references:
```json
{
"query": "Explain neural networks",
"mode": "hybrid",
"include_references": true,
"response_type": "Multiple Paragraphs",
"top_k": 10
}
```
Conversation with history:
```json
{
"query": "Can you give me more details?",
"conversation_history": [
{"role": "user", "content": "What is AI?"},
{"role": "assistant", "content": "AI is artificial intelligence..."}
]
}
```
Args:
request (QueryRequest): The request object containing query parameters:
- **query**: The question or prompt to process (min 3 characters)
- **mode**: Query strategy - "mix" recommended for best results
- **include_references**: Whether to include source citations
- **response_type**: Format preference (e.g., "Multiple Paragraphs")
- **top_k**: Number of top entities/relations to retrieve
- **conversation_history**: Previous dialogue context
- **max_total_tokens**: Token budget for the entire response
Parameters:
request (QueryRequest): The request object containing the query parameters.
Returns:
QueryResponse: A Pydantic model containing the result of the query processing.
If a string is returned (e.g., cache hit), it's directly returned.
Otherwise, an async generator may be used to build the response.
QueryResponse: JSON response containing:
- **response**: The generated answer to your query
- **references**: Source citations (if include_references=True)
Raises:
HTTPException: Raised when an error occurs during the request handling process,
with status code 500 and detail containing the exception message.
HTTPException:
- 400: Invalid input parameters (e.g., query too short)
- 500: Internal processing error (e.g., LLM service unavailable)
"""
try:
param = request.to_query_params(False)
response = await rag.aquery(request.query, param=param)
param = request.to_query_params(
False
) # Ensure stream=False for non-streaming endpoint
# Force stream=False for /query endpoint regardless of include_references setting
param.stream = False
# If response is a string (e.g. cache hit), return directly
if isinstance(response, str):
return QueryResponse(response=response)
# Unified approach: always use aquery_llm for both cases
result = await rag.aquery_llm(request.query, param=param)
if isinstance(response, dict):
result = json.dumps(response, indent=2)
return QueryResponse(response=result)
# Extract LLM response and references from unified result
llm_response = result.get("llm_response", {})
references = result.get("data", {}).get("references", [])
# Get the non-streaming response content
response_content = llm_response.get("content", "")
if not response_content:
response_content = "No relevant context found for the query."
# Return response with or without references based on request
if request.include_references:
return QueryResponse(response=response_content, references=references)
else:
return QueryResponse(response=str(response))
return QueryResponse(response=response_content, references=None)
except Exception as e:
trace_exception(e)
raise HTTPException(status_code=500, detail=str(e))
@router.post("/query/stream", dependencies=[Depends(combined_auth)])
@router.post(
"/query/stream",
dependencies=[Depends(combined_auth)],
responses={
200: {
"description": "Flexible RAG query response - format depends on stream parameter",
"content": {
"application/x-ndjson": {
"schema": {
"type": "string",
"format": "ndjson",
"description": "Newline-delimited JSON (NDJSON) format used for both streaming and non-streaming responses. For streaming: multiple lines with separate JSON objects. For non-streaming: single line with complete JSON object.",
"example": '{"references": [{"reference_id": "1", "file_path": "/documents/ai.pdf"}]}\n{"response": "Artificial Intelligence is"}\n{"response": " a field of computer science"}\n{"response": " that focuses on creating intelligent machines."}',
},
"examples": {
"streaming_with_references": {
"summary": "Streaming mode with references (stream=true)",
"description": "Multiple NDJSON lines when stream=True and include_references=True. First line contains references, subsequent lines contain response chunks.",
"value": '{"references": [{"reference_id": "1", "file_path": "/documents/ai_overview.pdf"}, {"reference_id": "2", "file_path": "/documents/ml_basics.txt"}]}\n{"response": "Artificial Intelligence (AI) is a branch of computer science"}\n{"response": " that aims to create intelligent machines capable of performing"}\n{"response": " tasks that typically require human intelligence, such as learning,"}\n{"response": " reasoning, and problem-solving."}',
},
"streaming_without_references": {
"summary": "Streaming mode without references (stream=true)",
"description": "Multiple NDJSON lines when stream=True and include_references=False. Only response chunks are sent.",
"value": '{"response": "Machine learning is a subset of artificial intelligence"}\n{"response": " that enables computers to learn and improve from experience"}\n{"response": " without being explicitly programmed for every task."}',
},
"non_streaming_with_references": {
"summary": "Non-streaming mode with references (stream=false)",
"description": "Single NDJSON line when stream=False and include_references=True. Complete response with references in one message.",
"value": '{"references": [{"reference_id": "1", "file_path": "/documents/neural_networks.pdf"}], "response": "Neural networks are computational models inspired by biological neural networks that consist of interconnected nodes (neurons) organized in layers. They are fundamental to deep learning and can learn complex patterns from data through training processes."}',
},
"non_streaming_without_references": {
"summary": "Non-streaming mode without references (stream=false)",
"description": "Single NDJSON line when stream=False and include_references=False. Complete response only.",
"value": '{"response": "Deep learning is a subset of machine learning that uses neural networks with multiple layers (hence deep) to model and understand complex patterns in data. It has revolutionized fields like computer vision, natural language processing, and speech recognition."}',
},
"error_response": {
"summary": "Error during streaming",
"description": "Error handling in NDJSON format when an error occurs during processing.",
"value": '{"references": [{"reference_id": "1", "file_path": "/documents/ai.pdf"}]}\n{"response": "Artificial Intelligence is"}\n{"error": "LLM service temporarily unavailable"}',
},
},
}
},
},
400: {
"description": "Bad Request - Invalid input parameters",
"content": {
"application/json": {
"schema": {
"type": "object",
"properties": {"detail": {"type": "string"}},
},
"example": {
"detail": "Query text must be at least 3 characters long"
},
}
},
},
500: {
"description": "Internal Server Error - Query processing failed",
"content": {
"application/json": {
"schema": {
"type": "object",
"properties": {"detail": {"type": "string"}},
},
"example": {
"detail": "Failed to process streaming query: Knowledge graph unavailable"
},
}
},
},
},
)
async def query_text_stream(request: QueryRequest):
"""
Advanced RAG query endpoint that performs retrieval-augmented generation with a flexible streaming response.
This endpoint provides the most flexible querying experience, supporting both real-time streaming
and complete response delivery based on your integration needs.
**Response Modes:**
- Real-time response delivery as content is generated
- NDJSON format: each line is a separate JSON object
- First line: `{"references": [...]}` (if include_references=True)
- Subsequent lines: `{"response": "content chunk"}`
- Error handling: `{"error": "error message"}`
> If the stream parameter is False, or the query hits the LLM cache, the complete response is delivered in a single streaming message.
**Response Format Details**
- **Content-Type**: `application/x-ndjson` (Newline-Delimited JSON)
- **Structure**: Each line is an independent, valid JSON object
- **Parsing**: Process line-by-line, each line is self-contained
- **Headers**: Includes cache control and connection management
**Query Modes (same as /query endpoint)**
- **local**: Entity-focused retrieval with direct relationships
- **global**: Pattern analysis across the knowledge graph
- **hybrid**: Combined local and global strategies
- **naive**: Vector similarity search only
- **mix**: Integrated knowledge graph + vector retrieval (recommended)
- **bypass**: Direct LLM query without knowledge retrieval
The conversation_history parameter is sent to the LLM only; it does not affect retrieval results.
**Usage Examples**
Real-time streaming query:
```json
{
"query": "Explain machine learning algorithms",
"mode": "mix",
"stream": true,
"include_references": true
}
```
Bypass initial LLM call by providing high-level and low-level keywords:
```json
{
"query": "What is Retrieval-Augmented-Generation?",
"hl_keywords": ["machine learning", "information retrieval", "natural language processing"],
"ll_keywords": ["retrieval augmented generation", "RAG", "knowledge base"],
"mode": "mix"
}
```
Complete response query:
```json
{
"query": "What is deep learning?",
"mode": "hybrid",
"stream": false,
"response_type": "Multiple Paragraphs"
}
```
Conversation with context:
```json
{
"query": "Can you elaborate on that?",
"stream": true,
"conversation_history": [
{"role": "user", "content": "What is neural network?"},
{"role": "assistant", "content": "A neural network is..."}
]
}
```
**Response Processing:**
```python
async for line in response.iter_lines():
data = json.loads(line)
if "references" in data:
# Handle references (first message)
references = data["references"]
if "response" in data:
# Handle content chunk
content_chunk = data["response"]
if "error" in data:
# Handle error
error_message = data["error"]
```
**Error Handling:**
- Streaming errors are delivered as `{"error": "message"}` lines
- Non-streaming errors raise HTTP exceptions
- Partial responses may be delivered before errors in streaming mode
- Always check for error objects when processing streaming responses
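**End-to-End Streaming Client (illustrative sketch)**
The snippet below shows one way to consume this endpoint with httpx, following the NDJSON conventions above; the base URL is an assumption for a local deployment and authentication is omitted.
```python
import asyncio
import json

import httpx


async def stream_query() -> None:
    # Hypothetical local deployment; adjust the URL to match your server.
    payload = {
        "query": "Explain machine learning algorithms",
        "mode": "mix",
        "stream": True,
        "include_references": True,
    }
    async with httpx.AsyncClient(timeout=None) as client:
        async with client.stream(
            "POST", "http://localhost:9621/query/stream", json=payload
        ) as resp:
            resp.raise_for_status()
            async for line in resp.aiter_lines():
                if not line:
                    continue
                data = json.loads(line)
                if "references" in data:
                    print("references:", data["references"])
                if "response" in data:
                    print(data["response"], end="", flush=True)
                if "error" in data:
                    raise RuntimeError(data["error"])


asyncio.run(stream_query())
```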
Args:
request (QueryRequest): The request object containing query parameters:
- **query**: The question or prompt to process (min 3 characters)
- **mode**: Query strategy - "mix" recommended for best results
- **stream**: Enable streaming (True) or complete response (False)
- **include_references**: Whether to include source citations
- **response_type**: Format preference (e.g., "Multiple Paragraphs")
- **top_k**: Number of top entities/relations to retrieve
- **conversation_history**: Previous dialogue context for multi-turn conversations
- **max_total_tokens**: Token budget for the entire response
Returns:
StreamingResponse: NDJSON streaming response containing:
- **Streaming mode**: Multiple JSON objects, one per line
- References object (if requested): `{"references": [...]}`
- Content chunks: `{"response": "chunk content"}`
- Error objects: `{"error": "error message"}`
- **Non-streaming mode**: Single JSON object
- Complete response: `{"references": [...], "response": "complete content"}`
Raises:
HTTPException:
- 400: Invalid input parameters (e.g., query too short, invalid mode)
- 500: Internal processing error (e.g., LLM service unavailable)
Note:
This endpoint is ideal for applications requiring flexible response delivery.
Use streaming mode for real-time interfaces and non-streaming for batch processing.
"""
try:
    # Use the stream parameter from the request, defaulting to True if not specified
    stream_mode = request.stream if request.stream is not None else True
    param = request.to_query_params(stream_mode)
    from fastapi.responses import StreamingResponse

    # Unified approach: always use aquery_llm for all cases
    result = await rag.aquery_llm(request.query, param=param)

    async def stream_generator():
        # Extract references and LLM response from unified result
        references = result.get("data", {}).get("references", [])
        llm_response = result.get("llm_response", {})

        if llm_response.get("is_streaming"):
            # Streaming mode: send references first, then stream response chunks
            if request.include_references:
                yield f"{json.dumps({'references': references})}\n"
            response_stream = llm_response.get("response_iterator")
            if response_stream:
                try:
                    async for chunk in response_stream:
                        if chunk:  # Only send non-empty content
                            yield f"{json.dumps({'response': chunk})}\n"
                except Exception as e:
                    logging.error(f"Streaming error: {str(e)}")
                    yield f"{json.dumps({'error': str(e)})}\n"
        else:
            # Non-streaming mode: send complete response in one message
            response_content = llm_response.get("content", "")
            if not response_content:
                response_content = "No relevant context found for the query."
            # Create complete response object
            complete_response = {"response": response_content}
            if request.include_references:
                complete_response["references"] = references
            yield f"{json.dumps(complete_response)}\n"
return StreamingResponse(
stream_generator(),
@@ -226,26 +645,400 @@ def create_query_routes(rag, api_key: Optional[str] = None, top_k: int = 60):
"/query/data",
response_model=QueryDataResponse,
dependencies=[Depends(combined_auth)],
responses={
200: {
"description": "Successful data retrieval response with structured RAG data",
"content": {
"application/json": {
"schema": {
"type": "object",
"properties": {
"status": {
"type": "string",
"enum": ["success", "failure"],
"description": "Query execution status",
},
"message": {
"type": "string",
"description": "Status message describing the result",
},
"data": {
"type": "object",
"properties": {
"entities": {
"type": "array",
"items": {
"type": "object",
"properties": {
"entity_name": {"type": "string"},
"entity_type": {"type": "string"},
"description": {"type": "string"},
"source_id": {"type": "string"},
"file_path": {"type": "string"},
"reference_id": {"type": "string"},
},
},
"description": "Retrieved entities from knowledge graph",
},
"relationships": {
"type": "array",
"items": {
"type": "object",
"properties": {
"src_id": {"type": "string"},
"tgt_id": {"type": "string"},
"description": {"type": "string"},
"keywords": {"type": "string"},
"weight": {"type": "number"},
"source_id": {"type": "string"},
"file_path": {"type": "string"},
"reference_id": {"type": "string"},
},
},
"description": "Retrieved relationships from knowledge graph",
},
"chunks": {
"type": "array",
"items": {
"type": "object",
"properties": {
"content": {"type": "string"},
"file_path": {"type": "string"},
"chunk_id": {"type": "string"},
"reference_id": {"type": "string"},
},
},
"description": "Retrieved text chunks from vector database",
},
"references": {
"type": "array",
"items": {
"type": "object",
"properties": {
"reference_id": {"type": "string"},
"file_path": {"type": "string"},
},
},
"description": "Reference list for citation purposes",
},
},
"description": "Structured retrieval data containing entities, relationships, chunks, and references",
},
"metadata": {
"type": "object",
"properties": {
"query_mode": {"type": "string"},
"keywords": {
"type": "object",
"properties": {
"high_level": {
"type": "array",
"items": {"type": "string"},
},
"low_level": {
"type": "array",
"items": {"type": "string"},
},
},
},
"processing_info": {
"type": "object",
"properties": {
"total_entities_found": {
"type": "integer"
},
"total_relations_found": {
"type": "integer"
},
"entities_after_truncation": {
"type": "integer"
},
"relations_after_truncation": {
"type": "integer"
},
"final_chunks_count": {
"type": "integer"
},
},
},
},
"description": "Query metadata including mode, keywords, and processing information",
},
},
"required": ["status", "message", "data", "metadata"],
},
"examples": {
"successful_local_mode": {
"summary": "Local mode data retrieval",
"description": "Example of structured data from local mode query focusing on specific entities",
"value": {
"status": "success",
"message": "Query executed successfully",
"data": {
"entities": [
{
"entity_name": "Neural Networks",
"entity_type": "CONCEPT",
"description": "Computational models inspired by biological neural networks",
"source_id": "chunk-123",
"file_path": "/documents/ai_basics.pdf",
"reference_id": "1",
}
],
"relationships": [
{
"src_id": "Neural Networks",
"tgt_id": "Machine Learning",
"description": "Neural networks are a subset of machine learning algorithms",
"keywords": "subset, algorithm, learning",
"weight": 0.85,
"source_id": "chunk-123",
"file_path": "/documents/ai_basics.pdf",
"reference_id": "1",
}
],
"chunks": [
{
"content": "Neural networks are computational models that mimic the way biological neural networks work...",
"file_path": "/documents/ai_basics.pdf",
"chunk_id": "chunk-123",
"reference_id": "1",
}
],
"references": [
{
"reference_id": "1",
"file_path": "/documents/ai_basics.pdf",
}
],
},
"metadata": {
"query_mode": "local",
"keywords": {
"high_level": ["neural", "networks"],
"low_level": [
"computation",
"model",
"algorithm",
],
},
"processing_info": {
"total_entities_found": 5,
"total_relations_found": 3,
"entities_after_truncation": 1,
"relations_after_truncation": 1,
"final_chunks_count": 1,
},
},
},
},
"global_mode": {
"summary": "Global mode data retrieval",
"description": "Example of structured data from global mode query analyzing broader patterns",
"value": {
"status": "success",
"message": "Query executed successfully",
"data": {
"entities": [],
"relationships": [
{
"src_id": "Artificial Intelligence",
"tgt_id": "Machine Learning",
"description": "AI encompasses machine learning as a core component",
"keywords": "encompasses, component, field",
"weight": 0.92,
"source_id": "chunk-456",
"file_path": "/documents/ai_overview.pdf",
"reference_id": "2",
}
],
"chunks": [],
"references": [
{
"reference_id": "2",
"file_path": "/documents/ai_overview.pdf",
}
],
},
"metadata": {
"query_mode": "global",
"keywords": {
"high_level": [
"artificial",
"intelligence",
"overview",
],
"low_level": [],
},
},
},
},
"naive_mode": {
"summary": "Naive mode data retrieval",
"description": "Example of structured data from naive mode using only vector search",
"value": {
"status": "success",
"message": "Query executed successfully",
"data": {
"entities": [],
"relationships": [],
"chunks": [
{
"content": "Deep learning is a subset of machine learning that uses neural networks with multiple layers...",
"file_path": "/documents/deep_learning.pdf",
"chunk_id": "chunk-789",
"reference_id": "3",
}
],
"references": [
{
"reference_id": "3",
"file_path": "/documents/deep_learning.pdf",
}
],
},
"metadata": {
"query_mode": "naive",
"keywords": {"high_level": [], "low_level": []},
},
},
},
},
}
},
},
400: {
"description": "Bad Request - Invalid input parameters",
"content": {
"application/json": {
"schema": {
"type": "object",
"properties": {"detail": {"type": "string"}},
},
"example": {
"detail": "Query text must be at least 3 characters long"
},
}
},
},
500: {
"description": "Internal Server Error - Data retrieval failed",
"content": {
"application/json": {
"schema": {
"type": "object",
"properties": {"detail": {"type": "string"}},
},
"example": {
"detail": "Failed to retrieve data: Knowledge graph unavailable"
},
}
},
},
},
)
async def query_data(request: QueryRequest):
"""
Retrieve structured data without LLM generation.
This endpoint returns raw retrieval results, including the entities, relationships,
and text chunks that would be used for RAG, but without generating a final response.
All parameters are compatible with the regular /query endpoint. It is well suited for:
- **Data Analysis**: Examine what information would be used for RAG
- **System Integration**: Get structured data for custom processing
- **Debugging**: Understand retrieval behavior and quality
- **Research**: Analyze knowledge graph structure and relationships
**Key Features:**
- No LLM generation - pure data retrieval
- Complete structured output with entities, relationships, and chunks
- Always includes references for citation
- Detailed metadata about processing and keywords
- Compatible with all query modes and parameters
**Query Mode Behaviors:**
- **local**: Returns entities and their direct relationships + related chunks
- **global**: Returns relationship patterns across the knowledge graph
- **hybrid**: Combines local and global retrieval strategies
- **naive**: Returns only vector-retrieved text chunks (no knowledge graph)
- **mix**: Integrates knowledge graph data with vector-retrieved chunks
- **bypass**: Returns empty data arrays (used for direct LLM queries)
**Data Structure:**
- **entities**: Knowledge graph entities with descriptions and metadata
- **relationships**: Connections between entities with weights and descriptions
- **chunks**: Text segments from documents with source information
- **references**: Citation information mapping reference IDs to file paths
- **metadata**: Processing information, keywords, and query statistics
**Usage Examples:**
Analyze entity relationships:
```json
{
"query": "machine learning algorithms",
"mode": "local",
"top_k": 10
}
```
Explore global patterns:
```json
{
"query": "artificial intelligence trends",
"mode": "global",
"max_relation_tokens": 2000
}
```
Vector similarity search:
```json
{
"query": "neural network architectures",
"mode": "naive",
"chunk_top_k": 5
}
```
Bypass initial LLM call by providing high-level and low-level keywords:
```json
{
"query": "What is Retrieval-Augmented-Generation?",
"hl_keywords": ["machine learning", "information retrieval", "natural language processing"],
"ll_keywords": ["retrieval augmented generation", "RAG", "knowledge base"],
"mode": "mix"
}
```
**Response Analysis:**
- **Empty arrays**: Normal for certain modes (e.g., naive mode has no entities/relationships)
- **Processing info**: Shows retrieval statistics and token usage
- **Keywords**: High-level and low-level keywords extracted from query
- **Reference mapping**: Links all data back to source documents
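**Post-Processing Example (illustrative)**
A short sketch of retrieving and inspecting the structured payload with httpx; the base URL is an assumption for a local deployment and the printed fields mirror the response schema above.
```python
import httpx

# Hypothetical local deployment; adjust the URL to match your server.
payload = {"query": "machine learning algorithms", "mode": "local", "top_k": 10}
resp = httpx.post("http://localhost:9621/query/data", json=payload, timeout=120)
resp.raise_for_status()
result = resp.json()

data = result.get("data", {})
print("entities:", len(data.get("entities", [])))
print("relationships:", len(data.get("relationships", [])))
for chunk in data.get("chunks", []):
    print(chunk["reference_id"], chunk["file_path"])
print("keywords:", result.get("metadata", {}).get("keywords"))
```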
Args:
request (QueryRequest): The request object containing query parameters:
- **query**: The search query to analyze (min 3 characters)
- **mode**: Retrieval strategy affecting data types returned
- **top_k**: Number of top entities/relationships to retrieve
- **chunk_top_k**: Number of text chunks to retrieve
- **max_entity_tokens**: Token limit for entity context
- **max_relation_tokens**: Token limit for relationship context
- **max_total_tokens**: Overall token budget for retrieval
Returns:
QueryDataResponse: Structured JSON response containing:
- **status**: "success" or "failure"
- **message**: Human-readable status description
- **data**: Complete retrieval results with entities, relationships, chunks, references
- **metadata**: Query processing information and statistics
Raises:
HTTPException:
- 400: Invalid input parameters (e.g., query too short, invalid mode)
- 500: Internal processing error (e.g., knowledge graph unavailable)
Note:
This endpoint always includes references regardless of the include_references parameter,
as structured data analysis typically requires source attribution.
"""
try:
param = request.to_query_params(False) # No streaming for data endpoint
