clssck abb44eccb1 feat(lightrag): improve entity extraction prompts and rerank chunking
Enhance entity extraction with better structured prompts:
- Reorganize prompt format for improved clarity and consistency
- Add XML-style formatting tags for better LLM parsing
- Include language parameter in keywords extraction cache key
- Fix language parameter usage in keywords_extraction prompt

Improve rerank module with chunking fixes:
- Fix top_n behavior to limit documents instead of chunks
- Add Cohere reranker support with proper chunking
- Improve error handling for rerank API responses

Update operate.py:
- Better entity extraction parsing and validation
- Improved cache key generation for multilingual support
2025-12-12 16:45:14 +01:00
.clinerules Add testing workflow guidelines to basic development rules 2025-11-18 11:54:19 +08:00
.github chore: sync with upstream (#4) 2025-12-03 13:16:28 +01:00
assets
docker/postgres-age-vector feat: add automatic entity resolution with 3-layer matching 2025-11-27 15:35:02 +01:00
docs refactor: consolidate to PostgreSQL-only backend and modernize stack 2025-12-12 16:28:49 +01:00
examples refactor: consolidate to PostgreSQL-only backend and modernize stack 2025-12-12 16:28:49 +01:00
lightrag feat(lightrag): improve entity extraction prompts and rerank chunking 2025-12-12 16:45:14 +01:00
lightrag_webui refactor: consolidate to PostgreSQL-only backend and modernize stack 2025-12-12 16:28:49 +01:00
optimization_results refactor: consolidate to PostgreSQL-only backend and modernize stack 2025-12-12 16:28:49 +01:00
reproduce test(lightrag,api): add comprehensive test coverage and S3 support 2025-12-05 23:13:39 +01:00
tests refactor: consolidate to PostgreSQL-only backend and modernize stack 2025-12-12 16:28:49 +01:00
.dockerignore test: fix env handling, add type hints, improve docs 2025-12-03 15:02:11 +01:00
.gitattributes
.gitignore test(lightrag,examples,api): comprehensive ruff formatting and type hints 2025-12-05 15:17:06 +01:00
.pre-commit-config.yaml
.python-version refactor: consolidate to PostgreSQL-only backend and modernize stack 2025-12-12 16:28:49 +01:00
AGENTS.md refactor: consolidate to PostgreSQL-only backend and modernize stack 2025-12-12 16:28:49 +01:00
CLAUDE.md refactor: consolidate to PostgreSQL-only backend and modernize stack 2025-12-12 16:28:49 +01:00
config.ini.example refactor: consolidate to PostgreSQL-only backend and modernize stack 2025-12-12 16:28:49 +01:00
docker-compose.test.yml refactor: consolidate to PostgreSQL-only backend and modernize stack 2025-12-12 16:28:49 +01:00
Dockerfile refactor: consolidate to PostgreSQL-only backend and modernize stack 2025-12-12 16:28:49 +01:00
env.example refactor: consolidate to PostgreSQL-only backend and modernize stack 2025-12-12 16:28:49 +01:00
LICENSE
lightrag.service.example Refactor systemd service config to use environment variables 2025-10-29 20:14:17 +08:00
Makefile refactor: consolidate to PostgreSQL-only backend and modernize stack 2025-12-12 16:28:49 +01:00
MANIFEST.in Include static files in package distribution 2025-10-30 10:50:28 +08:00
monitor_pipeline.py refactor: consolidate to PostgreSQL-only backend and modernize stack 2025-12-12 16:28:49 +01:00
pyproject.toml refactor: consolidate to PostgreSQL-only backend and modernize stack 2025-12-12 16:28:49 +01:00
pyrightconfig.json refactor: remove legacy storage implementations and k8s deployment 2025-12-09 14:02:00 +01:00
README.md refactor: consolidate to PostgreSQL-only backend and modernize stack 2025-12-12 16:28:49 +01:00
ruff.toml test(lightrag,examples,api): comprehensive ruff formatting and type hints 2025-12-05 15:17:06 +01:00
SECURITY.md Fix linting 2025-05-12 23:27:41 +08:00
setup.py Refactor setup.py to utilize pyproject.toml for project installation. 2025-07-05 11:19:00 +08:00
ty.toml refactor: consolidate to PostgreSQL-only backend and modernize stack 2025-12-12 16:28:49 +01:00
upload_pdfs.py refactor: consolidate to PostgreSQL-only backend and modernize stack 2025-12-12 16:28:49 +01:00
uv.lock refactor: consolidate to PostgreSQL-only backend and modernize stack 2025-12-12 16:28:49 +01:00


🚀 LightRAG: Specialized Production Fork

A production-ready fork of LightRAG featuring S3 storage integration, a modernized Web UI, and a robust API.

🔱 About This Fork

This repository is a specialized fork of LightRAG, designed to bridge the gap between research and production. While preserving the core "Simple and Fast" philosophy, we have added critical infrastructure components:

  • ☁️ S3 Storage Integration: Native support for S3-compatible object storage (AWS, MinIO, Cloudflare R2) for scalable document and artifact management.
  • 🖥️ Modern Web UI: A completely redesigned interface featuring:
    • S3 Browser: Integrated file management system.
    • File Viewers: Built-in PDF and text viewers.
    • Enhanced Layout: Resizable panes and improved UX.
  • 🔌 Robust API: Expanded REST endpoints supporting multipart uploads, bulk operations, and advanced search parameters.
  • 🛡️ Code Quality: Comprehensive type hinting (Pyright strict), Ruff formatting, and extensive test coverage for critical paths.

📖 Introduction to LightRAG

LightRAG incorporates graph structures into text indexing and retrieval. Its dual-level retrieval system covers both low-level lookups of specific entities and high-level queries about broader topics.
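
As a rough illustration of the dual-level idea, the toy sketch below keeps a tiny in-memory graph and answers a query at two levels: a low-level lookup that matches entity names, and a high-level lookup that matches broader themes attached to relations. The graph data, the function names, and the substring matching are all assumptions made for illustration; this is not LightRAG's implementation, which relies on LLM-extracted keywords and vector search rather than string matching.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Relation:
    source: str
    target: str
    themes: frozenset  # broad topics this relation contributes to


# Hypothetical toy graph; a real index would be built by an LLM extraction step.
ENTITIES = {
    "pgvector": "PostgreSQL extension for vector similarity search",
    "PostgreSQL": "Relational database",
    "MinIO": "S3-compatible object store",
    "S3": "Object storage protocol",
}
RELATIONS = [
    Relation("pgvector", "PostgreSQL", frozenset({"storage", "vector search"})),
    Relation("MinIO", "S3", frozenset({"storage"})),
]


def low_level(query: str) -> list:
    """Return entities whose name appears in the query (specific facts)."""
    q = query.lower()
    return sorted(name for name in ENTITIES if name.lower() in q)


def high_level(query: str) -> list:
    """Return relations with a theme that appears in the query (broad topics)."""
    q = query.lower()
    return [r for r in RELATIONS if any(theme in q for theme in r.themes)]


def dual_level(query: str) -> dict:
    """Combine both levels, loosely mirroring a hybrid-style query."""
    return {"entities": low_level(query), "relations": high_level(query)}
```

A query such as "How does pgvector relate to storage?" hits one entity at the low level and both relations at the high level, which is the kind of coverage the dual-level design aims for.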

Algorithm Flowchart

Figure 1: LightRAG Indexing Flowchart (Source)

Quick Start

1. Installation

This project uses uv for fast and reliable package management.

Option A: Install from PyPI

uv pip install "lightrag-hku[api]"

Option B: Install from Source (Recommended for this Fork)

git clone https://github.com/YOUR_GITHUB_USERNAME/LightRAG.git
cd LightRAG
uv sync --extra api
source .venv/bin/activate

2. Running the Server (UI + API)

The easiest way to experience the enhancements in this fork is via the LightRAG Server.

  1. Configure Environment:

    cp env.example .env
    # Edit .env to add your API keys (OpenAI/Azure/etc.) and S3 credentials
    
  2. Start the Server:

    lightrag-server
    
  3. Access the UI: Open http://localhost:9600 to view the Knowledge Graph, upload files via the S3 browser, and perform queries.

3. Python API Example

You can also use LightRAG directly in your Python code:

import os
import asyncio
from lightrag import LightRAG, QueryParam
from lightrag.llm.openai import gpt_4o_mini_complete, openai_embed

WORKING_DIR = "./rag_storage"
# Create the working directory if it does not already exist
os.makedirs(WORKING_DIR, exist_ok=True)

async def main():
    # Initialize LightRAG
    rag = LightRAG(
        working_dir=WORKING_DIR,
        embedding_func=openai_embed,
        llm_model_func=gpt_4o_mini_complete,
    )
    await rag.initialize_storages()

    # Insert Document
    await rag.ainsert("LightRAG is a retrieval-augmented generation framework.")

    # Query
    print(await rag.aquery(
        "What is LightRAG?",
        param=QueryParam(mode="hybrid")
    ))

if __name__ == "__main__":
    asyncio.run(main())

📦 Features & Architecture

Storage Backends

Recommended production stack: PostgreSQL + pgvector + AGE-compatible graph, with S3 (or local) object storage. Other supported backends remain available where implemented (e.g., JsonKVStorage/RedisKVStorage, Neo4j/Mongo/Qdrant variants in the codebase); check env.example for the current list and maturity notes.

Type             Implementations (this fork)
---------------  ------------------------------------------------------------------------------
KV Storage       PGKVStorage (recommended); JsonKVStorage / RedisKVStorage (legacy/optional)
Vector Storage   PGVectorStorage (pgvector)
Graph Storage    PGGraphStorage (AGE/PG)
Object Storage   S3Storage, LocalFileStorage
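
One way to select the recommended stack from Python is sketched below. The class names come from the table above. The keyword names `kv_storage`, `vector_storage`, and `graph_storage` follow upstream LightRAG's constructor and are assumed, not confirmed, to be unchanged in this fork. The helper `storage_kwargs` is hypothetical and exists only for this example.

```python
from typing import Dict, Optional

# Class names are taken from the storage table; the keyword names are an
# assumption based on upstream LightRAG's constructor.
RECOMMENDED_STORAGE: Dict[str, str] = {
    "kv_storage": "PGKVStorage",
    "vector_storage": "PGVectorStorage",
    "graph_storage": "PGGraphStorage",
}


def storage_kwargs(overrides: Optional[Dict[str, str]] = None) -> Dict[str, str]:
    """Return the recommended storage selection, with optional overrides."""
    overrides = overrides or {}
    unknown = set(overrides) - set(RECOMMENDED_STORAGE)
    if unknown:
        raise ValueError(f"Unknown storage slots: {sorted(unknown)}")
    return {**RECOMMENDED_STORAGE, **overrides}


# Intended usage (needs the package installed and PostgreSQL reachable):
#   rag = LightRAG(working_dir=WORKING_DIR, **storage_kwargs(), ...)
```

Rejecting unknown keys early keeps a typo such as `vector_store` from silently falling back to a default backend.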

Specialized API Routes

This fork exposes additional endpoints:

  • POST /documents/upload: Multipart file upload (supports PDF, TXT, MD).
  • GET /storage/list: List files in S3/Local storage.
  • GET /storage/content: Retrieve file content.
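
The read-only routes above can be called with nothing but the standard library. The sketch below is a minimal client under a few assumptions: the server listens on the Quick Start address, responses are JSON, and no authentication header is required. The helper names and the `prefix` query parameter are hypothetical; check the server's API documentation for the parameters it actually accepts.

```python
import json
import urllib.parse
import urllib.request

BASE_URL = "http://localhost:9600"  # address used in the Quick Start section


def build_url(route: str, **params: str) -> str:
    """Join the base URL, a route, and optional query parameters."""
    url = BASE_URL.rstrip("/") + "/" + route.lstrip("/")
    if params:
        url += "?" + urllib.parse.urlencode(params)
    return url


def get_json(route: str, **params: str):
    """GET a route and decode the body as JSON (assumes a JSON response)."""
    with urllib.request.urlopen(build_url(route, **params), timeout=10) as resp:
        return json.loads(resp.read().decode("utf-8"))


# Example call against a running server; "prefix" is a hypothetical parameter:
#   files = get_json("/storage/list", prefix="reports/")
```

Keeping URL construction separate from the network call makes the request shape easy to inspect or unit-test without a running server.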

🛠️ Configuration

See env.example for a complete list of configuration options. Key variables for this fork:

# S3 Configuration (Optional)
S3_ENDPOINT_URL=https://<accountid>.r2.cloudflarestorage.com
S3_ACCESS_KEY_ID=<your_access_key>
S3_SECRET_ACCESS_KEY=<your_secret_key>
S3_BUCKET_NAME=lightrag-docs
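
Because the S3 block is optional, it can help to distinguish "not configured" from "partially configured" at startup. The sketch below reads the four variables listed above and does exactly that. The `S3Settings` class and `load_s3_settings` function are hypothetical names for illustration, and treating all four variables as required together is an assumption, not the fork's documented behavior.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class S3Settings:
    endpoint_url: str
    access_key_id: str
    secret_access_key: str
    bucket_name: str


# Field name -> environment variable, as listed in the configuration block.
_VARS = {
    "endpoint_url": "S3_ENDPOINT_URL",
    "access_key_id": "S3_ACCESS_KEY_ID",
    "secret_access_key": "S3_SECRET_ACCESS_KEY",
    "bucket_name": "S3_BUCKET_NAME",
}


def load_s3_settings(env: Mapping[str, str] = os.environ) -> Optional[S3Settings]:
    """Return S3 settings, None if S3 is unset, or raise if only partly set."""
    values = {field: env.get(var, "").strip() for field, var in _VARS.items()}
    if not any(values.values()):
        return None  # S3 is optional, so an empty configuration is valid
    missing = [_VARS[field] for field, value in values.items() if not value]
    if missing:
        raise ValueError("Incomplete S3 configuration, missing: " + ", ".join(missing))
    return S3Settings(**values)
```

Failing fast on a half-filled configuration surfaces a missing secret at startup instead of at the first upload.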

📚 Documentation

🤝 Contribution

Contributions are welcome! Please ensure you:

  1. Install development dependencies: uv sync --extra test
  2. Run tests before submitting: pytest tests/
  3. Format code: ruff format .

📜 License

This project is licensed under the MIT License. See the LICENSE file for details.


Fork maintained by clssck. Based on the excellent work by the HKUDS team.