clssck 59e89772de refactor: consolidate to PostgreSQL-only backend and modernize stack

Remove legacy storage implementations and deprecated examples:
- Delete FAISS, JSON, Memgraph, Milvus, MongoDB, Nano Vector DB, Neo4j, NetworkX, Qdrant, Redis storage backends
- Remove Kubernetes deployment manifests and installation scripts
- Delete unofficial examples for deprecated backends and offline deployment docs
Streamline core infrastructure:
- Consolidate storage layer to PostgreSQL-only implementation
- Add full-text search caching with FTS cache module
- Implement metrics collection and monitoring pipeline
- Add explain and metrics API routes
Modernize frontend and tooling:
- Switch web UI to Bun with bun.lock, remove npm and pnpm lockfiles
- Update Dockerfile for PostgreSQL-only deployment
- Add Makefile for common development tasks
- Update environment and configuration examples
Enhance evaluation and testing capabilities:
- Add prompt optimization with DSPy and auto-tuning
- Implement ground truth regeneration and variant testing
- Add prompt debugging and response comparison utilities
- Expand test coverage with new integration scenarios
Simplify dependencies and configuration:
- Remove offline-specific requirement files
- Update pyproject.toml with streamlined dependencies
- Add Python version pinning with .python-version
- Create project guidelines in CLAUDE.md and AGENTS.md

2025-12-12 16:28:49 +01:00


🚀 LightRAG: Specialized Production Fork

A production-ready fork of LightRAG featuring S3 storage integration, a modernized Web UI, and a robust API.

🔱 About This Fork

This repository is a specialized fork of LightRAG, designed to bridge the gap between research and production. While preserving the core "Simple and Fast" philosophy, we have added critical infrastructure components:

☁️ S3 Storage Integration: Native support for S3-compatible object storage (AWS, MinIO, Cloudflare R2) for scalable document and artifact management.
🖥️ Modern Web UI: A completely redesigned interface featuring:
- S3 Browser: Integrated file management system.
- File Viewers: Built-in PDF and text viewers.
- Enhanced Layout: Resizable panes and improved UX.
🔌 Robust API: Expanded REST endpoints supporting multipart uploads, bulk operations, and advanced search parameters.
🛡️ Code Quality: Comprehensive type hinting (Pyright strict), Ruff formatting, and extensive test coverage for critical paths.

📖 Introduction to LightRAG

LightRAG incorporates graph structures into text indexing and retrieval. It uses a dual-level retrieval system that covers both low-level entities and high-level, broader topics.
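
To make the dual-level idea concrete, here is a toy sketch. It is not LightRAG's implementation: the tiny graph, the keyword matching, and every name in it are illustrative assumptions. Low-level retrieval looks up specific entities, while high-level retrieval matches broader themes attached to relations.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Relation:
    source: str
    target: str
    themes: frozenset  # broad topics this relation speaks to


# Hypothetical toy graph; a real graph is extracted from documents.
ENTITIES = {
    "bee": "An insect that pollinates flowering plants.",
    "orchard": "Farmland planted with fruit trees.",
    "pesticide": "A chemical used to control pests.",
}
RELATIONS = [
    Relation("bee", "orchard", frozenset({"agriculture", "pollination"})),
    Relation("pesticide", "bee", frozenset({"environment", "agriculture"})),
]


def low_level(keywords):
    """Return descriptions of entities named directly by the keywords."""
    return {name: ENTITIES[name] for name in keywords if name in ENTITIES}


def high_level(keywords):
    """Return relations whose themes overlap the broader keywords."""
    wanted = set(keywords)
    return [rel for rel in RELATIONS if rel.themes & wanted]


def dual_level(low_keywords, high_keywords):
    """Combine both levels into a single retrieval context."""
    return {
        "entities": low_level(low_keywords),
        "relations": high_level(high_keywords),
    }


context = dual_level(["bee"], ["environment"])
print(context)
```

A query about "bee" and "environment" pulls in the entity description and the pesticide relation, even though the word "pesticide" never appears in the query.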

Algorithm Flowchart

Figure 1: LightRAG Indexing Flowchart (Source)

⚡ Quick Start

1. Installation

This project uses uv for fast and reliable package management.

Option A: Install from PyPI

uv pip install "lightrag-hku[api]"

Option B: Install from Source (Recommended for this Fork)

git clone https://github.com/YOUR_GITHUB_USERNAME/LightRAG.git
cd LightRAG
uv sync --extra api
source .venv/bin/activate

2. Running the Server (UI + API)

The easiest way to experience the enhancements in this fork is via the LightRAG Server.

Configure Environment:

cp env.example .env
# Edit .env to add your API keys (OpenAI/Azure/etc.) and S3 credentials

Start the Server:
```
lightrag-server
```
Access the UI: Open http://localhost:9600 to view the Knowledge Graph, upload files via the S3 browser, and perform queries.
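
If you want to confirm from a script that the server is reachable before opening the UI, a minimal sketch using only the standard library could look like this. The `/health` path is an assumption and may not exist in this fork; substitute any route your build serves.

```python
import http.client
import urllib.error
import urllib.request

BASE_URL = "http://localhost:9600"  # port taken from the text above


def server_is_up(base_url=BASE_URL, path="/health", timeout=2.0):
    """Return True if the server answers `path` with HTTP 200.

    The default "/health" path is an assumption, not a documented route.
    """
    url = base_url.rstrip("/") + path
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status == 200
    except (urllib.error.URLError, http.client.HTTPException, OSError):
        # Connection refused, timeout, non-2xx status, or malformed reply
        return False


if __name__ == "__main__":
    print("server reachable:", server_is_up())
```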

3. Python API Example

You can also use LightRAG directly in your Python code:

import os
import asyncio
from lightrag import LightRAG, QueryParam
from lightrag.llm.openai import gpt_4o_mini_complete, openai_embed

WORKING_DIR = "./rag_storage"
# Create the working directory if needed (no error if it already exists)
os.makedirs(WORKING_DIR, exist_ok=True)

async def main():
    # Initialize LightRAG
    rag = LightRAG(
        working_dir=WORKING_DIR,
        embedding_func=openai_embed,
        llm_model_func=gpt_4o_mini_complete,
    )
    await rag.initialize_storages()

    # Insert Document
    await rag.ainsert("LightRAG is a retrieval-augmented generation framework.")

    # Query
    print(await rag.aquery(
        "What is LightRAG?",
        param=QueryParam(mode="hybrid")
    ))

if __name__ == "__main__":
    asyncio.run(main())

📦 Features & Architecture

Storage Backends

Recommended production stack: PostgreSQL with pgvector and AGE-compatible graph storage, plus S3 (or local) object storage. Other backends (e.g., JsonKVStorage, RedisKVStorage, and the Neo4j/Mongo/Qdrant variants) remain available only where they are still implemented in the codebase; check env.example for the current list and maturity notes.

| Type | Implementations (this fork) |
| --- | --- |
| KV Storage | PGKVStorage (recommended); JsonKVStorage / RedisKVStorage (legacy/optional) |
| Vector Storage | PGVectorStorage (pgvector) |
| Graph Storage | PGGraphStorage (AGE/PG) |
| Object Storage | S3Storage, LocalFileStorage |
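
As a rough sketch of how the recommended stack might be selected in code, the snippet below collects the class names from the table into keyword arguments. The keyword names (`kv_storage`, `vector_storage`, `graph_storage`) are assumptions based on upstream LightRAG and may differ in this fork, and `storage_kwargs` is a hypothetical helper.

```python
# Class names come from the table above; the keyword names are assumptions
# based on upstream LightRAG and may differ in this fork.
PG_STACK = {
    "kv_storage": "PGKVStorage",
    "vector_storage": "PGVectorStorage",
    "graph_storage": "PGGraphStorage",
}


def storage_kwargs(overrides=None):
    """Return storage keyword arguments, allowing per-slot overrides."""
    kwargs = dict(PG_STACK)
    for key, value in (overrides or {}).items():
        if key not in PG_STACK:
            raise KeyError(f"unknown storage slot: {key}")
        kwargs[key] = value
    return kwargs


# Intended use (not executed here):
#   rag = LightRAG(working_dir=WORKING_DIR, ..., **storage_kwargs())
print(storage_kwargs())
```

Keeping the selection in one dictionary makes it easy to swap a single slot, for example `storage_kwargs({"kv_storage": "JsonKVStorage"})`, without touching the rest of the setup.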

Specialized API Routes

This fork exposes additional endpoints:

- POST /documents/upload: Multipart file upload (supports PDF, TXT, MD).
- GET /storage/list: List files in S3/Local storage.
- GET /storage/content: Retrieve file content.
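
The routes above can be exercised with the standard library alone. The sketch below only prepares requests and does not send them. The form field name `file` and the query parameters `prefix` and `key` are assumptions, since the request schema is not documented here; check the server's API docs for the real names.

```python
import mimetypes
import urllib.parse
import urllib.request
import uuid

BASE_URL = "http://localhost:9600"  # default address from the Quick Start


def storage_url(route, **params):
    """Build a GET URL for a storage route such as /storage/list."""
    query = urllib.parse.urlencode(params)
    return f"{BASE_URL}{route}" + (f"?{query}" if query else "")


def multipart_body(filename, data, field="file"):
    """Encode one file as multipart/form-data.

    Returns (body_bytes, content_type). The field name is an assumption.
    """
    boundary = uuid.uuid4().hex
    mime = mimetypes.guess_type(filename)[0] or "application/octet-stream"
    head = (
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="{field}"; '
        f'filename="{filename}"\r\n'
        f"Content-Type: {mime}\r\n\r\n"
    ).encode()
    tail = f"\r\n--{boundary}--\r\n".encode()
    return head + data + tail, f"multipart/form-data; boundary={boundary}"


def upload_request(filename, data):
    """Prepare (but do not send) a POST /documents/upload request."""
    body, content_type = multipart_body(filename, data)
    return urllib.request.Request(
        f"{BASE_URL}/documents/upload",
        data=body,
        headers={"Content-Type": content_type},
        method="POST",
    )


req = upload_request("notes.txt", b"LightRAG test document")
print(req.get_method(), req.full_url)
print(storage_url("/storage/list", prefix="docs/"))
```

To actually send a prepared request against a running server, pass it to `urllib.request.urlopen(req)`.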

🛠️ Configuration

See env.example for a complete list of configuration options. Key variables for this fork:

# S3 Configuration (Optional)
S3_ENDPOINT_URL=https://<accountid>.r2.cloudflarestorage.com
S3_ACCESS_KEY_ID=<your_access_key>
S3_SECRET_ACCESS_KEY=<your_secret_key>
S3_BUCKET_NAME=lightrag-docs
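
Because the S3 block is optional, it can help to validate it early. The following sketch reads the four variables above and treats them as all-or-nothing. That policy and the `S3Settings` and `load_s3_settings` names are assumptions for illustration, not the fork's actual loader.

```python
import os
from dataclasses import dataclass

REQUIRED = (
    "S3_ENDPOINT_URL",
    "S3_ACCESS_KEY_ID",
    "S3_SECRET_ACCESS_KEY",
    "S3_BUCKET_NAME",
)


@dataclass(frozen=True)
class S3Settings:
    endpoint_url: str
    access_key_id: str
    secret_access_key: str
    bucket_name: str


def load_s3_settings(env=None):
    """Return S3Settings, or None when no S3 variable is set.

    Raises ValueError if only some of the variables are present.
    """
    env = os.environ if env is None else env
    values = {name: env.get(name, "").strip() for name in REQUIRED}
    if not any(values.values()):
        return None  # S3 is optional, so an empty block is fine
    missing = [name for name, value in values.items() if not value]
    if missing:
        raise ValueError("incomplete S3 configuration, missing: " + ", ".join(missing))
    return S3Settings(
        endpoint_url=values["S3_ENDPOINT_URL"],
        access_key_id=values["S3_ACCESS_KEY_ID"],
        secret_access_key=values["S3_SECRET_ACCESS_KEY"],
        bucket_name=values["S3_BUCKET_NAME"],
    )


# An empty environment means S3 is simply disabled.
print(load_s3_settings({}))
```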

📚 Documentation

🤝 Contribution

Contributions are welcome! Please ensure you:

- Install development dependencies: uv sync --extra test
- Run tests before submitting: pytest tests/
- Format code: ruff format .

📜 License

This project is licensed under the MIT License. See the LICENSE file for details.

Fork maintained by clssck. Based on the excellent work by the HKUDS team.
