LightRAG/README.md
clssck 59e89772de refactor: consolidate to PostgreSQL-only backend and modernize stack
Remove legacy storage implementations and deprecated examples:
- Delete FAISS, JSON, Memgraph, Milvus, MongoDB, Nano Vector DB, Neo4j, NetworkX, Qdrant, Redis storage backends
- Remove Kubernetes deployment manifests and installation scripts
- Delete unofficial examples for deprecated backends and offline deployment docs
Streamline core infrastructure:
- Consolidate storage layer to PostgreSQL-only implementation
- Add full-text search caching with FTS cache module
- Implement metrics collection and monitoring pipeline
- Add explain and metrics API routes
Modernize frontend and tooling:
- Switch web UI to Bun with bun.lock, remove npm and pnpm lockfiles
- Update Dockerfile for PostgreSQL-only deployment
- Add Makefile for common development tasks
- Update environment and configuration examples
Enhance evaluation and testing capabilities:
- Add prompt optimization with DSPy and auto-tuning
- Implement ground truth regeneration and variant testing
- Add prompt debugging and response comparison utilities
- Expand test coverage with new integration scenarios
Simplify dependencies and configuration:
- Remove offline-specific requirement files
- Update pyproject.toml with streamlined dependencies
- Add Python version pinning with .python-version
- Create project guidelines in CLAUDE.md and AGENTS.md
2025-12-12 16:28:49 +01:00


<div align="center">
<div style="margin: 20px 0;">
<img src="./assets/logo.png" width="120" height="120" alt="LightRAG Logo" style="border-radius: 20px; box-shadow: 0 8px 32px rgba(0, 217, 255, 0.3);">
</div>
# 🚀 LightRAG: Specialized Production Fork
<div align="center">
<img src="https://img.shields.io/badge/Release-Specialized%20Fork-00d9ff?style=for-the-badge&logo=git&logoColor=white&labelColor=1a1a2e">
<a href="https://github.com/HKUDS/LightRAG"><img src="https://img.shields.io/badge/Upstream-HKUDS%2FLightRAG-7289da?style=for-the-badge&logo=github&logoColor=white&labelColor=1a1a2e"></a>
</div>
<div align="center">
<div style="width: 100%; height: 2px; margin: 20px 0; background: linear-gradient(90deg, transparent, #00d9ff, transparent);"></div>
</div>
<p align="center">
<b>A production-ready fork of LightRAG featuring S3 storage integration, a modernized Web UI, and a robust API.</b>
</p>
</div>
## 🔱 About This Fork
This repository is a specialized fork of [LightRAG](https://github.com/HKUDS/LightRAG), designed to bridge the gap between research and production. While preserving the core "Simple and Fast" philosophy, we have added critical infrastructure components:
- **☁️ S3 Storage Integration**: Native support for S3-compatible object storage (AWS, MinIO, Cloudflare R2) for scalable document and artifact management.
- **🖥️ Modern Web UI**: A completely redesigned interface featuring:
  - **S3 Browser**: Integrated file management system.
  - **File Viewers**: Built-in PDF and text viewers.
  - **Enhanced Layout**: Resizable panes and improved UX.
- **🔌 Robust API**: Expanded REST endpoints supporting multipart uploads, bulk operations, and advanced search parameters.
- **🛡️ Code Quality**: Comprehensive type hinting (Pyright strict), Ruff formatting, and extensive test coverage for critical paths.
---
## 📖 Introduction to LightRAG
**LightRAG** incorporates graph structures into text indexing and retrieval. It uses a dual-level retrieval system: low-level retrieval targets specific entities and their relationships, while high-level retrieval covers broader topics.
<details>
<summary><b>Algorithm Flowchart</b></summary>
![LightRAG Indexing Flowchart](https://learnopencv.com/wp-content/uploads/2024/11/LightRAG-VectorDB-Json-KV-Store-Indexing-Flowchart-scaled.jpg)
*Figure 1: LightRAG Indexing Flowchart ([Source](https://learnopencv.com/lightrag/))*
</details>
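
To make the dual-level idea concrete, here is a minimal toy sketch. It is not LightRAG's implementation: `ToyGraphIndex`, its fields, and the substring matching are all hypothetical stand-ins, chosen only to show how an entity-level lookup and a topic-level lookup can be combined into one result list.

```python
from dataclasses import dataclass, field


@dataclass
class ToyGraphIndex:
    """Hypothetical stand-in for a graph index (not LightRAG's real data model)."""

    # Low level: entity name -> text chunks that mention the entity
    entity_chunks: dict[str, list[str]] = field(default_factory=dict)
    # High level: topic keyword -> entities grouped under that topic
    topic_entities: dict[str, list[str]] = field(default_factory=dict)

    def low_level(self, query: str) -> list[str]:
        """Return chunks for entities that are named directly in the query."""
        q = query.lower()
        hits: list[str] = []
        for entity, chunks in self.entity_chunks.items():
            if entity.lower() in q:
                hits.extend(chunks)
        return hits

    def high_level(self, query: str) -> list[str]:
        """Return chunks for all entities under a topic named in the query."""
        q = query.lower()
        hits: list[str] = []
        for topic, entities in self.topic_entities.items():
            if topic.lower() in q:
                for entity in entities:
                    hits.extend(self.entity_chunks.get(entity, []))
        return hits

    def hybrid(self, query: str) -> list[str]:
        """Combine both levels, dropping duplicates while keeping order."""
        return list(dict.fromkeys(self.low_level(query) + self.high_level(query)))


index = ToyGraphIndex(
    entity_chunks={
        "pgvector": ["pgvector adds vector search to PostgreSQL."],
        "Apache AGE": ["Apache AGE adds graph queries to PostgreSQL."],
    },
    topic_entities={"storage": ["pgvector", "Apache AGE"]},
)

print(index.low_level("What is pgvector?"))
print(index.high_level("Which storage options exist?"))
print(index.hybrid("How does pgvector fit into storage?"))
```

A real system would use embeddings and LLM-extracted keywords rather than substring checks; the sketch only illustrates why a query can benefit from both levels at once.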
## ⚡ Quick Start
### 1. Installation
This project uses [uv](https://docs.astral.sh/uv/) for fast and reliable package management.
**Option A: Install from PyPI**
```bash
uv pip install "lightrag-hku[api]"
```
**Option B: Install from Source (Recommended for this Fork)**
```bash
git clone https://github.com/YOUR_GITHUB_USERNAME/LightRAG.git
cd LightRAG
uv sync --extra api
source .venv/bin/activate
```
### 2. Running the Server (UI + API)
The easiest way to experience the enhancements in this fork is via the LightRAG Server.
1. **Configure Environment**:
   ```bash
   cp env.example .env
   # Edit .env to add your API keys (OpenAI/Azure/etc.) and S3 credentials
   ```
2. **Start the Server**:
   ```bash
   lightrag-server
   ```
3. **Access the UI**:
   Open [http://localhost:9600](http://localhost:9600) to view the Knowledge Graph, upload files via the S3 browser, and perform queries.
### 3. Python API Example
You can also use LightRAG directly in your Python code:
```python
import os
import asyncio

from lightrag import LightRAG, QueryParam
from lightrag.llm.openai import gpt_4o_mini_complete, openai_embed

WORKING_DIR = "./rag_storage"

# Create the working directory if it does not exist yet
if not os.path.exists(WORKING_DIR):
    os.mkdir(WORKING_DIR)


async def main():
    # Initialize LightRAG
    rag = LightRAG(
        working_dir=WORKING_DIR,
        embedding_func=openai_embed,
        llm_model_func=gpt_4o_mini_complete,
    )
    await rag.initialize_storages()

    # Insert Document
    await rag.ainsert("LightRAG is a retrieval-augmented generation framework.")

    # Query
    print(await rag.aquery(
        "What is LightRAG?",
        param=QueryParam(mode="hybrid")
    ))


if __name__ == "__main__":
    asyncio.run(main())
```
## 📦 Features & Architecture
### Storage Backends
Recommended production stack: PostgreSQL + pgvector + AGE-compatible graph, with S3 (or local) object storage. Other supported backends remain available where implemented (e.g., JsonKVStorage/RedisKVStorage, Neo4j/Mongo/Qdrant variants in the codebase); check `env.example` for the current list and maturity notes.
| Type | Implementations (this fork) |
|------|-----------------------------|
| **KV Storage** | PGKVStorage (recommended); JsonKVStorage / RedisKVStorage (legacy/optional) |
| **Vector Storage** | PGVectorStorage (pgvector) |
| **Graph Storage** | PGGraphStorage (AGE/PG) |
| **Object Storage** | S3Storage, LocalFileStorage |
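
As a rough sketch of how the recommended stack might be selected from Python: the class names below come from the table above, while the keyword names (`kv_storage`, `vector_storage`, `graph_storage`) are assumed to carry over from upstream LightRAG's constructor and should be checked against this fork. `storage_kwargs` is a hypothetical helper, not part of the library.

```python
# Class names are taken from the table above. The keyword names are an
# assumption based on upstream LightRAG and may differ in this fork.
RECOMMENDED_STORAGE = {
    "kv_storage": "PGKVStorage",
    "vector_storage": "PGVectorStorage",
    "graph_storage": "PGGraphStorage",
}


def storage_kwargs(overrides=None):
    """Return storage keyword arguments, optionally overriding single entries."""
    kwargs = dict(RECOMMENDED_STORAGE)
    for key, value in (overrides or {}).items():
        if key not in RECOMMENDED_STORAGE:
            raise KeyError(f"Unknown storage slot: {key}")
        kwargs[key] = value
    return kwargs


# Intended use (requires lightrag and a reachable PostgreSQL instance):
#   rag = LightRAG(working_dir="./rag_storage", **storage_kwargs(), ...)
print(storage_kwargs())
```

Keeping the selection in one dictionary makes it easy to swap a single backend during local development without touching the rest of the setup.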
### Specialized API Routes
This fork exposes additional endpoints:
- `POST /documents/upload`: Multipart file upload (supports PDF, TXT, MD).
- `GET /storage/list`: List files in S3/Local storage.
- `GET /storage/content`: Retrieve file content.
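
The routes above can be exercised with the standard library alone. The sketch below only builds the requests; it assumes the server runs on the Quick Start port and that the upload route reads the file from a multipart field named `file`, which is a guess and not confirmed by this README. The helper names are hypothetical.

```python
import urllib.request
import uuid

BASE_URL = "http://localhost:9600"  # port taken from the Quick Start section


def build_upload_request(filename, content, field_name="file"):
    """Build a multipart POST for /documents/upload.

    `field_name` is an assumption; check the API docs for the real field name.
    """
    boundary = uuid.uuid4().hex
    body = b"".join([
        f"--{boundary}\r\n".encode(),
        (
            f'Content-Disposition: form-data; name="{field_name}"; '
            f'filename="{filename}"\r\n'
        ).encode(),
        b"Content-Type: application/octet-stream\r\n\r\n",
        content,
        f"\r\n--{boundary}--\r\n".encode(),
    ])
    return urllib.request.Request(
        f"{BASE_URL}/documents/upload",
        data=body,
        headers={"Content-Type": f"multipart/form-data; boundary={boundary}"},
        method="POST",
    )


def build_list_request():
    """Build a GET for /storage/list (no query parameters assumed)."""
    return urllib.request.Request(f"{BASE_URL}/storage/list", method="GET")


upload = build_upload_request("notes.md", b"# Hello LightRAG\n")
print(upload.get_method(), upload.full_url)
print(build_list_request().full_url)
```

With a running server, each request can be sent via `urllib.request.urlopen(request)`. Any authentication headers the server expects would need to be added to `headers`.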
## 🛠️ Configuration
See `env.example` for a complete list of configuration options. Key variables for this fork:
```ini
# S3 Configuration (Optional)
S3_ENDPOINT_URL=https://<accountid>.r2.cloudflarestorage.com
S3_ACCESS_KEY_ID=<your_access_key>
S3_SECRET_ACCESS_KEY=<your_secret_key>
S3_BUCKET_NAME=lightrag-docs
```
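
To sanity-check these variables outside the server, something like the following sketch could help. It assumes a boto3-style client is used to talk to the bucket, which this README does not state. `s3_client_kwargs` is a hypothetical helper, and treating `S3_ENDPOINT_URL` as optional is likewise an assumption.

```python
import os


def s3_client_kwargs(env=None):
    """Map the S3_* variables onto boto3-style client keyword arguments."""
    env = os.environ if env is None else env
    required = ("S3_ACCESS_KEY_ID", "S3_SECRET_ACCESS_KEY", "S3_BUCKET_NAME")
    missing = [name for name in required if not env.get(name)]
    if missing:
        raise ValueError(f"Missing S3 settings: {', '.join(missing)}")

    kwargs = {
        "aws_access_key_id": env["S3_ACCESS_KEY_ID"],
        "aws_secret_access_key": env["S3_SECRET_ACCESS_KEY"],
    }
    # Assumed optional: only needed for S3-compatible services such as MinIO or R2
    if env.get("S3_ENDPOINT_URL"):
        kwargs["endpoint_url"] = env["S3_ENDPOINT_URL"]
    return kwargs


example_env = {
    "S3_ENDPOINT_URL": "http://localhost:9000",  # placeholder endpoint
    "S3_ACCESS_KEY_ID": "example-key",
    "S3_SECRET_ACCESS_KEY": "example-secret",
    "S3_BUCKET_NAME": "lightrag-docs",
}
print(sorted(s3_client_kwargs(example_env)))
```

If boto3 is installed, the result can be passed on as `boto3.client("s3", **s3_client_kwargs())`, and the bucket name is then supplied per call.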
## 📚 Documentation
- [API Documentation](./lightrag/api/README.md)
- [Offline Deployment](./docs/OfflineDeployment.md)
- [Docker Deployment](./docs/DockerDeployment.md)
## 🤝 Contribution
Contributions are welcome! Please ensure you:
1. Install development dependencies: `uv sync --extra test`
2. Run tests before submitting: `pytest tests/`
3. Format code: `ruff format .`
## 📜 License
This project is licensed under the MIT License. See the [LICENSE](LICENSE) file for details.
---
<div align="center">
<sub>Fork maintained by <a href="https://github.com/clssck">clssck</a>. Based on the excellent work by the <a href="https://github.com/HKUDS">HKUDS</a> team.</sub>
</div>