* Partial implementation of phase-0 * Partial implementation of phase-1 * add report * add postgress * Revert "add postgress" This reverts commit 27778dc6bb3906b5220dd386e47fe32ca7415332. * remove junk * Cleaned up annd setup docs * update docs * moved report * Updated load_markdown_files function: Now returns tuples with (content, title, relative_path) instead of just (content, title) * fixes to load docs script and more env variables for llm configuration * update prod values * update docs * apolo docs support with linking * update docs to reflect url conventions and mapping with docs * Adds ingress and forwardAuth configurations Adds ingress configuration to expose the application. Adds forwardAuth configuration to enable user authentication. Includes middleware to strip headers. * Adds ingress and forward authentication middleware support
174 lines
No EOL
5.7 KiB
Markdown
174 lines
No EOL
5.7 KiB
Markdown
# LightRAG Documentation Loader
|
|
|
|
Advanced script to load markdown documentation into LightRAG with flexible reference modes.
|
|
|
|
## Quick Start
|
|
|
|
```bash
|
|
# Default mode (file path references)
|
|
python load_docs.py /path/to/your/docs
|
|
|
|
# URL mode (website link references)
|
|
python load_docs.py /path/to/docs --mode urls --base-url https://docs.example.com/
|
|
```
|
|
|
|
## Reference Modes
|
|
|
|
### Files Mode (Default)
|
|
Uses local file paths in query response citations:
|
|
```bash
|
|
python load_docs.py docs/
|
|
python load_docs.py docs/ --mode files
|
|
```
|
|
|
|
**Query Response Example:**
|
|
```
|
|
### References
|
|
- [DC] getting-started/installation.md
|
|
- [KG] administration/setup.md
|
|
```
|
|
|
|
### URLs Mode
|
|
Uses website URLs in query response citations:
|
|
```bash
|
|
python load_docs.py docs/ --mode urls --base-url https://docs.apolo.us/index/
|
|
python load_docs.py docs/ --mode urls --base-url https://my-docs.com/v1/
|
|
```
|
|
|
|
**Query Response Example:**
|
|
```
|
|
### References
|
|
- [DC] https://docs.apolo.us/index/getting-started/installation
|
|
- [KG] https://docs.apolo.us/index/administration/setup
|
|
```
|
|
|
|
**⚠️ Important for URLs Mode**: Your local file structure must match your documentation site's URL structure for proper link generation.
|
|
|
|
**File Structure Requirements:**
|
|
```
|
|
docs/
|
|
├── getting-started/
|
|
│ ├── installation.md → https://docs.example.com/getting-started/installation
|
|
│ └── first-steps.md → https://docs.example.com/getting-started/first-steps
|
|
├── administration/
|
|
│ ├── README.md → https://docs.example.com/administration
|
|
│ └── setup.md → https://docs.example.com/administration/setup
|
|
└── README.md → https://docs.example.com/
|
|
```
|
|
|
|
**URL Mapping Rules:**
|
|
- `.md` extension is removed from URLs
|
|
- `README.md` files map to their directory URL
|
|
- Subdirectories become URL path segments
|
|
- Hyphens and underscores in filenames are preserved
|
|
|
|
### Organizing Docs for URL Mode
|
|
|
|
**Step 1: Analyze Your Documentation Site Structure**
|
|
```bash
|
|
# Visit your docs site and note the URL patterns:
|
|
# https://docs.example.com/getting-started/installation
|
|
# https://docs.example.com/api/authentication
|
|
# https://docs.example.com/guides/deployment
|
|
```
|
|
|
|
**Step 2: Create Matching Directory Structure**
|
|
```bash
|
|
mkdir -p docs/{getting-started,api,guides}
|
|
```
|
|
|
|
**Step 3: Organize Your Markdown Files**
|
|
```bash
|
|
# Match each URL to a file path:
|
|
docs/getting-started/installation.md # → /getting-started/installation
|
|
docs/api/authentication.md # → /api/authentication
|
|
docs/guides/deployment.md # → /guides/deployment
|
|
docs/guides/README.md # → /guides (overview page)
|
|
```
|
|
|
|
**Step 4: Verify URL Mapping**
|
|
```bash
|
|
# Test a few URLs manually to ensure they work:
|
|
curl -I https://docs.example.com/getting-started/installation
|
|
curl -I https://docs.example.com/api/authentication
|
|
```
|
|
|
|
**Common Documentation Site Patterns:**
|
|
|
|
| Site Type | File Structure | URL Structure |
|
|
|-----------|---------------|---------------|
|
|
| **GitBook** | `docs/section/page.md` | `/section/page` |
|
|
| **Docusaurus** | `docs/section/page.md` | `/docs/section/page` |
|
|
| **MkDocs** | `docs/section/page.md` | `/section/page/` |
|
|
| **Custom** | Varies | Match your site's pattern |
|
|
|
|
**Real Example: Apolo Documentation**
|
|
```bash
|
|
# Apolo docs site: https://docs.apolo.us/index/
|
|
# Your local structure should match:
|
|
apolo-docs/
|
|
├── getting-started/
|
|
│ ├── first-steps/
|
|
│ │ ├── getting-started.md → /index/getting-started/first-steps/getting-started
|
|
│ │ └── README.md → /index/getting-started/first-steps
|
|
│ ├── apolo-base-docker-image.md → /index/getting-started/apolo-base-docker-image
|
|
│ └── faq.md → /index/getting-started/faq
|
|
├── apolo-console/
|
|
│ └── getting-started/
|
|
│ └── sign-up-login.md → /index/apolo-console/getting-started/sign-up-login
|
|
└── README.md → /index/
|
|
|
|
# Load with correct base URL:
|
|
python load_docs.py apolo-docs/ --mode urls --base-url https://docs.apolo.us/index/
|
|
```
|
|
|
|
## Complete Usage Examples
|
|
|
|
```bash
|
|
# Load Apolo documentation with URL references
|
|
python load_docs.py ../apolo-copilot/docs/official-apolo-documentation/docs \
|
|
--mode urls --base-url https://docs.apolo.us/index/
|
|
|
|
# Load with custom LightRAG endpoint
|
|
python load_docs.py docs/ --endpoint https://lightrag.example.com
|
|
|
|
# Load to local instance, skip test query
|
|
python load_docs.py docs/ --no-test
|
|
|
|
# Files mode with custom endpoint
|
|
python load_docs.py docs/ --mode files --endpoint http://localhost:9621
|
|
```
|
|
|
|
## Features
|
|
|
|
- **Dual Reference Modes**: File paths or live website URLs in citations
|
|
- **Flexible Base URL**: Works with any documentation site structure
|
|
- **Simple dependency**: Only requires `httpx` and Python standard library
|
|
- **Automatic discovery**: Finds all `.md` files recursively
|
|
- **Smart metadata**: Adds appropriate title, path/URL, and source information
|
|
- **Progress tracking**: Shows loading progress with success/failure counts
|
|
- **Health checks**: Verifies LightRAG connectivity before loading
|
|
- **Test queries**: Validates functionality after loading
|
|
- **Error handling**: Clear validation and error messages
|
|
|
|
## Requirements
|
|
|
|
```bash
|
|
pip install httpx
|
|
```
|
|
|
|
## Use Cases
|
|
|
|
This loader is perfect for:
|
|
- **Kubernetes deployments**: Self-contained with minimal dependencies
|
|
- **Quick testing**: Immediate setup without complex environments
|
|
- **Documentation loading**: Any markdown-based documentation
|
|
- **Development workflows**: Fast iteration and testing
|
|
|
|
## Requirements
|
|
|
|
```bash
|
|
pip install httpx
|
|
```
|
|
|
|
**Note**: This script is included with LightRAG deployments and provides a simple way to load any markdown documentation into your LightRAG instance. |