LightRAG Documentation Loader
A script that loads Markdown documentation into LightRAG and cites each source in query responses either as a local file path or as a website URL.
Quick Start
# Default mode (file path references)
python load_docs.py /path/to/your/docs
# URL mode (website link references)
python load_docs.py /path/to/docs --mode urls --base-url https://docs.example.com/
Reference Modes
Files Mode (Default)
Uses local file paths in query response citations:
python load_docs.py docs/
python load_docs.py docs/ --mode files
Query Response Example:
### References
- [DC] getting-started/installation.md
- [KG] administration/setup.md
URLs Mode
Uses website URLs in query response citations:
python load_docs.py docs/ --mode urls --base-url https://docs.apolo.us/index/
python load_docs.py docs/ --mode urls --base-url https://my-docs.com/v1/
Query Response Example:
### References
- [DC] https://docs.apolo.us/index/getting-started/installation
- [KG] https://docs.apolo.us/index/administration/setup
⚠️ Important for URLs Mode: Your local file structure must match your documentation site's URL structure for proper link generation.
File Structure Requirements:
docs/
├── getting-started/
│ ├── installation.md → https://docs.example.com/getting-started/installation
│ └── first-steps.md → https://docs.example.com/getting-started/first-steps
├── administration/
│ ├── README.md → https://docs.example.com/administration
│ └── setup.md → https://docs.example.com/administration/setup
└── README.md → https://docs.example.com/
URL Mapping Rules:
- .md extension is removed from URLs
- README.md files map to their directory URL
- Subdirectories become URL path segments
- Hyphens and underscores in filenames are preserved
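The mapping rules above can be sketched in a few lines of Python. This is an illustration only, not the code inside load_docs.py: the function name path_to_url and its trailing-slash handling are assumptions chosen to reproduce the example URLs shown in this section.

```python
from pathlib import PurePosixPath


def path_to_url(relative_path: str, base_url: str) -> str:
    """Map a docs-relative markdown path to a site URL (hypothetical helper)."""
    path = PurePosixPath(relative_path)
    if path.name.lower() == "readme.md":
        # README.md maps to the URL of the directory that contains it.
        path = path.parent
    else:
        # Drop the .md extension; hyphens and underscores are left untouched.
        path = path.with_suffix("")

    base = base_url.rstrip("/")
    suffix = path.as_posix()
    if suffix in ("", "."):
        # A top-level README.md maps to the base URL itself.
        return base + "/"
    return f"{base}/{suffix}"


print(path_to_url("getting-started/installation.md", "https://docs.example.com/"))
print(path_to_url("administration/README.md", "https://docs.example.com/"))
```

Normalizing the base URL first means it can be passed with or without a trailing slash.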
Organizing Docs for URLs Mode
Step 1: Analyze Your Documentation Site Structure
# Visit your docs site and note the URL patterns:
# https://docs.example.com/getting-started/installation
# https://docs.example.com/api/authentication
# https://docs.example.com/guides/deployment
Step 2: Create Matching Directory Structure
mkdir -p docs/{getting-started,api,guides}
Step 3: Organize Your Markdown Files
# Match each URL to a file path:
docs/getting-started/installation.md # → /getting-started/installation
docs/api/authentication.md # → /api/authentication
docs/guides/deployment.md # → /guides/deployment
docs/guides/README.md # → /guides (overview page)
Step 4: Verify URL Mapping
# Test a few URLs manually to ensure they work:
curl -I https://docs.example.com/getting-started/installation
curl -I https://docs.example.com/api/authentication
Common Documentation Site Patterns:
| Site Type | File Structure | URL Structure |
|---|---|---|
| GitBook | docs/section/page.md | /section/page |
| Docusaurus | docs/section/page.md | /docs/section/page |
| MkDocs | docs/section/page.md | /section/page/ |
| Custom | Varies | Match your site's pattern |
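To make the differences between these patterns concrete, the sketch below encodes the three named rows of the table as a prefix plus a trailing-slash flag. The names SITE_PATTERNS and site_url_path are hypothetical, and the values only mirror the table; a real site may be configured differently, so check a few live URLs before relying on any of them.

```python
from pathlib import PurePosixPath

# Hypothetical lookup: (URL path prefix, trailing slash?) per site type,
# copied from the table above rather than from any tool's configuration.
SITE_PATTERNS = {
    "gitbook": ("", False),
    "docusaurus": ("/docs", False),
    "mkdocs": ("", True),
}


def site_url_path(relative_path: str, site_type: str) -> str:
    """Return the URL path for a file given relative to the docs/ directory."""
    prefix, trailing_slash = SITE_PATTERNS[site_type]
    # Strip the .md extension and keep the directory segments as they are.
    stem = PurePosixPath(relative_path).with_suffix("").as_posix()
    url_path = f"{prefix}/{stem}"
    return url_path + "/" if trailing_slash else url_path


for site in SITE_PATTERNS:
    print(site, site_url_path("section/page.md", site))
```

If your site follows the Docusaurus-style pattern, the extra /docs segment would need to be part of the value passed to --base-url.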
Real Example: Apolo Documentation
# Apolo docs site: https://docs.apolo.us/index/
# Your local structure should match:
apolo-docs/
├── getting-started/
│ ├── first-steps/
│ │ ├── getting-started.md → /index/getting-started/first-steps/getting-started
│ │ └── README.md → /index/getting-started/first-steps
│ ├── apolo-base-docker-image.md → /index/getting-started/apolo-base-docker-image
│ └── faq.md → /index/getting-started/faq
├── apolo-console/
│ └── getting-started/
│ └── sign-up-login.md → /index/apolo-console/getting-started/sign-up-login
└── README.md → /index/
# Load with correct base URL:
python load_docs.py apolo-docs/ --mode urls --base-url https://docs.apolo.us/index/
Complete Usage Examples
# Load Apolo documentation with URL references
python load_docs.py ../apolo-copilot/docs/official-apolo-documentation/docs \
--mode urls --base-url https://docs.apolo.us/index/
# Load with custom LightRAG endpoint
python load_docs.py docs/ --endpoint https://lightrag.example.com
# Load to local instance, skip test query
python load_docs.py docs/ --no-test
# Files mode with custom endpoint
python load_docs.py docs/ --mode files --endpoint http://localhost:9621
Features
- Dual Reference Modes: File paths or live website URLs in citations
- Flexible Base URL: Works with any documentation site structure
- Simple dependency: Only requires httpx and the Python standard library
- Automatic discovery: Finds all .md files recursively
- Smart metadata: Adds appropriate title, path/URL, and source information
- Progress tracking: Shows loading progress with success/failure counts
- Health checks: Verifies LightRAG connectivity before loading
- Test queries: Validates functionality after loading
- Error handling: Clear validation and error messages
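As a rough illustration of the automatic discovery and metadata features, recursive discovery needs only the standard library. The function name discover_markdown and the title heuristic (first level-1 heading, falling back to the filename) are assumptions made for this sketch, not a description of how load_docs.py works internally.

```python
from pathlib import Path


def discover_markdown(docs_dir):
    """Return (content, title, relative_path) for every .md file under docs_dir."""
    root = Path(docs_dir)
    results = []
    for md_file in sorted(root.rglob("*.md")):
        content = md_file.read_text(encoding="utf-8")
        # Assumed heuristic: use the first "# " heading as the title,
        # otherwise fall back to the filename without its extension.
        title = md_file.stem
        for line in content.splitlines():
            if line.startswith("# "):
                title = line[2:].strip()
                break
        # POSIX-style relative paths keep citations identical across platforms.
        results.append((content, title, md_file.relative_to(root).as_posix()))
    return results
```

The relative path is the piece that later becomes either the file reference or, in URLs mode, the path appended to the base URL.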
Requirements
pip install httpx
Use Cases
This loader is well suited to:
- Kubernetes deployments: Self-contained with minimal dependencies
- Quick testing: Immediate setup without complex environments
- Documentation loading: Any markdown-based documentation
- Development workflows: Fast iteration and testing
Note: This script is included with LightRAG deployments and provides a simple way to load any markdown documentation into your LightRAG instance.