385 lines
9.3 KiB
Markdown
385 lines
9.3 KiB
Markdown
# OpenRAG Development Guide
|
|
|
|
A comprehensive guide for setting up and developing OpenRAG locally.
|
|
|
|
## Table of Contents
|
|
|
|
- [Architecture Overview](#architecture-overview)
|
|
- [Prerequisites](#prerequisites)
|
|
- [Environment Setup](#environment-setup)
|
|
- [Development Methods](#development-methods)
|
|
- [Local Development (Non-Docker)](#local-development-non-docker)
|
|
- [Docker Development](#docker-development)
|
|
- [API Documentation](#api-documentation)
|
|
- [Troubleshooting](#troubleshooting)
|
|
- [Contributing](#contributing)
|
|
|
|
## Architecture Overview
|
|
|
|
OpenRAG consists of four main services:
|
|
|
|
1. **Backend** (`src/`) - Python FastAPI/Starlette application with document processing, search, and chat
|
|
2. **Frontend** (`frontend/`) - Next.js React application
|
|
3. **OpenSearch** - Document storage and vector search engine
|
|
4. **Langflow** - AI workflow engine for chat functionality
|
|
|
|
### Key Technologies
|
|
|
|
- **Backend**: Python 3.13+, Starlette, OpenAI, Docling, OpenSearch
|
|
- **Frontend**: Next.js 15, React 19, TypeScript, Tailwind CSS
|
|
- **Dependencies**: UV (Python), npm (Node.js)
|
|
- **Containerization**: Docker/Podman with Compose
|
|
|
|
## Prerequisites
|
|
|
|
### System Requirements
|
|
|
|
- **Python**: 3.13+ (for local development)
|
|
- **Node.js**: 18+ (for frontend development)
|
|
- **Container Runtime**: Docker or Podman with Compose
|
|
- **Memory**: 8GB+ RAM recommended (especially for GPU workloads)
|
|
|
|
### Development Tools
|
|
|
|
```bash
|
|
# Python dependency manager
|
|
curl -LsSf https://astral.sh/uv/install.sh | sh
|
|
|
|
# Node.js (via nvm recommended)
|
|
curl -o- https://raw.githubusercontent.com/nvm-sh/nvm/v0.39.0/install.sh | bash
|
|
nvm install 18
|
|
nvm use 18
|
|
```
|
|
|
|
## Environment Setup
|
|
|
|
### 1. Clone and Setup
|
|
|
|
```bash
|
|
git clone <repository-url>
|
|
cd openrag
|
|
```
|
|
|
|
### 2. Environment Variables
|
|
|
|
Create your environment configuration:
|
|
|
|
```bash
|
|
cp .env.example .env
|
|
```
|
|
|
|
Edit `.env` with your configuration:
|
|
|
|
```bash
|
|
# Required
|
|
OPENSEARCH_PASSWORD=your_secure_password
|
|
OPENAI_API_KEY=sk-your_openai_api_key
|
|
|
|
# Langflow Configuration
|
|
LANGFLOW_PUBLIC_URL=http://localhost:7860
|
|
LANGFLOW_SUPERUSER=admin
|
|
LANGFLOW_SUPERUSER_PASSWORD=your_langflow_password
|
|
LANGFLOW_SECRET_KEY=your_secret_key_min_32_chars
|
|
LANGFLOW_AUTO_LOGIN=true
|
|
LANGFLOW_NEW_USER_IS_ACTIVE=true
|
|
LANGFLOW_ENABLE_SUPERUSER_CLI=true
|
|
FLOW_ID=your_flow_id
|
|
|
|
# OAuth (Optional - for Google Drive/OneDrive connectors)
|
|
GOOGLE_OAUTH_CLIENT_ID=your_google_client_id
|
|
GOOGLE_OAUTH_CLIENT_SECRET=your_google_client_secret
|
|
MICROSOFT_GRAPH_OAUTH_CLIENT_ID=your_microsoft_client_id
|
|
MICROSOFT_GRAPH_OAUTH_CLIENT_SECRET=your_microsoft_client_secret
|
|
|
|
# Webhooks (Optional)
|
|
WEBHOOK_BASE_URL=https://your-domain.com
|
|
|
|
# AWS S3 (Optional)
|
|
AWS_ACCESS_KEY_ID=your_aws_key
|
|
AWS_SECRET_ACCESS_KEY=your_aws_secret
|
|
```
|
|
|
|
## Development Methods
|
|
|
|
Choose your preferred development approach:
|
|
|
|
## Local Development (Non-Docker)
|
|
|
|
Best for rapid development and debugging.
|
|
|
|
### Backend Setup
|
|
|
|
```bash
|
|
# Install Python dependencies
|
|
uv sync
|
|
|
|
# Start OpenSearch (required dependency)
|
|
docker run -d \
|
|
--name opensearch-dev \
|
|
-p 9200:9200 \
|
|
-p 9600:9600 \
|
|
-e "discovery.type=single-node" \
|
|
-e "OPENSEARCH_INITIAL_ADMIN_PASSWORD=admin123" \
|
|
opensearchproject/opensearch:3.0.0
|
|
|
|
# Start backend
|
|
cd src
|
|
uv run python main.py
|
|
```
|
|
|
|
Backend will be available at: http://localhost:8000
|
|
|
|
### Frontend Setup
|
|
|
|
```bash
|
|
# Install Node.js dependencies
|
|
cd frontend
|
|
npm install
|
|
|
|
# Start development server
|
|
npm run dev
|
|
```
|
|
|
|
Frontend will be available at: http://localhost:3000
|
|
|
|
### Langflow Setup (Optional)
|
|
|
|
```bash
|
|
# Install and run Langflow
|
|
pip install langflow
|
|
langflow run --host 0.0.0.0 --port 7860
|
|
```
|
|
|
|
Langflow will be available at: http://localhost:7860
|
|
|
|
## Docker Development
|
|
|
|
Use this for a production-like environment or when you need all services.
|
|
|
|
### Available Compose Files
|
|
|
|
- `docker-compose-dev.yml` - Development (builds from source)
|
|
- `docker-compose.yml` - Production (pre-built images)
|
|
- `docker-compose-cpu.yml` - CPU-only version
|
|
|
|
### Development with Docker
|
|
|
|
```bash
|
|
# Build and start all services
|
|
docker compose -f docker-compose-dev.yml up --build
|
|
|
|
# Or with Podman
|
|
podman compose -f docker-compose-dev.yml up --build
|
|
|
|
# Run in background
|
|
docker compose -f docker-compose-dev.yml up --build -d
|
|
|
|
# View logs
|
|
docker compose -f docker-compose-dev.yml logs -f
|
|
|
|
# Stop services
|
|
docker compose -f docker-compose-dev.yml down
|
|
```
|
|
|
|
### Service Ports
|
|
|
|
- **Frontend**: http://localhost:3000
|
|
- **Backend**: http://localhost:8000 (internal)
|
|
- **OpenSearch**: http://localhost:9200
|
|
- **OpenSearch Dashboards**: http://localhost:5601
|
|
- **Langflow**: http://localhost:7860
|
|
|
|
### Reset Development Environment
|
|
|
|
```bash
|
|
# Complete reset (removes volumes and rebuilds)
|
|
docker compose -f docker-compose-dev.yml down -v
|
|
docker compose -f docker-compose-dev.yml up --build --force-recreate --remove-orphans
|
|
```
|
|
|
|
## API Documentation
|
|
|
|
### Key Endpoints
|
|
|
|
| Endpoint | Method | Description |
|
|
|----------|--------|-------------|
|
|
| `/search` | POST | Search documents with filters |
|
|
| `/upload` | POST | Upload documents |
|
|
| `/upload_path` | POST | Upload from local path |
|
|
| `/tasks` | GET | List processing tasks |
|
|
| `/tasks/{id}` | GET | Get task status |
|
|
| `/connectors` | GET | List available connectors |
|
|
| `/auth/me` | GET | Get current user info |
|
|
| `/knowledge-filter` | POST/GET | Manage knowledge filters |
|
|
|
|
### Example API Calls
|
|
|
|
```bash
|
|
# Search all documents
|
|
curl -X POST http://localhost:8000/search \
|
|
-H "Content-Type: application/json" \
|
|
-d '{"query": "*", "limit": 100}'
|
|
|
|
# Upload a document
|
|
curl -X POST http://localhost:8000/upload \
|
|
-F "file=@document.pdf"
|
|
|
|
# Get task status
|
|
curl http://localhost:8000/tasks/task_id_here
|
|
```
|
|
|
|
### Frontend API Proxy
|
|
|
|
The Next.js frontend proxies API calls through `/api/*` to the backend at `http://openrag-backend:8000` (in Docker) or `http://localhost:8000` (local).
|
|
|
|
## Troubleshooting
|
|
|
|
### Common Issues
|
|
|
|
#### Docker/Podman Issues
|
|
|
|
**Issue**: `docker: command not found`
|
|
```bash
|
|
# Install Docker Desktop or use Podman
|
|
brew install podman podman-desktop
|
|
podman machine init --memory 8192
|
|
podman machine start
|
|
```
|
|
|
|
**Issue**: Out of memory during build
|
|
```bash
|
|
# For Podman on macOS
|
|
podman machine stop
|
|
podman machine rm
|
|
podman machine init --memory 8192
|
|
podman machine start
|
|
```
|
|
|
|
#### Backend Issues
|
|
|
|
**Issue**: `ModuleNotFoundError` or dependency issues
|
|
```bash
|
|
# Ensure you're using the right Python version
|
|
python --version # Should be 3.13+
|
|
uv sync --reinstall
|
|
```
|
|
|
|
**Issue**: OpenSearch connection failed
|
|
```bash
|
|
# Check if OpenSearch is running
|
|
curl -k -u admin:admin123 https://localhost:9200
|
|
# If using Docker, ensure the container is running
|
|
docker ps | grep opensearch
|
|
```
|
|
|
|
**Issue**: CUDA/GPU not detected
|
|
```bash
|
|
# Check GPU availability
|
|
python -c "import torch; print(torch.cuda.is_available())"
|
|
# For CPU-only development, use docker-compose-cpu.yml
|
|
```
|
|
|
|
#### Frontend Issues
|
|
|
|
**Issue**: Next.js build failures
|
|
```bash
|
|
# Clear cache and reinstall
|
|
cd frontend
|
|
rm -rf .next node_modules package-lock.json
|
|
npm install
|
|
npm run dev
|
|
```
|
|
|
|
**Issue**: API calls failing
|
|
- Check that backend is running on port 8000
|
|
- Verify environment variables are set correctly
|
|
- Check browser network tab for CORS or proxy issues
|
|
|
|
#### Document Processing Issues
|
|
|
|
**Issue**: Docling model download failures
|
|
```bash
|
|
# Pre-download models
|
|
uv run docling-tools models download
|
|
# Or clear cache and retry
|
|
rm -rf ~/.cache/docling
|
|
```
|
|
|
|
**Issue**: EasyOCR initialization errors
|
|
```bash
|
|
# Clear EasyOCR cache
|
|
rm -rf ~/.EasyOCR
|
|
# Restart the backend to reinitialize
|
|
```
|
|
|
|
### Development Tips
|
|
|
|
1. **Hot Reloading**:
|
|
- Backend: Use `uvicorn src.main:app --reload` for auto-restart
|
|
- Frontend: `npm run dev` provides hot reloading
|
|
|
|
2. **Debugging**:
|
|
- Add `print()` statements or use `pdb.set_trace()` in Python
|
|
- Use browser dev tools for frontend debugging
|
|
- Check Docker logs: `docker compose logs -f service_name`
|
|
|
|
3. **Database Inspection**:
|
|
- Access OpenSearch Dashboards at http://localhost:5601
|
|
- Use curl to query OpenSearch directly
|
|
- Check the `documents` index for uploaded content
|
|
|
|
4. **Performance**:
|
|
- GPU processing is much faster for document processing
|
|
- Use CPU-only mode if GPU issues occur
|
|
- Monitor memory usage with `docker stats` or `htop`
|
|
|
|
### Log Locations
|
|
|
|
- **Backend**: Console output or container logs
|
|
- **Frontend**: Browser console and Next.js terminal
|
|
- **OpenSearch**: Container logs (`docker compose logs opensearch`)
|
|
- **Langflow**: Container logs (`docker compose logs langflow`)
|
|
|
|
## Contributing
|
|
|
|
### Code Style
|
|
|
|
- **Python**: Follow PEP 8, use `black` for formatting
|
|
- **TypeScript**: Use ESLint configuration in `frontend/`
|
|
- **Commits**: Use conventional commit messages
|
|
|
|
### Development Workflow
|
|
|
|
1. Create feature branch from `main`
|
|
2. Make changes and test locally
|
|
3. Run tests (if available)
|
|
4. Create pull request with description
|
|
5. Ensure all checks pass
|
|
|
|
### Testing
|
|
|
|
```bash
|
|
# Backend tests (if available)
|
|
cd src
|
|
uv run pytest
|
|
|
|
# Frontend tests (if available)
|
|
cd frontend
|
|
npm test
|
|
|
|
# Integration tests with Docker
|
|
docker compose -f docker-compose-dev.yml up --build
|
|
# Test API endpoints manually or with automated tests
|
|
```
|
|
|
|
## Additional Resources
|
|
|
|
- [OpenSearch Documentation](https://opensearch.org/docs/)
|
|
- [Langflow Documentation](https://docs.langflow.org/)
|
|
- [Next.js Documentation](https://nextjs.org/docs)
|
|
- [Starlette Documentation](https://www.starlette.io/)
|
|
- [Docling Documentation](https://ds4sd.github.io/docling/)
|
|
|
|
---
|
|
|
|
For questions or issues, please check the troubleshooting section above or create an issue in the repository.
|