
OpenRAG Development Guide

A comprehensive guide for setting up and developing OpenRAG locally.

Table of Contents

  • Architecture Overview
  • Prerequisites
  • Environment Setup
  • Development Methods
  • API Documentation
  • Troubleshooting
  • Development Tips
  • Contributing

Architecture Overview

OpenRAG consists of four main services:

  1. Backend (src/) - Python FastAPI/Starlette application with document processing, search, and chat
  2. Frontend (frontend/) - Next.js React application
  3. OpenSearch - Document storage and vector search engine
  4. Langflow - AI workflow engine for chat functionality

Key Technologies

  • Backend: Python 3.13+, Starlette, OpenAI, Docling, OpenSearch
  • Frontend: Next.js 15, React 19, TypeScript, Tailwind CSS
  • Dependencies: UV (Python), npm (Node.js)
  • Containerization: Docker/Podman with Compose

Prerequisites

System Requirements

  • Python: 3.13+ (for local development)
  • Node.js: 18+ (for frontend development)
  • Container Runtime: Docker or Podman with Compose
  • Memory: 8GB+ RAM recommended (especially for GPU workloads)

Development Tools

# Python dependency manager
curl -LsSf https://astral.sh/uv/install.sh | sh

# Node.js (via nvm recommended)
curl -o- https://raw.githubusercontent.com/nvm-sh/nvm/v0.39.0/install.sh | bash
nvm install 18
nvm use 18

Environment Setup

1. Clone and Setup

git clone <repository-url>
cd openrag

2. Environment Variables

Create your environment configuration:

cp .env.example .env

Edit .env with your configuration:

# Required
OPENSEARCH_PASSWORD=your_secure_password
OPENAI_API_KEY=sk-your_openai_api_key

# Langflow Configuration
LANGFLOW_PUBLIC_URL=http://localhost:7860
LANGFLOW_SUPERUSER=admin
LANGFLOW_SUPERUSER_PASSWORD=your_langflow_password
LANGFLOW_SECRET_KEY=your_secret_key_min_32_chars
LANGFLOW_AUTO_LOGIN=true
LANGFLOW_NEW_USER_IS_ACTIVE=true
LANGFLOW_ENABLE_SUPERUSER_CLI=true
FLOW_ID=your_flow_id

# OAuth (Optional - for Google Drive/OneDrive connectors)
GOOGLE_OAUTH_CLIENT_ID=your_google_client_id
GOOGLE_OAUTH_CLIENT_SECRET=your_google_client_secret
MICROSOFT_GRAPH_OAUTH_CLIENT_ID=your_microsoft_client_id
MICROSOFT_GRAPH_OAUTH_CLIENT_SECRET=your_microsoft_client_secret

# Webhooks (Optional)
WEBHOOK_BASE_URL=https://your-domain.com

# AWS S3 (Optional)
AWS_ACCESS_KEY_ID=your_aws_key
AWS_SECRET_ACCESS_KEY=your_aws_secret

Development Methods

Choose your preferred development approach:

Local Development (Non-Docker)

Best for rapid development and debugging.

Backend Setup

# Install Python dependencies
uv sync

# Start OpenSearch (required dependency)
docker run -d \
  --name opensearch-dev \
  -p 9200:9200 \
  -p 9600:9600 \
  -e "discovery.type=single-node" \
  -e "OPENSEARCH_INITIAL_ADMIN_PASSWORD=admin123" \
  opensearchproject/opensearch:3.0.0

# Start backend
cd src
uv run python main.py

Backend will be available at: http://localhost:8000

Frontend Setup

# Install Node.js dependencies
cd frontend
npm install

# Start development server
npm run dev

Frontend will be available at: http://localhost:3000

Langflow Setup (Optional)

# Install and run Langflow
pip install langflow
langflow run --host 0.0.0.0 --port 7860

Langflow will be available at: http://localhost:7860
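With all three pieces started, a small script can confirm each service is actually listening before you begin testing. A sketch using only the standard library, with the default ports from the steps above (the helper name and service map are illustrative):

```python
import socket

def port_open(host, port, timeout=1.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False

# Default local-development ports from this guide.
SERVICES = {
    "backend": 8000,
    "frontend": 3000,
    "opensearch": 9200,
    "langflow": 7860,
}

for name, port in SERVICES.items():
    state = "up" if port_open("localhost", port) else "down"
    print(f"{name:>10} (port {port}): {state}")
```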

Docker Development

Use this for a production-like environment or when you need all services.

Available Compose Files

  • docker-compose-dev.yml - Development (builds from source)
  • docker-compose.yml - Production (pre-built images)
  • docker-compose-cpu.yml - CPU-only version

Development with Docker

# Build and start all services
docker compose -f docker-compose-dev.yml up --build

# Or with Podman
podman compose -f docker-compose-dev.yml up --build

# Run in background
docker compose -f docker-compose-dev.yml up --build -d

# View logs
docker compose -f docker-compose-dev.yml logs -f

# Stop services
docker compose -f docker-compose-dev.yml down

Service Ports

  Service                 Port(s)      URL
  Backend                 8000         http://localhost:8000
  Frontend                3000         http://localhost:3000
  OpenSearch              9200, 9600   https://localhost:9200
  OpenSearch Dashboards   5601         http://localhost:5601
  Langflow                7860         http://localhost:7860

Reset Development Environment

# Complete reset (removes volumes and rebuilds)
docker compose -f docker-compose-dev.yml down -v
docker compose -f docker-compose-dev.yml up --build --force-recreate --remove-orphans

API Documentation

Key Endpoints

  Endpoint            Method     Description
  /search             POST       Search documents with filters
  /upload             POST       Upload documents
  /upload_path        POST       Upload from a local path
  /tasks              GET        List processing tasks
  /tasks/{id}         GET        Get task status
  /connectors         GET        List available connectors
  /auth/me            GET        Get current user info
  /knowledge-filter   POST/GET   Manage knowledge filters

Example API Calls

# Search all documents
curl -X POST http://localhost:8000/search \
  -H "Content-Type: application/json" \
  -d '{"query": "*", "limit": 100}'

# Upload a document
curl -X POST http://localhost:8000/upload \
  -F "file=@document.pdf"

# Get task status
curl http://localhost:8000/tasks/task_id_here

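The same calls are easy to script. A sketch using only urllib from the standard library, against the endpoints documented above (the helper names and request shapes are illustrative):

```python
import json
import urllib.request

BASE_URL = "http://localhost:8000"

def search_request(query, limit=100):
    """Build a POST /search request with a JSON body."""
    body = json.dumps({"query": query, "limit": limit}).encode("utf-8")
    return urllib.request.Request(
        f"{BASE_URL}/search",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )

def task_status_request(task_id):
    """Build a GET /tasks/{id} request."""
    return urllib.request.Request(f"{BASE_URL}/tasks/{task_id}")

# With the backend running, sending a request is a one-liner:
# with urllib.request.urlopen(search_request("*")) as resp:
#     print(json.load(resp))
```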
Frontend API Proxy

The Next.js frontend proxies API calls through /api/* to the backend at http://openrag-backend:8000 (in Docker) or http://localhost:8000 (local).
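Scripts and tests that talk to the backend directly face the same split between the Docker service name and localhost. A small helper can pick the right base URL (this helper and the OPENRAG_IN_DOCKER flag are illustrative, not part of the OpenRAG codebase):

```python
import os

def backend_base_url():
    """Return the backend base URL: the Compose service name when running
    inside Docker, localhost otherwise. OPENRAG_IN_DOCKER is a hypothetical
    flag used here for illustration."""
    if os.environ.get("OPENRAG_IN_DOCKER"):
        return "http://openrag-backend:8000"
    return "http://localhost:8000"

print(backend_base_url())
```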

Troubleshooting

Common Issues

Docker/Podman Issues

Issue: docker: command not found

# Install Docker Desktop or use Podman
brew install podman podman-desktop
podman machine init --memory 8192
podman machine start

Issue: Out of memory during build

# For Podman on macOS
podman machine stop
podman machine rm
podman machine init --memory 8192
podman machine start

Backend Issues

Issue: ModuleNotFoundError or dependency issues

# Ensure you're using the right Python version
python --version  # Should be 3.13+
uv sync --reinstall

Issue: OpenSearch connection failed

# Check if OpenSearch is running
curl -k -u admin:admin123 https://localhost:9200
# If using Docker, ensure the container is running
docker ps | grep opensearch

Issue: CUDA/GPU not detected

# Check GPU availability
python -c "import torch; print(torch.cuda.is_available())"
# For CPU-only development, use docker-compose-cpu.yml
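A common pattern is to fall back to CPU automatically rather than failing when no GPU is usable. A hedged sketch of that check (not the backend's actual device-selection logic):

```python
def pick_device():
    """Return "cuda" when a usable GPU is detected, otherwise "cpu".
    Degrades gracefully when torch itself is not installed."""
    try:
        import torch
    except ImportError:
        return "cpu"
    if torch.cuda.is_available():
        return "cuda"
    return "cpu"

print(f"document processing will run on: {pick_device()}")
```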

Frontend Issues

Issue: Next.js build failures

# Clear cache and reinstall
cd frontend
rm -rf .next node_modules package-lock.json
npm install
npm run dev

Issue: API calls failing

  • Check that backend is running on port 8000
  • Verify environment variables are set correctly
  • Check browser network tab for CORS or proxy issues

Document Processing Issues

Issue: Docling model download failures

# Pre-download models
uv run docling-tools models download
# Or clear cache and retry
rm -rf ~/.cache/docling

Issue: EasyOCR initialization errors

# Clear EasyOCR cache
rm -rf ~/.EasyOCR
# Restart the backend to reinitialize

Development Tips

  1. Hot Reloading:

    • Backend: Use uv run uvicorn src.main:app --reload (from the repository root) for auto-restart
    • Frontend: npm run dev provides hot reloading
  2. Debugging:

    • Add print() statements or use pdb.set_trace() in Python
    • Use browser dev tools for frontend debugging
    • Check Docker logs: docker compose logs -f service_name
  3. Database Inspection:

    • Access OpenSearch Dashboards at http://localhost:5601
    • Use curl to query OpenSearch directly
    • Check the documents index for uploaded content
  4. Performance:

    • GPU processing is much faster for document processing
    • Use CPU-only mode if GPU issues occur
    • Monitor memory usage with docker stats or htop

Log Locations

  • Backend: Console output or container logs
  • Frontend: Browser console and Next.js terminal
  • OpenSearch: Container logs (docker compose logs opensearch)
  • Langflow: Container logs (docker compose logs langflow)

Contributing

Code Style

  • Python: Follow PEP 8, use black for formatting
  • TypeScript: Use ESLint configuration in frontend/
  • Commits: Use conventional commit messages

Development Workflow

  1. Create feature branch from main
  2. Make changes and test locally
  3. Run tests (if available)
  4. Create pull request with description
  5. Ensure all checks pass

Testing

# Backend tests (if available)
cd src
uv run pytest

# Frontend tests (if available)
cd frontend
npm test

# Integration tests with Docker
docker compose -f docker-compose-dev.yml up --build
# Test API endpoints manually or with automated tests

For questions or issues, please check the troubleshooting section above or create an issue in the repository.