Revamp README with detailed setup and usage guide

Expanded the README to include comprehensive instructions for quick start, development workflow, environment configuration, Docker deployment, and troubleshooting. Added structured sections, command examples, and environment variable documentation to improve onboarding and developer experience.
This commit is contained in:
Edwin Jose 2025-09-10 11:59:07 -04:00
parent 552af2dd81
commit 676bda205f

280
README.md
View file

@ -1,72 +1,246 @@
## OpenRAG
# OpenRAG
[![Ask DeepWiki](https://deepwiki.com/badge.svg)](https://deepwiki.com/phact/openrag)
### getting started
OpenRAG is a comprehensive Retrieval-Augmented Generation platform that enables intelligent document search and AI-powered conversations. Users can upload, process, and query documents through a chat interface backed by large language models and semantic search capabilities. The system utilizes Langflow for document ingestion, retrieval workflows, and intelligent nudges, providing a seamless RAG experience. Built with Starlette, Next.js, OpenSearch, and Langflow integration.
Set up your secrets:
## 📋 Table of Contents
cp .env.example .env
- [Quick Start](#quick-start)
- [Development](#development)
- [Configuration](#configuration)
- [Docker Deployment](#docker-deployment)
- [Troubleshooting](#troubleshooting)
Populate the values in .env
## 🚀 Quick Start
Requirements:
### Prerequisites
Docker or podman with compose installed.
- Docker or Podman with Compose installed
- Make (for development commands)
Run OpenRAG:
docker compose build
docker compose up
CPU only:
docker compose -f docker-compose-cpu.yml up
If you need to reset state:
docker compose up --build --force-recreate --remove-orphans
### Configuration
OpenRAG uses environment variables for configuration. Copy `.env.example` to `.env` and populate with your values:
### 1. Environment Setup
```bash
cp .env.example .env
# Clone and setup environment
git clone <repository-url>
cd openrag
make setup # Creates .env and installs dependencies
```
#### Key Environment Variables
### 2. Configure Environment
**Required:**
- `OPENAI_API_KEY`: Your OpenAI API key
- `OPENSEARCH_PASSWORD`: Password for OpenSearch admin user
- `LANGFLOW_SUPERUSER`: Langflow admin username
- `LANGFLOW_SUPERUSER_PASSWORD`: Langflow admin password
- `LANGFLOW_CHAT_FLOW_ID`: ID of your Langflow chat flow
- `LANGFLOW_INGEST_FLOW_ID`: ID of your Langflow ingestion flow
- `NUDGES_FLOW_ID`: ID of your Langflow nudges/suggestions flow
Edit `.env` with your API keys and credentials:
**Ingestion Configuration:**
- `DISABLE_INGEST_WITH_LANGFLOW`: Disable Langflow ingestion pipeline (default: `false`)
- `false` or unset: Uses Langflow pipeline (upload → ingest → delete)
- `true`: Uses traditional OpenRAG processor for document ingestion
```bash
# Required
OPENAI_API_KEY=your_openai_api_key
OPENSEARCH_PASSWORD=your_secure_password
LANGFLOW_SUPERUSER=admin
LANGFLOW_SUPERUSER_PASSWORD=your_secure_password
LANGFLOW_CHAT_FLOW_ID=your_chat_flow_id
LANGFLOW_INGEST_FLOW_ID=your_ingest_flow_id
NUDGES_FLOW_ID=your_nudges_flow_id
```
**Optional:**
- `LANGFLOW_PUBLIC_URL`: Public URL for Langflow (default: `http://localhost:7860`)
- `GOOGLE_OAUTH_CLIENT_ID` / `GOOGLE_OAUTH_CLIENT_SECRET`: For Google OAuth authentication
- `MICROSOFT_GRAPH_OAUTH_CLIENT_ID` / `MICROSOFT_GRAPH_OAUTH_CLIENT_SECRET`: For Microsoft OAuth
- `WEBHOOK_BASE_URL`: Base URL for webhook endpoints
- `AWS_ACCESS_KEY_ID` / `AWS_SECRET_ACCESS_KEY`: For AWS integrations
- `SESSION_SECRET`: Secret key for session management (default: auto-generated, change in production)
- `LANGFLOW_KEY`: Explicit Langflow API key (auto-generated if not provided)
- `LANGFLOW_SECRET_KEY`: Secret key for Langflow internal operations
### 3. Start OpenRAG
See `.env.example` for a complete list with descriptions, or check the docker-compose.yml files.
```bash
# Full stack with GPU support
make dev
For podman on mac you may have to increase your VM memory (`podman stats` should not show limit at only 2gb):
# Or CPU only
make dev-cpu
```
podman machine stop
podman machine rm
podman machine init --memory 8192 # example: 8 GB
podman machine start
Access the services:
- **Frontend**: http://localhost:3000
- **Backend API**: http://localhost:8000
- **Langflow**: http://localhost:7860
- **OpenSearch**: http://localhost:9200
- **OpenSearch Dashboards**: http://localhost:5601
## 🛠️ Development
### Development Commands
All development tasks are managed through the Makefile. Run `make help` to see all available commands.
#### Environment Management
```bash
# Setup development environment
make setup # Initial setup: creates .env, installs dependencies
# Start development environments
make dev # Full stack with GPU support
make dev-cpu # Full stack with CPU only
make dev-local # Infrastructure only (for local development)
make infra # Alias for dev-local
# Container management
make stop # Stop all containers
make restart # Restart all containers
make clean # Stop and remove containers/volumes
make status # Show container status
make health # Check service health
```
#### Local Development
For faster development iteration, run infrastructure in Docker and backend/frontend locally:
```bash
# Terminal 1: Start infrastructure
make dev-local
# Terminal 2: Run backend locally
make backend
# Terminal 3: Run frontend locally
make frontend
```
#### Dependency Management
```bash
make install # Install all dependencies
make install-be # Install backend dependencies (uv)
make install-fe # Install frontend dependencies (npm)
```
#### Building and Testing
```bash
# Build Docker images
make build # Build all images
make build-be # Build backend image only
make build-fe # Build frontend image only
# Testing and quality
make test # Run backend tests
make lint # Run linting checks
```
#### Debugging
```bash
# View logs
make logs # All container logs
make logs-be # Backend logs only
make logs-fe # Frontend logs only
make logs-lf # Langflow logs only
make logs-os # OpenSearch logs only
#### Database Operations
```bash
# Reset OpenSearch indices
make db-reset # Delete and recreate indices
```
## ⚙️ Configuration
### Environment Variables
OpenRAG uses environment variables for configuration. All variables should be set in your `.env` file.
#### Required Variables
| Variable | Description |
|----------|-------------|
| `OPENAI_API_KEY` | Your OpenAI API key |
| `OPENSEARCH_PASSWORD` | Password for OpenSearch admin user |
| `LANGFLOW_SUPERUSER` | Langflow admin username |
| `LANGFLOW_SUPERUSER_PASSWORD` | Langflow admin password |
| `LANGFLOW_CHAT_FLOW_ID` | ID of your Langflow chat flow |
| `LANGFLOW_INGEST_FLOW_ID` | ID of your Langflow ingestion flow |
| `NUDGES_FLOW_ID` | ID of your Langflow nudges/suggestions flow |
#### Ingestion Configuration
| Variable | Description |
|----------|-------------|
| `DISABLE_INGEST_WITH_LANGFLOW` | Disable Langflow ingestion pipeline (default: `false`) |
- `false` or unset: Uses Langflow pipeline (upload → ingest → delete)
- `true`: Uses traditional OpenRAG processor for document ingestion
#### Optional Variables
| Variable | Description |
|----------|-------------|
| `LANGFLOW_PUBLIC_URL` | Public URL for Langflow (default: `http://localhost:7860`) |
| `GOOGLE_OAUTH_CLIENT_ID` / `GOOGLE_OAUTH_CLIENT_SECRET` | Google OAuth authentication |
| `MICROSOFT_GRAPH_OAUTH_CLIENT_ID` / `MICROSOFT_GRAPH_OAUTH_CLIENT_SECRET` | Microsoft OAuth |
| `WEBHOOK_BASE_URL` | Base URL for webhook endpoints |
| `AWS_ACCESS_KEY_ID` / `AWS_SECRET_ACCESS_KEY` | AWS integrations |
| `SESSION_SECRET` | Session management (default: auto-generated, change in production) |
| `LANGFLOW_KEY` | Explicit Langflow API key (auto-generated if not provided) |
| `LANGFLOW_SECRET_KEY` | Secret key for Langflow internal operations |
## 🐳 Docker Deployment
### Standard Deployment
```bash
# Build and start all services
docker compose build
docker compose up -d
```
### CPU-Only Deployment
For environments without GPU support:
```bash
docker compose -f docker-compose-cpu.yml up -d
```
### Force Rebuild
If you need to reset state or rebuild everything:
```bash
docker compose up --build --force-recreate --remove-orphans
```
### Service URLs
After deployment, services are available at:
- **Frontend**: http://localhost:3000
- **Backend API**: http://localhost:8000
- **Langflow**: http://localhost:7860
- **OpenSearch**: http://localhost:9200
- **OpenSearch Dashboards**: http://localhost:5601
## 🔧 Troubleshooting
### Podman on macOS
If using Podman on macOS, you may need to increase VM memory:
```bash
podman machine stop
podman machine rm
podman machine init --memory 8192 # 8 GB example
podman machine start
```
### Common Issues
1. **OpenSearch fails to start**: Check that `OPENSEARCH_PASSWORD` is set and meets requirements
2. **Langflow connection issues**: Verify `LANGFLOW_SUPERUSER` credentials are correct
3. **Out of memory errors**: Increase Docker memory allocation or use CPU-only mode
4. **Port conflicts**: Ensure ports 3000, 7860, 8000, 9200, 5601 are available
### Development Tips
- Use `make infra` + `make backend` + `make frontend` for faster development iteration
- Run `make help` to see all available commands
- Check `.env.example` for complete environment variable documentation