# Apache AGE: Technical Analysis & LightRAG Implementation Decision

## Executive Summary

Apache AGE ("A Graph Extension") is a PostgreSQL extension that provides graph database capabilities within PostgreSQL. In the LightRAG multi-tenant Docker deployment, AGE support was disabled because of installation complexity in containerized environments, and graceful error handling was added so that a missing extension does not cause startup failures.

## What is Apache AGE?

### Overview

Apache AGE is an extension for PostgreSQL that enables property graph database functionality using the **Cypher query language** (the openCypher dialect of the language that originated with Neo4j). It allows PostgreSQL to function as a hybrid relational-graph database.

**Official References:**
- [Apache AGE GitHub Repository](https://github.com/apache/incubator-age)
- [Apache AGE Documentation](https://age.apache.org/)
- [Cypher Query Language Spec](https://s3.amazonaws.com/artifacts.opencypher.org/openCypher9.pdf)

### Key Characteristics

| Aspect | Details |
|--------|---------|
| **Language** | Cypher (openCypher, originated at Neo4j) |
| **Model** | Property Graph (nodes, edges, labels, properties) |
| **Query Syntax** | `SELECT * FROM cypher('graph_name', $$ ...cypher_query... $$) AS (result agtype)` |
| **Storage** | Native PostgreSQL tables (one schema per graph, catalog in `ag_catalog`) |
| **License** | Apache 2.0 |
| **Maturity** | Active development (Apache top-level project; graduated from the Incubator in 2022) |

### Core Functions

```sql
-- Load AGE and make its functions visible in this session
LOAD 'age';
SET search_path = ag_catalog, "$user", public;

-- Create graph
SELECT create_graph('graph_name');

-- Execute Cypher queries
SELECT * FROM cypher('graph_name', $$
    MATCH (n:Label) WHERE n.property = 'value' RETURN n
$$) AS (node agtype);

-- Drop graph
SELECT drop_graph('graph_name', true);
```

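
The `cypher()` wrapper shown above always needs a graph name, a dollar-quoted query, and an `agtype` column list. The following is a minimal sketch of how an application could assemble that SQL string safely. `build_cypher_sql` and its identifier rule are hypothetical and are not part of AGE or LightRAG; the rule is deliberately stricter than whatever AGE itself accepts.

```python
import re

# Hypothetical helper for illustration; not an AGE or LightRAG API.
# Assumption: graph and column names are restricted to plain identifiers.
_IDENT = re.compile(r"^[A-Za-z_][A-Za-z0-9_]*$")


def build_cypher_sql(graph_name: str, cypher: str, columns: list[str]) -> str:
    """Wrap a Cypher query in AGE's cypher() call with an agtype column list."""
    if not _IDENT.match(graph_name):
        raise ValueError(f"unsafe graph name: {graph_name!r}")
    if not columns or not all(_IDENT.match(c) for c in columns):
        raise ValueError("column names must be non-empty plain identifiers")
    if "$$" in cypher:
        # The query is dollar-quoted, so the delimiter must not appear inside it.
        raise ValueError("Cypher text must not contain the $$ delimiter")
    column_defs = ", ".join(f"{c} agtype" for c in columns)
    return (
        f"SELECT * FROM cypher('{graph_name}', $$ {cypher.strip()} $$) "
        f"AS ({column_defs})"
    )


sql = build_cypher_sql(
    "graph_name",
    "MATCH (n:Label) WHERE n.property = 'value' RETURN n",
    ["node"],
)
print(sql)
```

Rejecting anything that is not a plain identifier keeps the graph name from being used to inject SQL, which matters because the name is interpolated into the statement as text.
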
## AGE in LightRAG Context

### Usage Pattern

LightRAG uses AGE as its **graph storage backend** (`PGGraphStorage` class in `lightrag/kg/postgres_impl.py`):

1. **Entity-Relation Graph Storage**: Stores knowledge graph entities (nodes) and relationships (edges)
2. **Graph Name**: `chunk_entity_relation` - primary graph for semantic relationships
3. **Node Structure**: Entities with labels (Person, Organization, Location, etc.)
4. **Edge Types**: Semantic relationships between entities
5. **Query Operations**:
   - Entity discovery (finding all entities of a type)
   - Relationship traversal (finding connected entities)
   - Pattern matching (complex graph queries)

### Integration Points

```python
# From postgres_impl.py line 227
await connection.execute(f"select create_graph('{graph_name}')")

# Storage model used by the graph backend:
# Nodes stored as property graph vertices
# Relations stored as property graph edges
# Cypher queries enable efficient graph traversals
```

### Data Flow

```
Document Input
    ↓
Entity Extraction (LLM)
    ↓
AGE Graph Storage
    ├─ Nodes: Extracted entities
    ├─ Edges: Entity relationships
    └─ Labels: Entity types
    ↓
Graph Queries (Cypher)
    ↓
RAG Results (enhanced with graph context)
```

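
The data flow above can be sketched with an in-memory stand-in for the graph store. This is only an illustration of how extracted entities become nodes, labels, and edges; `TinyGraph`, `ingest`, and the sample extraction result are all hypothetical and do not reflect LightRAG's actual classes or its LLM output format.

```python
from collections import defaultdict


class TinyGraph:
    """In-memory stand-in for the graph store (illustrative only)."""

    def __init__(self):
        self.labels = {}               # entity name -> entity type (label)
        self.edges = defaultdict(set)  # entity name -> {(relation, neighbour)}

    def add_entity(self, name, label):
        self.labels[name] = label

    def add_relation(self, source, relation, target):
        # Store both directions so neighbours can be found from either end.
        self.edges[source].add((relation, target))
        self.edges[target].add((relation, source))

    def neighbours(self, name):
        return sorted(self.edges[name])


def ingest(graph, extraction):
    """Load the output of an (assumed) LLM extraction step into the graph."""
    for name, label in extraction["entities"]:
        graph.add_entity(name, label)
    for source, relation, target in extraction["relations"]:
        graph.add_relation(source, relation, target)


# Hand-written stand-in for what the extraction step might return.
extraction = {
    "entities": [("Ada", "Person"), ("Acme", "Organization")],
    "relations": [("Ada", "WORKS_AT", "Acme")],
}
graph = TinyGraph()
ingest(graph, extraction)
context = graph.neighbours("Ada")  # graph context that would enrich a RAG answer
print(context)
```
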
## AGE vs pgVector: Complementary Technologies

### Comparison Table

| Aspect | pgVector | Apache AGE |
|--------|----------|-----------|
| **Purpose** | Vector similarity search | Graph relationships |
| **Data Structure** | Embeddings (float arrays) | Property graphs (nodes/edges) |
| **Query Type** | Similarity/semantic search | Pattern matching/traversal |
| **Algorithm** | HNSW, IVFFlat indices | Graph algorithms |
| **Use Case** | "Find semantically similar content" | "Find connected entities" |
| **LightRAG Role** | Vector retrieval over chunk embeddings | Knowledge graph structure |

### Synergistic Usage in LightRAG

```
LightRAG Hybrid Approach:
├─ pgVector: "What documents are semantically similar?"
│  └─ Chunk-level similarity search
├─ AGE Graph: "How are extracted entities related?"
│  └─ Entity relationship mapping
└─ Combined: "Get semantically similar content + its entity context"
```

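
As a rough illustration of the combined step, the sketch below ranks toy chunks by cosine similarity and then attaches the relations that touch each chunk's entities. The data, the `hybrid_retrieve` function, and the two-dimensional embeddings are invented for the example; in a real deployment the similarity search would run in pgVector and the relation lookup in the graph store.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


# Toy data (hypothetical): chunk embeddings plus the entities found in each chunk.
chunks = {
    "c1": {"embedding": [1.0, 0.0], "entities": ["Ada"]},
    "c2": {"embedding": [0.0, 1.0], "entities": ["Acme"]},
}
relations = [("Ada", "WORKS_AT", "Acme")]


def hybrid_retrieve(query_embedding, top_k=1):
    """Vector step picks chunks; graph step adds relations for their entities."""
    ranked = sorted(
        chunks,
        key=lambda cid: cosine(query_embedding, chunks[cid]["embedding"]),
        reverse=True,
    )
    results = []
    for cid in ranked[:top_k]:
        entities = set(chunks[cid]["entities"])
        context = [r for r in relations if r[0] in entities or r[2] in entities]
        results.append({"chunk": cid, "relations": context})
    return results


results = hybrid_retrieve([0.9, 0.1])
print(results)
```
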
## Decision: Disabling AGE in Docker Deployment

### Problem Analysis

**Installation Complexity:**
- AGE requires compilation from source within the PostgreSQL environment
- Needs PostgreSQL development headers (`postgres.h`)
- Pre-built `pgvector/pgvector:pg15` image lacks the AGE compilation toolchain
- Building a custom image with both pgvector + AGE adds 200MB+ and significant build time

**Docker Build Attempts:**
1. **Attempt 1**: Used `pgvector/pgvector:pg15-bookworm`
   - Error: pgvector extension not found

2. **Attempt 2**: Built custom image with AGE compilation
   ```dockerfile
   # Clone into a known directory so the build step runs inside the AGE source tree
   RUN git clone https://github.com/apache/incubator-age.git /tmp/age
   RUN make -C /tmp/age PG_CONFIG=/usr/lib/postgresql/15/bin/pg_config
   ```
   - Error: `postgres.h` header files not available in slim base image
   - Resolution: Requires the full PostgreSQL dev package (`postgresql-server-dev-15` on Debian-based images), which adds substantial image bloat

### Solution Implemented

**Graceful Degradation Strategy:**

```python
# File: lightrag/kg/postgres_impl.py, line 233 (excerpt from configure_age())
try:
    await connection.execute(f"select create_graph('{graph_name}')")
except (
    asyncpg.exceptions.UndefinedFunctionError,  # AGE not available
    asyncpg.exceptions.InvalidSchemaNameError,
    asyncpg.exceptions.UniqueViolationError,
):
    pass  # Silently continue without AGE
```

**Changes Made:**
1. Added `UndefinedFunctionError` exception handling in `configure_age()` method
2. Added exception catching in `execute()` method for AGE-specific SQL
3. System continues startup without graph functionality rather than failing

**Why This Approach:**
- ✅ Minimal image size (no custom PostgreSQL build)
- ✅ Fast deployment (no AGE compilation)
- ✅ Graceful degradation (app doesn't crash)
- ✅ Easy to enable later (install the AGE extension; exception handling is already in place)
- ✅ Development/demo-friendly

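
One possible variation on the degradation strategy, shown here only as a sketch, is to record whether AGE was available instead of passing silently, so later code can check a flag. The exception class and `FakeConnection` below are stand-ins defined in the example so that it runs without a database; `try_configure_age` is a hypothetical name and not the function in `postgres_impl.py`.

```python
import asyncio
import logging

logger = logging.getLogger("age_setup")


class UndefinedFunctionError(Exception):
    """Stand-in for asyncpg.exceptions.UndefinedFunctionError."""


class FakeConnection:
    """Test double that behaves like a database without AGE installed."""

    async def execute(self, query):
        raise UndefinedFunctionError("function create_graph(unknown) does not exist")


async def try_configure_age(connection, graph_name):
    """Return True if the graph was created, False if AGE is unavailable."""
    try:
        await connection.execute(f"select create_graph('{graph_name}')")
    except UndefinedFunctionError:
        # Degrade gracefully, but leave a trace in the logs.
        logger.warning("AGE not installed; graph features disabled")
        return False
    return True


graph_enabled = asyncio.run(
    try_configure_age(FakeConnection(), "chunk_entity_relation")
)
print(graph_enabled)
```

Returning a flag keeps startup non-fatal while making the degraded state visible to callers and in the logs, which a bare `pass` does not.
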
## Consequences of AGE Disablement

### Functional Impact

| Feature | Status | Mitigation |
|---------|--------|-----------|
| **Entity relationship queries** | ❌ Unavailable | Use vector similarity + metadata |
| **Graph traversal** | ❌ Disabled | LLM-based relationship inference |
| **Pattern matching** | ❌ Not supported | SQL queries on relationship tables |
| **Knowledge graph visualization** | ⚠️ Degraded | Show only extracted entities, no topology |
| **Complex relationship analysis** | ❌ Limited | Single-hop queries only |

### Performance Implications

**Without AGE:**
- Entity extraction still works (stored in SQL tables)
- Relationship metadata persisted (as JSONB in document status)
- Graph visualization shows entities but not relationships
- Pattern-based queries require application-level logic

**With AGE (if re-enabled):**
- Efficient multi-hop traversals
- Native Cypher query optimization
- Complex pattern matching
- Better knowledge graph visualization

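
To make "application-level logic" concrete, the sketch below performs a bounded multi-hop traversal in Python over relationship rows, which is the kind of work Cypher would otherwise do inside the database. The row format `(source, target)` and the function name `related_within` are assumptions for illustration; the actual relationship storage in LightRAG may differ.

```python
from collections import deque


def related_within(rows, start, max_hops):
    """Breadth-first search over (source, target) rows, up to max_hops edges away.

    Returns a dict mapping each reachable entity to its hop distance from start.
    """
    adjacency = {}
    for source, target in rows:
        # Treat relationships as undirected for discovery purposes.
        adjacency.setdefault(source, set()).add(target)
        adjacency.setdefault(target, set()).add(source)

    distances = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if distances[node] == max_hops:
            continue  # do not expand beyond the hop limit
        for neighbour in adjacency.get(node, ()):
            if neighbour not in distances:
                distances[neighbour] = distances[node] + 1
                queue.append(neighbour)

    del distances[start]
    return distances


# Hypothetical rows as they might be read from a relationship table.
rows = [("Ada", "Acme"), ("Acme", "Berlin"), ("Berlin", "Germany")]
two_hops = related_within(rows, "Ada", max_hops=2)
print(two_hops)
```

Each call loads the rows into memory and walks them in the application, so cost grows with the size of the relationship set; this is the trade-off against native multi-hop traversal listed under "With AGE".
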
### Recovery Path

To re-enable AGE in an existing deployment:

```bash
# 1. Build and install AGE inside the running PostgreSQL container
#    (check out the AGE release that matches PostgreSQL 15 before running make)
docker exec lightrag-postgres bash -c '
  apt-get update &&
  apt-get install -y git build-essential flex bison postgresql-server-dev-15 &&
  cd /tmp && git clone https://github.com/apache/incubator-age.git &&
  cd incubator-age && make && make install'

# 2. Create extension in database
docker exec lightrag-postgres psql -U lightrag -d lightrag_multitenant \
  -c "CREATE EXTENSION age;"

# 3. Update init-postgres.sql to include:
#      CREATE EXTENSION IF NOT EXISTS "age";

# 4. Restart API container (exception handling already in place)
docker restart lightrag-api
```

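
After following the recovery steps, it can be useful to confirm from the application side that the extension is actually present. The sketch below queries the standard `pg_extension` catalog through a connection object exposing an asyncpg-style `fetchval` coroutine. `FakeConnection`, `age_version`, and the version string are placeholders so the example runs without a database.

```python
import asyncio

# pg_extension is a standard PostgreSQL catalog listing installed extensions.
AGE_VERSION_QUERY = "SELECT extversion FROM pg_extension WHERE extname = 'age'"


class FakeConnection:
    """Test double mimicking fetchval(); returns a preset value."""

    def __init__(self, version):
        self._version = version

    async def fetchval(self, query):
        return self._version


async def age_version(connection):
    """Return the installed AGE version string, or None if AGE is not installed."""
    return await connection.fetchval(AGE_VERSION_QUERY)


# "1.5.0" is an arbitrary placeholder, not a claim about any specific release.
installed = asyncio.run(age_version(FakeConnection("1.5.0")))
missing = asyncio.run(age_version(FakeConnection(None)))
print(installed, missing)
```
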
## Architectural Implications

### Current Architecture (AGE Disabled)

```
PostgreSQL
├─ PGKVStorage: Key-value metadata
├─ PGVectorStorage: pgVector embeddings ✅ ACTIVE
├─ PGGraphStorage: Entity relationships (SQL fallback)
└─ PGDocStatusStorage: Document processing status
```

### Alternative Architectures

**Option 1: Neo4j Integration** (graph-focused)
```
PostgreSQL          Neo4j
├─ pgvector         ├─ Full graph DB
└─ Metadata         └─ Cypher queries
```

**Option 2: Memgraph Integration** (lightweight graph)
```
PostgreSQL          Memgraph
├─ pgvector         ├─ Memory-optimized
└─ Metadata         └─ Graph queries
```

**Option 3: AGE Re-enabled** (current stack, with AGE enabled in the future)
```
PostgreSQL (All-in-one)
├─ pgvector: embeddings ✅
├─ AGE: graph DB ⏳
└─ Metadata: standard tables ✅
```

## Technical References

### PostgreSQL Graph Extensions Landscape

| Extension | Focus | Maturity | License |
|-----------|-------|----------|---------|
| **AGE** | Cypher graphs | Apache top-level project | Apache 2.0 |
| **PostGIS** | Spatial data | Stable | GPLv2 |
| **pggraph** | General graphs | Archived | MIT |
| **GraphQL** | API layer | Stable | Apache 2.0 |

### Related Documentation

- [PostgreSQL Extension Development](https://www.postgresql.org/docs/15/extend.html)
- [pgVector Documentation](https://github.com/pgvector/pgvector)
- [GQL Graph Query Language Standard (ISO/IEC 39075)](https://www.iso.org/standard/76120.html)
- [OpenCypher Language Reference](https://s3.amazonaws.com/artifacts.opencypher.org/openCypher9.pdf)

## Recommendations

### For Development/Testing
1. **Keep AGE disabled** - faster iteration, smaller images
2. **Use vector-based retrieval** - sufficient for most use cases
3. **Add Neo4j as optional sidecar** - if graph analysis is needed

### For Production Deployment
1. **Evaluate AGE vs Neo4j** based on:
   - Query complexity requirements
   - Scale (nodes/edges count)
   - Response time constraints
   - Infrastructure overhead tolerance

2. **If AGE needed:**
   - Build custom PostgreSQL image with AGE pre-installed
   - Use multi-stage builds to minimize final image size
   - Cache built layers in registry

3. **If AGE not needed:**
   - Current architecture is sufficient
   - Implement relationship queries in application layer
   - Use pgVector for semantic retrieval exclusively

## Summary

AGE provides powerful graph query capabilities but introduces deployment complexity in containerized environments. The decision to disable AGE in LightRAG's Docker deployment prioritizes **simplicity and startup speed** while maintaining **graceful error handling** for future re-enablement. The current architecture relies on pgVector for semantic retrieval and PostgreSQL for entity metadata, which covers the majority of RAG use cases without requiring a dedicated graph database.

---

**Last Updated:** November 20, 2025
**Status:** Implemented & Tested
**Related Files:**
- `lightrag/kg/postgres_impl.py` (exception handling)
- `starter/docker-compose.yml` (deployment config)
- `starter/init-postgres.sql` (schema initialization)