# Apache AGE: Technical Analysis & LightRAG Implementation Decision ## Executive Summary Apache AGE (Graph Engine) is a PostgreSQL extension providing graph database capabilities within PostgreSQL. In the LightRAG multi-tenant Docker deployment, AGE support was disabled due to installation complexity in containerized environments, with graceful error handling implemented to prevent startup failures. ## What is Apache AGE? ### Overview Apache AGE is an extension for PostgreSQL that enables property graph database functionality using the **Cypher query language** (same as Neo4j). It allows PostgreSQL to function as a hybrid relational-graph database. **Official References:** - [Apache AGE GitHub Repository](https://github.com/apache/incubator-age) - [Apache AGE Documentation](https://age.apache.org/) - [Cypher Query Language Spec](https://s3.amazonaws.com/artifacts.opencypher.org/openCypher9.pdf) ### Key Characteristics | Aspect | Details | |--------|---------| | **Language** | Cypher (borrowed from Neo4j) | | **Model** | Property Graph (nodes, edges, labels, properties) | | **Query Syntax** | `SELECT * FROM cypher('graph_name', '...cypher_query...')` | | **Storage** | Native PostgreSQL tables with AGE schema | | **License** | Apache 2.0 | | **Maturity** | Active development (incubating project) | ### Core Functions ```sql -- Create graph SELECT create_graph('graph_name'); -- Execute Cypher queries SELECT * FROM cypher('graph_name', $$ MATCH (n:Label) WHERE n.property = 'value' RETURN n $$) AS (node agtype); -- Drop graph SELECT drop_graph('graph_name', true); ``` ## AGE in LightRAG Context ### Usage Pattern LightRAG uses AGE for **graph storage backend** (`PGGraphStorage` class in `/lightrag/kg/postgres_impl.py`): 1. **Entity-Relation Graph Storage**: Stores knowledge graph entities (nodes) and relationships (edges) 2. **Graph Name**: `chunk_entity_relation` - primary graph for semantic relationships 3. **Node Structure**: Entities with labels (Person, Organization, Location, etc.) 4. **Edge Types**: Semantic relationships between entities 5. **Query Operations**: - Entity discovery (finding all entities of a type) - Relationship traversal (finding connected entities) - Pattern matching (complex graph queries) ### Integration Points ```python # From postgres_impl.py line 227 await connection.execute(f"select create_graph('{graph_name}')") # Entity insertion example # Nodes stored as property graph vertices # Relations stored as property graph edges # Cypher queries enable efficient graph traversals ``` ### Data Flow ``` Document Input ↓ Entity Extraction (LLM) ↓ AGE Graph Storage ├─ Nodes: Extracted entities ├─ Edges: Entity relationships └─ Labels: Entity types ↓ Graph Queries (Cypher) ↓ RAG Results (enhanced with graph context) ``` ## AGE vs pgVector: Complementary Technologies ### Comparison Table | Aspect | pgVector | Apache AGE | |--------|----------|-----------| | **Purpose** | Vector similarity search | Graph relationships | | **Data Structure** | Embeddings (float arrays) | Property graphs (nodes/edges) | | **Query Type** | Similarity/semantic search | Pattern matching/traversal | | **Algorithm** | HNSW, IVFFlat indices | Graph algorithms | | **Use Case** | "Find semantically similar content" | "Find connected entities" | | **LightRAG Role** | Vector retrieval & chunking | Knowledge graph structure | ### Synergistic Usage in LightRAG ``` LightRAG Hybrid Approach: ├─ pgVector: "What documents are semantically similar?" │ └─ Chunk-level similarity search ├─ AGE Graph: "How are extracted entities related?" │ └─ Entity relationship mapping └─ Combined: "Get semantically similar content + its entity context" ``` ## Decision: Disabling AGE in Docker Deployment ### Problem Analysis **Installation Complexity:** - AGE requires compilation from source within PostgreSQL environment - Needs PostgreSQL development headers (`postgres.h`) - Pre-built `pgvector/pgvector:pg15` image lacks AGE compilation toolchain - Building custom image with both pgvector + AGE adds 200MB+ and significant build time **Docker Build Attempts:** 1. **Attempt 1**: Used `pgvector/pgvector:pg15-bookworm` - Error: pgvector extension not found 2. **Attempt 2**: Built custom image with AGE compilation ```dockerfile RUN git clone https://github.com/apache/incubator-age.git RUN make PG_CONFIG=/usr/lib/postgresql/15/bin/pg_config ``` - Error: `postgres.h` header files not available in slim base image - Resolution: Requires full PostgreSQL dev package (substantial image bloat) ### Solution Implemented **Graceful Degradation Strategy:** ```python # File: lightrag/kg/postgres_impl.py, line 233 except ( asyncpg.exceptions.UndefinedFunctionError, # AGE not available asyncpg.exceptions.InvalidSchemaNameError, asyncpg.exceptions.UniqueViolationError, ): pass # Silently continue without AGE ``` **Changes Made:** 1. Added `UndefinedFunctionError` exception handling in `configure_age()` method 2. Added exception catching in `execute()` method for AGE-specific SQL 3. System continues startup without graph functionality rather than failing **Why This Approach:** - ✅ Minimal image size (no custom PostgreSQL build) - ✅ Fast deployment (no AGE compilation) - ✅ Graceful degradation (app doesn't crash) - ✅ Easy to enable later (reinstall AGE extension, exceptions handled) - ✅ Development/demo-friendly ## Consequences of AGE Disablement ### Functional Impact | Feature | Status | Mitigation | |---------|--------|-----------| | **Entity relationship queries** | ❌ Unavailable | Use vector similarity + metadata | | **Graph traversal** | ❌ Disabled | LLM-based relationship inference | | **Pattern matching** | ❌ Not supported | SQL queries on relationship tables | | **Knowledge graph visualization** | ⚠️ Degraded | Show only extracted entities, no topology | | **Complex relationship analysis** | ❌ Limited | Single-hop queries only | ### Performance Implications **Without AGE:** - Entity extraction still works (stored in SQL tables) - Relationship metadata persisted (as JSONB in document status) - Graph visualization shows entities but not relationships - Pattern-based queries require application-level logic **With AGE (if re-enabled):** - Efficient multi-hop traversals - Native Cypher query optimization - Complex pattern matching - Better knowledge graph visualization ### Recovery Path To re-enable AGE in existing deployment: ```bash # 1. Install AGE extension in running PostgreSQL docker exec lightrag-postgres apt-get install -y postgresql-15-dev build-essential cd /tmp && git clone https://github.com/apache/incubator-age.git cd incubator-age && make && make install # 2. Create extension in database docker exec lightrag-postgres psql -U lightrag -d lightrag_multitenant \ -c "CREATE EXTENSION age;" # 3. Update init-postgres.sql to include: CREATE EXTENSION IF NOT EXISTS "age"; # 4. Restart API container (exception handling already in place) docker restart lightrag-api ``` ## Architectural Implications ### Current Architecture (AGE Disabled) ``` PostgreSQL ├─ PGKVStorage: Key-value metadata ├─ PGVectorStorage: pgVector embeddings ✅ ACTIVE ├─ PGGraphStorage: Entity relationships (SQL fallback) └─ PGDocStatusStorage: Document processing status ``` ### Alternative Architectures **Option 1: Neo4j Integration** (graph-focused) ``` PostgreSQL Neo4j ├─ pgvector ├─ Full graph DB ├─ Metadata └─ Cypher queries ``` **Option 2: Memgraph Integration** (lightweight graph) ``` PostgreSQL Memgraph ├─ pgvector ├─ Memory-optimized └─ Metadata └─ Graph queries ``` **Option 3: AGE Re-enabled** (current approach, future) ``` PostgreSQL (All-in-one) ├─ pgvector: embeddings ✅ ├─ AGE: graph DB ⏳ └─ Metadata: standard tables ✅ ``` ## Technical References ### PostgreSQL Graph Extensions Landscape | Extension | Focus | Maturity | License | |-----------|-------|----------|---------| | **AGE** | Cypher graphs | Incubating | Apache 2.0 | | **PostGIS** | Spatial data | Stable | GPLv2 | | **pggraph** | General graphs | Archived | MIT | | **GraphQL** | API layer | Stable | Apache 2.0 | ### Related Documentation - [PostgreSQL Extension Development](https://www.postgresql.org/docs/15/extend.html) - [pgVector Documentation](https://github.com/pgvector/pgvector) - [Property Graph Model (ISO/IEC 39075)](https://www.iso.org/standard/76120.html) - [OpenCypher Language Reference](https://s3.amazonaws.com/artifacts.opencypher.org/openCypher9.pdf) ## Recommendations ### For Development/Testing 1. **Keep AGE disabled** - faster iteration, smaller images 2. **Use vector-based retrieval** - sufficient for most use cases 3. **Add Neo4j as optional sidecar** - if graph analysis needed ### For Production Deployment 1. **Evaluate AGE vs Neo4j** based on: - Query complexity requirements - Scale (nodes/edges count) - Response time constraints - Infrastructure overhead tolerance 2. **If AGE needed:** - Build custom PostgreSQL image with AGE pre-installed - Use multi-stage builds to minimize final image size - Cache built layers in registry 3. **If AGE not needed:** - Current architecture is optimal - Implement relationship queries in application layer - Use pgVector for semantic retrieval exclusively ## Summary AGE provides powerful graph query capabilities but introduces deployment complexity in containerized environments. The decision to disable AGE in LightRAG's Docker deployment prioritizes **simplicity and startup speed** while maintaining **graceful error handling** for future re-enablement. The current architecture relies on pgVector for semantic retrieval and PostgreSQL for entity metadata, which covers the majority of RAG use cases without requiring a dedicated graph database. --- **Last Updated:** November 20, 2025 **Status:** Implemented & Tested **Related Files:** - `lightrag/kg/postgres_impl.py` (exception handling) - `starter/docker-compose.yml` (deployment config) - `starter/init-postgres.sql` (schema initialization)