LightRAG/docker/postgres-age-vector/Dockerfile
clssck 48c7732edc feat: add automatic entity resolution with 3-layer matching
Implement automatic entity resolution to prevent duplicate nodes in the
knowledge graph. The system uses a 3-layer approach:

1. Case-insensitive exact matching (free, instant)
2. Fuzzy string matching >85% threshold (free, instant)
3. Vector similarity + LLM verification (for acronyms/synonyms)
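
The first two layers can be illustrated with a minimal sketch. This is not the code in lightrag/entity_resolution/: the function name `resolve_name` is hypothetical, and the standard-library `difflib.SequenceMatcher` is swapped in as the fuzzy scorer, which may differ from what the module uses. Layer 3 (vector similarity + LLM verification) is left out.

```python
from difflib import SequenceMatcher

# Assumed to mirror ENTITY_RESOLUTION_FUZZY_THRESHOLD from env.example.
FUZZY_THRESHOLD = 0.85


def resolve_name(name, known, threshold=FUZZY_THRESHOLD):
    """Return the known entity that `name` resolves to, or None.

    Hypothetical helper covering layers 1 and 2 only.
    """
    folded = name.casefold()

    # Layer 1: case-insensitive exact match.
    for candidate in known:
        if candidate.casefold() == folded:
            return candidate

    # Layer 2: fuzzy string match; keep the best-scoring candidate.
    best, best_score = None, 0.0
    for candidate in known:
        score = SequenceMatcher(None, folded, candidate.casefold()).ratio()
        if score > best_score:
            best, best_score = candidate, score

    # Accept only scores strictly above the threshold (">85%").
    return best if best_score > threshold else None
```

A name that falls through both layers (a `None` result here) would be the case handed to the vector + LLM layer.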

Key features:
- Pre-resolution phase prevents race conditions in parallel processing
- Numeric suffix detection blocks false matches (IL-4 ≠ IL-13)
- PostgreSQL alias cache for fast lookups on subsequent ingestion
- Configurable thresholds via environment variables
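
The numeric suffix check could look roughly like the sketch below. It assumes that "numeric suffix" means a trailing run of digits; the helper names `numeric_suffix` and `suffixes_conflict` are hypothetical and the actual rule in the module may be broader.

```python
import re

# Trailing digits, optionally followed by whitespace (assumed definition).
_TRAILING_NUMBER = re.compile(r"(\d+)\s*$")


def numeric_suffix(name):
    """Return the trailing digit run of `name` as a string, or None."""
    match = _TRAILING_NUMBER.search(name)
    return match.group(1) if match else None


def suffixes_conflict(a, b):
    """True when both names end in digits and those digits differ."""
    suffix_a, suffix_b = numeric_suffix(a), numeric_suffix(b)
    return suffix_a is not None and suffix_b is not None and suffix_a != suffix_b
```

Running such a guard before the fuzzy comparison keeps pairs like IL-4 and IL-13 apart even when their string similarity is high.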

Bug fixes included:
- Fix fuzzy matching false positives for numbered entities
- Fix alias cache not being populated (missing db parameter)
- Skip entity_aliases table from generic id index creation

New files:
- lightrag/entity_resolution/ - Core resolution module
- tests/test_entity_resolution/ - Unit tests
- docker/postgres-age-vector/ - Custom PG image with pgvector + AGE
- docker-compose.test.yml - Integration test environment

Configuration (env.example):
- ENTITY_RESOLUTION_ENABLED=true
- ENTITY_RESOLUTION_FUZZY_THRESHOLD=0.85
- ENTITY_RESOLUTION_VECTOR_THRESHOLD=0.5
- ENTITY_RESOLUTION_MAX_CANDIDATES=3
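
As a sketch of how these variables might be read: the class `EntityResolutionSettings` and the function `load_settings` are hypothetical, and using the env.example values as fallbacks is an assumption, not the module's confirmed defaults.

```python
import os
from dataclasses import dataclass


@dataclass(frozen=True)
class EntityResolutionSettings:
    # Fallbacks copy env.example; the real defaults may differ.
    enabled: bool = True
    fuzzy_threshold: float = 0.85
    vector_threshold: float = 0.5
    max_candidates: int = 3


def load_settings(env=None):
    """Build settings from a mapping of environment variables."""
    env = os.environ if env is None else env
    truthy = {"1", "true", "yes", "on"}
    return EntityResolutionSettings(
        enabled=env.get("ENTITY_RESOLUTION_ENABLED", "true").strip().lower() in truthy,
        fuzzy_threshold=float(env.get("ENTITY_RESOLUTION_FUZZY_THRESHOLD", "0.85")),
        vector_threshold=float(env.get("ENTITY_RESOLUTION_VECTOR_THRESHOLD", "0.5")),
        max_candidates=int(env.get("ENTITY_RESOLUTION_MAX_CANDIDATES", "3")),
    )
```

Passing a plain dict instead of `os.environ` makes the parsing easy to exercise in unit tests.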
2025-11-27 15:35:02 +01:00


# Start from pgvector image (has vector extension pre-built correctly)
FROM pgvector/pgvector:pg17
# Install build dependencies for AGE
RUN apt-get update && apt-get install -y \
    build-essential \
    git \
    postgresql-server-dev-17 \
    libreadline-dev \
    zlib1g-dev \
    flex \
    bison \
    && rm -rf /var/lib/apt/lists/*
# Install Apache AGE 1.6.0 for PG17
RUN cd /tmp \
    && git clone --branch release/PG17/1.6.0 https://github.com/apache/age.git \
    && cd age \
    && make \
    && make install \
    && rm -rf /tmp/age
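# Add initialization scripts that create the extensions on first startup.
# Note: SET search_path applies only to the psql session that runs the init
# script; it does not persist, so clients still need to set search_path themselves.
RUN echo "CREATE EXTENSION IF NOT EXISTS vector;" > /docker-entrypoint-initdb.d/01-vector.sql \
    && echo "CREATE EXTENSION IF NOT EXISTS age;" > /docker-entrypoint-initdb.d/02-age.sql \
    && echo "SET search_path = ag_catalog, public;" >> /docker-entrypoint-initdb.d/02-age.sql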