Commit graph

293 commits

Author SHA1 Message Date
clssck
8d099fc3ac chore: sync with upstream HKUDS/LightRAG
- Add KaTeX extensions (mhchem for chemistry, copy-tex for copying)
- Add CASCADE to AGE extension for PostgreSQL
- Remove future dependency, replace passlib with bcrypt
- Fix Jina embedding configuration and provider defaults
- Update gunicorn help text and bump API version to 0258
- Documentation and README updates
2025-12-01 21:30:19 +01:00
clssck
663ada943a chore: add citation system and enhance RAG UI components
Add citation tracking and display system across backend and frontend components.
Backend changes include citation.py for document attribution, enhanced query routes
with citation metadata, improved prompt templates, and PostgreSQL schema updates.
Frontend includes CitationMarker component, HoverCard UI, QuerySettings refinements,
and ChatMessage enhancements for displaying document sources. Update dependencies
and docker-compose test configuration for improved development workflow.
2025-12-01 17:50:00 +01:00
clssck
43af31f888 feat: add db_degree visibility and orphan connection UI
Graph Connectivity Awareness:
- Add db_degree property to all KG implementations (NetworkX, Postgres, Neo4j, Mongo, Memgraph)
- Show database degree vs visual degree in node panel with amber badge
- Add visual indicator (amber border) for nodes with hidden connections
- Add "Load X hidden connection(s)" button to expand hidden neighbors
- Add configurable "Expand Depth" setting (1-5) in graph settings
- Use global maxNodes setting for node expansion consistency

Orphan Connection UI:
- Add OrphanConnectionDialog component for manual orphan entity connection
- Add OrphanConnectionControl button in graph sidebar
- Expose /graph/orphans/connect API endpoint for frontend use

Backend Improvements:
- Add get_orphan_entities() and connect_orphan_entities() to base storage
- Add orphan connection configuration parameters
- Improve entity extraction with relationship density requirements

Frontend:
- Add graphExpandDepth and graphIncludeOrphans to settings store
- Add min_degree and include_orphans graph filtering parameters
- Update translations (en.json, zh.json)
2025-11-29 21:08:07 +01:00
clssck
d2c9e6e2ec test(lightrag): add orphan connection feature with quality validation tests
Implement automatic orphan entity connection system that identifies entities with
no relationships and creates meaningful connections via vector similarity + LLM
validation. This improves knowledge graph connectivity and retrieval quality.
Changes:
- Add orphan connection configuration parameters (thresholds, cross-connect settings)
- Implement aconnect_orphan_entities() method with 4-step validation pipeline
- Add SQL templates for efficient orphan and candidate entity queries
- Create POST /graph/orphans/connect API endpoint with configurable parameters
- Add orphan connection validation prompt for LLM-based relationship verification
- Include relationship density requirement in extraction prompts to prevent orphans
- Update docker-compose.test.yml with optimized extraction parameters
- Add quality validation test suite (run_quality_tests.py) for retrieval evaluation
- Add unit test framework (test_orphan_connection_quality.py) with test cases
- Enable auto-run of orphan connection after document processing
2025-11-28 18:23:30 +01:00
clssck
48c7732edc feat: add automatic entity resolution with 3-layer matching
Implement automatic entity resolution to prevent duplicate nodes in the
knowledge graph. The system uses a 3-layer approach:

1. Case-insensitive exact matching (free, instant)
2. Fuzzy string matching >85% threshold (free, instant)
3. Vector similarity + LLM verification (for acronyms/synonyms)

Key features:
- Pre-resolution phase prevents race conditions in parallel processing
- Numeric suffix detection blocks false matches (IL-4 ≠ IL-13)
- PostgreSQL alias cache for fast lookups on subsequent ingestion
- Configurable thresholds via environment variables

Bug fixes included:
- Fix fuzzy matching false positives for numbered entities
- Fix alias cache not being populated (missing db parameter)
- Skip entity_aliases table from generic id index creation

New files:
- lightrag/entity_resolution/ - Core resolution module
- tests/test_entity_resolution/ - Unit tests
- docker/postgres-age-vector/ - Custom PG image with pgvector + AGE
- docker-compose.test.yml - Integration test environment

Configuration (env.example):
- ENTITY_RESOLUTION_ENABLED=true
- ENTITY_RESOLUTION_FUZZY_THRESHOLD=0.85
- ENTITY_RESOLUTION_VECTOR_THRESHOLD=0.5
- ENTITY_RESOLUTION_MAX_CANDIDATES=3
2025-11-27 15:35:02 +01:00
yangdx
dbae327a17 Merge branch 'main' into dev-postgres-vchordrq 2025-11-18 22:13:27 +08:00
yangdx
3096f844fb fix(postgres): allow vchordrq.epsilon config when probes is empty
Previously, configure_vchordrq would fail silently when probes was empty
(the default), preventing epsilon from being configured. Now each parameter
is handled independently with conditional execution, and configuration
errors fail-fast instead of being swallowed.

This fixes the documented epsilon setting being impossible to use in the
default configuration.
2025-11-18 21:58:36 +08:00
wmsnp
d07023c962
feat(postgres_impl): add vchordrq vector index support and unify vector index creation logic 2025-11-18 11:45:16 +08:00
yangdx
926960e957 Refactor workspace handling to use default workspace and namespace locks
- Remove DB-specific workspace configs
- Add default workspace auto-setting
- Replace global locks with namespace locks
- Simplify pipeline status management
- Remove redundant graph DB locking
2025-11-17 12:54:33 +08:00
yangdx
3276b7a49d Fix linting 2025-11-06 20:48:51 +08:00
yangdx
155f59759b Fix node ID normalization and improve batch operation consistency
• Remove premature ID normalization
• Add lookup mapping for node resolution
• Filter results by requested nodes only
• Improve error logging with workspace
2025-11-06 20:34:53 +08:00
yangdx
807d2461d3 Remove unused chunk-based node/edge retrieval methods 2025-11-06 18:17:10 +08:00
yangdx
a97e5dad4c Optimize PostgreSQL graph queries to avoid Cypher overhead and complexity
• Replace Cypher with native SQL queries
• Fix O(N²) to O(E) performance issue
• Add error handling for parse failures
• Use direct table access pattern
• Eliminate Cartesian product joins
2025-10-25 14:37:18 +08:00
Daniel.y
907204714b
Merge pull request #2237 from yrangana/feat/optimize-postgres-initialization
Optimize PostgreSQL initialization performance
2025-10-21 22:17:46 +08:00
Yasiru Rangana
2f22336ace Optimize PostgreSQL initialization performance
- Batch index existence checks into single query (16+ queries -> 1 query)
- Batch timestamp column checks into single query (8 queries -> 1 query)
- Batch field length checks into single query (5 queries -> 1 query)

Performance improvement: ~70-80% faster initialization (35s -> 5-10s)

Key optimizations:
1. check_tables(): Use ANY($1) to check all indexes at once
2. _migrate_timestamp_columns(): Batch all column type checks
3. _migrate_field_lengths(): Batch all field definition checks

All changes are backward compatible with no schema or API changes.
Reduces database round-trips by batching information_schema queries.
2025-10-21 01:09:48 +11:00
yangdx
dc62c78f98 Add entity/relation chunk tracking with configurable source ID limits
- Add entity_chunks & relation_chunks storage
- Implement KEEP/FIFO limit strategies
- Update env.example with new settings
- Add migration for chunk tracking data
- Support all KV storage
2025-10-20 15:24:15 +08:00
yangdx
813f4af9d7 Fix linting 2025-10-18 11:44:48 +08:00
Lucky Verma
917e41aa78 Refactor SQL queries and improve input handling in PGKVStorage and PGDocStatusStorage 2025-10-17 15:40:32 -05:00
yangdx
9be22dd666 Preserve ordering in get_by_ids methods across all storage implementations
- Fix result ordering in vector stores
- Update KV storage get_by_ids methods
- Maintain order in doc status storage
- Return None for missing IDs
2025-10-11 12:37:59 +08:00
yangdx
b3ed264707 Refactor PostgreSQL retry config to use centralized configuration
• Move retry config to ClientManager
• Remove env var parsing from PostgreSQLDB
• Add config params to test setup
2025-10-10 03:44:13 +08:00
yangdx
e758204ab2 Add PostgreSQL connection retry mechanism with comprehensive error handling
• Implement connection retry with backoff
• Add transient error detection
• Pool management with timeout guards
2025-10-10 03:06:01 +08:00
yangdx
f2c0b41e78 Make PostgreSQL statement_cache_size configuration optional
• Remove forced int conversion
• Allow None values for cache size
• Add conditional parameter setting
2025-10-07 22:57:21 +08:00
kevinnkansah
fdcb034da0 chore: distinguish settings 2025-10-06 12:01:40 +02:00
kevinnkansah
22a7b482c5 fix: renamed PostGreSQL options env variable and allowed LRU cache to be an optional env variable 2025-10-06 11:56:09 +02:00
kevinnkansah
d8a9617c0e fix: fix: asyncpg bouncer connection pool error
Prepared statement caching is disabled by setting
`statement_cache_size=0` in the `asyncpg` connection pool parameters.
This is necessary to prevent
`asyncpg.exceptions.InvalidSQLStatementNameError` when using
transaction-level connection poolers like Supabase Supavisor or
pgbouncer, which do not support prepared statements.
2025-10-06 00:36:25 +02:00
kevinnkansah
108cdbe133 feat: add options for PostGres connection 2025-10-05 23:29:04 +02:00
yangdx
457d51952e Add doc_name field to full docs storage
- Store file_path in full_docs storage
- Update PostgreSQL implementation by map file_path to doc_name
- Other storage implementation automatically handles the new field
2025-10-05 11:44:27 +08:00
yangdx
2adb8efdc7 Add duplicate document detection and skip processed files in scanning
- Add get_doc_by_file_path to all storages
- Skip processed files in scan operation
- Check duplicates in upload endpoints
- Check duplicates in text insert APIs
- Return status info in duplicate responses
2025-09-23 17:30:54 +08:00
yangdx
6b3a341977 Increase default PostgreSQL max connections from 20 to 50 2025-09-22 18:11:28 +08:00
yangdx
3296bcb553 Add high-performance label search methods to PostgreSQL graph storage
- Add get_popular_labels() method
- Add search_labels() with fuzzy matching
- Use native SQL for better performance
- Include proper scoring and ranking
2025-09-20 12:39:53 +08:00
Matt23-star
24cb11f3f5 style: ruff-format 2025-08-29 21:09:14 -07:00
Hao Feng
b860ffe510
Merge branch 'main' into main 2025-08-29 21:03:37 -07:00
yangdx
03d0fa3014 perf: add optional query_embedding parameter to avoid redundant embedding calls 2025-08-29 18:15:45 +08:00
yangdx
a923d378dd Remove deprecated ID-based filtering from vector storage queries
- Remove ids param from QueryParam
- Simplify BaseVectorStorage.query signature
- Update all vector storage implementations
- Streamline PostgreSQL query templates
- Remove ID filtering from operate.py calls
2025-08-29 17:06:48 +08:00
Matt23-star
aa1ef3f053 feat: optimize database query methods for improved performance and readability 2025-08-28 16:18:15 -07:00
Matt23-star
9804a1885b feat: refactor parameter handling in database queries to use lists for improved consistency 2025-08-28 16:17:35 -07:00
Matt23-star
015e9ae3dd Merge branch 'main' into feature/optimization
# Conflicts:
#	lightrag/kg/postgres_impl.py
2025-08-20 16:05:38 +08:00
Matt23-star
874ddda605 feat: remove unused parameter from query methods across multiple implementations 2025-08-20 15:59:05 +08:00
Matt23-star
60564cf453 fix: correct parameter usage in database query for improved reliability 2025-08-17 13:50:41 +08:00
yangdx
185b576101 Fix parameter reference and apply code formatting improvements 2025-08-17 04:02:43 +08:00
Matt23-star
a0593ec1c9 feat: enhance query performance by restructuring relationships, entities, and chunks retrieval in PostgreSQL.
Fixed: duplicate items query
2025-08-16 22:49:54 +08:00
Matt23-star
6a7e3092ea feat: optimize node and edge queries in PostgreSQL. query tables Directly 2025-08-16 22:37:48 +08:00
Matt23-star
a7da48e05c feat: add batch size parameter to node and edge retrieval methods 2025-08-16 22:35:22 +08:00
yangdx
7a7385a200 Add efficient vector retrieval by IDs to PGVectorStorage 2025-08-15 16:51:41 +08:00
yangdx
0b22ffb252 Refac: uniformly protected with the get_data_init_lock for all storage initializations 2025-08-14 03:46:19 +08:00
yangdx
fc8ca1a706 Fix: add muti-process lock for initialize and drop method for all storage 2025-08-12 04:25:09 +08:00
yangdx
ca00b9c8ee Fix: Resolve workspace isolation problem for PostgreSQL with multiple LightRAG instances 2025-08-12 01:27:05 +08:00
yangdx
16c9a81f4c feat: support config.ini for PostgreSQL vector index settings
- Add support for reading vector_index_type, hnsw_m, hnsw_ef, and ivfflat_lists from config.ini
- Maintain backward compatibility with environment variables
- Update config.ini.example with new PostgreSQL vector index options
- Follow existing configuration priority: env vars > config.ini > defaults
2025-08-08 02:55:49 +08:00
yangdx
f38e10559e Update PostgreSQL vector index configuration
- Remove FLAT index support
- Standardize on HNSW as default
- Add dimension validation
- Improve error logging
- Clean up index creation code
2025-08-08 02:21:06 +08:00
Matt23-star
727ca43d3c feat: add vector index creation functionality for PostgreSQL 2025-08-07 23:07:18 +08:00