Commit graph

289 commits

Author SHA1 Message Date
yangdx
d6019c82af Add CASCADE to AGE extension creation in PostgreSQL implementation
- Add CASCADE option to CREATE EXTENSION
- Ensure dependencies are installed
- Fix potential AGE setup issues
2025-12-02 00:17:41 +08:00
yangdx
dbae327a17 Merge branch 'main' into dev-postgres-vchordrq 2025-11-18 22:13:27 +08:00
yangdx
3096f844fb fix(postgres): allow vchordrq.epsilon config when probes is empty
Previously, configure_vchordrq would fail silently when probes was empty
(the default), preventing epsilon from being configured. Now each parameter
is handled independently with conditional execution, and configuration
errors fail-fast instead of being swallowed.

This fixes the documented epsilon setting being impossible to use in the
default configuration.
2025-11-18 21:58:36 +08:00
wmsnp
d07023c962
feat(postgres_impl): add vchordrq vector index support and unify vector index creation logic 2025-11-18 11:45:16 +08:00
yangdx
926960e957 Refactor workspace handling to use default workspace and namespace locks
- Remove DB-specific workspace configs
- Add default workspace auto-setting
- Replace global locks with namespace locks
- Simplify pipeline status management
- Remove redundant graph DB locking
2025-11-17 12:54:33 +08:00
yangdx
3276b7a49d Fix linting 2025-11-06 20:48:51 +08:00
yangdx
155f59759b Fix node ID normalization and improve batch operation consistency
• Remove premature ID normalization
• Add lookup mapping for node resolution
• Filter results by requested nodes only
• Improve error logging with workspace
2025-11-06 20:34:53 +08:00
yangdx
807d2461d3 Remove unused chunk-based node/edge retrieval methods 2025-11-06 18:17:10 +08:00
yangdx
a97e5dad4c Optimize PostgreSQL graph queries to avoid Cypher overhead and complexity
• Replace Cypher with native SQL queries
• Fix O(N²) to O(E) performance issue
• Add error handling for parse failures
• Use direct table access pattern
• Eliminate Cartesian product joins
2025-10-25 14:37:18 +08:00
Daniel.y
907204714b
Merge pull request #2237 from yrangana/feat/optimize-postgres-initialization
Optimize PostgreSQL initialization performance
2025-10-21 22:17:46 +08:00
Yasiru Rangana
2f22336ace Optimize PostgreSQL initialization performance
- Batch index existence checks into single query (16+ queries -> 1 query)
- Batch timestamp column checks into single query (8 queries -> 1 query)
- Batch field length checks into single query (5 queries -> 1 query)

Performance improvement: ~70-80% faster initialization (35s -> 5-10s)

Key optimizations:
1. check_tables(): Use ANY($1) to check all indexes at once
2. _migrate_timestamp_columns(): Batch all column type checks
3. _migrate_field_lengths(): Batch all field definition checks

All changes are backward compatible with no schema or API changes.
Reduces database round-trips by batching information_schema queries.
2025-10-21 01:09:48 +11:00
yangdx
dc62c78f98 Add entity/relation chunk tracking with configurable source ID limits
- Add entity_chunks & relation_chunks storage
- Implement KEEP/FIFO limit strategies
- Update env.example with new settings
- Add migration for chunk tracking data
- Support all KV storage
2025-10-20 15:24:15 +08:00
yangdx
813f4af9d7 Fix linting 2025-10-18 11:44:48 +08:00
Lucky Verma
917e41aa78 Refactor SQL queries and improve input handling in PGKVStorage and PGDocStatusStorage 2025-10-17 15:40:32 -05:00
yangdx
9be22dd666 Preserve ordering in get_by_ids methods across all storage implementations
- Fix result ordering in vector stores
- Update KV storage get_by_ids methods
- Maintain order in doc status storage
- Return None for missing IDs
2025-10-11 12:37:59 +08:00
yangdx
b3ed264707 Refactor PostgreSQL retry config to use centralized configuration
• Move retry config to ClientManager
• Remove env var parsing from PostgreSQLDB
• Add config params to test setup
2025-10-10 03:44:13 +08:00
yangdx
e758204ab2 Add PostgreSQL connection retry mechanism with comprehensive error handling
• Implement connection retry with backoff
• Add transient error detection
• Pool management with timeout guards
2025-10-10 03:06:01 +08:00
yangdx
f2c0b41e78 Make PostgreSQL statement_cache_size configuration optional
• Remove forced int conversion
• Allow None values for cache size
• Add conditional parameter setting
2025-10-07 22:57:21 +08:00
kevinnkansah
fdcb034da0 chore: distinguish settings 2025-10-06 12:01:40 +02:00
kevinnkansah
22a7b482c5 fix: renamed PostGreSQL options env variable and allowed LRU cache to be an optional env variable 2025-10-06 11:56:09 +02:00
kevinnkansah
d8a9617c0e fix: fix: asyncpg bouncer connection pool error
Prepared statement caching is disabled by setting
`statement_cache_size=0` in the `asyncpg` connection pool parameters.
This is necessary to prevent
`asyncpg.exceptions.InvalidSQLStatementNameError` when using
transaction-level connection poolers like Supabase Supavisor or
pgbouncer, which do not support prepared statements.
2025-10-06 00:36:25 +02:00
kevinnkansah
108cdbe133 feat: add options for PostGres connection 2025-10-05 23:29:04 +02:00
yangdx
457d51952e Add doc_name field to full docs storage
- Store file_path in full_docs storage
- Update PostgreSQL implementation by map file_path to doc_name
- Other storage implementation automatically handles the new field
2025-10-05 11:44:27 +08:00
yangdx
2adb8efdc7 Add duplicate document detection and skip processed files in scanning
- Add get_doc_by_file_path to all storages
- Skip processed files in scan operation
- Check duplicates in upload endpoints
- Check duplicates in text insert APIs
- Return status info in duplicate responses
2025-09-23 17:30:54 +08:00
yangdx
6b3a341977 Increase default PostgreSQL max connections from 20 to 50 2025-09-22 18:11:28 +08:00
yangdx
3296bcb553 Add high-performance label search methods to PostgreSQL graph storage
- Add get_popular_labels() method
- Add search_labels() with fuzzy matching
- Use native SQL for better performance
- Include proper scoring and ranking
2025-09-20 12:39:53 +08:00
Matt23-star
24cb11f3f5 style: ruff-format 2025-08-29 21:09:14 -07:00
Hao Feng
b860ffe510
Merge branch 'main' into main 2025-08-29 21:03:37 -07:00
yangdx
03d0fa3014 perf: add optional query_embedding parameter to avoid redundant embedding calls 2025-08-29 18:15:45 +08:00
yangdx
a923d378dd Remove deprecated ID-based filtering from vector storage queries
- Remove ids param from QueryParam
- Simplify BaseVectorStorage.query signature
- Update all vector storage implementations
- Streamline PostgreSQL query templates
- Remove ID filtering from operate.py calls
2025-08-29 17:06:48 +08:00
Matt23-star
aa1ef3f053 feat: optimize database query methods for improved performance and readability 2025-08-28 16:18:15 -07:00
Matt23-star
9804a1885b feat: refactor parameter handling in database queries to use lists for improved consistency 2025-08-28 16:17:35 -07:00
Matt23-star
015e9ae3dd Merge branch 'main' into feature/optimization
# Conflicts:
#	lightrag/kg/postgres_impl.py
2025-08-20 16:05:38 +08:00
Matt23-star
874ddda605 feat: remove unused parameter from query methods across multiple implementations 2025-08-20 15:59:05 +08:00
Matt23-star
60564cf453 fix: correct parameter usage in database query for improved reliability 2025-08-17 13:50:41 +08:00
yangdx
185b576101 Fix parameter reference and apply code formatting improvements 2025-08-17 04:02:43 +08:00
Matt23-star
a0593ec1c9 feat: enhance query performance by restructuring relationships, entities, and chunks retrieval in PostgreSQL.
Fixed: duplicate items query
2025-08-16 22:49:54 +08:00
Matt23-star
6a7e3092ea feat: optimize node and edge queries in PostgreSQL. query tables Directly 2025-08-16 22:37:48 +08:00
Matt23-star
a7da48e05c feat: add batch size parameter to node and edge retrieval methods 2025-08-16 22:35:22 +08:00
yangdx
7a7385a200 Add efficient vector retrieval by IDs to PGVectorStorage 2025-08-15 16:51:41 +08:00
yangdx
0b22ffb252 Refac: uniformly protected with the get_data_init_lock for all storage initializations 2025-08-14 03:46:19 +08:00
yangdx
fc8ca1a706 Fix: add muti-process lock for initialize and drop method for all storage 2025-08-12 04:25:09 +08:00
yangdx
ca00b9c8ee Fix: Resolve workspace isolation problem for PostgreSQL with multiple LightRAG instances 2025-08-12 01:27:05 +08:00
yangdx
16c9a81f4c feat: support config.ini for PostgreSQL vector index settings
- Add support for reading vector_index_type, hnsw_m, hnsw_ef, and ivfflat_lists from config.ini
- Maintain backward compatibility with environment variables
- Update config.ini.example with new PostgreSQL vector index options
- Follow existing configuration priority: env vars > config.ini > defaults
2025-08-08 02:55:49 +08:00
yangdx
f38e10559e Update PostgreSQL vector index configuration
- Remove FLAT index support
- Standardize on HNSW as default
- Add dimension validation
- Improve error logging
- Clean up index creation code
2025-08-08 02:21:06 +08:00
Matt23-star
727ca43d3c feat: add vector index creation functionality for PostgreSQL 2025-08-07 23:07:18 +08:00
yangdx
cc1f7118e7 Remove deprecated cache_by_modes functionality from all storage 2025-08-05 23:20:26 +08:00
yangdx
8294d6d1b7 Remove deprecated mode field from LLM cache schema
- Drop mode column from LLM cache table
- Update primary key to exclude mode
- Remove mode from all SQL queries
- Deprecate mode-related methods
- Update schema migration logic
2025-08-05 23:18:54 +08:00
yangdx
0463963520 fix: include all query parameters in LLM cache hash key generation
- Add missing query parameters (top_k, enable_rerank, max_tokens, etc.) to cache key generation in kg_query, naive_query, and extract_keywords_only functions
- Add queryparam field to CacheData structure and PostgreSQL storage for debugging
- Update PostgreSQL schema with automatic migration for queryparam JSONB column
- Prevent incorrect cache hits between queries with different parameters

Fixes issue where different query parameters incorrectly shared the same cached results.
2025-08-05 18:03:10 +08:00
yangdx
7b3a9c09ca Fix: add missing colume to LLM cache of PostgreSQL implementation 2025-08-04 11:12:59 +08:00