LightRAG

Author	SHA1	Message	Date
clssck	69358d830d	test(lightrag,examples,api): comprehensive ruff formatting and type hints Format entire codebase with ruff and add type hints across all modules: - Apply ruff formatting to all Python files (121 files, 17K insertions) - Add type hints to function signatures throughout lightrag core and API - Update test suite with improved type annotations and docstrings - Add pyrightconfig.json for static type checking configuration - Create prompt_optimized.py and test_extraction_prompt_ab.py test files - Update ruff.toml and .gitignore for improved linting configuration - Standardize code style across examples, reproduce scripts, and utilities	2025-12-05 15:17:06 +01:00
yangdx	95e1fb1612	Remove final_namespace attribute for in-memory storage and use namespace in clean_llm_query_cache.py	2025-11-17 12:54:33 +08:00
yangdx	d54d0d55d9	Standardize empty workspace handling from "_" to "" across storage * Unify empty workspace behavior by changing workspace from "_" to "" * Fixed incorrect empty workspace detection in get_all_update_flags_status()	2025-11-17 12:54:33 +08:00
yangdx	fd486bc922	Refactor storage classes to use namespace instead of final_namespace	2025-11-17 12:54:33 +08:00
yangdx	926960e957	Refactor workspace handling to use default workspace and namespace locks - Remove DB-specific workspace configs - Add default workspace auto-setting - Replace global locks with namespace locks - Simplify pipeline status management - Remove redundant graph DB locking	2025-11-17 12:54:33 +08:00
yangdx	a08bc72635	Fix empty dict handling after JSON sanitization • Replace truthy checks with `is not None` • Handle empty dict edge case properly • Prevent data reload failures • Add comprehensive test coverage • Fix JsonKVStorage and DocStatusStorage	2025-11-17 12:54:32 +08:00
yangdx	f289cf6225	Optimize JSON write with fast/slow path to reduce memory usage - Fast path for clean data (no sanitization) - Slow path sanitizes during encoding - Reload shared memory after sanitization - Custom encoder avoids deep copies - Comprehensive test coverage	2025-11-17 12:54:32 +08:00
yangdx	dc62c78f98	Add entity/relation chunk tracking with configurable source ID limits - Add entity_chunks & relation_chunks storage - Implement KEEP/FIFO limit strategies - Update env.example with new settings - Add migration for chunk tracking data - Support all KV storage	2025-10-20 15:24:15 +08:00
yangdx	9be22dd666	Preserve ordering in get_by_ids methods across all storage implementations - Fix result ordering in vector stores - Update KV storage get_by_ids methods - Maintain order in doc status storage - Return None for missing IDs	2025-10-11 12:37:59 +08:00
yangdx	2adb8efdc7	Add duplicate document detection and skip processed files in scanning - Add get_doc_by_file_path to all storages - Skip processed files in scan operation - Check duplicates in upload endpoints - Check duplicates in text insert APIs - Return status info in duplicate responses	2025-09-23 17:30:54 +08:00
Albert Gil López	3a64b267cb	Merge upstream/main and resolve conflicts	2025-08-21 16:56:11 +00:00
Albert Gil López	f35963c020	feat: Add clear error messages for uninitialized storage - Add StorageNotInitializedError and PipelineNotInitializedError exceptions - Update JsonDocStatusStorage to raise clear errors when not initialized - Update JsonKVStorage to raise clear errors when not initialized - Error messages now include complete initialization instructions - Helps users understand and fix initialization issues quickly Addresses feedback from issue #1933 about improving error clarity	2025-08-19 06:41:52 +00:00
yangdx	61469c0a56	Add Chinese pinyin sorting support across document operations • Replace pyuca with centralized utils function • Add pinyin sort keys for file paths • Update MongoDB indexes with zh collation • Migrate existing indexes for compatibility • Support Chinese chars in Redis/JSON storage • Keep PostgreSQL sorting order controlled by Database Collate order	2025-08-17 12:45:48 +08:00
yangdx	095e0cbfa2	Refac: Add workspace information to all logger output for all storage types	2025-08-12 01:19:09 +08:00
yangdx	e00690b41b	Fix: workspace isolation problem for json KV storage - Use workspace+namespace as final_namespace identifier - Update all related storage operations - Maintain backward compatibility	2025-08-02 11:30:19 +08:00
yangdx	0eac1a883a	Feat: add file path sorting for document manager - Add file_path sorting support to all database backends (JSON, Redis, PostgreSQL, MongoDB) - Implement smart column header switching between "ID" and "File Name" based on display mode - Add automatic sort field switching when toggling between ID and file name display - Create composite indexes for workspace+file_path in PostgreSQL and MongoDB for better query performance - Update frontend to maintain sort state when switching display modes - Add internationalization support for "fileName" in English and Chinese locales This enhancement improves user experience by providing intuitive file-based sorting while maintaining performance through optimized database indexes.	2025-07-30 18:46:55 +08:00
yangdx	74eecc46e5	feat(pagination): Implement document list pagination backends and frontend UI - Add pagination support to BaseDocStatusStorage interface and all implementations (PostgreSQL, MongoDB, Redis, JSON) - Implement RESTful API endpoints for paginated document queries and status counts - Create reusable pagination UI components with internationalization support - Optimize performance with database-level pagination and efficient in-memory processing - Maintain backward compatibility while adding configurable page sizes (10-200 items)	2025-07-30 17:58:32 +08:00
yangdx	75de799353	Remove deprecated content field from doc status storage - Remove content field from JSON storage - Remove content field from MongoDB storage - Remove content field from Redis storage	2025-07-30 01:00:06 +08:00
yangdx	93afa7d8a7	feat: add processing time tracking to document status with metadata field - Add metadata field to DocProcessingStatus with start_time and end_time tracking - Record processing timestamps using Unix time format (seconds precision) - Update all storage backends (JSON, MongoDB, Redis, PostgreSQL) for new field support - Maintain backward compatibility with default values for existing data - Add error_msg field for better error tracking during document processing	2025-07-29 23:42:33 +08:00
yangdx	6014b9bf73	feat: add track_id support for document processing progress monitoring - Add get_docs_by_track_id() method to all storage backends (MongoDB, PostgreSQL, Redis, JSON) - Implement automatic track_id generation with upload_/insert_ prefixes - Add /track_status/{track_id} API endpoint for frontend progress queries - Create database indexes for efficient track_id lookups - Enable real-time document processing status tracking across all storage types	2025-07-29 22:24:21 +08:00
yangdx	92bbb7a1b3	Remove content fallback and standardize doc status handling - Remove content_summary fallback logic - Standardize doc status processing - Handle missing file_path consistently	2025-07-29 16:13:51 +08:00
yangdx	033098c1bc	Feat: Add WORKSPACE support to all storage types	2025-07-07 00:57:21 +08:00
yangdx	e56734cb8b	Refac: Optimize document deletion performance - Adding chunks_list to doc_status - Adding llm_cache_list to text_chunks - Implemented storage types: JsonKV and Redis	2025-07-03 04:18:25 +08:00
yangdx	bef7206192	Optimize logger info	2025-04-28 02:27:59 +08:00
yangdx	ad087073aa	Optimize logger for storage	2025-04-10 01:07:06 +08:00
yangdx	ff5c7182da	Fix update status handling bugs in drop function of json kv storage	2025-04-01 13:53:02 +08:00
yangdx	95a8ee27ed	Fix linting	2025-03-31 23:22:27 +08:00
yangdx	3d4f8f67c9	Add drop_cache_by_modes to all KV storage implementations	2025-03-31 23:10:21 +08:00
yangdx	1772e7a887	Add delete support to all storage implementation	2025-03-31 16:21:20 +08:00
yangdx	2cb64ad280	feat: Remove immediate persistence in delete operation for JsonDocStatusStorage	2025-03-31 14:46:36 +08:00
yangdx	1df4b777d7	Add drop functions to storage implementations	2025-03-30 15:17:57 +08:00
yangdx	20de4ded30	Add default file_path for missing document paths - Set file_path to "no-file-path" if missing - Ensure consistent document data structure	2025-03-18 20:06:18 +08:00
yangdx	46610682ce	Fix data persistence issue in single-process mode In single-process mode, data updates and persistence were not working properly because the update flags were not being correctly handled between different objects.	2025-03-10 15:41:00 +08:00
yangdx	4065a7df92	Fix linting	2025-03-10 02:07:19 +08:00
yangdx	14e1b31d1c	Improved logging clarity in storage operations	2025-03-10 02:05:55 +08:00
yangdx	6b0acce644	Avoid redundant llm cache updates	2025-03-10 01:45:58 +08:00
yangdx	d2708b966d	Added update flag to avoid persistence if no data is changed for KV storage	2025-03-10 01:17:25 +08:00
yangdx	4977c718f1	Improve KV storage initialize logic	2025-03-10 00:12:35 +08:00
yangdx	c938989920	Fix llm cache save problem in json_kv storage	2025-03-09 23:33:03 +08:00
yangdx	e47883d872	Add atomic data initialization lock to prevent race conditions	2025-03-09 17:33:15 +08:00
yangdx	c854aabde0	Add process ID to log messages for better multi-process debugging clarity - Add PID to KV and Neo4j storage logs - Add PID to query context logs - Improve KV data count logging for llm cache	2025-03-09 15:25:10 +08:00
yangdx	90527875fd	Fix async issues in namespace init	2025-03-09 15:22:06 +08:00
yangdx	fd76e00c6a	Refactor storage initialization to separate object creation from data loading • Split __post_init__ and initialize() • Move data loading to initialize() • Add FastAPI lifespan integration	2025-03-01 03:48:19 +08:00
yangdx	b3328542c7	refactor: migrate synchronous locks to async locks for improved concurrency • Add UnifiedLock wrapper class • Convert with blocks to async with	2025-03-01 02:22:35 +08:00
yangdx	cd7648791a	Fix linting	2025-02-28 01:25:59 +08:00
yangdx	05cf029bcc	fix: convert multiprocessing managed dict to normal dict before JSON dump	2025-02-27 20:16:53 +08:00
yangdx	7d12715f09	Refactor shared storage to safely handle multi-process initialization and data sharing • Add namespace initialization check • Use atomic operations for shared data	2025-02-26 18:11:02 +08:00
yangdx	2c019dbc7b	Refactor storage initialization to avoid redundant initial data loads across processes, show init logs to first load only	2025-02-26 12:28:49 +08:00
yangdx	2752a764ae	Refactor storage implementations to support both single and multi-process modes • Add shared storage management module • Support process/thread lock based on mode	2025-02-26 05:38:38 +08:00
yangdx	a642bb3190	refactor: use shared manager from main process for storage implementations.	2025-02-25 12:08:49 +08:00
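
Note on commit a08bc72635 ("Replace truthy checks with `is not None`"): the edge case it describes can be illustrated with a minimal sketch. The function names and data below are hypothetical and are not taken from the LightRAG source; the sketch only assumes that a reload step may legitimately return an empty dict, which a truthy check would wrongly treat as "nothing to reload".

```python
from typing import Optional


def reload_with_truthy_check(current: dict, reloaded: Optional[dict]) -> dict:
    """Hypothetical buggy variant: an empty dict is falsy, so it is ignored."""
    if reloaded:
        return reloaded
    return current


def reload_with_none_check(current: dict, reloaded: Optional[dict]) -> dict:
    """Hypothetical fixed variant: only None means 'no reloaded data'."""
    if reloaded is not None:
        return reloaded
    return current


# Illustrative data only: one stale entry held in memory.
stale = {"doc-1": {"status": "processed"}}

# The truthy check keeps the stale data when the reload yields an empty dict.
kept_stale = reload_with_truthy_check(stale, {})

# The explicit None check accepts the empty dict as a valid result.
accepted_empty = reload_with_none_check(stale, {})

# None still falls back to the current data in the fixed variant.
fallback = reload_with_none_check(stale, None)
```

The distinction matters whenever "empty" and "absent" are different states, as the commit message suggests is the case for JsonKVStorage and DocStatusStorage after sanitization.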


64 commits