LightRAG

Author	SHA1	Message	Date
Raphaël MANSUY	1a167fb7f7	cherry-pick `cca0800e`	2025-12-04 19:15:03 +08:00
Raphaël MANSUY	107b32aa8d	cherry-pick `95e1fb16`	2025-12-04 19:14:31 +08:00
yangdx	cb5451faf8	Add entity/relation chunk tracking with configurable source ID limits - Add entity_chunks & relation_chunks storage - Implement KEEP/FIFO limit strategies - Update env.example with new settings - Add migration for chunk tracking data - Support all KV storage (cherry picked from commit `dc62c78f98`)	2025-12-04 19:11:19 +08:00
yangdx	94ae13a037	Refactor workspace handling to use default workspace and namespace locks - Remove DB-specific workspace configs - Add default workspace auto-setting - Replace global locks with namespace locks - Simplify pipeline status management - Remove redundant graph DB locking (cherry picked from commit `926960e957`)	2025-12-04 19:11:17 +08:00
yangdx	961c87a6e5	Standardize empty workspace handling from "_" to "" across storage * Unify empty workspace behavior by changing workspace from "_" to "" * Fixed incorrect empty workspace detection in get_all_update_flags_status() (cherry picked from commit `d54d0d55d9`)	2025-12-04 19:09:05 +08:00
yangdx	ed79218550	Optimize JSON write with fast/slow path to reduce memory usage - Fast path for clean data (no sanitization) - Slow path sanitizes during encoding - Reload shared memory after sanitization - Custom encoder avoids deep copies - Comprehensive test coverage (cherry picked from commit `777c987371`)	2025-12-04 19:09:04 +08:00
Raphael MANSUY	fe9b8ec02a	tests: stabilize integration tests + skip external services; fix multi-tenant API behavior and idempotency (#4 ) * feat: Implement multi-tenant architecture with tenant and knowledge base models - Added data models for tenants, knowledge bases, and related configurations. - Introduced role and permission management for users in the multi-tenant system. - Created a service layer for managing tenants and knowledge bases, including CRUD operations. - Developed a tenant-aware instance manager for LightRAG with caching and isolation features. - Added a migration script to transition existing workspace-based deployments to the new multi-tenant architecture. * chore: ignore lightrag/api/webui/assets/ directory * chore: stop tracking lightrag/api/webui/assets (ignore in .gitignore) * feat: Initialize LightRAG Multi-Tenant Stack with PostgreSQL - Added README.md for project overview, setup instructions, and architecture details. - Created docker-compose.yml to define services: PostgreSQL, Redis, LightRAG API, and Web UI. - Introduced env.example for environment variable configuration. - Implemented init-postgres.sql for PostgreSQL schema initialization with multi-tenant support. - Added reproduce_issue.py for testing default tenant access via API. * feat: Enhance TenantSelector and update related components for improved multi-tenant support * feat: Enhance testing capabilities and update documentation - Updated Makefile to include new test commands for various modes (compatibility, isolation, multi-tenant, security, coverage, and dry-run). - Modified API health check endpoint in Makefile to reflect new port configuration. - Updated QUICK_START.md and README.md to reflect changes in service URLs and ports. - Added environment variables for testing modes in env.example. - Introduced run_all_tests.sh script to automate testing across different modes. - Created conftest.py for pytest configuration, including database fixtures and mock services. 
- Implemented database helper functions for streamlined database operations in tests. - Added test collection hooks to skip tests based on the current MULTITENANT_MODE. * feat: Implement multi-tenant support with demo mode enabled by default - Added multi-tenant configuration to the environment and Docker setup. - Created pre-configured demo tenants (acme-corp and techstart) for testing. - Updated API endpoints to support tenant-specific data access. - Enhanced Makefile commands for better service management and database operations. - Introduced user-tenant membership system with role-based access control. - Added comprehensive documentation for multi-tenant setup and usage. - Fixed issues with document visibility in multi-tenant environments. - Implemented necessary database migrations for user memberships and legacy support. * feat(audit): Add final audit report for multi-tenant implementation - Documented overall assessment, architecture overview, test results, security findings, and recommendations. - Included detailed findings on critical security issues and architectural concerns. fix(security): Implement security fixes based on audit findings - Removed global RAG fallback and enforced strict tenant context. - Configured super-admin access and required user authentication for tenant access. - Cleared localStorage on logout and improved error handling in WebUI. chore(logs): Create task logs for audit and security fixes implementation - Documented actions, decisions, and next steps for both audit and security fixes. - Summarized test results and remaining recommendations. chore(scripts): Enhance development stack management scripts - Added scripts for cleaning, starting, and stopping the development stack. - Improved output messages and ensured graceful shutdown of services. feat(starter): Initialize PostgreSQL with AGE extension support - Created initialization scripts for PostgreSQL extensions including uuid-ossp, vector, and AGE. 
- Ensured successful installation and verification of extensions. * feat: Implement auto-select for first tenant and KB on initial load in WebUI - Removed WEBUI_INITIAL_STATE_FIX.md as the issue is resolved. - Added useTenantInitialization hook to automatically select the first available tenant and KB on app load. - Integrated the new hook into the Root component of the WebUI. - Updated RetrievalTesting component to ensure a KB is selected before allowing user interaction. - Created end-to-end tests for multi-tenant isolation and real service interactions. - Added scripts for starting, stopping, and cleaning the development stack. - Enhanced API and tenant routes to support tenant-specific pipeline status initialization. - Updated constants for backend URL to reflect the correct port. - Improved error handling and logging in various components. * feat: Add multi-tenant support with enhanced E2E testing scripts and client functionality * update client * Add integration and unit tests for multi-tenant API, models, security, and storage - Implement integration tests for tenant and knowledge base management endpoints in `test_tenant_api_routes.py`. - Create unit tests for tenant isolation, model validation, and role permissions in `test_tenant_models.py`. - Add security tests to enforce role-based permissions and context validation in `test_tenant_security.py`. - Develop tests for tenant-aware storage operations and context isolation in `test_tenant_storage_phase3.py`. * feat(e2e): Implement OpenAI model support and database reset functionality * Add comprehensive test suite for gpt-5-nano compatibility - Introduced tests for parameter normalization, embeddings, and entity extraction. - Implemented direct API testing for gpt-5-nano. - Validated .env configuration loading and OpenAI API connectivity. - Analyzed reasoning token overhead with various token limits. - Documented test procedures and expected outcomes in README files. 
- Ensured all tests pass for production readiness. * kg(postgres_impl): ensure AGE extension is loaded in session and configure graph initialization * dev: add hybrid dev helper scripts, Makefile, docker-compose.dev-db and local development docs * feat(dev): add dev helper scripts and local development documentation for hybrid setup * feat(multi-tenant): add detailed specifications and logs for multi-tenant improvements, including UX, backend handling, and ingestion pipeline * feat(migration): add generated tenant/kb columns, indexes, triggers; drop unused tables; update schema and docs * test(backward-compat): adapt tests to new StorageNameSpace/TenantService APIs (use concrete dummy storages) * chore: multi-tenant and UX updates — docs, webui, storage, tenant service adjustments * tests: stabilize integration tests + skip external services; fix multi-tenant API behavior and idempotency - gpt5_nano_compatibility: add pytest-asyncio markers, skip when OPENAI key missing, prevent module-level asyncio.run collection, add conftest - Ollama tests: add server availability check and skip markers; avoid pytest collection warnings by renaming helper classes - Graph storage tests: rename interactive test functions to avoid pytest collection - Document & Tenant routes: support external_ids for idempotency; ensure HTTPExceptions are re-raised - LightRAG core: support external_ids in apipeline_enqueue_documents and idempotent logic - Tests updated to match API changes (tenant routes & document routes) - Add logs and scripts for inspection and audit	2025-12-04 16:04:21 +08:00
Albert Gil López	3a64b267cb	Merge upstream/main and resolve conflicts	2025-08-21 16:56:11 +00:00
Albert Gil López	f35963c020	feat: Add clear error messages for uninitialized storage - Add StorageNotInitializedError and PipelineNotInitializedError exceptions - Update JsonDocStatusStorage to raise clear errors when not initialized - Update JsonKVStorage to raise clear errors when not initialized - Error messages now include complete initialization instructions - Helps users understand and fix initialization issues quickly Addresses feedback from issue #1933 about improving error clarity	2025-08-19 06:41:52 +00:00
yangdx	095e0cbfa2	Refac: Add workspace information to all logger output for all storage types	2025-08-12 01:19:09 +08:00
yangdx	cc1f7118e7	Remove deprecated cache_by_modes functionality from all storage	2025-08-05 23:20:26 +08:00
yangdx	e00690b41b	Fix: workspace isolation problem for json KV storage - Use workspace+namespace as final_namespace identifier - Update all related storage operations - Maintain backward compatibility	2025-08-02 11:30:19 +08:00
yangdx	033098c1bc	Feat: Add WORKSPACE support to all storage types	2025-07-07 00:57:21 +08:00
yangdx	6c2ae40d7d	Refac: Enhance KG rebuild stability by incorporating `create_time` into the LLM cache	2025-07-03 17:08:29 +08:00
yangdx	e56734cb8b	Refac: Optimize document deletion performance - Adding chunks_list to doc_status - Adding llm_cache_list to text_chunks - Implemented storage types: JsonKV and Redis	2025-07-03 04:18:25 +08:00
yangdx	271722405f	feat: Flatten LLM cache structure for improved recall efficiency Refactored the LLM cache to a flat Key-Value (KV) structure, replacing the previous nested format. The old structure used the 'mode' as a key and stored specific cache content as JSON nested under it. This change significantly enhances cache recall efficiency.	2025-07-02 16:11:53 +08:00
zrguo	4937de8809	Update	2025-06-22 15:12:09 +08:00
zrguo	ead82a8dbd	update delete_by_doc_id	2025-06-09 18:52:34 +08:00
yangdx	40a2357c14	Persist cache data to disk before exiting	2025-04-28 23:16:50 +08:00
yangdx	1f18b99df0	Optimize logger info	2025-04-28 02:46:11 +08:00
yangdx	ad087073aa	Optimize logger for storage	2025-04-10 01:07:06 +08:00
yangdx	ff5c7182da	Fix update status handling bugs in drop function of json kv storage	2025-04-01 13:53:02 +08:00
yangdx	95a8ee27ed	Fix linting	2025-03-31 23:22:27 +08:00
yangdx	3d4f8f67c9	Add drop_cache_by_modes to all KV storage implementations	2025-03-31 23:10:21 +08:00
yangdx	1772e7a887	Add delete support to all storage implementation	2025-03-31 16:21:20 +08:00
yangdx	81f887ebab	feat: Remove immediate persistence in delete operation - Enhance delete implementation in JsonKVStorage by removing immediate persistence in delete operation - Update documentation for drop method to clarify persistence behavior - Add abstract delete method to BaseKVStorage	2025-03-31 14:14:32 +08:00
yangdx	1df4b777d7	Add drop functions to storage implementations	2025-03-30 15:17:57 +08:00
yangdx	46610682ce	Fix data persistence issue in single-process mode In single-process mode, data updates and persistence were not working properly because the update flags were not being correctly handled between different objects.	2025-03-10 15:41:00 +08:00
yangdx	4065a7df92	Fix linting	2025-03-10 02:07:19 +08:00
yangdx	14e1b31d1c	Improved logging clarity in storage operations	2025-03-10 02:05:55 +08:00
yangdx	6b0acce644	Avoid redundant llm cache updates	2025-03-10 01:45:58 +08:00
yangdx	d2708b966d	Added update flag to avoid persistence if no data is changed for KV storage	2025-03-10 01:17:25 +08:00
yangdx	4977c718f1	Improve KV storage initialize logic	2025-03-10 00:12:35 +08:00
yangdx	c938989920	Fix llm cache save problem in json_kv storage	2025-03-09 23:33:03 +08:00
yangdx	e47883d872	Add atomic data initialization lock to prevent race conditions	2025-03-09 17:33:15 +08:00
yangdx	c854aabde0	Add process ID to log messages for better multi-process debugging clarity - Add PID to KV and Neo4j storage logs - Add PID to query context logs - Improve KV data count logging for llm cache	2025-03-09 15:25:10 +08:00
yangdx	90527875fd	Fix async issues in namespace init	2025-03-09 15:22:06 +08:00
zrguo	fd9f71e0ee	fix delete_by_doc_id	2025-03-04 13:22:33 +08:00
yangdx	fd76e00c6a	Refactor storage initialization to separate object creation from data loading • Split __post_init__ and initialize() • Move data loading to initialize() • Add FastAPI lifespan integration	2025-03-01 03:48:19 +08:00
yangdx	b3328542c7	refactor: migrate synchronous locks to async locks for improved concurrency • Add UnifiedLock wrapper class • Convert with blocks to async with	2025-03-01 02:22:35 +08:00
yangdx	cd7648791a	Fix linting	2025-02-28 01:25:59 +08:00
yangdx	05cf029bcc	fix: convert multiprocessing managed dict to normal dict before JSON dump	2025-02-27 20:16:53 +08:00
yangdx	64f22966a3	Fix linting	2025-02-27 19:05:51 +08:00
yangdx	f007ebf006	Refactor initialization logic for vector, KV and graph storage implementations • Add try_initialize_namespace check • Move init code out of storage locks • Reduce redundant init conditions • Simplify initialization flow • Make init thread-safer	2025-02-27 14:55:07 +08:00
yangdx	7436c06f6c	Fix linting	2025-02-26 18:11:16 +08:00
yangdx	2c019dbc7b	Refactor storage initialization to avoid redundant initial data loads across processes, show init logs to first load only	2025-02-26 12:28:49 +08:00
yangdx	2752a764ae	Refactor storage implementations to support both single and multi-process modes • Add shared storage management module • Support process/thread lock based on mode	2025-02-26 05:38:38 +08:00
yangdx	a642bb3190	refactor: use shared manager from main process for storage implementations.	2025-02-25 12:08:49 +08:00
yangdx	087d5770b0	feat(storage): Add shared memory support for file-based storage implementations This commit adds multiprocessing shared memory support to file-based storage implementations: - JsonDocStatusStorage - JsonKVStorage - NanoVectorDBStorage - NetworkXStorage Each storage module now uses module-level global variables with multiprocessing.Manager() to ensure data consistency across multiple uvicorn workers. All processes will see updates immediately when data is modified through ainsert function.	2025-02-25 11:10:13 +08:00
Yannick Stephan	9277fe8c29	fixed return	2025-02-19 22:22:41 +01:00

74 commits