LightRAG

Author	SHA1	Message	Date
Raphaël MANSUY	d2b9a36d92	cherry-pick `69a0b74c`	2025-12-04 19:19:03 +08:00
Raphaël MANSUY	b2c2eb267d	cherry-pick `4b31942e`	2025-12-04 19:19:03 +08:00
Raphaël MANSUY	09bab5f49f	cherry-pick `78ad8873`	2025-12-04 19:18:38 +08:00
Raphaël MANSUY	de3f5f10c2	cherry-pick `8dc23eef`	2025-12-04 19:18:37 +08:00
Raphaël MANSUY	0a6e4616b2	cherry-pick `130b4959`	2025-12-04 19:18:16 +08:00
Raphaël MANSUY	201084e05a	cherry-pick `cf2a024e`	2025-12-04 19:18:14 +08:00
Raphaël MANSUY	f9f4555b48	cherry-pick `ef659a1e`	2025-12-04 19:15:05 +08:00
Raphaël MANSUY	196033bf75	cherry-pick `87de2b3e`	2025-12-04 19:15:05 +08:00
Raphaël MANSUY	57c1330b54	cherry-pick `3efb1716`	2025-12-04 19:15:05 +08:00
Raphaël MANSUY	a3fb244631	cherry-pick `2b160163`	2025-12-04 19:15:05 +08:00
Raphaël MANSUY	2a247bdda1	cherry-pick `0244699d`	2025-12-04 19:15:04 +08:00
Raphaël MANSUY	4501740849	cherry-pick `fa887d81`	2025-12-04 19:15:04 +08:00
Raphaël MANSUY	7e53eaabee	cherry-pick `e7d2803a`	2025-12-04 19:15:04 +08:00
Raphaël MANSUY	cfc9348de6	cherry-pick `95cd0ece`	2025-12-04 19:15:04 +08:00
Raphaël MANSUY	5a9677396b	cherry-pick `4438ba41`	2025-12-04 19:15:04 +08:00
Raphaël MANSUY	d3d59b0dca	cherry-pick `186c8f0e`	2025-12-04 19:15:04 +08:00
Raphaël MANSUY	aff704e58a	cherry-pick `c434879c`	2025-12-04 19:15:03 +08:00
Raphaël MANSUY	a4d6692e2d	cherry-pick `61b57cbb`	2025-12-04 19:15:03 +08:00
Raphaël MANSUY	fce5dc6be6	cherry-pick `c46c1b26`	2025-12-04 19:15:03 +08:00
Raphaël MANSUY	7fa3cab355	cherry-pick `162370b6`	2025-12-04 19:14:29 +08:00
yangdx	d0e3c8a4a3	Fix duplicate document responses to return original track_id - Return existing track_id for duplicates - Remove track_id generation in reprocess - Update reprocess response documentation - Clarify track_id behavior in comments - Update API response examples (cherry picked from commit `8d28b95966`)	2025-12-04 19:11:24 +08:00
yangdx	21fc61ecd2	Add content deduplication check for document insertion endpoints • Check content hash before insertion • Return duplicated status if exists • Use sanitized text for hash computation • Apply to both single and batch inserts • Prevent duplicate content processing (cherry picked from commit `19c16bc464`)	2025-12-04 19:11:23 +08:00
yangdx	5febb88824	Fix missing workspace parameter in update flags status call (cherry picked from commit `1745b30a5f`)	2025-12-04 19:11:18 +08:00
yangdx	94ae13a037	Refactor workspace handling to use default workspace and namespace locks - Remove DB-specific workspace configs - Add default workspace auto-setting - Replace global locks with namespace locks - Simplify pipeline status management - Remove redundant graph DB locking (cherry picked from commit `926960e957`)	2025-12-04 19:11:17 +08:00
yangdx	dfab175c16	Fix workspace isolation for pipeline status across all operations - Fix final_namespace error in get_namespace_data() - Fix get_workspace_from_request return type - Add workspace param to pipeline status calls (cherry picked from commit `52c812b9a0`)	2025-12-04 19:11:16 +08:00
BukeLy	f7b500bca2	feat: Add workspace isolation support for pipeline status Problem: In multi-tenant scenarios, different workspaces share a single global pipeline_status namespace, causing pipelines from different tenants to block each other, severely impacting concurrent processing performance. Solution: - Extended get_namespace_data() to recognize workspace-specific pipeline namespaces with pattern "{workspace}:pipeline" (following GraphDB pattern) - Added workspace parameter to initialize_pipeline_status() for per-tenant isolated pipeline namespaces - Updated all 7 call sites to use workspace-aware locks: * lightrag.py: process_document_queue(), aremove_document() * document_routes.py: background_delete_documents(), clear_documents(), cancel_pipeline(), get_pipeline_status(), delete_documents() Impact: - Different workspaces can process documents concurrently without blocking - Backward compatible: empty workspace defaults to "pipeline_status" - Maintains fail-fast: uninitialized pipeline raises clear error - Expected N× performance improvement for N concurrent tenants Bug fixes: - Fixed AttributeError by using self.workspace instead of self.global_config - Fixed pipeline status endpoint to show workspace-specific status - Fixed delete endpoint to check workspace-specific busy flag Code changes: 4 files, 141 insertions(+), 28 deletions(-) Testing: All syntax checks passed, comprehensive workspace isolation tests completed (cherry picked from commit `eb52ec94d7`)	2025-12-04 19:11:16 +08:00
yangdx	95d47566c1	Improve docling integration with macOS compatibility and CLI flag - Add --docling CLI flag for easier setup - Add numpy version constraints - Exclude docling on macOS (fork-safety) (cherry picked from commit `a24d8181c2`)	2025-12-04 19:11:10 +08:00
Raphael MANSUY	fe9b8ec02a	tests: stabilize integration tests + skip external services; fix multi-tenant API behavior and idempotency (#4) * feat: Implement multi-tenant architecture with tenant and knowledge base models - Added data models for tenants, knowledge bases, and related configurations. - Introduced role and permission management for users in the multi-tenant system. - Created a service layer for managing tenants and knowledge bases, including CRUD operations. - Developed a tenant-aware instance manager for LightRAG with caching and isolation features. - Added a migration script to transition existing workspace-based deployments to the new multi-tenant architecture. * chore: ignore lightrag/api/webui/assets/ directory * chore: stop tracking lightrag/api/webui/assets (ignore in .gitignore) * feat: Initialize LightRAG Multi-Tenant Stack with PostgreSQL - Added README.md for project overview, setup instructions, and architecture details. - Created docker-compose.yml to define services: PostgreSQL, Redis, LightRAG API, and Web UI. - Introduced env.example for environment variable configuration. - Implemented init-postgres.sql for PostgreSQL schema initialization with multi-tenant support. - Added reproduce_issue.py for testing default tenant access via API. * feat: Enhance TenantSelector and update related components for improved multi-tenant support * feat: Enhance testing capabilities and update documentation - Updated Makefile to include new test commands for various modes (compatibility, isolation, multi-tenant, security, coverage, and dry-run). - Modified API health check endpoint in Makefile to reflect new port configuration. - Updated QUICK_START.md and README.md to reflect changes in service URLs and ports. - Added environment variables for testing modes in env.example. - Introduced run_all_tests.sh script to automate testing across different modes. - Created conftest.py for pytest configuration, including database fixtures and mock services. - Implemented database helper functions for streamlined database operations in tests. - Added test collection hooks to skip tests based on the current MULTITENANT_MODE. * feat: Implement multi-tenant support with demo mode enabled by default - Added multi-tenant configuration to the environment and Docker setup. - Created pre-configured demo tenants (acme-corp and techstart) for testing. - Updated API endpoints to support tenant-specific data access. - Enhanced Makefile commands for better service management and database operations. - Introduced user-tenant membership system with role-based access control. - Added comprehensive documentation for multi-tenant setup and usage. - Fixed issues with document visibility in multi-tenant environments. - Implemented necessary database migrations for user memberships and legacy support. * feat(audit): Add final audit report for multi-tenant implementation - Documented overall assessment, architecture overview, test results, security findings, and recommendations. - Included detailed findings on critical security issues and architectural concerns. fix(security): Implement security fixes based on audit findings - Removed global RAG fallback and enforced strict tenant context. - Configured super-admin access and required user authentication for tenant access. - Cleared localStorage on logout and improved error handling in WebUI. chore(logs): Create task logs for audit and security fixes implementation - Documented actions, decisions, and next steps for both audit and security fixes. - Summarized test results and remaining recommendations. chore(scripts): Enhance development stack management scripts - Added scripts for cleaning, starting, and stopping the development stack. - Improved output messages and ensured graceful shutdown of services. feat(starter): Initialize PostgreSQL with AGE extension support - Created initialization scripts for PostgreSQL extensions including uuid-ossp, vector, and AGE. - Ensured successful installation and verification of extensions. * feat: Implement auto-select for first tenant and KB on initial load in WebUI - Removed WEBUI_INITIAL_STATE_FIX.md as the issue is resolved. - Added useTenantInitialization hook to automatically select the first available tenant and KB on app load. - Integrated the new hook into the Root component of the WebUI. - Updated RetrievalTesting component to ensure a KB is selected before allowing user interaction. - Created end-to-end tests for multi-tenant isolation and real service interactions. - Added scripts for starting, stopping, and cleaning the development stack. - Enhanced API and tenant routes to support tenant-specific pipeline status initialization. - Updated constants for backend URL to reflect the correct port. - Improved error handling and logging in various components. * feat: Add multi-tenant support with enhanced E2E testing scripts and client functionality * update client * Add integration and unit tests for multi-tenant API, models, security, and storage - Implement integration tests for tenant and knowledge base management endpoints in `test_tenant_api_routes.py`. - Create unit tests for tenant isolation, model validation, and role permissions in `test_tenant_models.py`. - Add security tests to enforce role-based permissions and context validation in `test_tenant_security.py`. - Develop tests for tenant-aware storage operations and context isolation in `test_tenant_storage_phase3.py`. * feat(e2e): Implement OpenAI model support and database reset functionality * Add comprehensive test suite for gpt-5-nano compatibility - Introduced tests for parameter normalization, embeddings, and entity extraction. - Implemented direct API testing for gpt-5-nano. - Validated .env configuration loading and OpenAI API connectivity. - Analyzed reasoning token overhead with various token limits. - Documented test procedures and expected outcomes in README files. - Ensured all tests pass for production readiness. * kg(postgres_impl): ensure AGE extension is loaded in session and configure graph initialization * dev: add hybrid dev helper scripts, Makefile, docker-compose.dev-db and local development docs * feat(dev): add dev helper scripts and local development documentation for hybrid setup * feat(multi-tenant): add detailed specifications and logs for multi-tenant improvements, including UX, backend handling, and ingestion pipeline * feat(migration): add generated tenant/kb columns, indexes, triggers; drop unused tables; update schema and docs * test(backward-compat): adapt tests to new StorageNameSpace/TenantService APIs (use concrete dummy storages) * chore: multi-tenant and UX updates — docs, webui, storage, tenant service adjustments * tests: stabilize integration tests + skip external services; fix multi-tenant API behavior and idempotency - gpt5_nano_compatibility: add pytest-asyncio markers, skip when OPENAI key missing, prevent module-level asyncio.run collection, add conftest - Ollama tests: add server availability check and skip markers; avoid pytest collection warnings by renaming helper classes - Graph storage tests: rename interactive test functions to avoid pytest collection - Document & Tenant routes: support external_ids for idempotency; ensure HTTPExceptions are re-raised - LightRAG core: support external_ids in apipeline_enqueue_documents and idempotent logic - Tests updated to match API changes (tenant routes & document routes) - Add logs and scripts for inspection and audit	2025-12-04 16:04:21 +08:00
yangdx	7cba458f22	Limit deprecated documents endpoint to 1000 records with fair distribution	2025-09-28 11:18:10 +08:00
yangdx	2adb8efdc7	Add duplicate document detection and skip processed files in scanning - Add get_doc_by_file_path to all storages - Skip processed files in scan operation - Check duplicates in upload endpoints - Check duplicates in text insert APIs - Return status info in duplicate responses	2025-09-23 17:30:54 +08:00
yangdx	8f6287e27e	Add path traversal security validation for file deletion operations • Add validate_file_path_security function • Prevent path traversal attacks • Validate file paths before deletion • Check both input and enqueued dirs • Log security violations	2025-09-17 01:12:44 +08:00
yangdx	17d665c9f3	Limit history messages to latest 1000 entries with truncation indicator • Limit history to 1000 latest messages • Add truncation message when needed • Show count of truncated messages • Update API documentation • Prevent memory issues with large logs	2025-09-05 12:31:36 +08:00
yangdx	9b7ed84e05	Improve document deletion error handling and message consistency - Standardize deletion log messages - Add try-catch for file operations - Improve enqueued file error handling	2025-08-20 11:01:24 +08:00
yangdx	2603e99005	Enhance file deletion to remove files from both input and enqueued dirs	2025-08-19 17:13:58 +08:00
yangdx	9ed5b93467	Add [File Extraction] prefix to error messages and logs	2025-08-19 11:33:28 +08:00
yangdx	377f1a022e	fix: reset PROCESSING/FAILED docs to PENDING at the beginning of document processing pipeline - Reset documents with PROCESSING/FAILED status to PENDING when they pass consistency checks - Update doc_status storage and clear error messages/metadata on reset	2025-08-18 00:49:52 +08:00
yangdx	add8b07a21	Improve logging messages for document processing clarity	2025-08-18 00:22:04 +08:00
yangdx	14e083a1a6	fix: replace pyuca with pypinyin for Chinese pinyin sorting and add file_path sort	2025-08-17 15:21:24 +08:00
yangdx	61469c0a56	Add Chinese pinyin sorting support across document operations • Replace pyuca with centralized utils function • Add pinyin sort keys for file paths • Update MongoDB indexes with zh collation • Migrate existing indexes for compatibility • Support Chinese chars in Redis/JSON storage • Keep PostgreSQL sorting order controlled by the database collation order	2025-08-17 12:45:48 +08:00
yangdx	cceb46b320	fix: subdirectories are no longer processed during file scans • Change rglob to glob for file scanning • Simplify error logging messages	2025-08-16 23:46:33 +08:00
yangdx	f5b0c3d38c	feat: Recording file extraction error status to document pipeline - Add apipeline_enqueue_error_documents function to LightRAG class for recording file processing errors in doc_status storage - Enhance pipeline_enqueue_file with detailed error handling for all file processing stages: * File access errors (permissions, not found) * UTF-8 encoding errors * Format-specific processing errors (PDF, DOCX, PPTX, XLSX) * Content validation errors * Unsupported file type errors This implementation ensures all file extraction failures are properly tracked and recorded in the doc_status storage system, providing better visibility into document processing issues and enabling improved error monitoring and debugging capabilities.	2025-08-16 23:08:52 +08:00
yangdx	5d00c4c7a8	feat: move processed files to __enqueued__ directory after processing with filename conflicts handling	2025-08-16 13:19:20 +08:00
yangdx	3bba5fc506	Fix linting	2025-08-14 13:03:23 +08:00
yangdx	772f981e7e	fix: check and process queued docs even when upload directory is empty	2025-08-14 12:35:39 +08:00
yangdx	fd0ae4646f	Fixes crash when processing files with UTF-8 encoding error - Fix TypeError "cannot unpack non-iterable bool object" in document processing - Change all error returns from `False` to `(False, "")` for consistency - Ensure pipeline_enqueue_file always returns tuple (bool, str) - Add missing return statement for no-content-extracted case - Improve error handling for UTF-8 encoding issues and unsupported file types	2025-08-14 05:31:38 +08:00
yangdx	c22315ea6d	refactor: remove selective LLM cache clearing functionality - Remove optional 'modes' parameter from aclear_cache() and clear_cache() methods - Replace deprecated drop_cache_by_modes() with drop() method for complete cache clearing - Update API endpoint to ignore mode-specific parameters and clear all cache - Simplify frontend clearCache() function to send empty request body This change ensures all LLM cache is cleared together.	2025-08-05 23:51:51 +08:00
yangdx	e04d8ed8a7	Improved storage drop logging with namespace details - Added namespace and workspace to drop logs	2025-08-04 00:56:39 +08:00
yangdx	7505195303	fix: add full_entities and full_relations to clear_documents storage list	2025-08-03 23:02:58 +08:00
yangdx	0eac1a883a	Feat: add file path sorting for document manager - Add file_path sorting support to all database backends (JSON, Redis, PostgreSQL, MongoDB) - Implement smart column header switching between "ID" and "File Name" based on display mode - Add automatic sort field switching when toggling between ID and file name display - Create composite indexes for workspace+file_path in PostgreSQL and MongoDB for better query performance - Update frontend to maintain sort state when switching display modes - Add internationalization support for "fileName" in English and Chinese locales This enhancement improves user experience by providing intuitive file-based sorting while maintaining performance through optimized database indexes.	2025-07-30 18:46:55 +08:00
yangdx	74eecc46e5	feat(pagination): Implement document list pagination backends and frontend UI - Add pagination support to BaseDocStatusStorage interface and all implementations (PostgreSQL, MongoDB, Redis, JSON) - Implement RESTful API endpoints for paginated document queries and status counts - Create reusable pagination UI components with internationalization support - Optimize performance with database-level pagination and efficient in-memory processing - Maintain backward compatibility while adding configurable page sizes (10-200 items)	2025-07-30 17:58:32 +08:00

144 commits