LightRAG

Author	SHA1	Message	Date
Raphaël MANSUY	7ce259fbb4	cherry-pick `0b5e3f9d`	2025-12-04 19:19:23 +08:00
Raphaël MANSUY	2aeee59fb9	cherry-pick `b7de694f`	2025-12-04 19:19:05 +08:00
Raphaël MANSUY	b064daa2d2	cherry-pick `702cfd29`	2025-12-04 19:19:05 +08:00
Raphaël MANSUY	d2b9a36d92	cherry-pick `69a0b74c`	2025-12-04 19:19:03 +08:00
Raphaël MANSUY	b2c2eb267d	cherry-pick `4b31942e`	2025-12-04 19:19:03 +08:00
Raphaël MANSUY	374ed095bd	cherry-pick `c9e1c6c1`	2025-12-04 19:19:00 +08:00
Raphaël MANSUY	cf0899c063	cherry-pick `9d69e8d7`	2025-12-04 19:19:00 +08:00
Raphaël MANSUY	a7857bcdde	cherry-pick `97034f06`	2025-12-04 19:18:39 +08:00
Raphaël MANSUY	2a451c4e22	cherry-pick `5155edd8`	2025-12-04 19:18:39 +08:00
Raphaël MANSUY	7544e18f09	cherry-pick `bf1897a6`	2025-12-04 19:18:39 +08:00
Raphaël MANSUY	edf9da2daa	cherry-pick `6015e8bc`	2025-12-04 19:18:38 +08:00
Raphaël MANSUY	09bab5f49f	cherry-pick `78ad8873`	2025-12-04 19:18:38 +08:00
Raphaël MANSUY	114b400905	cherry-pick `f24a2616`	2025-12-04 19:18:38 +08:00
Raphaël MANSUY	de3f5f10c2	cherry-pick `8dc23eef`	2025-12-04 19:18:37 +08:00
Raphaël MANSUY	0a6e4616b2	cherry-pick `130b4959`	2025-12-04 19:18:16 +08:00
Raphaël MANSUY	4316172fbf	cherry-pick `12facac5`	2025-12-04 19:18:15 +08:00
Raphaël MANSUY	aa12830be4	cherry-pick `f6d1fb98`	2025-12-04 19:18:15 +08:00
Raphaël MANSUY	fd109cdfcf	cherry-pick `b7c77396`	2025-12-04 19:18:15 +08:00
Raphaël MANSUY	593b277945	cherry-pick `9f44e89d`	2025-12-04 19:18:14 +08:00
Raphaël MANSUY	201084e05a	cherry-pick `cf2a024e`	2025-12-04 19:18:14 +08:00
Raphaël MANSUY	f9f4555b48	cherry-pick `ef659a1e`	2025-12-04 19:15:05 +08:00
Raphaël MANSUY	196033bf75	cherry-pick `87de2b3e`	2025-12-04 19:15:05 +08:00
Raphaël MANSUY	57c1330b54	cherry-pick `3efb1716`	2025-12-04 19:15:05 +08:00
Raphaël MANSUY	a3fb244631	cherry-pick `2b160163`	2025-12-04 19:15:05 +08:00
Raphaël MANSUY	2a247bdda1	cherry-pick `0244699d`	2025-12-04 19:15:04 +08:00
Raphaël MANSUY	4501740849	cherry-pick `fa887d81`	2025-12-04 19:15:04 +08:00
Raphaël MANSUY	7e53eaabee	cherry-pick `e7d2803a`	2025-12-04 19:15:04 +08:00
Raphaël MANSUY	cfc9348de6	cherry-pick `95cd0ece`	2025-12-04 19:15:04 +08:00
Raphaël MANSUY	5a9677396b	cherry-pick `4438ba41`	2025-12-04 19:15:04 +08:00
Raphaël MANSUY	d3d59b0dca	cherry-pick `186c8f0e`	2025-12-04 19:15:04 +08:00
Raphaël MANSUY	aff704e58a	cherry-pick `c434879c`	2025-12-04 19:15:03 +08:00
Raphaël MANSUY	a4d6692e2d	cherry-pick `61b57cbb`	2025-12-04 19:15:03 +08:00
Raphaël MANSUY	fce5dc6be6	cherry-pick `c46c1b26`	2025-12-04 19:15:03 +08:00
Raphaël MANSUY	7fa3cab355	cherry-pick `162370b6`	2025-12-04 19:14:29 +08:00
Raphaël MANSUY	84ac688666	cherry-pick `83d99e14`	2025-12-04 19:14:29 +08:00
yangdx	d0e3c8a4a3	Fix duplicate document responses to return original track_id - Return existing track_id for duplicates - Remove track_id generation in reprocess - Update reprocess response documentation - Clarify track_id behavior in comments - Update API response examples (cherry picked from commit `8d28b95966`)	2025-12-04 19:11:24 +08:00
yangdx	21fc61ecd2	Add content deduplication check for document insertion endpoints • Check content hash before insertion • Return duplicated status if exists • Use sanitized text for hash computation • Apply to both single and batch inserts • Prevent duplicate content processing (cherry picked from commit `19c16bc464`)	2025-12-04 19:11:23 +08:00
anouarbm	7ce251c319	docs: Add documentation and examples for include_chunk_content parameter Added comprehensive documentation for the new include_chunk_content parameter that enables retrieval of actual chunk text content in API responses. Documentation Updates: - Added "Include Chunk Content in References" section to API README - Explained use cases: RAG evaluation, debugging, citations, transparency - Provided JSON request/response examples - Clarified parameter interaction with include_references OpenAPI/Swagger Examples: - Added "Response with chunk content" example to /query endpoint - Shows complete reference structure with content field - Demonstrates realistic chunk text content This makes the feature discoverable through: 1. API documentation (README.md) 2. Interactive Swagger UI (http://localhost:9621/docs) 3. Code examples for developers (cherry picked from commit `963ad4c637`)	2025-12-04 19:11:20 +08:00
anouarbm	349c1945db	Optimize RAGAS evaluation with parallel execution and chunk content enrichment Added efficient RAG evaluation system with optimized API calls and comprehensive benchmarking. Key Features: - Single API call per evaluation (2x faster than before) - Parallel evaluation based on MAX_ASYNC environment variable - Chunk content enrichment in /query endpoint responses - Comprehensive benchmark statistics (averages) - NaN-safe metric calculations API Changes: - Added include_chunk_content parameter to QueryRequest (backward compatible) - /query endpoint enriches references with actual chunk content when requested - No breaking changes - default behavior unchanged Evaluation Improvements: - Parallel execution using asyncio.Semaphore (respects MAX_ASYNC) - Shared HTTP client with connection pooling - Proper timeout handling (3min connect, 5min read) - Debug output for context retrieval verification - Benchmark statistics with averages, min/max scores Results: - Average RAGAS Score: 0.9772 - Perfect Faithfulness: 1.0000 - Perfect Context Recall: 1.0000 - Perfect Context Precision: 1.0000 - Excellent Answer Relevance: 0.9087 (cherry picked from commit `0bbef9814e`)	2025-12-04 19:11:20 +08:00
yangdx	5febb88824	Fix missing workspace parameter in update flags status call (cherry picked from commit `1745b30a5f`)	2025-12-04 19:11:18 +08:00
yangdx	94ae13a037	Refactor workspace handling to use default workspace and namespace locks - Remove DB-specific workspace configs - Add default workspace auto-setting - Replace global locks with namespace locks - Simplify pipeline status management - Remove redundant graph DB locking (cherry picked from commit `926960e957`)	2025-12-04 19:11:17 +08:00
yangdx	dfab175c16	Fix workspace isolation for pipeline status across all operations - Fix final_namespace error in get_namespace_data() - Fix get_workspace_from_request return type - Add workspace param to pipeline status calls (cherry picked from commit `52c812b9a0`)	2025-12-04 19:11:16 +08:00
BukeLy	f7b500bca2	feat: Add workspace isolation support for pipeline status Problem: In multi-tenant scenarios, different workspaces share a single global pipeline_status namespace, causing pipelines from different tenants to block each other, severely impacting concurrent processing performance. Solution: - Extended get_namespace_data() to recognize workspace-specific pipeline namespaces with pattern "{workspace}:pipeline" (following GraphDB pattern) - Added workspace parameter to initialize_pipeline_status() for per-tenant isolated pipeline namespaces - Updated all 7 call sites to use workspace-aware locks: * lightrag.py: process_document_queue(), aremove_document() * document_routes.py: background_delete_documents(), clear_documents(), cancel_pipeline(), get_pipeline_status(), delete_documents() Impact: - Different workspaces can process documents concurrently without blocking - Backward compatible: empty workspace defaults to "pipeline_status" - Maintains fail-fast: uninitialized pipeline raises clear error - Expected N× performance improvement for N concurrent tenants Bug fixes: - Fixed AttributeError by using self.workspace instead of self.global_config - Fixed pipeline status endpoint to show workspace-specific status - Fixed delete endpoint to check workspace-specific busy flag Code changes: 4 files, 141 insertions(+), 28 deletions(-) Testing: All syntax checks passed, comprehensive workspace isolation tests completed (cherry picked from commit `eb52ec94d7`)	2025-12-04 19:11:16 +08:00
yangdx	322ff19f72	Remove ascii_colors dependency and fix stream handling errors • Remove ascii_colors.trace_exception calls • Add SafeStreamHandler for closed streams • Patch ascii_colors console handler • Prevent ValueError on stream close • Improve logging error handling (cherry picked from commit `0fb2925c6a`)	2025-12-04 19:11:13 +08:00
yangdx	95d47566c1	Improve docling integration with macOS compatibility and CLI flag - Add --docling CLI flag for easier setup - Add numpy version constraints - Exclude docling on macOS (fork-safety) (cherry picked from commit `a24d8181c2`)	2025-12-04 19:11:10 +08:00
Raphael MANSUY	fe9b8ec02a	tests: stabilize integration tests + skip external services; fix multi-tenant API behavior and idempotency (#4) * feat: Implement multi-tenant architecture with tenant and knowledge base models - Added data models for tenants, knowledge bases, and related configurations. - Introduced role and permission management for users in the multi-tenant system. - Created a service layer for managing tenants and knowledge bases, including CRUD operations. - Developed a tenant-aware instance manager for LightRAG with caching and isolation features. - Added a migration script to transition existing workspace-based deployments to the new multi-tenant architecture. * chore: ignore lightrag/api/webui/assets/ directory * chore: stop tracking lightrag/api/webui/assets (ignore in .gitignore) * feat: Initialize LightRAG Multi-Tenant Stack with PostgreSQL - Added README.md for project overview, setup instructions, and architecture details. - Created docker-compose.yml to define services: PostgreSQL, Redis, LightRAG API, and Web UI. - Introduced env.example for environment variable configuration. - Implemented init-postgres.sql for PostgreSQL schema initialization with multi-tenant support. - Added reproduce_issue.py for testing default tenant access via API. * feat: Enhance TenantSelector and update related components for improved multi-tenant support * feat: Enhance testing capabilities and update documentation - Updated Makefile to include new test commands for various modes (compatibility, isolation, multi-tenant, security, coverage, and dry-run). - Modified API health check endpoint in Makefile to reflect new port configuration. - Updated QUICK_START.md and README.md to reflect changes in service URLs and ports. - Added environment variables for testing modes in env.example. - Introduced run_all_tests.sh script to automate testing across different modes. - Created conftest.py for pytest configuration, including database fixtures and mock services. 
- Implemented database helper functions for streamlined database operations in tests. - Added test collection hooks to skip tests based on the current MULTITENANT_MODE. * feat: Implement multi-tenant support with demo mode enabled by default - Added multi-tenant configuration to the environment and Docker setup. - Created pre-configured demo tenants (acme-corp and techstart) for testing. - Updated API endpoints to support tenant-specific data access. - Enhanced Makefile commands for better service management and database operations. - Introduced user-tenant membership system with role-based access control. - Added comprehensive documentation for multi-tenant setup and usage. - Fixed issues with document visibility in multi-tenant environments. - Implemented necessary database migrations for user memberships and legacy support. * feat(audit): Add final audit report for multi-tenant implementation - Documented overall assessment, architecture overview, test results, security findings, and recommendations. - Included detailed findings on critical security issues and architectural concerns. fix(security): Implement security fixes based on audit findings - Removed global RAG fallback and enforced strict tenant context. - Configured super-admin access and required user authentication for tenant access. - Cleared localStorage on logout and improved error handling in WebUI. chore(logs): Create task logs for audit and security fixes implementation - Documented actions, decisions, and next steps for both audit and security fixes. - Summarized test results and remaining recommendations. chore(scripts): Enhance development stack management scripts - Added scripts for cleaning, starting, and stopping the development stack. - Improved output messages and ensured graceful shutdown of services. feat(starter): Initialize PostgreSQL with AGE extension support - Created initialization scripts for PostgreSQL extensions including uuid-ossp, vector, and AGE. 
- Ensured successful installation and verification of extensions. * feat: Implement auto-select for first tenant and KB on initial load in WebUI - Removed WEBUI_INITIAL_STATE_FIX.md as the issue is resolved. - Added useTenantInitialization hook to automatically select the first available tenant and KB on app load. - Integrated the new hook into the Root component of the WebUI. - Updated RetrievalTesting component to ensure a KB is selected before allowing user interaction. - Created end-to-end tests for multi-tenant isolation and real service interactions. - Added scripts for starting, stopping, and cleaning the development stack. - Enhanced API and tenant routes to support tenant-specific pipeline status initialization. - Updated constants for backend URL to reflect the correct port. - Improved error handling and logging in various components. * feat: Add multi-tenant support with enhanced E2E testing scripts and client functionality * update client * Add integration and unit tests for multi-tenant API, models, security, and storage - Implement integration tests for tenant and knowledge base management endpoints in `test_tenant_api_routes.py`. - Create unit tests for tenant isolation, model validation, and role permissions in `test_tenant_models.py`. - Add security tests to enforce role-based permissions and context validation in `test_tenant_security.py`. - Develop tests for tenant-aware storage operations and context isolation in `test_tenant_storage_phase3.py`. * feat(e2e): Implement OpenAI model support and database reset functionality * Add comprehensive test suite for gpt-5-nano compatibility - Introduced tests for parameter normalization, embeddings, and entity extraction. - Implemented direct API testing for gpt-5-nano. - Validated .env configuration loading and OpenAI API connectivity. - Analyzed reasoning token overhead with various token limits. - Documented test procedures and expected outcomes in README files. 
- Ensured all tests pass for production readiness. * kg(postgres_impl): ensure AGE extension is loaded in session and configure graph initialization * dev: add hybrid dev helper scripts, Makefile, docker-compose.dev-db and local development docs * feat(dev): add dev helper scripts and local development documentation for hybrid setup * feat(multi-tenant): add detailed specifications and logs for multi-tenant improvements, including UX, backend handling, and ingestion pipeline * feat(migration): add generated tenant/kb columns, indexes, triggers; drop unused tables; update schema and docs * test(backward-compat): adapt tests to new StorageNameSpace/TenantService APIs (use concrete dummy storages) * chore: multi-tenant and UX updates — docs, webui, storage, tenant service adjustments * tests: stabilize integration tests + skip external services; fix multi-tenant API behavior and idempotency - gpt5_nano_compatibility: add pytest-asyncio markers, skip when OPENAI key missing, prevent module-level asyncio.run collection, add conftest - Ollama tests: add server availability check and skip markers; avoid pytest collection warnings by renaming helper classes - Graph storage tests: rename interactive test functions to avoid pytest collection - Document & Tenant routes: support external_ids for idempotency; ensure HTTPExceptions are re-raised - LightRAG core: support external_ids in apipeline_enqueue_documents and idempotent logic - Tests updated to match API changes (tenant routes & document routes) - Add logs and scripts for inspection and audit	2025-12-04 16:04:21 +08:00
yangdx	df43afc89b	Relax conversation history role validation requirements • Remove strict role value checking • Allow any non-empty string roles	2025-09-29 13:10:15 +08:00
yangdx	7cba458f22	Limit deprecated documents endpoint to 1000 records with fair distribution	2025-09-28 11:18:10 +08:00
yangdx	91be53ffd2	Fix linting	2025-09-27 22:36:38 +08:00
yangdx	e0ac05db90	Simplify query route documentation and clarify conversation history	2025-09-27 22:36:16 +08:00
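
Commit 349c1945db describes running evaluations in parallel with `asyncio.Semaphore`, bounded by the `MAX_ASYNC` environment variable. The commit's code is not shown in this listing, so the following is only a minimal sketch of that pattern. The names `evaluate_case` and `run_all`, the `stats` dictionary, and the fallback value of 4 for `MAX_ASYNC` are assumptions made for illustration and do not come from the repository.

```python
import asyncio
import os


async def evaluate_case(case_id: int, stats: dict) -> float:
    """Hypothetical stand-in for one evaluation; records peak concurrency."""
    stats["running"] += 1
    stats["peak"] = max(stats["peak"], stats["running"])
    await asyncio.sleep(0.01)  # placeholder for the real API call
    stats["running"] -= 1
    return 1.0  # placeholder score


async def run_all(case_ids, max_async: int, stats: dict) -> list:
    """Run every case concurrently, with at most max_async in flight."""
    semaphore = asyncio.Semaphore(max_async)

    async def bounded(case_id: int) -> float:
        # Each task waits here until one of the max_async slots is free.
        async with semaphore:
            return await evaluate_case(case_id, stats)

    return await asyncio.gather(*(bounded(c) for c in case_ids))


if __name__ == "__main__":
    # Assumed default of 4 when MAX_ASYNC is unset; the real default may differ.
    limit = int(os.environ.get("MAX_ASYNC", "4"))
    stats = {"running": 0, "peak": 0}
    scores = asyncio.run(run_all(range(10), limit, stats))
    print(f"cases={len(scores)} peak_concurrency={stats['peak']}")
```

The semaphore caps the number of in-flight requests without limiting how many tasks are created, so `asyncio.gather` can still return the results in input order.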

237 commits