- Fix final_namespace error in get_namespace_data()
- Fix get_workspace_from_request return type
- Add workspace param to pipeline status calls
(cherry picked from commit 52c812b9a0)
Fixes two compatibility issues in workspace isolation:
1. Problem: lightrag_server.py calls initialize_pipeline_status()
without a workspace parameter, causing the pipeline to initialize in
the global namespace instead of the RAG instance's workspace.
Solution: Add a set_default_workspace() mechanism in shared_storage.
LightRAG.initialize_storages() now sets the default workspace, which
initialize_pipeline_status() uses when called without parameters
(see the sketch below).
2. Problem: The /health endpoint is hardcoded to use "pipeline_status",
so it cannot return workspace-specific status or support frontend
workspace selection.
Solution: Add LIGHTRAG-WORKSPACE header support. The endpoint now
extracts the workspace from the header, or falls back to the server
default, and returns the correct workspace-specific pipeline status.
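A minimal sketch of the default-workspace mechanism described above, assuming module-level state in shared_storage; the bodies are illustrative, not the actual implementation:
```python
# Illustrative sketch only; the real code lives in lightrag/kg/shared_storage.py.
_default_workspace: str = ""

def set_default_workspace(workspace: str) -> None:
    """Record the workspace to use when callers pass none explicitly."""
    global _default_workspace
    _default_workspace = workspace or ""

def get_default_workspace() -> str:
    return _default_workspace

async def initialize_pipeline_status(workspace: str | None = None) -> None:
    # Fall back to the default set by LightRAG.initialize_storages()
    ws = workspace if workspace is not None else get_default_workspace()
    ...  # initialize the pipeline namespace for ws
```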
Changes:
- lightrag/kg/shared_storage.py: Add set/get_default_workspace()
- lightrag/lightrag.py: Call set_default_workspace() in initialize_storages()
- lightrag/api/lightrag_server.py: Add get_workspace_from_request() helper,
update /health endpoint to support LIGHTRAG-WORKSPACE header
Testing:
- Backward compatibility: Old code works without modification
- Multi-instance safety: Explicit workspace passing preserved
- /health endpoint: Supports both default and header-specified workspaces
Related: #2353
(cherry picked from commit 18a4870229)
Problem:
In multi-tenant scenarios, different workspaces share a single global
pipeline_status namespace, causing pipelines from different tenants to
block each other, severely impacting concurrent processing performance.
Solution:
- Extended get_namespace_data() to recognize workspace-specific pipeline
namespaces with the pattern "{workspace}:pipeline", following the GraphDB
pattern (a sketch follows this list)
- Added workspace parameter to initialize_pipeline_status() for per-tenant
isolated pipeline namespaces
- Updated all 7 call sites to use workspace-aware locks:
* lightrag.py: process_document_queue(), aremove_document()
* document_routes.py: background_delete_documents(), clear_documents(),
cancel_pipeline(), get_pipeline_status(), delete_documents()
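A hedged sketch of the keyed lookup described above; the storage plumbing is simplified and the helper name pipeline_namespace() is illustrative:
```python
# Simplified sketch; the real implementation is in lightrag/kg/shared_storage.py.
_shared_data: dict[str, dict] = {}

def pipeline_namespace(workspace: str) -> str:
    # Empty workspace keeps the legacy global key for backward compatibility
    return f"{workspace}:pipeline" if workspace else "pipeline_status"

async def get_namespace_data(namespace: str) -> dict:
    if namespace not in _shared_data:
        # Fail fast: an uninitialized pipeline raises a clear error
        raise KeyError(
            f"Namespace '{namespace}' not initialized; "
            "call initialize_pipeline_status() first"
        )
    return _shared_data[namespace]
```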
Impact:
- Different workspaces can process documents concurrently without blocking
- Backward compatible: empty workspace defaults to "pipeline_status"
- Maintains fail-fast: uninitialized pipeline raises clear error
- Expected N× performance improvement for N concurrent tenants
Bug fixes:
- Fixed AttributeError by using self.workspace instead of self.global_config
- Fixed pipeline status endpoint to show workspace-specific status
- Fixed delete endpoint to check workspace-specific busy flag
Code changes: 4 files, 141 insertions(+), 28 deletions(-)
Testing: All syntax checks passed, comprehensive workspace isolation tests completed
(cherry picked from commit eb52ec94d7)
• Monitor pipeline busy->idle transitions
• Reload labels on dropdown open if needed
• Add onBeforeOpen callback to AsyncSelect
• Clear refresh flags after processing
• Improve label sync with backend state
(cherry picked from commit 58c83f9da5)
- Add cancelPipeline API endpoint
- Add cancel button to status dialog
- Update status response type
- Add cancellation UI translations
- Handle cancellation request states
(cherry picked from commit f89b5ab101)
- Set Monday 2AM for GitHub Actions
- Set Wednesday 2AM for Python deps
- Set Friday 2AM for web UI deps
- Use Asia/Shanghai timezone
- Spread updates across weekdays (config sketched below)
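A minimal .github/dependabot.yml sketch matching the schedule above; the ecosystems and directories are assumptions about the repo layout:
```yaml
version: 2
updates:
  - package-ecosystem: "github-actions"
    directory: "/"
    schedule:
      interval: "weekly"
      day: "monday"
      time: "02:00"
      timezone: "Asia/Shanghai"
  - package-ecosystem: "pip"
    directory: "/"
    schedule:
      interval: "weekly"
      day: "wednesday"
      time: "02:00"
      timezone: "Asia/Shanghai"
  - package-ecosystem: "npm"
    directory: "/lightrag_webui"
    schedule:
      interval: "weekly"
      day: "friday"
      time: "02:00"
      timezone: "Asia/Shanghai"
```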
(cherry picked from commit 6476021619)
• Pin httpx version in api extra
• Extract test dependencies to new extra
• Move httpx pin from evaluation to api
• Add api dependency to evaluation extra
• Separate test from evaluation concerns (layout sketched below)
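An illustrative pyproject.toml layout for the extras split described above; the exact pins and package lists are assumptions:
```toml
[project.optional-dependencies]
api = [
    "httpx>=0.27,<1.0",  # pin now lives in the api extra (version is illustrative)
    # ...other API server dependencies...
]
test = [
    "pytest",
    "pytest-asyncio",
]
evaluation = [
    "lightrag-hku[api]",  # evaluation now pulls in the api extra
    # ...evaluation-only dependencies...
]
```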
(cherry picked from commit 268e4ff6f1)
• Use different mock LLM per workspace
• Add persistent test directory
• Create workspace-specific responses
• Skip cleanup for inspection
(cherry picked from commit 99262adaaa)
• Add async sleep to mock functions
• Test concurrent ainsert operations
• Use asyncio.gather for parallel exec
• Measure concurrent execution time (see the sketch below)
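A hedged sketch of the test shape described above; the mock latency and helper names are illustrative, not the actual test code:
```python
import asyncio
import time

async def mock_llm(prompt: str, **kwargs) -> str:
    await asyncio.sleep(0.5)  # simulated latency makes overlap measurable
    return "mock response"

async def run_concurrent_inserts(rag_a, rag_b) -> float:
    """Run two ainsert calls in parallel and return the wall-clock time."""
    start = time.perf_counter()
    await asyncio.gather(
        rag_a.ainsert("Document for workspace A"),
        rag_b.ainsert("Document for workspace B"),
    )
    elapsed = time.perf_counter() - start
    # With real concurrency this is ~1x the mock latency rather than ~2x.
    return elapsed
```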
(cherry picked from commit 6ae0c14438)
• Remove unused tempfile import
• Use consistent project temp/ structure
• Clean up existing directories first
• Create directories with os.makedirs
• Use descriptive test directory names
(cherry picked from commit 4fef731f37)
- Add GitHub Actions workflow for CI
- Mark integration tests requiring services
- Add offline test markers for isolated tests
- Skip integration tests by default
- Configure pytest markers and collection (conftest.py sketch below)
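A sketch of how a conftest.py could skip integration tests by default; the --run-integration option name is an assumption, not necessarily the flag the repo uses:
```python
import pytest

def pytest_addoption(parser):
    parser.addoption(
        "--run-integration",
        action="store_true",
        default=False,
        help="run tests that require external services",
    )

def pytest_collection_modifyitems(config, items):
    if config.getoption("--run-integration"):
        return
    skip = pytest.mark.skip(reason="needs --run-integration and live services")
    for item in items:
        # Skip anything marked @pytest.mark.integration unless opted in
        if "integration" in item.keywords:
            item.add_marker(skip)
```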
(cherry picked from commit 4ea2124001)
- Extract pytest deps to own group
- Reference pytest group in evaluation
- Add pytest config to pyproject.toml
- Update uv.lock with new structure
(cherry picked from commit b7b8d15632)
Why this change is needed:
PR #2366 introduces critical workspace isolation functionality to resolve
multi-instance concurrency issues, but lacks comprehensive automated tests
to validate the implementation. Without proper test coverage, we cannot
ensure the feature works correctly across all scenarios mentioned in the PR.
What this test suite covers:
1. Pipeline Status Isolation: Verifies different workspaces maintain
independent pipeline status without interference
2. Lock Mechanism: Validates the new keyed lock system works correctly
- Different workspaces can acquire locks in parallel
- Same workspace locks serialize properly
- No deadlocks occur
3. Backward Compatibility: Ensures legacy code without workspace parameters
continues to work using default workspace
4. Multi-Workspace Concurrency: Confirms multiple LightRAG instances with
different workspaces can run concurrently without data interference
Testing approach:
- All tests are automated and deterministic
- Uses timing assertions to verify parallel vs serial lock behavior (sketched below)
- Validates data isolation through direct namespace data inspection
- Comprehensive error handling and detailed test output
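A minimal sketch of the timing-assertion idea, using plain asyncio.Lock objects as stand-ins for the per-workspace keyed locks; the thresholds are illustrative:
```python
import asyncio
import time

async def hold_lock(lock: asyncio.Lock, seconds: float) -> None:
    async with lock:
        await asyncio.sleep(seconds)

async def assert_parallel_vs_serial() -> None:
    lock_a = asyncio.Lock()  # stand-in for workspace A's keyed lock
    lock_b = asyncio.Lock()  # stand-in for workspace B's keyed lock

    start = time.perf_counter()
    await asyncio.gather(hold_lock(lock_a, 0.2), hold_lock(lock_b, 0.2))
    assert time.perf_counter() - start < 0.35  # different workspaces overlap

    start = time.perf_counter()
    await asyncio.gather(hold_lock(lock_a, 0.2), hold_lock(lock_a, 0.2))
    assert time.perf_counter() - start > 0.35  # same workspace serializes
```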
Test results:
All 9 test cases passed successfully, confirming the workspace isolation
feature is working correctly across all key scenarios.
Impact:
Provides confidence that PR #2366's workspace isolation feature is
production-ready and won't introduce regressions.
(cherry picked from commit 4742fc8efa)
Add specific content assertions to detect cross-contamination between workspaces.
Previously the test only checked that the workspaces held different data; it now
verifies:
- Each workspace contains only its own text content
- Each workspace does NOT contain the other workspace's content
- Cross-contamination would be immediately detected
This ensures the test can find problems, not just pass.
Changes:
- Add assertions for "Artificial Intelligence" and "Machine Learning" in project_a
- Add assertions for "Deep Learning" and "Neural Networks" in project_b
- Add negative assertions to verify data leakage doesn't occur (sketched below)
- Add detailed output messages showing what was verified
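A condensed sketch of the positive and negative assertions described above; how each workspace's stored text is read back (the text_a/text_b inputs) is left abstract:
```python
def check_isolation(text_a: str, text_b: str) -> None:
    # Positive: each workspace contains only its own content
    assert "Artificial Intelligence" in text_a and "Machine Learning" in text_a
    assert "Deep Learning" in text_b and "Neural Networks" in text_b
    # Negative: neither workspace leaked into the other
    assert "Deep Learning" not in text_a and "Neural Networks" not in text_a
    assert "Artificial Intelligence" not in text_b and "Machine Learning" not in text_b
```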
Testing:
- pytest tests/test_workspace_isolation.py::test_lightrag_end_to_end_workspace_isolation
- Test passes with proper content isolation verified
(cherry picked from commit 3ec736932e)
Implemented two critical test scenarios:
Test 10 - JsonKVStorage Integration Test:
- Instantiate two JsonKVStorage instances with different workspaces
- Write different data to each instance (entity1, entity2)
- Read back and verify complete data isolation
- Verify workspace directories are created correctly
- Result: Data correctly isolated, no mixing between workspaces
Test 11 - LightRAG End-to-End Test:
- Instantiate two LightRAG instances with different workspaces
- Insert different documents to each instance
- Verify workspace directory structure (project_a/, project_b/)
- Verify file separation and data isolation
- Result: All 8 storage files created separately per workspace
- Document data correctly isolated between workspaces (see the sketch below)
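A condensed, hypothetical sketch of the Test 11 scenario; make_rag stands in for whatever fixture wires up the mock LLM and embedding functions:
```python
import os

async def test_e2e_workspace_isolation(make_rag):
    rag_a = make_rag(working_dir="temp/e2e", workspace="project_a")
    rag_b = make_rag(working_dir="temp/e2e", workspace="project_b")
    await rag_a.initialize_storages()
    await rag_b.initialize_storages()

    await rag_a.ainsert("Artificial Intelligence fundamentals.")
    await rag_b.ainsert("Deep Learning architectures.")

    # Each workspace gets its own subdirectory with separate storage files
    assert os.path.isdir("temp/e2e/project_a")
    assert os.path.isdir("temp/e2e/project_b")
```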
Test Results: 23/23 passed
- 19 unit tests
- 2 integration tests (JsonKVStorage data + file structure)
- 2 E2E tests (LightRAG file structure + data isolation)
Coverage: 100% - Unit, Integration, and E2E validated
(cherry picked from commit 3e759f46d1)
Why this change is needed:
The test file was using a custom TestResults class for tracking test
execution and results, which is not standard practice for pytest-based
test suites. This makes the tests harder to integrate with CI/CD pipelines
and reduces compatibility with pytest plugins and tooling.
How it solves it:
- Removed custom TestResults class and manual result tracking
- Added @pytest.mark.asyncio decorator to all async test functions
- Converted all results.add() calls to standard pytest assert statements
- Added pytest fixture (setup_shared_data) for common test setup
- Removed custom main() runner (pytest handles test discovery/execution)
- Kept all test logic, assertions, and debugging print statements intact (see the sketch below)
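A before/after flavor of the conversion in miniature; the test name and data are illustrative:
```python
import pytest

@pytest.fixture
def setup_shared_data():
    # Common setup previously handled by the custom main() runner
    yield {"workspace": "test_ws"}

@pytest.mark.asyncio
async def test_pipeline_status_isolation(setup_shared_data):
    status_a, status_b = {"busy": False}, {"busy": True}
    # was: results.add("pipeline isolation", status_a != status_b)
    assert status_a != status_b  # standard pytest assertion instead
```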
Impact:
- All 11 test functions maintain identical behavior and coverage
- Tests now follow pytest conventions and integrate with pytest ecosystem
- Test output is cleaner and more informative with pytest's reporting
- Easier to run selective tests using pytest's filtering options
Testing:
Verified by running: uv run pytest tests/test_workspace_isolation.py -v
Result: All 11 tests passed in 2.41s
(cherry picked from commit 288498ccdc)
- Add --docling CLI flag for easier setup
- Add numpy version constraints
- Exclude docling on macOS (fork-safety)
(cherry picked from commit c246eff725)
- Add --docling CLI flag for easier setup
- Add numpy version constraints
- Exclude docling on macOS (fork-safety)
(cherry picked from commit a24d8181c2)
Add 5 markdown documents that users can index to reproduce evaluation results.
Changes:
- Add sample_documents/ folder with 5 markdown files covering LightRAG features
- Update sample_dataset.json with 3 improved, specific test questions
- Shorten and correct evaluation README (removed outdated info about mock responses)
- Add sample_documents reference with expected ~95% RAGAS score
Test Results with sample documents:
- Average RAGAS Score: 95.28%
- Faithfulness: 100%, Answer Relevance: 96.67%
- Context Recall: 88.89%, Context Precision: 95.56%
(cherry picked from commit a172cf893d)
**Lint Fixes (ruff)**:
- Sort imports alphabetically (I001)
- Add blank line after import traceback (E302)
- Add trailing comma to dict literals (COM812)
- Reformat writer.writerow for readability (E501)
**Rename test_dataset.json → sample_dataset.json**:
- Avoids .gitignore pattern conflict (test_* is ignored)
- More descriptive name - it's a sample/template, not actual test data
- Updated all references in eval_rag_quality.py and README.md
Resolves lint-and-format CI check failure.
Addresses reviewer feedback about test dataset naming.
(cherry picked from commit 5cdb4b0ef2)
This contribution adds optional Langfuse support for LLM observability and tracing.
Langfuse provides a drop-in replacement for the OpenAI client that automatically
tracks all LLM interactions without requiring code changes.
Features:
- Optional Langfuse integration with graceful fallback
- Automatic LLM request/response tracing
- Token usage tracking
- Latency metrics
- Error tracking
- Zero code changes required for existing functionality
Implementation:
- Modified lightrag/llm/openai.py to conditionally use Langfuse's AsyncOpenAI
- Falls back to standard OpenAI client if Langfuse is not installed
- Logs observability status on import (sketched below)
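A minimal sketch of the conditional import; Langfuse ships a drop-in client at langfuse.openai, while the flag name and log wording here are illustrative:
```python
import logging

try:
    from langfuse.openai import AsyncOpenAI  # traced drop-in replacement
    _langfuse_enabled = True
except ImportError:
    from openai import AsyncOpenAI  # standard client, behavior unchanged
    _langfuse_enabled = False

logging.getLogger(__name__).info(
    "Langfuse observability: %s", "enabled" if _langfuse_enabled else "disabled"
)
```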
Configuration:
To enable Langfuse tracing, install the observability extras and set environment variables:
```bash
pip install lightrag-hku[observability]
export LANGFUSE_PUBLIC_KEY="your_public_key"
export LANGFUSE_SECRET_KEY="your_secret_key"
export LANGFUSE_HOST="https://cloud.langfuse.com" # or your self-hosted instance
```
If Langfuse is not installed or environment variables are not set, LightRAG
will use the standard OpenAI client without any functionality changes.
Changes:
- Modified lightrag/llm/openai.py (added optional Langfuse import)
- Updated pyproject.toml with optional 'observability' dependencies
Dependencies (optional):
- langfuse>=3.8.1
(cherry picked from commit 626b42bc40)