LightRAG

Author	SHA1	Message	Date
Raphaël MANSUY	f7f9a9e6cf	fix: sync all core modules with upstream after Wave 1	2025-12-04 19:13:48 +08:00
yangdx	d0e3c8a4a3	Fix duplicate document responses to return original track_id - Return existing track_id for duplicates - Remove track_id generation in reprocess - Update reprocess response documentation - Clarify track_id behavior in comments - Update API response examples (cherry picked from commit `8d28b95966`)	2025-12-04 19:11:24 +08:00
yangdx	7e591a81c0	Clean up duplicate dependencies in package.json and lock file • Remove duplicate katex entries • Remove duplicate lucide-react entries • Remove duplicate mermaid entries • Remove duplicate @types/bun entries • Fix trailing commas in JSON (cherry picked from commit `459e4ddc09`)	2025-12-04 19:11:23 +08:00
yangdx	21fc61ecd2	Add content deduplication check for document insertion endpoints • Check content hash before insertion • Return duplicated status if exists • Use sanitized text for hash computation • Apply to both single and batch inserts • Prevent duplicate content processing (cherry picked from commit `19c16bc464`)	2025-12-04 19:11:23 +08:00
yangdx	f13d30206f	Fix relation deduplication logic and standardize log message prefixes (cherry picked from commit `a25003c336`)	2025-12-04 19:11:23 +08:00
yangdx	2ea1fccf1a	Refactor deduplication calculation and remove unused variables (cherry picked from commit `1154c5683f`)	2025-12-04 19:11:23 +08:00
DivinesLight	f742ba0220	Quick fix to limit source_id ballooning while inserting nodes (cherry picked from commit `7871600d8a`)	2025-12-04 19:11:23 +08:00
DivinesLight	b9fc6f19dd	Quick fix to limit source_id ballooning while inserting nodes (cherry picked from commit `54f0a7d1ca`)	2025-12-04 19:11:23 +08:00
yangdx	429cd6a66f	Fix top_n behavior with chunking to limit documents not chunks - Disable API-level top_n when chunking - Apply top_n to aggregated documents - Add comprehensive test coverage (cherry picked from commit `9009abed3e`)	2025-12-04 19:11:22 +08:00
copilot-swe-agent[bot]	85f21aecd5	Fix chunking infinite loop when overlap_tokens >= max_tokens Co-authored-by: netbrah <162479981+netbrah@users.noreply.github.com> (cherry picked from commit `1d6ea0c5f7`)	2025-12-04 19:11:22 +08:00
netbrah	b65ef37569	Add Cohere reranker config, chunking, and tests (cherry picked from commit `a05bbf105e`)	2025-12-04 19:11:22 +08:00
yangdx	8a8bdba8f4	Add comprehensive chunking tests with multi-token tokenizer edge cases • Add MultiTokenCharacterTokenizer for testing • Test token vs character counting accuracy • Verify delimiter splitting precision • Test overlap with distinctive content • Add decode content preservation tests (cherry picked from commit `fec7c67f45`)	2025-12-04 19:11:22 +08:00
yangdx	7f7574c8b7	Add token limit validation for character-only chunking - Add ChunkTokenLimitExceededError exception - Validate chunks against token limits - Include chunk preview in error messages - Add comprehensive test coverage - Log warnings for oversized chunks (cherry picked from commit `f988a22652`)	2025-12-04 19:11:22 +08:00
yangdx	c50a1357a6	Fix ChunkTokenLimitExceededError message formatting - Prevent passing two separate string objects to __init__ - Maintain same error output (cherry picked from commit `6fea68bff9`)	2025-12-04 19:11:22 +08:00
yangdx	326acbf19b	Add comprehensive tests for chunking with recursive splitting - Test recursive split mode - Add edge case coverage - Test parameter combinations - Verify chunk order indexing - Add integration test scenarios (cherry picked from commit `5733292557`)	2025-12-04 19:11:21 +08:00
yangdx	6e3ff18570	Adjust chunking parameters to match the default environment variable settings (cherry picked from commit `e77340d4a1`)	2025-12-04 19:11:21 +08:00
EightyOliveira	b8dc5de81a	refactor(chunking): rename params and improve docstring for chunking_by_token_size (cherry picked from commit `dacca334e0`)	2025-12-04 19:11:21 +08:00
yangdx	d769a446d1	Support async chunking functions in LightRAG processing pipeline - Add Awaitable and Union type imports - Update chunking_func type annotation - Handle coroutine results with await - Add return type validation - Update docstring for async support (cherry picked from commit `940bec0b31`)	2025-12-04 19:11:21 +08:00
Tong Da	877f2c01d3	easier version: detect whether the chunking_func result is a coroutine (cherry picked from commit `245df75d9c`)	2025-12-04 19:11:21 +08:00
Tong Da	8a43e16f6e	support async chunking func to improve processing performance when a heavy `chunking_func` is passed in by the user (cherry picked from commit `7740500693`)	2025-12-04 19:11:20 +08:00
yangdx	70ba7cd787	Fix: Remove redundant entity/relation chunk deletions (cherry picked from commit `ea141e2779`)	2025-12-04 19:11:20 +08:00
yangdx	211dbc3f78	Remove unused chunk-based node/edge retrieval methods (cherry picked from commit `807d2461d3`)	2025-12-04 19:11:20 +08:00
yangdx	ce702ccb2f	Add workspace parameter and remove chunk-based query unit tests - Add workspace param to test storage init - Remove get_nodes_by_chunk_ids tests - Remove get_edges_by_chunk_ids tests - Clean up batch operations test function (cherry picked from commit `6b0f9795be`)	2025-12-04 19:11:20 +08:00
anouarbm	7ce251c319	docs: Add documentation and examples for include_chunk_content parameter Added comprehensive documentation for the new include_chunk_content parameter that enables retrieval of actual chunk text content in API responses. Documentation Updates: - Added "Include Chunk Content in References" section to API README - Explained use cases: RAG evaluation, debugging, citations, transparency - Provided JSON request/response examples - Clarified parameter interaction with include_references OpenAPI/Swagger Examples: - Added "Response with chunk content" example to /query endpoint - Shows complete reference structure with content field - Demonstrates realistic chunk text content This makes the feature discoverable through: 1. API documentation (README.md) 2. Interactive Swagger UI (http://localhost:9621/docs) 3. Code examples for developers (cherry picked from commit `963ad4c637`)	2025-12-04 19:11:20 +08:00
anouarbm	349c1945db	Optimize RAGAS evaluation with parallel execution and chunk content enrichment Added efficient RAG evaluation system with optimized API calls and comprehensive benchmarking. Key Features: - Single API call per evaluation (2x faster than before) - Parallel evaluation based on MAX_ASYNC environment variable - Chunk content enrichment in /query endpoint responses - Comprehensive benchmark statistics (averages) - NaN-safe metric calculations API Changes: - Added include_chunk_content parameter to QueryRequest (backward compatible) - /query endpoint enriches references with actual chunk content when requested - No breaking changes - default behavior unchanged Evaluation Improvements: - Parallel execution using asyncio.Semaphore (respects MAX_ASYNC) - Shared HTTP client with connection pooling - Proper timeout handling (3min connect, 5min read) - Debug output for context retrieval verification - Benchmark statistics with averages, min/max scores Results: - Average RAGAS Score: 0.9772 - Perfect Faithfulness: 1.0000 - Perfect Context Recall: 1.0000 - Perfect Context Precision: 1.0000 - Excellent Answer Relevance: 0.9087 (cherry picked from commit `0bbef9814e`)	2025-12-04 19:11:20 +08:00
yangdx	8f16f6fe31	Fix entity and relationship deletion when no chunk references remain (cherry picked from commit `c81a56a113`)	2025-12-04 19:11:19 +08:00
yangdx	17a9771cfb	Add chunk tracking support to entity merge functionality - Pass chunk storages to merge function - Merge relation chunk tracking data - Merge entity chunk tracking data - Delete old chunk tracking records - Persist chunk storage updates (cherry picked from commit `2c09adb8d3`)	2025-12-04 19:11:19 +08:00
yangdx	450f969430	Add chunk tracking cleanup to entity/relation deletion and creation • Clean up chunk storage on delete • Track chunks in create operations • Normalize relation keys consistently (cherry picked from commit `a3370b024d`)	2025-12-04 19:11:19 +08:00
yangdx	7e0f12c28e	Enhance entity/relation editing with chunk tracking synchronization • Add chunk storage sync to edit ops • Implement incremental chunk ID updates • Support entity renaming migrations • Normalize relation keys consistently • Preserve chunk references on edits (cherry picked from commit `3fbd704bf9`)	2025-12-04 19:11:19 +08:00
yangdx	488f67e5b2	Fix entity and relation chunk cleanup in deletion pipeline • Delete from entity_chunks storage • Delete from relation_chunks storage (cherry picked from commit `29bf593663`)	2025-12-04 19:11:19 +08:00
yangdx	cb5451faf8	Add entity/relation chunk tracking with configurable source ID limits - Add entity_chunks & relation_chunks storage - Implement KEEP/FIFO limit strategies - Update env.example with new settings - Add migration for chunk tracking data - Support all KV storage (cherry picked from commit `dc62c78f98`)	2025-12-04 19:11:19 +08:00
yangdx	7248e09fc4	Allow related chunks missing in knowledge graph queries (cherry picked from commit `35cd567c9e`)	2025-12-04 19:11:18 +08:00
yangdx	851b45f726	Add pipeline status lock function for legacy compatibility - Add get_pipeline_status_lock function - Return NamespaceLock for consistency - Support workspace parameter - Enable logging option - Legacy code compatibility (cherry picked from commit `93d445dfdd`)	2025-12-04 19:11:18 +08:00
yangdx	402d2f9a98	Fix namespace parsing when workspace contains colons • Use rsplit instead of split • Handle colons in workspace names (cherry picked from commit `f8dd2e0724`)	2025-12-04 19:11:18 +08:00
yangdx	6ba35f81df	Fix: auto-acquire pipeline when idle in document deletion • Track if we acquired the pipeline lock • Auto-acquire pipeline when idle • Only release if we acquired it • Prevent concurrent deletion conflicts • Improve deletion job validation (cherry picked from commit `4048fc4b89`)	2025-12-04 19:11:18 +08:00
yangdx	7e7c86601e	Improve workspace isolation tests with better parallelism checks and cleanup • Add finalize_share_data cleanup • Refactor lock timing measurement • Add timeline overlap validation • Include purpose/scope documentation • Fix tokenizer integration (cherry picked from commit `21ad990e36`)	2025-12-04 19:11:18 +08:00
yangdx	5febb88824	Fix missing workspace parameter in update flags status call (cherry picked from commit `1745b30a5f`)	2025-12-04 19:11:18 +08:00
yangdx	dc4c10c346	Fix NamespaceLock context variable timing to prevent lock bricking * Acquire lock before setting ContextVar * Prevent state corruption on cancellation * Fix permanent lock brick scenario * Store context only after success * Handle acquisition failure properly (cherry picked from commit `e8383df3b8`)	2025-12-04 19:11:17 +08:00
yangdx	87561f8b28	Remove manual initialize_pipeline_status() calls across codebase - Auto-init pipeline status in storages - Remove redundant import statements - Simplify initialization pattern - Update docs and examples (cherry picked from commit `cdd53ee875`)	2025-12-04 19:11:17 +08:00
yangdx	1e7bd654d8	Fix NamespaceLock concurrent coroutine safety with ContextVar - Use ContextVar for per-coroutine storage - Prevent state interference between coroutines - Add re-entrance protection check (cherry picked from commit `b6a5a90eaf`)	2025-12-04 19:11:17 +08:00
yangdx	f6a45245bd	Add pipeline status validation before document deletion (cherry picked from commit `9d7b7981ce`)	2025-12-04 19:11:17 +08:00
yangdx	94ae13a037	Refactor workspace handling to use default workspace and namespace locks - Remove DB-specific workspace configs - Add default workspace auto-setting - Replace global locks with namespace locks - Simplify pipeline status management - Remove redundant graph DB locking (cherry picked from commit `926960e957`)	2025-12-04 19:11:17 +08:00
yangdx	c01cfc3649	Fix workspace filtering logic in get_all_update_flags_status • Handle namespaces with/without prefixes • Fix workspace matching logic (cherry picked from commit `7ed0eac4c9`)	2025-12-04 19:11:16 +08:00
yangdx	50f8ddd933	Fix pipeline status namespace check to handle root case - Add check for bare "pipeline_status" - Handle namespace without prefix (cherry picked from commit `78689e8837`)	2025-12-04 19:11:16 +08:00
yangdx	dfab175c16	Fix workspace isolation for pipeline status across all operations - Fix final_namespace error in get_namespace_data() - Fix get_workspace_from_request return type - Add workspace param to pipeline status calls (cherry picked from commit `52c812b9a0`)	2025-12-04 19:11:16 +08:00
BukeLy	fe1576943f	fix: Add default workspace support for backward compatibility Fixes two compatibility issues in workspace isolation: 1. Problem: lightrag_server.py calls initialize_pipeline_status() without workspace parameter, causing pipeline to initialize in global namespace instead of rag's workspace. Solution: Add set_default_workspace() mechanism in shared_storage. LightRAG.initialize_storages() now sets default workspace, which initialize_pipeline_status() uses when called without parameters. 2. Problem: /health endpoint hardcoded to use "pipeline_status", cannot return workspace-specific status or support frontend workspace selection. Solution: Add LIGHTRAG-WORKSPACE header support. Endpoint now extracts workspace from header or falls back to server default, returning correct workspace-specific pipeline status. Changes: - lightrag/kg/shared_storage.py: Add set/get_default_workspace() - lightrag/lightrag.py: Call set_default_workspace() in initialize_storages() - lightrag/api/lightrag_server.py: Add get_workspace_from_request() helper, update /health endpoint to support LIGHTRAG-WORKSPACE header Testing: - Backward compatibility: Old code works without modification - Multi-instance safety: Explicit workspace passing preserved - /health endpoint: Supports both default and header-specified workspaces Related: #2353 (cherry picked from commit `18a4870229`)	2025-12-04 19:11:16 +08:00
BukeLy	f7b500bca2	feat: Add workspace isolation support for pipeline status Problem: In multi-tenant scenarios, different workspaces share a single global pipeline_status namespace, causing pipelines from different tenants to block each other, severely impacting concurrent processing performance. Solution: - Extended get_namespace_data() to recognize workspace-specific pipeline namespaces with pattern "{workspace}:pipeline" (following GraphDB pattern) - Added workspace parameter to initialize_pipeline_status() for per-tenant isolated pipeline namespaces - Updated all 7 call sites to use workspace-aware locks: * lightrag.py: process_document_queue(), aremove_document() * document_routes.py: background_delete_documents(), clear_documents(), cancel_pipeline(), get_pipeline_status(), delete_documents() Impact: - Different workspaces can process documents concurrently without blocking - Backward compatible: empty workspace defaults to "pipeline_status" - Maintains fail-fast: uninitialized pipeline raises clear error - Expected N× performance improvement for N concurrent tenants Bug fixes: - Fixed AttributeError by using self.workspace instead of self.global_config - Fixed pipeline status endpoint to show workspace-specific status - Fixed delete endpoint to check workspace-specific busy flag Code changes: 4 files, 141 insertions(+), 28 deletions(-) Testing: All syntax checks passed, comprehensive workspace isolation tests completed (cherry picked from commit `eb52ec94d7`)	2025-12-04 19:11:16 +08:00
yangdx	4cc6388742	Add auto-refresh of popular labels when pipeline completes • Monitor pipeline busy->idle transitions • Reload labels on dropdown open if needed • Add onBeforeOpen callback to AsyncSelect • Clear refresh flags after processing • Improve label sync with backend state (cherry picked from commit `58c83f9da5`)	2025-12-04 19:11:15 +08:00
yangdx	a7330f0b95	Remove redundant await call in file extraction pipeline (cherry picked from commit `c36afecba4`)	2025-12-04 19:11:15 +08:00
yangdx	537db072e0	Add Qdrant legacy collection migration with workspace support - Add QdrantMigrationError exception - Implement automatic data migration - Support workspace-based partitioning - Add migration verification logic - Update collection naming scheme (cherry picked from commit `5f4a280458`)	2025-12-04 19:11:15 +08:00
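Commit 402d2f9a98 switches namespace parsing from split to rsplit so that workspace names containing colons are handled correctly. The following minimal sketch assumes a "{workspace}:{namespace}" layout, as suggested by the "{workspace}:pipeline" pattern in commit f7b500bca2. It uses a hypothetical helper, parse_namespace, and is not LightRAG's actual code.

```python
def parse_namespace(final_namespace: str) -> tuple[str, str]:
    """Split a "{workspace}:{namespace}" string into (workspace, namespace).

    Hypothetical helper for illustration. rsplit(":", 1) splits only on the
    LAST colon, so colons inside the workspace name are preserved. A bare
    namespace (no colon) is treated as belonging to the default "" workspace.
    """
    if ":" not in final_namespace:
        return "", final_namespace
    workspace, namespace = final_namespace.rsplit(":", 1)
    return workspace, namespace


if __name__ == "__main__":
    # split(":", 1) would wrongly yield ("tenant", "a:pipeline_status") here.
    print(parse_namespace("tenant:a:pipeline_status"))
    print(parse_namespace("pipeline_status"))
```

Splitting from the right works because the namespace suffix is assumed never to contain a colon, while the workspace prefix may.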
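Commit 85f21aecd5 fixes a chunking infinite loop that occurs when overlap_tokens >= max_tokens. In sliding-window chunking, the window advances by max_tokens - overlap_tokens, so that step must stay positive. The sketch below uses a hypothetical chunk_tokens function and rejects the invalid configuration up front. That choice is an assumption; the log does not say whether upstream raises, clamps, or handles the case some other way.

```python
def chunk_tokens(tokens: list[int], max_tokens: int, overlap_tokens: int) -> list[list[int]]:
    """Split tokens into windows of at most max_tokens that overlap by overlap_tokens.

    Hypothetical sketch: the step (max_tokens - overlap_tokens) must be > 0,
    otherwise the window would never advance.
    """
    if max_tokens <= 0:
        raise ValueError("max_tokens must be positive")
    if overlap_tokens < 0 or overlap_tokens >= max_tokens:
        raise ValueError(
            f"overlap_tokens ({overlap_tokens}) must be in [0, max_tokens={max_tokens})"
        )
    step = max_tokens - overlap_tokens
    # Each window starts `step` tokens after the previous one.
    return [tokens[start:start + max_tokens] for start in range(0, len(tokens), step)]


if __name__ == "__main__":
    print(chunk_tokens(list(range(10)), max_tokens=4, overlap_tokens=2))
```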
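Commit 21fc61ecd2 adds a content-hash check before document insertion, computed on sanitized text, and returns a duplicated status when the hash already exists. The minimal sketch below makes several assumptions: the sanitization rules (trimming whitespace, normalizing CRLF), the choice of SHA-256, and the names sanitize_text, content_hash and insert_documents are all hypothetical. None of them is taken from LightRAG.

```python
import hashlib


def sanitize_text(text: str) -> str:
    # Hypothetical sanitization: normalize line endings and trim whitespace.
    return text.replace("\r\n", "\n").strip()


def content_hash(text: str) -> str:
    # Hash the sanitized text so cosmetic differences map to the same digest.
    return hashlib.sha256(sanitize_text(text).encode("utf-8")).hexdigest()


def insert_documents(texts: list[str], seen_hashes: set[str]) -> list[str]:
    """Return "inserted" or "duplicated" for each input text (batch insert)."""
    statuses = []
    for text in texts:
        digest = content_hash(text)
        if digest in seen_hashes:
            statuses.append("duplicated")
        else:
            seen_hashes.add(digest)
            statuses.append("inserted")
    return statuses
```

Passing the same seen_hashes set to single and batch inserts mirrors the commit's note that the check applies to both paths.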
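Commit 349c1945db runs RAGAS evaluations in parallel with asyncio.Semaphore, bounded by MAX_ASYNC. The following is a minimal sketch of that bounded-concurrency pattern. Here evaluate_one is a placeholder for a real /query call, and the fallback value of 4 for MAX_ASYNC is an assumption of this sketch.

```python
import asyncio
import os


async def evaluate_one(question: str) -> float:
    # Placeholder for one evaluation request; returns a dummy score.
    await asyncio.sleep(0.01)
    return float(len(question))


async def evaluate_all(questions: list[str], max_async: int) -> tuple[list[float], int]:
    """Evaluate all questions with at most max_async in flight; also report the peak."""
    semaphore = asyncio.Semaphore(max_async)
    in_flight = 0
    peak = 0

    async def bounded(question: str) -> float:
        nonlocal in_flight, peak
        async with semaphore:  # blocks once max_async evaluations are running
            in_flight += 1
            peak = max(peak, in_flight)
            try:
                return await evaluate_one(question)
            finally:
                in_flight -= 1

    # gather returns results in input order, regardless of completion order.
    scores = await asyncio.gather(*(bounded(q) for q in questions))
    return list(scores), peak


if __name__ == "__main__":
    limit = int(os.environ.get("MAX_ASYNC", "4"))  # assumed fallback of 4
    print(asyncio.run(evaluate_all(["a", "bb", "ccc"], limit)))
```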


5384 commits