3431 commits

Author SHA1 Message Date
anouarbm
7ce251c319 docs: Add documentation and examples for include_chunk_content parameter
Added comprehensive documentation for the new include_chunk_content parameter
that enables retrieval of actual chunk text content in API responses.

Documentation Updates:
- Added "Include Chunk Content in References" section to API README
- Explained use cases: RAG evaluation, debugging, citations, transparency
- Provided JSON request/response examples
- Clarified parameter interaction with include_references

OpenAPI/Swagger Examples:
- Added "Response with chunk content" example to /query endpoint
- Shows complete reference structure with content field
- Demonstrates realistic chunk text content

This makes the feature discoverable through:
1. API documentation (README.md)
2. Interactive Swagger UI (http://localhost:9621/docs)
3. Code examples for developers

(cherry picked from commit 963ad4c637)
2025-12-04 19:11:20 +08:00
anouarbm
349c1945db Optimize RAGAS evaluation with parallel execution and chunk content enrichment
Added efficient RAG evaluation system with optimized API calls and comprehensive benchmarking.

Key Features:
- Single API call per evaluation (2x faster than before)
- Parallel evaluation based on MAX_ASYNC environment variable
- Chunk content enrichment in /query endpoint responses
- Comprehensive benchmark statistics (averages)
- NaN-safe metric calculations
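
As an illustration of the NaN-safe averaging mentioned above, here is a minimal sketch. The helper name `nan_safe_mean` is hypothetical and not taken from the evaluation script; it only shows the general idea of skipping unusable scores before averaging.

```python
import math


def nan_safe_mean(values):
    """Average the usable values; return NaN when none are usable."""
    # Drop missing scores (None) and NaN scores before averaging.
    usable = [v for v in values if v is not None and not math.isnan(v)]
    return sum(usable) / len(usable) if usable else float("nan")
```

With this shape, a single failed metric does not turn the whole benchmark average into NaN.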

API Changes:
- Added include_chunk_content parameter to QueryRequest (backward compatible)
- /query endpoint enriches references with actual chunk content when requested
- No breaking changes - default behavior unchanged

Evaluation Improvements:
- Parallel execution using asyncio.Semaphore (respects MAX_ASYNC)
- Shared HTTP client with connection pooling
- Proper timeout handling (3min connect, 5min read)
- Debug output for context retrieval verification
- Benchmark statistics with averages, min/max scores
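
The semaphore-bounded parallel execution described above could look roughly like the sketch below. The function names and the fallback value of 4 for `MAX_ASYNC` are assumptions for illustration, not the evaluation script's actual code.

```python
import asyncio
import os


async def evaluate_all(cases, evaluate_one, max_async=None):
    """Run evaluate_one over all cases with at most max_async in flight."""
    # Assumption: fall back to 4 when MAX_ASYNC is not set.
    limit = max_async or int(os.getenv("MAX_ASYNC", "4"))
    semaphore = asyncio.Semaphore(limit)

    async def bounded(case):
        # Each evaluation waits here until a slot is free.
        async with semaphore:
            return await evaluate_one(case)

    # gather returns results in the same order as the input cases.
    return await asyncio.gather(*(bounded(case) for case in cases))
```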

Results:
- Average RAGAS Score: 0.9772
- Perfect Faithfulness: 1.0000
- Perfect Context Recall: 1.0000
- Perfect Context Precision: 1.0000
- Excellent Answer Relevance: 0.9087

(cherry picked from commit 0bbef9814e)
2025-12-04 19:11:20 +08:00
yangdx
8f16f6fe31 Fix entity and relationship deletion when no chunk references remain
(cherry picked from commit c81a56a113)
2025-12-04 19:11:19 +08:00
yangdx
17a9771cfb Add chunk tracking support to entity merge functionality
- Pass chunk storages to merge function
- Merge relation chunk tracking data
- Merge entity chunk tracking data
- Delete old chunk tracking records
- Persist chunk storage updates

(cherry picked from commit 2c09adb8d3)
2025-12-04 19:11:19 +08:00
yangdx
450f969430 Add chunk tracking cleanup to entity/relation deletion and creation
• Clean up chunk storage on delete
• Track chunks in create operations
• Normalize relation keys consistently

(cherry picked from commit a3370b024d)
2025-12-04 19:11:19 +08:00
yangdx
7e0f12c28e Enhance entity/relation editing with chunk tracking synchronization
• Add chunk storage sync to edit ops
• Implement incremental chunk ID updates
• Support entity renaming migrations
• Normalize relation keys consistently
• Preserve chunk references on edits

(cherry picked from commit 3fbd704bf9)
2025-12-04 19:11:19 +08:00
yangdx
488f67e5b2 Fix entity and relation chunk cleanup in deletion pipeline
• Delete from entity_chunks storage
• Delete from relation_chunks storage

(cherry picked from commit 29bf593663)
2025-12-04 19:11:19 +08:00
yangdx
cb5451faf8 Add entity/relation chunk tracking with configurable source ID limits
- Add entity_chunks & relation_chunks storage
- Implement KEEP/FIFO limit strategies
- Update env.example with new settings
- Add migration for chunk tracking data
- Support all KV storage
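
A rough sketch of how the two limit strategies might differ. It assumes that KEEP retains the earliest source IDs and ignores newer ones, that FIFO drops the oldest to make room for the newest, and that a non-positive limit means "no limit". The function name is hypothetical.

```python
def apply_source_id_limit(chunk_ids, limit, method="FIFO"):
    """Trim a list of chunk IDs (oldest first) to at most `limit` entries."""
    if limit <= 0 or len(chunk_ids) <= limit:
        return list(chunk_ids)
    if method == "KEEP":
        # Assumed semantics: keep the earliest IDs, ignore newer ones.
        return list(chunk_ids[:limit])
    if method == "FIFO":
        # Assumed semantics: drop the oldest IDs, keep the newest ones.
        return list(chunk_ids[-limit:])
    raise ValueError(f"unknown limit method: {method!r}")
```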

(cherry picked from commit dc62c78f98)
2025-12-04 19:11:19 +08:00
yangdx
851b45f726 Add pipeline status lock function for legacy compatibility
- Add get_pipeline_status_lock function
- Return NamespaceLock for consistency
- Support workspace parameter
- Enable logging option
- Legacy code compatibility

(cherry picked from commit 93d445dfdd)
2025-12-04 19:11:18 +08:00
yangdx
402d2f9a98 Fix namespace parsing when workspace contains colons
• Use rsplit instead of split
• Handle colons in workspace names
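
To show why `rsplit` matters here, the sketch below assumes the combined name has the form `workspace:namespace`. The helper name is hypothetical.

```python
def split_final_namespace(final_namespace):
    """Split 'workspace:namespace' on the LAST colon only."""
    if ":" not in final_namespace:
        # No workspace prefix at all.
        return "", final_namespace
    workspace, namespace = final_namespace.rsplit(":", 1)
    return workspace, namespace
```

With a workspace such as `tenant:a`, `split(":", 1)` would yield `tenant` and `a:pipeline_status`, whereas `rsplit(":", 1)` keeps the workspace intact.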

(cherry picked from commit f8dd2e0724)
2025-12-04 19:11:18 +08:00
yangdx
6ba35f81df Fix: auto-acquire pipeline when idle in document deletion
• Track if we acquired the pipeline lock
• Auto-acquire pipeline when idle
• Only release if we acquired it
• Prevent concurrent deletion conflicts
• Improve deletion job validation
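
The acquire-if-idle, release-only-if-acquired pattern can be sketched as follows. The `busy` key and the function name are assumptions for illustration, and the sketch assumes that a caller finding the pipeline already busy is the job that owns it.

```python
import asyncio


async def delete_with_pipeline(pipeline_status, lock, do_delete):
    """Run do_delete, claiming the pipeline only when it was idle."""
    acquired_here = False
    async with lock:
        if not pipeline_status.get("busy", False):
            pipeline_status["busy"] = True
            acquired_here = True
    try:
        return await do_delete()
    finally:
        # Release only what this call claimed; never clear a busy flag
        # that was set by someone else.
        if acquired_here:
            async with lock:
                pipeline_status["busy"] = False
```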

(cherry picked from commit 4048fc4b89)
2025-12-04 19:11:18 +08:00
yangdx
5febb88824 Fix missing workspace parameter in update flags status call
(cherry picked from commit 1745b30a5f)
2025-12-04 19:11:18 +08:00
yangdx
dc4c10c346 Fix NamespaceLock context variable timing to prevent lock bricking
* Acquire lock before setting ContextVar
* Prevent state corruption on cancellation
* Fix permanent lock brick scenario
* Store context only after success
* Handle acquisition failure properly

(cherry picked from commit e8383df3b8)
2025-12-04 19:11:17 +08:00
yangdx
87561f8b28 Remove manual initialize_pipeline_status() calls across codebase
- Auto-init pipeline status in storages
- Remove redundant import statements
- Simplify initialization pattern
- Update docs and examples

(cherry picked from commit cdd53ee875)
2025-12-04 19:11:17 +08:00
yangdx
1e7bd654d8 Fix NamespaceLock concurrent coroutine safety with ContextVar
- Use ContextVar for per-coroutine storage
- Prevent state interference between coroutines
- Add re-entrance protection check
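
A small demonstration of why a `ContextVar` gives per-coroutine state: each task started by `asyncio.gather` runs in its own copy of the context, so values set in one task are invisible to the other. The names below are illustrative only.

```python
import asyncio
from contextvars import ContextVar

_current = ContextVar("current_holder", default=None)


async def worker(name, results):
    # Re-entrance check: refuse if this coroutine already set the value.
    if _current.get() is not None:
        raise RuntimeError("re-entrant use in the same coroutine")
    _current.set(name)
    await asyncio.sleep(0)  # yield so the other task runs in between
    results[name] = _current.get()


async def demo():
    results = {}
    await asyncio.gather(worker("a", results), worker("b", results))
    return results
```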

(cherry picked from commit b6a5a90eaf)
2025-12-04 19:11:17 +08:00
yangdx
f6a45245bd Add pipeline status validation before document deletion
(cherry picked from commit 9d7b7981ce)
2025-12-04 19:11:17 +08:00
yangdx
94ae13a037 Refactor workspace handling to use default workspace and namespace locks
- Remove DB-specific workspace configs
- Add default workspace auto-setting
- Replace global locks with namespace locks
- Simplify pipeline status management
- Remove redundant graph DB locking

(cherry picked from commit 926960e957)
2025-12-04 19:11:17 +08:00
yangdx
c01cfc3649 Fix workspace filtering logic in get_all_update_flags_status
• Handle namespaces with/without prefixes
• Fix workspace matching logic

(cherry picked from commit 7ed0eac4c9)
2025-12-04 19:11:16 +08:00
yangdx
50f8ddd933 Fix pipeline status namespace check to handle root case
- Add check for bare "pipeline_status"
- Handle namespace without prefix

(cherry picked from commit 78689e8837)
2025-12-04 19:11:16 +08:00
yangdx
dfab175c16 Fix workspace isolation for pipeline status across all operations
- Fix final_namespace error in get_namespace_data()
- Fix get_workspace_from_request return type
- Add workspace param to pipeline status calls

(cherry picked from commit 52c812b9a0)
2025-12-04 19:11:16 +08:00
BukeLy
fe1576943f fix: Add default workspace support for backward compatibility
Fixes two compatibility issues in workspace isolation:

1. Problem: lightrag_server.py calls initialize_pipeline_status()
   without workspace parameter, causing pipeline to initialize in
   global namespace instead of rag's workspace.

   Solution: Add set_default_workspace() mechanism in shared_storage.
   LightRAG.initialize_storages() now sets default workspace, which
   initialize_pipeline_status() uses when called without parameters.

2. Problem: /health endpoint hardcoded to use "pipeline_status",
   cannot return workspace-specific status or support frontend
   workspace selection.

   Solution: Add LIGHTRAG-WORKSPACE header support. Endpoint now
   extracts workspace from header or falls back to server default,
   returning correct workspace-specific pipeline status.
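
The header-or-default lookup might be sketched like this. The function name and the plain-mapping argument are assumptions; the real helper works on a request object.

```python
def workspace_from_headers(headers, default_workspace=""):
    """Return the workspace from the LIGHTRAG-WORKSPACE header, else the default."""
    # Note: a plain dict is case-sensitive, while web framework header
    # objects are typically case-insensitive.
    value = (headers.get("LIGHTRAG-WORKSPACE") or "").strip()
    return value or default_workspace
```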

Changes:
- lightrag/kg/shared_storage.py: Add set/get_default_workspace()
- lightrag/lightrag.py: Call set_default_workspace() in initialize_storages()
- lightrag/api/lightrag_server.py: Add get_workspace_from_request() helper,
  update /health endpoint to support LIGHTRAG-WORKSPACE header

Testing:
- Backward compatibility: Old code works without modification
- Multi-instance safety: Explicit workspace passing preserved
- /health endpoint: Supports both default and header-specified workspaces

Related: #2353
(cherry picked from commit 18a4870229)
2025-12-04 19:11:16 +08:00
BukeLy
f7b500bca2 feat: Add workspace isolation support for pipeline status
Problem:
In multi-tenant scenarios, different workspaces share a single global
pipeline_status namespace, causing pipelines from different tenants to
block each other, severely impacting concurrent processing performance.

Solution:
- Extended get_namespace_data() to recognize workspace-specific pipeline
  namespaces with pattern "{workspace}:pipeline" (following GraphDB pattern)
- Added workspace parameter to initialize_pipeline_status() for per-tenant
  isolated pipeline namespaces
- Updated all 7 call sites to use workspace-aware locks:
  * lightrag.py: process_document_queue(), aremove_document()
  * document_routes.py: background_delete_documents(), clear_documents(),
    cancel_pipeline(), get_pipeline_status(), delete_documents()

Impact:
- Different workspaces can process documents concurrently without blocking
- Backward compatible: empty workspace defaults to "pipeline_status"
- Maintains fail-fast: uninitialized pipeline raises clear error
- Expected N× performance improvement for N concurrent tenants
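
The namespace selection described in this commit can be summarized in a short sketch. The function name is hypothetical; the `{workspace}:pipeline` pattern and the `pipeline_status` fallback are taken from the message above.

```python
def pipeline_namespace(workspace=""):
    """Pick the pipeline status namespace for a workspace."""
    # An empty workspace keeps the legacy global namespace.
    return f"{workspace}:pipeline" if workspace else "pipeline_status"
```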

Bug fixes:
- Fixed AttributeError by using self.workspace instead of self.global_config
- Fixed pipeline status endpoint to show workspace-specific status
- Fixed delete endpoint to check workspace-specific busy flag

Code changes: 4 files, 141 insertions(+), 28 deletions(-)

Testing: All syntax checks passed, comprehensive workspace isolation tests completed
(cherry picked from commit eb52ec94d7)
2025-12-04 19:11:16 +08:00
yangdx
a7330f0b95 Remove redundant await call in file extraction pipeline
(cherry picked from commit c36afecba4)
2025-12-04 19:11:15 +08:00
yangdx
537db072e0 Add Qdrant legacy collection migration with workspace support
- Add QdrantMigrationError exception
- Implement automatic data migration
- Support workspace-based partitioning
- Add migration verification logic
- Update collection naming scheme

(cherry picked from commit 5f4a280458)
2025-12-04 19:11:15 +08:00
yangdx
687d2b6b13 Improve error handling and add cancellation checks in pipeline
(cherry picked from commit 77336e50b6)
2025-12-04 19:11:15 +08:00
yangdx
a471f1ca0e Add pipeline cancellation feature for graceful processing termination
• Add cancel_pipeline API endpoint
• Implement PipelineCancelledException
• Add cancellation checks in main loop
• Handle task cancellation gracefully
• Mark cancelled docs as FAILED
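
A simplified, synchronous sketch of a cancellation check in a processing loop. The `cancellation_requested` key, the status strings, and the function name are assumptions; the real pipeline is asynchronous.

```python
class PipelineCancelledException(Exception):
    """Raised when the user has asked the pipeline to stop."""


def process_queue(docs, pipeline_status, handle):
    """Process docs in order, marking docs hit by a cancellation as FAILED."""
    results = {}
    for doc in docs:
        try:
            # Check the flag before starting each document.
            if pipeline_status.get("cancellation_requested"):
                raise PipelineCancelledException("cancelled by user")
            handle(doc)
            results[doc] = "PROCESSED"
        except PipelineCancelledException:
            results[doc] = "FAILED"
    return results
```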

(cherry picked from commit 743aefc655)
2025-12-04 19:11:15 +08:00
yangdx
37d48bafb6 Simplify skip logging and reduce pipeline status updates
(cherry picked from commit a5253244f9)
2025-12-04 19:11:14 +08:00
yangdx
d56b4c856e Fix trailing whitespace and update test mocking for rerank module
• Remove trailing whitespace
• Fix TiktokenTokenizer import patch
• Add async context manager mocks
• Update aiohttp.ClientSession patch
• Improve test reliability

(cherry picked from commit 561ba4e4b5)
2025-12-04 19:11:14 +08:00
yangdx
322ff19f72 Remove ascii_colors dependency and fix stream handling errors
• Remove ascii_colors.trace_exception calls
• Add SafeStreamHandler for closed streams
• Patch ascii_colors console handler
• Prevent ValueError on stream close
• Improve logging error handling
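
One way a handler can tolerate a closed stream is sketched below. It reuses the name `SafeStreamHandler` from the commit but is not the project's implementation; it simply drops records once the stream is closed.

```python
import io
import logging


class SafeStreamHandler(logging.StreamHandler):
    """StreamHandler that stays quiet once its stream has been closed."""

    def emit(self, record):
        # Writing to a closed stream raises ValueError, so skip the record.
        if getattr(self.stream, "closed", False):
            return
        super().emit(record)

    def flush(self):
        try:
            super().flush()
        except ValueError:
            # Stream was closed underneath us; nothing left to flush.
            pass
```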

(cherry picked from commit 0fb2925c6a)
2025-12-04 19:11:13 +08:00
yangdx
9cf7476dd4 Improve docling integration with macOS compatibility and CLI flag
- Add --docling CLI flag for easier setup
- Add numpy version constraints
- Exclude docling on macOS (fork-safety)

(cherry picked from commit c246eff725)
2025-12-04 19:11:10 +08:00
yangdx
95d47566c1 Improve docling integration with macOS compatibility and CLI flag
- Add --docling CLI flag for easier setup
- Add numpy version constraints
- Exclude docling on macOS (fork-safety)

(cherry picked from commit a24d8181c2)
2025-12-04 19:11:10 +08:00
yangdx
033ee5c0f5 Refactor keyword_extraction from kwargs to explicit parameter
• Add keyword_extraction param to functions
• Remove kwargs.pop() calls
• Update function signatures
• Improve parameter documentation
• Make parameter handling consistent
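
The shape of this refactor, shown on a toy function. Both function names and the `False` default are illustrative; only the `keyword_extraction` parameter name comes from the commit.

```python
def complete_before(prompt, **kwargs):
    # Old style: the flag is hidden inside kwargs and popped by hand.
    keyword_extraction = kwargs.pop("keyword_extraction", False)
    return {"prompt": prompt, "keyword_extraction": keyword_extraction, "extra": kwargs}


def complete_after(prompt, keyword_extraction=False, **kwargs):
    # New style: the flag is part of the signature and easy to document.
    return {"prompt": prompt, "keyword_extraction": keyword_extraction, "extra": kwargs}
```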

(cherry picked from commit 2f16065256)
2025-12-04 19:11:09 +08:00
anouarbm
8650307e65 feat(evaluation): Add sample documents for reproducible RAGAS testing
Add 5 markdown documents that users can index to reproduce evaluation results.

Changes:
- Add sample_documents/ folder with 5 markdown files covering LightRAG features
- Update sample_dataset.json with 3 improved, specific test questions
- Shorten and correct evaluation README (removed outdated info about mock responses)
- Add sample_documents reference with expected ~95% RAGAS score

Test Results with sample documents:
- Average RAGAS Score: 95.28%
- Faithfulness: 100%, Answer Relevance: 96.67%
- Context Recall: 88.89%, Context Precision: 95.56%

(cherry picked from commit a172cf893d)
2025-12-04 19:11:09 +08:00
yangdx
cc33728c10 Improve Langfuse integration and stream response cleanup handling
• Check env vars before enabling Langfuse
• Move imports after env check logic
• Handle wrapper client aclose() issues
• Add debug logs for cleanup failures

(cherry picked from commit 10f6e6955f)
2025-12-04 19:11:08 +08:00
anouarbm
ccdd3c2786 fixed ruff format of csv path
(cherry picked from commit b12b693a81)
2025-12-04 19:11:08 +08:00
anouarbm
949bfc4228 fix: Apply ruff formatting and rename test_dataset to sample_dataset
**Lint Fixes (ruff)**:
- Sort imports alphabetically (I001)
- Add blank line after import traceback (E302)
- Add trailing comma to dict literals (COM812)
- Reformat writer.writerow for readability (E501)

**Rename test_dataset.json → sample_dataset.json**:
- Avoids .gitignore pattern conflict (test_* is ignored)
- More descriptive name - it's a sample/template, not actual test data
- Updated all references in eval_rag_quality.py and README.md

Resolves lint-and-format CI check failure.
Addresses reviewer feedback about test dataset naming.

(cherry picked from commit 5cdb4b0ef2)
2025-12-04 19:11:08 +08:00
anouarbm
a934becfcc feat: add optional Langfuse observability integration
This contribution adds optional Langfuse support for LLM observability and tracing.
Langfuse provides a drop-in replacement for the OpenAI client that automatically
tracks all LLM interactions without requiring code changes.

Features:
- Optional Langfuse integration with graceful fallback
- Automatic LLM request/response tracing
- Token usage tracking
- Latency metrics
- Error tracking
- Zero code changes required for existing functionality

Implementation:
- Modified lightrag/llm/openai.py to conditionally use Langfuse's AsyncOpenAI
- Falls back to standard OpenAI client if Langfuse is not installed
- Logs observability status on import

Configuration:
To enable Langfuse tracing, install the observability extras and set environment variables:

```bash
pip install lightrag-hku[observability]

export LANGFUSE_PUBLIC_KEY="your_public_key"
export LANGFUSE_SECRET_KEY="your_secret_key"
export LANGFUSE_HOST="https://cloud.langfuse.com"  # or your self-hosted instance
```

If Langfuse is not installed or environment variables are not set, LightRAG
will use the standard OpenAI client without any functionality changes.
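
The fallback described above could be sketched as follows. The helper names are hypothetical, and the exact checks in `lightrag/llm/openai.py` may differ. `load_async_openai` needs the `openai` package and, optionally, `langfuse`.

```python
import os


def langfuse_configured(env=os.environ):
    """True when both Langfuse keys are present and non-empty."""
    return bool(env.get("LANGFUSE_PUBLIC_KEY") and env.get("LANGFUSE_SECRET_KEY"))


def load_async_openai(env=os.environ):
    """Return (client_class, tracing_enabled), preferring Langfuse's wrapper."""
    if langfuse_configured(env):
        try:
            # Langfuse's drop-in replacement for the OpenAI async client.
            from langfuse.openai import AsyncOpenAI
            return AsyncOpenAI, True
        except ImportError:
            pass  # Langfuse not installed: fall through to the plain client.
    from openai import AsyncOpenAI
    return AsyncOpenAI, False
```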

Changes:
- Modified lightrag/llm/openai.py (added optional Langfuse import)
- Updated pyproject.toml with optional 'observability' dependencies

Dependencies (optional):
- langfuse>=3.8.1

(cherry picked from commit 626b42bc40)
2025-12-04 19:11:08 +08:00
xiaojunxiang
355aa2593c fix(docs): correct typo "acivate" → "activate"
(cherry picked from commit 9e5004e24f)
2025-12-04 19:11:08 +08:00
Raphaël MANSUY
ed73def994 fix: sync core modules with upstream for compatibility
2025-12-04 19:10:46 +08:00
yangdx
7ce3680ca5 Add retry decorators to Neo4j read operations for resilience
(cherry picked from commit 7aaa51cda9)
2025-12-04 19:09:08 +08:00
yangdx
00d51f9dba Fix dimension type comparison in Milvus vector field validation
• Convert dimensions to int for comparison
• Handle string vs int type mismatches
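
A minimal sketch of a type-tolerant dimension check, assuming the stored dimension may come back as a string while the expected one is an int. The function name is hypothetical.

```python
def dimensions_match(existing_dim, expected_dim):
    """Compare two vector dimensions after coercing both to int."""
    try:
        return int(existing_dim) == int(expected_dim)
    except (TypeError, ValueError):
        # Missing or non-numeric values never match.
        return False
```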

(cherry picked from commit 0fa9a2eee3)
2025-12-04 19:09:08 +08:00
yangdx
0594a5a049 Update pymilvus dependency from 2.5.2 to >=2.6.2
(cherry picked from commit baab992431)
2025-12-04 19:09:07 +08:00
yangdx
de011c99a4 Add CASCADE to AGE extension creation in PostgreSQL implementation
- Add CASCADE option to CREATE EXTENSION
- Ensure dependencies are installed
- Fix potential AGE setup issues

(cherry picked from commit d6019c82af)
2025-12-04 19:09:07 +08:00
yangdx
bd93f13012 Refactor: Extract retry decorator to reduce code duplication in Neo4j storage
• Define READ_RETRY_EXCEPTIONS constant
• Create reusable READ_RETRY decorator
• Replace 11 duplicate retry decorators
• Improve code maintainability
• Add missing retry to edge_degrees_batch
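
The idea of defining the retry policy once and reusing it can be sketched with the standard library alone. The exception tuple, attempt count, and synchronous wrapper are placeholders; the actual storage code is asynchronous and may rely on a retry library.

```python
import functools
import time

# Placeholder exception set for the sketch.
READ_RETRY_EXCEPTIONS = (ConnectionError, TimeoutError)


def read_retry(attempts=3, delay=0.0):
    """Build a decorator that retries on READ_RETRY_EXCEPTIONS."""
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            for attempt in range(1, attempts + 1):
                try:
                    return func(*args, **kwargs)
                except READ_RETRY_EXCEPTIONS:
                    if attempt == attempts:
                        raise  # out of attempts: surface the error
                    time.sleep(delay)
        return wrapper
    return decorator


# One shared decorator instead of repeating the policy on every method.
READ_RETRY = read_retry(attempts=3)
```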

(cherry picked from commit 8c4d7a00ad)
2025-12-04 19:09:07 +08:00
copilot-swe-agent[bot]
b28a701532 Improve edge case handling for max_tokens=1
Co-authored-by: netbrah <162479981+netbrah@users.noreply.github.com>
(cherry picked from commit 8835fc244a)
2025-12-04 19:09:07 +08:00
wmsnp
ae5cd9262b fix: add logger to configure_vchordrq() and format code
(cherry picked from commit f4bf5d279c)
2025-12-04 19:09:06 +08:00
wmsnp
3954bb6579 feat(postgres_impl): add vchordrq vector index support and unify vector index creation logic
(cherry picked from commit d07023c962)
2025-12-04 19:09:06 +08:00
yangdx
1cbe0ba885 Reduce log level and improve workspace mismatch message clarity
• Change warning to info level
• Simplify workspace mismatch wording

(cherry picked from commit 6cef8df159)
2025-12-04 19:09:06 +08:00
yangdx
0ac858d3e2 fix(postgres): allow vchordrq.epsilon config when probes is empty
Previously, configure_vchordrq would fail silently when probes was empty
(the default), preventing epsilon from being configured. Now each parameter
is handled independently with conditional execution, and configuration
errors fail-fast instead of being swallowed.

This fixes the documented epsilon setting being impossible to use in the
default configuration.
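
Handling each parameter independently might look like the sketch below, which only builds the `SET` statements. The function name and the validation rule for `probes` are assumptions, not the code in `postgres_impl`.

```python
def vchordrq_settings(probes="", epsilon=None):
    """Build SET statements, one per parameter that is actually configured."""
    statements = []
    if probes:
        # Fail fast on anything that is not a comma-separated list of integers.
        if not all(part.strip().isdigit() for part in probes.split(",")):
            raise ValueError(f"invalid probes value: {probes!r}")
        statements.append(f"SET vchordrq.probes = '{probes}'")
    if epsilon is not None:
        # epsilon no longer depends on probes being set.
        statements.append(f"SET vchordrq.epsilon = {float(epsilon)}")
    return statements
```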

(cherry picked from commit 3096f844fb)
2025-12-04 19:09:06 +08:00
yangdx
5bd1320a1d Refactor storage classes to use namespace instead of final_namespace
(cherry picked from commit fd486bc922)
2025-12-04 19:09:06 +08:00