Raphaël MANSUY
2fbc5972f8
cherry-pick 39b49e92
2025-12-04 19:14:27 +08:00
Raphaël MANSUY
1c00dbfa56
cherry-pick 2fb57e76
2025-12-04 19:14:27 +08:00
Raphaël MANSUY
c83a76786a
cherry-pick 14a6c24e
2025-12-04 19:14:27 +08:00
Raphaël MANSUY
3558adae47
cherry-pick 05852e1a
2025-12-04 19:14:26 +08:00
Raphaël MANSUY
da7683a001
cherry-pick de4ed736
2025-12-04 19:14:26 +08:00
Raphaël MANSUY
395b76cdc9
cherry-pick a624a950
2025-12-04 19:14:26 +08:00
Raphaël MANSUY
f389b0d63a
cherry-pick 0b2a15c4
2025-12-04 19:14:26 +08:00
Raphaël MANSUY
3c8507358c
cherry-pick 03cc6262
2025-12-04 19:14:26 +08:00
Raphaël MANSUY
cca946f437
cherry-pick d94aae9c
2025-12-04 19:14:26 +08:00
Raphaël MANSUY
0166a38d01
cherry-pick ce28f30c
2025-12-04 19:14:26 +08:00
Raphaël MANSUY
cacea8ab56
cherry-pick 33a1482f
2025-12-04 19:14:26 +08:00
Raphaël MANSUY
e11e30be0e
cherry-pick 01b07b2b
2025-12-04 19:14:25 +08:00
Raphaël MANSUY
56b8806256
cherry-pick 9c057060
2025-12-04 19:14:25 +08:00
Raphaël MANSUY
89f8048df5
cherry-pick 7b8223da
2025-12-04 19:14:25 +08:00
Raphaël MANSUY
b57cd0cae2
cherry-pick 6a29b5da
2025-12-04 19:14:25 +08:00
Raphaël MANSUY
264ba4e172
cherry-pick 6d1ae404
2025-12-04 19:14:25 +08:00
Raphaël MANSUY
c7173baf3d
cherry-pick ec40b17e
2025-12-04 19:14:25 +08:00
Raphaël MANSUY
f7f9a9e6cf
fix: sync all core modules with upstream after Wave 1
2025-12-04 19:13:48 +08:00
yangdx
d0e3c8a4a3
Fix duplicate document responses to return original track_id
- Return existing track_id for duplicates
- Remove track_id generation in reprocess
- Update reprocess response documentation
- Clarify track_id behavior in comments
- Update API response examples
(cherry picked from commit 8d28b95966)
2025-12-04 19:11:24 +08:00
yangdx
7e591a81c0
Clean up duplicate dependencies in package.json and lock file
• Remove duplicate katex entries
• Remove duplicate lucide-react entries
• Remove duplicate mermaid entries
• Remove duplicate @types/bun entries
• Fix trailing commas in JSON
(cherry picked from commit 459e4ddc09)
2025-12-04 19:11:23 +08:00
yangdx
21fc61ecd2
Add content deduplication check for document insertion endpoints
• Check content hash before insertion
• Return duplicated status if exists
• Use sanitized text for hash computation
• Apply to both single and batch inserts
• Prevent duplicate content processing
(cherry picked from commit 19c16bc464)
2025-12-04 19:11:23 +08:00
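Commit 21fc61ecd2 above checks a content hash before insertion and reports duplicates instead of reprocessing them. A minimal Python sketch of that idea, assuming MD5 over whitespace-normalized text; `sanitize_text`, `compute_content_hash` and `insert_texts` are hypothetical names, not LightRAG's API, and the real sanitization rules may differ:

```python
import hashlib
import re
from typing import List, Set


def sanitize_text(text: str) -> str:
    # Hypothetical sanitizer: collapse runs of whitespace and trim the ends.
    return re.sub(r"\s+", " ", text).strip()


def compute_content_hash(text: str) -> str:
    # Hash the sanitized text so trivially different copies collide.
    return hashlib.md5(sanitize_text(text).encode("utf-8")).hexdigest()


def insert_texts(texts: List[str], seen_hashes: Set[str]) -> List[str]:
    """Return one status per input: "duplicated" or "inserted".

    A single insert is just a list with one element, so the same check
    covers both single and batch inserts.
    """
    statuses = []
    for text in texts:
        content_hash = compute_content_hash(text)
        if content_hash in seen_hashes:
            statuses.append("duplicated")
        else:
            seen_hashes.add(content_hash)
            statuses.append("inserted")
    return statuses
```

Hashing the sanitized text rather than the raw input means copies that differ only in whitespace are treated as the same document.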
yangdx
f13d30206f
Fix relation deduplication logic and standardize log message prefixes
(cherry picked from commit a25003c336)
2025-12-04 19:11:23 +08:00
yangdx
2ea1fccf1a
Refactor deduplication calculation and remove unused variables
(cherry picked from commit 1154c5683f)
2025-12-04 19:11:23 +08:00
DivinesLight
f742ba0220
Quick fix to limit source_id ballooning while inserting nodes
(cherry picked from commit 7871600d8a)
2025-12-04 19:11:23 +08:00
DivinesLight
b9fc6f19dd
Quick fix to limit source_id ballooning while inserting nodes
(cherry picked from commit 54f0a7d1ca)
2025-12-04 19:11:23 +08:00
yangdx
429cd6a66f
Fix top_n behavior with chunking to limit documents not chunks
- Disable API-level top_n when chunking
- Apply top_n to aggregated documents
- Add comprehensive test coverage
(cherry picked from commit 9009abed3e)
2025-12-04 19:11:22 +08:00
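Commit 429cd6a66f above applies top_n to aggregated documents rather than to chunks. The Python sketch below, with hypothetical function names and max-score aggregation as an assumption, shows why the order matters: taking the top chunks first can return fewer distinct documents than requested when one document contributes several high-scoring chunks.

```python
from typing import Dict, List, Tuple

Chunk = Tuple[str, float]  # (doc_id, rerank score)


def top_n_chunks_then_docs(chunks: List[Chunk], top_n: int) -> List[str]:
    # Buggy order: cut to top_n chunks, then collapse to distinct documents.
    best = sorted(chunks, key=lambda c: c[1], reverse=True)[:top_n]
    return list(dict.fromkeys(doc_id for doc_id, _ in best))


def top_n_documents(chunks: List[Chunk], top_n: int) -> List[str]:
    # Fixed order: aggregate chunk scores per document first (max here),
    # then apply top_n to the documents.
    doc_scores: Dict[str, float] = {}
    for doc_id, score in chunks:
        doc_scores[doc_id] = max(score, doc_scores.get(doc_id, float("-inf")))
    ranked = sorted(doc_scores, key=lambda d: doc_scores[d], reverse=True)
    return ranked[:top_n]
```

With chunks `[("A", 0.9), ("A", 0.8), ("A", 0.7), ("B", 0.6), ("C", 0.5)]` and `top_n=2`, the first function returns only `["A"]`, while the second returns `["A", "B"]`.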
copilot-swe-agent[bot]
85f21aecd5
Fix chunking infinite loop when overlap_tokens >= max_tokens
Co-authored-by: netbrah <162479981+netbrah@users.noreply.github.com>
(cherry picked from commit 1d6ea0c5f7)
2025-12-04 19:11:22 +08:00
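Commit 85f21aecd5 above fixes an infinite loop when overlap_tokens >= max_tokens. In a sliding-window chunker the window advances by max_tokens - overlap_tokens, so a zero or negative step never moves forward. A hypothetical Python sketch over an already-tokenized list; it rejects the bad configuration up front, whereas the actual fix may handle it differently (for example by clamping the overlap):

```python
from typing import List


def chunk_tokens(tokens: List[int], max_tokens: int, overlap_tokens: int) -> List[List[int]]:
    """Split tokens into windows of max_tokens that overlap by overlap_tokens."""
    if max_tokens <= 0:
        raise ValueError("max_tokens must be positive")
    if overlap_tokens >= max_tokens:
        # Without this guard the step below is <= 0 and the loop never ends.
        raise ValueError("overlap_tokens must be smaller than max_tokens")
    step = max_tokens - overlap_tokens
    chunks: List[List[int]] = []
    start = 0
    while start < len(tokens):
        chunks.append(tokens[start:start + max_tokens])
        if start + max_tokens >= len(tokens):
            break  # the last window already reaches the end
        start += step
    return chunks
```

For example, `chunk_tokens(list(range(10)), 4, 1)` yields `[[0, 1, 2, 3], [3, 4, 5, 6], [6, 7, 8, 9]]`.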
netbrah
b65ef37569
Add Cohere reranker config, chunking, and tests
(cherry picked from commit a05bbf105e)
2025-12-04 19:11:22 +08:00
yangdx
8a8bdba8f4
Add comprehensive chunking tests with multi-token tokenizer edge cases
• Add MultiTokenCharacterTokenizer for testing
• Test token vs character counting accuracy
• Verify delimiter splitting precision
• Test overlap with distinctive content
• Add decode content preservation tests
(cherry picked from commit fec7c67f45)
2025-12-04 19:11:22 +08:00
yangdx
7f7574c8b7
Add token limit validation for character-only chunking
- Add ChunkTokenLimitExceededError exception
- Validate chunks against token limits
- Include chunk preview in error messages
- Add comprehensive test coverage
- Log warnings for oversized chunks
(cherry picked from commit f988a22652)
2025-12-04 19:11:22 +08:00
yangdx
c50a1357a6
Fix ChunkTokenLimitExceededError message formatting
- Prevent passing two separate string objects to __init__
- Maintain same error output
(cherry picked from commit 6fea68bff9)
2025-12-04 19:11:22 +08:00
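Commits 7f7574c8b7 and c50a1357a6 above add ChunkTokenLimitExceededError and fix its message formatting. In Python, passing two separate strings to `Exception.__init__` makes `str(err)` render as a tuple, so the sketch below builds one message first. The constructor arguments, message wording, 80-character preview and `validate_chunks` helper are assumptions, not the real signatures:

```python
import logging
from typing import Callable, List, Optional

logger = logging.getLogger(__name__)


class ChunkTokenLimitExceededError(ValueError):
    """Hypothetical shape: raised when a chunk has more tokens than allowed."""

    def __init__(self, chunk_tokens: int, chunk_token_limit: int,
                 chunk_preview: Optional[str] = None) -> None:
        preview = (chunk_preview or "")[:80]
        message = f"Chunk token length {chunk_tokens} exceeds limit {chunk_token_limit}."
        if preview:
            message += f" Chunk preview: {preview!r}"
        # Pass a single string: super().__init__(msg_a, msg_b) would make
        # str(err) print a tuple such as "('...', '...')".
        super().__init__(message)
        self.chunk_tokens = chunk_tokens
        self.chunk_token_limit = chunk_token_limit
        self.chunk_preview = preview


def validate_chunks(chunks: List[str], count_tokens: Callable[[str], int], limit: int) -> None:
    """Log a warning and raise on the first chunk whose token count exceeds limit."""
    for chunk in chunks:
        n_tokens = count_tokens(chunk)
        if n_tokens > limit:
            logger.warning("Oversized chunk: %d tokens (limit %d)", n_tokens, limit)
            raise ChunkTokenLimitExceededError(n_tokens, limit, chunk)
```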
yangdx
326acbf19b
Add comprehensive tests for chunking with recursive splitting
- Test recursive split mode
- Add edge case coverage
- Test parameter combinations
- Verify chunk order indexing
- Add integration test scenarios
(cherry picked from commit 5733292557)
2025-12-04 19:11:21 +08:00
yangdx
6e3ff18570
Adjust chunking parameters to match the default environment variable settings
(cherry picked from commit e77340d4a1)
2025-12-04 19:11:21 +08:00
EightyOliveira
b8dc5de81a
refactor(chunking): rename params and improve docstring for chunking_by_token_size
(cherry picked from commit dacca334e0)
2025-12-04 19:11:21 +08:00
yangdx
d769a446d1
Support async chunking functions in LightRAG processing pipeline
- Add Awaitable and Union type imports
- Update chunking_func type annotation
- Handle coroutine results with await
- Add return type validation
- Update docstring for async support
(cherry picked from commit 940bec0b31)
2025-12-04 19:11:21 +08:00
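Commit d769a446d1 above, together with the two Tong Da commits that follow it, lets chunking_func be either synchronous or asynchronous. A simplified Python sketch of the detect-then-await pattern using `inspect.isawaitable`; `run_chunking`, the example chunkers and the one-argument chunker signature are hypothetical (the real chunking_func takes more parameters):

```python
import asyncio
import inspect
from typing import Any, Awaitable, Callable, Dict, List, Union

ChunkList = List[Dict[str, Any]]
ChunkingFunc = Callable[[str], Union[ChunkList, Awaitable[ChunkList]]]


async def run_chunking(chunking_func: ChunkingFunc, content: str) -> ChunkList:
    result = chunking_func(content)
    # Await only if the user-supplied function returned a coroutine/awaitable.
    if inspect.isawaitable(result):
        result = await result
    # Validate the return type before the pipeline consumes it.
    if not isinstance(result, list):
        raise TypeError(f"chunking_func must return a list, got {type(result).__name__}")
    return result


def sync_chunker(text: str) -> ChunkList:
    # Example synchronous chunker: split on blank lines.
    return [{"content": part} for part in text.split("\n\n")]


async def async_chunker(text: str) -> ChunkList:
    # Example asynchronous chunker, e.g. one that would call a remote service.
    await asyncio.sleep(0)
    return sync_chunker(text)
```

Checking the result rather than the function makes the pipeline accept any callable that returns an awaitable, not only functions declared with `async def`.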
Tong Da
877f2c01d3
easier version: detect whether the chunking_func result is a coroutine
(cherry picked from commit 245df75d9c)
2025-12-04 19:11:21 +08:00
Tong Da
8a43e16f6e
support async chunking_func to improve processing performance when a heavy chunking_func is passed in by the user
(cherry picked from commit 7740500693)
2025-12-04 19:11:20 +08:00
yangdx
70ba7cd787
Fix: Remove redundant entity/relation chunk deletions
(cherry picked from commit ea141e2779)
2025-12-04 19:11:20 +08:00
yangdx
211dbc3f78
Remove unused chunk-based node/edge retrieval methods
(cherry picked from commit 807d2461d3)
2025-12-04 19:11:20 +08:00
yangdx
ce702ccb2f
Add workspace parameter and remove chunk-based query unit tests
- Add workspace param to test storage init
- Remove get_nodes_by_chunk_ids tests
- Remove get_edges_by_chunk_ids tests
- Clean up batch operations test function
(cherry picked from commit 6b0f9795be)
2025-12-04 19:11:20 +08:00
anouarbm
7ce251c319
docs: Add documentation and examples for include_chunk_content parameter
Added comprehensive documentation for the new include_chunk_content parameter
that enables retrieval of actual chunk text content in API responses.
Documentation Updates:
- Added "Include Chunk Content in References" section to API README
- Explained use cases: RAG evaluation, debugging, citations, transparency
- Provided JSON request/response examples
- Clarified parameter interaction with include_references
OpenAPI/Swagger Examples:
- Added "Response with chunk content" example to /query endpoint
- Shows complete reference structure with content field
- Demonstrates realistic chunk text content
This makes the feature discoverable through:
1. API documentation (README.md)
2. Interactive Swagger UI (http://localhost:9621/docs)
3. Code examples for developers
(cherry picked from commit 963ad4c637)
2025-12-04 19:11:20 +08:00
anouarbm
349c1945db
Optimize RAGAS evaluation with parallel execution and chunk content enrichment
Added an efficient RAG evaluation system with optimized API calls and comprehensive benchmarking.
Key Features:
- Single API call per evaluation (2x faster than before)
- Parallel evaluation based on MAX_ASYNC environment variable
- Chunk content enrichment in /query endpoint responses
- Comprehensive benchmark statistics (averages)
- NaN-safe metric calculations
API Changes:
- Added include_chunk_content parameter to QueryRequest (backward compatible)
- /query endpoint enriches references with actual chunk content when requested
- No breaking changes - default behavior unchanged
Evaluation Improvements:
- Parallel execution using asyncio.Semaphore (respects MAX_ASYNC)
- Shared HTTP client with connection pooling
- Proper timeout handling (3min connect, 5min read)
- Debug output for context retrieval verification
- Benchmark statistics with averages, min/max scores
Results:
- Average RAGAS Score: 0.9772
- Perfect Faithfulness: 1.0000
- Perfect Context Recall: 1.0000
- Perfect Context Precision: 1.0000
- Excellent Answer Relevance: 0.9087
(cherry picked from commit 0bbef9814e)
2025-12-04 19:11:20 +08:00
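Commit 349c1945db above runs evaluations in parallel with asyncio.Semaphore, bounded by MAX_ASYNC. A generic Python sketch of that bounded-concurrency pattern; `run_bounded` and `double_after_delay` are hypothetical stand-ins for a /query call plus RAGAS scoring, and the default of 4 when MAX_ASYNC is unset is an assumption:

```python
import asyncio
import os
from typing import Awaitable, Callable, List, Tuple, TypeVar

T = TypeVar("T")
R = TypeVar("R")


async def run_bounded(items: List[T], worker: Callable[[T], Awaitable[R]],
                      max_async: int) -> Tuple[List[R], int]:
    """Run worker over items with at most max_async calls in flight.

    Returns the results in input order and the peak observed concurrency.
    """
    semaphore = asyncio.Semaphore(max_async)
    in_flight = 0
    peak = 0

    async def bounded(item: T) -> R:
        nonlocal in_flight, peak
        async with semaphore:  # blocks once max_async workers are running
            in_flight += 1
            peak = max(peak, in_flight)
            try:
                return await worker(item)
            finally:
                in_flight -= 1

    results = await asyncio.gather(*(bounded(item) for item in items))
    return list(results), peak


async def double_after_delay(x: int) -> int:
    # Hypothetical stand-in for one evaluation request.
    await asyncio.sleep(0.01)
    return x * 2


if __name__ == "__main__":
    max_async = int(os.getenv("MAX_ASYNC", "4"))  # default of 4 is an assumption
    results, peak = asyncio.run(run_bounded(list(range(10)), double_after_delay, max_async))
    print(results, peak)
```

`asyncio.gather` preserves input order, so results line up with the test cases even though they complete concurrently.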
yangdx
8f16f6fe31
Fix entity and relationship deletion when no chunk references remain
(cherry picked from commit c81a56a113)
2025-12-04 19:11:19 +08:00
yangdx
17a9771cfb
Add chunk tracking support to entity merge functionality
- Pass chunk storages to merge function
- Merge relation chunk tracking data
- Merge entity chunk tracking data
- Delete old chunk tracking records
- Persist chunk storage updates
(cherry picked from commit 2c09adb8d3)
2025-12-04 19:11:19 +08:00
yangdx
450f969430
Add chunk tracking cleanup to entity/relation deletion and creation
• Clean up chunk storage on delete
• Track chunks in create operations
• Normalize relation keys consistently
(cherry picked from commit a3370b024d)
2025-12-04 19:11:19 +08:00
yangdx
7e0f12c28e
Enhance entity/relation editing with chunk tracking synchronization
• Add chunk storage sync to edit ops
• Implement incremental chunk ID updates
• Support entity renaming migrations
• Normalize relation keys consistently
• Preserve chunk references on edits
(cherry picked from commit 3fbd704bf9)
2025-12-04 19:11:19 +08:00
yangdx
488f67e5b2
Fix entity and relation chunk cleanup in deletion pipeline
• Delete from entity_chunks storage
• Delete from relation_chunks storage
(cherry picked from commit 29bf593663)
2025-12-04 19:11:19 +08:00
yangdx
cb5451faf8
Add entity/relation chunk tracking with configurable source ID limits
- Add entity_chunks & relation_chunks storage
- Implement KEEP/FIFO limit strategies
- Update env.example with new settings
- Add migration for chunk tracking data
- Support all KV storage
(cherry picked from commit dc62c78f98)
2025-12-04 19:11:19 +08:00
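Commit cb5451faf8 above adds KEEP and FIFO strategies for capping how many source IDs an entity or relation tracks, the same source_id ballooning targeted by f742ba0220 and b9fc6f19dd. The Python sketch below assumes KEEP retains the oldest IDs and ignores newcomers once the cap is reached, while FIFO evicts the oldest IDs to make room; `apply_source_id_limit` is a hypothetical name, and the real semantics should be checked against env.example:

```python
from typing import List, Literal


def apply_source_id_limit(
    existing: List[str],
    new: List[str],
    max_ids: int,
    method: Literal["KEEP", "FIFO"],
) -> List[str]:
    """Merge chunk IDs (oldest first) and cap the result at max_ids."""
    if max_ids <= 0:
        raise ValueError("max_ids must be positive")
    # Deduplicate while preserving first-seen order (oldest first).
    merged = list(dict.fromkeys(existing + new))
    if len(merged) <= max_ids:
        return merged
    if method == "KEEP":
        # Keep the oldest IDs; new ones beyond the cap are dropped.
        return merged[:max_ids]
    if method == "FIFO":
        # Evict the oldest IDs so the newest max_ids remain.
        return merged[-max_ids:]
    raise ValueError(f"unknown limit method: {method!r}")
```

With existing `["c1", "c2", "c3"]`, new `["c3", "c4", "c5"]` and a cap of 3, KEEP gives `["c1", "c2", "c3"]` and FIFO gives `["c3", "c4", "c5"]`.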
yangdx
7248e09fc4
Allow related chunks to be missing in knowledge graph queries
(cherry picked from commit 35cd567c9e)
2025-12-04 19:11:18 +08:00
yangdx
851b45f726
Add pipeline status lock function for legacy compatibility
- Add get_pipeline_status_lock function
- Return NamespaceLock for consistency
- Support workspace parameter
- Enable logging option
- Legacy code compatibility
(cherry picked from commit 93d445dfdd)
2025-12-04 19:11:18 +08:00