yangdx
6fea68bff9
Fix ChunkTokenLimitExceededError message formatting
...
- Prevent passing two separate string objects to __init__
- Maintain same error output
2025-11-19 18:50:45 +08:00
yangdx
f988a22652
Add token limit validation for character-only chunking
...
- Add ChunkTokenLimitExceededError exception
- Validate chunks against token limits
- Include chunk preview in error messages
- Add comprehensive test coverage
- Log warnings for oversized chunks
2025-11-19 18:32:43 +08:00
yangdx
95cd0ece74
Fix DOCX table extraction by escaping special characters in cells
...
- Add escape_cell() function
- Escape backslashes first
- Handle tabs and newlines
- Preserve tab-delimited format
- Prevent double-escaping issues
2025-11-19 09:54:35 +08:00
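The escaping order described in the commit above can be illustrated with a minimal sketch. The name escape_cell() comes from the commit message, but the body below is an assumption written for illustration, not the repository's code. It shows why backslashes have to be handled before tabs and newlines.

```python
def escape_cell(text: str) -> str:
    """Illustrative sketch: escape one table cell for tab-delimited output."""
    # Backslashes go first, so the backslashes introduced by the two
    # replacements below are not themselves escaped a second time.
    text = text.replace("\\", "\\\\")
    text = text.replace("\t", "\\t")
    text = text.replace("\n", "\\n")
    return text


# A row whose cells contain a tab, a newline, and a backslash.
row = ["a\tb", "line1\nline2", "C:\\temp"]
line = "\t".join(escape_cell(cell) for cell in row)
```

After escaping, the only real tab characters left in the joined line are the two cell separators, so the row can still be split back into three columns.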
yangdx
87de2b3e9e
Update XLSX extraction documentation to reflect current implementation
2025-11-19 04:26:41 +08:00
yangdx
0244699d81
Optimize XLSX extraction by using sheet.max_column instead of two-pass scan
...
• Remove two-pass row scanning approach
• Use built-in sheet.max_column property
• Simplify column width detection logic
• Improve memory efficiency
• Maintain column alignment preservation
2025-11-19 04:02:39 +08:00
yangdx
2b16016312
Optimize XLSX extraction to avoid storing all rows in memory
...
• Remove intermediate row storage
• Use iterator twice instead of list()
• Preserve column alignment logic
• Reduce memory footprint
• Maintain same output format
2025-11-19 03:48:36 +08:00
yangdx
ef659a1e09
Preserve column alignment in XLSX extraction with two-pass processing
...
• Two-pass approach for consistent width
• Maintain tabular structure integrity
• Determine max columns in first pass
• Extract with alignment in second pass
• Prevent column misalignment issues
2025-11-19 03:34:22 +08:00
yangdx
3efb1716b4
Enhance XLSX extraction with structured tab-delimited format and escaping
...
- Add clear sheet separators
- Escape special characters
- Trim trailing empty columns
- Preserve row structure
- Single-pass optimization
2025-11-19 03:06:29 +08:00
yangdx
e7d2803a65
Remove text stripping in DOCX extraction to preserve whitespace
...
• Keep original paragraph spacing
• Preserve cell whitespace in tables
• Maintain document formatting
• Don't strip leading/trailing spaces
2025-11-19 02:12:27 +08:00
yangdx
186c8f0e16
Preserve blank paragraphs in DOCX extraction to maintain spacing
...
• Remove text emptiness check
• Always append paragraph text
• Maintain document formatting
• Preserve original spacing
2025-11-19 02:03:10 +08:00
yangdx
fa887d811b
Fix table column structure preservation in DOCX extraction
...
• Always append cell text to maintain columns
• Preserve empty cells in table structure
• Check for any content before adding rows
• Use tab separation for proper alignment
• Improve table formatting consistency
2025-11-19 01:52:02 +08:00
yangdx
4438ba41a3
Enhance DOCX extraction to preserve document order with tables
...
• Include tables in extracted content
• Maintain original document order
• Add spacing around tables
• Use tabs to separate table cells
• Process all body elements sequentially
2025-11-19 01:31:33 +08:00
yangdx
d16c7840ab
Bump API version to 0256
2025-11-18 23:15:31 +08:00
yangdx
e77340d4a1
Adjust chunking parameters to match the default environment variable settings
2025-11-18 23:14:50 +08:00
yangdx
1bfa1f81cb
Merge branch 'main' into fix_chunk_comment
2025-11-18 22:38:50 +08:00
yangdx
9c10c87554
Fix linting
2025-11-18 22:38:43 +08:00
yangdx
dbae327a17
Merge branch 'main' into dev-postgres-vchordrq
2025-11-18 22:13:27 +08:00
yangdx
3096f844fb
fix(postgres): allow vchordrq.epsilon config when probes is empty
...
Previously, configure_vchordrq would fail silently when probes was empty
(the default), preventing epsilon from being configured. Now each parameter
is handled independently with conditional execution, and configuration
errors fail fast instead of being swallowed.
This fixes the documented epsilon setting being impossible to use in the
default configuration.
2025-11-18 21:58:36 +08:00
EightyOliveira
dacca334e0
refactor(chunking): rename params and improve docstring for chunking_by_token_size
2025-11-18 15:46:28 +08:00
yangdx
702cfd2981
Fix document deletion concurrency control and validation logic
...
• Clarify job naming for single vs batch deletion
• Update job name validation in busy pipeline check
2025-11-18 13:59:24 +08:00
yangdx
4048fc4b89
Fix: auto-acquire pipeline when idle in document deletion
...
• Track if we acquired the pipeline lock
• Auto-acquire pipeline when idle
• Only release if we acquired it
• Prevent concurrent deletion conflicts
• Improve deletion job validation
2025-11-18 13:25:13 +08:00
yangdx
1745b30a5f
Fix missing workspace parameter in update flags status call
2025-11-18 12:55:48 +08:00
yangdx
f8dd2e0724
Fix namespace parsing when workspace contains colons
...
• Use rsplit instead of split
• Handle colons in workspace names
2025-11-18 12:23:05 +08:00
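A short sketch of why rsplit matters here. It assumes the full name has the form "workspace:namespace", which the commit implies but does not spell out; split_namespace is a hypothetical helper, not a function from the codebase.

```python
def split_namespace(full_name: str) -> tuple[str, str]:
    """Hypothetical helper: split 'workspace:namespace' on the last colon."""
    if ":" not in full_name:
        # No prefix at all: treat the workspace as empty.
        return "", full_name
    # rsplit with maxsplit=1 cuts only at the final colon, so any colons
    # inside the workspace name stay part of the workspace.
    workspace, namespace = full_name.rsplit(":", 1)
    return workspace, namespace


parsed = split_namespace("team:alpha:pipeline_status")
```

With a plain split(":") the same input would produce three parts and the workspace would be truncated to "team".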
wmsnp
d07023c962
feat(postgres_impl): add vchordrq vector index support and unify vector index creation logic
2025-11-18 11:45:16 +08:00
yangdx
6cef8df159
Reduce log level and improve workspace mismatch message clarity
...
• Change warning to info level
• Simplify workspace mismatch wording
2025-11-18 08:25:21 +08:00
yangdx
ddc76f0c80
Merge branch 'main' into workspace-isolation
2025-11-17 17:08:07 +08:00
yangdx
9262f66d13
Bump API version to 0255
2025-11-17 17:07:18 +08:00
yangdx
393f880311
Improve LightRAG initialization checker tool with better usage docs
...
• Add workspace param to get_namespace_data
• Update docstring with proper usage example
• Simplify demo to show correct workflow
• Remove confusing before/after comparison
• Clarify tool should run after init
2025-11-17 15:42:54 +08:00
yangdx
9d7b7981ce
Add pipeline status validation before document deletion
2025-11-17 14:58:10 +08:00
yangdx
98e964dfc4
Fix initialization instructions in check_lightrag_setup function
2025-11-17 14:27:26 +08:00
yangdx
6d6716e9f8
Add _default_workspace to shared storage finalization
...
- Add _default_workspace to global vars
- Set _default_workspace to None on cleanup
- Ensure complete resource cleanup
- Fix missing workspace finalization
2025-11-17 13:46:46 +08:00
yangdx
f1d8f18c80
Merge branch 'main' into workspace-isolation
2025-11-17 13:01:33 +08:00
yangdx
cdd53ee875
Remove manual initialize_pipeline_status() calls across codebase
...
- Auto-init pipeline status in storages
- Remove redundant import statements
- Simplify initialization pattern
- Update docs and examples
2025-11-17 12:54:33 +08:00
yangdx
e22ac52ebc
Auto-initialize pipeline status in LightRAG.initialize_storages()
...
• Remove manual initialize_pipeline_status calls
• Auto-init in initialize_storages method
• Update error messages for clarity
• Warn on workspace conflicts
2025-11-17 12:54:33 +08:00
yangdx
e8383df3b8
Fix NamespaceLock context variable timing to prevent lock bricking
...
* Acquire lock before setting ContextVar
* Prevent state corruption on cancellation
* Fix permanent lock brick scenario
* Store context only after success
* Handle acquisition failure properly
2025-11-17 12:54:33 +08:00
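The ordering fix in the commit above can be sketched as follows. This is an illustration under assumptions, not the project's NamespaceLock: the class name NamespaceLockSketch, the _held_names variable, and the use of asyncio.Lock are all hypothetical. The point is only that per-context state is written after the lock has actually been acquired.

```python
import asyncio
from contextvars import ContextVar

# Names of the locks held by the current task (hypothetical bookkeeping).
_held_names: ContextVar[frozenset] = ContextVar("held_names", default=frozenset())


class NamespaceLockSketch:
    def __init__(self, name: str) -> None:
        self._name = name
        self._lock = asyncio.Lock()

    async def __aenter__(self) -> "NamespaceLockSketch":
        # Re-entrance check: refuse to wait on a lock this context already holds.
        if self._name in _held_names.get():
            raise RuntimeError(f"{self._name!r} is already held in this context")
        # Acquire first. If this await is cancelled or fails, nothing has been
        # recorded yet, so later attempts are not blocked by stale state.
        await self._lock.acquire()
        _held_names.set(_held_names.get() | {self._name})
        return self

    async def __aexit__(self, exc_type, exc, tb) -> None:
        _held_names.set(_held_names.get() - {self._name})
        self._lock.release()


async def demo() -> str:
    lock = NamespaceLockSketch("pipeline_status")
    async with lock:
        try:
            async with lock:  # nested use in the same context is rejected
                return "re-entered"
        except RuntimeError:
            pass
    async with lock:  # the same object can be used again afterwards
        return "reusable"
```

If the ContextVar were set before the acquire call, a cancellation during the wait would leave the name marked as held with no matching release, which is the kind of permanently stuck state the commit describes.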
yangdx
95e1fb1612
Remove final_namespace attribute for in-memory storage and use namespace in clean_llm_query_cache.py
2025-11-17 12:54:33 +08:00
yangdx
7ed0eac4c9
Fix workspace filtering logic in get_all_update_flags_status
...
• Handle namespaces with/without prefixes
• Fix workspace matching logic
2025-11-17 12:54:33 +08:00
yangdx
78689e8837
Fix pipeline status namespace check to handle root case
...
- Add check for bare "pipeline_status"
- Handle namespace without prefix
2025-11-17 12:54:33 +08:00
yangdx
d54d0d55d9
Standardize empty workspace handling from "_" to "" across storage
...
* Unify empty workspace behavior by changing workspace from "_" to ""
* Fix incorrect empty workspace detection in get_all_update_flags_status()
2025-11-17 12:54:33 +08:00
yangdx
b6a5a90eaf
Fix NamespaceLock concurrent coroutine safety with ContextVar
...
- Use ContextVar for per-coroutine storage
- Prevent state interference between coroutines
- Add re-entrance protection check
2025-11-17 12:54:33 +08:00
yangdx
fd486bc922
Refactor storage classes to use namespace instead of final_namespace
2025-11-17 12:54:33 +08:00
yangdx
01814bfc7a
Fix missing function call parentheses in get_all_update_flags_status
2025-11-17 12:54:33 +08:00
yangdx
7deb9a64b9
Refactor namespace lock to support reusable async context manager
...
• Add NamespaceLock class wrapper
• Fix lock re-entrance issues
• Enable concurrent lock usage
• Fresh context per async with block
• Update get_namespace_lock API
2025-11-17 12:54:33 +08:00
yangdx
52c812b9a0
Fix workspace isolation for pipeline status across all operations
...
- Fix final_namespace error in get_namespace_data()
- Fix get_workspace_from_request return type
- Add workspace param to pipeline status calls
2025-11-17 12:54:33 +08:00
yangdx
926960e957
Refactor workspace handling to use default workspace and namespace locks
...
- Remove DB-specific workspace configs
- Add default workspace auto-setting
- Replace global locks with namespace locks
- Simplify pipeline status management
- Remove redundant graph DB locking
2025-11-17 12:54:33 +08:00
yangdx
ec05d89c2a
Add macOS fork safety check for Gunicorn multi-worker mode
...
• Check OBJC_DISABLE_INITIALIZE_FORK_SAFETY
• Prevent NumPy/Accelerate crashes
• Show detailed error message
• Provide multiple fix options
• Exit early if misconfigured
2025-11-17 12:54:33 +08:00
yangdx
e5addf4d94
Improve embedding config priority and add debug logging
...
• Fix embedding_dim priority logic
• Add final config logging
2025-11-17 12:54:32 +08:00
yangdx
2fb57e767d
Fix embedding token limit initialization order
...
* Capture max_token_size before decorator
* Apply wrapper after capturing attribute
* Prevent decorator from stripping dataclass
* Ensure token limit is properly set
2025-11-17 12:54:32 +08:00
yangdx
6b2af2b579
Refactor embedding function creation with proper attribute inheritance
...
- Extract max_token_size from providers
- Avoid double-wrapping EmbeddingFunc
- Improve configuration priority logic
- Add comprehensive debug logging
- Return complete EmbeddingFunc instance
2025-11-17 12:54:32 +08:00
yangdx
f0254773c6
Convert embedding_token_limit from property to field with __post_init__
...
• Remove property decorator
• Add field with init=False
• Set value in __post_init__ method
• embedding_token_limit is now in config dictionary
2025-11-17 12:54:32 +08:00
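The property-to-field change can be illustrated with a small dataclass sketch. ConfigSketch and embedding_max_token_size are hypothetical names chosen for the example; only embedding_token_limit, init=False, and __post_init__ come from the commit message.

```python
from dataclasses import asdict, dataclass, field
from typing import Optional


@dataclass
class ConfigSketch:
    # Hypothetical input from which the limit is derived.
    embedding_max_token_size: Optional[int] = None
    # init=False keeps this out of the constructor signature, but it is
    # still a real field, so asdict() includes it.
    embedding_token_limit: Optional[int] = field(default=None, init=False)

    def __post_init__(self) -> None:
        # Compute the derived value once, right after construction.
        self.embedding_token_limit = self.embedding_max_token_size


cfg = ConfigSketch(embedding_max_token_size=8192)
config = asdict(cfg)
```

A value exposed through a property would not appear in the dictionary returned by asdict(), which is one plausible reason for the change noted in the last bullet.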