Commit graph

3740 commits

Author SHA1 Message Date
yangdx
2b16016312 Optimize XLSX extraction to avoid storing all rows in memory
• Remove intermediate row storage
• Use iterator twice instead of list()
• Preserve column alignment logic
• Reduce memory footprint
• Maintain same output format
2025-11-19 03:48:36 +08:00
yangdx
ef659a1e09 Preserve column alignment in XLSX extraction with two-pass processing
• Two-pass approach for consistent width
• Maintain tabular structure integrity
• Determine max columns first pass
• Extract with alignment second pass
• Prevent column misalignment issues
2025-11-19 03:34:22 +08:00
yangdx
3efb1716b4 Enhance XLSX extraction with structured tab-delimited format and escaping
- Add clear sheet separators
- Escape special characters
- Trim trailing empty columns
- Preserve row structure
- Single-pass optimization
2025-11-19 03:06:29 +08:00
yangdx
e7d2803a65 Remove text stripping in DOCX extraction to preserve whitespace
• Keep original paragraph spacing
• Preserve cell whitespace in tables
• Maintain document formatting
• Don't strip leading/trailing spaces
2025-11-19 02:12:27 +08:00
yangdx
186c8f0e16 Preserve blank paragraphs in DOCX extraction to maintain spacing
• Remove text emptiness check
• Always append paragraph text
• Maintain document formatting
• Preserve original spacing
2025-11-19 02:03:10 +08:00
yangdx
fa887d811b Fix table column structure preservation in DOCX extraction
• Always append cell text to maintain columns
• Preserve empty cells in table structure
• Check for any content before adding rows
• Use tab separation for proper alignment
• Improve table formatting consistency
2025-11-19 01:52:02 +08:00
yangdx
4438ba41a3 Enhance DOCX extraction to preserve document order with tables
• Include tables in extracted content
• Maintain original document order
• Add spacing around tables
• Use tabs to separate table cells
• Process all body elements sequentially
2025-11-19 01:31:33 +08:00
yangdx
d16c7840ab Bump API version to 0256 2025-11-18 23:15:31 +08:00
yangdx
e77340d4a1 Adjust chunking parameters to match the default environment variable settings 2025-11-18 23:14:50 +08:00
yangdx
1bfa1f81cb Merge branch 'main' into fix_chunk_comment 2025-11-18 22:38:50 +08:00
yangdx
9c10c87554 Fix linting 2025-11-18 22:38:43 +08:00
yangdx
dbae327a17 Merge branch 'main' into dev-postgres-vchordrq 2025-11-18 22:13:27 +08:00
yangdx
3096f844fb fix(postgres): allow vchordrq.epsilon config when probes is empty
Previously, configure_vchordrq would fail silently when probes was empty
(the default), preventing epsilon from being configured. Now each parameter
is handled independently with conditional execution, and configuration
errors fail-fast instead of being swallowed.

This fixes the documented epsilon setting being impossible to use in the
default configuration.
2025-11-18 21:58:36 +08:00
EightyOliveira
dacca334e0 refactor(chunking): rename params and improve docstring for chunking_by_token_size 2025-11-18 15:46:28 +08:00
yangdx
702cfd2981 Fix document deletion concurrency control and validation logic
• Clarify job naming for single vs batch deletion
• Update job name validation in busy pipeline check
2025-11-18 13:59:24 +08:00
yangdx
4048fc4b89 Fix: auto-acquire pipeline when idle in document deletion
• Track if we acquired the pipeline lock
• Auto-acquire pipeline when idle
• Only release if we acquired it
• Prevent concurrent deletion conflicts
• Improve deletion job validation
2025-11-18 13:25:13 +08:00
yangdx
1745b30a5f Fix missing workspace parameter in update flags status call 2025-11-18 12:55:48 +08:00
yangdx
f8dd2e0724 Fix namespace parsing when workspace contains colons
• Use rsplit instead of split
• Handle colons in workspace names
2025-11-18 12:23:05 +08:00
wmsnp
d07023c962
feat(postgres_impl): add vchordrq vector index support and unify vector index creation logic 2025-11-18 11:45:16 +08:00
yangdx
6cef8df159 Reduce log level and improve workspace mismatch message clarity
• Change warning to info level
• Simplify workspace mismatch wording
2025-11-18 08:25:21 +08:00
yangdx
ddc76f0c80 Merge branch 'main' into workspace-isolation 2025-11-17 17:08:07 +08:00
yangdx
9262f66d13 Bump API version to 0255 2025-11-17 17:07:18 +08:00
yangdx
393f880311 Improve LightRAG initialization checker tool with better usage docs
• Add workspace param to get_namespace_data
• Update docstring with proper usage example
• Simplify demo to show correct workflow
• Remove confusing before/after comparison
• Clarify tool should run after init
2025-11-17 15:42:54 +08:00
yangdx
9d7b7981ce Add pipeline status validation before document deletion 2025-11-17 14:58:10 +08:00
yangdx
98e964dfc4 Fix initialization instructions in check_lightrag_setup function 2025-11-17 14:27:26 +08:00
yangdx
6d6716e9f8 Add _default_workspace to shared storage finalization
- Add _default_workspace to global vars
- Set _default_workspace to None on cleanup
- Ensure complete resource cleanup
- Fix missing workspace finalization
2025-11-17 13:46:46 +08:00
yangdx
f1d8f18c80 Merge branch 'main' into workspace-isolation 2025-11-17 13:01:33 +08:00
yangdx
cdd53ee875 Remove manual initialize_pipeline_status() calls across codebase
- Auto-init pipeline status in storages
- Remove redundant import statements
- Simplify initialization pattern
- Update docs and examples
2025-11-17 12:54:33 +08:00
yangdx
e22ac52ebc Auto-initialize pipeline status in LightRAG.initialize_storages()
• Remove manual initialize_pipeline_status calls
• Auto-init in initialize_storages method
• Update error messages for clarity
• Warn on workspace conflicts
2025-11-17 12:54:33 +08:00
yangdx
e8383df3b8 Fix NamespaceLock context variable timing to prevent lock bricking
* Acquire lock before setting ContextVar
* Prevent state corruption on cancellation
* Fix permanent lock brick scenario
* Store context only after success
* Handle acquisition failure properly
2025-11-17 12:54:33 +08:00
yangdx
95e1fb1612 Remove final_namespace attribute for in-memory storage and use namespace in clean_llm_query_cache.py 2025-11-17 12:54:33 +08:00
yangdx
7ed0eac4c9 Fix workspace filtering logic in get_all_update_flags_status
• Handle namespaces with/without prefixes
• Fix workspace matching logic
2025-11-17 12:54:33 +08:00
yangdx
78689e8837 Fix pipeline status namespace check to handle root case
- Add check for bare "pipeline_status"
- Handle namespace without prefix
2025-11-17 12:54:33 +08:00
yangdx
d54d0d55d9 Standardize empty workspace handling from "_" to "" across storage
* Unify empty workspace behavior by changing workspace from "_" to ""
* Fixed incorrect empty workspace detection in get_all_update_flags_status()
2025-11-17 12:54:33 +08:00
yangdx
b6a5a90eaf Fix NamespaceLock concurrent coroutine safety with ContextVar
- Use ContextVar for per-coroutine storage
- Prevent state interference between coroutines
- Add re-entrance protection check
2025-11-17 12:54:33 +08:00
yangdx
fd486bc922 Refactor storage classes to use namespace instead of final_namespace 2025-11-17 12:54:33 +08:00
yangdx
01814bfc7a Fix missing function call parentheses in get_all_update_flags_status 2025-11-17 12:54:33 +08:00
yangdx
7deb9a64b9 Refactor namespace lock to support reusable async context manager
• Add NamespaceLock class wrapper
• Fix lock re-entrance issues
• Enable concurrent lock usage
• Fresh context per async with block
• Update get_namespace_lock API
2025-11-17 12:54:33 +08:00
yangdx
52c812b9a0 Fix workspace isolation for pipeline status across all operations
- Fix final_namespace error in get_namespace_data()
- Fix get_workspace_from_request return type
- Add workspace param to pipeline status calls
2025-11-17 12:54:33 +08:00
yangdx
926960e957 Refactor workspace handling to use default workspace and namespace locks
- Remove DB-specific workspace configs
- Add default workspace auto-setting
- Replace global locks with namespace locks
- Simplify pipeline status management
- Remove redundant graph DB locking
2025-11-17 12:54:33 +08:00
yangdx
ec05d89c2a Add macOS fork safety check for Gunicorn multi-worker mode
• Check OBJC_DISABLE_INITIALIZE_FORK_SAFETY
• Prevent NumPy/Accelerate crashes
• Show detailed error message
• Provide multiple fix options
• Exit early if misconfigured
2025-11-17 12:54:33 +08:00
yangdx
e5addf4d94 Improve embedding config priority and add debug logging
• Fix embedding_dim priority logic
• Add final config logging
2025-11-17 12:54:32 +08:00
yangdx
2fb57e767d Fix embedding token limit initialization order
* Capture max_token_size before decorator
* Apply wrapper after capturing attribute
* Prevent decorator from stripping dataclass
* Ensure token limit is properly set
2025-11-17 12:54:32 +08:00
yangdx
6b2af2b579 Refactor embedding function creation with proper attribute inheritance
- Extract max_token_size from providers
- Avoid double-wrapping EmbeddingFunc
- Improve configuration priority logic
- Add comprehensive debug logging
- Return complete EmbeddingFunc instance
2025-11-17 12:54:32 +08:00
yangdx
f0254773c6 Convert embedding_token_limit from property to field with __post_init__
• Remove property decorator
• Add field with init=False
• Set value in __post_init__ method
• embedding_token_limit is now in config dictionary
2025-11-17 12:54:32 +08:00
yangdx
14a6c24ed7 Add configurable embedding token limit with validation
- Add EMBEDDING_TOKEN_LIMIT env var
- Set max_token_size on embedding func
- Add token limit property to LightRAG
- Validate summary length vs limit
- Log warning when limit exceeded
2025-11-17 12:54:32 +08:00
yangdx
f5b48587ed Improve Bedrock error handling with retry logic and custom exceptions
• Add specific exception types
• Implement proper retry mechanism
• Better error classification
• Enhanced logging and validation
• Enable embedding retry decorator
2025-11-17 12:54:32 +08:00
yangdx
77221564b0 Add max_token_size parameter to embedding function decorators
- Add max_token_size=8192 to all embed funcs
- Move siliconcloud to deprecated folder
- Import wrap_embedding_func_with_attrs
- Update EmbeddingFunc docstring
- Fix langfuse import type annotation
2025-11-17 12:54:32 +08:00
yangdx
8283c86bce Refactor exception handling in MemgraphStorage label methods 2025-11-17 12:54:32 +08:00
yangdx
423e4e927a Fix null reference errors in graph database error handling
- Initialize result vars to None
- Add null checks before consume calls
- Prevent crashes in except blocks
- Apply fix to both Neo4J and Memgraph
2025-11-17 12:54:32 +08:00