yangdx
b46c152306
Fix linting
2025-11-21 17:16:44 +08:00
yangdx
b709f8f869
Consolidate Azure OpenAI implementation into main OpenAI module
• Unified OpenAI/Azure client creation
• Azure module now re-exports functions
• Backward compatibility maintained
• Reduced code duplication
2025-11-21 17:12:33 +08:00
yangdx
66d6c7dd6f
Refactor main function to provide sync CLI entry point
2025-11-21 13:11:55 +08:00
yangdx
02fdceb959
Update OpenAI client to use stable API and bump minimum version to 2.0.0
- Remove beta prefix from completions.parse
- Update OpenAI dependency to >=2.0.0
- Fix whitespace formatting
- Update all requirement files
- Clean up pyproject.toml dependencies
2025-11-21 12:55:44 +08:00
yangdx
9f69c5bf85
feat: Support structured output parsed from OpenAI
Added support for structured output (JSON mode) from the OpenAI API in `openai.py` and `azure_openai.py`.
When `response_format` is used to request structured data, the new logic checks for the `message.parsed` attribute. If it exists, it's serialized into a JSON string as the final content. If not, the code falls back to the existing `message.content` handling, ensuring backward compatibility.
2025-11-21 12:46:31 +08:00
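The following is a minimal sketch of the parsed-then-content fallback described above. `FakeMessage` is a stand-in dataclass, not the OpenAI SDK message type, and `extract_message_content` is a hypothetical helper name. It also assumes that a Pydantic-style `parsed` value can be recognized by its `model_dump()` method.

```python
import json
from dataclasses import dataclass
from typing import Any, Optional


@dataclass
class FakeMessage:
    # Stand-in for a chat completion message (hypothetical, not the SDK class).
    content: Optional[str] = None
    parsed: Any = None


def extract_message_content(message: Any) -> str:
    """Prefer structured `parsed` output; fall back to plain `content`."""
    parsed = getattr(message, "parsed", None)
    if parsed is not None:
        # Pydantic models expose model_dump(); dicts and lists are dumped as-is.
        if hasattr(parsed, "model_dump"):
            parsed = parsed.model_dump()
        return json.dumps(parsed, ensure_ascii=False)
    # Backward-compatible path: an ordinary text completion.
    return message.content or ""
```

When `parsed` is present, `content` is ignored, so callers always receive a JSON string for structured requests.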
yangdx
c9e1c86e81
Refactor keyword extraction handling to centralize response format logic
• Move response format to core function
• Remove duplicate format assignments
• Standardize keyword extraction flow
• Clean up redundant parameter handling
• Improve Azure OpenAI compatibility
2025-11-21 12:10:04 +08:00
yangdx
46ce6d9a13
Fix Azure OpenAI embedding model parameter fallback
- Use model param if provided
- Fall back to deployment name
- Fix embedding API call
- Improve parameter handling
2025-11-20 18:20:22 +08:00
Amritpal Singh
30e86fa331
Use the deployment variable, which takes its value from the .env file or falls back to a default
2025-11-20 09:00:27 +00:00
yangdx
b7de694f48
Add comprehensive error logging across API routes
- Add error logs to Ollama API endpoints
- Replace logging with unified logger
- Log streaming query errors
- Add data query error logging
- Include stack traces for debugging
2025-11-19 22:50:06 +08:00
yangdx
0fb2925c6a
Remove ascii_colors dependency and fix stream handling errors
• Remove ascii_colors.trace_exception calls
• Add SafeStreamHandler for closed streams
• Patch ascii_colors console handler
• Prevent ValueError on stream close
• Improve logging error handling
2025-11-19 21:38:17 +08:00
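Below is a minimal sketch of a handler that tolerates closed streams. It builds on the standard `logging.StreamHandler`. The class name comes from the commit, but the body is an assumption about the approach, not the project's code. `StreamHandler.emit` already routes write errors to `handleError()`, which prints a "--- Logging error ---" traceback. For that reason, the sketch checks `stream.closed` before emitting and guards `flush()`, which does not catch `ValueError` itself.

```python
import logging


class SafeStreamHandler(logging.StreamHandler):
    """StreamHandler that silently drops output once its stream is closed."""

    def emit(self, record: logging.LogRecord) -> None:
        # Writing to a closed file raises ValueError, which the base class
        # would report via handleError(); skip the record instead.
        if getattr(self.stream, "closed", False):
            return
        super().emit(record)

    def flush(self) -> None:
        # StreamHandler.flush() calls stream.flush() without catching errors,
        # and flushing a closed stream raises ValueError.
        try:
            super().flush()
        except ValueError:
            pass
```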
yangdx
6fea68bff9
Fix ChunkTokenLimitExceededError message formatting
- Prevent passing two separate string objects to __init__
- Maintain same error output
2025-11-19 18:50:45 +08:00
yangdx
f988a22652
Add token limit validation for character-only chunking
- Add ChunkTokenLimitExceededError exception
- Validate chunks against token limits
- Include chunk preview in error messages
- Add comprehensive test coverage
- Log warnings for oversized chunks
2025-11-19 18:32:43 +08:00
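This is a minimal sketch of character-only chunking with a token-limit check. The exception name comes from the commits. Its fields, the message text, the helper `chunk_by_character`, and the whitespace "tokenizer" used in the example are assumptions for illustration. The message is built as one string before calling `super().__init__()`, because `str(Exception("a", "b"))` renders as the tuple `('a', 'b')`. This is presumably the formatting problem that the later `__init__` fix addresses.

```python
from typing import Callable, List


class ChunkTokenLimitExceededError(ValueError):
    """Raised when a character-split chunk exceeds the token budget."""

    def __init__(self, chunk_tokens: int, chunk_token_limit: int, chunk_preview: str = ""):
        preview = chunk_preview[:80]
        # Build ONE message string; several positional args would print as a tuple.
        message = (
            f"Chunk token length {chunk_tokens} exceeds chunk_token_size "
            f"{chunk_token_limit}. Chunk preview: {preview!r}"
        )
        super().__init__(message)
        self.chunk_tokens = chunk_tokens
        self.chunk_token_limit = chunk_token_limit
        self.chunk_preview = preview


def chunk_by_character(
    text: str,
    split_by_character: str,
    chunk_token_size: int,
    count_tokens: Callable[[str], int],
) -> List[str]:
    """Split only on `split_by_character`; oversized pieces raise instead of being re-split."""
    chunks = []
    for piece in text.split(split_by_character):
        if not piece.strip():
            continue  # skip empty pieces between consecutive separators
        n_tokens = count_tokens(piece)
        if n_tokens > chunk_token_size:
            raise ChunkTokenLimitExceededError(n_tokens, chunk_token_size, piece)
        chunks.append(piece)
    return chunks
```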
yangdx
95cd0ece74
Fix DOCX table extraction by escaping special characters in cells
- Add escape_cell() function
- Escape backslashes first
- Handle tabs and newlines
- Preserve tab-delimited format
- Prevent double-escaping issues
2025-11-19 09:54:35 +08:00
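Here is a minimal sketch of cell escaping for tab-delimited table rows. `escape_cell` is named in the commit, but its body and the `format_table_row` helper are assumptions. Backslashes are escaped first. As a result, a literal backslash followed by `t` in the source text stays distinguishable from an escaped tab character.

```python
from typing import Optional, Sequence


def escape_cell(text: str) -> str:
    # Order matters: escape backslashes first, otherwise the backslashes
    # introduced for tabs/newlines below would themselves be escaped again.
    return (
        text.replace("\\", "\\\\")
        .replace("\t", "\\t")
        .replace("\r", "\\r")
        .replace("\n", "\\n")
    )


def format_table_row(cells: Sequence[str]) -> Optional[str]:
    # Keep empty cells so every row has the same number of columns,
    # but drop rows in which no cell has any content.
    if not any(cells):
        return None
    return "\t".join(escape_cell(cell) for cell in cells)
```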
yangdx
87de2b3e9e
Update XLSX extraction documentation to reflect current implementation
2025-11-19 04:26:41 +08:00
yangdx
0244699d81
Optimize XLSX extraction by using sheet.max_column instead of two-pass scan
• Remove two-pass row scanning approach
• Use built-in sheet.max_column property
• Simplify column width detection logic
• Improve memory efficiency
• Maintain column alignment preservation
2025-11-19 04:02:39 +08:00
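The sketch below shows, under stated assumptions, the sheet layout these XLSX commits converge on:

- each sheet starts with a separator line;
- every row is padded to the sheet's column count so columns stay aligned;
- cells are escaped (backslash, tab, newline).

The function takes plain tuples so it runs without openpyxl. With openpyxl you would pass `sheet.title`, `sheet.iter_rows(values_only=True)` and `sheet.max_column`. The separator text and the helper names are hypothetical.

```python
from typing import Any, Iterable, Sequence


def _escape(value: Any) -> str:
    # None (an empty cell) becomes ""; other values are stringified and escaped.
    text = "" if value is None else str(value)
    return (
        text.replace("\\", "\\\\")
        .replace("\t", "\\t")
        .replace("\r", "\\r")
        .replace("\n", "\\n")
    )


def format_sheet(title: str, rows: Iterable[Sequence[Any]], max_column: int) -> str:
    lines = [f"=== Sheet: {title} ==="]
    # Input rows are consumed once and not kept; only output lines are stored.
    for row in rows:
        cells = [_escape(v) for v in row]
        # Pad short rows so every line has exactly max_column fields.
        cells += [""] * (max_column - len(cells))
        lines.append("\t".join(cells))
    return "\n".join(lines)
```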
yangdx
2b16016312
Optimize XLSX extraction to avoid storing all rows in memory
• Remove intermediate row storage
• Use iterator twice instead of list()
• Preserve column alignment logic
• Reduce memory footprint
• Maintain same output format
2025-11-19 03:48:36 +08:00
yangdx
ef659a1e09
Preserve column alignment in XLSX extraction with two-pass processing
• Two-pass approach for consistent width
• Maintain tabular structure integrity
• Determine max columns in the first pass
• Extract with alignment in the second pass
• Prevent column misalignment issues
2025-11-19 03:34:22 +08:00
yangdx
3efb1716b4
Enhance XLSX extraction with structured tab-delimited format and escaping
- Add clear sheet separators
- Escape special characters
- Trim trailing empty columns
- Preserve row structure
- Single-pass optimization
2025-11-19 03:06:29 +08:00
yangdx
e7d2803a65
Remove text stripping in DOCX extraction to preserve whitespace
• Keep original paragraph spacing
• Preserve cell whitespace in tables
• Maintain document formatting
• Don't strip leading/trailing spaces
2025-11-19 02:12:27 +08:00
yangdx
186c8f0e16
Preserve blank paragraphs in DOCX extraction to maintain spacing
• Remove text emptiness check
• Always append paragraph text
• Maintain document formatting
• Preserve original spacing
2025-11-19 02:03:10 +08:00
yangdx
fa887d811b
Fix table column structure preservation in DOCX extraction
• Always append cell text to maintain columns
• Preserve empty cells in table structure
• Check for any content before adding rows
• Use tab separation for proper alignment
• Improve table formatting consistency
2025-11-19 01:52:02 +08:00
yangdx
4438ba41a3
Enhance DOCX extraction to preserve document order with tables
• Include tables in extracted content
• Maintain original document order
• Add spacing around tables
• Use tabs to separate table cells
• Process all body elements sequentially
2025-11-19 01:31:33 +08:00
yangdx
d16c7840ab
Bump API version to 0256
2025-11-18 23:15:31 +08:00
yangdx
e77340d4a1
Adjust chunking parameters to match the default environment variable settings
2025-11-18 23:14:50 +08:00
yangdx
1bfa1f81cb
Merge branch 'main' into fix_chunk_comment
2025-11-18 22:38:50 +08:00
yangdx
9c10c87554
Fix linting
2025-11-18 22:38:43 +08:00
yangdx
dbae327a17
Merge branch 'main' into dev-postgres-vchordrq
2025-11-18 22:13:27 +08:00
yangdx
3096f844fb
fix(postgres): allow vchordrq.epsilon config when probes is empty
Previously, configure_vchordrq would fail silently when probes was empty
(the default), preventing epsilon from being configured. Now each parameter
is handled independently with conditional execution, and configuration
errors fail-fast instead of being swallowed.
This fixes the documented epsilon setting being impossible to use in the
default configuration.
2025-11-18 21:58:36 +08:00
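A minimal sketch of configuring each vchordrq search parameter independently follows. It assumes a session-level `SET` for the `vchordrq.probes` and `vchordrq.epsilon` settings. `execute` stands in for whatever runs SQL on the connection (hypothetical), and the format check on probes is an assumption. There is deliberately no `try/except`, so a bad value raises immediately instead of being logged and ignored.

```python
import re
from typing import Callable, List, Optional


def configure_vchordrq(
    execute: Callable[[str], object],
    probes: Optional[str],
    epsilon: Optional[float],
) -> List[str]:
    """Apply vchordrq settings; each one is optional and independent."""
    statements = []
    # An empty probes value (the default) is simply skipped; it must not
    # prevent epsilon from being applied.
    if probes:
        if not re.fullmatch(r"\d+(,\d+)*", probes):
            raise ValueError(f"invalid vchordrq probes value: {probes!r}")
        statements.append(f"SET vchordrq.probes = '{probes}'")
    if epsilon is not None:
        statements.append(f"SET vchordrq.epsilon = {float(epsilon)}")
    for statement in statements:
        execute(statement)  # errors propagate: fail fast
    return statements
```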
EightyOliveira
dacca334e0
refactor(chunking): rename params and improve docstring for chunking_by_token_size
2025-11-18 15:46:28 +08:00
yangdx
702cfd2981
Fix document deletion concurrency control and validation logic
• Clarify job naming for single vs batch deletion
• Update job name validation in busy pipeline check
2025-11-18 13:59:24 +08:00
yangdx
4048fc4b89
Fix: auto-acquire pipeline when idle in document deletion
• Track if we acquired the pipeline lock
• Auto-acquire pipeline when idle
• Only release if we acquired it
• Prevent concurrent deletion conflicts
• Improve deletion job validation
2025-11-18 13:25:13 +08:00
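What follows is a minimal sketch of the acquire-if-idle pattern. A plain dict and an `asyncio.Lock` stand in for the shared pipeline status. Several details are assumptions, not the project's code: the field names (`busy`, `job_name`), the `Deleting` job-name prefix, and the rule that an already running deletion job may proceed. The essential part is the `acquired` flag: only the caller that set `busy` to `True` resets it.

```python
import asyncio
from typing import Any, Awaitable, Callable, Dict


async def delete_document(
    status: Dict[str, Any],
    lock: asyncio.Lock,
    doc_id: str,
    do_delete: Callable[[str], Awaitable[Any]],
) -> Any:
    acquired = False
    async with lock:
        if status.get("busy"):
            # Another job owns the pipeline: only a deletion job may continue.
            if not str(status.get("job_name", "")).startswith("Deleting"):
                raise RuntimeError("pipeline is busy with another job")
        else:
            # Pipeline is idle: claim it for this deletion.
            status["busy"] = True
            status["job_name"] = f"Deleting 1 document: {doc_id}"
            acquired = True
    try:
        return await do_delete(doc_id)
    finally:
        if acquired:  # never release a pipeline that someone else acquired
            async with lock:
                status["busy"] = False
                status["job_name"] = ""
```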
yangdx
1745b30a5f
Fix missing workspace parameter in update flags status call
2025-11-18 12:55:48 +08:00
yangdx
f8dd2e0724
Fix namespace parsing when workspace contains colons
• Use rsplit instead of split
• Handle colons in workspace names
2025-11-18 12:23:05 +08:00
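A minimal sketch of why `rsplit` matters follows. It assumes a combined key of the form `workspace:namespace` in which the namespace part itself never contains a colon (assumption). Splitting once from the right keeps any colons inside the workspace name intact. A key without a colon is treated as belonging to the empty workspace `""`. `split_final_namespace` is a hypothetical helper name.

```python
from typing import Tuple


def split_final_namespace(final_namespace: str) -> Tuple[str, str]:
    """Return (workspace, namespace) from a 'workspace:namespace' key."""
    if ":" not in final_namespace:
        # Bare key such as "pipeline_status": no workspace prefix.
        return "", final_namespace
    # rsplit(":", 1) splits only at the LAST colon, so "tenant:a:full_docs"
    # yields ("tenant:a", "full_docs") instead of breaking at the first colon.
    workspace, namespace = final_namespace.rsplit(":", 1)
    return workspace, namespace
```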
wmsnp
d07023c962
feat(postgres_impl): add vchordrq vector index support and unify vector index creation logic
2025-11-18 11:45:16 +08:00
yangdx
6cef8df159
Reduce log level and improve workspace mismatch message clarity
• Change warning to info level
• Simplify workspace mismatch wording
2025-11-18 08:25:21 +08:00
yangdx
ddc76f0c80
Merge branch 'main' into workspace-isolation
2025-11-17 17:08:07 +08:00
yangdx
9262f66d13
Bump API version to 0255
2025-11-17 17:07:18 +08:00
yangdx
393f880311
Improve LightRAG initialization checker tool with better usage docs
• Add workspace param to get_namespace_data
• Update docstring with proper usage example
• Simplify demo to show correct workflow
• Remove confusing before/after comparison
• Clarify tool should run after init
2025-11-17 15:42:54 +08:00
yangdx
9d7b7981ce
Add pipeline status validation before document deletion
2025-11-17 14:58:10 +08:00
yangdx
98e964dfc4
Fix initialization instructions in check_lightrag_setup function
2025-11-17 14:27:26 +08:00
yangdx
6d6716e9f8
Add _default_workspace to shared storage finalization
- Add _default_workspace to global vars
- Set _default_workspace to None on cleanup
- Ensure complete resource cleanup
- Fix missing workspace finalization
2025-11-17 13:46:46 +08:00
yangdx
f1d8f18c80
Merge branch 'main' into workspace-isolation
2025-11-17 13:01:33 +08:00
yangdx
cdd53ee875
Remove manual initialize_pipeline_status() calls across codebase
- Auto-init pipeline status in storages
- Remove redundant import statements
- Simplify initialization pattern
- Update docs and examples
2025-11-17 12:54:33 +08:00
yangdx
e22ac52ebc
Auto-initialize pipeline status in LightRAG.initialize_storages()
• Remove manual initialize_pipeline_status calls
• Auto-init in initialize_storages method
• Update error messages for clarity
• Warn on workspace conflicts
2025-11-17 12:54:33 +08:00
yangdx
e8383df3b8
Fix NamespaceLock context variable timing to prevent lock bricking
* Acquire lock before setting ContextVar
* Prevent state corruption on cancellation
* Fix permanent lock brick scenario
* Store context only after success
* Handle acquisition failure properly
2025-11-17 12:54:33 +08:00
yangdx
95e1fb1612
Remove final_namespace attribute for in-memory storage and use namespace in clean_llm_query_cache.py
2025-11-17 12:54:33 +08:00
yangdx
7ed0eac4c9
Fix workspace filtering logic in get_all_update_flags_status
• Handle namespaces with/without prefixes
• Fix workspace matching logic
2025-11-17 12:54:33 +08:00
yangdx
78689e8837
Fix pipeline status namespace check to handle root case
- Add check for bare "pipeline_status"
- Handle namespace without prefix
2025-11-17 12:54:33 +08:00
yangdx
d54d0d55d9
Standardize empty workspace handling from "_" to "" across storage
...
* Unify empty workspace behavior by changing workspace from "_" to ""
* Fixed incorrect empty workspace detection in get_all_update_flags_status()
2025-11-17 12:54:33 +08:00
yangdx
b6a5a90eaf
Fix NamespaceLock concurrent coroutine safety with ContextVar
- Use ContextVar for per-coroutine storage
- Prevent state interference between coroutines
- Add re-entrance protection check
2025-11-17 12:54:33 +08:00