Commit graph

3707 commits

Author SHA1 Message Date
yangdx
45f4f82392 Refactor Azure OpenAI client creation to support client_configs merging
- Handle None client_configs case
- Merge configs with explicit params
- Override client_configs with params
- Use dict unpacking for client init
- Maintain parameter precedence
2025-11-21 19:14:16 +08:00
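The merge order listed in this commit can be sketched as follows. This is a minimal illustration, not the project's actual code: `build_client_kwargs` and its parameter names are hypothetical, and it assumes an explicit parameter counts as "provided" when it is not `None`.

```python
from typing import Any, Optional


def build_client_kwargs(
    api_key: Optional[str] = None,
    base_url: Optional[str] = None,
    client_configs: Optional[dict[str, Any]] = None,
) -> dict[str, Any]:
    """Merge client_configs with explicit parameters (hypothetical helper)."""
    # Treat a missing client_configs as an empty dict.
    merged: dict[str, Any] = dict(client_configs or {})
    # Explicit parameters override anything supplied in client_configs.
    explicit = {"api_key": api_key, "base_url": base_url}
    merged.update({k: v for k, v in explicit.items() if v is not None})
    return merged


kwargs = build_client_kwargs(
    api_key="explicit-key",
    client_configs={"api_key": "from-config", "timeout": 30},
)
# A real client would then be created with dict unpacking, e.g. Client(**kwargs).
```

Copying `client_configs` before updating it keeps the caller's dict unmodified, so the same config can be reused across calls.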
yangdx
0c4cba3860 Fix double decoration in azure_openai_embed and document decorator usage
• Remove redundant @retry decorator
• Call openai_embed.func directly
• Add detailed decorator documentation
• Prevent double parameter injection
• Fix EmbeddingFunc wrapping issues
2025-11-21 18:03:53 +08:00
yangdx
b46c152306 Fix linting 2025-11-21 17:16:44 +08:00
yangdx
b709f8f869 Consolidate Azure OpenAI implementation into main OpenAI module
• Unified OpenAI/Azure client creation
• Azure module now re-exports functions
• Backward compatibility maintained
• Reduced code duplication
2025-11-21 17:12:33 +08:00
yangdx
66d6c7dd6f Refactor main function to provide sync CLI entry point 2025-11-21 13:11:55 +08:00
yangdx
02fdceb959 Update OpenAI client to use stable API and bump minimum version to 2.0.0
- Remove beta prefix from completions.parse
- Update OpenAI dependency to >=2.0.0
- Fix whitespace formatting
- Update all requirement files
- Clean up pyproject.toml dependencies
2025-11-21 12:55:44 +08:00
yangdx
9f69c5bf85 feat: Support structured output parsed from OpenAI
Added support for structured output (JSON mode) from the OpenAI API in `openai.py` and `azure_openai.py`.

When `response_format` is used to request structured data, the new logic checks for the `message.parsed` attribute. If it exists, it's serialized into a JSON string as the final content. If not, the code falls back to the existing `message.content` handling, ensuring backward compatibility.
2025-11-21 12:46:31 +08:00
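The fallback described above might look roughly like the sketch below. `extract_content` is a hypothetical name, the message objects are stand-ins built with `SimpleNamespace`, and the use of `model_dump()` for Pydantic-style parsed objects is an assumption rather than something the commit states.

```python
import json
from types import SimpleNamespace


def extract_content(message) -> str:
    """Return parsed structured output as a JSON string, else plain content."""
    parsed = getattr(message, "parsed", None)
    if parsed is not None:
        # Assumption: Pydantic-style objects expose model_dump(); dicts are used as-is.
        data = parsed.model_dump() if hasattr(parsed, "model_dump") else parsed
        return json.dumps(data, ensure_ascii=False)
    # No parsed attribute: fall back to the existing content handling.
    return message.content or ""


structured = SimpleNamespace(parsed={"keywords": ["graph"]}, content=None)
plain = SimpleNamespace(content="plain text reply")
```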
yangdx
c9e1c86e81 Refactor keyword extraction handling to centralize response format logic
• Move response format to core function
• Remove duplicate format assignments
• Standardize keyword extraction flow
• Clean up redundant parameter handling
• Improve Azure OpenAI compatibility
2025-11-21 12:10:04 +08:00
yangdx
46ce6d9a13 Fix Azure OpenAI embedding model parameter fallback
- Use model param if provided
- Fall back to deployment name
- Fix embedding API call
- Improve parameter handling
2025-11-20 18:20:22 +08:00
Amritpal Singh
30e86fa331 Use the deployment variable, whose value is read from the .env file or falls back to a default 2025-11-20 09:00:27 +00:00
yangdx
b7de694f48 Add comprehensive error logging across API routes
- Add error logs to Ollama API endpoints
- Replace logging with unified logger
- Log streaming query errors
- Add data query error logging
- Include stack traces for debugging
2025-11-19 22:50:06 +08:00
yangdx
0fb2925c6a Remove ascii_colors dependency and fix stream handling errors
• Remove ascii_colors.trace_exception calls
• Add SafeStreamHandler for closed streams
• Patch ascii_colors console handler
• Prevent ValueError on stream close
• Improve logging error handling
2025-11-19 21:38:17 +08:00
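One way a handler can tolerate a closed stream is sketched below. Only the class name `SafeStreamHandler` comes from the commit message; the body is an assumption about how such a handler could be written, not the project's implementation.

```python
import io
import logging


class SafeStreamHandler(logging.StreamHandler):
    """StreamHandler that tolerates an already-closed stream (sketch)."""

    def emit(self, record: logging.LogRecord) -> None:
        # Skip records once the underlying stream has been closed.
        if getattr(self.stream, "closed", False):
            return
        super().emit(record)

    def flush(self) -> None:
        try:
            super().flush()
        except ValueError:
            # Raised as "I/O operation on closed file"; nothing left to flush.
            pass


stream = io.StringIO()
handler = SafeStreamHandler(stream)
logger = logging.getLogger("safe_stream_demo")
logger.propagate = False
logger.addHandler(handler)

logger.warning("before close")
out = stream.getvalue()
stream.close()
logger.warning("after close")  # silently dropped instead of raising
handler.flush()                # no ValueError
```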
yangdx
6fea68bff9 Fix ChunkTokenLimitExceededError message formatting
- Prevent passing two separate string objects to __init__
- Maintain same error output
2025-11-19 18:50:45 +08:00
yangdx
f988a22652 Add token limit validation for character-only chunking
- Add ChunkTokenLimitExceededError exception
- Validate chunks against token limits
- Include chunk preview in error messages
- Add comprehensive test coverage
- Log warnings for oversized chunks
2025-11-19 18:32:43 +08:00
yangdx
95cd0ece74 Fix DOCX table extraction by escaping special characters in cells
- Add escape_cell() function
- Escape backslashes first
- Handle tabs and newlines
- Preserve tab-delimited format
- Prevent double-escaping issues
2025-11-19 09:54:35 +08:00
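The escaping order matters here: if backslashes were escaped last, the backslashes introduced for tabs and newlines would themselves be doubled. A minimal sketch of such an `escape_cell()` follows; the exact replacement strings, and the handling of carriage returns, are assumptions.

```python
def escape_cell(text: str) -> str:
    """Escape characters that would break a tab-delimited row (sketch)."""
    # Backslashes go first so the escapes added below are not escaped again.
    return (
        text.replace("\\", "\\\\")
        .replace("\t", "\\t")
        .replace("\r", "\\r")  # assumption: carriage returns are escaped too
        .replace("\n", "\\n")
    )


# Cells can now be joined with real tabs without corrupting the column layout.
row = "\t".join(escape_cell(cell) for cell in ["x\ty", "z"])
```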
yangdx
87de2b3e9e Update XLSX extraction documentation to reflect current implementation 2025-11-19 04:26:41 +08:00
yangdx
0244699d81 Optimize XLSX extraction by using sheet.max_column instead of two-pass scan
• Remove two-pass row scanning approach
• Use built-in sheet.max_column property
• Simplify column width detection logic
• Improve memory efficiency
• Maintain column alignment preservation
2025-11-19 04:02:39 +08:00
yangdx
2b16016312 Optimize XLSX extraction to avoid storing all rows in memory
• Remove intermediate row storage
• Use iterator twice instead of list()
• Preserve column alignment logic
• Reduce memory footprint
• Maintain same output format
2025-11-19 03:48:36 +08:00
yangdx
ef659a1e09 Preserve column alignment in XLSX extraction with two-pass processing
• Two-pass approach for consistent width
• Maintain tabular structure integrity
• Determine max columns first pass
• Extract with alignment second pass
• Prevent column misalignment issues
2025-11-19 03:34:22 +08:00
yangdx
3efb1716b4 Enhance XLSX extraction with structured tab-delimited format and escaping
- Add clear sheet separators
- Escape special characters
- Trim trailing empty columns
- Preserve row structure
- Single-pass optimization
2025-11-19 03:06:29 +08:00
yangdx
e7d2803a65 Remove text stripping in DOCX extraction to preserve whitespace
• Keep original paragraph spacing
• Preserve cell whitespace in tables
• Maintain document formatting
• Don't strip leading/trailing spaces
2025-11-19 02:12:27 +08:00
yangdx
186c8f0e16 Preserve blank paragraphs in DOCX extraction to maintain spacing
• Remove text emptiness check
• Always append paragraph text
• Maintain document formatting
• Preserve original spacing
2025-11-19 02:03:10 +08:00
yangdx
fa887d811b Fix table column structure preservation in DOCX extraction
• Always append cell text to maintain columns
• Preserve empty cells in table structure
• Check for any content before adding rows
• Use tab separation for proper alignment
• Improve table formatting consistency
2025-11-19 01:52:02 +08:00
yangdx
4438ba41a3 Enhance DOCX extraction to preserve document order with tables
• Include tables in extracted content
• Maintain original document order
• Add spacing around tables
• Use tabs to separate table cells
• Process all body elements sequentially
2025-11-19 01:31:33 +08:00
yangdx
d16c7840ab Bump API version to 0256 2025-11-18 23:15:31 +08:00
yangdx
e77340d4a1 Adjust chunking parameters to match the default environment variable settings 2025-11-18 23:14:50 +08:00
yangdx
1bfa1f81cb Merge branch 'main' into fix_chunk_comment 2025-11-18 22:38:50 +08:00
yangdx
9c10c87554 Fix linting 2025-11-18 22:38:43 +08:00
yangdx
dbae327a17 Merge branch 'main' into dev-postgres-vchordrq 2025-11-18 22:13:27 +08:00
yangdx
3096f844fb fix(postgres): allow vchordrq.epsilon config when probes is empty
Previously, configure_vchordrq would fail silently when probes was empty
(the default), preventing epsilon from being configured. Now each parameter
is handled independently with conditional execution, and configuration
errors fail-fast instead of being swallowed.

This fixes the documented epsilon setting being impossible to use in the
default configuration.
2025-11-18 21:58:36 +08:00
EightyOliveira
dacca334e0 refactor(chunking): rename params and improve docstring for chunking_by_token_size 2025-11-18 15:46:28 +08:00
yangdx
702cfd2981 Fix document deletion concurrency control and validation logic
• Clarify job naming for single vs batch deletion
• Update job name validation in busy pipeline check
2025-11-18 13:59:24 +08:00
yangdx
4048fc4b89 Fix: auto-acquire pipeline when idle in document deletion
• Track if we acquired the pipeline lock
• Auto-acquire pipeline when idle
• Only release if we acquired it
• Prevent concurrent deletion conflicts
• Improve deletion job validation
2025-11-18 13:25:13 +08:00
yangdx
1745b30a5f Fix missing workspace parameter in update flags status call 2025-11-18 12:55:48 +08:00
yangdx
f8dd2e0724 Fix namespace parsing when workspace contains colons
• Use rsplit instead of split
• Handle colons in workspace names
2025-11-18 12:23:05 +08:00
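The difference between `split` and `rsplit` can be illustrated as below, assuming the combined name has the form `workspace:namespace` and that the namespace part never contains a colon. `split_final_namespace` is a hypothetical helper, not the project's function.

```python
def split_final_namespace(final_namespace: str) -> tuple[str, str]:
    """Split 'workspace:namespace' on the LAST colon (hypothetical helper)."""
    if ":" not in final_namespace:
        # Bare namespace with no workspace prefix.
        return "", final_namespace
    workspace, namespace = final_namespace.rsplit(":", 1)
    return workspace, namespace


name = "team:alpha:pipeline_status"
naive = name.split(":", 1)            # splits on the first colon
fixed = split_final_namespace(name)   # splits on the last colon
```

With `split`, the workspace `team:alpha` is truncated to `team`; `rsplit` keeps it whole.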
wmsnp
d07023c962 feat(postgres_impl): add vchordrq vector index support and unify vector index creation logic 2025-11-18 11:45:16 +08:00
yangdx
6cef8df159 Reduce log level and improve workspace mismatch message clarity
• Change warning to info level
• Simplify workspace mismatch wording
2025-11-18 08:25:21 +08:00
yangdx
ddc76f0c80 Merge branch 'main' into workspace-isolation 2025-11-17 17:08:07 +08:00
yangdx
9262f66d13 Bump API version to 0255 2025-11-17 17:07:18 +08:00
yangdx
393f880311 Improve LightRAG initialization checker tool with better usage docs
• Add workspace param to get_namespace_data
• Update docstring with proper usage example
• Simplify demo to show correct workflow
• Remove confusing before/after comparison
• Clarify tool should run after init
2025-11-17 15:42:54 +08:00
yangdx
9d7b7981ce Add pipeline status validation before document deletion 2025-11-17 14:58:10 +08:00
yangdx
98e964dfc4 Fix initialization instructions in check_lightrag_setup function 2025-11-17 14:27:26 +08:00
yangdx
6d6716e9f8 Add _default_workspace to shared storage finalization
- Add _default_workspace to global vars
- Set _default_workspace to None on cleanup
- Ensure complete resource cleanup
- Fix missing workspace finalization
2025-11-17 13:46:46 +08:00
yangdx
f1d8f18c80 Merge branch 'main' into workspace-isolation 2025-11-17 13:01:33 +08:00
yangdx
cdd53ee875 Remove manual initialize_pipeline_status() calls across codebase
- Auto-init pipeline status in storages
- Remove redundant import statements
- Simplify initialization pattern
- Update docs and examples
2025-11-17 12:54:33 +08:00
yangdx
e22ac52ebc Auto-initialize pipeline status in LightRAG.initialize_storages()
• Remove manual initialize_pipeline_status calls
• Auto-init in initialize_storages method
• Update error messages for clarity
• Warn on workspace conflicts
2025-11-17 12:54:33 +08:00
yangdx
e8383df3b8 Fix NamespaceLock context variable timing to prevent lock bricking
* Acquire lock before setting ContextVar
* Prevent state corruption on cancellation
* Fix permanent lock brick scenario
* Store context only after success
* Handle acquisition failure properly
2025-11-17 12:54:33 +08:00
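The ordering described in this commit can be sketched with plain `asyncio` primitives. `NamespaceLockSketch` and `_held_lock` are hypothetical names, and the real `NamespaceLock` may track different state; the point is only that the `ContextVar` is written after the acquisition has succeeded.

```python
import asyncio
from contextvars import ContextVar
from typing import Optional

# Holds the lock owned by the current context, if any (hypothetical variable).
_held_lock: ContextVar[Optional[asyncio.Lock]] = ContextVar("_held_lock", default=None)


class NamespaceLockSketch:
    def __init__(self, lock: asyncio.Lock) -> None:
        self._lock = lock

    async def __aenter__(self) -> "NamespaceLockSketch":
        if _held_lock.get() is not None:
            raise RuntimeError("lock already held in this context")
        # Acquire first: if this await is cancelled or raises, the ContextVar
        # has not been touched, so a later attempt starts from a clean state.
        await self._lock.acquire()
        _held_lock.set(self._lock)
        return self

    async def __aexit__(self, exc_type, exc, tb) -> None:
        lock = _held_lock.get()
        _held_lock.set(None)
        if lock is not None:
            lock.release()


async def demo() -> tuple:
    lock = asyncio.Lock()
    async with NamespaceLockSketch(lock):
        inside = (lock.locked(), _held_lock.get() is lock)
    after = (lock.locked(), _held_lock.get() is None)
    return inside + after


result = asyncio.run(demo())
```

Setting the `ContextVar` before the `await` would leave it populated if acquisition were cancelled, and every later entry in that context would then hit the "already held" check.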
yangdx
95e1fb1612 Remove final_namespace attribute for in-memory storage and use namespace in clean_llm_query_cache.py 2025-11-17 12:54:33 +08:00
yangdx
7ed0eac4c9 Fix workspace filtering logic in get_all_update_flags_status
• Handle namespaces with/without prefixes
• Fix workspace matching logic
2025-11-17 12:54:33 +08:00
yangdx
78689e8837 Fix pipeline status namespace check to handle root case
- Add check for bare "pipeline_status"
- Handle namespace without prefix
2025-11-17 12:54:33 +08:00