yangdx
fafa1791f4
Fix Azure OpenAI model parameter to use deployment name consistently
...
- Use deployment name for Azure API calls
- Fix model param in embed function
- Consistent api_model logic
- Prevent Azure model name conflicts
2025-11-21 23:41:52 +08:00
Daniel.y
021b637dc3
Merge pull request #2403 from danielaskdd/azure-cot-handling
...
Refactor: Consolidate Azure OpenAI and OpenAI implementations
2025-11-21 19:36:12 +08:00
yangdx
ac9f2574a5
Improve Azure OpenAI wrapper functions with full parameter support
...
• Add missing parameters to wrappers
• Update docstrings for clarity
• Ensure API consistency
• Fix parameter forwarding
• Maintain backward compatibility
2025-11-21 19:24:32 +08:00
yangdx
45f4f82392
Refactor Azure OpenAI client creation to support client_configs merging
...
- Handle None client_configs case
- Merge configs with explicit params
- Override client_configs with params
- Use dict unpacking for client init
- Maintain parameter precedence
2025-11-21 19:14:16 +08:00
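The merge order described in 45f4f82392 can be sketched as below. This is only an illustration: `build_client_kwargs` is a hypothetical helper name, and skipping explicit parameters that are `None` is an assumption the commit message does not confirm.

```python
def build_client_kwargs(client_configs=None, **explicit_params):
    """Merge optional client_configs with explicit parameters.

    Explicit parameters take precedence over client_configs.
    Hypothetical helper; not the project's actual function.
    """
    # Handle the None case by starting from an empty dict (copy to avoid mutation).
    merged = dict(client_configs or {})
    # Assumption: explicit parameters left as None do not override configured values.
    merged.update({k: v for k, v in explicit_params.items() if v is not None})
    return merged


kwargs = build_client_kwargs(
    {"timeout": 30, "api_key": "from-config"},
    api_key="explicit-key",
    azure_endpoint=None,
)
# The result would then be unpacked into the client constructor, e.g. Client(**kwargs).
```

Copying `client_configs` before updating it keeps the caller's dict untouched, so the same config can be reused across several client creations.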
yangdx
0c4cba3860
Fix double decoration in azure_openai_embed and document decorator usage
...
• Remove redundant @retry decorator
• Call openai_embed.func directly
• Add detailed decorator documentation
• Prevent double parameter injection
• Fix EmbeddingFunc wrapping issues
2025-11-21 18:03:53 +08:00
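To illustrate why 0c4cba3860 calls `openai_embed.func` directly, here is a simplified synchronous sketch. The `EmbeddingFunc` class and `wrap_embedding` decorator below are stand-ins written for this example; the project's real wrapper is not shown in this log and may differ (for instance, it is likely asynchronous).

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class EmbeddingFunc:
    """Simplified stand-in for a wrapper that keeps the raw function in .func."""
    embedding_dim: int
    func: Callable[[List[str]], List[List[float]]]

    def __call__(self, texts):
        vectors = self.func(texts)
        # Wrapper-level check that should run once per call, not twice.
        if any(len(v) != self.embedding_dim for v in vectors):
            raise ValueError("unexpected embedding dimension")
        return vectors


def wrap_embedding(embedding_dim):
    """Hypothetical decorator that wraps a function in EmbeddingFunc."""
    def decorator(fn):
        return EmbeddingFunc(embedding_dim=embedding_dim, func=fn)
    return decorator


@wrap_embedding(3)
def base_embed(texts):
    return [[0.0, 0.0, 0.0] for _ in texts]


@wrap_embedding(3)
def azure_embed(texts):
    # Call the undecorated function so the wrapper logic is applied only once.
    return base_embed.func(texts)
```

Calling `base_embed(texts)` inside `azure_embed` would also work in this toy version, but it would run the wrapper's logic twice, which is the kind of double wrapping the commit describes removing.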
yangdx
b46c152306
Fix linting
2025-11-21 17:16:44 +08:00
yangdx
b709f8f869
Consolidate Azure OpenAI implementation into main OpenAI module
...
• Unified OpenAI/Azure client creation
• Azure module now re-exports functions
• Backward compatibility maintained
• Reduced code duplication
2025-11-21 17:12:33 +08:00
yangdx
66d6c7dd6f
Refactor main function to provide sync CLI entry point
2025-11-21 13:11:55 +08:00
Daniel.y
8777895efc
Merge pull request #2401 from danielaskdd/fix-openai-keyword-extraction
...
Refactor: Centralize keyword_extraction parameter handling in OpenAI LLM implementations
2025-11-21 13:08:15 +08:00
yangdx
1e477e95ef
Add lightrag-clean-llmqc console script entry point
...
- Add clean_llm_query_cache tool
- New console script for cache cleanup
- Extend CLI tool availability
2025-11-21 12:59:49 +08:00
yangdx
02fdceb959
Update OpenAI client to use stable API and bump minimum version to 2.0.0
...
- Remove beta prefix from completions.parse
- Update OpenAI dependency to >=2.0.0
- Fix whitespace formatting
- Update all requirement files
- Clean up pyproject.toml dependencies
2025-11-21 12:55:44 +08:00
yangdx
9f69c5bf85
feat: Support structured output parsed from OpenAI
...
Added support for structured output (JSON mode) from the OpenAI API in `openai.py` and `azure_openai.py`.
When `response_format` is used to request structured data, the new logic checks for the `message.parsed` attribute. If it exists, it's serialized into a JSON string as the final content. If not, the code falls back to the existing `message.content` handling, ensuring backward compatibility.
2025-11-21 12:46:31 +08:00
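The fallback described in 9f69c5bf85 can be sketched as follows. `extract_content` is a hypothetical helper name, and the `model_dump()` branch assumes the parsed object may be a Pydantic model; the `SimpleNamespace` objects only imitate a response message for demonstration.

```python
import json
from types import SimpleNamespace


def extract_content(message):
    """Return the final content string for a chat completion message."""
    parsed = getattr(message, "parsed", None)
    if parsed is not None:
        # Assumption: Pydantic models are converted to plain data first.
        if hasattr(parsed, "model_dump"):
            parsed = parsed.model_dump()
        return json.dumps(parsed, ensure_ascii=False)
    # No parsed payload: fall back to the existing message.content handling.
    return message.content or ""


structured = SimpleNamespace(parsed={"high_level_keywords": ["rag"]}, content=None)
plain = SimpleNamespace(content="hello")

print(extract_content(structured))
print(extract_content(plain))
```

Using `getattr` with a default keeps the function safe for message objects that have no `parsed` attribute at all, which is what preserves backward compatibility in this sketch.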
yangdx
c9e1c86e81
Refactor keyword extraction handling to centralize response format logic
...
• Move response format to core function
• Remove duplicate format assignments
• Standardize keyword extraction flow
• Clean up redundant parameter handling
• Improve Azure OpenAI compatibility
2025-11-21 12:10:04 +08:00
yangdx
46ce6d9a13
Fix Azure OpenAI embedding model parameter fallback
...
- Use model param if provided
- Fall back to deployment name
- Fix embedding API call
- Improve parameter handling
2025-11-20 18:20:22 +08:00
Daniel.y
cc78e2df10
Merge pull request #2395 from Amrit75/issue-2394
...
issue-2394: use deployment variable instead of model for embeddings API call
2025-11-20 18:10:49 +08:00
Amritpal Singh
30e86fa331
Use the deployment variable, whose value is read from the .env file or falls back to a default
2025-11-20 09:00:27 +00:00
yangdx
ecea93992a
Fix linting
2025-11-20 13:03:31 +08:00
yangdx
1d2f534f3d
Fix linting
2025-11-20 13:02:25 +08:00
yangdx
72ece7343a
Remove obsolete config file and paging design doc
2025-11-20 13:00:13 +08:00
yangdx
1e415cff95
Update postgreSQL docker image link
2025-11-20 12:34:49 +08:00
yangdx
3c85e4882c
Update README
2025-11-20 10:50:02 +08:00
Daniel.y
d52adb64d7
Merge pull request #2390 from danielaskdd/fix-pytest-logging-error
...
Fix: Remove redundant exception logging to eliminate pytest shutdown errors
2025-11-19 23:09:30 +08:00
yangdx
b7de694f48
Add comprehensive error logging across API routes
...
- Add error logs to Ollama API endpoints
- Replace logging with unified logger
- Log streaming query errors
- Add data query error logging
- Include stack traces for debugging
2025-11-19 22:50:06 +08:00
yangdx
0fb2925c6a
Remove ascii_colors dependency and fix stream handling errors
...
• Remove ascii_colors.trace_exception calls
• Add SafeStreamHandler for closed streams
• Patch ascii_colors console handler
• Prevent ValueError on stream close
• Improve logging error handling
2025-11-19 21:38:17 +08:00
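A handler along the lines of the `SafeStreamHandler` named in 0fb2925c6a might look like the sketch below. Only the class name comes from the commit message; the body is an assumption about how closed streams could be tolerated using the standard `logging` module.

```python
import io
import logging


class SafeStreamHandler(logging.StreamHandler):
    """StreamHandler that silently skips output once its stream is closed."""

    def _stream_usable(self):
        stream = getattr(self, "stream", None)
        return stream is not None and not getattr(stream, "closed", False)

    def emit(self, record):
        # Writing to a closed stream raises ValueError, so skip it entirely.
        if self._stream_usable():
            super().emit(record)

    def flush(self):
        # flush() on a closed stream raises ValueError as well.
        if self._stream_usable():
            super().flush()


stream = io.StringIO()
demo_logger = logging.getLogger("safe_stream_demo")
demo_logger.propagate = False
demo_logger.addHandler(SafeStreamHandler(stream))

demo_logger.warning("written while the stream is open")
captured = stream.getvalue()
stream.close()
demo_logger.warning("dropped without raising")
```

Checking the stream before delegating avoids the traceback that the base handler would otherwise print through `handleError` when a test runner closes its captured streams at shutdown.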
Daniel.y
f72f435cef
Merge pull request #2389 from danielaskdd/fix-chunk-size
...
Fix: Add chunk token limit validation with detailed error reporting
2025-11-19 20:34:11 +08:00
yangdx
fec7c67f45
Add comprehensive chunking tests with multi-token tokenizer edge cases
...
• Add MultiTokenCharacterTokenizer for testing
• Test token vs character counting accuracy
• Verify delimiter splitting precision
• Test overlap with distinctive content
• Add decode content preservation tests
2025-11-19 19:31:36 +08:00
yangdx
5733292557
Add comprehensive tests for chunking with recursive splitting
...
- Test recursive split mode
- Add edge case coverage
- Test parameter combinations
- Verify chunk order indexing
- Add integration test scenarios
2025-11-19 19:08:50 +08:00
yangdx
6fea68bff9
Fix ChunkTokenLimitExceededError message formatting
...
- Prevent passing two separate string objects to __init__
- Maintain same error output
2025-11-19 18:50:45 +08:00
yangdx
f988a22652
Add token limit validation for character-only chunking
...
- Add ChunkTokenLimitExceededError exception
- Validate chunks against token limits
- Include chunk preview in error messages
- Add comprehensive test coverage
- Log warnings for oversized chunks
2025-11-19 18:32:43 +08:00
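The validation added in f988a22652 can be illustrated with the sketch below. The exception name comes from the commit message, but its constructor signature, the `validate_chunks` helper, and the whitespace-based token counter are all assumptions made for this example.

```python
import logging

logger = logging.getLogger("chunking_demo")


class ChunkTokenLimitExceededError(ValueError):
    """Raised when a chunk holds more tokens than the configured limit."""

    def __init__(self, index, tokens, limit, preview):
        self.index = index
        self.tokens = tokens
        self.limit = limit
        # Build one message string and pass it as a single argument.
        super().__init__(
            f"Chunk {index} has {tokens} tokens, exceeding the limit of {limit}. "
            f"Preview: {preview!r}"
        )


def whitespace_token_count(text):
    """Stand-in tokenizer: counts whitespace-separated words."""
    return len(text.split())


def validate_chunks(chunks, count_tokens, limit, preview_chars=80):
    """Return chunks unchanged, or raise if any chunk exceeds the token limit."""
    for index, chunk in enumerate(chunks):
        tokens = count_tokens(chunk)
        if tokens > limit:
            logger.warning("Chunk %d is oversized: %d > %d tokens", index, tokens, limit)
            raise ChunkTokenLimitExceededError(index, tokens, limit, chunk[:preview_chars])
    return chunks
```

A real tokenizer can map one character to several tokens, which is why chunks split purely by character delimiters need a separate token check.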
yangdx
5cc916861f
Expand AGENTS.md with testing controls and automation guidelines
...
- Add pytest marker and CLI toggle docs
- Document automation workflow rules
- Clarify integration test setup
- Add agent-specific best practices
- Update testing command examples
2025-11-19 11:30:54 +08:00
Daniel.y
af4d2a3dcc
Merge pull request #2386 from danielaskdd/excel-optimization
...
Feat: Enhance XLSX Extraction by Adding Separators and Escape Special Characters
2025-11-19 10:26:32 +08:00
yangdx
95cd0ece74
Fix DOCX table extraction by escaping special characters in cells
...
- Add escape_cell() function
- Escape backslashes first
- Handle tabs and newlines
- Preserve tab-delimited format
- Prevent double-escaping issues
2025-11-19 09:54:35 +08:00
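The escaping order listed in 95cd0ece74 can be sketched as follows. The function name `escape_cell` comes from the commit message; the body, including the handling of `None` and carriage returns, is an assumption.

```python
def escape_cell(text):
    """Escape a table cell so it is safe inside a tab-delimited row."""
    if text is None:
        return ""
    text = str(text)
    # Backslashes go first so the escapes added below are not escaped again.
    text = text.replace("\\", "\\\\")
    text = text.replace("\t", "\\t")
    text = text.replace("\n", "\\n")
    # Assumption: carriage returns are escaped the same way.
    text = text.replace("\r", "\\r")
    return text


row = "\t".join(escape_cell(cell) for cell in ["name", "line1\nline2", "a\tb"])
```

Because every literal tab and newline inside a cell is escaped, the only tab characters left in `row` are the two column separators.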
yangdx
87de2b3e9e
Update XLSX extraction documentation to reflect current implementation
2025-11-19 04:26:41 +08:00
yangdx
0244699d81
Optimize XLSX extraction by using sheet.max_column instead of two-pass scan
...
• Remove two-pass row scanning approach
• Use built-in sheet.max_column property
• Simplify column width detection logic
• Improve memory efficiency
• Maintain column alignment preservation
2025-11-19 04:02:39 +08:00
yangdx
2b16016312
Optimize XLSX extraction to avoid storing all rows in memory
...
• Remove intermediate row storage
• Use iterator twice instead of list()
• Preserve column alignment logic
• Reduce memory footprint
• Maintain same output format
2025-11-19 03:48:36 +08:00
yangdx
ef659a1e09
Preserve column alignment in XLSX extraction with two-pass processing
...
• Two-pass approach for consistent width
• Maintain tabular structure integrity
• Determine max columns first pass
• Extract with alignment second pass
• Prevent column misalignment issues
2025-11-19 03:34:22 +08:00
yangdx
3efb1716b4
Enhance XLSX extraction with structured tab-delimited format and escaping
...
- Add clear sheet separators
- Escape special characters
- Trim trailing empty columns
- Preserve row structure
- Single-pass optimization
2025-11-19 03:06:29 +08:00
Daniel.y
efbbaaf7f9
Merge pull request #2383 from danielaskdd/doc-table
...
Feat: Enhanced DOCX Extraction with Table Content Support
2025-11-19 02:26:02 +08:00
yangdx
e7d2803a65
Remove text stripping in DOCX extraction to preserve whitespace
...
• Keep original paragraph spacing
• Preserve cell whitespace in tables
• Maintain document formatting
• Don't strip leading/trailing spaces
2025-11-19 02:12:27 +08:00
yangdx
186c8f0e16
Preserve blank paragraphs in DOCX extraction to maintain spacing
...
• Remove text emptiness check
• Always append paragraph text
• Maintain document formatting
• Preserve original spacing
2025-11-19 02:03:10 +08:00
yangdx
fa887d811b
Fix table column structure preservation in DOCX extraction
...
• Always append cell text to maintain columns
• Preserve empty cells in table structure
• Check for any content before adding rows
• Use tab separation for proper alignment
• Improve table formatting consistency
2025-11-19 01:52:02 +08:00
yangdx
4438ba41a3
Enhance DOCX extraction to preserve document order with tables
...
• Include tables in extracted content
• Maintain original document order
• Add spacing around tables
• Use tabs to separate table cells
• Process all body elements sequentially
2025-11-19 01:31:33 +08:00
yangdx
d16c7840ab
Bump API version to 0256
2025-11-18 23:15:31 +08:00
yangdx
e77340d4a1
Adjust chunking parameters to match the default environment variable settings
2025-11-18 23:14:50 +08:00
yangdx
24423c9215
Merge branch 'fix_chunk_comment'
2025-11-18 22:47:23 +08:00
yangdx
1bfa1f81cb
Merge branch 'main' into fix_chunk_comment
2025-11-18 22:38:50 +08:00
yangdx
9c10c87554
Fix linting
2025-11-18 22:38:43 +08:00
yangdx
9109509b1a
Merge branch 'dev-postgres-vchordrq'
2025-11-18 22:25:35 +08:00
yangdx
dbae327a17
Merge branch 'main' into dev-postgres-vchordrq
2025-11-18 22:13:27 +08:00
yangdx
b583b8a59d
Merge branch 'feature/postgres-vchordrq-indexes' into dev-postgres-vchordrq
2025-11-18 22:05:48 +08:00