* Use OpenAI structured output API for response validation
Replace prompt-based schema injection with the native `json_schema` response format. This improves token efficiency and reliability by having OpenAI enforce the schema directly rather than relying on a copy embedded in the prompt message.
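A minimal sketch of the new call shape, assuming the OpenAI Python SDK v1+; the model name and schema here are illustrative, not the ones used in the repo:

```python
from openai import OpenAI

client = OpenAI()

# Illustrative schema; in practice this comes from the response model.
schema = {
    "type": "object",
    "properties": {"answer": {"type": "string"}},
    "required": ["answer"],
    "additionalProperties": False,
}

completion = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[{"role": "user", "content": "Extract the answer to 2 + 2."}],
    # Native structured output: the API enforces the schema, so it no
    # longer has to be embedded (and re-sent) in the prompt text.
    response_format={
        "type": "json_schema",
        "json_schema": {"name": "answer", "schema": schema, "strict": True},
    },
)
print(completion.choices[0].message.content)  # JSON conforming to the schema
```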
* Add type ignore for response_format to fix pyright error
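A sketch of the suppression, continuing the example above and assuming the error comes from the dict literal not matching the SDK's `response_format` typing:

```python
completion = client.chat.completions.create(  # client/schema from the sketch above
    model="gpt-4o-mini",
    messages=[{"role": "user", "content": "Extract the answer to 2 + 2."}],
    # pyright rejects the plain dict against the SDK's response_format
    # typing; a targeted ignore keeps the call site clean.
    response_format={
        "type": "json_schema",
        "json_schema": {"name": "answer", "schema": schema, "strict": True},
    },  # type: ignore
)
```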
* Increase OpenAIGenericClient max_tokens to 16K and update docs
- Set default max_tokens to 16384 (16K) for OpenAIGenericClient to better support local models
- Add a documentation note clarifying that OpenAIGenericClient should be used for Ollama and LM Studio (see the usage sketch after this list)
- Previous default was 8192 (8K)
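A hedged usage sketch of the documented setup; the import paths and `LLMConfig` fields are assumptions based on the client names above, not confirmed API:

```python
# Import paths are assumptions about the package layout.
from graphiti_core.llm_client.config import LLMConfig
from graphiti_core.llm_client.openai_generic_client import OpenAIGenericClient

# Ollama serves an OpenAI-compatible API at this endpoint by default;
# LM Studio uses http://localhost:1234/v1.
config = LLMConfig(
    api_key="ollama",                      # local servers accept any key
    model="llama3.1",
    base_url="http://localhost:11434/v1",
)

# max_tokens now defaults to 16384 (16K); pass it explicitly to override.
client = OpenAIGenericClient(config=config)
```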
* Refactor max_tokens override to use constructor parameter pattern
- Add max_tokens parameter to __init__ with 16K default
- Override self.max_tokens after super().__init__() instead of mutating the config (see the sketch after this list)
- Consistent with OpenAIBaseClient and AnthropicClient patterns
- Avoids unintended config mutation side effects
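A minimal sketch of the pattern with stand-in base classes; the real LLMConfig and base client live in the library, and only the placement of the override is the point:

```python
from __future__ import annotations

from typing import Any

DEFAULT_MAX_TOKENS = 16384  # 16K


class LLMConfig:
    """Minimal stand-in for the library's config type (illustrative)."""

    def __init__(self, max_tokens: int = 8192):
        self.max_tokens = max_tokens


class OpenAIBaseClient:
    """Minimal stand-in for the shared base client (illustrative)."""

    def __init__(self, config: LLMConfig | None = None, cache: bool = False, client: Any = None):
        self.config = config or LLMConfig()
        self.cache = cache
        self.client = client
        self.max_tokens = self.config.max_tokens  # base default (previously 8K)


class OpenAIGenericClient(OpenAIBaseClient):
    def __init__(
        self,
        config: LLMConfig | None = None,
        cache: bool = False,
        client: Any = None,
        max_tokens: int = DEFAULT_MAX_TOKENS,
    ):
        super().__init__(config, cache, client)
        # Override after super().__init__() rather than mutating the shared
        # config: the caller's config object stays untouched, and only this
        # instance carries the larger limit.
        self.max_tokens = max_tokens
```

With this shape, `OpenAIGenericClient()` yields `max_tokens == 16384` without touching the config object's own default.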
---------
Co-authored-by: Claude <noreply@anthropic.com>