fix: remove double quotes from llmconfig str params (#1758)


## Description
Recently, a few cryptic errors like the one in issue #1721 have occurred
across cognee use cases.

While debugging #1721, however, I found that LLM_API_KEY can end up with
`"` quotation marks as part of its value, for example, when they are
already part of the ENV:

<img width="1014" height="507" alt="Screenshot 2025-11-07 at 16 58 22"
src="https://github.com/user-attachments/assets/54b7cbb0-5bdc-4b40-b2b1-aed6c5d3d886"
/>

The quoted value then makes its way into Cognee and gets treated as part
of the API key.

By default, we do no sanitization or cleanup.

Most of the time, quotation marks get handled for us:
1. `export KEY="VALUE"` will strip them
2. python-dotenv will strip them if the value is read from `.env`

However, issues like https://github.com/docker/cli/issues/3630 and #1721
demonstrate that we need some handling on our end instead of
assuming the quotes are stripped.
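
To illustrate the failure mode, here is a minimal sketch, assuming a runtime that passes the quotes through verbatim. The variable name `DEMO_LLM_API_KEY` and the helper `strip_matching_quotes` are hypothetical and not part of Cognee:

```python
import os


def strip_matching_quotes(value: str) -> str:
    """Remove one pair of matching surrounding quotes, if present (hypothetical helper)."""
    if len(value) >= 2 and value[0] == value[-1] and value[0] in ("'", '"'):
        return value[1:-1]
    return value


# Simulate an environment where nothing upstream stripped the quotes.
os.environ["DEMO_LLM_API_KEY"] = '"sk-demo"'

raw = os.environ["DEMO_LLM_API_KEY"]   # quotes are still part of the value
cleaned = strip_matching_quotes(raw)   # quotes removed before use
```

Without the cleanup step, `raw` would be sent to the provider as the key, quotes included.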

## This PR

This PR sets up a list of string params we want to strip, plus some that
we may want to strip.

We may want to avoid doing this for all params, which is why I went with
a selective approach.
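
A rough sketch of what "selective" means here, using a plain dataclass instead of the real pydantic settings class so it runs with the standard library only. `DemoConfig`, its fields, and `FIELDS_TO_STRIP` are made-up names for illustration:

```python
from dataclasses import dataclass, fields
from typing import Optional

# Hypothetical allow-list: only these fields get their quotes stripped.
FIELDS_TO_STRIP = {"api_key", "endpoint"}


@dataclass
class DemoConfig:
    api_key: Optional[str] = None
    endpoint: Optional[str] = None
    prompt_prefix: Optional[str] = None  # not in the allow-list, left untouched

    def __post_init__(self) -> None:
        for f in fields(self):
            if f.name not in FIELDS_TO_STRIP:
                continue
            value = getattr(self, f.name)
            # Strip only one pair of matching surrounding quotes.
            if isinstance(value, str) and len(value) >= 2:
                if value[0] == value[-1] and value[0] in ("'", '"'):
                    setattr(self, f.name, value[1:-1])


cfg = DemoConfig(api_key='"k"', endpoint="'http://x'", prompt_prefix='"keep"')
```

Fields outside the allow-list keep their quotes, which matters for params where quotes could be legitimate content.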

TODO: add testing

## Type of Change
- [ ] Bug fix (non-breaking change that fixes an issue)
- [ ] New feature (non-breaking change that adds functionality)
- [ ] Breaking change (fix or feature that would cause existing
functionality to change)
- [ ] Documentation update
- [ ] Code refactoring
- [ ] Performance improvement
- [ ] Other (please specify):

## Screenshots/Videos (if applicable)

## Pre-submission Checklist
- [ ] **I have tested my changes thoroughly before submitting this PR**
- [ ] **This PR contains minimal changes necessary to address the
issue/feature**
- [ ] My code follows the project's coding standards and style
guidelines
- [ ] I have added tests that prove my fix is effective or that my
feature works
- [ ] I have added necessary documentation (if applicable)
- [ ] All new and existing tests pass
- [ ] I have searched existing PRs to ensure this change hasn't been
submitted already
- [ ] I have linked any relevant issues in the description
- [ ] My commits have clear and descriptive messages

## DCO Affirmation
I affirm that all code in every commit of this pull request conforms to
the terms of the Topoteretes Developer Certificate of Origin.


<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->

## Summary by CodeRabbit

* **Bug Fixes**
  * Configuration values with surrounding quotes are now automatically
    normalized and cleaned during system initialization, ensuring consistent
    and predictable data handling across all configuration parameters.

* **Tests**
  * Added comprehensive unit tests to validate automatic quote removal
    from configuration values, covering various scenarios including quoted,
    unquoted, empty, and edge cases with mixed and internal quotes.


<!-- end of auto-generated comment: release notes by coderabbit.ai -->
Vasilije 2025-12-08 05:10:23 +01:00 committed by GitHub
commit 7a3138edf8
2 changed files with 81 additions and 0 deletions


@@ -74,6 +74,41 @@ class LLMConfig(BaseSettings):
    model_config = SettingsConfigDict(env_file=".env", extra="allow")

    @model_validator(mode="after")
    def strip_quotes_from_strings(self) -> "LLMConfig":
        """
        Strip surrounding quotes from specific string fields that often come from
        environment variables with extra quotes (e.g., via Docker's --env-file).
        Only applies to known config keys where quotes are invalid or cause issues.
        """
        string_fields_to_strip = [
            "llm_api_key",
            "llm_endpoint",
            "llm_api_version",
            "baml_llm_api_key",
            "baml_llm_endpoint",
            "baml_llm_api_version",
            "fallback_api_key",
            "fallback_endpoint",
            "fallback_model",
            "llm_provider",
            "llm_model",
            "baml_llm_provider",
            "baml_llm_model",
        ]

        cls = self.__class__
        for field_name in string_fields_to_strip:
            if field_name not in cls.model_fields:
                continue
            value = getattr(self, field_name, None)
            if isinstance(value, str) and len(value) >= 2:
                if value[0] == value[-1] and value[0] in ("'", '"'):
                    setattr(self, field_name, value[1:-1])

        return self

    def model_post_init(self, __context) -> None:
        """Initialize the BAML registry after the model is created."""
        # Check if BAML is selected as structured output framework but not available


@@ -0,0 +1,46 @@
import pytest

from cognee.infrastructure.llm.config import LLMConfig


def test_strip_quotes_from_strings():
    """
    Test if the LLMConfig.strip_quotes_from_strings model validator behaves as expected.
    """
    config = LLMConfig(
        # Strings with surrounding double quotes ("value" → value)
        llm_api_key='"double_value"',
        # Strings with surrounding single quotes ('value' → value)
        llm_endpoint="'single_value'",
        # Strings without quotes (value → value)
        llm_api_version="no_quotes_value",
        # Empty quoted strings ("" → empty string)
        fallback_model='""',
        # None values (should remain None)
        baml_llm_api_key=None,
        # Mixed quotes ("value' → unchanged)
        fallback_endpoint="\"mixed_quote'",
        # Strings with internal quotes ("internal"quotes" → internal"quotes)
        baml_llm_model='"internal"quotes"',
    )

    # Strings with surrounding double quotes ("value" → value)
    assert config.llm_api_key == "double_value"

    # Strings with surrounding single quotes ('value' → value)
    assert config.llm_endpoint == "single_value"

    # Strings without quotes (value → value)
    assert config.llm_api_version == "no_quotes_value"

    # Empty quoted strings ("" → empty string)
    assert config.fallback_model == ""

    # None values (should remain None)
    assert config.baml_llm_api_key is None

    # Mixed quotes ("value' → unchanged)
    assert config.fallback_endpoint == "\"mixed_quote'"

    # Strings with internal quotes ("internal"quotes" → internal"quotes)
    assert config.baml_llm_model == 'internal"quotes'
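
A few further edge cases can be explored without constructing `LLMConfig` at all. The sketch below mirrors the validator's condition in a standalone function (`strip_once` is a hypothetical name, and the case table is illustrative rather than taken from the test suite):

```python
def strip_once(value: str) -> str:
    """Mirror of the validator's rule: drop one pair of matching outer quotes."""
    if len(value) >= 2 and value[0] == value[-1] and value[0] in ("'", '"'):
        return value[1:-1]
    return value


# (input, expected) pairs; each expectation follows directly from the rule above.
CASES = [
    ('"', '"'),            # a lone quote is too short to be a pair
    ('""x""', '"x"'),      # only the outermost pair is removed
    (' "x" ', ' "x" '),    # leading whitespace means the first char is not a quote
    ("''", ""),            # empty single-quoted string
]

results = [(raw, strip_once(raw)) for raw, _ in CASES]
```

Note that surrounding whitespace is not trimmed by this rule, so a padded value keeps its quotes.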