graphiti/graphiti_core/prompts/prompt_helpers.py
HUGO SON ce9ef3ca79
Add support for non-ASCII characters in LLM prompts (#805)
* Add support for non-ASCII characters in LLM prompts

- Add ensure_ascii parameter to Graphiti class (default: True)
- Create to_prompt_json helper function for consistent JSON serialization
- Update all prompt files to use new helper function
- Preserve Korean/Japanese/Chinese characters when ensure_ascii=False
- Maintain backward compatibility with existing behavior

Fixes an issue where non-ASCII characters were escaped as Unicode (\uXXXX) sequences
in prompts, making them unreadable in LLM logs and potentially affecting
model understanding.
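For reference, the escaping difference can be reproduced with the standard library alone; the sample entity below is made up for illustration:

import json

entity = {'name': '김철수', 'role': '백엔드 엔지니어'}

# Previous behavior: non-ASCII characters are escaped to \uXXXX sequences.
print(json.dumps(entity, ensure_ascii=True, indent=2))
# "name": "\uae40\ucca0\uc218", ...

# With ensure_ascii=False the original characters survive in the prompt text.
print(json.dumps(entity, ensure_ascii=False, indent=2))
# "name": "김철수", ...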

* Remove unused json imports after replacing them with the to_prompt_json helper

- Fix ruff lint errors (F401) for unused json imports
- All prompt files now use to_prompt_json helper instead of json.dumps
- Maintains clean code style and passes lint checks

* Fix ensure_ascii propagation to all LLM calls

- Add ensure_ascii parameter to maintenance operation functions that were missing it
- Update function signatures in node_operations, community_operations, temporal_operations, and edge_operations
- Ensure all llm_client.generate_response calls receive proper ensure_ascii context
- Fix hardcoded ensure_ascii: True values that prevented non-ASCII character preservation
- Maintain backward compatibility with default ensure_ascii=True
- Complete the fix for issue #804, ensuring Korean/Japanese/Chinese characters are properly handled in LLM prompts
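A minimal sketch of the propagation pattern described above, with a hypothetical context builder standing in for the real maintenance operations (the actual graphiti function signatures may differ):

from typing import Any

from graphiti_core.prompts.prompt_helpers import to_prompt_json


def build_extraction_context(
    entities: list[dict[str, Any]],
    ensure_ascii: bool = True,
) -> dict[str, Any]:
    # Hypothetical helper: the ensure_ascii flag received from the caller is
    # forwarded into prompt serialization rather than hardcoded to True, so
    # the value configured on the Graphiti instance reaches every LLM call.
    return {
        'entities': to_prompt_json(entities, ensure_ascii=ensure_ascii),
        'ensure_ascii': ensure_ascii,
    }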
2025-08-08 11:07:32 -04:00


import json
from typing import Any

DO_NOT_ESCAPE_UNICODE = '\nDo not escape unicode characters.\n'


def to_prompt_json(data: Any, ensure_ascii: bool = True, indent: int = 2) -> str:
    """
    Serialize data to JSON for use in prompts.

    Args:
        data: The data to serialize
        ensure_ascii: If True, escape non-ASCII characters. If False, preserve them.
        indent: Number of spaces for indentation

    Returns:
        JSON string representation of the data

    Notes:
        When ensure_ascii=False, non-ASCII characters (e.g., Korean, Japanese, Chinese)
        are preserved in their original form in the prompt, making them readable
        in LLM logs and improving model understanding.
    """
    return json.dumps(data, ensure_ascii=ensure_ascii, indent=indent)
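
A quick usage sketch of the helper above (the payload is made up); appending DO_NOT_ESCAPE_UNICODE when escaping is disabled is shown as one plausible use of the constant, not necessarily how the prompt files employ it:

payload = {'user': '田中太郎', 'city': '東京'}

escaped = to_prompt_json(payload)                       # default: \uXXXX escapes
readable = to_prompt_json(payload, ensure_ascii=False)  # characters preserved

# Embed the readable form in a prompt and append the standing instruction so
# the model keeps unicode characters intact in its output.
prompt = f'Extract entities from the following JSON:\n{readable}{DO_NOT_ESCAPE_UNICODE}'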