feat: enhance LLM output format tolerance for bracket processing

- Expand bracket tolerance to support additional characters: < > " '
- Implement symmetric handling for both leading and trailing characters
- Replace simple string matching with robust regex-based pattern detection
- Maintain full backward compatibility with existing bracket formats
This commit is contained in:
yangdx 2025-09-10 18:10:06 +08:00
parent 00de0a4be8
commit 02e7462645
2 changed files with 25 additions and 8 deletions

View file

@ -876,22 +876,39 @@ async def _process_extraction_result(
) )
for record in records: for record in records:
# Remove outer brackets (support English and Chinese brackets) # Remove outer brackets (support English and Chinese brackets with enhanced tolerance)
record = record.strip() record = record.strip()
# Define allowed leading and trailing characters
leading_trailing_chars = r'[`<>"\']*'
# Handle leading characters before left bracket
if record.startswith("(") or record.startswith(""): if record.startswith("(") or record.startswith(""):
record = record[1:] record = record[1:]
else: else:
if record.startswith("`(") or record.startswith("`"): # Check for leading characters + left bracket pattern
record = record[2:] leading_bracket_pattern = r'^' + leading_trailing_chars + r'([(])'
match = re.search(leading_bracket_pattern, record)
if match:
# Extract content from the left bracket position
bracket_pos = match.start(1)
record = record[bracket_pos + 1:]
else: else:
logger.warning( logger.warning(
f"{chunk_key}: Record starting bracket can not be found in extraction result" f"{chunk_key}: Record starting bracket can not be found in extraction result"
) )
# Handle trailing characters after right bracket
if record.endswith(")") or record.endswith(""): if record.endswith(")") or record.endswith(""):
record = record[:-1] record = record[:-1]
else: else:
if record.endswith(")`") or record.endswith("`"): # Check for right bracket + trailing characters pattern
record = record[:-2] trailing_bracket_pattern = r'([)])' + leading_trailing_chars + r'$'
match = re.search(trailing_bracket_pattern, record)
if match:
# Extract content up to the right bracket position
bracket_pos = match.start(1)
record = record[:bracket_pos]
else: else:
logger.warning( logger.warning(
f"{chunk_key}: Record ending bracket can not be found in extraction result" f"{chunk_key}: Record ending bracket can not be found in extraction result"

View file

@ -28,8 +28,8 @@ You are a Knowledge Graph Specialist responsible for extracting entities and rel
5. **Relationship Order:** Prioritize relationships based on their significance to the intended meaning of input text, and output more crucial relationships first. 5. **Relationship Order:** Prioritize relationships based on their significance to the intended meaning of input text, and output more crucial relationships first.
6. **Avoid Pronouns:** For entity names and all descriptions, explicitly name the subject or object instead of using pronouns; avoid pronouns such as `this document`, `our company`, `I`, `you`, and `he/she`. 6. **Avoid Pronouns:** For entity names and all descriptions, explicitly name the subject or object instead of using pronouns; avoid pronouns such as `this document`, `our company`, `I`, `you`, and `he/she`.
7. **Undirectional Relationship:** Treat relationships as undirected; swapping the source and target entities does not constitute a new relationship. Avoid outputting duplicate relationships. 7. **Undirectional Relationship:** Treat relationships as undirected; swapping the source and target entities does not constitute a new relationship. Avoid outputting duplicate relationships.
8. **Language:** Output entity names, keywords and descriptions in {language}. 8. **Language:** Output entity names, keywords and descriptions in {language}. Proper nouns, such as personal names, should not be translated. Please keep them in their original language.
9. **Delimiter:** Use `{record_delimiter}` as the entity or relationship list delimiter; output `{completion_delimiter}` when all the entities and relationships are extracted. 9. **Delimiter:** Use {record_delimiter} as the entity or relationship list delimiter; output {completion_delimiter} when all the entities and relationships are extracted.
---Examples--- ---Examples---
{examples} {examples}
@ -49,7 +49,7 @@ Extract entities and relationships from the input text to be Processed.
---Instructions--- ---Instructions---
1. Output entities and relationships, prioritized by their relevance to the input text's core meaning. 1. Output entities and relationships, prioritized by their relevance to the input text's core meaning.
2. Output `{completion_delimiter}` when all the entities and relationships are extracted. 2. Output `{completion_delimiter}` when all the entities and relationships are extracted.
3. Ensure the output language is {language}. 3. Ensure the output language is {language}. Proper nouns, such as personal names, should not be translated. Please keep them in their original language.
<Output> <Output>
""" """