feat: enhance LLM output format tolerance for bracket processing
- Expand bracket tolerance to support additional characters: < > " ' - Implement symmetric handling for both leading and trailing characters - Replace simple string matching with robust regex-based pattern detection - Maintain full backward compatibility with existing bracket formats
This commit is contained in:
parent
00de0a4be8
commit
02e7462645
2 changed files with 25 additions and 8 deletions
|
|
@ -876,22 +876,39 @@ async def _process_extraction_result(
|
|||
)
|
||||
|
||||
for record in records:
|
||||
# Remove outer brackets (support English and Chinese brackets)
|
||||
# Remove outer brackets (support English and Chinese brackets with enhanced tolerance)
|
||||
record = record.strip()
|
||||
|
||||
# Define allowed leading and trailing characters
|
||||
leading_trailing_chars = r'[`<>"\']*'
|
||||
|
||||
# Handle leading characters before left bracket
|
||||
if record.startswith("(") or record.startswith("("):
|
||||
record = record[1:]
|
||||
else:
|
||||
if record.startswith("`(") or record.startswith("`("):
|
||||
record = record[2:]
|
||||
# Check for leading characters + left bracket pattern
|
||||
leading_bracket_pattern = r'^' + leading_trailing_chars + r'([((])'
|
||||
match = re.search(leading_bracket_pattern, record)
|
||||
if match:
|
||||
# Extract content from the left bracket position
|
||||
bracket_pos = match.start(1)
|
||||
record = record[bracket_pos + 1:]
|
||||
else:
|
||||
logger.warning(
|
||||
f"{chunk_key}: Record starting bracket can not be found in extraction result"
|
||||
)
|
||||
|
||||
# Handle trailing characters after right bracket
|
||||
if record.endswith(")") or record.endswith(")"):
|
||||
record = record[:-1]
|
||||
else:
|
||||
if record.endswith(")`") or record.endswith(")`"):
|
||||
record = record[:-2]
|
||||
# Check for right bracket + trailing characters pattern
|
||||
trailing_bracket_pattern = r'([))])' + leading_trailing_chars + r'$'
|
||||
match = re.search(trailing_bracket_pattern, record)
|
||||
if match:
|
||||
# Extract content up to the right bracket position
|
||||
bracket_pos = match.start(1)
|
||||
record = record[:bracket_pos]
|
||||
else:
|
||||
logger.warning(
|
||||
f"{chunk_key}: Record ending bracket can not be found in extraction result"
|
||||
|
|
|
|||
|
|
@ -28,8 +28,8 @@ You are a Knowledge Graph Specialist responsible for extracting entities and rel
|
|||
5. **Relationship Order:** Prioritize relationships based on their significance to the intended meaning of input text, and output more crucial relationships first.
|
||||
6. **Avoid Pronouns:** For entity names and all descriptions, explicitly name the subject or object instead of using pronouns; avoid pronouns such as `this document`, `our company`, `I`, `you`, and `he/she`.
|
||||
7. **Undirectional Relationship:** Treat relationships as undirected; swapping the source and target entities does not constitute a new relationship. Avoid outputting duplicate relationships.
|
||||
8. **Language:** Output entity names, keywords and descriptions in {language}.
|
||||
9. **Delimiter:** Use `{record_delimiter}` as the entity or relationship list delimiter; output `{completion_delimiter}` when all the entities and relationships are extracted.
|
||||
8. **Language:** Output entity names, keywords and descriptions in {language}. Proper nouns, such as personal names, should not be translated. Please keep them in their original language.
|
||||
9. **Delimiter:** Use {record_delimiter} as the entity or relationship list delimiter; output {completion_delimiter} when all the entities and relationships are extracted.
|
||||
|
||||
---Examples---
|
||||
{examples}
|
||||
|
|
@ -49,7 +49,7 @@ Extract entities and relationships from the input text to be Processed.
|
|||
---Instructions---
|
||||
1. Output entities and relationships, prioritized by their relevance to the input text's core meaning.
|
||||
2. Output `{completion_delimiter}` when all the entities and relationships are extracted.
|
||||
3. Ensure the output language is {language}.
|
||||
3. Ensure the output language is {language}. Proper nouns, such as personal names, should not be translated. Please keep them in their original language.
|
||||
|
||||
<Output>
|
||||
"""
|
||||
|
|
|
|||
Loading…
Add table
Reference in a new issue