fix: Remove John Doe entity reference due to hallucination issues (#1939)
<!-- .github/pull_request_template.md -->
## Description
<!--
Please provide a clear, human-generated description of the changes in
this PR.
DO NOT use AI-generated descriptions. We want to understand your thought
process and reasoning.
-->
## Acceptance Criteria
<!--
* Key requirements to the new feature or modification;
* Proof that the changes work and meet the requirements;
* Include instructions on how to verify the changes. Describe how to
test it locally;
* Proof that it's sufficiently tested.
-->
## Type of Change
<!-- Please check the relevant option -->
- [ ] Bug fix (non-breaking change that fixes an issue)
- [ ] New feature (non-breaking change that adds functionality)
- [ ] Breaking change (fix or feature that would cause existing
functionality to change)
- [ ] Documentation update
- [ ] Code refactoring
- [ ] Performance improvement
- [ ] Other (please specify):
## Screenshots/Videos (if applicable)
<!-- Add screenshots or videos to help explain your changes -->
## Pre-submission Checklist
<!-- Please check all boxes that apply before submitting your PR -->
- [ ] **I have tested my changes thoroughly before submitting this PR**
- [ ] **This PR contains minimal changes necessary to address the
issue/feature**
- [ ] My code follows the project's coding standards and style
guidelines
- [ ] I have added tests that prove my fix is effective or that my
feature works
- [ ] I have added necessary documentation (if applicable)
- [ ] All new and existing tests pass
- [ ] I have searched existing PRs to ensure this change hasn't been
submitted already
- [ ] I have linked any relevant issues in the description
- [ ] My commits have clear and descriptive messages
## DCO Affirmation
I affirm that all code in every commit of this pull request conforms to
the terms of the Topoteretes Developer Certificate of Origin.
<!-- CURSOR_SUMMARY -->
---
> [!NOTE]
> Clarifies and tightens coreference resolution guidance across
knowledge-graph prompt templates.
>
> - Updates coreference rules to emphasize using the most complete,
human-readable identifiers consistently (`generate_graph_prompt*.txt`)
> - Tweaks examples, notably replacing the John Doe example with a
generic "X" case in the one-shot prompt
> - Minor wording/formatting cleanups; no code changes or logic
modifications
>
> <sup>Written by [Cursor
Bugbot](https://cursor.com/dashboard?tab=bugbot) for commit
8499258272. This will update automatically
on new commits. Configure
[here](https://cursor.com/dashboard?tab=bugbot).</sup>
<!-- /CURSOR_SUMMARY -->
<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->
## Summary by CodeRabbit
* **Chores**
  * Refined entity resolution guidance in knowledge graph generation
    prompts to use more generic instructions, improving flexibility and
    consistency in how entities are identified throughout the system.
<sub>✏️ Tip: You can customize this high-level summary in your review
settings.</sub>
<!-- end of auto-generated comment: release notes by coderabbit.ai -->
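The coreference rule these prompts state (one node ID per entity, resolved to the most complete form) can be sketched as follows. This is only an illustration, not code from this PR: the `canonicalize` helper is hypothetical, the alias groups are assumed to be already known, and "most complete" is approximated by "longest mention", which is an assumption rather than what the prompts define.

```python
def canonicalize(alias_groups):
    """Map every mention to one canonical ID per entity.

    Assumption: each inner list already groups the mentions of a single
    entity, and the longest mention stands in for the "most complete" form.
    """
    mapping = {}
    for group in alias_groups:
        canonical = max(group, key=len)  # longest mention in the group
        for mention in group:
            mapping[mention] = canonical
    return mapping


# Hypothetical mentions of one entity found in a text.
mentions = [["Alan Turing", "Turing", "he"]]
node_ids = canonicalize(mentions)
print(node_ids["he"])  # Alan Turing
```

In the prompts themselves the grouping is left to the model; the sketch only shows the target behaviour, where every alias or pronoun ends up pointing at the same human-readable node ID.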
commit f03ab671e6
5 changed files with 7 additions and 7 deletions
@@ -19,8 +19,8 @@ The aim is to achieve simplicity and clarity in the knowledge graph.
 - **Naming Convention**: Use snake_case for relationship names, e.g., `acted_in`.
 # 3. Coreference Resolution
 - **Maintain Entity Consistency**: When extracting entities, it's vital to ensure consistency.
-If an entity, such as "John Doe", is mentioned multiple times in the text but is referred to by different names or pronouns (e.g., "Joe", "he"),
-always use the most complete identifier for that entity throughout the knowledge graph. In this example, use "John Doe" as the Persons ID.
+If an entity, is mentioned multiple times in the text but is referred to by different names or pronouns,
+always use the most complete identifier for that entity throughout the knowledge graph.
 Remember, the knowledge graph should be coherent and easily understandable, so maintaining consistency in entity references is crucial.
 # 4. Strict Compliance
 Adhere to the rules strictly. Non-compliance will result in termination
@@ -22,7 +22,7 @@ You are an advanced algorithm designed to extract structured information to buil
 3. **Coreference Resolution**:
 - Maintain one consistent node ID for each real-world entity.
 - Resolve aliases, acronyms, and pronouns to the most complete form.
-- *Example*: Always use "John Doe" even if later referred to as "Doe" or "he".
+- *Example*: Always use full identifier even if later referred to as in a similar but slightly different way
 
 **Property & Data Guidelines**:
 
@@ -42,10 +42,10 @@ You are an advanced algorithm designed to extract structured information from un
 - **Rule**: Resolve all aliases, acronyms, and pronouns to one canonical identifier.
 
 > **One-Shot Example**:
-> **Input**: "John Doe is an author. Later, Doe published a book. He is well-known."
+> **Input**: "X is an author. Later, Doe published a book. He is well-known."
 > **Output Node**:
 > ```
-> John Doe (Person)
+> X (Person)
 > ```
 
 ---
@@ -15,7 +15,7 @@ You are an advanced algorithm that extracts structured data into a knowledge gra
 - Properties are key-value pairs; do not use escaped quotes.
 
 3. **Coreference Resolution**
-- Use a single, complete identifier for each entity (e.g., always "John Doe" not "Joe" or "he").
+- Use a single, complete identifier for each entity
 
 4. **Relationship Labels**:
 - Use descriptive, lowercase, snake_case names for edges.
@@ -26,7 +26,7 @@ Use **basic atomic types** for node labels. Always prefer general types over spe
 - Good: "Alan Turing", "Google Inc.", "World War II"
 - Bad: "Entity_001", "1234", "he", "they"
 - Never use numeric or autogenerated IDs.
-- Prioritize **most complete form** of entity names for consistency (e.g., always use "John Doe" instead of "John" or "he").
+- Prioritize **most complete form** of entity names for consistency
 
 2. Dates, Numbers, and Properties
 ---------------------------------